<h1>Unidata Developer's Blog</h1>
<h2><a href="https://www.unidata.ucar.edu/blogs/developer/entry/spring-2023-sbn-ground-station">Spring 2023 SBN Ground Station Repointing Guide</a></h2>
<p class="byline">Unidata News | 2023-01-30</p>
<p class="byline">
By Stonie Cooper, Unidata
</p>
<p>
On January 24, 2023, NOAA released Service Change Notice 22-77, describing the
change of relay satellite for all SBN services (NOAAPort and NWWS) from Intelsat’s
Galaxy 28 to the newly launched Galaxy 31. For ground station operators, this move
is non-trivial in that the location of Galaxy 28 is geostationary over 89°W, and
Galaxy 31 is geostationary over 121°W.
</p>
<p>
The current transition schedule requires ground stations to switch to the new
satellite on a relatively tight schedule:
</p>
<ul>
<li>5 Feb 2023: Galaxy 31 illuminated</li>
<li>10 Feb 2023: Ground Station Transition Begin</li>
<li>31 Mar 2023: Ground Station Transition End</li>
<li>3 Apr 2023: Galaxy 28 decommissioned</li>
</ul>
<p>
This guide is intended to help operators who are not familiar with repointing a
satellite dish to provide the information to a professional satellite technician.
</p>
<p>
<strong>Note: </strong>This guide was prepared so that SBN ground station operators can
provide information to a professional satellite technician for repointing a satellite
reflector for SBN reception. UCAR is in no way responsible for injuries, damages, loss
of data, loss of income, loss of life, or other issues that may arise from the use of
the information herein. Lightning, wind, heavy equipment, heights, and all other dangers
associated with construction, satellite communications, and satellite reflector
maintenance are a serious threat to life and a source of potential injury. UCAR in no
way endorses anyone not trained and experienced in this work performing any of the tasks
outlined in this document; rather, this document is meant to act as the starting point
for a Statement of Work for an operator to hire a professional to perform such work.
</p>
<h3>Assumptions</h3>
<p>Assumptions made in this guide include the following:</p>
<ul>
<li>The reflector is attached to the mount in an “Az/El” mount paradigm</li>
<li>The mount is anchored without the use of a motorized actuator</li>
<li>The reflector is pole-mounted</li>
<li>The Novra S300N has been configured for the new satellite
(see <a href="#novra">Novra S300N configuration</a>, below).</li>
</ul>
<h3>Example</h3>
<div class="img_r" style="width: 200px;">
<a class="lightbox" title="Figure 1: UCAR SBN ground station reflector." href="/blog_content/images/2023/20230130_sbn_fig01.png">
<img width="200" src="/blog_content/images/2023/20230130_sbn_fig01.png" alt="UCAR SBN reflector" />
</a>
<div class="caption">
Figure 1: UCAR SBN ground station reflector.<br>(Click to enlarge.)
</div>
</div>
<p>
In this example, we step through the process of repointing the current SBN reflector
at UCAR's Foothills Laboratory in Boulder, Colorado (Figure 1).
</p>
<p>
Using a GPS, cell phone, or Google Maps, determine the latitude and longitude of the
satellite reflector that will be repointed.
</p>
<p>
For our example, the latitude is found to be 40.03721°N and the longitude is 105.24192°W (-105.24192° in signed east-longitude convention).
</p>
<p>
Searching the Internet will yield several on-line satellite pointing calculators.
For this example, we use the <a href="https://www.satsig.net/ssazelm.htm">Satellite
Signals home page</a>, which can calculate results for all aspects of the repointing.
</p>
<p>
Providing our latitude and longitude as inputs, and using the current satellite,
Galaxy 28 at 89°W, as our reference, we obtain the current look angle and polarization
tilt, from which the relative changes can be calculated.
</p>
<pre>
<strong>Guide Location:</strong>
40.03721°N latitude.
-105.24192°E longitude.
<strong>Galaxy 28:</strong>
89°W longitude (geostationary).
Current Azimuth:
155.64° from true North.
150.19° from magnetic North.
Current Elevation:
40.73° above the horizon (or “level”).
Current Polarization tilt at peak signal:
-18.41° from LNB housing horizontal (clockwise) if viewed from the front.
<strong>Galaxy 31:</strong>
121°W longitude (geostationary).
Target Azimuth:
203.68° from true North.
198.24° from magnetic North.
Target Elevation:
40.9° above the horizon (or “level”).
Target Polarization tilt at peak signal:
17.91° from LNB housing horizontal (counterclockwise) if viewed from the front.
</pre>
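<p>
If you want to double-check an online calculator, the standard geostationary look-angle
formulas can be evaluated directly. The short Python sketch below is purely illustrative
(it is not part of NOAA's or Unidata's tooling, and the function name is made up for this
example); it reproduces the true-North azimuth, elevation, and polarization tilt above for
a northern-hemisphere site, and does not handle magnetic declination.
</p>
<pre>
import math

# Illustrative sketch of the standard geostationary look-angle formulas.
# Inputs are signed degrees (north and east positive); valid as written
# for northern-hemisphere sites only.
def look_angles(site_lat, site_lon, sat_lon):
    phi = math.radians(site_lat)
    dlon = math.radians(site_lon - sat_lon)
    r_ratio = 6378.137 / 42164.17          # Earth radius / geostationary orbit radius
    cos_g = math.cos(dlon) * math.cos(phi)

    elevation = math.degrees(math.atan((cos_g - r_ratio) / math.sqrt(1.0 - cos_g ** 2)))
    azimuth = 180.0 + math.degrees(math.atan2(math.tan(dlon), math.sin(phi)))
    pol_tilt = math.degrees(math.atan2(math.sin(dlon), math.tan(phi)))
    return azimuth, elevation, pol_tilt

site = (40.03721, -105.24192)              # UCAR Foothills Lab, Boulder
for name, sat_lon in (("Galaxy 28", -89.0), ("Galaxy 31", -121.0)):
    az, el, tilt = look_angles(*site, sat_lon)
    print(f"{name}: azimuth {az:.2f}  elevation {el:.2f}  tilt {tilt:.2f}")
</pre>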
<p>
<strong>Note:</strong> For determining the change in azimuth, it does not matter
if you use true North or magnetic North, but pick one and always use the North
reference selected.
</p>
<p>
In this example, the repositioning data above require the following changes:
</p>
<pre>
Azimuth delta: <strong>48.04° to the west</strong>
Elevation delta: <strong>rise 0.17° above horizon</strong>
Polarization tilt delta: <strong>36.32°</strong>
(counterclockwise, viewed from facing the reflector)
</pre>
<p>
If the elevation needs to change by more than 1°, it is best to adjust the
elevation first. Select a flat section of the mount or part of the dish and,
using a magnetic angle finder (inclinometer), note the starting angle before
loosening any hardware.
</p>
<div class="img_r" style="width: 200px;">
<a class="lightbox" title="Figure 2: Protractor set for elevation adjustment." href="/blog_content/images/2023/20230130_sbn_fig02.png">
<img width="200" src="/blog_content/images/2023/20230130_sbn_fig02.png" alt="Elevation adjustment" />
</a>
<div class="caption">
Figure 2: Protractor set for elevation adjustment.
</div>
</div>
<p>
On most pole mounted reflectors, the elevation is changed by loosening and rotating
nuts on an elevation rod. It may be necessary to loosen bolts and nuts very
slightly at pivot points, but only enough so the dish will move. It is important to
loosen the nut in the direction of the adjustment, first. In the case of raising
the dish, loosen the bottom nut, and then tighten the top nut to follow the bottom
nut. The goal is to change the elevation by the delta as calculated from the
current position and the position needed to point to Galaxy 31.
</p>
<div class="img_l" style="width: 200px;">
<a class="lightbox" title="Figure 3: Elevation adjustment key points." href="/blog_content/images/2023/20230130_sbn_fig03.png">
<img width="200" src="/blog_content/images/2023/20230130_sbn_fig03.png" alt="Elevation adjustment key points" />
</a>
<div class="caption">
Figure 3: Elevation adjustment key points.
</div>
</div>
<p>
Conversely, lowering the pointing angle entails loosening the top nut, and then
turning the bottom nut to follow the top nut. Once you have changed the angle as
seen on the protractor by the desired delta, snug (but do not fully tighten) the
adjustment nuts and any loose pivot points.
</p>
<p>
Pole mounted reflectors normally adjust azimuth by rotating the entire
dish on the pole after loosening the collar bolts. Before committing to the azimuth
change, take the time to make witness marks of the current reflector position (Figure 4).
</p>
<div class="img_r" style="width: 200px;">
<a class="lightbox" title="Figure 4: Make initial witness marks." href="/blog_content/images/2023/20230130_sbn_fig04.png">
<img width="200" src="/blog_content/images/2023/20230130_sbn_fig04.png" alt="Azimuth witness marks" />
</a>
<div class="caption">
Figure 4: Make initial witness marks.
</div>
</div>
<p>
Next, measure the circumference of the pole (not the collar). Most pole mounted
collars are 12” or 18” long, although the Comtech 3-piece reflector has a collar
that is 48” long. Regardless, the circumference needed is of the pole that the
collar sits on, which, on the Comtech, will be close to the ground.
</p>
<div class="img_l" style="width: 200px;">
<a class="lightbox" title="Figure 5: Measure pole circumference." href="/blog_content/images/2023/20230130_sbn_fig05.png">
<img width="200" src="/blog_content/images/2023/20230130_sbn_fig05.png" alt="Measure pole circumference" />
</a>
<div class="caption">
Figure 5: Measure pole circumference.
</div>
</div>
<p>
In this example, the pole has a circumference of 536mm. Knowing the angle of the
direction change needed to repoint to Galaxy 31, the arc distance change on the
pole can be calculated:
</p>
<p>
<img src="/blog_content/images/2023/20230130_sbn_eq.png">
</p>
<p>
Rotating the collar to repoint to Galaxy 31, west of the current satellite,
results in an arc distance change of 71.53 mm.
</p>
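<p>
The equation above is just the circumference-fraction relationship: arc distance equals
circumference times (azimuth delta / 360°). A quick sanity check in Python, using the
values from this example (the variable names here are ours, not part of any tool):
</p>
<pre>
circumference_mm = 536.0      # measured pole circumference
azimuth_delta_deg = 48.04     # Galaxy 28 to Galaxy 31 azimuth change
arc_mm = circumference_mm * azimuth_delta_deg / 360.0
print(f"{arc_mm:.2f} mm")     # prints 71.53 mm
</pre>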
<div class="img_l" style="width: 200px;">
<a class="lightbox" title="Figure 6: Measure to the new location." href="/blog_content/images/2023/20230130_sbn_fig06.png">
<img width="200" src="/blog_content/images/2023/20230130_sbn_fig06.png" alt="Measure new location" />
</a>
<div class="caption">
Figure 6: Measure to the new location.
</div>
</div>
<p>
Final azimuth preparation entails scribing a second witness mark on the pole,
to which the collar (and thus whole dish) will be rotated to acquire Galaxy 31.
</p>
<div class="img_r" style="width: 200px;">
<a class="lightbox" title="Figure 7: Apply target witness mark." href="/blog_content/images/2023/20230130_sbn_fig07.png">
<img width="200" src="/blog_content/images/2023/20230130_sbn_fig07.png" alt="Target witness mark" />
</a>
<div class="caption">
Figure 7: Apply target witness mark.
</div>
</div>
<p>
Mount collars will have three or four sets of bolt pairs equally spaced around the
collar that secure the dish at the correct azimuth. In this example, there are three sets of two
bolts. We suggest that only one set of bolts be loosened, and only loosened
enough to allow the dish to rotate without rotating freely.<br><br>
</p>
<div class="img_l" style="width: 200px;">
<a class="lightbox" title="Figure 8: Bolts for adjusting azimuth." href="/blog_content/images/2023/20230130_sbn_fig08.png">
<img width="200" src="/blog_content/images/2023/20230130_sbn_fig08.png" alt="Azimuth adjustment" />
</a>
<div class="caption">
Figure 8: Bolts for adjusting azimuth.
</div>
</div>
<p>
With assistance, push the reflector in the direction that will bring the collar mark
in line with the 121°W pole witness mark. Snug the bolt only slightly. With
assistance, use the Novra monitoring software tool (see <a href="#novra">Novra S300N
configuration</a> below for details), and note the Carrier to Noise ratio (C/N).
Very slightly, push the reflector to the left (as viewed from behind the dish), and
note changes to the C/N. If the C/N decreases, push the reflector back to the right
in small increments. Allow the Novra statistics to sample over 10 seconds at a
minimum with each slight move before pushing the reflector again.
</p>
<p>
Continue moving the reflector in small motions, noting an increase/decrease in the
C/N, with the goal to move the reflector to achieve the highest C/N. Once the
highest C/N has been located, make witness marks on the collar and pole, separate
from other witness marks, as the final azimuth position for Galaxy 31.
</p>
<div class="img_r" style="width: 200px;">
<a class="lightbox" title="Figure 9: Nudge reflector to the right before tightening." href="/blog_content/images/2023/20230130_sbn_fig09.png">
<img width="200" src="/blog_content/images/2023/20230130_sbn_fig09.png" alt="Tightening tip" />
</a>
<div class="caption">
Figure 9: Nudge reflector to the right before tightening.
</div>
</div>
<p>
<strong>Stonie’s Tip:</strong> On pole mounted reflectors, tightening the collar
bolts to lock down the reflector will result in the reflector moving from the
intended “peaked” position. This is because the bolts “bite” into the pole as they
are tightened, walking the collar away from the intended location. To overcome this
potentially frustrating aspect, once you achieve the peak C/N from the Novra
monitoring tool, push the reflector slightly to the right, viewed from behind the
reflector, before starting the tightening process. When the bolts are tightened,
they will bite and walk the collar to the left, and with practice, will put the
collar back to the optimal witness mark.
</p>
<p>
Once all collar bolts are confirmed to be tight, return to the elevation rod peaking
process. Lower the elevation angle very slightly, and again, note the C/N reading
from the Novra monitoring tool. Adjust the nuts on the elevation rod until the
highest C/N reading is achieved. At that point, tighten all nuts and bolts, both on
the elevator rod itself, as well as all pivot points. This concludes the base
reflector repointing and attention will now turn to the polarization tilt.
</p>
<p>
Not all reflector feedhorn/scalar ring assemblies allow the polarization to be
adjusted continuously; some only offer discrete horizontal or vertical orientations
relative to the reflector.
However, for those scalar ring/feedhorn assemblies that allow adjustment, additional
peaking should be considered.
</p>
<div class="img_l" style="width: 200px;">
<a class="lightbox" title="Figure 10: Measuring initial polarization tilt angle." href="/blog_content/images/2023/20230130_sbn_fig10.png">
<img width="200" src="/blog_content/images/2023/20230130_sbn_fig10.png" alt="Polarization tilt angle" />
</a>
<div class="caption">
Figure 10: Measuring initial polarization tilt angle.
</div>
</div>
<p>
As done previously, it is best to start with a reference of what the LNB’s
polarization tilt angle is initially. Use the bolt heads on the LNB as stops for
the edge of the protractor to ensure an accurate reading of the angle.
</p>
<p>
Loosen the set screw(s) for the feedhorn and rotate it counterclockwise (viewed from
within the reflector) by the delta angle calculated earlier. For Boulder, this was
36.32° in total from the initial position. Snug the set screw(s), and with
assistance, record the C/N reading from the Novra monitoring tool. Note that while
a person is standing at or in front of the reflector, the C/N will be lower than it was
when the azimuth and elevation were adjusted. Maintain the same physical location
of the operator adjusting the polarity until final peak.
</p>
<p>
Slightly rotate the feed to the right and note the increase or decrease in the C/N. If
increasing, continue to rotate, incrementally, in that direction until the C/N
decreases. Likewise, if the C/N decreases, go the opposite direction. Once the C/N
has reached the highest level, secure the set screw(s) <em>without overtightening</em>.
</p>
<p>
On aftermarket feedhorns, it is also possible to adjust the focal length of the feed
by sliding it in or out of the scalar ring with the set screw(s) only snug (but
loose enough to allow feedhorn movement). This process is the same as the other
peaking processes and can help attain a further gain in signal strength or decrease
in noise.
</p>
<p>
Once satisfied that the peak C/N has been obtained, confirm that the coaxial connections
are watertight and that all items loosened during the repointing are retightened, and
replace zip ties with fresh zip ties or Velcro to prevent movement of cables on or
around the dish. Wind-induced movement in cables produces inadvertent RF noise, and
good cable hygiene goes a long way toward further reducing noise sources.
</p>
<h3 id="novra">Novra S300N configuration</h3>
<p>
The following information for configuring the Novra S300N satellite receiver has
been provided by NOAA.
</p>
<table class="simple">
<tr>
<td style="vertical-align:top;">
<pre>
NOAAPort ingest:
-- Symbol Rate: 30 Msps
-- RF Frequency: 1130 MHz
-- PID(s): 101
102
103
104
105
106
107
108
150
151
-- FEC Type: DVB-S2
-- Modulation/Coding: 16APSK 2/3
-- ISI: 18
</pre>
</td>
<td style="vertical-align:top;">
<pre>
NWWS ingest:
-- Symbol Rate: 30 Msps
-- RF Frequency: 1130 MHz
-- PID(s): 201
-- FEC Type: DVB-S2
-- Modulation/Coding: QPSK 1/3
-- ISI: 2
</pre>
</td>
</tr>
</table>
<p>
One method to reconfigure the Novra is to save off the current Novra configuration,
edit the resulting XML file to change the needed items, and then reload it.
The Novra-supplied Linux “cmcs” utility can be downloaded here:
</p>
<p>
<a href="https://novra.com/downloads?type=All&search=&nid=&_escaped_fragment_=&page=8">https://novra.com/downloads?type=All&search=&nid=&_escaped_fragment_=&page=8</a>
</p>
<p>
Using the "cmcs" utility:
</p>
<pre>
$ <strong>cmcs -list</strong>
S300N IP address: 192.168.0.1 MAC: jj-bb-aa-zz-yy-xx
$ <strong>cmcs -ip 192.168.0.1 -pw <password> -save mynovra.xml</strong>
</pre>
<p>
Edit the resulting <code>mynovra.xml</code> file for RF Frequency:
</p>
<pre>
<S300N_CONFIG>
<NETWORK ReceiverIP="192.168.0.1" SubnetMask="255.255.255.0" DefaultGateway="192.168.0.254" StatusDestinationIP="255.255.255.255" StatusDestinationPort="6516" IGMPFilter="OFF" />
<SATELLITE>
<SIGNAL RF="<span style="background-color:yellow;">1130</span>" SymbolRate="30000" AutoSR="OFF" GoldCode="0" SearchType="DVBS2" MODCOD="2/3 16APSK" ISI="18" />
<LNB>
<LNBSpec LOFrequency="0" PolaritySwitchingVoltages="13 &amp; 18" HiLowBandTone="22" />
<LNBControl Power="ON" Polarization="Horizontal/Left" Band="High" LongLineCompensation="ON" />
</LNB>
<CARRIER_TO_LO>Carrier Freq. &gt;= L.O. Freq.</CARRIER_TO_LO>
</SATELLITE>
<CONTENT>
<TRANSPORT_STREAM PIDS="Selected">
<PID Number="101" Processing="MPE" />
<PID Number="102" Processing="MPE" />
<PID Number="103" Processing="MPE" />
<PID Number="104" Processing="MPE" />
<PID Number="105" Processing="MPE" />
<PID Number="106" Processing="MPE" />
<PID Number="107" Processing="MPE" />
<PID Number="108" Processing="MPE" />
<PID Number="150" Processing="MPE" />
<PID Number="151" Processing="MPE" />
<PID Number="NULL" Processing="RAW" />
</TRANSPORT_STREAM>
<IP_REMAP_TABLE Enabled="false" RemapSourceIP="false" />
</CONTENT>
</S300N_CONFIG>
</pre>
<p>
Then reload the configuration:
</p>
<pre>
$ <strong>cmcs -ip 192.168.0.1 -pw <password> -load mynovra.xml</strong>
</pre>
<p>
Note that since the only parameter being changed here is the signal RF frequency, you
could skip editing the configuration XML file by hand and use the cmcs command to
change the parameter directly:
</p>
<pre>$ <strong>cmcs -ip 192.168.0.1 -pw <password> -rfreq 1130</strong></pre>
<p>(The method you choose to change this value is up to you. There is no need
to <em>both</em> edit the XML file and use the command with the <code>-rfreq</code>
parameter.)
</p>
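<p>
If you would rather script the XML edit than make it by hand, a minimal sketch using
Python's standard <code>xml.etree</code> module is shown below. It assumes the file
layout saved by <code>cmcs -save</code> as shown above; treat it as illustrative rather
than a supported Novra workflow.
</p>
<pre>
import xml.etree.ElementTree as ET

# Illustrative only: set the SIGNAL RF attribute in a saved Novra S300N
# configuration, then reload it with "cmcs -load" as shown above.
tree = ET.parse("mynovra.xml")
signal = tree.getroot().find("./SATELLITE/SIGNAL")
signal.set("RF", "1130")
tree.write("mynovra.xml")
</pre>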
<p>
Finally, to monitor the signal strength, use the cmcs command:
</p>
<pre>
$ <strong>cmcs -ip 192.168.0.1 -pw <password> -shsat</strong>
Satellite Interface Settings:
Receiver MAC Address: jj-bb-aa-zz-yy-xx
Receiver Mode: DVBS2
Frequency: 1110.0 MHz
Symbol Rate: 30.000 Msps
ModCod: 2/3 16APSK
Gold code: 0
Input Stream Filter: On
Input Stream ID: 18
Signal Lock: On
Data Lock: On
Uncorrectable Rate: 0/Second
Packet Error Rate: 0.0000e+00
Carrier to Noise C/N: 15.2dB
Signal Strength: -53 dBm
</pre>
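<p>
While peaking, it can be handy to poll the receiver automatically rather than re-running
the command by hand. The following rough sketch shells out to <code>cmcs -shsat</code>
and extracts the C/N line; it assumes the output format shown above and uses placeholder
address and password values, so adjust for your receiver.
</p>
<pre>
import re
import subprocess
import time

# Rough sketch: poll "cmcs -shsat" and print the carrier-to-noise ratio.
# Assumes the output format shown above; not an official Novra interface.
CMD = ["cmcs", "-ip", "192.168.0.1", "-pw", "mypassword", "-shsat"]

while True:
    out = subprocess.run(CMD, capture_output=True, text=True).stdout
    match = re.search(r"Carrier to Noise C/N:\s*([\d.]+)\s*dB", out)
    print(match.group(1) + " dB" if match else "no C/N reading found")
    time.sleep(10)   # let the receiver statistics settle between readings
</pre>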
<p>
If you have questions about this example satellite receiver repointing process, please
contact <a href="mailto:support@unidata.ucar.edu">support@unidata.ucar.edu</a>.
</p>
<h2><a href="https://www.unidata.ucar.edu/blogs/developer/entry/contributor-license-agreement-for-unidata">Contributor License Agreement for Unidata Projects</a></h2>
<p class="byline">Ryan May | 2017-09-28</p>
<p>Unidata hosts a variety of Open Source software projects on GitHub. We use the Open Source model because we believe strongly that broad participation in all aspects of Unidata's work is essential to achieving the Unidata community's goals. Developing software that focuses on community needs is one of our main objectives, and participation by community members in all aspects of the development process — from coding to testing, documenting, and commenting — is incredibly valuable.</p>
<p>As community participation in Unidata's Open Source efforts grows, we are facing increasingly complex situations surrounding contributions made to Unidata-hosted projects. As a result, we have decided to begin requiring that community members who wish to contribute code to Unidata projects on GitHub agree to the <a href="https://gist.github.com/dopplershift/488c982201749423a52ca193f31a9515">Unidata Contributor License Agreement (CLA)</a>. This agreement is based on a template from the <a href="http://harmonyagreements.org">Harmony Agreements</a> project, whose goal is to standardize CLA's within the Open Source community.</p>
<h2>What is a Contributor License Agreement?</h2>
<p>A Contributor License Agreement (CLA) defines the terms under which intellectual property is contributed to a project. In Unidata's case, the CLA covers software code contributed to Unidata-hosted projects distributed under Open Source licenses. While the Unidata CLA is a legal document, and you should read it carefully before agreeing to it, the two main ideas are:</p>
<ul>
<li>You retain the ownership of the copyright to your contribution</li>
<li>You grant Unidata the right to use your contribution in perpetuity</li>
</ul>
<p>We feel it is important to use a CLA on Unidata-hosted projects for several reasons. Most important among them is the idea that once a Unidata software product has been released to users, the users can be confident that they have the right to continue using the software. Adding some formality to the software contribution process helps the Unidata community make well-informed choices about the benefits of using Open Source software.</p>
<p>The CLA process Unidata is adopting is becoming more and more common in the Open Source community. We don't think that asking contributors to sign the agreement will present a significant barrier to participation, and we believe that adding this layer of formality will benefit the Unidata community as a whole.</p>
<h2>How it Works</h2>
<p>When contributing using a Pull Request on GitHub, the following message will present itself, using <a href="https://cla-assistant.io">CLA Assistant</a>, as a comment on the pull request:</p>
<p><img src="/blog_content/images/2017/20170927_cla1.png" alt="Unsigned CLA" /></p>
<p>Contributors can click on the yellow "CLA not signed yet" badge, which will take them to a copy of the CLA. Contributors are asked to provide a little bit of information about themselves (for legal purposes):</p>
<p><img src="/blog_content/images/2017/20170927_cla2.png" alt="CLA page" /></p>
<p>Once the "I Agree" button is clicked, the browser will return to the original pull request page, but now the comment has been updated:</p>
<p><img src="/blog_content/images/2017/20170927_cla3.png" alt="Signed CLA" /></p>
<p>Contributors will only be asked to electronically sign once (unless the CLA is updated), and the agreement applies to all GitHub repositories hosted under the Unidata organization.</p>
<p>For more information about CLAs, see these resources:</p>
<ul>
<li><a href="http://harmonyagreements.org">Harmony Agreements</a></li>
<li>"Producing OSS" has a section on <a href="http://producingoss.com/en/contributor-agreements.html">Contributor Agreements</a></li>
<li>A more <a href="https://julien.ponge.org/blog/in-defense-of-contributor-license-agreements/">detailed blog post</a> by Julien Ponge about CLAs</li>
</ul>
<h2>About the Contributor License Agreement</h2>
<p>Unidata's CLA comes from
<a href="http://www.harmonyagreements.org">Project Harmony</a>, which is a
community-centered group focused on contributor agreements for free and open
source software.</p>
<p>The document you are reading now is not a legal analysis of the CLA. If you
want one of those, please talk to your lawyer. This is a description of the
purpose of the CLA.</p>
<h3>Why is a signed CLA required?</h3>
<p>The license agreement is a legal document in which you state you are entitled to
contribute the code/documentation to Unidata and are willing to have it used in
distributions and derivative works. This means that should there be any kind of
legal issue in the future as to the origins and ownership of any particular
piece of code, we have the necessary forms on file from the contributor(s)
saying they were permitted to make this contribution.</p>
<p>The CLA also ensures that once a contribution has been made, contributors cannot try
to withdraw permission for its use at a later date. People and companies can
therefore use Unidata open source projects, confident that they will not be asked
to stop using pieces of the code at a later date.</p>
<p>Lastly, the CLA gives Unidata permission to change the license under
which the project, including the various contributions from many developers, is
distributed in the future. The CLA states that this license needs to be one
that has been approved by the Open Source Initiative, including both copyleft
and permissive licenses. This gives Unidata the freedom to adjust licenses in the
future if needed (e.g. some clause of the current license is found to be invalid;
change to a standard license), so long as the license remains open source.</p>
<h3>Am I giving away the copyright to my contributions?</h3>
<p>No. This is a pure license agreement, not a copyright assignment. You still
maintain the full copyright for your contributions. You are only providing a
license to Unidata to distribute your code without further restrictions. This is
not the case for all CLAs, but it is the case for the one we are using.</p>
<h3>Can I withdraw permission to use my contributions at a later date?</h3>
<p>No. This is one of the reasons we require a CLA. No individual contributor can hold
such a threat over the entire community of users. Once you make a contribution, you
are saying we can use that piece of code forever.</p>
<h3>Can I submit patches without having signed the CLA?</h3>
<p>No. We will be asking all new contributors and patch submitters to sign before
they submit anything.</p>
<p class="quote">
This CLA explanation is based on <a href="https://www.djangoproject.com/foundation/cla/faq/">Django Contributor License Agreement Frequently Asked Questions</a> (copyright Django Software Foundation. <a href="http://creativecommons.org/licenses/by/3.0/us/">CC-BY</a>) The content has been modified slightly to reflect situations specific to Unidata.
</p>
<h2><a href="https://www.unidata.ucar.edu/blogs/developer/entry/dap4_commentary_ddx_lexical_elements">DAP4 Commentary: DDX Lexical Elements</a></h2>
<p class="byline">Dennis Heimbigner | 2012-03-27</p>
This document describes the lexical elements that occur in
the DAP4 grammar.
<p>
Within the
<a href="http://dl.dropbox.com/u/53929684/xsd.rng">
Relax-NG (rng) DAP4 grammar</a>,
there are markers for occurrences of
primitive type such as integers, floats, or strings.
The markers typically look like this when defining
an attribute that can occur in the DAP4 DDX.
</p><pre><attribute name="namespace"><data type="string"/></attribute></pre>
The "<data type="string"/>"
specifies the lexical class for the values that this
attribute can have. In this case, the namespace attribute is
defined to have a String value. Similar notation is used
for values occurring as text within an xml element. The
lexical specification later in this document defines the
legal lexical structure for such lexical items.
Specifically, it defines the format of the following lexical
items.
<ol>
<li> Constants, namely: string, float, integer, and character.
</li><li> Identifiers
</li></ol>
<p>
The specification is written using the
<a href="http://www.iso.org/iso/iso_catalogue/catalogue_ics/catalogue_detail_ics.htm?csnumber=38790">
ISO/IEC 9945-2:2003 Information technology -- Portable Operating System Interface (POSIX) -- Part 2: System Interfaces</a>.
This is the extended POSIX regular expression
specification.
</p><p>
I have augmented it in the following ways.
</p><ol>
<li>Names are assigned to regular expressions using the notation
<pre>name = regular-expression</pre>
<p>
</p></li><li>Named expressions can be used in subsequent regular
expressions by using the notation {name}. Such occurrences
are equivalent to textually substituting the expression
associated with name for the {name} occurrence: More or less
like a macro.
</li></ol>
<h4>DAP4 Lexical elements</h4>
Notes:
<ol>
<li>The definition of {UTF8} is deferred to the next section.
<p>
</p></li><li>Comments are indicated using the "//" notation.
<p>
</p></li><li>Standard xml escape formats (&xDD) are assumed to be allowed anywhere.
</li></ol>
<p>
<b>Basic character set definitions</b>
</p><pre>CONTROLS = [\x00-\x1F] // ASCII control characters<br />WHITESPACE = [ \r\t\f]+<br />HEXCHAR = [0-9a-fA-F]<br />// ASCII printable characters<br />ASCII = [0-9a-zA-Z !"#$%&'()*+,-./:;<=>?@[\\\]\\^_`|{}~]<br /></pre>
<p>
<b>Ascii characters that may appear unescaped in Identifiers</b><br />
This is assumed to be basically all ASCII printable characters
except the characters ' ', '.', '/', '"', "'", and '&'.
Occurrences of these characters are assumed to be representable
using the standard xml '&xx;' notation.
</p><pre>IDASCII = [0-9a-zA-Z!#$%'()*+,-:;<=>?@[\\\]\\^_`|{}~]<br /></pre>
<p>
<b>The numeric classes: integer and float</b><br />
</p><pre>INTEGER = {INT}|{UINT}|{HEXINT}<br />INT = [+-][0-9]+{INTTYPE}?<br />UINT = [0-9]+{INTTYPE}?<br />HEXINT = {HEXSTRING}{INTTYPE}?<br />INTTYPE = ([BbSsLl]|"ll"|"LL")<br />HEXSTRING = (0[xX]{HEXCHAR}+)<br /><br />FLOAT = ({MANTISSA}{EXPONENT}?)|{NANINF}<br />EXPONENT = ([eE][+-]?[0-9]+)<br />MANTISSA = [+-]?[0-9]*\.[0-9]*<br />NANINF = (-?inf|nan|NaN)<br /></pre>
<p><b>The Character classes</b><br />
</p><pre>STRING = ([^"\&]|{XMLESCAPE})*<br />CHARACTER = ([^'\&]|{XMLESCAPE})<br /></pre>
<p>
Note that the character type only supports ASCII characters because
it can only hold a single 8-bit byte.
</p><p>
<b>The Identifier class</b><br />
</p><pre>ID = {IDCHAR}+<br />IDCHAR = ({IDASCII}|{XMLESCAPE}|{UTF8})<br />XMLESCAPE = &x{HEXCHAR}{HEXCHAR};<br /></pre>
<p>
Note that the above lexical element classes are not
disjoint. For example, the sequence of characters 1234 can
be either an identifier, a float, or an integer. So the order
of testing is assumed to be this.
</p><ol>
<li>INTEGER
</li><li>FLOAT
</li><li>ID
</li><li>STRING
</li></ol>
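<p>
As a concrete illustration of why the testing order matters, the following Python sketch
classifies a token using simplified, ASCII-only versions of the classes above. UTF-8 and
XML-escape handling are omitted, and the merged INTEGER pattern is only an approximation
of the INT/UINT/HEXINT split; none of this is part of the DAP4 specification.
</p>
<pre>
import re

# Simplified (ASCII-only) versions of the lexical classes, used only to
# illustrate the required testing order.
INTTYPE = r'(?:[BbSsLl]|ll|LL)'
INTEGER = rf'(?:[+-]?[0-9]+{INTTYPE}?|0[xX][0-9a-fA-F]+{INTTYPE}?)'
FLOAT   = r'(?:[+-]?[0-9]*\.[0-9]*(?:[eE][+-]?[0-9]+)?|-?inf|nan|NaN)'
ID      = r'[0-9a-zA-Z_]+'

def classify(token):
    # Order matters: e.g. "1234" matches both INTEGER and ID, so the
    # first match in this order wins.
    for name, pattern in (("INTEGER", INTEGER), ("FLOAT", FLOAT), ("ID", ID)):
        if re.fullmatch(pattern, token):
            return name
    return "STRING"

for tok in ("1234", "1.5e3", "temp_k", "hello world"):
    print(tok, "classified as", classify(tok))
</pre>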
<h4>UTF-8 Character Encodings</h4>
We discuss UTF-8 character encoding in the context
of this document; the reference used here is
<a href="http://www.w3.org/2005/03/23-lex-U">
http://www.w3.org/2005/03/23-lex-U</a>.
<p>
The most correct (validating) version of the UTF8 character set is as follows.
</p><pre>UTF8 = ([\xC2-\xDF][\x80-\xBF]) <br /> | (\xE0[\xA0-\xBF][\x80-\xBF]) <br /> | ([\xE1-\xEC][\x80-\xBF][\x80-\xBF]) <br /> | (\xED[\x80-\x9F][\x80-\xBF]) <br /> | ([\xEE-\xEF][\x80-\xBF][\x80-\xBF]) <br /> | (\xF0[\x90-\xBF][\x80-\xBF][\x80-\xBF]) <br /> | ([\xF1-\xF3][\x80-\xBF][\x80-\xBF][\x80-\xBF]) <br /> | (\xF4[\x80-\x8F][\x80-\xBF][\x80-\xBF])<br /></pre>
The lines of the expression cover the UTF8 characters as follows:
<ol>
<li> non-overlong 2-byte
</li><li> excluding overlongs
</li><li> straight 3-byte
</li><li> excluding surrogates
</li><li> straight 3-byte
</li><li> planes 1-3
</li><li> planes 4-15
</li><li> plane 16
</li></ol>
<p>
Note that ASCII and control characters are not included.
</p><p>
The above reference also defines some alternative regular expressions.
</p><p>
The most relaxed version of UTF8 is this.
</p><pre>UTF8 = ([\xC0-\xD6].)<br /> |([\xE0-\xEF]..)<br /> |([\xF0-\xF7]...)<br /></pre>
<p>
The partially relaxed version of UTF8 is this.
</p><pre>UTF8 = ([\xC0-\xD6][\x80-\xBF]) <br /> | ([\xE0-\xEF][\x80-\xBF][\x80-\xBF]) <br /> | ([\xF0-\xF7][\x80-\xBF][\x80-\xBF][\x80-\xBF])<br /></pre>
<p>
We deem it acceptable to use this last relaxed expression
for validating UTF-8 character strings.
</p>
<h2><a href="https://www.unidata.ucar.edu/blogs/developer/entry/dap4_commentary_dap4_grammar">DAP4 Commentary: DAP4 Grammar</a></h2>
<p class="byline">Dennis Heimbigner | 2012-03-27</p>
<p>[Version: 1.0]</p>
<p>
At the end of this document are instructions for accessing and
testing a formal grammar for the DAP4 DDX using the Relax-NG
schema language. I constructed it initially without any reference to any
other explicit or implicit grammars so I could record my ideas. I
have since modified it based on examining James'
<a href='http://docs.opendap.org/index.php/DAP4:_Data_Model'>
implied grammar</a>,
on comments from others, and on
a comparison with the
<a href='http://scm.opendap.org/trac/browser/trunk/xml/dap/dap4.xsd'>
xsd grammar</a>.
</p><h4>Differences with DAP4 xsd Grammar</h4>
I converted the
<a href='http://scm.opendap.org/trac/browser/trunk/xml/dap/dap4.xsd'>
xsd grammar</a>
to an equivalent
<a href='http://dl.dropbox.com/u/53929684/xsd.rng'>
relax-ng (rng) grammar.</a>
<p>
One major difference I see is in dimension handling.
</p><ul>
<li> I just used the name "dimension" rather
than "shareddimension". For me, all dimensions (except anonymous
ones) are shared.
<p>
</p></li><li> The xsd separates out scalars from arrays. I always allowed
the dimensions for a variable to be optional to handle the scalar
case.
<p>
</p></li><li> I attempted to be as consistent as possible, so I allowed
any type including sequences and structures to be dimensioned.
(but see
<a href='https://www.unidata.ucar.edu/blogs/developer/entry/dap4_commentary_sequences_and_vlens'>previous commentary</a>).
<p>
</p></li><li> The dimensions of a variable are currently specified in the
rng grammar as a sequence of elements named "Dimension" contained
in the "variables" element type.
</li></ul>
Other differences:
<ul>
<li> The Dataset element in the xsd has a couple of extra
attributes. I added these.
<p>
</p></li><li> The xsd appears to allow attributes to themselves have
attributes. This needs discussion.
<p>
</p><p>
</p></li><li> The URL basetype is in the xsd. But I do not see the justification
for keeping it.
<p>
</p></li><li> It appears that the Dataset contains a top level &lt;group&gt;
declaration. I chose to treat the Dataset itself as the top-level
group.
<p>
</p></li><li> Attribute declarations appear to have their own "namespace"
attribute. Not sure why this is needed.
<p>
</p></li><li> I do not understand the purpose of the "NewAttribute" attribute.
<p>
</p></li><li> There may still be some minor differences in representing
coordinate variables.
<p>
</p></li><li>
The xsd represents attribute values thus:
<pre><attribute name="a"><value>...</value><value>...</value></attribute></pre>
I chose to use attributes in the multi-valued case because I
prefer not to use elements with content unless really
necessary. So I represented the above as this.
<pre><attribute name="a"><value value="..."/><value value="..."/></attribute></pre>
<p>
</p><p>
</p></li><li> There is an issue of interleaving of definitions, or
equivalently, what elements must occur in a fixed order.
<p>
</p></li><li> Where should attributes be legal? I think the rng grammar
and the xsd grammar agree on this: putting them almost
everywhere, but it needs discussion.
</li>
<p>
</p><li> I dropped Blobtype. I fail to see the need for this.
</li></ul>
<p> </p><h4> Testing the Relax-NG Grammar </h4>
You will need to copy three files:
<ol>
<li> dap4.rng - this is the grammar file. It uses the
<a href='http://relaxng.org'>Relax-NG schema language</a>.
This grammar file can be obtained from
<a href='http://dl.dropbox.com/u/53929684/dap4.rng'>
http://dl.dropbox.com/u/53929684/dap4.rng</a>.
<p>
</p></li><li> test.xml - this is a test file that I am growing to cover the whole grammar.
This can be obtained from
<a href='http://dl.dropbox.com/u/53929684/test.xml'>
http://dl.dropbox.com/u/53929684/test.xml</a>.
<p>
</p></li><li> jing.jar - Jing is a validator that takes the grammar and a
test file and checks that the test file conforms to the
grammar. This can be obtained from
<a href='http://dl.dropbox.com/u/53929684/jing.jar'>
http://dl.dropbox.com/u/53929684/jing.jar</a>.
</li></ol>
To use this jar file, run the command:
<p> </p><pre>java -jar jing.jar dap4.rng test.xml</pre>No output is produced if the validation succeeds; otherwise,
error messages are produced.
<h2><a href="https://www.unidata.ucar.edu/blogs/developer/entry/dap4_commentary_possible_notation_for">DAP4 Commentary: Possible Notation for Server Commands</a></h2>
<p class="byline">Dennis Heimbigner | 2012-03-27</p>
Looking to the future, it is clear that eventually
our query language, or more generically our previous discussion of
<a href='https://www.unidata.ucar.edu/blogs/developer/entry/dap4_commentary1'>
URL Annotations</a> must encompass three classes of computations.
<ol>
<li> Queries in the DAP2 sense,
</li><li> Commands to control the client-side processing of requests on the server
(i.e. thing like caching),
</li><li> Server-side processing.
</li></ol>
I want to propose a notation for everything in the URL after
the "?". I think this notation has ability to represent a wide
variety of features without, I hope, being too generic.
<p>
The notation is basically nested functions combined with single
assignment variables. A semantically nonsensical, but
grammatical example would look something like this:
"?_x=f(17,g(h(12))),f2(_x,[0:3:10])".
</p><p>
Everything past the "?" is in the form of a comma separated
list of nested function invocations. Anything that begins with an
underscore is considered a local, temporary, variable, anything
that does not look like a function call (i.e. that is not a name followed
immediately by a left parenthesis) is assumed to be a string constant. Each
function has an arbitrary number of argument expressions
separated by commas.
</p><p>
There would be several semantic rules.
</p><ol>
<li> A variable may only be assigned to once (single assignment),
but may be referenced as many times as desired after that.
<p>
</p></li><li> All functions have a defined "return type", which looks like
a legal DDX minus certain things like groups, enumeration
declarations, and dimension declarations. In addition, a function
may be defined to have a "void" return type, which means it is
executed for its side-effects on the server.
<p>
</p></li><li> Any expression that is not assigned to a variable and does
not have a void return type will have its return value returned
to the caller as part of a DATADDX.
</li></ol>
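<p>
To make the shape of the notation concrete, here is a rough, hypothetical sketch of a
tokenizer and recursive-descent parser for the example above. The token set, the tuple
output format, and the function names are all invented for illustration; nothing here is
part of a DAP4 specification.
</p>
<pre>
import re

# Hypothetical sketch: parse "?_x=f(17,g(h(12))),f2(_x,[0:3:10])" into nested tuples.
TOKEN = re.compile(r'\s*([(),=]|\[[^\]]*\]|[^\s(),=]+)')

def tokenize(text):
    return TOKEN.findall(text)

def parse_expr(toks, pos):
    tok = toks[pos]
    # a function call is a name followed immediately by '('
    if len(toks) > pos + 1 and toks[pos + 1] == '(':
        args, pos = [], pos + 2
        while toks[pos] != ')':
            arg, pos = parse_expr(toks, pos)
            args.append(arg)
            if toks[pos] == ',':
                pos += 1
        return ('call', tok, args), pos + 1
    # variables, constants, and slices like [0:3:10] are leaves
    return ('atom', tok), pos + 1

def parse_query(query):
    toks = tokenize(query.lstrip('?'))
    exprs, pos = [], 0
    while len(toks) > pos:
        if toks[pos].startswith('_') and len(toks) > pos + 1 and toks[pos + 1] == '=':
            target, pos = toks[pos], pos + 2       # single-assignment variable
            expr, pos = parse_expr(toks, pos)
            exprs.append(('assign', target, expr))
        else:
            expr, pos = parse_expr(toks, pos)
            exprs.append(expr)
        if len(toks) > pos and toks[pos] == ',':
            pos += 1
    return exprs

print(parse_query("?_x=f(17,g(h(12))),f2(_x,[0:3:10])"))
</pre>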
<h4>Notes</h4>
My hypothesis is that this notation should also be able to handle
most kinds of server side processing by defining and composing
functions.
<p>
The standard projection+selection constraints of DAP2 can be
represented using a special query() function whose argument is
the standard DAP2 constraint, or alternatively, one could define
a collection of nested functions to do the same thing, or
alternatively, we could split the query part into two pieces
separated by a semicolon. The first piece would be a constraint
expression and the second piece (after the semicolon) would be in
the nested function call form defined above.
</p><p>
An important aspect has to do with the construction of what may
be referred to as a DATADDX. It defines the structure of a DDX
that is the composition of the return types of the invoked
functions that will return a (possibly structured) value. I need
to work this out. BUT, in any case, the resulting DATADDX may
have only a loose relation to any DDX representing the raw
dataset. This is because server-side computations will not have
been represented in the original DDX, but only in the DATADDX.
</p><p>
I also hypothesize that Ferret notations
</p><pre> http://.../thredds/dodsC/hfrnet/agg/6km_expr_{}{let deq1ubar=u[d=1,l=1:24@ave]}<br /></pre>
could be represented in my proposed
function notation without having to clutter up the URL format.
<h2><a href="https://www.unidata.ucar.edu/blogs/developer/entry/dap4_commentary_sequences_and_vlens">DAP4 Commentary: Sequences and Vlens</a></h2>
<p class="byline">Dennis Heimbigner | 2012-03-27</p>
<blockquote>
"Oh what a tangled web we weave
when first we practice to build a type system."
</blockquote>
Recently, I made the claim to James that the Sequence construct
could serve to represent CDM/HDF5 vlen constructs.
<p>
His reply was as follows:
</p><blockquote>
In my opinion we will want vlens to be in the data model, if not
as a type, then as a feature of arrays - see the schema and my
text on the Data Model page. The reason for that [is that] I have
already tried using Sequence for vlen (in hdf4) and it was a
failure. Not a failure from the technical POV but because neither
server writers nor client writers nor client users ever 'got it.'
I think one part of the problem for those people was that vlens
in HDF4 do not support relational operators via the API while
Sequences are supposed to. So the conceptual mismatch doomed the idea.
</blockquote>
<p>
This is a very important observation, and has caused me to
re-think how sequences and vlens should be handled in DAP4.
</p><h4>Vlens</h4>
Let us start by addressing the addition of vlens to the DAP4 data model.
<p>
Some possible ways to insert vlens into the dap4 data model
include the following.
</p><ol>
<li> In CDM, a vlen is marked by a "*" as a dimension name in the
set of dimensions associated with a variable. The list of
dimensions is allowed to have any number of occurrences
of "*" [Aside, I will note that "unlimited" is also an option
for CDM, but should not be needed for DAP4 because at the time of
a request for data, the size of the unlimited dimension is known].
<p>
</p></li><li> James has proposed something similar except that the "*" is
restricted to occurring as the last dimension.
<p>
</p></li><li> Another possibility is to create a new container object,
call it <i>Vlen</i>, that (like <i>Sequence</i>) is inherently of variable
length and is not dimensionable. [Aside: The term "vlen" is kind
of odd. the term "list" would actually make more sense]
</li></ol>
<p>
If we were to choose option 2 (terminal * only),
then we must address the question
of translation between CDM and DAP4. Going from DAP4 to CDM is
straightforward because having the last dimension be "*" is legal
in CDM. Going from CDM to DAP4 requires the introduction of a
number of Structure elements.
</p><p>
Consider the following CDM example (using a pseudo-syntax)
</p><pre> Int32 v[d1][*][d2][*][d3].<br /></pre>
This would have to be represented in DAP4 something like this.
<pre> Structure v_1 {<br /> Structure v_2 {<br /> Int32 v[d3];<br /> }[d2][*];<br /> }[d1][*];<br /></pre>
<p>
In the lucky case that the last dimension is already a "*":
</p><pre> Int32 v[d1][*][d2][*];<br /></pre>
then we have a simpler representation.
<pre> Structure v_1 {<br /> Int32 v[d2][*];<br /> }[d1][*];<br /></pre>
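<p>
The rewriting above is mechanical: every "*" that is not the last dimension forces a
wrapping Structure. A small hypothetical helper (the textual output format is invented
purely for illustration, and this is not part of any DAP4 implementation) makes the rule
explicit:
</p>
<pre>
# Sketch of the CDM -> DAP4 rewriting: wrap a Structure around each
# non-terminal "*" dimension. Output format is invented for illustration.
def wrap_vlens(name, base_type, dims, depth=1):
    for i, d in enumerate(dims):
        if d == '*' and i != len(dims) - 1:
            inner = wrap_vlens(name, base_type, dims[i + 1:], depth + 1)
            outer = ''.join(f'[{d}]' for d in dims[:i + 1])
            return f'Structure {name}_{depth} {{ {inner} }}{outer};'
    return f'{base_type} {name}' + ''.join(f'[{d}]' for d in dims) + ';'

print(wrap_vlens('v', 'Int32', ['d1', '*', 'd2', '*', 'd3']))
# Structure v_1 { Structure v_2 { Int32 v[d3]; }[d2][*]; }[d1][*];
</pre>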
<h4>Commentary</h4>
<ul>
<li> As a personal matter, I would prefer to use the CDM
representation. Adding a new container type (Vlen), while
appealing semantically, only complicates the model more. James'
approach requires the use of additional Structure definitions
which, in my opinion, obscures the underlying semantics.
<p>
</p></li><li> One thing that I need to check is how this affects the
proposed on-the-wire format.
</li></ul>
<h4>Sequences</h4>
Nathan Potter noted the following.
<blockquote>
At one time we considered using ''nested'' sequences as a
representation of an SQL database, where essentially the keys
that link the tables in the DB define the structure of some
nested sequence thing. I think it's a useful idea, but there is
no implementation of it to play with. [Aside: I could swear that
someone told me that they actually used a relational database as
the back end to a sequence]
</blockquote>
And later Nathan Potter noted the following:
<blockquote>
I mentioned (in the same email that contained the nested sequence
comment quoted above) that we wrote and released a server that
represents a single database table or view as a single
sequence. This is quite different from the point that I was
making in the section quoted above that there may be a use case
for nested sequences. (which was in response to your
question "Are there any other legitimate examples for using
NESTED sequences.") [Ndp 12:08, 26 February 2012 (PST)]
</blockquote>
<p>
It is this ability to have selection constraints applied that
separates sequences from vlens. It is clear that in the absence
of selection constraints, there is no essential difference
between sequences and vlens. Further, in the few examples I could
find or were sent to me, it seemed that nested sequences were
being used as, in effect, vlens.
</p><p>
So, we seem to have two very similar concepts (vlen and
sequence), which complicates the DAP4 model. The question for me is:
</p><blockquote>
Do we get rid of the Sequence concept, or at least define it as equivalent to the following?: Structure {...} [*].
</blockquote>
My current belief is that we should keep sequences but with the following restrictions:
<ol>
<li> Sequences can only occur as top-level, scalar, variables within a group.
</li><li> Sequences may not be nested in any other container (i.e. other sequences or structures)
</li></ol>
This keeps sequences for the original purpose of acting
as "relations". All other places where we might use a sequence
before will now use a vlen.
<p>
As with vlens, the translation between DAP4 and CDM needs to be addressed.
</p><ul>
<li> The conversion from DAP4 to CDM can be addressed using the
rule above, namely that a sequence is, in CDM, represented as
<pre>Structure {...} [*]</pre>.
<p>
</p></li><li> Translation from CDM to DAP4 allows for the option of never
using sequences, but always using vlens. An alternate translation
might be to say that if you have a top-level CDM structure whose
only dimension is a vlen, then translate that to a sequence (in
effect inverting the DAP4->CDM translation).
</li></ul>
<h2><a href="https://www.unidata.ucar.edu/blogs/developer/entry/dap4_commentary1">DAP4 Commentary: Characterization of URL Annotations</a></h2>
<p class="byline">Dennis Heimbigner | 2012-03-27</p>
<h4>Characterization of URL Annotations</h4>
Requests for data using the DAP4 protocol will require a
significant number of annotations specifying what is to be
retrieved, commands to the server, and commands to the client.
<p>
This document is intended to just describe the information with
which URLs need to annotated based on past experience. It also
enumerates the possible URL components that can be used to encode
the annotations. I will consider a specific encoding in a
separate document.
</p><p>
Looking at the DAP2 URLs, we see four classes of annotations:
protocol, server commands, client commands, and queries (aka
constraints).
</p><h4>Protocol</h4>
For DAP2, the fact that the DAP2 protocol is being used is inferred
from context. In netcdf-C, for example, the fact that the
dataset name is a URL is sufficient to indicate the use of the
DAP2 protocol (although that will change). For some servers,
such as TDS, the protocol is also inferable from elements of the
URL path. For example, in the URL
<pre>http://.../thredds/dodsC/...<br /></pre>
the "dodsC" indicates the use of the DAP2 protocol. TDS also
supports a schema called "dods:" that also indicates the use of DAP2.
<h4>Server Commands</h4>
Server commands in DAP2 are appended to the dataset URL to
indicate attributes of the request. For example:
<pre>http://test.opendap.org/dataset.nc.dds<br /></pre>
The defined kinds of server commands for DAP2 are as follows:
<ul>
<li>Component requests: ".dds", ".das"
</li><li>Data requests with format: ".dods", ".asc"
</li><li>Miscellaneous: ".html", ".ver"
</li></ul>
<h4>Client commands</h4>
Client commands are interpreted by the client-side library to
specify actions to be performed by the library. The existence of
client commands is important because we want to communicate from
the user to the library without requiring any knowledge by
intermediate code layers. For example, netcdf C tools such as
ncdump send URLs to the underlying netcdf DAP2 library without
having to be cognizant of their structure.
Currently, the primary use for client commands is caching, to
indicate the degree of caching and prefetch to be used with a
given request to the server.
<p>
Currently, client commands are represented as "name=value" pairs
or just "name" enclosed in square braces: "[nocache]", for
example. These commands are prefixed to the URL such as this.
</p><pre>[show=fetch]http://test.opendap.org/dataset.nc<br /></pre>
The legal set of client commands is client library specific.
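<p>
A client library that understands this convention can peel the bracketed commands off
before handing the remainder to a URL parser. The hypothetical helper below (its name and
behavior are ours, not any client library's) illustrates that split:
</p>
<pre>
import re

def split_client_commands(annotated_url):
    # Peel leading [name] or [name=value] client commands off a DAP2-style URL.
    commands, rest = {}, annotated_url
    while (m := re.match(r'\[([^\]=]+)(?:=([^\]]*))?\]', rest)):
        commands[m.group(1)] = m.group(2)
        rest = rest[m.end():]
    return commands, rest

print(split_client_commands("[show=fetch]http://test.opendap.org/dataset.nc"))
# ({'show': 'fetch'}, 'http://test.opendap.org/dataset.nc')
</pre>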
<p>
One notable problem with this form of client command is that it
prevents generic URL parsers from parsing the URL because, of
course, the square bracket notation is non-standard.
</p><p>
It should be noted that an alternative to using client commands
in the URL is to use a configuration file (often referred to as
the ".rc" file such as ".dodsrc"). This configuration file is
assumed to be either in the caller's home directory or in the
current working directory. It contains the necessary client
commands to be applied. It is mildly less convenient for the user
to use a .rc file than to embed a client command in the URL.
</p><h4>Queries</h4>
The third class of URL annotations specifies some form of query
to control the information to be extracted from a dataset on the
server. This information is then passed back to the client.
<p>
In DAP2, queries consisted of projections and selections
specifying a subset of the data in a dataset.
</p><p>
A projection represents a path through the DDS parse tree
annotated with constraints on the dimensions. For example, this query:
"?P1.P2[0:2:10].F[1:3][4:4]".
</p><p>
A selection represents a boolean expression to be applied to the
records of a sequence. Syntactically, a selection could cross
sequences, thus implying a join of the sequences, but in practice
this is not allowed.
</p><p>
DAP2 queries also allowed the use of functions in the projections
and selections to compute, for example, sums or averages. But the
semantics was never very well defined. The set of allowable
functions is server dependent.
</p><h4>Annotation Mechanisms</h4>
DAP4 will need to support at least the classes of
annotations described above. Whatever annotation mechanisms are
chosen, the following properties seem desirable.
<ul>
<li>The resulting URL should be parseable by generic URL parsers,
which implies that client commands should be embedded at the end of URLs, not the beginning.
</li><li>Whatever annotation encoding is used, it is desirable if it is as
uniform as possible.
</li></ul>
As mechanisms, we have the following available to us:
<ul>
<li> The URL scheme -- "http:" for example, or the TDS "dods:"
scheme. Using this is somewhat undesirable because it would also need
to encode the underlying encrypted protocol, such as
https: (versus http:).
<p>
</p></li><li> URL path elements such as the current use of
e.g. http://host/../dodsC/... by TDS.
<p>
</p></li><li> URL query -- everything after the first '?' in the URL. URL
queries technically have a defined form as name=value pairs, but
in practice are pretty much free form.
<p>
</p></li><li> URL fragment -- everything after the last '#'. Again these
are pretty much free form.
<p>
</p></li><li> Filename extensions -- everything between the data set name
in the path and the start of the query. The DAP2 ".dds"
and ".dods" are examples of this.
<p>
</p></li><li>Alternate extension formats. Ethan Davis has proposed
the use of a "+" notation instead of filename extensions:
"+ddx+ascii", for example. This has the advantage of clearly not
being confused with filename extensions while also making clear
the additive nature of such annotation.
</li></ul>
I should note that the Ferret server has taken to seriously
abusing the URL format with URLs like this.
<pre>http://.../thredds/dodsC/hfrnet/agg/6km_expr_{}{let deq1ubar=u[d=1,l=1:24@ave]}<br /></pre>
so we have much to aspire to :-)
<h2><a href="https://www.unidata.ucar.edu/blogs/developer/entry/dap4_commentary_the_on_the">DAP4 Commentary: The on-the-wire format</a></h2>
<p class="byline">Dennis Heimbigner | 2012-03-27</p>
<h4>Background</h4>
The current DAP2 clients use two different approaches to
managing the packet of data that is sent by the server.
<p>
The C++ libdap library uses what I will call an "eager"
evaluation method. By this I mean that the whole packet is
processed when received, is decomposed into its constituent
parts (e.g. data arrays, sequence records, etc) and those parts
are used to annotate the parsed DDS.
</p><p>
In contrast, the oc library uses a "lazy" evaluation method. That
is, the incoming packet is sent immediately into a file or into a
chunk of heap memory. Almost no preprocessing occurs. Data
extraction occurs only when requested by the user code through
the API.
</p><h4>Problem addressed</h4>
The relative merits and demerits of lazy versus eager are well
known and will not be repeated here.
Lazy evaluation of the DAP2 packet is hampered by the inlining of
variable length data: sequences and strings specifically. If it
were not for those, the lazy evaluator could compute directly the
location of the desired subset of data as requested by the user,
and do so without having to read any intermediate information.
But when, for example, Strings are inlined, then it is necessary
to walk the packet piece by piece to step over the strings.
I plan to use lazy evaluation for my implementations of DAP4, and
propose here the outline of a format for the on-the-wire data
packet that makes lazy operation fast and simple without, I
believe, interfering with eager evaluation.
<h4>Proposed solution</h4>
Since we have previously agreed on the use of multipart-mime, the incoming
data is presumed to be a sequence of variable-length packets with a
known length (for each packet) and a unique id for each packet.
<p>
Under these assumptions, I propose the following format.
</p><p>
</p><ol>
<li>The initial packet is of known computable length, aka "fixed
length" for short. That is, its size can be computed solely
knowing the DDX for the incoming data. This means that strings
and sequences are not represented inline, but instead are
represented by fixed-size "pointers" into other, following
packets that contain the sequence and/or string data.
<p>
</p></li><li>Each element in a string array in the initial packet is represented
by three pieces of fixed size info:
<ol>
<li>the unique id of the packet containing the contents of the string.
</li><li>the offset within the packet identified in item 1.
</li><li>the length of the string in bytes (assuming utf-8 encoding).
</li></ol>
<p>
</p></li><li>As an optimization, the string packet can be directly
appended to the fixed size initial packet, in which case, the
first item is not strictly necessary.
<p>
</p></li><li>Given a sequence object, either as a scalar or as an array of
sequences, the sequence is replaced by the following fixed-size item:
<ul>
<li>The unique id of the packet containing the sequence records
</li></ul>
<p>
</p></li><li>Further, each record of the sequence packet is assumed to
be "fixed length" by applying the rules above. This means that
knowing the total size of the packet containing the sequence
records, it is possible to know the exact number of records in
the packet without actually having to walk the sequence packet to
count them.
</li></ol>
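<p>
To illustrate why this layout makes lazy extraction cheap, the sketch below encodes the
fixed-size string "pointer" described in item 2 as three unsigned 32-bit integers. The
byte layout, field widths, and function names are assumptions made for this example only;
the post does not fix an actual wire encoding.
</p>
<pre>
import struct

# Hypothetical layout: each string element in the initial packet is
# (packet id, offset, length), packed as three big-endian uint32 values.
POINTER = struct.Struct('>III')

def read_string(initial_packet, index, packets_by_id):
    # the i-th string pointer sits at a directly computable offset: O(1) lookup
    pkt_id, offset, length = POINTER.unpack_from(initial_packet, index * POINTER.size)
    data = packets_by_id[pkt_id]            # hash map: unique id -> packet bytes
    return data[offset:offset + length].decode('utf-8')

# toy example: one string packet (id 7) holding two strings back to back
string_packet = b'helloworld'
initial = POINTER.pack(7, 0, 5) + POINTER.pack(7, 5, 5)
print(read_string(initial, 1, {7: string_packet}))   # prints "world"
</pre>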
<h4>Rationale for the solution</h4>
The above representation makes lazy evaluation very simple and a
given item in a packet can be reached in <i>o(1)</i> time. Even with the
case of nested sequences/vlens, the proper item can be reached in
<i>o(log n)</i> time where n is the depth of the nesting.
The cost is that a hash map is needed to map unique id's to
offsets in the file or heap memory.
The lazy versus eager cases also apply on the server
side. Currently, for example, the opendap code on the thredds
server takes the underlying data source (a .nc file, for example),
converts it to DAP2, and annotates the DDS with the data. Then, as
a second pass, the annotations are converted as needed and sent out
over the wire.
A lazy version would associate elements of the underlying source
with the DDS. Transfer of the data would then occur
directly from the original source to the wire format as needed.
As an aside, I have an (untested and unverified)
hypothesis that the proposed encoding will also simplify
the use of lazy evaluation on the server side.
<h4>Updates</h4>
<dl>
<dt>2012-02-20</dt><dd>One consequence of the above encoding is that
all of the embedded counts that currently exist in DAP2 become
superfluous, as do the sequence record markers. It may still
be desirable to include the counts for purposes of error
checking, but they are not strictly necessary.
</dd></dl>https://www.unidata.ucar.edu/blogs/developer/entry/first_two_weeksFirst two weeks!Sean Arms2011-06-07T14:27:15-06:002011-06-09T09:06:16-06:00<p>Community Driven: <i>Sure, Unidata has Policy and Users
committees, so they must care about what the community thinks, right?
Right?!? I mean, I was there – I felt like they cared what the
Users Committee thought while we were there, in town, in person.</i></p><p>
</p><p align='left'><table width='450' height='288'><tbody><tr align='center'><td> <img width='431' vspace='0' hspace='0' height='231' border='0' align='bottom' src='https://www.unidata.ucar.edu/blog_content/images/UnidataCommDriven.png' alt="Image from Jeff Weber's t-shirt today!" /><br /></td></tr><tr align='left'><td><p><sub><i>Image taken from Jeff Weber's t-shirt today! Talk the talk, walk the walk, and wear the schwag!</i></sub><br /></p></td></tr></tbody></table> </p><p>As I begin my third week at Unidata, I
thought I would take a moment to reflect on my experience. It would
be quite a task to try to summarize my experience in one post, so
I’ll start off simply reflecting on the phrase “community
driven”, as it’s a phrase I’ve heard for years but never fully
appreciated.</p><p style='margin-bottom: 0in;'>“Community driven” was a phrase I heard tossed around
several times during my stint on the Unidata Users Committee as the
student representative, but it is hard to actually grasp what that
means until you’ve been put into a position to see colleagues put
that phrase into action. <i>Sure, Unidata has Policy and Users
committees, so they must care about what the community thinks, right?
Right?!? I mean, I was there – I felt like they cared what the
Users Committee thought while we were there, in town, in person.</i>
During the Users Committee meetings, we would bring up issues and
needs that we, or others in the community, felt needed to be
addressed. Six months later we would return to find that those
concerns were not just being listened to, but being acted upon. But
what was going on behind the scenes during those six months? Did the
staff address our concerns in the one week following the meeting,
only to forget about the community until our next Users Committee
meeting six months later? I never really thought this was the case,
but the cynical side of me (thanks, grad school!) is always drawn to this possibility (with any
organization claiming to be community oriented, not just Unidata).
Well, now that I’ve toured the backstage area of Foothills Lab 4, I
have an idea of what happens during those periods.</p><p style='margin-bottom: 0in;'><b>Observation 1:</b> One of my first tasks at
Unidata as a Software Engineer has been to dig into support. While I
always heard that support was a big deal within Unidata, now that I
am ‘on the other side’, so to speak, I have to agree that not only
is support a big deal, it’s an <b>unbelievably huge deal!</b> It’s clear
that addressing community needs is not something that happens only a
few times a year during the governing committee meetings and then
gets put on the back burner, but it’s a DAILY thing. Quite
honestly, I think it’s fun (yeah, I’ve logged into support at
home during the weekends – it’s kind of addicting...I may need a
support group in a few months). I’ve had the opportunity to answer
questions regarding the specifics of operating systems I’ve never
used and compilers I’ve never touched. While Google has been my
friend, an even bigger help has been looking at the wealth contained
within previous support questions and answers (these are indexed by
Google, BTW, and are available to all). The support tickets I’ve
reviewed show that the Unidata staff puts quite a bit of thought and
time into fully answering the issues submitted to support. On one
hand, the detailed replies make support easier for us in the future (especially
since those replies show up in internet searches), and they also help educate
the users. On the other hand, however, it sets the bar high – a bar that I
can only hope to meet!</p><p><b>Observation 2:</b> Another task I’ve been
assigned is to go through the workshop / online training materials to
look for areas where steps that are obvious to the developers do not
quite translate to the end users. It was always annoying in school to
hear a prof say “it’s obvious that...”, and even more annoying
to read it in a book (at least I could question the prof in class!).
What was even more annoying, and infuriating, was when steps were
skipped and it was never even hinted that they were skipped (at least
the ‘it’s obvious’ statement would clue us students into the
fact that a leap of faith had occurred). Ok, so how does this relate
to the concept of community driven / community oriented? Well, I
could have been told to go through these materials on my own time at home, as it’s
part of my job to know this stuff (and not my job to learn the basics
at work). I could have been told that users who don’t understand
basic computing should not be using our packages. <i>I <u>could</u> have been told those things.</i> Instead, it was
clear that if the community cannot use our packages due to ‘leaps
of faith’ in the documentation, then what is the point? If we
aren’t serving our community in every area (support, documentation,
and software), then we might as well pack up and go home. So, let’s
take the time to allow someone who knows a bit about computing and a
bit about meteorology (as well as someone who grasps, at some level,
the general computing abilities of those in an academic setting) to
take a look at our training material in hopes that these documents
can be further clarified! An added bonus for me – I may get to help
out during the training workshops! I always enjoyed teaching, so this
will be an opportunity to get back into a classroom-like setting.</p><p><b>Observation 3:</b> I’ve had the chance to
sit through a few staff meetings so far, and I’ve been impressed at
the number of times I’ve heard “what does the community think?”
or "this brought up at a Users or Policy committee meeting.”.
I could play devil's advocate and say “well, strategic planning for
the next proposal is spinning up, so maybe that is why there is so
much focus on the community”, but the feel of these conversations
has been very much “business as usual”. Bottom line – these
folks care about your thoughts, needs, concerns, etc.!</p><p>In the end, those of us at Unidata can
care all we want about your needs and concerns, but we cannot help
you if we don’t hear from you! If the lines of communication break
down, we all lose :-( So, how can you keep in touch with us? Well,
there are several ways, and here are a few:
</p><p>
</p><ol><li><p style='margin-bottom: 0in;'>Contact those awesome individuals
serving on our governing committees (maybe even consider serving!)
(<a href='../../../community/index.html#governance'>https://www.unidata.ucar.edu/community/index.html#governance</a>)!</p>
</li><li><p style='margin-bottom: 0in;'>Want to get involved with Unidata
but don’t have the financial resources in your department? Well
have we got the deal for you: Unidata Equipment Awards!
(<a href='../../../community/equipaward/'>https://www.unidata.ucar.edu/community/equipaward/</a>)</p>
</li><li><p style='margin-bottom: 0in;'>Check us out on Twitter or
Facebook.
</p>
</li><li><p style='margin-bottom: 0in;'>Subscribe to our “News from
Unidata” blog
(<a href='../../news/feed/entries/atom'>https://www.unidata.ucar.edu/blogs/news/feed/entries/atom</a>)
or the “Developer’s Blog”
(<a href='../../developer/'>https://www.unidata.ucar.edu/blogs/developer/</a>)
(both are open for comments, and we love ’em!).</p>
</li><li><p style='margin-bottom: 0in;'>Subscribe to our users mailing
lists and participate
(<a href='../../../support/#mailinglists'>https://www.unidata.ucar.edu/support/#mailinglists</a>)!
</p>
</li><li><p style='margin-bottom: 0in;'>If you’re in Boulder for a
workshop or meeting, come say hello
(<a href='../../../about/index.html#visit'>https://www.unidata.ucar.edu/about/index.html#visit</a>)!
</p>
</li><li><p style='margin-bottom: 0in;'>Visit us at the Fall AGU or Annual
AMS meetings
(<a href='../../../events/index.html#conferences'>https://www.unidata.ucar.edu/events/index.html#conferences</a>).</p>
</li><li><p style='margin-bottom: 0in;'>Hang out with us virtually and see
what’s going on in the community by checking out a seminar through
the Unidata Seminar Series
(<a href='../../../community/seminars/'>https://www.unidata.ucar.edu/community/seminars/</a>)</p>
</li><li><p style='margin-bottom: 0in;'>Consider attending a training
workshop (<a href='../../../events/2011TrainingWorkshop/'>https://www.unidata.ucar.edu/events/2011TrainingWorkshop/</a>).</p>
</li><li><p>Consider attending the 2012
Triennial Workshop (organized by the Users Committee and Unidata!).
More details to come, but see:
<a href='../../../events/index.html#triennial'>https://www.unidata.ucar.edu/events/index.html#triennial</a>. <br /></p></li></ol><p>Which of these do you use? Which are most appealing to you? What can we do to better keep those lines of communication open? <br /></p>