Text File adapter

To: visad-list@xxxxxxxxxxxxx
Subject: Text File adapter
From: Tom Whittaker <tomw@xxxxxxxxxxxxx>
Date: Mon, 29 Jan 2001 10:43:27 -0600
In the latest release of VisAD, I've added a TextFile adapter.  With very
helpful input from Don Murray, I've focused this on reading tab-, comma-,
semicolon- and blank-separated files, with a minimum of editting (I hope ;-). 
In addition to the generic constructor, I've also added a signature that allows
you to define the metadata as parameters, rather than having it read from the
text file.  Here is the README.text that is contained in the source code:


                          The VisAD TextAdapter
                             January, 2001

The VisAD TextAdapter is designed to allow you to quickly read in data
that are in the form of an ASCII text file.  We fully expect this
class to continue to grow to accommodate other, common variations of
text file formats that might be encountered.  

Two example files are also contained in the release.  It is most
convenient to test these using the VisAD Spreadsheet or the Jython
(Python) interface.  Fire up either the visad.python.JPythonFrame, or
simply start it from the command line and use a sequence like:

  >>> from visad.python.JPythonMethods import *
  >>> a = load("example1.txt")
  >>> plot(a)
  >>> clearplot()
  >>> b = load("example2.csv")
  >>> plot(b)
  >>> clearplot()

Or simply load these files into the SpreadSheet, and then experiment
with the mappings!


I. Text file formats

The text files usually consist of 2 header lines and then data.
Optional comment lines may be interspersed throughout.  The data
portion of the file may be either blank-, comma-, semicolon- or
tab-separated values.  At present only numeric data can be read.  The
model for these files is spreadsheet (Excel, etc) output -- that is,
column-oriented values.

Comment lines are any line that starts with either #, !, or %.

The file extensions recognized by the VisAD DefaultFamily and the
TextAdapter:  

  .bsv -> blank-separated values
  .tsv -> tab-separated values
  .csv -> comma-separated values

In addition, the VisAD DefaultFamily will recognize the extension .txt
and invoke TextAdapter.  In this case, however, the TextAdapter
attempts to sense the delimiter using the hierarchy:

          tab,  semicolon,  comma,  blank

That is, if a tab character appears in the line, tab will be used.  If
not, then it looks for a semicolon.  Otherwise, if a comma appears in
the line it will be used.  If neither a tab, semicolon, or comma appears,
then blank will be used.

We tried to keep the amount of modification you might have to make to
existing files to a minimum.   The general layout is:

Line 1:  functional description of the data in "VisAD" lingo (aka the
         "MathType")

Line 2:  column headers, which name each parameter and possibly give
         them a physical unit, using delimiters defined above.  

Line 3-n: the data values (with delimiters as define above; for
          filenames without recognized extensions, the delimiter 
          used in this data section does not have to be the same 
          as the one used on Line 2)

Please refer to the VisAD Library Developers Guide, section "3.1
MathTypes" for information on how to define the functional
description.  Also, take a look to the examples at the end of this
file.

Also, please note that if you are using the TextAdapter constructor
directly, the "Line 1" and "Line 2" values do not have to be in the
file - alternate signatures allow these values to be passed as
arguments.  However, if your text file is used through the
VisAD DefaultFamily (as in the SpreadSheet), you must provide the
information right in the text file.


II. Line 1  (ignoring any preceding comment lines...)

This line specifies a functional description of the data, using the
VisAD "MathType" string.  

There are two categories of data that may be represented in these text
of files:  

   1) 2-D arrays of a single parameter, or 
   2) 1,2,or 3-D (domain) points of one or more parameters.

IIa.  2-D arrays

  In this case, the "VisAD" functional description looks like:
  
     (x,y)->(temperature)
     (Longitude, Latitude)->(speed)

  And the data portion of the file contains "x" values per line,
  and "y" lines of data.  See Examples #2 and Example #6, below.

  Only the "y" domain component may have its values defined in the
  file.  

  2-D arrays are implied when:
      * there are 2 domain components
      * there is only one range component
      * there is more than one domain sampling value for the first
        domain component (that is, more than one data value on a line
        in the text file)


IIb.  1,2,or 3-D points

  Just about every other form of a text file falls into this category.

  Examples of VisAD functional descriptions:

  (x)->(temperature, dewpoint, speed)
  (x,y,z)->(temperature, speed)
  
   At least one of the domain variables (x,y,z) _must_ be defined
   by data in the file.  See Examples #1, #3, #4 and #5, below.



III. Line 2  (ignoring any comment lines that might come before)

The second line of the text files defines which column of the data
portion contains what parameters.  (Note that, as with the "Line 1",
an alternate form of the constructor is available so this information
can be passed as an argument rather than being read from the file.) 

If you have other information that you need to specify for a parameter,
you should use a blank-separated sequence of phrases in the form
"key=value", to specify what you need.  Here are the possible keys:

     key      value
     ----     ---------------
     unit     name of Unit (default = no unit)

     miss     value to be treated as missing (default = no missing values)

     scale    value that each datum is multiplied by (default = 1.0)

     offset   value that is added to each scaled datum (default = 0.0)

     error    value of the estimated error for this parameter (default = none)
           ** In this release, only range error estimates are implemented.

     interval either 'true' or 'false' to indicate that this parameter
              is an _interval_, like a difference (default = false)

           ** The following is _not_ implemented in this release:
     pos      column-oriented location of the data values in the form
              first:last (if present for one item, this MUST be supplied
              for all items!)

Two short examples:
     a, b, c, temperature[unit=degC err=.1 miss=999.9], speed[m/s]
     Longitude[scale=-1], Latitude, temp[unit=degK miss=999.9],
dewPoint[unit=degK miss=999.9]

(Please note in the second example, the "scale=-1" for the Longitude serves
to invert the sign of the values read from the file).

(You might also note that "C" is not the VisAD unit name for degrees
Celcius..."C" means Coulomb...)


As with Line 1, there are two cases to consider when defining the contents of
this Line:

IIIa. 2-D arrays

  In this case, _only_ one range parameter is permitted.  You may have
  0 or 1 domain parameter names, as well as any "column skip" dummy names
  (these are only permitted _before_ the actual range parameter name,
  though).  For this simplest example:

    temperature[unit=degF]

  Says that the data contains only values of temperature in degrees F.
  The domain parameters defined in Line 1 will be computed based on
  the number of items per line and the number of lines of text.

  If you need to skip some columns before the values of the variable
  start, just put in a "skip" name for each column.  For example:

    skip, skip, skip, temperature[unit=degF]

  indicates that the values in the first three columns should be
  skipped, and the rest of the values on the line will make up the
  "columns" of the 2-D array.   The name "skip" can be _any_ unused
  name.

  
IIIb. 1,2,or 3-D points

  In this case, you define which column corresponds to which
  parameter you named on Line 1. The order doesn't matter, only that
  the correct column is identified.  If there are columns of data that
  are to be ignored, just use a "skip" name that was not defined on
  Line 1.  For example:

   x, y, skip, temperature[unit=degC], skip, pressure[unit=hPa]

  In this case, the name "skip" is a filler to indicate what column(s)
  should be skipped.
  
  
In both cases, you may also use this line of text to define the values
of the domain component samplings in the form:

    name(first:last)

This means that 
   a) "name" is a domain component, and... 
   b) the (sampling) values of "name" are NOT read from the file, but are
      computed based on the range "first to last" and the number of lines
      of text (number of samples), or (in the case of 2-D arrays)
      possibly the number of range values on the line (see below), and....  
   c) this name is ignored for the purposes of counting/locating, 
      columns for other parameters.

If the name of a domain component variable does NOT appear on Line
2, it's values are assumed to be 0:(N-1)  where "N" is the number of
samples (number of lines) in the file.  

There is one exception to this:  in the case of 2-D arrays, the first 
domain component is assumed to apply to the number of range values 
on each line of text, _not_ the number of "samples" or lines of text.


Finally, if you need to combine the range with other information about
the parameter, it would look like:
          
          x(1.0:13.7)[unit=cm]


IV. Examples

Here are a few examples taken from the beginning of some files:

Example #1 - Simple CSV file, for a function value=f(x) <=

(x)->(value)
value
0 
7.2
-9.1

Example #2 -  CSV file of a 2-D array <=

(x,y)->(value)
skip,skip,skip,skip,value
0 , 0 , 17 , 34 , 50 , 64 , 76 , 86 , 93 , 98 , 99 
1 , 17 , 34 , 50 , 64 , 76 , 86 , 93 , 98 , 99 , 98 

Example #3 - a ".txt" file of two range components (note that the
delimiters used on "Line 2" are different that the ones used in the
data) <=

(x)->(value1, value2)
skip  value1  skip  skip  skip  skip  value2
100 , 0 , 17 , 34 , 50 , 64 , 76 , 86 , 93 , 98 , 99 
101 , 17 , 34 , 50 , 64 , 76 , 86 , 93 , 98 , 99 , 98 

Example #4 - CSV file of two range components located at 2-d coordinates <=

(x,y)->(value_a, value_b)
y,x,p,value_a[unit=degC],p,p,p,value_b[unit=degF]
0 , 0 , 17 , 34 , 50 , 64 , 76 , 86 , 93 , 98 , 99 
1 , 17 , 34 , 50 , 64 , 76 , 86 , 93 , 98 , 99 , 98 

Example #5 -  BSV file of real data <=

% Retrieval statistics for mlw_K+ir3_2a.ret :
% Zbottom threshold = 0.0 km  liqclouds=1
%   IWP      IWP errors (dB)        Dme errors (dB) 
% (g/m^2)   mean   rms   median    mean   rms    median 
(IPW)->(IWP_Error, Dme_error, IWP_Error_mean, Dme_error_mean)
IPW[g/m^2]   IWP_Error_mean p  IWP_Error   Dme_error_mean  p   Dme_error
   1.41    7.092  7.890  6.831    0.746  1.768  1.139    717

Example #6 - BSV file of a 2D grid, with the locations given Lat/Lon values <=

(Longitude,Latitude)->(value)
Longitude(-130:-40) Latitude(20:60) p value[unit=degC]
0   0   17   34   50   64   76   86   93   98   99 
1   17   34   50   64   76   86   93   98   99   98 



-- 
Tom Whittaker (tomw@xxxxxxxxxxxxx)
University of Wisconsin-Madison
Space Science and Engineering Center
Phone/VoiceMail: 608/262-2759
Fax: 608/262-5974
2001 messages navigation, sorted by:
1. Thread
2. Subject
3. Author
4. Date
5. ↑ Table Of Contents
Search the visad archives: