[netcdfgroup] HDF5-UDF 1.2 release, allowing translation of CSV to HDF5/NetCDF

<div class="socmaildefaultfont" dir="ltr" style="font-family:Arial, Helvetica, 
sans-serif;font-size:10pt" ><div dir="ltr" >Hello again!</div>
<div dir="ltr" >&nbsp;</div>
<div dir="ltr" >I'm writing to let you know that <strong>HDF5-UDF 1.2</strong> 
is now available. You probably remember it, but it doesn't hurt to restate 
that the tool allows <strong>embedding routines written in C/C++, Python or 
Lua in HDF5/NetCDF files</strong> so that those routines execute each time 
the dataset is read.</div>
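To make the "runs on every read" idea concrete, here is a minimal pure-Python sketch. UdfBackedDataset and squares are hypothetical names used only for illustration; the sketch does not model HDF5-UDF's storage format or its C/C++, Python and Lua backends, only the behavior of recomputing values whenever the dataset is read.

```python
from typing import Callable, List


class UdfBackedDataset:
    """Hypothetical in-memory stand-in for a dataset backed by an embedded routine.

    It only illustrates the "run on every read" behavior; it is not how
    HDF5-UDF stores or executes routines.
    """

    def __init__(self, name: str, routine: Callable[[], List[int]]):
        self.name = name
        self._routine = routine  # only the routine is stored, no data values
        self.evaluations = 0

    def read(self) -> List[int]:
        # Each read re-runs the routine, so values are produced at read time.
        self.evaluations += 1
        return self._routine()


def squares() -> List[int]:
    # Example routine: derive the values on demand instead of storing them.
    return [i * i for i in range(5)]


dataset = UdfBackedDataset("Squares", squares)
first = dataset.read()
second = dataset.read()
print(first, dataset.evaluations)  # [0, 1, 4, 9, 16] 2
```

Nothing is computed when the dataset is created; both reads call the routine, which is why the counter ends at 2.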
<div dir="ltr" >&nbsp;</div>
<div dir="ltr" >This new release comes with an exciting new feature: the 
ability to <strong>output compound and string datatypes</strong>. This means 
that it's now possible to take a CSV file as input and dynamically generate an 
HDF5 compound dataset as output. Best of all, it's quite easy to do, as the 
snippets below show.</div>
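A compound element behaves like a fixed-size record whose members sit side by side. The sketch below packs one CSV row into such a record with Python's struct module. It is an illustration under stated assumptions: the 32-byte width for the unsized string members and the unpadded layout are choices made for this example, not HDF5-UDF or HDF5 defaults.

```python
import struct

# Hypothetical fixed-size layout for one GreatestAlbums element, following the
# declaration {id:int32,year:int16,album:string(40),artist:string,genre:string}.
# ALBUM_LEN comes from string(40); OTHER_LEN = 32 for the unsized "string"
# members is an assumption made for this sketch.
ALBUM_LEN = 40
OTHER_LEN = 32
# "=" means standard sizes and no alignment padding; HDF5 may lay compounds
# out differently, so this illustrates the idea rather than the on-disk format.
RECORD = struct.Struct(f"=ih{ALBUM_LEN}s{OTHER_LEN}s{OTHER_LEN}s")


def pack_row(csv_line: str) -> bytes:
    number, year, album, artist, genre = csv_line.rstrip("\n").split(",")
    # The "s" format truncates or NUL-pads each string to its fixed width.
    return RECORD.pack(int(number), int(year),
                       album.encode(), artist.encode(), genre.encode())


def unpack_row(blob: bytes) -> tuple:
    number, year, *texts = RECORD.unpack(blob)
    # Strip the NUL padding added by pack_row before decoding.
    return (number, year, *(t.rstrip(b"\0").decode() for t in texts))


record = pack_row("2,1966,Pet Sounds,The Beach Boys,Rock\n")
print(RECORD.size, len(record))  # 110 110
print(unpack_row(record))  # (2, 1966, 'Pet Sounds', 'The Beach Boys', 'Rock')
```

Every record has the same size (4 + 2 + 40 + 32 + 32 = 110 bytes here), which is what lets a dataset of N compound elements be addressed by index.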
<div dir="ltr" >&nbsp;</div>
<div dir="ltr" >The new release is available on the project page at <a 
href="https://github.com/lucasvr/hdf5-udf" 
>https://github.com/lucasvr/hdf5-udf</a>. Feedback is welcome as usual!</div>
<div dir="ltr" >&nbsp;</div>
<div dir="ltr" >Best regards,</div>
<div dir="ltr" >Lucas</div>
<div dir="ltr" >&nbsp;</div>
<div dir="ltr" >&nbsp;</div>
<div dir="ltr" >&nbsp;
<div><strong>Snippet of albumlist.csv:</strong></div>
<div><pre><code class="hljs javascript" >Number,Year,Album,Artist,Genre
1,1967,Sgt. Peppers Lonely Hearts Club Band,The Beatles,Rock
2,1966,Pet Sounds,The Beach Boys,Rock
3,1966,Revolver,The Beatles,Rock
</code></pre>
<div>&nbsp;</div>
<div><strong>User-Defined Function:</strong></div>
<div><pre><code class="hljs python" >def dynamic_dataset():
    # "lib" is made available to the UDF by HDF5-UDF (no import needed)
    udf_data = lib.getData("GreatestAlbums")
    with open("albumlist.csv") as f:
        # Skip the header
        f.readline()

        for i, line in enumerate(f.readlines()):
            # Split the line using "," as separator
            elements = [col.strip("\n") for col in line.split(",")]

            # Generate compound members on-the-fly
            udf_data[i].id = int(elements[0])
            udf_data[i].year = int(elements[1])
            lib.setString(udf_data[i].album, elements[2])
            lib.setString(udf_data[i].artist, elements[3])
            lib.setString(udf_data[i].genre, elements[4])
</code></pre>
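Because the UDF relies on the lib object supplied by HDF5-UDF, it cannot run as-is outside the tool. One way to check the parsing logic offline is to run the same loop against a stand-in object. FakeLib, fill_albums and as_text are hypothetical names; the fixed 40-byte string buffers and the setString behavior (truncate, then NUL-pad) are assumptions of this sketch, not documented HDF5-UDF semantics.

```python
import io
from types import SimpleNamespace


class FakeLib:
    """Hypothetical stand-in for the "lib" object HDF5-UDF provides to Python UDFs.

    Only getData() and setString() are emulated, and only as far as this
    example needs; every string member gets the same fixed size for simplicity.
    """

    def __init__(self, count, string_size=40):
        self._records = [
            SimpleNamespace(
                id=0,
                year=0,
                album=bytearray(string_size),
                artist=bytearray(string_size),
                genre=bytearray(string_size),
            )
            for _ in range(count)
        ]

    def getData(self, name):
        # A single dataset is emulated, so the name is not checked.
        return self._records

    @staticmethod
    def setString(buffer, value):
        # Copy the text into the fixed-size buffer, truncating and NUL-padding it.
        encoded = value.encode()[: len(buffer)]
        buffer[:] = encoded.ljust(len(buffer), b"\0")


def fill_albums(lib, csv_file):
    """The loop from dynamic_dataset(), reading from an open file object instead."""
    udf_data = lib.getData("GreatestAlbums")
    csv_file.readline()  # skip the header
    for i, line in enumerate(csv_file):
        elements = [col.strip("\n") for col in line.split(",")]
        udf_data[i].id = int(elements[0])
        udf_data[i].year = int(elements[1])
        lib.setString(udf_data[i].album, elements[2])
        lib.setString(udf_data[i].artist, elements[3])
        lib.setString(udf_data[i].genre, elements[4])
    return udf_data


def as_text(buffer):
    # Drop the NUL padding and decode back to str.
    return bytes(buffer).rstrip(b"\0").decode()


csv_text = (
    "Number,Year,Album,Artist,Genre\n"
    "1,1967,Sgt. Peppers Lonely Hearts Club Band,The Beatles,Rock\n"
    "2,1966,Pet Sounds,The Beach Boys,Rock\n"
    "3,1966,Revolver,The Beatles,Rock\n"
)
rows = fill_albums(FakeLib(count=500), io.StringIO(csv_text))
print(rows[1].id, rows[1].year, as_text(rows[1].album))  # 2 1966 Pet Sounds
```

Replacing io.StringIO(csv_text) with open("albumlist.csv") runs the same check against the real file; elements beyond the last CSV row keep their zero-initialized values.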
<div>&nbsp;</div>
<div><strong>Command to embed the User-Defined Function in the HDF5/NetCDF 
file:</strong></div>
<div><pre><code class="hljs ruby" >$ hdf5-udf file.nc4 dynamic_dataset.py \
    'GreatestAlbums:{id:int32,year:int16,album:string(40),artist:string,genre:string}:500'</code></pre>
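The last argument of the hdf5-udf command packs the whole dataset declaration into one string: the dataset name, the compound members with their types inside braces, and the number of elements. The hypothetical parse_compound_spec helper below splits it into those three parts; it reflects only the shape of this particular argument, not HDF5-UDF's actual parser or its full grammar.

```python
import re
from typing import Dict, Tuple

# Hypothetical helper, not part of HDF5-UDF: it only mirrors the shape of the
# "name:{member:type,...}:count" argument passed to hdf5-udf.
SPEC = re.compile(r"([^:{}]+):\{([^{}]*)\}:(\d+)")


def parse_compound_spec(spec: str) -> Tuple[str, Dict[str, str], int]:
    match = SPEC.fullmatch(spec.strip())
    if match is None:
        raise ValueError(f"not a compound declaration: {spec!r}")
    name, body, count = match.groups()
    members: Dict[str, str] = {}
    for field in body.split(","):
        # Each field has the form "member:type"; split at the first ":".
        member, _, dtype = field.partition(":")
        members[member.strip()] = dtype.strip()
    return name, members, int(count)


name, members, count = parse_compound_spec(
    "GreatestAlbums:{id:int32,year:int16,album:string(40),artist:string,genre:string}:500"
)
print(name, count)  # GreatestAlbums 500
print(members["album"], members["genre"])  # string(40) string
```

Members are kept in declaration order (Python dicts preserve insertion order), which matches the order in which h5dump lists each element's fields.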
<div>&nbsp;</div>
<div><strong>First few entries of the dynamically generated 
dataset:</strong></div>
<div><pre><code class="hljs ruby" >$ h5dump -O -d /GreatestAlbums file.nc4
   (0): {
         1,
         1967,
         "Sgt. Peppers Lonely Hearts Club Band",
         "The Beatles",
         "Rock"
      },
   (1): {
         2,
         1966,
         "Pet Sounds",
         "The Beach Boys",
         "Rock"
      },
   (2): {
         3,
         1966,
         "Revolver",
         "The Beatles",
         "Rock"
      },
   ...
</code></pre></div></div></div></div></div></div><BR>

