[netcdfgroup] HDF5-UDF 1.2 release, allowing translation of CSV to HDF5/NetCDF

<div class="socmaildefaultfont" dir="ltr" style="font-family:Arial, Helvetica, 
sans-serif;font-size:10pt" ><div dir="ltr" >Hello again!</div>
<div dir="ltr" >&nbsp;</div>
<div dir="ltr" >I'm writing to let you know about the availability of 
<strong>HDF5-UDF 1.2</strong>. You probably remember it, but it doesn't hurt to 
restate that the tool allows <strong>embedding routines written in C/C++, Python 
or Lua in HDF5/NetCDF files</strong>, so that such routines execute each 
time the dataset is read.</div>
<div dir="ltr" >&nbsp;</div>
<div dir="ltr" >This new release comes with an exciting new feature: the 
ability to <strong>output compound and string datatypes</strong>. This means 
that it's now possible to take a CSV file as input and dynamically generate 
HDF5 compound datasets as output. The best part is that it's quite easy to 
do, as the snippets below show.</div>
<div dir="ltr" >&nbsp;</div>
<div dir="ltr" >The new release is available on the project page at <a 
href="https://github.com/lucasvr/hdf5-udf" 
>https://github.com/lucasvr/hdf5-udf</a>. Feedback is welcome as usual!</div>
<div dir="ltr" >&nbsp;</div>
<div dir="ltr" >Best regards,</div>
<div dir="ltr" >Lucas</div>
<div dir="ltr" >&nbsp;</div>
<div dir="ltr" >&nbsp;</div>
<div dir="ltr" >&nbsp;
<div><strong>Snippet of albumlist.csv:</strong></div>
<div><pre><code class="hljs javascript" ><span class="hljs-built_in" >Number</span>,Year,Album,Artist,Genre
<span class="hljs-number" >1</span>,<span class="hljs-number" >1967</span>,Sgt. Peppers Lonely Hearts Club Band,The Beatles,Rock
<span class="hljs-number" >2</span>,<span class="hljs-number" >1966</span>,Pet Sounds,The Beach Boys,Rock
<span class="hljs-number" >3</span>,<span class="hljs-number" >1966</span>,Revolver,The Beatles,Rock
</code></pre>
<div>&nbsp;</div>
<div><strong>User-Defined Function:</strong></div>
<div><pre><code class="hljs python" ><span class="hljs-function" ><span class="hljs-keyword" >def</span> <span class="hljs-title" >dynamic_dataset</span><span class="hljs-params" >()</span>:</span>
    udf_data = lib.getData(<span class="hljs-string" >"GreatestAlbums"</span>)
    <span class="hljs-keyword" >with</span> open(<span class="hljs-string" >"albumlist.csv"</span>) <span class="hljs-keyword" >as</span> f:
        <span class="hljs-comment" ># Skip the header</span>
        f.readline()

        <span class="hljs-keyword" >for</span> i, line <span class="hljs-keyword" >in</span> enumerate(f.readlines()):
            <span class="hljs-comment" ># Split the line using "," as separator</span>
            elements = [col.strip(<span class="hljs-string" >"\n"</span>) <span class="hljs-keyword" >for</span> col <span class="hljs-keyword" >in</span> line.split(<span class="hljs-string" >","</span>)]

            <span class="hljs-comment" ># Generate compound members on-the-fly</span>
            udf_data[i].id = int(elements[<span class="hljs-number" >0</span>])
            udf_data[i].year = int(elements[<span class="hljs-number" >1</span>])
            lib.setString(udf_data[i].album, elements[<span class="hljs-number" >2</span>])
            lib.setString(udf_data[i].artist, elements[<span class="hljs-number" >3</span>])
            lib.setString(udf_data[i].genre, elements[<span class="hljs-number" >4</span>])
</code></pre>
<div>&nbsp;</div>
<div><strong>Command to embed the User-Defined Function in the HDF5/NetCDF 
file:</strong></div>
<div><pre><code class="hljs ruby" >  $ hdf5-udf file.nc4 dynamic_dataset.py \
    <span class="hljs-string" 
>'GreatestAlbums:{id:int32,year:int16,album:string(40),artist:string,genre:string}:500'</span></code></pre>
<div>&nbsp;</div>
<div><strong>First few entries of the dynamically generated 
dataset:</strong></div>
<div><pre><code class="hljs ruby" >  $ h5dump -O -d /GreatestAlbums file.nc4
   (<span class="hljs-number" >0</span>): {
         <span class="hljs-number" >1</span>,
         <span class="hljs-number" >1967</span>,
         <span class="hljs-string" >"Sgt. Peppers Lonely Hearts Club Band"</span>,
         <span class="hljs-string" >"The Beatles"</span>,
         <span class="hljs-string" >"Rock"</span>
      },
   (<span class="hljs-number" >1</span>): {
         <span class="hljs-number" >2</span>,
         <span class="hljs-number" >1966</span>,
         <span class="hljs-string" >"Pet Sounds"</span>,
         <span class="hljs-string" >"The Beach Boys"</span>,
         <span class="hljs-string" >"Rock"</span>
      },
   (<span class="hljs-number" >2</span>): {
         <span class="hljs-number" >3</span>,
         <span class="hljs-number" >1966</span>,
         <span class="hljs-string" >"Revolver"</span>,
         <span class="hljs-string" >"The Beatles"</span>,
         <span class="hljs-string" >"Rock"</span>
      },
    ...
</code></pre></div></div></div></div></div></div><BR>
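<div dir="ltr" >The User-Defined Function above splits each line on ",", which would misplace columns if an album title or artist name itself contained a comma. As a minimal sketch of one alternative, assuming the CSV follows the usual quoting rules, the row-parsing step can be done with Python's standard csv module. The Album tuple, the parse_albums helper and the last sample row are hypothetical names and data introduced here for illustration; they are not part of HDF5-UDF, and the sketch runs on its own without the lib object.</div>

```python
import csv
import io
from typing import Iterator, NamedTuple


class Album(NamedTuple):
    # Field names mirror the compound members used in the UDF above.
    id: int
    year: int
    album: str
    artist: str
    genre: str


def parse_albums(text: str) -> Iterator[Album]:
    """Yield one Album per CSV data row, honoring quoted fields."""
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # skip the header row, if there is one
    for row in reader:
        if not row:
            continue  # ignore blank lines
        number, year, album, artist, genre = (col.strip() for col in row)
        yield Album(int(number), int(year), album, artist, genre)


# The first three rows come from albumlist.csv as quoted in this message;
# the last row is hypothetical and only demonstrates a quoted comma.
SAMPLE = """Number,Year,Album,Artist,Genre
1,1967,Sgt. Peppers Lonely Hearts Club Band,The Beatles,Rock
2,1966,Pet Sounds,The Beach Boys,Rock
3,1966,Revolver,The Beatles,Rock
99,2000,"Hypothetical, Quoted Title",Example Artist,Rock
"""

if __name__ == "__main__":
    for entry in parse_albums(SAMPLE):
        print(entry)
```

<div dir="ltr" >Inside a UDF, each parsed row could then be assigned to udf_data[i] in the same way as in the snippet above.</div>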

