
[netCDFJava #BNA-191717]: chunking in Java



Jeff,

> I made the changes you suggested with the following result:
> 
> 10000 records, 8 bytes / record = 80000 bytes raw data
> 
> original program (NetCDF4, no chunking): 537880 bytes (6.7x)
> file size with chunk size of 2000 = 457852 bytes (5.7x)
> 
> So a little better, but still not good. I then tried different chunk sizes
> of 10000, 5000, 200, and even 1, which I would've thought would give me the
> original size, but all gave the same resulting file size of 457852.

Changing the chunking in a C program that uses the netCDF C library, version
4.3.2, shows the expected improvement from larger chunk sizes (a sketch of the
test follows the numbers):

  file size with chunk size of 1:      457869 bytes
  file size with chunk size of 2000:    82685 bytes
  file size using classic format:       80140 bytes
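
The actual test program isn't included here, but a minimal sketch along these
lines, assuming a single unlimited "time" dimension of doubles and no
compression (mirroring your Java example quoted below), is enough to exercise
the chunk size:

  /* Minimal sketch, not the actual test program: write 10000 doubles along
   * an unlimited "time" dimension with an explicit chunk size.  For the
   * classic-format comparison, create with NC_CLOBBER only. */
  #include <stdio.h>
  #include <stdlib.h>
  #include <netcdf.h>

  #define NREC 10000
  #define CHECK(e) do { int s = (e); if (s != NC_NOERR) { \
      fprintf(stderr, "%s\n", nc_strerror(s)); exit(1); } } while (0)

  int main(void) {
      int ncid, dimid, varid;
      size_t chunksize[1] = {2000};   /* try 1, 2000, ... and compare sizes */

      CHECK(nc_create("output.nc", NC_NETCDF4 | NC_CLOBBER, &ncid));
      CHECK(nc_def_dim(ncid, "time", NC_UNLIMITED, &dimid));
      CHECK(nc_def_var(ncid, "time", NC_DOUBLE, 1, &dimid, &varid));
      CHECK(nc_def_var_chunking(ncid, varid, NC_CHUNKED, chunksize));
      CHECK(nc_enddef(ncid));

      /* write one record at a time, as the Java program below does */
      double value = 1398978611132.0;
      for (size_t i = 0; i < NREC; i++, value += 1.0) {
          size_t start[1] = {i}, count[1] = {1};
          CHECK(nc_put_vara_double(ncid, varid, start, count, &value));
      }
      CHECK(nc_close(ncid));
      return 0;
  }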

The netCDF-Java library uses the netCDF C library when writing netCDF-4 files,
but I'm not sure which version it uses.  All versions of the netCDF C library
after 4.2.0 (released July 2012) produce the same 82685-byte file with a chunk
size of 2000.

So it appears that something is wrong with the way the chunk size is being set
by the Java test program you're using ...
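
If it helps with the diagnosis, you can check what chunk sizes actually ended
up in a file with "ncdump -s -h output.nc" (the -s option prints the
_ChunkSizes special attribute), or with a few lines of C, sketched here with
error checking omitted:

  /* Sketch: report the chunking actually stored for variable "time". */
  #include <stdio.h>
  #include <netcdf.h>

  int main(void) {
      int ncid, varid, storage;
      size_t chunksizes[NC_MAX_VAR_DIMS];

      nc_open("output.nc", NC_NOWRITE, &ncid);
      nc_inq_varid(ncid, "time", &varid);
      nc_inq_var_chunking(ncid, varid, &storage, chunksizes);
      if (storage == NC_CHUNKED)
          printf("time: chunked, chunk size %lu\n",
                 (unsigned long) chunksizes[0]);
      else
          printf("time: contiguous\n");
      nc_close(ncid);
      return 0;
  }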

--Russ
  
> Finally, I tried writing more records to see if it's just a symptom of a
> small data set. With 1M records:
> 
> 8MB raw data, chunk size = 2000
> 45.4MB file (5.7x)
> 
> This is starting to seem like a lost cause given our small data records.
> I'm wondering if you have information I could use to go back to the archive
> group and try to convince them to use NetCDF3 instead.
> 
> jeff
> 
> 
> 
> address@hidden> wrote:
> 
> > Great, thanks Ethan, I'll give that a try. We have an external requirement
> > being imposed on us to use NetCDF4, but I don't know the reasoning behind
> > it.
> >
> > jeff
> >
> >
> > address@hidden> wrote:
> >
> >> Hi Jeff,
> >>
> >> An alternate approach would be to avoid the whole chunking issue by
> >> writing netCDF-3 files instead of netCDF-4 files. But, if you do that, you
> >> don't get to take advantage of compression.
> >>
> >> If you want to stick with netCDF-4, I included a few details and pointers
> >> to the needed methods and classes in my response on the netcdf-java list
> >>
> >>
> >> http://www.unidata.ucar.edu/mailing_lists/archives/netcdf-java/2014/msg00055.html
> >>
> >> But I think it boils down to a few lines of code. I haven't tested this
> >> but interspersed in your code below are a few lines that I think should get
> >> you going.
> >>
> >> Ethan
> >>
> >> On 5/2/2014 8:54 AM, Jeff Johnson - NOAA Affiliate wrote:
> >> > New Ticket: chunking in Java
> >> >
> >> > How do you set the chunk size via the Java API? I'm trying to get my file
> >> > size down and was told by Ethan from unidata to change the chunk size from
> >> > 1 to 2000 on the unlimited dimension, but I don't see an API to do that.
> >> >
> >> > Below is my sample code.
> >> >
> >> > import ucar.ma2.ArrayDouble;
> >> > import ucar.ma2.ArrayLong;
> >> > import ucar.ma2.DataType;
> >> > import ucar.ma2.InvalidRangeException;
> >> > import ucar.nc2.*;
> >> >
> >> > import java.io.IOException;
> >> > import java.nio.file.FileSystems;
> >> > import java.nio.file.Files;
> >> > import java.nio.file.Path;
> >> > import java.util.ArrayList;
> >> > import java.util.List;
> >> >
> >> > public class TestGenFile2 {
> >> >   public static void main(String[] args) {
> >> >     NetcdfFileWriter dataFile = null;
> >> >
> >> >     try {
> >> >       try {
> >> >
> >> >         // define the file
> >> >         String filePathName = "output.nc";
> >> >
> >> >         // delete the file if it already exists
> >> >         Path path = FileSystems.getDefault().getPath(filePathName);
> >> >         Files.deleteIfExists(path);
> >> >
> >> >         // enter definition mode for this NetCDF-4 file
> >> >         dataFile = NetcdfFileWriter.createNew(
> >> >             NetcdfFileWriter.Version.netcdf4, filePathName);
> >>
> >> replace the above line with
> >>
> >> Nc4Chunking chunkingStrategy =
> >>     Nc4ChunkingStrategyImpl.factory(Nc4Chunking.Strategy.standard, 0, false);
> >> dataFile = NetcdfFileWriter.createNew(NetcdfFileWriter.Version.netcdf4,
> >>     filePathName, chunkingStrategy);
> >>
> >>
> >> >         // create the root group
> >> >         Group rootGroup = dataFile.addGroup(null, null);
> >> >
> >> >         // define dimensions, in this case only one: time
> >> >         Dimension timeDim = dataFile.addUnlimitedDimension("time");
> >> >         List<Dimension> dimList = new ArrayList<>();
> >> >         dimList.add(timeDim);
> >> >
> >> >         // define variables
> >> >         Variable time = dataFile.addVariable(rootGroup, "time",
> >> > DataType.DOUBLE, dimList);
> >> >         dataFile.addVariableAttribute(time, new Attribute("units",
> >> > "milliseconds since 1970-01-01T00:00:00Z"));
> >>
> >> Add the following line here
> >>
> >> dataFile.addVariableAttribute(time,
> >>     new Attribute("_ChunkSize", new Integer(2000)));
> >>
> >>
> >> >
> >> >         // create the file
> >> >         dataFile.create();
> >> >
> >> >         // create 1-D arrays to hold data values (time is the dimension)
> >> >         ArrayDouble.D1 timeArray = new ArrayDouble.D1(1);
> >> >
> >> >         int[] origin = new int[]{0};
> >> >         long startTime = 1398978611132L;
> >> >
> >> >         // write the records to the file
> >> >         for (int i = 0; i < 10000; i++) {
> >> >           // load data into array variables
> >> >           double value = startTime++;
> >> >           timeArray.set(timeArray.getIndex(), value);
> >> >
> >> >           origin[0] = i;
> >> >
> >> >           // write a record
> >> >           dataFile.write(time, origin, timeArray);
> >> >         }
> >> >       } finally {
> >> >         if (null != dataFile) {
> >> >           // close the file
> >> >           dataFile.close();
> >> >         }
> >> >       }
> >> >     } catch (IOException | InvalidRangeException e) {
> >> >       e.printStackTrace();
> >> >     }
> >> >   }
> >> > }
> >> >
> >> > thanks-
> >> > jeff
> >> >
> >>
> >>
> >> Ticket Details
> >> ===================
> >> Ticket ID: BNA-191717
> >> Department: Support netCDF Java
> >> Priority: Normal
> >> Status: Closed
> >>
> >>
> >
> >
> > --
> > Jeff Johnson
> > DSCOVR Ground System Development
> > Space Weather Prediction Center
> > address@hidden
> > 303-497-6260
> >
> 
> 
> 
> --
> Jeff Johnson
> DSCOVR Ground System Development
> Space Weather Prediction Center
> address@hidden
> 303-497-6260
> 
> 
Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu



Ticket Details
===================
Ticket ID: BNA-191717
Department: Support netCDF
Priority: Normal
Status: Closed