
[THREDDS #ZIZ-818863]: Thredds inflation of data



We can probably make more headway if you
use the client-side ToolsUI program
to read the file. This way we can avoid all of the
server-side clutter and focus only on the unzipping
of that file.

Once you do that, there is a way to set some
of the debugging flags.

There is a class DebugFlagsImpl that has this constructor:
  public DebugFlagsImpl(String flagsOn)
So if you create an instance with an argument like this:
    DebugFlags flags = new DebugFlagsImpl("H5iosp/filter");
and then call:
  H5iosp.setDebugFlags(flags);
it should, in theory, turn on that debug flag.
You can do this by building a main program that invokes
ToolsUI and contains the above code.

You can see the complete set of flags by looking at
the body of H5iosp.setDebugFlags().
You might try it, see if it works, and send me the output.
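
For example, a minimal sketch of such a wrapper main might look like the
following (class locations assumed from the thredds source tree:
ucar.nc2.util.DebugFlagsImpl, ucar.nc2.iosp.hdf5.H5iosp, ucar.nc2.ui.ToolsUI;
verify them against the version you are running):

    import ucar.nc2.iosp.hdf5.H5iosp;
    import ucar.nc2.util.DebugFlags;
    import ucar.nc2.util.DebugFlagsImpl;

    public class DebugToolsUI {
      public static void main(String[] args) {
        // Turn on the HDF5 filter debugging before any file is opened.
        DebugFlags flags = new DebugFlagsImpl("H5iosp/filter");
        H5iosp.setDebugFlags(flags);
        // Then hand control to the normal ToolsUI entry point.
        ucar.nc2.ui.ToolsUI.main(args);
      }
    }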



> How do you know if those log entries are being generated by the thredds
> H5iosp code or by the apache httpclient code?
> 
> 
> > At the moment the log is being generated by the version of java.util.zip 
> > that is needed to exploit the hardware; without turning that debug parameter 
> > on in H5tiledLayoutBB.java I can't see any more detail.
> >
> > Martyn Hunt  
> > Technical Lead, Mainframe
> > Met Office  FitzRoy Road  Exeter  Devon  EX1 3PB  United Kingdom
> > Tel: +44 (0)1392 884897  
> > Email: address@hidden  Website: www.metoffice.gov.uk
> >
> > -----Original Message-----
> > From: Unidata THREDDS Support [mailto:address@hidden]
> > Sent: 26 October 2017 17:21
> > To: Hunt, Martyn <address@hidden>
> > Cc: address@hidden
> > Subject: [THREDDS #ZIZ-818863]: Thredds inflation of data
> >
> > > What I really don't understand is how the data which has been compressed 
> > > and chunked elsewhere at a large chunksize can be decompressed in smaller 
> > > blocks than the chunksize.
> > I will assume that this is correct.
> >
> > In your original message you said:
> > > Instead of this, in the compression software logs,
> > > I see 1000's of 512byte inflate requests
> > Who is generating that log: your hardware, InflaterInputStream, something 
> > else?
> >
> >
> > > Your description does appear to be what is happening; what I really don't 
> > > understand is how data that has been compressed and chunked 
> > > elsewhere at a large chunksize can be decompressed in smaller blocks 
> > > than the chunksize.
> > >
> > > As I mentioned before, I suspect it is something in H5tiledLayoutBB.java 
> > > (https://github.com/Unidata/thredds/blob/b731bcb45b6e10b7e6102e97a9ef35e9fef43c93/cdm/src/main/java/ucar/nc2/iosp/hdf5/H5tiledLayoutBB.java),
> > >  which has a debug flag available to be set; on line 285 this would 
> > > start printing out the "bytes in, bytes out" that the HDF5 code thinks is 
> > > being sent to java.util.zip.  Is there a parameter in the Thredds log 
> > > configuration to turn this on, so we can see another piece of the puzzle?
> > >
> > > Thanks
> > >
> > > Martyn
> > >
> > > Martyn Hunt
> > > Technical Lead, Mainframe
> > > Met Office  FitzRoy Road  Exeter  Devon  EX1 3PB  United Kingdom
> > > Tel: +44 (0)1392 884897
> > > Email: address@hidden  Website: www.metoffice.gov.uk
> > >
> > > -----Original Message-----
> > > From: Unidata THREDDS Support [mailto:address@hidden]
> > > Sent: 25 October 2017 20:24
> > > To: Hunt, Martyn <address@hidden>
> > > Cc: address@hidden
> > > Subject: [THREDDS #ZIZ-818863]: Thredds inflation of data
> > >
> > > Ok, now I see.
> > > As a rule, HDF5/netcdf-4 decompression operates on a chunk at a time 
> > > (where a chunk is defined by the chunking parameters associated with the 
> > > file). Do you know what the chunking parameters are for one of your files? 
> > > You can see them by using the ncdump command from the C library:
> > > ncdump -hs <filename>
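
For illustration only, the chunking shows up in the ncdump -hs output as
hidden attributes like the sketch below; the variable, dimensions, and values
here are guesses based on the request URL later in this thread, not read from
the actual file:

    float air_temperature(time, height, lat, lon) ;
            ...
            air_temperature:_Storage = "chunked" ;
            air_temperature:_ChunkSizes = 1, 1, 1152, 1536 ;
            air_temperature:_DeflateLevel = 2 ;

A 1 x 1 x 1152 x 1536 chunk of 4-byte floats would be ~7MB uncompressed, which
lines up with the chunk sizes described below.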
> > >
> > > My speculation is this:
> > > 1. You are using the pure-Java HDF5 reader code in Thredds
> > > (this is the default for netcdf-4 files).
> > > 2. The pure-Java HDF5 reader is either using a different
> > > implementation of zip or is breaking up the incoming chunk
> > > into smaller pieces and decompressing those smaller chunks.
> > > I will undertake to see which case in #2 (if either) is being used.
> > > Any additional insight you have would be appreciated.
> > >
> > >
> > > > Our dataflow is:
> > > >
> > > > 1 - HPC Produces Chunked Compressed NetCDF Data
> > > > 2 - HPC FTPs data to our Thredds Systems
> > > > 3 - Downstream client systems request the data from Thredds via the 
> > > > opendap interface.  They expect to retrieve uncompressed data, and 
> > > > request a single chunk at a time.
> > > > 4 - Thredds reads the chunk, uncompresses it, and then sends it using 
> > > > the opendap protocol.
> > > >
> > > > It is step 4 where I want the hardware to be used, and I can force it 
> > > > to be used by reducing the limit on the smallest decompress that is 
> > > > passed to the hardware; however, as Thredds is currently passing 
> > > > 512 bytes at a time rather than the hardware's recommended minimum 
> > > > of 16384 bytes, performance is awful.
> > > >
> > > > As our client systems only ever request a full chunk at a time (which 
> > > > is always ~7MB of data when uncompressed), the behaviour I was looking 
> > > > for is that Thredds will read a single chunk from disk (between 1MB and 
> > > > 2MB depending on the data and compression level) and pass that whole 
> > > > chunk as one to java.util.zip (or at least more than 16384 bytes at a 
> > > > time, the bigger the better), where the hardware will take over and 
> > > > inflate the data.  The hardware and java.util.zip then return the 
> > > > uncompressed data to Thredds/Opendap, which then returns it to the 
> > > > client system.
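
As a sketch of that desired behaviour (a hypothetical helper, not Thredds
code), inflating one complete chunk in a single pass through java.util.zip,
so that an accelerator hooked into it sees the full ~1.6MB input rather than
512-byte slices, could look like:

    import java.io.ByteArrayOutputStream;
    import java.util.zip.DataFormatException;
    import java.util.zip.Inflater;

    public class ChunkInflater {
      // Inflate a whole compressed chunk with one Inflater, feeding it the
      // entire input up front instead of 512-byte slices.
      public static byte[] inflateWholeChunk(byte[] compressed)
          throws DataFormatException {
        Inflater inflater = new Inflater(); // HDF5 deflate chunks are zlib streams
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream(8 * 1024 * 1024);
        byte[] buffer = new byte[1024 * 1024];
        while (!inflater.finished()) {
          int n = inflater.inflate(buffer);
          if (n == 0 && inflater.needsInput()) {
            break; // truncated or malformed input; stop rather than spin
          }
          out.write(buffer, 0, n);
        }
        inflater.end();
        return out.toByteArray();
      }
    }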
> > > >
> > > > Thanks
> > > >
> > > > Martyn
> > > >
> > > > Martyn Hunt
> > > > Technical Lead, Mainframe
> > > > Met Office  FitzRoy Road  Exeter  Devon  EX1 3PB  United Kingdom
> > > > Tel: +44 (0)1392 884897
> > > > Email: address@hidden  Website: www.metoffice.gov.uk
> > > >
> > > > -----Original Message-----
> > > > From: Unidata THREDDS Support
> > > > [mailto:address@hidden]
> > > > Sent: 24 October 2017 20:22
> > > > To: Hunt, Martyn <address@hidden>
> > > > Cc: address@hidden
> > > > Subject: [THREDDS #ZIZ-818863]: Thredds inflation of data
> > > >
> > > > My mistake, I thought you were talking about http-level chunking and 
> > > > compression.
> > > > I thought this because of the logs you sent me.
> > > > But I am confused about where you want to use your hardware.  Is your 
> > > > plan to use it to decompress the file on the server before transmitting 
> > > > it using the opendap protocol? As a reminder, the file will be 
> > > > decompressed before translating it into the opendap format in order to 
> > > > pass it over the http connection.
> > > > Can you elaborate on how you ideally want that special hardware to be 
> > > > used?
> > > >
> > > >
> > > >
> > > > > That’s not what I understood from the HDF5 doc 
> > > > > (https://support.hdfgroup.org/HDF5/doc/Advanced/Chunking/):
> > > > >
> > > > > "Dataset chunking also enables the use of I/O filters, including 
> > > > > compression. The filters are applied to each chunk individually, and 
> > > > > the entire chunk is processed at once."
> > > > >
> > > > > Note that I am only talking about reading and serving a compressed 
> > > > > netcdf file here, and not about the apache/tomcat compression for 
> > > > > data transfer, i.e. the problem I have is likely somewhere in here:
> > > > >
> > > > > https://github.com/Unidata/thredds/blob/b731bcb45b6e10b7e6102e97a9ef35e9fef43c93/cdm/src/main/java/ucar/nc2/iosp/hdf5/H5tiledLayoutBB.java
> > > > >
> > > > >
> > > > > Martyn Hunt
> > > > > Technical Lead, Mainframe
> > > > > Met Office  FitzRoy Road  Exeter  Devon  EX1 3PB  United Kingdom
> > > > > Tel: +44 (0)1392 884897
> > > > > Email: address@hidden  Website: www.metoffice.gov.uk
> > > > >
> > > > > -----Original Message-----
> > > > > From: Unidata THREDDS Support
> > > > > [mailto:address@hidden]
> > > > > Sent: 23 October 2017 18:34
> > > > > To: Hunt, Martyn <address@hidden>
> > > > > Cc: address@hidden
> > > > > Subject: [THREDDS #ZIZ-818863]: Thredds inflation of data
> > > > >
> > > > > We use the Apache Httpclient system
> > > > > (http://hc.apache.org/httpcomponents-client-4.5.x/)
> > > > > so the fix will need to be with respect to that.
> > > > >
> > > > > My speculation is that there are two related issues that need 
> > > > > investigation.
> > > > > 1. chunking - 1 large response is chunked into multiple smaller
> > > > > chunks on the server side that are then reassembled on the client 
> > > > > side.
> > > > > 2. A specific compressor -- GzipCompressingEntity, I think -- is
> > > > > used to do the actual compression on the server side.
> > > > >
> > > > > I do not know the order in which these are used by the server side. 
> > > > > It is possible that the compressor operates first and then the 
> > > > > chunker divides that compressed output.
> > > > > It is also possible that the chunker is first and that the compressor 
> > > > > operates on each separate chunk.
> > > > >
> > > > > We will need to investigate to see which is the case (if either) and 
> > > > > then figure out how to change the chunking and/or the compression 
> > > > > parameters.  I suspect that sending very large (1.6MB) chunks is a bad 
> > > > > idea, so I would hope we set things up so that the compression is 
> > > > > done first and the compressed data is then chunked.
> > > > > Note that this will also require a corresponding change on the client 
> > > > > side.
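
As an illustrative sketch of that ordering (not necessarily how the server
actually wires it up), HttpClient 4.5's GzipCompressingEntity wraps another
entity, so the gzip happens as the content streams out and chunked transfer
encoding is then applied to the already-compressed bytes:

    import org.apache.http.HttpEntity;
    import org.apache.http.client.entity.GzipCompressingEntity;
    import org.apache.http.client.methods.HttpPost;
    import org.apache.http.entity.ContentType;
    import org.apache.http.entity.StringEntity;

    public class GzipEntityExample {
      public static void main(String[] args) {
        HttpEntity raw = new StringEntity("payload ...", ContentType.TEXT_PLAIN);
        HttpPost post = new HttpPost("http://example.invalid/upload");
        // The wrapper compresses on the fly and reports an unknown content
        // length, so the transport falls back to chunking the gzip stream.
        post.setEntity(new GzipCompressingEntity(raw));
      }
    }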
> > > > >
> > > > > In any case, this is going to take a while for me to figure it out.
> > > > >
> > > > >
> > > > > =======================
> > > > > > I am currently trying to get a compression accelerator to work with 
> > > > > > Thredds, with the aim of reducing the CPU time that Thredds spends 
> > > > > > decompressing chunks of data.  The compression card plugs straight 
> > > > > > in to IBM Java8 and java.util.zip with no changes needed to the 
> > > > > > application that uses it.  However, the replacement code will 
> > > > > > always revert to software inflation when the size of the data 
> > > > > > passed to java.util.zip is less than 16384 bytes.
> > > > > >
> > > > > > Our data is chunked and compressed, with the data we want to 
> > > > > > retrieve ending up as 70 chunks of ~1.6MB each (compressed), which 
> > > > > > should inflate to ~7MB each (see the hdls.txt file for more detail).
> > > > > >
> > > > > > When requesting the data, I use the following URL to request a 
> > > > > > single chunk (our applications run through selecting each of the 70 
> > > > > > chunks sequentially, one at a time; in this example I'm just picking 
> > > > > > one chunk).
> > > > > >
> > > > > > http://dvtds02-zvopaph2:8080/thredds/dodsC/decoupler/mhtest/original.nc.ascii?air_temperature[0][43][0:1:1151][0:1:1535]
> > > > > >
> > > > > > While I expect there may be a few smaller inflate operations at the 
> > > > > > start/end of the request, I'd expect that there would be a single 
> > > > > > 1.6MB --> 7MB inflate request in there.  Instead of this, in the 
> > > > > > compression software logs, I see thousands of 512-byte inflate 
> > > > > > requests, which, as they are smaller than the card's 16384-byte 
> > > > > > minimum, never get passed to the compression card.
> > > > > >
> > > > > > e.g.
> > > > > >
> > > > > > 2017-10-23T12:55:31.643894+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0] inflate:   flush=1 next_in=0x9b917b198 avail_in=512 
> > > > > > next_out=0x9b917b3b8 avail_out=9200 total_in=2048 total_out=31827 
> > > > > > crc/adler=1b38b342
> > > > > > 2017-10-23T12:55:31.644229+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0]            flush=1 next_in=0x9b917b398 avail_in=0 
> > > > > > next_out=0x9b917c422 avail_out=4998 total_in=2560 total_out=36029 
> > > > > > crc/adler=e835c747 rc=0
> > > > > > 2017-10-23T12:55:31.644541+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0] inflate:   flush=1 next_in=0x9b917b398 avail_in=0 
> > > > > > next_out=0x9b917b3b8 avail_out=9200 total_in=2560 total_out=36029 
> > > > > > crc/adler=e835c747
> > > > > > 2017-10-23T12:55:31.644909+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0]            flush=1 next_in=0x9b917b398 avail_in=0 
> > > > > > next_out=0x9b917b3b8 avail_out=9200 total_in=2560 total_out=36029 
> > > > > > crc/adler=e835c747 rc=-5
> > > > > > 2017-10-23T12:55:31.645234+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0] inflate:   flush=1 next_in=0x9b917b198 avail_in=512 
> > > > > > next_out=0x9b917b3b8 avail_out=9200 total_in=2560 total_out=36029 
> > > > > > crc/adler=e835c747
> > > > > > 2017-10-23T12:55:31.645568+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0]            flush=1 next_in=0x9b917b398 avail_in=0 
> > > > > > next_out=0x9b917c47a avail_out=4910 total_in=3072 total_out=40319 
> > > > > > crc/adler=f1a70cdc rc=0
> > > > > > 2017-10-23T12:55:31.645879+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0] inflate:   flush=1 next_in=0x9b917b398 avail_in=0 
> > > > > > next_out=0x9b917b3b8 avail_out=9200 total_in=3072 total_out=40319 
> > > > > > crc/adler=f1a70cdc
> > > > > > 2017-10-23T12:55:31.646199+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0]            flush=1 next_in=0x9b917b398 avail_in=0 
> > > > > > next_out=0x9b917b3b8 avail_out=9200 total_in=3072 total_out=40319 
> > > > > > crc/adler=f1a70cdc rc=-5
> > > > > > 2017-10-23T12:55:31.646511+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0] inflate:   flush=1 next_in=0x9b917b198 avail_in=512 
> > > > > > next_out=0x9b917b3b8 avail_out=9200 total_in=3072 total_out=40319 
> > > > > > crc/adler=f1a70cdc
> > > > > > 2017-10-23T12:55:31.646847+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0]            flush=1 next_in=0x9b917b398 avail_in=0 
> > > > > > next_out=0x9b917c272 avail_out=5430 total_in=3584 total_out=44089 
> > > > > > crc/adler=8dba79f4 rc=0
> > > > > > 2017-10-23T12:55:31.647166+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0] inflate:   flush=1 next_in=0x9b917b398 avail_in=0 
> > > > > > next_out=0x9b917b3b8 avail_out=9200 total_in=3584 total_out=44089 
> > > > > > crc/adler=8dba79f4
> > > > > > 2017-10-23T12:55:31.647490+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0]            flush=1 next_in=0x9b917b398 avail_in=0 
> > > > > > next_out=0x9b917b3b8 avail_out=9200 total_in=3584 total_out=44089 
> > > > > > crc/adler=8dba79f4 rc=-5
> > > > > >
> > > > > > Happy to send across the datafile I'm using as an example, please 
> > > > > > let me know if you need any other info.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > Martyn
> > > > > >
> > > > > > Martyn Hunt
> > > > > > Technical Lead, Mainframe
> > > > > > Met Office  FitzRoy Road  Exeter  Devon  EX1 3PB  United Kingdom
> > > > > > Tel: +44 (0)1392 884897
> > > > > > Email: address@hidden  Website: www.metoffice.gov.uk
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > > =Dennis Heimbigner
> > > > > Unidata
> > > > >
> > > > >
> > > > > Ticket Details
> > > > > ===================
> > > > > Ticket ID: ZIZ-818863
> > > > > Department: Support THREDDS
> > > > > Priority: Normal
> > > > > Status: Open
> > > > > ===================
> > > > > NOTE: All email exchanges with Unidata User Support are recorded in 
> > > > > the Unidata inquiry tracking system and then made publicly available 
> > > > > through the web.  If you do not want to have your interactions made 
> > > > > available in this way, you must let us know in each email you send to 
> > > > > us.
> > > > >
> > > > >
> > > > >
> > > >
> > > > =Dennis Heimbigner
> > > > Unidata
> > > >
> > > >
> > > > Ticket Details
> > > > ===================
> > > > Ticket ID: ZIZ-818863
> > > > Department: Support THREDDS
> > > > Priority: Normal
> > > > Status: Open
> > > > ===================
> > > > NOTE: All email exchanges with Unidata User Support are recorded in the 
> > > > Unidata inquiry tracking system and then made publicly available 
> > > > through the web.  If you do not want to have your interactions made 
> > > > available in this way, you must let us know in each email you send to 
> > > > us.
> > > >
> > > >
> > > >
> > >
> > > =Dennis Heimbigner
> > > Unidata
> > >
> > >
> > > Ticket Details
> > > ===================
> > > Ticket ID: ZIZ-818863
> > > Department: Support THREDDS
> > > Priority: Normal
> > > Status: Closed
> > > ===================
> > > NOTE: All email exchanges with Unidata User Support are recorded in the 
> > > Unidata inquiry tracking system and then made publicly available through 
> > > the web.  If you do not want to have your interactions made available in 
> > > this way, you must let us know in each email you send to us.
> > >
> > >
> > >
> >
> >
> > > Your description does appear to be what is happening, what I really don't 
> > > understand is how the data which has been compressed and chunked 
> > > elsewhere at a large chuinksize can be decompressed in smaller blocks 
> > > than the chunksize.
> > >
> > > As I mentioned before, I suspect it is something in H5tiledLayoutBB.java 
> > > (https://github.com/Unidata/thredds/blob/b731bcb45b6e10b7e6102e97a9ef35e9fef43c93/cdm/src/main/java/ucar/nc2/iosp/hdf5/H5tiledLayoutBB.java),
> > >  which has a debug flag available to be set, which on line 285 would 
> > > start printing out the "bytes in, bytes out" that the HDF5 code think is 
> > > being sent to java.util.zip.  Is there a parameter in the Thredds log 
> > > configuration to set this on, so we can see another piece of the puzzle?
> > >
> > > Thanks
> > >
> > > Martyn
> > >
> > > Martyn Hunt
> > > Technical Lead, Mainframe
> > > Met Office  FitzRoy Road  Exeter  Devon  EX1 3PB  United Kingdom
> > > Tel: +44 (0)1392 884897
> > > Email: address@hidden  Website: www.metoffice.gov.uk
> > >
> > > -----Original Message-----
> > > From: Unidata THREDDS Support [mailto:address@hidden]
> > > Sent: 25 October 2017 20:24
> > > To: Hunt, Martyn <address@hidden>
> > > Cc: address@hidden
> > > Subject: [THREDDS #ZIZ-818863]: Thredds inflation of data
> > >
> > > Ok, now I see.
> > > As a rule, HDF5/netcdf-4 decompression operates on a chunk at a time 
> > > (where chunk is the chunking parameters associated with the file). Do you 
> > > know what the chunking parameters are for one of your files? You can see 
> > > it by using the ncdump command in the c-library
> > > ncdump -hs <filename>
> > >
> > > My speculation is this:
> > > 1. You are using the pure-Java HDF5 reader code in Thredds
> > > (this is the default for netcdf-4 files).
> > > 2. The pure java HDF5 reader is either using a different
> > > implementation of zip or is breaking up the incoming chunk
> > > into smaller pieces and decompressing those smaller chunks.
> > > I will undertake to see which case in #2 (if either) is being used.
> > > Any additional insight you have would be appreciated.
> > >
> > >
> > > > Our dataflow is:
> > > >
> > > > 1 - HPC Produces Chunked Compressed NetCDF Data
> > > > 2 - HPC FTPs data to our Thredds Systems
> > > > 3 - Downstream client systems request the data from Thredds from the 
> > > > opendap interface.  They expect to retrieve uncompressed data, and 
> > > > request a sinlge chunk at a time.
> > > > 4 - Thredds reads the chunk, uncompresses it, and then sends it using 
> > > > the opendap protocol.
> > > >
> > > > It is step 4 where I want the hardware to be used, and I can force it 
> > > > to be used by reducing the limit on the smallest decompress that is 
> > > > passed to the hardware, however as the data Thredds is currently 
> > > > passing is 512 bytes rather than the recommended minimum for the 
> > > > hardware of 16384 bytes, performance is awful.
> > > >
> > > > As our client systems only ever request a full chunk at a time (which 
> > > > is always ~7MB of data when uncompressed), the behaviour I was looking 
> > > > for is that Thredds will read a single chunk from disk (between 1MB and 
> > > > 2MB depending on the data and compression level), pass that whole chunk 
> > > > as one to java.util.zip (or at least more than 16384 bytes at a time, 
> > > > the bigger the better), where the hardware will take over and inflate 
> > > > the data.  The hardware and java.util.zip then return the uncompressed 
> > > > data to Thredds/Opendap which then return it to the client system.
> > > >
> > > > Thanks
> > > >
> > > > Martyn
> > > >
> > > > Martyn Hunt
> > > > Technical Lead, Mainframe
> > > > Met Office  FitzRoy Road  Exeter  Devon  EX1 3PB  United Kingdom
> > > > Tel: +44 (0)1392 884897
> > > > Email: address@hidden  Website: www.metoffice.gov.uk
> > > >
> > > > -----Original Message-----
> > > > From: Unidata THREDDS Support
> > > > [mailto:address@hidden]
> > > > Sent: 24 October 2017 20:22
> > > > To: Hunt, Martyn <address@hidden>
> > > > Cc: address@hidden
> > > > Subject: [THREDDS #ZIZ-818863]: Thredds inflation of data
> > > >
> > > > My mistake, I though you were talking about http level chunking and 
> > > > compression.
> > > > I thought this because of the logs you sent me.
> > > > But I am confused about where you want to use your hardware.  Is your 
> > > > plan to use it to decompress the file on the server before transmitting 
> > > > it using the opendap protocol? As a reminder, the file will be 
> > > > decompressed before translating it into the opendap format in order to 
> > > > pass it over the http connection.
> > > > Can you elaborate on how you ideally want that special hardware to be 
> > > > used?
> > > >
> > > >
> > > >
> > > > > That’s not what I understood from the HDF5 doc 
> > > > > (https://support.hdfgroup.org/HDF5/doc/Advanced/Chunking/):
> > > > >
> > > > > "Dataset chunking also enables the use of I/O filters, including 
> > > > > compression. The filters are applied to each chunk individually, and 
> > > > > the entire chunk is processed at once."
> > > > >
> > > > > Note that I am only talking about reading and serving a compressed 
> > > > > netcdf file here, and not about the apache/tomcat compression for 
> > > > > data transfer, i.e. the problem I have is likely somewhere in here:
> > > > >
> > > > > https://github.com/Unidata/thredds/blob/b731bcb45b6e10b7e6102e97a9ef
> > > > > 35
> > > > > e9fef43c93/cdm/src/main/java/ucar/nc2/iosp/hdf5/H5tiledLayoutBB.java
> > > > >
> > > > >
> > > > > Martyn Hunt
> > > > > Technical Lead, Mainframe
> > > > > Met Office  FitzRoy Road  Exeter  Devon  EX1 3PB  United Kingdom
> > > > > Tel: +44 (0)1392 884897
> > > > > Email: address@hidden  Website: www.metoffice.gov.uk
> > > > >
> > > > > -----Original Message-----
> > > > > From: Unidata THREDDS Support
> > > > > [mailto:address@hidden]
> > > > > Sent: 23 October 2017 18:34
> > > > > To: Hunt, Martyn <address@hidden>
> > > > > Cc: address@hidden
> > > > > Subject: [THREDDS #ZIZ-818863]: Thredds inflation of data
> > > > >
> > > > > We use the Apache Httpclient system
> > > > > (http://hc.apache.org/httpcomponents-client-4.5.x/)
> > > > > so the fix will need to be with respect to that.
> > > > >
> > > > > My speculation is that there are two related issues that need 
> > > > > investigation.
> > > > > 1. chunking - 1 large response is chunked into multiple smaller
> > > > > chunks on the server side that are then reassembled on the client 
> > > > > side.
> > > > > 2. A specific compressor -- GzipCompressingEntity, I think -- is
> > > > > used to do the actual compression on the server side.
> > > > >
> > > > > I do not know the order in which these are used by the server side. 
> > > > > It is possible that the compressor operates first and then the 
> > > > > chunker divides that compressed output.
> > > > > It is also possible that the chunker is first and that the compressor 
> > > > > operates on each separate chunk.
> > > > >
> > > > > We will need to investigate to see which is the case (if either) and 
> > > > > then figure out how to change the chunking and/or the compression 
> > > > > parameters.  I suspect that sending very large (1.6m) chunks is a bad 
> > > > > idea. So, I would hope we set things up so that the compression is 
> > > > > first and the compressed data is then chunked.
> > > > > Note that this will also require a corresponding change on the client 
> > > > > side.
> > > > >
> > > > > In any case, this is going to take a while for me to figure it out.
> > > > >
> > > > >
> > > > > =======================
> > > > > > I am currently trying to get a compression accelerator to work with 
> > > > > > Thredds, with the aim to reduce the CPU time that Thredds spends 
> > > > > > decompressing chunks of data.  The compression card plugs straight 
> > > > > > in to IBM Java8 and java.util.zip with no changes needed to the 
> > > > > > application that uses it.  However, the replacement code with 
> > > > > > always revert to software inflation when the size of the data 
> > > > > > passed to java.util.zip is less than 16384 bytes.
> > > > > >
> > > > > > Our data is chunked and compressed, with the data we want to 
> > > > > > retrieve ending up as 70  ~1.6MB chunks (compressed), which should 
> > > > > > inflate to ~7MB each (see the hdls.txt file for more detail).
> > > > > >
> > > > > > When requesting the data, I use the following URL to request a 
> > > > > > single chunk (our applications run through selecting each of the 70 
> > > > > > chunks sequentially one at a time, in this example I'm just picking 
> > > > > > one chunk.
> > > > > >
> > > > > > http://dvtds02-zvopaph2:8080/thredds/dodsC/decoupler/mhtest/origin
> > > > > > al
> > > > > > .n
> > > > > > c.ascii?air_temperature[0][43][0:1:1151][0:1:1535<http://dvtds02-z
> > > > > > vo
> > > > > > pa
> > > > > > ph2:8080/thredds/dodsC/decoupler/mhtest/original.nc.ascii?air_temp
> > > > > > er at ure%5b0%5d%5b43%5d%5b0:1:1151%5d%5b0:1:1535>]
> > > > > >
> > > > > > While I expect there may be a few smaller inflate operations at the 
> > > > > > start/end of the request, I'd expect that there would be a single 
> > > > > > 1.6MB --> 7MB inflate request in there.  Instead of this, in the 
> > > > > > compression software logs, I see 1000's of 512byte inflate 
> > > > > > requests, which as they are smaller than the min 16384byte limit 
> > > > > > the compression card has, never get passed to the compression card.
> > > > > >
> > > > > > e.g.
> > > > > >
> > > > > > 2017-10-23T12:55:31.643894+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0] inflate:   flush=1 next_in=0x9b917b198 avail_in=512 
> > > > > > next_out=0x9b917b3b8 avail_out=9200 total_in=2048 total_out=31827 
> > > > > > crc/adler=1b38b342
> > > > > > 2017-10-23T12:55:31.644229+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0]            flush=1 next_in=0x9b917b398 avail_in=0 
> > > > > > next_out=0x9b917c422 avail_out=4998 total_in=2560 total_out=36029 
> > > > > > crc/adler=e835c747 rc=0
> > > > > > 2017-10-23T12:55:31.644541+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0] inflate:   flush=1 next_in=0x9b917b398 avail_in=0 
> > > > > > next_out=0x9b917b3b8 avail_out=9200 total_in=2560 total_out=36029 
> > > > > > crc/adler=e835c747
> > > > > > 2017-10-23T12:55:31.644909+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0]            flush=1 next_in=0x9b917b398 avail_in=0 
> > > > > > next_out=0x9b917b3b8 avail_out=9200 total_in=2560 total_out=36029 
> > > > > > crc/adler=e835c747 rc=-5
> > > > > > 2017-10-23T12:55:31.645234+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0] inflate:   flush=1 next_in=0x9b917b198 avail_in=512 
> > > > > > next_out=0x9b917b3b8 avail_out=9200 total_in=2560 total_out=36029 
> > > > > > crc/adler=e835c747
> > > > > > 2017-10-23T12:55:31.645568+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0]            flush=1 next_in=0x9b917b398 avail_in=0 
> > > > > > next_out=0x9b917c47a avail_out=4910 total_in=3072 total_out=40319 
> > > > > > crc/adler=f1a70cdc rc=0
> > > > > > 2017-10-23T12:55:31.645879+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0] inflate:   flush=1 next_in=0x9b917b398 avail_in=0 
> > > > > > next_out=0x9b917b3b8 avail_out=9200 total_in=3072 total_out=40319 
> > > > > > crc/adler=f1a70cdc
> > > > > > 2017-10-23T12:55:31.646199+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0]            flush=1 next_in=0x9b917b398 avail_in=0 
> > > > > > next_out=0x9b917b3b8 avail_out=9200 total_in=3072 total_out=40319 
> > > > > > crc/adler=f1a70cdc rc=-5
> > > > > > 2017-10-23T12:55:31.646511+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0] inflate:   flush=1 next_in=0x9b917b198 avail_in=512 
> > > > > > next_out=0x9b917b3b8 avail_out=9200 total_in=3072 total_out=40319 
> > > > > > crc/adler=f1a70cdc
> > > > > > 2017-10-23T12:55:31.646847+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0]            flush=1 next_in=0x9b917b398 avail_in=0 
> > > > > > next_out=0x9b917c272 avail_out=5430 total_in=3584 total_out=44089 
> > > > > > crc/adler=8dba79f4 rc=0
> > > > > > 2017-10-23T12:55:31.647166+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0] inflate:   flush=1 next_in=0x9b917b398 avail_in=0 
> > > > > > next_out=0x9b917b3b8 avail_out=9200 total_in=3584 total_out=44089 
> > > > > > crc/adler=8dba79f4
> > > > > > 2017-10-23T12:55:31.647490+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0]            flush=1 next_in=0x9b917b398 avail_in=0 
> > > > > > next_out=0x9b917b3b8 avail_out=9200 total_in=3584 total_out=44089 
> > > > > > crc/adler=8dba79f4 rc=-5
> > > > > >
> > > > > > Happy to send across the datafile I'm using as an example, please 
> > > > > > let me know if you need any other info.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > Martyn
> > > > > >
> > > > > > Martyn Hunt
> > > > > > Technical Lead, Mainframe
> > > > > > Met Office  FitzRoy Road  Exeter  Devon  EX1 3PB  United Kingdom
> > > > > > Tel: +44 (0)1392 884897
> > > > > > Email:
> > > > > > address@hidden<mailto:address@hidden>
> > > > > > Website: www.metoffice.gov.uk<http://www.metoffice.gov.uk>
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > > =Dennis Heimbigner
> > > > > Unidata
> > > > >
> > > > >
> > > > > Ticket Details
> > > > > ===================
> > > > > Ticket ID: ZIZ-818863
> > > > > Department: Support THREDDS
> > > > > Priority: Normal
> > > > > Status: Open
> > > > > ===================
> > > > > NOTE: All email exchanges with Unidata User Support are recorded in 
> > > > > the Unidata inquiry tracking system and then made publicly available 
> > > > > through the web.  If you do not want to have your interactions made 
> > > > > available in this way, you must let us know in each email you send to 
> > > > > us.
> > > > >
> > > > >
> > > > >
> > > >
> > > > =Dennis Heimbigner
> > > > Unidata
> > > >
> > > >
> > > > Ticket Details
> > > > ===================
> > > > Ticket ID: ZIZ-818863
> > > > Department: Support THREDDS
> > > > Priority: Normal
> > > > Status: Open
> > > > ===================
> > > > NOTE: All email exchanges with Unidata User Support are recorded in the 
> > > > Unidata inquiry tracking system and then made publicly available 
> > > > through the web.  If you do not want to have your interactions made 
> > > > available in this way, you must let us know in each email you send to 
> > > > us.
> > > >
> > > >
> > > >
> > >
> > > =Dennis Heimbigner
> > > Unidata
> > >
> > >
> > > Ticket Details
> > > ===================
> > > Ticket ID: ZIZ-818863
> > > Department: Support THREDDS
> > > Priority: Normal
> > > Status: Closed
> > > ===================
> > > NOTE: All email exchanges with Unidata User Support are recorded in the 
> > > Unidata inquiry tracking system and then made publicly available through 
> > > the web.  If you do not want to have your interactions made available in 
> > > this way, you must let us know in each email you send to us.
> > >
> > >
> > >
> >
> >
> >
> > > Your description does appear to be what is happening, what I really don't 
> > > understand is how the data which has been compressed and chunked 
> > > elsewhere at a large chuinksize can be decompressed in smaller blocks 
> > > than the chunksize.
> > >
> > > As I mentioned before, I suspect it is something in H5tiledLayoutBB.java 
> > > (https://github.com/Unidata/thredds/blob/b731bcb45b6e10b7e6102e97a9ef35e9fef43c93/cdm/src/main/java/ucar/nc2/iosp/hdf5/H5tiledLayoutBB.java),
> > >  which has a debug flag available to be set, which on line 285 would 
> > > start printing out the "bytes in, bytes out" that the HDF5 code think is 
> > > being sent to java.util.zip.  Is there a parameter in the Thredds log 
> > > configuration to set this on, so we can see another piece of the puzzle?
> > >
> > > Thanks
> > >
> > > Martyn
> > >
> > > Martyn Hunt
> > > Technical Lead, Mainframe
> > > Met Office  FitzRoy Road  Exeter  Devon  EX1 3PB  United Kingdom
> > > Tel: +44 (0)1392 884897
> > > Email: address@hidden  Website: www.metoffice.gov.uk
> > >
> > > -----Original Message-----
> > > From: Unidata THREDDS Support [mailto:address@hidden]
> > > Sent: 25 October 2017 20:24
> > > To: Hunt, Martyn <address@hidden>
> > > Cc: address@hidden
> > > Subject: [THREDDS #ZIZ-818863]: Thredds inflation of data
> > >
> > > Ok, now I see.
> > > As a rule, HDF5/netcdf-4 decompression operates on a chunk at a time 
> > > (where chunk is the chunking parameters associated with the file). Do you 
> > > know what the chunking parameters are for one of your files? You can see 
> > > it by using the ncdump command in the c-library
> > > ncdump -hs <filename>
> > >
> > > My speculation is this:
> > > 1. You are using the pure-Java HDF5 reader code in Thredds
> > > (this is the default for netcdf-4 files).
> > > 2. The pure java HDF5 reader is either using a different
> > > implementation of zip or is breaking up the incoming chunk
> > > into smaller pieces and decompressing those smaller chunks.
> > > I will undertake to see which case in #2 (if either) is being used.
> > > Any additional insight you have would be appreciated.
> > >
> > >
> > > > Our dataflow is:
> > > >
> > > > 1 - HPC Produces Chunked Compressed NetCDF Data
> > > > 2 - HPC FTPs data to our Thredds Systems
> > > > 3 - Downstream client systems request the data from Thredds from the 
> > > > opendap interface.  They expect to retrieve uncompressed data, and 
> > > > request a sinlge chunk at a time.
> > > > 4 - Thredds reads the chunk, uncompresses it, and then sends it using 
> > > > the opendap protocol.
> > > >
> > > > It is step 4 where I want the hardware to be used, and I can force it 
> > > > to be used by reducing the limit on the smallest decompress that is 
> > > > passed to the hardware, however as the data Thredds is currently 
> > > > passing is 512 bytes rather than the recommended minimum for the 
> > > > hardware of 16384 bytes, performance is awful.
> > > >
> > > > As our client systems only ever request a full chunk at a time (which 
> > > > is always ~7MB of data when uncompressed), the behaviour I was looking 
> > > > for is that Thredds will read a single chunk from disk (between 1MB and 
> > > > 2MB depending on the data and compression level), pass that whole chunk 
> > > > as one to java.util.zip (or at least more than 16384 bytes at a time, 
> > > > the bigger the better), where the hardware will take over and inflate 
> > > > the data.  The hardware and java.util.zip then return the uncompressed 
> > > > data to Thredds/Opendap which then return it to the client system.
> > > >
> > > > Thanks
> > > >
> > > > Martyn
> > > >
> > > > Martyn Hunt
> > > > Technical Lead, Mainframe
> > > > Met Office  FitzRoy Road  Exeter  Devon  EX1 3PB  United Kingdom
> > > > Tel: +44 (0)1392 884897
> > > > Email: address@hidden  Website: www.metoffice.gov.uk
> > > >
> > > > -----Original Message-----
> > > > From: Unidata THREDDS Support
> > > > [mailto:address@hidden]
> > > > Sent: 24 October 2017 20:22
> > > > To: Hunt, Martyn <address@hidden>
> > > > Cc: address@hidden
> > > > Subject: [THREDDS #ZIZ-818863]: Thredds inflation of data
> > > >
> > > > My mistake, I though you were talking about http level chunking and 
> > > > compression.
> > > > I thought this because of the logs you sent me.
> > > > But I am confused about where you want to use your hardware.  Is your 
> > > > plan to use it to decompress the file on the server before transmitting 
> > > > it using the opendap protocol? As a reminder, the file will be 
> > > > decompressed before translating it into the opendap format in order to 
> > > > pass it over the http connection.
> > > > Can you elaborate on how you ideally want that special hardware to be 
> > > > used?
> > > >
> > > >
> > > >
> > > > > That’s not what I understood from the HDF5 doc 
> > > > > (https://support.hdfgroup.org/HDF5/doc/Advanced/Chunking/):
> > > > >
> > > > > "Dataset chunking also enables the use of I/O filters, including 
> > > > > compression. The filters are applied to each chunk individually, and 
> > > > > the entire chunk is processed at once."
> > > > >
> > > > > Note that I am only talking about reading and serving a compressed 
> > > > > netcdf file here, and not about the apache/tomcat compression for 
> > > > > data transfer, i.e. the problem I have is likely somewhere in here:
> > > > >
> > > > > https://github.com/Unidata/thredds/blob/b731bcb45b6e10b7e6102e97a9ef
> > > > > 35
> > > > > e9fef43c93/cdm/src/main/java/ucar/nc2/iosp/hdf5/H5tiledLayoutBB.java
> > > > >
> > > > >
> > > > > Martyn Hunt
> > > > > Technical Lead, Mainframe
> > > > > Met Office  FitzRoy Road  Exeter  Devon  EX1 3PB  United Kingdom
> > > > > Tel: +44 (0)1392 884897
> > > > > Email: address@hidden  Website: www.metoffice.gov.uk
> > > > >
> > > > > -----Original Message-----
> > > > > From: Unidata THREDDS Support
> > > > > [mailto:address@hidden]
> > > > > Sent: 23 October 2017 18:34
> > > > > To: Hunt, Martyn <address@hidden>
> > > > > Cc: address@hidden
> > > > > Subject: [THREDDS #ZIZ-818863]: Thredds inflation of data
> > > > >
> > > > > We use the Apache Httpclient system
> > > > > (http://hc.apache.org/httpcomponents-client-4.5.x/)
> > > > > so the fix will need to be with respect to that.
> > > > >
> > > > > My speculation is that there are two related issues that need 
> > > > > investigation.
> > > > > 1. chunking - 1 large response is chunked into multiple smaller
> > > > > chunks on the server side that are then reassembled on the client 
> > > > > side.
> > > > > 2. A specific compressor -- GzipCompressingEntity, I think -- is
> > > > > used to do the actual compression on the server side.
> > > > >
> > > > > I do not know the order in which these are used by the server side. 
> > > > > It is possible that the compressor operates first and then the 
> > > > > chunker divides that compressed output.
> > > > > It is also possible that the chunker is first and that the compressor 
> > > > > operates on each separate chunk.
> > > > >
> > > > > We will need to investigate to see which is the case (if either) and 
> > > > > then figure out how to change the chunking and/or the compression 
> > > > > parameters.  I suspect that sending very large (1.6m) chunks is a bad 
> > > > > idea. So, I would hope we set things up so that the compression is 
> > > > > first and the compressed data is then chunked.
> > > > > Note that this will also require a corresponding change on the client 
> > > > > side.
> > > > >
> > > > > In any case, this is going to take a while for me to figure it out.
> > > > >
> > > > >
> > > > > =======================
> > > > > > I am currently trying to get a compression accelerator to work with 
> > > > > > Thredds, with the aim to reduce the CPU time that Thredds spends 
> > > > > > decompressing chunks of data.  The compression card plugs straight 
> > > > > > in to IBM Java8 and java.util.zip with no changes needed to the 
> > > > > > application that uses it.  However, the replacement code with 
> > > > > > always revert to software inflation when the size of the data 
> > > > > > passed to java.util.zip is less than 16384 bytes.
> > > > > >
> > > > > > Our data is chunked and compressed, with the data we want to 
> > > > > > retrieve ending up as 70  ~1.6MB chunks (compressed), which should 
> > > > > > inflate to ~7MB each (see the hdls.txt file for more detail).
> > > > > >
> > > > > > When requesting the data, I use the following URL to request a 
> > > > > > single chunk (our applications run through selecting each of the 70 
> > > > > > chunks sequentially one at a time, in this example I'm just picking 
> > > > > > one chunk.
> > > > > >
> > > > > > http://dvtds02-zvopaph2:8080/thredds/dodsC/decoupler/mhtest/origin
> > > > > > al
> > > > > > .n
> > > > > > c.ascii?air_temperature[0][43][0:1:1151][0:1:1535<http://dvtds02-z
> > > > > > vo
> > > > > > pa
> > > > > > ph2:8080/thredds/dodsC/decoupler/mhtest/original.nc.ascii?air_temp
> > > > > > er at ure%5b0%5d%5b43%5d%5b0:1:1151%5d%5b0:1:1535>]
> > > > > >
> > > > > > While I expect there may be a few smaller inflate operations at the 
> > > > > > start/end of the request, I'd expect that there would be a single 
> > > > > > 1.6MB --> 7MB inflate request in there.  Instead of this, in the 
> > > > > > compression software logs, I see 1000's of 512byte inflate 
> > > > > > requests, which as they are smaller than the min 16384byte limit 
> > > > > > the compression card has, never get passed to the compression card.
> > > > > >
> > > > > > e.g.
> > > > > >
> > > > > > 2017-10-23T12:55:31.643894+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0] inflate:   flush=1 next_in=0x9b917b198 avail_in=512 
> > > > > > next_out=0x9b917b3b8 avail_out=9200 total_in=2048 total_out=31827 
> > > > > > crc/adler=1b38b342
> > > > > > 2017-10-23T12:55:31.644229+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0]            flush=1 next_in=0x9b917b398 avail_in=0 
> > > > > > next_out=0x9b917c422 avail_out=4998 total_in=2560 total_out=36029 
> > > > > > crc/adler=e835c747 rc=0
> > > > > > 2017-10-23T12:55:31.644541+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0] inflate:   flush=1 next_in=0x9b917b398 avail_in=0 
> > > > > > next_out=0x9b917b3b8 avail_out=9200 total_in=2560 total_out=36029 
> > > > > > crc/adler=e835c747
> > > > > > 2017-10-23T12:55:31.644909+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0]            flush=1 next_in=0x9b917b398 avail_in=0 
> > > > > > next_out=0x9b917b3b8 avail_out=9200 total_in=2560 total_out=36029 
> > > > > > crc/adler=e835c747 rc=-5
> > > > > > 2017-10-23T12:55:31.645234+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0] inflate:   flush=1 next_in=0x9b917b198 avail_in=512 
> > > > > > next_out=0x9b917b3b8 avail_out=9200 total_in=2560 total_out=36029 
> > > > > > crc/adler=e835c747
> > > > > > 2017-10-23T12:55:31.645568+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0]            flush=1 next_in=0x9b917b398 avail_in=0 
> > > > > > next_out=0x9b917c47a avail_out=4910 total_in=3072 total_out=40319 
> > > > > > crc/adler=f1a70cdc rc=0
> > > > > > 2017-10-23T12:55:31.645879+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0] inflate:   flush=1 next_in=0x9b917b398 avail_in=0 
> > > > > > next_out=0x9b917b3b8 avail_out=9200 total_in=3072 total_out=40319 
> > > > > > crc/adler=f1a70cdc
> > > > > > 2017-10-23T12:55:31.646199+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0]            flush=1 next_in=0x9b917b398 avail_in=0 
> > > > > > next_out=0x9b917b3b8 avail_out=9200 total_in=3072 total_out=40319 
> > > > > > crc/adler=f1a70cdc rc=-5
> > > > > > 2017-10-23T12:55:31.646511+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0] inflate:   flush=1 next_in=0x9b917b198 avail_in=512 
> > > > > > next_out=0x9b917b3b8 avail_out=9200 total_in=3072 total_out=40319 
> > > > > > crc/adler=f1a70cdc
> > > > > > 2017-10-23T12:55:31.646847+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0]            flush=1 next_in=0x9b917b398 avail_in=0 
> > > > > > next_out=0x9b917c272 avail_out=5430 total_in=3584 total_out=44089 
> > > > > > crc/adler=8dba79f4 rc=0
> > > > > > 2017-10-23T12:55:31.647166+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0] inflate:   flush=1 next_in=0x9b917b398 avail_in=0 
> > > > > > next_out=0x9b917b3b8 avail_out=9200 total_in=3584 total_out=44089 
> > > > > > crc/adler=8dba79f4
> > > > > > 2017-10-23T12:55:31.647490+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0]            flush=1 next_in=0x9b917b398 avail_in=0 
> > > > > > next_out=0x9b917b3b8 avail_out=9200 total_in=3584 total_out=44089 
> > > > > > crc/adler=8dba79f4 rc=-5
> > > > > >
> > > > > > Happy to send across the datafile I'm using as an example, please 
> > > > > > let me know if you need any other info.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > Martyn
> > > > > >
> > > > > > Martyn Hunt
> > > > > > Technical Lead, Mainframe
> > > > > > Met Office  FitzRoy Road  Exeter  Devon  EX1 3PB  United Kingdom
> > > > > > Tel: +44 (0)1392 884897
> > > > > > Email:
> > > > > > address@hidden<mailto:address@hidden>
> > > > > > Website: www.metoffice.gov.uk<http://www.metoffice.gov.uk>
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > > =Dennis Heimbigner
> > > > > Unidata
> > > > >
> > > > >
> > > > > Ticket Details
> > > > > ===================
> > > > > Ticket ID: ZIZ-818863
> > > > > Department: Support THREDDS
> > > > > Priority: Normal
> > > > > Status: Open
> > > > > ===================
> > > > > NOTE: All email exchanges with Unidata User Support are recorded in 
> > > > > the Unidata inquiry tracking system and then made publicly available 
> > > > > through the web.  If you do not want to have your interactions made 
> > > > > available in this way, you must let us know in each email you send to 
> > > > > us.
> > > > >
> > > > >
> > > > >
> > > >
> > > > =Dennis Heimbigner
> > > > Unidata
> > > >
> > > >
> > > > Ticket Details
> > > > ===================
> > > > Ticket ID: ZIZ-818863
> > > > Department: Support THREDDS
> > > > Priority: Normal
> > > > Status: Open
> > > > ===================
> > > > NOTE: All email exchanges with Unidata User Support are recorded in the 
> > > > Unidata inquiry tracking system and then made publicly available 
> > > > through the web.  If you do not want to have your interactions made 
> > > > available in this way, you must let us know in each email you send to 
> > > > us.
> > > >
> > > >
> > > >
> > >
> > > =Dennis Heimbigner
> > > Unidata
> > >
> > >
> > > Ticket Details
> > > ===================
> > > Ticket ID: ZIZ-818863
> > > Department: Support THREDDS
> > > Priority: Normal
> > > Status: Closed
> > > ===================
> > > NOTE: All email exchanges with Unidata User Support are recorded in the 
> > > Unidata inquiry tracking system and then made publicly available through 
> > > the web.  If you do not want to have your interactions made available in 
> > > this way, you must let us know in each email you send to us.
> > >
> > >
> > >
> >
> >
> > > Your description does appear to be what is happening, what I really don't 
> > > understand is how the data which has been compressed and chunked 
> > > elsewhere at a large chuinksize can be decompressed in smaller blocks 
> > > than the chunksize.
> > >
> > > As I mentioned before, I suspect it is something in H5tiledLayoutBB.java 
> > > (https://github.com/Unidata/thredds/blob/b731bcb45b6e10b7e6102e97a9ef35e9fef43c93/cdm/src/main/java/ucar/nc2/iosp/hdf5/H5tiledLayoutBB.java),
> > >  which has a debug flag available to be set, which on line 285 would 
> > > start printing out the "bytes in, bytes out" that the HDF5 code think is 
> > > being sent to java.util.zip.  Is there a parameter in the Thredds log 
> > > configuration to set this on, so we can see another piece of the puzzle?
> > >
> > > Thanks
> > >
> > > Martyn
> > >
> > > Martyn Hunt  
> > > Technical Lead, Mainframe
> > > Met Office  FitzRoy Road  Exeter  Devon  EX1 3PB  United Kingdom
> > > Tel: +44 (0)1392 884897  
> > > Email: address@hidden  Website: www.metoffice.gov.uk
> > >
> > > -----Original Message-----
> > > From: Unidata THREDDS Support [mailto:address@hidden]
> > > Sent: 25 October 2017 20:24
> > > To: Hunt, Martyn <address@hidden>
> > > Cc: address@hidden
> > > Subject: [THREDDS #ZIZ-818863]: Thredds inflation of data
> > >
> > > Ok, now I see.
> > > As a rule, HDF5/netcdf-4 decompression operates on a chunk at a time 
> > > (where chunk is the chunking parameters associated with the file). Do you 
> > > know what the chunking parameters are for one of your files? You can see 
> > > it by using the ncdump command in the c-library
> > > ncdump -hs <filename>
> > >
> > > My speculation is this:
> > > 1. You are using the pure-Java HDF5 reader code in Thredds
> > > (this is the default for netcdf-4 files).
> > > 2. The pure java HDF5 reader is either using a different
> > > implementation of zip or is breaking up the incoming chunk
> > > into smaller pieces and decompressing those smaller chunks.
> > > I will undertake to see which case in #2 (if either) is being used.
> > > Any additional insight you have would be appreciated.
> > >
> > >
> > > > Our dataflow is:
> > > >
> > > > 1 - HPC Produces Chunked Compressed NetCDF Data
> > > > 2 - HPC FTPs data to our Thredds Systems
> > > > 3 - Downstream client systems request the data from Thredds from the 
> > > > opendap interface.  They expect to retrieve uncompressed data, and 
> > > > request a sinlge chunk at a time.
> > > > 4 - Thredds reads the chunk, uncompresses it, and then sends it using 
> > > > the opendap protocol.
> > > >
> > > > It is step 4 where I want the hardware to be used, and I can force it 
> > > > to be used by reducing the limit on the smallest decompress that is 
> > > > passed to the hardware, however as the data Thredds is currently 
> > > > passing is 512 bytes rather than the recommended minimum for the 
> > > > hardware of 16384 bytes, performance is awful.
> > > >
> > > > As our client systems only ever request a full chunk at a time (which 
> > > > is always ~7MB of data when uncompressed), the behaviour I was looking 
> > > > for is that Thredds will read a single chunk from disk (between 1MB and 
> > > > 2MB depending on the data and compression level), pass that whole chunk 
> > > > as one to java.util.zip (or at least more than 16384 bytes at a time, 
> > > > the bigger the better), where the hardware will take over and inflate 
> > > > the data.  The hardware and java.util.zip then return the uncompressed 
> > > > data to Thredds/Opendap which then return it to the client system.
> > > >
> > > > Thanks
> > > >
> > > > Martyn
> > > >
> > > > Martyn Hunt
> > > > Technical Lead, Mainframe
> > > > Met Office  FitzRoy Road  Exeter  Devon  EX1 3PB  United Kingdom
> > > > Tel: +44 (0)1392 884897
> > > > Email: address@hidden  Website: www.metoffice.gov.uk
> > > >
> > > > -----Original Message-----
> > > > From: Unidata THREDDS Support
> > > > [mailto:address@hidden]
> > > > Sent: 24 October 2017 20:22
> > > > To: Hunt, Martyn <address@hidden>
> > > > Cc: address@hidden
> > > > Subject: [THREDDS #ZIZ-818863]: Thredds inflation of data
> > > >
> > > > My mistake, I though you were talking about http level chunking and 
> > > > compression.
> > > > I thought this because of the logs you sent me.
> > > > But I am confused about where you want to use your hardware.  Is your 
> > > > plan to use it to decompress the file on the server before transmitting 
> > > > it using the opendap protocol? As a reminder, the file will be 
> > > > decompressed before translating it into the opendap format in order to 
> > > > pass it over the http connection.
> > > > Can you elaborate on how you ideally want that special hardware to be 
> > > > used?
> > > >
> > > >
> > > >
> > > > > That’s not what I understood from the HDF5 doc 
> > > > > (https://support.hdfgroup.org/HDF5/doc/Advanced/Chunking/):
> > > > >
> > > > > "Dataset chunking also enables the use of I/O filters, including 
> > > > > compression. The filters are applied to each chunk individually, and 
> > > > > the entire chunk is processed at once."
> > > > >
> > > > > Note that I am only talking about reading and serving a compressed 
> > > > > netcdf file here, and not about the apache/tomcat compression for 
> > > > > data transfer, i.e. the problem I have is likely somewhere in here:
> > > > >
> > > > > https://github.com/Unidata/thredds/blob/b731bcb45b6e10b7e6102e97a9ef
> > > > > 35
> > > > > e9fef43c93/cdm/src/main/java/ucar/nc2/iosp/hdf5/H5tiledLayoutBB.java
> > > > >
> > > > >
> > > > > Martyn Hunt
> > > > > Technical Lead, Mainframe
> > > > > Met Office  FitzRoy Road  Exeter  Devon  EX1 3PB  United Kingdom
> > > > > Tel: +44 (0)1392 884897
> > > > > Email: address@hidden  Website: www.metoffice.gov.uk
> > > > >
> > > > > -----Original Message-----
> > > > > From: Unidata THREDDS Support
> > > > > [mailto:address@hidden]
> > > > > Sent: 23 October 2017 18:34
> > > > > To: Hunt, Martyn <address@hidden>
> > > > > Cc: address@hidden
> > > > > Subject: [THREDDS #ZIZ-818863]: Thredds inflation of data
> > > > >
> > > > > We use the Apache Httpclient system
> > > > > (http://hc.apache.org/httpcomponents-client-4.5.x/)
> > > > > so the fix will need to be with respect to that.
> > > > >
> > > > > My speculation is that there are two related issues that need 
> > > > > investigation.
> > > > > 1. chunking - 1 large response is chunked into multiple smaller
> > > > > chunks on the server side that are then reassembled on the client 
> > > > > side.
> > > > > 2. A specific compressor -- GzipCompressingEntity, I think -- is
> > > > > used to do the actual compression on the server side.
> > > > >
> > > > > I do not know the order in which these are used by the server side. 
> > > > > It is possible that the compressor operates first and then the 
> > > > > chunker divides that compressed output.
> > > > > It is also possible that the chunker is first and that the compressor 
> > > > > operates on each separate chunk.
> > > > >
> > > > > We will need to investigate to see which is the case (if either) and 
> > > > > then figure out how to change the chunking and/or the compression 
> > > > > parameters.  I suspect that sending very large (1.6m) chunks is a bad 
> > > > > idea. So, I would hope we set things up so that the compression is 
> > > > > first and the compressed data is then chunked.
> > > > > Note that this will also require a corresponding change on the client 
> > > > > side.
> > > > >
> > > > > In any case, this is going to take a while for me to figure it out.
> > > > >
> > > > >
> > > > > =======================
> > > > > > I am currently trying to get a compression accelerator to work with 
> > > > > > Thredds, with the aim to reduce the CPU time that Thredds spends 
> > > > > > decompressing chunks of data.  The compression card plugs straight 
> > > > > > in to IBM Java8 and java.util.zip with no changes needed to the 
> > > > > > application that uses it.  However, the replacement code with 
> > > > > > always revert to software inflation when the size of the data 
> > > > > > passed to java.util.zip is less than 16384 bytes.
> > > > > >
> > > > > > Our data is chunked and compressed, with the data we want to 
> > > > > > retrieve ending up as 70  ~1.6MB chunks (compressed), which should 
> > > > > > inflate to ~7MB each (see the hdls.txt file for more detail).
> > > > > >
> > > > > > When requesting the data, I use the following URL to request a 
> > > > > > single chunk (our applications run through selecting each of the 70 
> > > > > > chunks sequentially one at a time, in this example I'm just picking 
> > > > > > one chunk.
> > > > > >
> > > > > > http://dvtds02-zvopaph2:8080/thredds/dodsC/decoupler/mhtest/origin
> > > > > > al
> > > > > > .n
> > > > > > c.ascii?air_temperature[0][43][0:1:1151][0:1:1535<http://dvtds02-z
> > > > > > vo
> > > > > > pa
> > > > > > ph2:8080/thredds/dodsC/decoupler/mhtest/original.nc.ascii?air_temp
> > > > > > er at ure%5b0%5d%5b43%5d%5b0:1:1151%5d%5b0:1:1535>]
> > > > > >
> > > > > > While there may be a few smaller inflate operations at the 
> > > > > > start/end of the request, I'd expect a single 1.6MB --> 7MB 
> > > > > > inflate request in there.  Instead, in the compression software 
> > > > > > logs, I see thousands of 512-byte inflate requests, which, being 
> > > > > > smaller than the card's 16384-byte minimum, are never passed to 
> > > > > > the compression card.
> > > > > >
> > > > > > e.g.
> > > > > >
> > > > > > 2017-10-23T12:55:31.643894+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0] inflate:   flush=1 next_in=0x9b917b198 avail_in=512 
> > > > > > next_out=0x9b917b3b8 avail_out=9200 total_in=2048 total_out=31827 
> > > > > > crc/adler=1b38b342
> > > > > > 2017-10-23T12:55:31.644229+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0]            flush=1 next_in=0x9b917b398 avail_in=0 
> > > > > > next_out=0x9b917c422 avail_out=4998 total_in=2560 total_out=36029 
> > > > > > crc/adler=e835c747 rc=0
> > > > > > 2017-10-23T12:55:31.644541+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0] inflate:   flush=1 next_in=0x9b917b398 avail_in=0 
> > > > > > next_out=0x9b917b3b8 avail_out=9200 total_in=2560 total_out=36029 
> > > > > > crc/adler=e835c747
> > > > > > 2017-10-23T12:55:31.644909+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0]            flush=1 next_in=0x9b917b398 avail_in=0 
> > > > > > next_out=0x9b917b3b8 avail_out=9200 total_in=2560 total_out=36029 
> > > > > > crc/adler=e835c747 rc=-5
> > > > > > 2017-10-23T12:55:31.645234+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0] inflate:   flush=1 next_in=0x9b917b198 avail_in=512 
> > > > > > next_out=0x9b917b3b8 avail_out=9200 total_in=2560 total_out=36029 
> > > > > > crc/adler=e835c747
> > > > > > 2017-10-23T12:55:31.645568+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0]            flush=1 next_in=0x9b917b398 avail_in=0 
> > > > > > next_out=0x9b917c47a avail_out=4910 total_in=3072 total_out=40319 
> > > > > > crc/adler=f1a70cdc rc=0
> > > > > > 2017-10-23T12:55:31.645879+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0] inflate:   flush=1 next_in=0x9b917b398 avail_in=0 
> > > > > > next_out=0x9b917b3b8 avail_out=9200 total_in=3072 total_out=40319 
> > > > > > crc/adler=f1a70cdc
> > > > > > 2017-10-23T12:55:31.646199+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0]            flush=1 next_in=0x9b917b398 avail_in=0 
> > > > > > next_out=0x9b917b3b8 avail_out=9200 total_in=3072 total_out=40319 
> > > > > > crc/adler=f1a70cdc rc=-5
> > > > > > 2017-10-23T12:55:31.646511+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0] inflate:   flush=1 next_in=0x9b917b198 avail_in=512 
> > > > > > next_out=0x9b917b3b8 avail_out=9200 total_in=3072 total_out=40319 
> > > > > > crc/adler=f1a70cdc
> > > > > > 2017-10-23T12:55:31.646847+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0]            flush=1 next_in=0x9b917b398 avail_in=0 
> > > > > > next_out=0x9b917c272 avail_out=5430 total_in=3584 total_out=44089 
> > > > > > crc/adler=8dba79f4 rc=0
> > > > > > 2017-10-23T12:55:31.647166+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0] inflate:   flush=1 next_in=0x9b917b398 avail_in=0 
> > > > > > next_out=0x9b917b3b8 avail_out=9200 total_in=3584 total_out=44089 
> > > > > > crc/adler=8dba79f4
> > > > > > 2017-10-23T12:55:31.647490+00:00 dvtds02-zvopaph2 server: ### 
> > > > > > [0x3ff183456a0]            flush=1 next_in=0x9b917b398 avail_in=0 
> > > > > > next_out=0x9b917b3b8 avail_out=9200 total_in=3584 total_out=44089 
> > > > > > crc/adler=8dba79f4 rc=-5
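> > > > > >
> > > > > > For what it's worth, the repeating avail_in=512 pattern above is
> > > > > > exactly what java.util.zip produces when the compressed bytes are
> > > > > > fed to the inflater through a 512-byte buffer -- note that the
> > > > > > InflaterInputStream constructor that takes no explicit buffer
> > > > > > size defaults to 512 bytes.  A minimal sketch (the file name is
> > > > > > made up) that reproduces the call pattern:
> > > > > >
> > > > > >   import java.io.FileInputStream;
> > > > > >   import java.io.InputStream;
> > > > > >   import java.util.zip.Inflater;
> > > > > >   import java.util.zip.InflaterInputStream;
> > > > > >
> > > > > >   public class InflateCallSize {
> > > > > >     public static void main(String[] args) throws Exception {
> > > > > >       try (InputStream raw = new FileInputStream("chunk.z")) {
> > > > > >         // 512 is also the default when no size is given: each
> > > > > >         // native inflate call then sees avail_in <= 512, no
> > > > > >         // matter how large the compressed chunk is.
> > > > > >         InflaterInputStream iis =
> > > > > >             new InflaterInputStream(raw, new Inflater(), 512);
> > > > > >         byte[] out = new byte[9200];   // becomes avail_out
> > > > > >         while (iis.read(out) != -1) {
> > > > > >           // drain; every read drives inflate() 512 bytes at a time
> > > > > >         }
> > > > > >       }
> > > > > >     }
> > > > > >   }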
> > > > > >
> > > > > > Happy to send across the datafile I'm using as an example; 
> > > > > > please let me know if you need any other info.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > Martyn
> > > > > >
> > > > > > Martyn Hunt
> > > > > > Technical Lead, Mainframe
> > > > > > Met Office  FitzRoy Road  Exeter  Devon  EX1 3PB  United Kingdom
> > > > > > Tel: +44 (0)1392 884897
> > > > > > Email: address@hidden  Website: www.metoffice.gov.uk
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > > =Dennis Heimbigner
> > > > > Unidata
> > > > >
> > > > >
> > > >
> > > > =Dennis Heimbigner
> > > > Unidata
> > > >
> > > >
> > >
> > > =Dennis Heimbigner
> > > Unidata
> > >
> > >
> >
> > =Dennis Heimbigner
> > Unidata
> >
> >
> 
> =Dennis Heimbigner
> Unidata
> 

=Dennis Heimbigner
Unidata


Ticket Details
===================
Ticket ID: ZIZ-818863
Department: Support THREDDS
Priority: Normal
Status: Open
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.