
[THREDDS #ZIZ-818863]: Thredds inflation of data



My mistake, I thought you were talking about HTTP-level chunking and compression.
I thought this because of the logs you sent me.
But I am confused about where you want to use your hardware.  Is your plan to
use it to decompress the file on the server before transmitting it using the
OPeNDAP protocol?  As a reminder, the file will be decompressed before being
translated into the OPeNDAP format in order to pass it over the HTTP connection.
Can you elaborate on how you would ideally like that special hardware to be used?
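
For reference, per-chunk decompression of the kind done in H5tiledLayoutBB ultimately drives java.util.zip.Inflater, which is where an accelerator plugged into java.util.zip would hook in.  The sketch below (not the actual THREDDS code; class name, sizes, and data are synthetic) shows the one-large-call-per-chunk inflate pattern a hardware offload would want to see, as opposed to many small inflate calls:

```java
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class ChunkInflateSketch {

    // Inflate one compressed chunk in a single call. For HDF5 chunks the
    // uncompressed size is known in advance, so the output buffer can be
    // allocated exactly once (hypothetical helper; not from the THREDDS sources).
    static byte[] inflateChunk(byte[] compressed, int uncompressedSize)
            throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] out = new byte[uncompressedSize];
        int n = inflater.inflate(out); // one large inflate, not many 512-byte ones
        inflater.end();
        if (n != uncompressedSize) {
            throw new IllegalStateException("short inflate: " + n + " of " + uncompressedSize);
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Build a synthetic, highly compressible "chunk" (64 KB, not the real ~7 MB).
        byte[] raw = new byte[1 << 16];
        for (int i = 0; i < raw.length; i++) raw[i] = (byte) (i % 17);

        // Compress it with plain zlib deflate, as HDF5's deflate filter does.
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        byte[] buf = new byte[raw.length];
        int clen = deflater.deflate(buf);
        deflater.end();
        byte[] compressed = Arrays.copyOf(buf, clen);

        byte[] back = inflateChunk(compressed, raw.length);
        System.out.println("compressed=" + clen + " bytes, inflated=" + back.length + " bytes");
    }
}
```

The key point is that the whole compressed chunk is handed to a single Inflater.inflate() call with an output buffer sized to the known uncompressed chunk size, so the underlying zlib implementation sees one large request rather than a stream of 512-byte ones.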



> That's not what I understood from the HDF5 doc 
> (https://support.hdfgroup.org/HDF5/doc/Advanced/Chunking/):
> 
> "Dataset chunking also enables the use of I/O filters, including compression. 
> The filters are applied to each chunk individually, and the entire chunk is 
> processed at once."
> 
> Note that I am only talking about reading and serving a compressed netcdf 
> file here, and not about the apache/tomcat compression for data transfer, 
> i.e. the problem I have is likely somewhere in here:
> 
> https://github.com/Unidata/thredds/blob/b731bcb45b6e10b7e6102e97a9ef35e9fef43c93/cdm/src/main/java/ucar/nc2/iosp/hdf5/H5tiledLayoutBB.java
> 
> 
> Martyn Hunt
> Technical Lead, Mainframe
> Met Office  FitzRoy Road  Exeter  Devon  EX1 3PB  United Kingdom
> Tel: +44 (0)1392 884897
> Email: address@hidden  Website: www.metoffice.gov.uk
> 
> -----Original Message-----
> From: Unidata THREDDS Support [mailto:address@hidden
> Sent: 23 October 2017 18:34
> To: Hunt, Martyn <address@hidden>
> Cc: address@hidden
> Subject: [THREDDS #ZIZ-818863]: Thredds inflation of data
> 
> We use the Apache Httpclient system 
> (http://hc.apache.org/httpcomponents-client-4.5.x/)
> so the fix will need to be with respect to that.
> 
> My speculation is that there are two related issues that need investigation.
> 1. chunking - 1 large response is chunked into multiple smaller chunks on the
> server side that are then reassembled on the client side.
> 2. A specific compressor -- GzipCompressingEntity, I think -- is used to do
> the actual compression on the server side.
> 
> I do not know the order in which these are used by the server side. It is 
> possible that the compressor operates first and then the chunker divides that 
> compressed output.
> It is also possible that the chunker is first and that the compressor 
> operates on each separate chunk.
> 
> We will need to investigate to see which is the case (if either) and then 
> figure out how to change the chunking and/or the compression parameters.  I 
> suspect that sending very large (1.6 MB) chunks is a bad idea. So, I would hope 
> we set things up so that the compression is first and the compressed data is 
> then chunked.
> Note that this will also require a corresponding change on the client side.
> 
> In any case, this is going to take a while for me to figure it out.
> 
> 
> =======================
> > I am currently trying to get a compression accelerator to work with 
> > Thredds, with the aim to reduce the CPU time that Thredds spends 
> > decompressing chunks of data.  The compression card plugs straight in to 
> > IBM Java8 and java.util.zip with no changes needed to the application that 
> > uses it.  However, the replacement code will always revert to software 
> > inflation when the size of the data passed to java.util.zip is less than 
> > 16384 bytes.
> >
> > Our data is chunked and compressed, with the data we want to retrieve 
> > ending up as 70  ~1.6MB chunks (compressed), which should inflate to ~7MB 
> > each (see the hdls.txt file for more detail).
> >
> > When requesting the data, I use the following URL to request a single chunk 
> > (our applications run through selecting each of the 70 chunks sequentially, 
> > one at a time; in this example I'm just picking one chunk):
> >
> > http://dvtds02-zvopaph2:8080/thredds/dodsC/decoupler/mhtest/original.nc.ascii?air_temperature[0][43][0:1:1151][0:1:1535]
> >
> > While I expect there may be a few smaller inflate operations at the 
> > start/end of the request, I'd expect that there would be a single 1.6MB --> 
> > 7MB inflate request in there.  Instead of this, in the compression software 
> > logs, I see thousands of 512-byte inflate requests, which, as they are smaller 
> > than the minimum 16384-byte limit the compression card has, never get passed 
> > to the compression card.
> >
> > e.g.
> >
> > 2017-10-23T12:55:31.643894+00:00 dvtds02-zvopaph2 server: ### 
> > [0x3ff183456a0] inflate:   flush=1 next_in=0x9b917b198 avail_in=512 
> > next_out=0x9b917b3b8 avail_out=9200 total_in=2048 total_out=31827 
> > crc/adler=1b38b342
> > 2017-10-23T12:55:31.644229+00:00 dvtds02-zvopaph2 server: ### 
> > [0x3ff183456a0]            flush=1 next_in=0x9b917b398 avail_in=0 
> > next_out=0x9b917c422 avail_out=4998 total_in=2560 total_out=36029 
> > crc/adler=e835c747 rc=0
> > 2017-10-23T12:55:31.644541+00:00 dvtds02-zvopaph2 server: ### 
> > [0x3ff183456a0] inflate:   flush=1 next_in=0x9b917b398 avail_in=0 
> > next_out=0x9b917b3b8 avail_out=9200 total_in=2560 total_out=36029 
> > crc/adler=e835c747
> > 2017-10-23T12:55:31.644909+00:00 dvtds02-zvopaph2 server: ### 
> > [0x3ff183456a0]            flush=1 next_in=0x9b917b398 avail_in=0 
> > next_out=0x9b917b3b8 avail_out=9200 total_in=2560 total_out=36029 
> > crc/adler=e835c747 rc=-5
> > 2017-10-23T12:55:31.645234+00:00 dvtds02-zvopaph2 server: ### 
> > [0x3ff183456a0] inflate:   flush=1 next_in=0x9b917b198 avail_in=512 
> > next_out=0x9b917b3b8 avail_out=9200 total_in=2560 total_out=36029 
> > crc/adler=e835c747
> > 2017-10-23T12:55:31.645568+00:00 dvtds02-zvopaph2 server: ### 
> > [0x3ff183456a0]            flush=1 next_in=0x9b917b398 avail_in=0 
> > next_out=0x9b917c47a avail_out=4910 total_in=3072 total_out=40319 
> > crc/adler=f1a70cdc rc=0
> > 2017-10-23T12:55:31.645879+00:00 dvtds02-zvopaph2 server: ### 
> > [0x3ff183456a0] inflate:   flush=1 next_in=0x9b917b398 avail_in=0 
> > next_out=0x9b917b3b8 avail_out=9200 total_in=3072 total_out=40319 
> > crc/adler=f1a70cdc
> > 2017-10-23T12:55:31.646199+00:00 dvtds02-zvopaph2 server: ### 
> > [0x3ff183456a0]            flush=1 next_in=0x9b917b398 avail_in=0 
> > next_out=0x9b917b3b8 avail_out=9200 total_in=3072 total_out=40319 
> > crc/adler=f1a70cdc rc=-5
> > 2017-10-23T12:55:31.646511+00:00 dvtds02-zvopaph2 server: ### 
> > [0x3ff183456a0] inflate:   flush=1 next_in=0x9b917b198 avail_in=512 
> > next_out=0x9b917b3b8 avail_out=9200 total_in=3072 total_out=40319 
> > crc/adler=f1a70cdc
> > 2017-10-23T12:55:31.646847+00:00 dvtds02-zvopaph2 server: ### 
> > [0x3ff183456a0]            flush=1 next_in=0x9b917b398 avail_in=0 
> > next_out=0x9b917c272 avail_out=5430 total_in=3584 total_out=44089 
> > crc/adler=8dba79f4 rc=0
> > 2017-10-23T12:55:31.647166+00:00 dvtds02-zvopaph2 server: ### 
> > [0x3ff183456a0] inflate:   flush=1 next_in=0x9b917b398 avail_in=0 
> > next_out=0x9b917b3b8 avail_out=9200 total_in=3584 total_out=44089 
> > crc/adler=8dba79f4
> > 2017-10-23T12:55:31.647490+00:00 dvtds02-zvopaph2 server: ### 
> > [0x3ff183456a0]            flush=1 next_in=0x9b917b398 avail_in=0 
> > next_out=0x9b917b3b8 avail_out=9200 total_in=3584 total_out=44089 
> > crc/adler=8dba79f4 rc=-5
> >
> > Happy to send across the datafile I'm using as an example, please let me 
> > know if you need any other info.
> >
> > Thanks
> >
> > Martyn
> >
> > Martyn Hunt
> > Technical Lead, Mainframe
> > Met Office  FitzRoy Road  Exeter  Devon  EX1 3PB  United Kingdom
> > Tel: +44 (0)1392 884897
> > Email: address@hidden
> > Website: www.metoffice.gov.uk
> >
> >
> >
> 
> =Dennis Heimbigner
> Unidata
> 
> 
> Ticket Details
> ===================
> Ticket ID: ZIZ-818863
> Department: Support THREDDS
> Priority: Normal
> Status: Open
> ===================
> NOTE: All email exchanges with Unidata User Support are recorded in the 
> Unidata inquiry tracking system and then made publicly available through the 
> web.  If you do not want to have your interactions made available in this 
> way, you must let us know in each email you send to us.
> 
> 
> 

=Dennis Heimbigner
  Unidata


Ticket Details
===================
Ticket ID: ZIZ-818863
Department: Support THREDDS
Priority: Normal
Status: Open
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.


