
20040422: McIDAS gzip ADDE transfers



>From:  Dave Parker <address@hidden>
>Organization:  Space Science and Engineering Center - Madison WI
>Keywords:  200404221643.i3MGhICT022502 McIDAS ADDE gzip

Hi Dave,

>I have a gzip transfer question for you.  We have found that using gzip
>does not necessarily decrease file transfer time and can incur
>significant overhead on the serving machine, especially if the source
>data is very large (several MB).

Hmm...  I have not done any comparative timing tests between 'compress'
and 'gzip' compressed transfers.  I will run some to see if there is a
measurable difference between the two when transferring a 22 MB file.
...  Here are the results for 5 attempts using each type of
compression:

MCCOMPRESS=TRUE (compress)

% time imgcopy.k GINIWEST/GW1KVIS MYDATA/IMAGES.1234 SIZE=ALL
3.0u 1.0s 0:15 25% 0+0k 0+0io 0pf+0w
3.0u 1.0s 0:16 24% 0+0k 0+0io 0pf+0w
3.0u 1.0s 0:16 24% 0+0k 0+0io 0pf+0w
3.0u 1.0s 0:19 20% 0+0k 0+0io 0pf+0w
3.0u 1.0s 0:15 25% 0+0k 0+0io 0pf+0w

MCCOMPRESS=GZIP (gzip)

% time imgcopy.k GINIWEST/GW1KVIS MYDATA/IMAGES.1234 SIZE=ALL
3.0u 2.0s 0:31 16% 0+0k 0+0io 0pf+0w
2.0u 1.0s 0:31 9% 0+0k 0+0io 0pf+0w
3.0u 2.0s 0:21 23% 0+0k 0+0io 0pf+0w
3.0u 2.0s 0:22 22% 0+0k 0+0io 0pf+0w
3.0u 2.0s 0:33 15% 0+0k 0+0io 0pf+0w

While the numbers have to be taken with a grain of salt, I believe that
they show that gzip transfers are slower on average than compress
transfers for the same data.  This finding is unexpected...
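
As a rough check on "slower on average", the elapsed times above can be
averaged.  This is a minimal Python sketch, not part of the original
tests; the variable and function names are made up for illustration, and
the numbers are the wall-clock seconds copied from the 'time' output.

```python
from statistics import mean

# Elapsed wall-clock seconds copied from the `time` output listed above.
elapsed = {
    "compress": [15, 16, 16, 19, 15],
    "gzip": [31, 31, 21, 22, 33],
}


def summarize(samples):
    """Return (mean, min, max) of the elapsed times for each method."""
    return {name: (mean(vals), min(vals), max(vals))
            for name, vals in samples.items()}


summary = summarize(elapsed)
for name, (avg, lo, hi) in summary.items():
    print(f"{name:8s} mean={avg:.1f}s min={lo}s max={hi}s")
```

With only five runs each, the spread (min and max) matters as much as
the mean, which is why the sketch reports all three.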

I guess that the other thing we need to consider when judging the gzip
against compress is how they differ in being able to compress the
data.  Here is the comparison for the file just transferred:

   7978027 -rw-rw-r--   1 ustaff   22528848 Apr 22 12:38 AREA1234
   7978082 -rw-rw-r--   1 ustaff   16051301 Apr 22 12:38 AREA1234.Z
   7978082 -rw-rw-r--   1 ustaff   15915943 Apr 22 12:38 AREA1234.gz gzip
   7978082 -rw-rw-r--   1 ustaff   15654197 Apr 22 12:38 AREA1234.gz gzip -1

Interestingly, adding the '-1' flag results in a smaller compressed
file than gzip with no options!  After seeing this, I was curious if
the '-1' flag actually did speed up compression:

% time gzip AREA1234
13.0u 0.0s 0:14 90% 0+0k 0+0io 0pf+0w
% time gzip -1 AREA1234
8.0u 0.0s 0:09 88% 0+0k 0+0io 0pf+0w

Yes, it does speed the compression time, at least for an entire file.

Given the above, I think that specifying the '-1' flag for gzip _is_
the correct thing to do both from a speed and size point of view.
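
The size comparison can be expressed as compression ratios.  The sketch
below is illustrative only; it uses the byte counts from the listing
above, and the names in it are hypothetical.

```python
# Byte counts copied from the directory listing above.
ORIGINAL_SIZE = 22528848
compressed_sizes = {
    "compress (.Z)": 16051301,
    "gzip": 15915943,
    "gzip -1": 15654197,
}


def ratios(original, sizes):
    """Return compressed size as a fraction of the original size."""
    return {name: size / original for name, size in sizes.items()}


size_ratios = ratios(ORIGINAL_SIZE, compressed_sizes)
for name, ratio in size_ratios.items():
    print(f"{name:14s} {ratio:.1%} of original")
```

All three results are within about two percentage points of each other
for this file, so the difference in CPU time is likely the more
important factor here.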

>Have you done any throughput or load testing with gzip,

I hadn't, no.

>and do you know of any optimizations that might be
>done to decrease the load on the server?  We are currently using the
>"-1" flag to specify the fastest compression.

The help I get using 'gzip -help' reads the same for Solaris SPARC
and Fedora Core Linux, and they both show that the only option
that controls compression speed is '-1'.

>Thanks, and enjoy the snow...

Interestingly, it was snowing heavily at my house in the foothills,
but it is just raining here in Boulder.

Cheers,

Tom

>From address@hidden  Thu Apr 22 13:30:29 2004

Tom, thanks for the reply... we have been doing some imgcopy transfer
testing and are seeing some weird results.  Becky has a table that she
will forward you with our results.  The times are fairly consistent
between tests.

The ones that most confuse me are in the Mica (OS X) column, where gzip
is slooow for an image transfer, but the fastest for a grid transfer.
In most cases, it appears that compress is at least as fast as gzip.  Of
course, this is on a local network where bandwidth doesn't play as much
of a role.  Still, I wouldn't expect to see the outlying values...

DaveP

>From address@hidden  Thu Apr 22 13:30:30 2004

Tom - 

Here is the email with the original gzip test results that Dave wanted
me to send to you.

- Becky

-------- Original Message --------

     Subject: Re: Testing tracking with GZIP compression as default
     Date: Wed, 21 Apr 2004 22:17:12 -0500
     From: Becky Schaffer
     To: mug.team

I just ran some simple time tests on Citation, pointing at four
different servers with the three different choices for compression. I
also did what Rick did, and did the IMGDISP of MAG=-20 as the IMGCOPY
was occurring. Here are the times that I noted, and the number of
"chunks" of data that I counted as IMGDISP displayed them. I just
counted them as they came up on the screen, so the numbers probably
aren't as precise as some other method that I couldn't easily think of
at the time. The IMGCOPY resulted in an 84M AREA file, the PTCOPY was
a 13M MD file, and the GRDCOPY was a 58M GRID file.

- Becky
      Server -->                   Jeep         Mica         Balrog       Brutus

      IMGCOPY gzip time (chunks)   47.3 (5)     135.0 (16)   39.6 (9)     65.0 (8)
      IMGCOPY none                 35.5 (7)     35.3 (15)    49.8 (7)     137.9 (17)
      IMGCOPY compress             36.8 (7)     36.9 (8)     51.5 (16)    48.3 (8)

      PTCOPY gzip                  5.5          9.7          10.6         17.8
      PTCOPY none                  11.0         12.7         12.0         16.6
      PTCOPY compress              6.4          8.2          6.8          16.2

      GRDCOPY gzip                 61.7         87.1         120.1        86.5
      GRDCOPY none                 66.6         166.2        86.9         80.0
      GRDCOPY compress             67.5         111.9        98.6         78.7
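
To make the table easier to scan, here is a small Python sketch (added
for illustration, not part of the original test) that picks the fastest
compression setting for each command and server.  The times are copied
from the results above; the dictionary layout and function name are
assumptions.

```python
SERVERS = ["Jeep", "Mica", "Balrog", "Brutus"]

# Times as reported in the results above, in server order.
TIMES = {
    "IMGCOPY": {
        "gzip": [47.3, 135.0, 39.6, 65.0],
        "none": [35.5, 35.3, 49.8, 137.9],
        "compress": [36.8, 36.9, 51.5, 48.3],
    },
    "PTCOPY": {
        "gzip": [5.5, 9.7, 10.6, 17.8],
        "none": [11.0, 12.7, 12.0, 16.6],
        "compress": [6.4, 8.2, 6.8, 16.2],
    },
    "GRDCOPY": {
        "gzip": [61.7, 87.1, 120.1, 86.5],
        "none": [66.6, 166.2, 86.9, 80.0],
        "compress": [67.5, 111.9, 98.6, 78.7],
    },
}


def fastest(times, servers):
    """Map (command, server) to the setting with the lowest time."""
    result = {}
    for command, by_method in times.items():
        for i, server in enumerate(servers):
            result[(command, server)] = min(
                by_method, key=lambda method: by_method[method][i]
            )
    return result


winners = fastest(TIMES, SERVERS)
for (command, server), method in winners.items():
    print(f"{command:8s} {server:7s} fastest: {method}")
```

No single setting wins everywhere, which matches the mixed picture in
the table.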

Here's a copy of my script:

OS "imgdel.k A/A.4001 4003

OS "time imgcopy.k ABOM/AREAS.926 A/A.4001 SIZE=ALL MCCOMPRESS=GZIP
OS "time imgcopy.k ABOM/AREAS.926 A/A.4002 SIZE=ALL MCCOMPRESS=NONE
OS "time imgcopy.k ABOM/AREAS.926 A/A.4003 SIZE=ALL MCCOMPRESS=COMPRESS

OS "ls -la AREA400[1-3]

OS "time ptcopy.k ABOM/MDS.1 M/M.4001 MCCOMPRESS=GZIP DEL=YES
OS "time ptcopy.k ABOM/MDS.1 M/M.4002 MCCOMPRESS=NONE DEL=YES
OS "time ptcopy.k ABOM/MDS.1 M/M.4003 MCCOMPRESS=COMPRESS DEL=YES

OS "ls -la MDXX400[1-3]

OS "time grdcopy.k ABOM/GRIDS G/G.4001 NUM=1000 MCCOMPRESS=GZIP DEL=YES
OS "time grdcopy.k ABOM/GRIDS G/G.4002 NUM=1000 MCCOMPRESS=NONE DEL=YES
OS "time grdcopy.k ABOM/GRIDS G/G.4003 NUM=1000 MCCOMPRESS=COMPRESS DEL=YES

OS "ls -la GRID400[1-3]

Dave Santek wrote:

This is probably also dependent on the operating system running the
server.  From my recollection, this blocking seemed more noticeable
with servers on Linux than other OSs.

dave

Rick Kohrs wrote:

Users will now see a tracking difference when they start using gzip
compression compared to no compression.  With no compression, the data
would gradually flow in; with compression, the data is received in
large chunks.

For my test, I started an IMGCOPY of a 200 MB file and, at the same
time, an IMGDISP of the destination file blown down by -20 (480x640
frame).  I saw 3 distinct chunks of data displayed; previously I saw
maybe 50.
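
One possible reason for fewer, larger chunks is that a deflate
compressor buffers its input and emits output only occasionally.  The
Python sketch below shows that general behavior with the standard zlib
module.  It is an illustration under that assumption only; nothing in
this thread says how the ADDE server actually invokes gzip, and the
function name and test data are made up.

```python
import zlib


def count_nonempty_outputs(pieces, sync_flush):
    """Compress pieces one by one; count how many writes yield output."""
    comp = zlib.compressobj(1)  # level 1, analogous to 'gzip -1'
    emitted = 0
    out = []
    for piece in pieces:
        data = comp.compress(piece)
        if sync_flush:
            # Force the compressor to emit everything it has so far.
            data += comp.flush(zlib.Z_SYNC_FLUSH)
        if data:
            emitted += 1
        out.append(data)
    out.append(comp.flush())  # finish the stream
    return emitted, b"".join(out)


# Fifty 1 KB pieces of highly compressible, made-up data.
pieces = [bytes([i % 7]) * 1024 for i in range(50)]

buffered, stream_a = count_nonempty_outputs(pieces, sync_flush=False)
flushed, stream_b = count_nonempty_outputs(pieces, sync_flush=True)
print("writes producing output without flush:", buffered)
print("writes producing output with sync flush:", flushed)
```

Flushing after every write makes data appear steadily at the cost of a
slightly larger stream, so it is a trade-off rather than a free fix.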

This is definitely a difference from the DFASAP days of old.

Rick