[LDM #LOD-955053]: LDM and byte-swapping error?

Hi Leonard,

> We use LDM to transfer files from our observatory in Hawaii to here in
> HAO in Boulder.  I think we are running 6.4.6.  The files are binary, in
> our own format.  The files are sent from a Solaris system, arrive via
> LDM running on another Solaris system, and then are copied to Linux
> where they get byte-swapped to make them little-endian.

OK.  The LDM doesn't know or care about the contents of products, so the
transfers would be the same from an upstream big-endian system to a downstream
little-endian system.

> All has been well, until recently, when I started experimenting with
> observing at a higher cadence.  Until this time, we transfer something
> like 3 3MB files every three minutes.  My higher cadence has 5 3MB files
> created a minute, or about 5 times as many files a minute.

In the overall scheme of things, this is not a very high number/volume
of data to move using the LDM.  TIGGE archive centers (NCAR, ECMWF,
and the CMA) are moving upwards of 20 GB/hr using the LDM over long-haul

> The strange thing is that about once a minute, one of the files is
> "corrupted" in a very strange way.  It is a copy of the next earlier
> file, with the right filename, but with the contents of the next earlier
> file byte-swapped.  Really strange, eh?

Strange, yes, but this is likely the clue to what is going on.  I suspect
that the action that processes received products out of the downstream
LDM queue is not fully emptying the PIPE from the 'pqact' process that
feeds it.  When the LDM notices that the PIPE to a "decoder" was closed
before the "decoder" had read the entire product, it tries the action one
more time.  It is the job of the "decoder" to empty the PIPE fully before
exiting.  If my hunch is correct, you should see pairs of error messages
indicating a broken pipe in your LDM log file, ~ldm/logs/ldmd.log.
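
To illustrate what "emptying the PIPE fully" means, here is a minimal
Python sketch of a decoder that reads its standard input to end-of-file
before writing anything.  It is not your decoder or anything shipped
with the LDM; the names drain_product and write_product and the idea of
receiving an output path as an argument are assumptions made for the
example.

```python
import sys


def drain_product(stream, chunk_size=65536):
    """Read a binary stream until EOF and return every byte that was sent."""
    chunks = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty read means EOF: the pipe is fully drained
            break
        chunks.append(chunk)
    return b"".join(chunks)


def write_product(out_path, stream=None):
    """Drain the stream first, then write the product to out_path.

    out_path is a hypothetical destination chosen by whoever invokes the
    decoder.  Returns the number of bytes written.
    """
    if stream is None:
        stream = sys.stdin.buffer  # the read end of the pipe
    data = drain_product(stream)
    with open(out_path, "wb") as out:
        out.write(data)
    return len(data)
```

A script built on this would call write_product(sys.argv[1]) and exit
only after it returns, so the process never closes the pipe with unread
data still in it.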

> For example, if the files file1,
> file2, file3, file4, file5 are created and sent over LDM, then at the
> receiving end, file2 could be a byte-swapped copy of file1, or file4
> could be a byte-swapped copy of file3.  The pattern isn't exact, but
> about 1 out of five.

The LDM does no byte swapping.  The products it sends from point-to-point
are treated as streams of bytes only.
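
If it helps to confirm the symptom independently of the LDM, a check
along the following lines can tell you whether one file really is a
byte-swapped copy of another.  This is only a sketch: the 4-byte word
size is an assumption (your format may use a different width), the
function names are made up for the example, and the inputs are taken to
be the uncompressed file contents.

```python
def swap_bytes(data, word_size=4):
    """Return data with the byte order reversed inside each word."""
    if len(data) % word_size != 0:
        raise ValueError("length is not a multiple of the word size")
    swapped = bytearray(len(data))
    for offset in range(word_size):
        # Byte `offset` of every output word comes from the mirrored
        # position in the corresponding input word.
        swapped[offset::word_size] = data[word_size - 1 - offset::word_size]
    return bytes(swapped)


def is_swapped_copy(candidate, original, word_size=4):
    """True if candidate equals original with every word byte-swapped."""
    if len(candidate) != len(original) or len(original) % word_size != 0:
        return False
    return swap_bytes(original, word_size) == candidate
```

Running is_swapped_copy on the contents of a suspect file and its
predecessor, both before and after your own byte-swapping step, should
show at which stage the duplicate first appears.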

> The files arrive gzipped.  I don't know which part does that.  Perhaps
> it's a configuration of the sender, or that's how LDM works?

The gzipping will have to have been done by the process that is inserting
the products into the upstream LDM's product queue.  The LDM makes no
modification to the products it moves.

> Could LDM do this?
> Or, is funny stuff happening at the networking
> level?  Or?  Or, NFS on this end?

It could be a number of things outside of the LDM:

- inserting process on the upstream node
- bad router somewhere in the data transfer path
- a problem with the process on the downstream side
  that is acting on the product
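
One way to narrow down which of these is responsible is to record a
checksum of each file at every stage and see where it first changes.
The sketch below assumes you can run a small script at each stage and
that you compare like with like (for example, the gzipped bytes on both
ends); the stage names and function names are hypothetical.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def first_divergence(stage_digests):
    """Return the name of the first stage whose digest differs from the
    stage before it, or None if all stages agree.

    stage_digests is an ordered list of (stage_name, hex_digest) pairs.
    """
    for (_, previous), (name, current) in zip(stage_digests, stage_digests[1:]):
        if current != previous:
            return name
    return None
```

For example, digests taken just before insertion upstream, just after
the file is written downstream, and just after the copy to Linux would
point at the first hop where the contents stopped matching.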

> Thanks for any ideas you have about this.

The first thing I would do is take a hard look at entries
in your LDM log file, ~ldm/logs/ldmd.log.  A problem running
the "decode" action should show up there (as per my musings
above).  If all looks OK there (I am betting that it won't),
I would investigate the process inserting the products on the
upstream (sending) machine and the process that is doing the
byte flipping on the downstream (receiving) machine.
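
If the log is large, a short script can pull out the lines of interest.
This sketch simply searches for the phrase "broken pipe" without regard
to case; the exact wording of the LDM's message is an assumption here,
so adjust the search string to whatever your log actually contains.

```python
import os


def find_matching_lines(lines, needle="broken pipe"):
    """Return (line_number, text) pairs for lines containing needle,
    compared case-insensitively.  Line numbers start at 1."""
    needle = needle.lower()
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if needle in line.lower()
    ]


def scan_log(path="~ldm/logs/ldmd.log", needle="broken pipe"):
    """Open the log file and return its matching lines."""
    with open(os.path.expanduser(path), errors="replace") as handle:
        return find_matching_lines(handle, needle)
```

Calling scan_log() on the downstream host and comparing the line
timestamps with the times the duplicated files arrived would show
whether the two line up.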

Please let us know if the above helps you zero in on the problem.
We can review your LDM log file and point out any indicated
problems if you send it along to us.


Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
Unidata HomePage                       http://www.unidata.ucar.edu

Ticket Details
Ticket ID: LOD-955053
Department: Support IDD
Priority: Normal
Status: Closed