
[TIGGE #YCA-363338]: Re: transfer tests between ECMWF and CMA



Baudouin,

> We have tried to substitute ftp for ldm to send the Chinese model output
> from CMA to ECMWF. We are using the same machines (tigge-ldm.ecmwf.int
> and tigge-ldm.cma.gov.cn). We were able to transfer two 14GB files (each
> containing a full model cycle) simultaneously, each at 1MB/s. The
> transfers were initiated by ECMWF (ftp get).

Is "14 GB" "14 gigabytes" or "14 gigabits"?

> This seems to prove that the network bandwidth is sufficient between
> ECMWF and CMA. With LDM we cannot transfer more than 100MB/hour (28
> KB/s), with 20 parallel transfers.

OK.  "MB" here means "megabits".

> Because we can transfer data
> efficiently between ECMWF and NCAR, I think that there must be something
> wrong in the installation of LDM at CMA. Could it be possible that
> something is waiting for a system call which times out(*), before
> sending each field (e.g. name resolution, or binding the wrong
> interface,...)? Alternatively, if each of the 20 ldmd processes dies and
> is re-forked after each field is sent, we would have to re-establish a
> TCP connection each time, which is very costly. A last idea would be
> that somewhere a firewall/router treats the LDM traffic differently from
> the FTP traffic.
> 
> Cheers,
> Baudouin
> 
> (*) a bit of math shows (I may be wrong) that at 100MB/hour, 20 streams
> must each transport 140KB in 98 seconds, which is why I am tempted to
> think that there is a 90s timeout somewhere.
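
The arithmetic in that footnote can be reproduced with a short sketch.
It assumes "MB" and "KB" are bytes, and that 140KB is the size of one
field (the figure quoted above); the function name is only
illustrative.  With binary units (1 MB = 1024 * 1024 bytes) the result
is about 98 seconds; with decimal units it is about 101 seconds.

```python
# Assumption: MB and KB denote bytes, not bits.
MIB = 1024 * 1024
KIB = 1024


def seconds_per_field(total_bytes_per_hour, streams, field_bytes):
    """Time for one stream to send one field at the aggregate rate given."""
    per_stream_rate = total_bytes_per_hour / 3600 / streams  # bytes/second
    return field_bytes / per_stream_rate


# 100 MB/hour shared by 20 streams, 140 KB per field.
binary = seconds_per_field(100 * MIB, 20, 140 * KIB)   # binary units
decimal = seconds_per_field(100e6, 20, 140e3)          # decimal units

print(round(binary, 1), round(decimal, 1))  # 98.4 100.8
```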

The LDM system has explicit 30 and 60 second timeouts, but no 90 second
timeout.  It could be, however, that converting a hostname to an IP
address times out after 90 seconds, depending on the hostname resolution
environment on the host in question.  If so, then there should be WARN-ing
messages in the LDM log file on the host that attempts the
conversion (both upstream and downstream LDM-s will attempt this
conversion).  Both ECMWF and CMA should check their log files for such
warnings.  Search for the string "Resolving".
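
As a rough sketch of both checks, something like the following could
be run on each host.  The function names are illustrative, the log
file path must be supplied by whoever runs it (it depends on the local
LDM installation), and the script assumes nothing about the log format
beyond the string "Resolving" appearing in the relevant lines.

```python
import time


def find_resolving_lines(log_path, needle="Resolving"):
    """Return (line_number, text) for every log line containing needle."""
    matches = []
    with open(log_path, "r", errors="replace") as log:
        for number, line in enumerate(log, start=1):
            if needle in line:
                matches.append((number, line.rstrip("\n")))
    return matches


def time_call(func, *args):
    """Return (elapsed_seconds, result); a lookup failure is returned, not raised."""
    start = time.monotonic()
    try:
        result = func(*args)
    except OSError as exc:  # socket.gaierror and socket.herror are OSErrors
        result = exc
    return time.monotonic() - start, result
```

For example, time_call(socket.gethostbyname, "tigge-ldm.cma.gov.cn")
(after "import socket") reports how long a forward lookup takes on that
host, and socket.gethostbyaddr can be timed the same way for the
reverse lookup.  An elapsed time near 90 seconds would support the
timeout theory.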

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: YCA-363338
Department: Support IDD TIGGE
Priority: Normal
Status: On Hold