[TIGGE #BGR-338705]: A problem when perform LDM tests

Subject: [TIGGE #BGR-338705]: A problem when perform LDM tests
Date: Wed, 30 May 2007 13:58:29 -0600
Hi Yangxin,

re:
> I do have configured the downstream LDM to process the data via "pqact", the
> contents of pqact.conf is similar to the real CMA LDM Server.

OK.  This has a bearing on your comment that some products were missed by your
receiving LDM.

re:
> The pqact is part of LDM, I understand the TIGGE data transfer in this way:
> Multiple REQUESTS in the downstream LDM result in upstream's LDM Server
> invoking multiple "pqinsert" processes to work simultaneously, however,
> downstream's pqact only have one process to save the data product from PQ
> to files in the filesystem. So I guess the load of upstream LDM server
> should be higher than that of the downstream.

> Do I understand correctly?

Almost, but not quite...

The insertion of products into an upstream's LDM queue is independent of
REQUESTs from downstream LDMs.  Sites providing data to others (i.e., acting
as an upstream) insert data products into their LDM queues whenever data is
available to send to downstreams.  Sending the data to a downstream LDM is
only done when the downstream LDM is running and has issued one or more
REQUESTs for data from the upstream.
 
> I'm not sure what "VM session cannot memory map a 512 MB queue" really means.
> Are you saying that the virtual machine works differently for the memory
> management than a real machine?

No.  What I am trying to say is that the size of an LDM queue that can be
memory mapped will be less than the total amount of memory available on the
machine or in the VM session.  The virtual machine should work in the same
way as a non-virtual machine.  The idea I was trying to express is that one
should make a queue that is smaller than physical memory so that the entire
queue can be memory mapped AND with enough left over so that there is very
little or no swapping to disk.  When the operating system is forced into
swapping to disk, the performance of the LDM decreases rapidly.
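
The sizing rule above comes down to simple arithmetic.  The following is
only a sketch of that check: the function name and the 1 GiB of headroom for
the operating system and other processes are assumptions made for
illustration, not LDM recommendations.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def queue_fits_in_memory(queue_bytes, physical_bytes, headroom_bytes):
    """Return True if the queue plus the assumed headroom fits in RAM.

    headroom_bytes is a guess at what the operating system and other
    processes need; it is an assumption, not a value taken from the LDM.
    """
    return queue_bytes + headroom_bytes <= physical_bytes


if __name__ == "__main__":
    # A 3 GiB queue on a 4 GiB machine with 1 GiB of assumed headroom.
    print(queue_fits_in_memory(3 * GIB, 4 * GIB, 1 * GIB))
    # A queue as large as physical memory leaves nothing over.
    print(queue_fits_in_memory(4 * GIB, 4 * GIB, 1 * GIB))
```

A queue that fails this check can still be created, but the operating
system is then more likely to be pushed into swapping.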

re: Are you saying that some data did _not_ make it to the downstream?

> Here, I mean only very few products have been lost during the tests. For
> example, in one scenario of a series of tests, I mv part of one cycle of
> data "2007051200" into the outgoing directory in upstream, which contains
> 861 files (I calculated by "ls -l 2007051200|grep z_tigge|wc"). When the
> test is over, I found 858 files reached the downstream under the
> "2007051200" directory where I calculated via
> "ls -lR 2007051200|grep z_tigge|wc". I know the manifest and done file did
> not reach (still don't know why), so there is one product file failed to be
> received at the downstream.
> 
> Other scenario is almost the case.

Thank you for the very clear explanation.
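
Counting files shows how many products are missing, but not which ones.  A
name-by-name comparison could identify them.  The sketch below assumes that
the files keep the same names on both machines (this depends on the pqact
actions in use) and that product file names contain "z_tigge", as in the
"grep" commands quoted above.  The function names are made up for
illustration.

```python
from pathlib import Path


def tigge_names(root):
    """Collect the names of all files under root that contain 'z_tigge'."""
    return {
        p.name
        for p in Path(root).rglob("*")
        if p.is_file() and "z_tigge" in p.name
    }


def missing_downstream(upstream_root, downstream_root):
    """Return the sorted names present upstream but absent downstream."""
    return sorted(tigge_names(upstream_root) - tigge_names(downstream_root))
```

For example, missing_downstream("2007051200", "/path/to/copy/2007051200")
would list the upstream files with no counterpart in a copy of the
downstream directory tree; the second path here is a placeholder.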

Your observation that the number of products processed to disk through a
pqact action on the downstream machine was less than the number of products
sent by the upstream machine does not mean that products were not received
by the downstream.  Instead, it is most likely that the missing products
were not extracted out of the downstream LDM queue before they got
overwritten by new data received from the upstream LDM.  This kind of
situation is best addressed by running multiple pqact processes on the
downstream machine.  The processing load for each pqact instance on the
downstream will be less _IF_ the downstream is configured to run each pqact
on a mutually exclusive subset of the data being received.  I believe that
Manuel is running this kind of a setup at ECMWF.
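
The overwriting effect described above can be illustrated with a toy model.
This is only a sketch under simplifying assumptions and is not how the LDM
product queue is implemented: the queue is modelled as a fixed number of
slots holding the products not yet processed (the real queue is sized in
bytes), and products arrive and are processed at fixed rates per tick.  The
function name and all of the numbers except the 861-product count are made
up for illustration.

```python
from collections import deque


def simulate(n_products, queue_slots, arrivals_per_tick, processed_per_tick):
    """Toy model: return (processed, lost) counts for one run.

    A product is 'lost' when it is overwritten by a newer arrival before
    the consumer (standing in for pqact) has processed it.
    """
    backlog = deque()  # products received but not yet processed
    processed = 0
    lost = 0
    next_id = 0
    while next_id < n_products or backlog:
        # New products arrive; a full queue overwrites its oldest entry.
        for _ in range(arrivals_per_tick):
            if next_id >= n_products:
                break
            if len(backlog) == queue_slots:
                backlog.popleft()
                lost += 1
            backlog.append(next_id)
            next_id += 1
        # The consumer handles a limited number of products per tick.
        for _ in range(processed_per_tick):
            if not backlog:
                break
            backlog.popleft()
            processed += 1
    return processed, lost


if __name__ == "__main__":
    # One slow consumer versus twice the processing rate.
    print(simulate(861, 100, 10, 5))
    print(simulate(861, 100, 10, 10))
```

In this model a consumer that handles fewer products per tick than arrive
eventually loses some, while doubling the processing rate (roughly what two
pqact processes working on disjoint subsets would provide) loses none.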

> CuiYueming told me that restriction was not set for certain ports but for
> all ports. Right now, they have removed the restrictions to P2P APPs for
> My IP.

Very good.

Question:

- if there is a 1 Mbps limit on each port, how is it possible to send 30-40 Mbps
  using port 8080?

Another way of asking the same question is: does CSTNET/CuiYueming have some
way of classifying activity as being from a P2P application?  If yes, then
there should be a way of configuring it to _not_ consider LDM traffic as P2P.

> CuiYueming: I know that you know the IP (tigge-ldm.cma.gov.cn) for CMA's
> LDM Server, do you need to know the IP for Unidata LDM Server? If yes, you
> may ask Tom for this information or anything you think is necessary, then
> we can go ahead to perform tests between "tigge-ldm.cma.gov.cn" and a
> Unidata LDM Server.

The machine we would use for testing is:

yakov.unidata.ucar.edu <-> 128.117.156.86

This is a dual 3.2 GHz Xeon machine running 64-bit Fedora Core 5 Linux.  The
machine has 4 GB of memory, and it is currently using a 1 GB queue, but we
would increase this to 3 GB for the test.

If this machine is not sufficient for use in testing we would use one of our
IDD cluster nodes that has 16 GB of RAM and a 12 GB LDM queue.  I can provide
information for this machine if it is needed.

> Tom: The "tigge-ldm.cma.gov.cn" is constantly sending and receiving TIGGE
> Data, do you think we should do the following tests for port 388 via this
> server or I setup another one which I have used for test recently between
> CMA and CSTNET?

I think it would be much better to run the test on a machine that is not
participating in the TIGGE data distribution.  This way we can separate
cause and effect.

Cheers,

Tom
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: BGR-338705
Department: Support IDD TIGGE
Priority: Normal
Status: Closed