Tradeoffs in Configuring the LDM




Tradeoffs between robustness, latency, and load due to redundant data-feeds from different upstream LDM-s

Assume that your LDM system receives data-products of a particular feedtype and data-product identifier ERE from a single PRIMARY REQUEST entry. Assume further that the REQUEST entry is duplicated, but with a different upstream host. Then, assuming sufficient bandwidth exists between each upstream LDM and the downstream LDM, the added redundancy will have the following effects: data-product latency will decrease, the probability of data-product loss will decrease, and the load on the computer will increase because an additional downstream LDM is running.

If insufficient bandwidth exists, then only bad things will happen (i.e., there will be increases in latency, probability of data-loss, and computer load).

Depending on your particular circumstances, this might or might not be a good idea. If you have sufficient bandwidth and a sufficiently fast machine, then adding such redundant REQUEST entries is a good idea because it will decrease data-product latency and loss without noticeably degrading the computer's performance. Because every environment is different, the only way to tell for sure is to try it and see.
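The benefit side of this tradeoff can be made concrete with a toy model. The sketch below is not LDM code. It assumes that each upstream feed independently either loses a given data-product or delivers it after some latency, and that the downstream LDM keeps whichever copy arrives first and discards later duplicates. Under those assumptions the effective latency is the minimum over the feeds and the loss probability is the product of the per-feed loss probabilities. The function names are hypothetical, and the model deliberately ignores the cost side (the extra computer and network load) that the bandwidth and machine-speed caveats are about.

```python
"""Toy model of redundant REQUEST entries to different upstream LDMs.

Illustrative only, not LDM code. Assumptions (hypothetical):
  * each upstream feed independently either loses a data-product or
    delivers it after some latency;
  * the downstream LDM keeps the first copy to arrive and discards
    later duplicates.
"""

from typing import Optional, Sequence


def effective_latency(latencies: Sequence[Optional[float]]) -> Optional[float]:
    """Latency of the first copy to arrive; None if every feed lost the product."""
    delivered = [t for t in latencies if t is not None]
    return min(delivered) if delivered else None


def loss_probability(per_feed_loss: Sequence[float]) -> float:
    """Probability that every feed loses the product (assumes independence)."""
    p = 1.0
    for q in per_feed_loss:
        p *= q
    return p


if __name__ == "__main__":
    # Original feed delivers in 12.0 s, the redundant feed in 3.5 s.
    print(effective_latency([12.0, 3.5]))  # 3.5
    # Original feed loses the product; the redundant feed still delivers it.
    print(effective_latency([None, 3.5]))  # 3.5
    # Two feeds that each lose 1 % of data-products.
    print(loss_probability([0.01, 0.01]))
```

With two feeds that each lose 1 % of products, the modeled loss probability falls to about 0.01 %, but only if the two feeds' losses really are independent. Losses caused by a shared bottleneck, such as the downstream host's own network link, would not shrink this way.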


Tradeoffs between latency and load due to split data-feeds from the same upstream LDM

While it may seem counter-intuitive, with existing TCP implementations it is often the case that, between two computers, a single TCP connection performs much worse than two TCP connections that each carry half the data.[1] Because each REQUEST entry results in its own downstream LDM and its own TCP connection, the number of REQUEST entries to the same upstream LDM affects the tradeoff between data-product latency and computer load. Take, for example, the following entry in the LDM configuration-file, ldmd.conf:

REQUEST WMO .* hostId
It might be better, in terms of data-product latency, to split this request into the following two entries:
REQUEST HDS        .* hostId
REQUEST IDS|DDPLUS .* hostId
If the increased load on the computer from having two downstream LDMs running instead of one caused performance problems, however, then this would be a bad idea.
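When a request is split by feedtype, the new entries should together cover exactly what the original entry covered, with no overlap; otherwise some products are either no longer requested at all or are requested twice from the same upstream LDM. The sketch below checks that property by treating feedtypes as bit flags. It assumes, as the example implies, that WMO is the union of HDS, IDS, and DDPLUS; the class name, the function name, and the bit values are hypothetical placeholders, not LDM's actual feedtype values.

```python
"""Check that a feedtype split requests exactly what the original entry did.

Assumptions: feedtypes combine like bit flags, and WMO is the union of
HDS, IDS and DDPLUS (as the split in the text implies). The bit values
below are HYPOTHETICAL placeholders, not LDM's real feedtype values.
"""

from enum import Flag
from functools import reduce
from itertools import combinations
from operator import or_
from typing import Sequence


class Feed(Flag):
    HDS = 1     # hypothetical bit value
    IDS = 2     # hypothetical bit value
    DDPLUS = 4  # hypothetical bit value
    WMO = HDS | IDS | DDPLUS


def covers_exactly(original: Feed, parts: Sequence[Feed]) -> bool:
    """True if the parts together equal `original` and no two parts overlap."""
    union = reduce(or_, parts, Feed(0))
    # Overlapping parts would request some products twice from one upstream LDM.
    disjoint = all(not (a & b) for a, b in combinations(parts, 2))
    return union == original and disjoint


if __name__ == "__main__":
    # The split from the text: HDS plus IDS|DDPLUS.
    print(covers_exactly(Feed.WMO, [Feed.HDS, Feed.IDS | Feed.DDPLUS]))  # True
    # A bad split: IDS is requested by both entries.
    print(covers_exactly(Feed.WMO, [Feed.HDS | Feed.IDS, Feed.IDS | Feed.DDPLUS]))  # False
```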

Besides splitting the data-feeds on the feedtype as in the above example, it is also possible to split them on the data-product identifier ERE. This is most useful for extremely high-volume data-feeds like CONDUIT. For example, the following entries:

REQUEST CONDUIT "[02468]$" hostId
REQUEST CONDUIT "[13579]$" hostId
would split reception of CONDUIT data into two connections of roughly equal volume to the same upstream LDM by taking advantage of the trailing sequence number that is part of the data-product identifier of every CONDUIT data-product. Similarly, the feed could be split into five connections of roughly equal volume (e.g., with the EREs "[05]$", "[16]$", "[27]$", "[38]$", and "[49]$") if experience proved that necessary.
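The trailing-digit technique extends to any split that gives each connection the same number of trailing digits, i.e., 1, 2, 5, or 10 connections. The following sketch generates the corresponding REQUEST entries and checks that every trailing digit is matched by exactly one of them. It assumes, as described above, that the last character of every CONDUIT data-product identifier is a decimal digit. The function names and the identifiers used in the check are hypothetical, and Python's re module stands in for the ERE matching that the LDM performs (the two agree for patterns this simple).

```python
"""Generate REQUEST entries that split CONDUIT on the trailing digit.

Assumption (from the text): the data-product identifier of every CONDUIT
product ends in a sequence number, so its last character is a decimal digit.
Function names and the sample identifiers are hypothetical.
"""

import re
from typing import List


def conduit_split(n: int, host: str = "hostId") -> List[str]:
    """Return n REQUEST lines; connection i gets the digits i, i+n, i+2n, ..."""
    if n <= 0 or 10 % n != 0:
        raise ValueError("n must divide 10 (1, 2, 5 or 10) for equal digit counts")
    lines = []
    for i in range(n):
        digits = "".join(str(d) for d in range(i, 10, n))
        lines.append(f'REQUEST CONDUIT "[{digits}]$" {host}')
    return lines


def count_matches(identifier: str, lines: List[str]) -> int:
    """Number of the entries' EREs (the quoted field) that match `identifier`."""
    eres = [line.split('"')[1] for line in lines]
    return sum(1 for ere in eres if re.search(ere, identifier))


if __name__ == "__main__":
    for line in conduit_split(5):
        print(line)
    # Every trailing digit should be claimed by exactly one connection.
    entries = conduit_split(5)
    assert all(count_matches(f"hypothetical-product-{d}", entries) == 1 for d in range(10))
```

For n = 2 the generator reproduces the two CONDUIT entries shown earlier; for n = 5 it yields one entry each for "[05]$", "[16]$", "[27]$", "[38]$", and "[49]$".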


1. It is hoped that a new TCP implementation called FAST from the California Institute of Technology will fix this. The implementation might become available in 2004.