Tradeoffs in Configuring the LDM


Tradeoffs between latency and load due to split data-feeds from the same upstream LDM

While it may seem counter-intuitive, it is often the case with existing TCP implementations that, between two computers, a single TCP connection performs much worse than two TCP connections that each carry half the data[1]. Therefore, the number of REQUEST entries to the same upstream LDM affects the tradeoff in the LDM system between data-product latency and computer load. Take, for example, the following entry in the LDM configuration-file, ldmd.conf:

REQUEST WMO .* hostId

It might be better, in terms of data-product latency, to split this request into the following:

REQUEST IDS|DDPLUS .* hostId
REQUEST HDS        .* hostId
If, however, the added load of running two downstream LDMs instead of one caused performance problems on the computer, then this would be a bad idea.
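Because, as noted above, two REQUEST entries mean two downstream LDMs on this computer, it can help to check how many entries point at each upstream host before splitting further. The following Python sketch counts them. The function name requests_per_upstream is hypothetical and not part of the LDM; it assumes the simple entry form used in this section (REQUEST, feedtype, pattern, host), treats the upstream host as the last whitespace-separated token, and skips blank lines and lines starting with "#".

```python
from collections import Counter


def requests_per_upstream(conf_text):
    """Count REQUEST entries per upstream host in ldmd.conf text.

    Hypothetical helper (not part of the LDM). Assumes each entry looks like
        REQUEST <feedtype> <pattern> <host>
    with the upstream host as the last whitespace-separated token, so a
    quoted pattern containing spaces does not confuse it.
    """
    counts = Counter()
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        tokens = line.split()
        # This sketch only recognizes the upper-case keyword, as written here.
        if tokens[0] == "REQUEST" and len(tokens) >= 4:
            counts[tokens[-1]] += 1
    return counts


if __name__ == "__main__":
    example = (
        'REQUEST CONDUIT "[02468]$" hostId\n'
        'REQUEST CONDUIT "[13579]$" hostId\n'
    )
    # Two entries to hostId, i.e. two downstream LDMs for that upstream host.
    print(requests_per_upstream(example))
```

A host written with a port suffix would be counted separately from the bare host name; a fuller tool would normalize that.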

Besides splitting the data-feeds on the feedtype as in the above example, it is also possible to split them on the extended regular expression (ERE) for the data-product identifier. This is most useful for extremely high-volume data-feeds like CONDUIT. For example, the following entries:

REQUEST CONDUIT "[02468]$" hostId
REQUEST CONDUIT "[13579]$" hostId
would split reception of CONDUIT data into two connections of roughly equal volume to the same upstream LDM by taking advantage of the trailing sequence number that is part of the data-product identifier of CONDUIT data-products. Similarly, the feed could be split into five connections of roughly equal volume if experience proved that necessary.
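The same idea generalizes to other split counts. The Python sketch below assumes, as the example above does, that each CONDUIT identifier ends in a decimal sequence number whose last digit is roughly uniformly distributed. The helper names trailing_digit_patterns, request_lines, and matching_patterns are hypothetical, not part of the LDM. They build one bracket-expression ERE per group of trailing digits (the split count must divide ten, so it is 1, 2, 5, or 10) and check that an identifier matches exactly one of them. Python's re module is not a POSIX ERE engine, but for simple bracket expressions anchored with $ the two agree.

```python
import re


def trailing_digit_patterns(n):
    """Return n EREs that together match every trailing decimal digit once.

    Hypothetical helper. Digit d goes to group d % n, so n=2 yields the
    even/odd split and n=5 yields {0,5}, {1,6}, {2,7}, {3,8}, {4,9}.
    """
    if n <= 0 or 10 % n != 0:
        raise ValueError("n must divide 10 (1, 2, 5, or 10)")
    groups = [[d for d in range(10) if d % n == k] for k in range(n)]
    return ["[" + "".join(str(d) for d in g) + "]$" for g in groups]


def request_lines(feedtype, host, n):
    """Return ldmd.conf REQUEST entries splitting one feed n ways."""
    return [f'REQUEST {feedtype} "{p}" {host}' for p in trailing_digit_patterns(n)]


def matching_patterns(product_id, patterns):
    """Return the indices of the patterns that match product_id."""
    return [i for i, p in enumerate(patterns) if re.search(p, product_id)]


if __name__ == "__main__":
    for line in request_lines("CONDUIT", "hostId", 5):
        print(line)
    # A made-up identifier ending in a sequence number, for illustration only.
    print(matching_patterns("hypothetical/product/id 000127", trailing_digit_patterns(5)))
```

For a split count of 2 this reproduces the two entries above. Assigning digit d to group d mod n keeps every group the same size, which is why the split count is restricted to divisors of ten; any other count would give some connections more digits, and so more data, than others.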

1. It is hoped that a new TCP implementation called FAST from the California Institute of Technology will fix this.