[LDM #DCN-100393]: Writer-Counter Error



Robert,

> I have encountered 5 or so instances in the past several years
> where I have attempted to manually restart LDM and received the "The
> writer-counter of the product-queue isn't zero..." message, which left
> LDM in a stopped state.  I always resolved the situation by rebuilding
> the queue.  In any case, I am somewhat hesitant to restart LDM during
> times when I am "pqinsert-ing" large files into the queue (for instance
> GRIB files during the model cycles) as I feel that would leave the queue
> most vulnerable.  That said, I realized recently that the 'ldmadmin check'
> (which I run each hour) will induce an automatic restart if it needs
> to reconcile the queue (in my case I have a static 4G queue size and
> choose to decrease max latency).  Getting to my question... are there
> any safeguards built into the 'ldmadmin check' that might prevent the
> aforementioned error from occurring if it needs to restart the service?
> The last thing I would want is for the LDM service to stop during a
> self-induced restart.  If there is no guarantee the service will always
> restart, is it better to set reconciliation to "do nothing" and manually
> reconcile the queue's max latency?  Mind you I have never had such an
> auto-restart ever fail to restart, but I have had manual restarts result
> in the writer-counter error.

There are safeguards to ensure that the LDM product-queue doesn't get 
corrupted. For example, the product-queue library blocks most signals 
(including SIGTERM) while the queue is being accessed. That said, there is no 
guarantee that the LDM code is bug-free.
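
To illustrate the general technique (this is not the LDM's actual code, which 
is written in C), here is a minimal Python sketch of deferring signals around a 
critical section. The names signals_blocked, DEFERRED and demo are hypothetical, 
as is the choice of which signals to defer. It assumes a Unix system and that 
demo() is called from the main thread:

```python
import signal
from contextlib import contextmanager

# Hypothetical set of signals to defer; the set the LDM blocks is not listed here.
DEFERRED = {signal.SIGTERM, signal.SIGINT}


@contextmanager
def signals_blocked(signals=DEFERRED):
    """Block `signals` for the calling thread for the duration of the block."""
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, signals)
    try:
        yield
    finally:
        # Restore the previous mask; a signal that arrived meanwhile is
        # delivered only now, after the critical section has completed.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)


def demo():
    """Send SIGTERM to ourselves in the middle of a simulated queue write."""
    events = []
    signal.signal(signal.SIGTERM,
                  lambda signum, frame: events.append("SIGTERM handled"))
    with signals_blocked():
        events.append("write started")
        signal.raise_signal(signal.SIGTERM)  # stays pending while blocked
        events.append("write finished")
    return events
```

In this sketch the handler runs only after the simulated write has finished, 
so a termination request cannot interrupt the write halfway through.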

I have no qualms about having an active reconciliation mode if the product-queue is 
close to its equilibrium size. The only problems I've seen are when the queue 
is far too small for the reconciliation algorithm to make a good guess.

If the LDM doesn't restart after a reconciliation, then you likely have bigger 
problems (disk partition full, for example).

> Best Regards,
> Bob

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: DCN-100393
Department: Support LDM
Priority: Normal
Status: Closed