
[NOAAPORT #JZP-576695]: dropping products



Hi,

re:
> Ok I have fixed all of the things that were broken by pulling the
> multicast data off the main network. I have the Novra receivers on a
> second NIC card going into my two computers that run the LDM software.

Very good.

re:
> My numbers are looking really good as you can see from the last 3 days
> snapshot (updated to the more current script you asked me to do):
> 
> wxengine3:: 20210529.000000: nGap:      0 nFrame:          0 nG1sec:
> 0 nG5sec:     0 nG15sec:     0 nG1min:     0 nProds:  7127187 nDups:
> 19232 nPinfo:     24
> wxengine3:: 20210530.151020: nGap:     26 nFrame:       2886 nG1sec:
> 26 nG5sec:    26 nG15sec:    26 nG1min:    26 nProds:  7141806 nDups:
> 15984 nPinfo:     24
> wxengine3:: 20210531.200207: nGap:      6 nFrame:        406 nG1sec:
> 4 nG5sec:     4 nG15sec:     4 nG1min:     4 nProds:  7049794 nDups:
> 18778 nPinfo:     24
> 
> 
> wxengine4:: 20210529.000000: nGap:      0 nFrame:          0 nG1sec:
> 0 nG5sec:     0 nG15sec:     0 nG1min:     0 nProds:  7127241 nDups:
> 19090 nPinfo:     24
> wxengine4:: 20210530.151020: nGap:     26 nFrame:       2886 nG1sec:
> 26 nG5sec:    26 nG15sec:    26 nG1min:    26 nProds:  7141949 nDups:
> 15841 nPinfo:     24
> wxengine4:: 20210531.200207: nGap:      6 nFrame:        406 nG1sec:
> 4 nG5sec:     4 nG15sec:     4 nG1min:     4 nProds:  7050055 nDups:
> 18508 nPinfo:     24

The number of Gap messages and associated missed frames is looking much
better than before.  One thing that I don't understand, however, is how
the total number of products (nProds) and number of duplicate products
(nDups) can be so different between your two machines, especially on the
day when no Gap messages were logged, *unless* the LDM queues being used
on the two machines are different sizes.
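
To put a number on the difference, here is a minimal Python sketch that
parses two records from the snapshot quoted above (the 20210529 entries,
with the email line wrapping undone) and subtracts the counters.  The
function names parse_snapshot and host_delta are made up for this
illustration; they are not part of the LDM or of any NOAAPort script.

```python
import re

# The two 20210529 records from the quoted snapshot, unwrapped to one line each.
SNAPSHOT = """\
wxengine3:: 20210529.000000: nGap: 0 nFrame: 0 nG1sec: 0 nG5sec: 0 nG15sec: 0 nG1min: 0 nProds: 7127187 nDups: 19232 nPinfo: 24
wxengine4:: 20210529.000000: nGap: 0 nFrame: 0 nG1sec: 0 nG5sec: 0 nG15sec: 0 nG1min: 0 nProds: 7127241 nDups: 19090 nPinfo: 24
"""


def parse_snapshot(text):
    """Return {(host, timestamp): {counter_name: value}} for each record."""
    records = {}
    for line in text.splitlines():
        head = re.match(r"(\w+)::\s+(\d{8}\.\d{6}):", line)
        if not head:
            continue  # skip blank or unrelated lines
        counters = {
            name: int(value)
            for name, value in re.findall(r"\b(n\w+):\s*(\d+)", line)
        }
        records[(head.group(1), head.group(2))] = counters
    return records


def host_delta(records, stamp, host_a, host_b, names=("nProds", "nDups")):
    """Return host_a minus host_b for the named counters at one timestamp."""
    a = records[(host_a, stamp)]
    b = records[(host_b, stamp)]
    return {name: a[name] - b[name] for name in names}


records = parse_snapshot(SNAPSHOT)
print(host_delta(records, "20210529.000000", "wxengine3", "wxengine4"))
# prints {'nProds': -54, 'nDups': 142}
```

On that day wxengine3 logged 54 fewer products and 142 more duplicates
than wxengine4.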

Even though you may have provided this information previously:

- what are the sizes of the LDM queues being used on each of your machines?

  The easiest way to see how the two configurations differ is to
  compare the output of 'ldmadmin config' on both machines.  Please send
  us the output of 'ldmadmin config' from each machine in your reply to
  this email.
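
If you would like to compare the two outputs yourself first, one way is
to save each to a text file (for example with 'ldmadmin config > file')
and diff the saved text.  The sketch below uses Python's standard
difflib module.  The sample strings are hypothetical stand-ins, not real
'ldmadmin config' output, and config_diff is a name made up for this
illustration.

```python
import difflib


def config_diff(text_a, text_b, name_a="wxengine3", name_b="wxengine4"):
    """Return unified-diff lines between two saved 'ldmadmin config' outputs."""
    return list(
        difflib.unified_diff(
            text_a.splitlines(),
            text_b.splitlines(),
            fromfile=name_a,
            tofile=name_b,
            lineterm="",
            n=0,  # no context lines: show only the settings that differ
        )
    )


# Hypothetical stand-in text; real 'ldmadmin config' output will look different.
saved_a = "setting one: same\nsetting two: 500M\n"
saved_b = "setting one: same\nsetting two: 2G\n"

for line in config_diff(saved_a, saved_b):
    print(line)
```

An empty result means the two saved outputs are identical line for line.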

re:
> I have almost identical numbers between the boxes. 

Yes, the numbers of Gap messages and associated missed frames are very
close to being the same.

Aside:  we know that there can/will be slight differences in various
counts of things like total number of products since the times on the
machines are not exactly the same (they should be close, of course).

re:
> Here is the issue I
> still have not been able to resolve. I am saving the NBM model output
> data. What I actually do is push all of it into a SQL database so we can
> query the data.  For testing purposes I am am just writing the files out
> to the hard drive. So for example the NBE (the extended model output)
> for May 31st at 00z. When I compare the file sizes between the two
> computers, they will be different. Sometimes significantly different.
> This is where I do not understand what could be causing this. If both
> LDM computers have matching nGaps, etc then how can they have different
> file sizes?

If one machine is using a larger LDM queue than the other, and if some
of the products are sent multiple times in NOAAPort, the one with the
larger queue is more likely to detect and eliminate more of the duplicates
than the other.
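
The effect of queue size on duplicate detection can be illustrated with
a deliberately simplified Python model.  It assumes a product is
rejected as a duplicate only while an identical copy (same MD5
signature) is still in the queue, and that the oldest product is dropped
to make room for a new one.  The queue here is sized by product count
purely to keep the sketch short, whereas a real LDM queue is sized in
bytes.  The stream contents and the name count_duplicates are
hypothetical.

```python
from collections import OrderedDict
import hashlib


def count_duplicates(products, capacity):
    """Return (accepted, duplicates) for a queue holding `capacity` products.

    Simplified model: the queue remembers the signatures of the most
    recently accepted products and drops the oldest one to make room.
    """
    queue = OrderedDict()  # signature -> None, oldest entry first
    accepted = 0
    duplicates = 0
    for data in products:
        signature = hashlib.md5(data).digest()
        if signature in queue:
            duplicates += 1  # an identical product is still queued: reject
            continue
        queue[signature] = None
        accepted += 1
        if len(queue) > capacity:
            queue.popitem(last=False)  # the oldest product ages out
    return accepted, duplicates


# Hypothetical stream: product p0 is sent again after five other products.
stream = [b"p0", b"p1", b"p2", b"p3", b"p4", b"p5", b"p0"]

print(count_duplicates(stream, capacity=3))   # prints (7, 0)
print(count_duplicates(stream, capacity=10))  # prints (6, 1)
```

With the small queue the repeat of p0 arrives after the original has
aged out, so it is accepted as new and would be processed a second time.
With the large queue it is detected and rejected.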

Looking back through other exchanges in this inquiry, I see that at one
time you were using the following pattern-action file actions to FILE
NBM products into files with a '.txt' suffix:

NGRID ^F(E|O)US1([5-8]) KWNO (..)(....) /p(NBH|NBS|NBE|NBX|NBP)
FILE -close nbm/(\3:yyyy)/(\3:mm)/\3/text/blend/\5_(\3:yyyy)(\3:mm)\3\4.txt

NGRID ^F(E|O)US1([5-8]) KWNO (..)(....) /p(NBH|NBS|NBE|NBX)
FILE -overwrite -close -strip nbmdata/\5-\4_(seq).txt

The top portion of the snapshot you sent suggests that the files listed
were created using the first of these two actions.  Since the FILE
action will append all products processed that match the extended
regular expression to the output file, if one of the machines is
detecting and eliminating fewer duplicates than the other, the output
file it is creating can/will be larger than on the other machine.

Question:

- have you compared the contents of the files created on one machine
  against the same file created on the other?

  An aside to this question is if the products being written are actually
  textual in nature.  If they are, then you could use a simple 'diff' to
  see the differences in the files.  If, on the other hand, the products are
  actually binary, you will need to use something like 'cmp'.

re:
> At your direction I did split the NGRID into a separate .conf file and
> that did not change my results. 

By 'results', are you referring to the difference in output file size?

re:
> I would like to see if I can solve this
> issue so I can reliably save data to either of these computers.

In the absence of errors when writing the files to disk, I do not
think that the difference could be anything other than more duplicate
products being written to the output file on the machine that is
using the smaller LDM queue.  I could be wrong, but I don't see how
at this point.

re:
> I am attaching a screen shot of the file sizes from the two different
> computers. The wxengine3 computer always outperforms wxengine4.

By 'outperforms' do you mean creates larger output files?  If yes, and if
the LDM queue size on wxengine3 is smaller than the one on wxengine4, the
problem could be simply more duplicates.

re:
> Ironically, it is a lesser computer. It is only a HP Z400 and wxengine4
> is an HP Z820.  

You've said this before. 

re:
> In the screen shot wxenine4 is on the left and wxengine3
> is on the right.

OK, so your 'outperforms' comment really is equivalent to 'creates
larger output files'.  Please correct me if I am misinterpreting things.

re:
> Do you have any ideas what step I should take to figure out what my
> problem is?

Would you like to Meet (Google Meet) to discuss the situation?

re:
> Thanks so much!

No worries.

Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: JZP-576695
Department: Support NOAAPORT
Priority: Normal
Status: Open
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.