
Re: [NCEP.List.PMB-PCSP] TIGGE resend requests.



Doug et al.

I store both the .status* files as well as the individual products on
motherlode.ucar.edu. I have a web cgi which can be used to check
whether we have the individual products in the .status inventory.
We have enough disk space to keep 2 days online this way for monitoring
the data flow. The inventory is organized by directory name, so the
data/ and data2/ trees can be browsed from:
http://motherlode.ucar.edu/cgi-bin/ldm/conduit_reception_new.csh

Note that motherlode feeds from idd.unidata just as ultrazone does.

Looking at your Feb 15, 18Z time you mentioned for the pgrb2a files:
http://motherlode.ucar.edu/cgi-bin/ldm/conduit_reception_new.csh?/data/nccf/com/gens/prod/gefs.20070215/18

I show that motherlode received and stored all the products (green
colored file names) in the inventory.

Your rtstats sent from ultrazone show that your latency is small, so if
you aren't seeing the products on your disk, you may need to check your
pqact processing of the data to make sure that the data is getting out
of your queue before new data arriving in your queue pushes the old
data out.

You can monitor the age of the oldest data product in your queue with
"pqmon". The rightmost column shows the age of the oldest product.
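
As a rough illustration of reading that rightmost column from a script,
here is a minimal Python sketch. It assumes each pqmon report line ends
with the oldest-product age as a whole number (taken here to be seconds).
The sample line and the function name are made up for illustration, so
check them against what your own pqmon actually prints.

```python
def oldest_product_age(pqmon_line: str) -> int:
    """Return the last whitespace-separated field of a pqmon line as an int.

    Assumption: the rightmost column is the age of the oldest product.
    """
    fields = pqmon_line.split()
    if not fields:
        raise ValueError("empty pqmon line")
    return int(fields[-1])


# Hypothetical sample line; real output may carry a log prefix before the counters.
sample = "183462 1 58681 1899992064 241803 340 5 1990000 1093"
print(oldest_product_age(sample))
```

Taking the last field rather than a fixed column index keeps the sketch
working if a timestamp or log prefix precedes the counters.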

You can send 2 "USR2" signals to your pqact process (e.g. "kill -USR2 pid")
to throw the program into debug mode, which will output a "delay" message
showing how long it is taking pqact to process a product. If this
delay reaches the age of the oldest product, you will likely not get
the data written out to disk before it is overwritten in your queue by
new data. Note: don't leave pqact running in debug mode for too long,
since logging takes CPU time. Also, new information: LDM 6.5.0
will log a message into your ldmd.log file when it is processing the
oldest product in your queue.
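
The comparison described above can be sketched in a few lines of Python.
This is only an illustration: the function name and the 0.8 safety
margin are assumptions, and both inputs are values you would read
yourself from the pqact "delay" messages and from pqmon.

```python
def at_risk_of_loss(pqact_delay_s: float, oldest_age_s: float,
                    margin: float = 0.8) -> bool:
    """True when pqact's delay is at or beyond `margin` times the queue age.

    pqact_delay_s: delay reported by pqact in debug mode, in seconds.
    oldest_age_s:  age of the oldest product in the queue, in seconds.
    margin:        assumed safety factor; 1.0 means "delay equals queue age".
    """
    if oldest_age_s <= 0:
        raise ValueError("oldest_age_s must be positive")
    return pqact_delay_s >= margin * oldest_age_s


print(at_risk_of_loss(30, 3600))    # small delay, hour-old queue
print(at_risk_of_loss(3000, 3600))  # delay close to the queue age
```

A margin below 1.0 raises the flag before products are actually being
overwritten, which leaves some time to react.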

We also have an uptime script that Tom Y. wrote to track the age of the
oldest product in your queue.

If you are falling that far behind in your disk I/O of products, you will
want to look into how your pqact.conf is designed, and we can help with
that. You may need multiple pqact processes.

It would be a poor use of the network to resend data that did
successfully go out if the problem lies in the end processing, so we
should really try to ensure you are not biting off more data reception
than you can process in the size of queue (buffer) you have allotted.
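
For a back-of-the-envelope feel for that trade-off, the time a product
can sit in the queue before being overwritten is roughly the queue size
divided by the incoming data rate. The Python sketch below uses invented
numbers (a 2 GB queue and 4 MB/s of requested data) and ignores any
limit on the number of product slots, so treat it as an estimate only.

```python
def queue_residence_seconds(queue_bytes: float,
                            ingest_bytes_per_second: float) -> float:
    """Approximate time a product stays in the queue before being overwritten."""
    if ingest_bytes_per_second <= 0:
        raise ValueError("ingest rate must be positive")
    return queue_bytes / ingest_bytes_per_second


# Illustrative values only, not measurements from any real site.
queue_size = 2e9    # 2 GB product queue
ingest_rate = 4e6   # 4 MB/s across all requested feeds
print(queue_residence_seconds(queue_size, ingest_rate))
```

If pqact needs longer than this to write a product out, either the
queue has to grow or the amount of data requested has to shrink.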

On another note, the CONDUIT GRIB2 names are based on table entries at
the injection site; I use the NCEP/GEMPAK tables for names. For the
GRIB1 names, I had used the wgrib code, but it is compiled into the
program and so not as easy to update with new entries as external
tables, so the evolution to GRIB2 processing with gribinsert is much
more flexible.
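
Since the thread below asks for a translator between the two naming
schemes, here is a minimal Python sketch of one. It covers only the
three name pairs and two level types that appear in this thread; the
function names and the idea of building a search string for wgrib2
inventory lines are my own assumptions, and a real table would have to
come from the NCEP/GEMPAK tables.

```python
REQUEST_FIELDS = (
    "model", "sub_model", "init_date", "cycle", "grib_type", "forecast_type",
    "ens_member", "forecast_hour", "level_type", "level", "parameter",
)

# Only the pairs mentioned in this thread; extend from the real tables.
GEMPAK_TO_WGRIB2 = {"TMPK": "TMP", "HGHT": "HGT", "CAPE": "CAPE"}


def parse_request(line: str) -> dict:
    """Split one comma-separated resend request line into named fields."""
    parts = line.strip().split(",")
    if len(parts) != len(REQUEST_FIELDS):
        raise ValueError(f"expected {len(REQUEST_FIELDS)} fields, got {len(parts)}")
    return dict(zip(REQUEST_FIELDS, parts))


def inventory_pattern(request: dict) -> str:
    """Build the text to search for in a wgrib2 inventory line."""
    name = GEMPAK_TO_WGRIB2[request["parameter"]]  # KeyError if not in the table
    if request["level_type"] == "PRES":
        return f":{name}:{request['level']} mb:"
    if request["level_type"] == "HGHT":
        return f":{name}:{request['level']} m above ground:"
    return f":{name}:"  # other level types are not translated here


req = parse_request("gens,gefs,20070215,18,pgrb2,gep,001,0000,HGHT,2,TMPK")
print(inventory_pattern(req))
```

The resulting string can be matched against the wgrib2 listing quoted
further down, where the 2 m temperature shows up as record 34.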

Steve Chiswell
Unidata User Support

On Fri, 2007-02-16 at 14:05 -0700, Doug Schuster wrote:
> Hi Patrick,
> 
> That's probably about all you can do on your end unless Steve has  
> further suggestions.
> 
> I see that we didn't receive the product, but I also see it logged in  
> our .status.* file.
> 
> With the volume we're dealing with in the TIGGE project (~175 GB of  
> model data transmitted
> daily from 4 centers, and over 200GB/day from ~10 centers in the  
> future),
> all centers tend to have sporadic missing fields in some forecast  
> cycles.  ECMWF and NCAR built a protocol on
> top of LDM to deal with this that involves a process of automated  
> resend requests and resends.
> This has worked well between ECMWF and NCAR, but was determined a  
> security risk for NCEP.
> 
> Doug
> 
> 
> 
> 
> On Feb 16, 2007, at 1:27 PM, Patrick O'Reilly wrote:
> 
> > Hi Doug,
> >
> > Since you mention that NCEP has sporadic missing files, and we have  
> > begun getting frequent lists of missing data, maybe we should try  
> > troubleshooting from end to end?
> >
> > In the most recent TIGGE resend request, starting with the first  
> > field in the list:
> >
> > gens,gefs,20070215,18,pgrb2,gep,001,0000,HGHT,2,TMPK
> >
> > I see that field present in the original file, and the file  
> > correctly processed using gribinsert.  On our LDM, the ldmd.log shows:
> >
> > Feb 15 22:52:35 ldm2 pqact[18843] INFO:    31918 20070215224852.025  
> > CONDUIT 033 data/nccf/com/gens/prod/gefs.20070215/18/pgrb2a/ 
> > gep01.t18z.pgrb2af00 !grib2/ncep/SPEC62MRF/#000/200702151800F000/ 
> > TMPK/2 m HGHT! 000033
> >
> > which to me says that the product, if this is the correct entry,  
> > was inserted into the LDM queue.  Is there any more I can  
> > troubleshoot on this end?  What do you see on your end at this time  
> > (Feb 15 22:52:35) with regard to this product?
> >
> > Patrick
> >
> > Paula Freeman wrote:
> >> Doug,
> >> The parameter names you use are not the ones printed by our wgrib2  
> >> program, so it's difficult to match them up.
> >> We'll need a translator of some kind.  How do you determine the  
> >> parameters names you use?
> >> -Paula
> >> Patrick O'Reilly wrote:
> >>> Paula,
> >>>
> >>> Yesterday, Justin and I noticed that the parameter names that  
> >>> they send aren't
> >>> necessarily the ones we use in our grib files.  This could be a  
> >>> big hassle when we
> >>> try to do all this.  For example in this case, they call it  
> >>> "TMPK" while in our files
> >>> it's "TMP".  The first file he listed:
> >>>
> >>> gens,gefs,20070215,18,pgrb2,gep,001,0000,HGHT,2,TMPK
> >>>
> >>> cd /pub/data/nccf/com/gens/prod/gefs.20070215/18/pgrb2a
> >>> wgrib2 gep01.t18z.pgrb2af00 | grep TMP
> >>> 8:298828:d=2007021518:TMP:1000 mb:anl
> >>> 9:325564:d=2007021518:TMP:925 mb:anl
> >>> 10:351852:d=2007021518:TMP:850 mb:anl
> >>> 11:377672:d=2007021518:TMP:700 mb:anl
> >>> 12:402856:d=2007021518:TMP:500 mb:anl
> >>> 13:427485:d=2007021518:TMP:250 mb:anl
> >>> 14:450560:d=2007021518:TMP:200 mb:anl
> >>> 34:1432164:d=2007021518:TMP:2 m above ground:anl
> >>>
> >>> So it is there, it's the last field showing.  For the fields  
> >>> listed below, "HGHT" is
> >>> "HGT" in our files, and "CAPE" is actually "CAPE"!
> >>>
> >>> We need a translator!
> >>> Patrick
> >>>
> >>> Paula Freeman wrote:
> >>>> All,
> >>>>
> >>>> Apparently now that the retransmit requests are coming, we're  
> >>>> going to be flooded with them
> >>>> and need to process them more efficiently.
> >>>>
> >>>> I need to identify if the parameter he says is missing is  
> >>>> present or missing in the original file
> >>>> we inserted into the ldm from WOC.
> >>>>
> >>>> Patrick, can you help me identify if the parameters he's seeking  
> >>>> are missing in the file we sent to the ldm from WOC?
> >>>>
> >>>> ex.  for the first one in the list below that Doug Schuster sent:
> >>>>
> >>>> The format of each request line is:
> >>>>
> >>>> #Model,SubModel,InitDate,Cycle,GribType,ForecastType,EnsMember,ForecastHour,LevelType,Level,Parameter
> >>>>
> >>>> The first request line is:
> >>>>
> >>>> gens,gefs,20070215,18,pgrb2,gep,001,0000,HGHT,2,TMPK
> >>>>
> >>>> So I look for TMPK in the appropriate file:
> >>>>
> >>>> $ cd /pub/data/nccf/com/gens/prod/gefs.20070215/18/pgrb2a
> >>>> $ wgrib2 gep05.t18z.pgrb2af00 | grep TMPK
> >>>>
> >>>> doesn't print anything, so is it missing?
> >>>>
> >>>> -Paula
> >>>>
> >>>> address@hidden wrote:
> >>>>> #Please reinsert the following GENS pgrb2 fields into CONDUIT
> >>>>> #Model,SubModel,InitDate,Cycle,GribType,ForecastType,EnsMember,ForecastHour,LevelType,Level,Parameter
> >>>>> gens,gefs,20070215,18,pgrb2,gep,001,0000,HGHT,2,TMPK
> >>>>> gens,gefs,20070215,18,pgrb2,gep,005,0000,HGHT,2,TMPK
> >>>>> gens,gefs,20070215,18,pgrb2,gep,009,0000,HGHT,2,TMPK
> >>>>> gens,gefs,20070215,18,pgrb2,gep,013,0000,HGHT,2,TMPK
> >>>>> gens,gefs,20070215,18,pgrb2,gep,005,0000,PDLY,180-0,CAPE
> >>>>> gens,gefs,20070215,18,pgrb2,gep,008,0000,PDLY,180-0,CAPE
> >>>>> gens,gefs,20070215,18,pgrb2,gep,013,0000,PDLY,180-0,CAPE
> >>>>> gens,gefs,20070215,18,pgrb2,gep,014,0000,PDLY,180-0,CAPE
> >>>>> gens,gefs,20070215,18,pgrb2,gep,005,0000,PRES,1000,HGHT
> >>>>> gens,gefs,20070215,18,pgrb2,gep,005,0000,PRES,700,TMPK
> >>>>> gens,gefs,20070215,18,pgrb2,gep,005,0000,PRES,850,TMPK
> >>>>> gens,gefs,20070215,18,pgrb2,gep,008,0000,PRES,1000,HGHT
> >>>>> gens,gefs,20070215,18,pgrb2,gep,008,0000,PRES,700,TMPK
> >>>>> gens,gefs,20070215,18,pgrb2,gep,008,0000,PRES,850,TMPK
> >>>>> gens,gefs,20070215,18,pgrb2,gep,013,0000,PRES,1000,HGHT
> >>>>> gens,gefs,20070215,18,pgrb2,gep,013,0000,PRES,700,TMPK
> >>>>> gens,gefs,20070215,18,pgrb2,gep,013,0000,PRES,850,TMPK
> >>>>> gens,gefs,20070215,18,pgrb2,gep,014,0000,PRES,1000,HGHT
> >>>>> gens,gefs,20070215,18,pgrb2,gep,014,0000,PRES,700,TMPK
> >>>>> gens,gefs,20070215,18,pgrb2,gep,014,0000,PRES,850,TMPK
> >>>>> _______________________________________________
> >>>>> NCEP.List.PMB-PCSP mailing list
> >>>>> address@hidden
> >>>>> https://lstsrv.ncep.noaa.gov/mailman/listinfo/ncep.list.pmb-pcsp
> >>>>>
-- 
Steve Chiswell <address@hidden>
Unidata