
20060201: RSA Comm Test (Weather - JSC) LDM Troubleshooting (cont.)



>From:  "Hoeth, Brian R. \(JSC-WS8\)[LM]" <address@hidden>
>Organization:  NASA
>Keywords:  200602011404.k11E427s002835 LDM rpc.ldmd notifyme

Brice et al.,

re:
>I did run into Tom and his theory is that step 9 of installing the LDM
>(http://www.unidata.ucar.edu/software/ldm/ldm-6.4.4/basics/source-install
>-steps.html) on our (JSC) side was either not performed or that it was
>not successfully completed.  This step is as such:

>9.  Install some components with superuser privileges:

>    su root -c 'make install_setuids'

>Note that you need the superuser's password to accomplish this step.

>This step is necessary for the LDM server to listen on port 388 (which
>is a restricted port) and for the hupsyslog(1) utility to notify the
>syslogd(8) daemon when a new log-file has been created.

>He was thinking that this step either wasn't done or it was attempted,
>but some security setting was in place to disallow the setting of user
>ids.  I talked briefly with Tim O about this on the phone yesterday.
>Would one or both of you please check on this?

I just want to clarify what I discussed with Brian in reaction to
his comments that:

- the downstream was not getting any data from the upstream

- a 'notifyme' from the downstream to the upstream shows that the
  upstream _is_ getting data products put into its LDM queue

If, as Brian outlined above, the LDM installation on the downstream
did not include the final step:

<as 'root'>
cd ~ldm-x.x.x/src
make install_setuids

then the rpc.ldmd process(es) on the downstream will not use port 388
(since it is a privileged port).  Instead, they will attempt to
contact the portmapper on the upstream for the feed connection, and it
is very common either for the portmapper not to be running or for the
firewall to be set up to block access to the portmapper.  The quick
and easy check to see if the final installation step was done is to do
a long listing of the ~ldm/bin directory and look at the permissions on
rpc.ldmd and hupsyslog.  If they are not owned by 'root' and/or do not
have the setuid permission bit set, then rpc.ldmd will go the
portmapper route.
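
The long-listing check described above can also be scripted.  What
follows is only a sketch, not something shipped with the LDM: the
function name check_setuid_root and the LDMBIN variable are made up for
illustration, and a POSIX shell with find(1) is assumed.

```shell
#!/bin/sh
# Hypothetical helper (not part of the LDM): report whether a file is
# owned by root and has the setuid bit set.
check_setuid_root() {
    f=$1
    if [ ! -e "$f" ]; then
        echo "$f: missing"
        return 2
    fi
    if [ ! -u "$f" ]; then            # -u is true when the setuid bit is set
        echo "$f: setuid bit not set"
        return 1
    fi
    # find prints the path only when the file is owned by root;
    # -L makes it look at the target if the path is a symbolic link.
    if [ -z "$(find -L "$f" -prune -user root -print)" ]; then
        echo "$f: not owned by root"
        return 1
    fi
    echo "$f: ok"
}

# LDMBIN is an assumption; point it at your ~ldm/bin directory.
LDMBIN=${LDMBIN:-${HOME:-.}/bin}
for exe in rpc.ldmd hupsyslog; do
    check_setuid_root "$LDMBIN/$exe" || true
done
```

Anything other than "ok" for either executable suggests that the
'make install_setuids' step should be rerun as 'root'.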

The other thing I mentioned is that the file system that the LDM is
installed on could be configured to _NOT_ allow invocation of setuid
root processes.  In this case, the rpc.ldmd process(es) would also run
without 'root' privilege in the initial step of using port 388, and so
would again try to go through the portmapper.  The Brazilian
institution CPTEC just went through this exact scenario, so I know
that it is possible.  CPTEC's "solution" was to copy the rpc.ldmd and
hupsyslog executables to a file system that was not set up to block
invocation of setuid root processes and create symbolic links to them
in the ~ldm/bin directory.  I don't particularly like this "solution"
since it makes doing an upgrade more difficult (the person doing the
upgrade has to remember to copy the executables to the other file
system and redo the symbolic links).
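
One way to look for this condition is to check whether the file system
holding ~ldm/bin is mounted with the "nosuid" option.  The sketch below
rests on assumptions: has_nosuid and LDMBIN are illustrative names, and
the lookup uses findmnt(8), which exists on Linux (util-linux) but not
on every UNIX; elsewhere the output of mount(8) would have to be
inspected instead.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a comma-separated mount-option string
# contains "nosuid".
has_nosuid() {
    case ",$1," in
        *,nosuid,*) return 0 ;;
        *)          return 1 ;;
    esac
}

# LDMBIN is an assumption; point it at your ~ldm/bin directory.
LDMBIN=${LDMBIN:-${HOME:-.}/bin}
if [ -d "$LDMBIN" ] && command -v findmnt >/dev/null 2>&1; then
    # findmnt -T looks up the file system that contains the given path.
    opts=$(findmnt -n -o OPTIONS -T "$LDMBIN" 2>/dev/null || true)
    if has_nosuid "$opts"; then
        echo "$LDMBIN is on a nosuid file system"
    else
        echo "$LDMBIN: no nosuid mount option found"
    fi
fi
```

If "nosuid" is reported, the setuid bit on rpc.ldmd and hupsyslog has no
effect on that file system even when the permissions look right.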

The last thing I mentioned was checking the request lines in
~ldm/etc/ldmd.conf to make sure that there were no typos in the request
pattern (unprintable characters, etc.).
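
Unprintable characters are hard to spot in an editor, so a quick scan
can help.  This is a sketch under assumptions: find_unprintable and
LDMCONF are illustrative names, and the scan covers the whole file, not
just the request lines.  With LC_ALL=C any non-ASCII byte is reported
as well.

```shell
#!/bin/sh
# Hypothetical helper: print the lines (with line numbers) that contain
# bytes that are neither printable nor whitespace.  LC_ALL=C makes the
# character classes byte-based.
find_unprintable() {
    LC_ALL=C grep -n '[^[:print:][:space:]]' "$1"
}

# LDMCONF is an assumption; point it at your ~ldm/etc/ldmd.conf.
LDMCONF=${LDMCONF:-${HOME:-.}/etc/ldmd.conf}
if [ -r "$LDMCONF" ]; then
    find_unprintable "$LDMCONF" ||
        echo "$LDMCONF: no unprintable characters found"
fi
```

Any line that is printed is worth retyping by hand.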

Again, my comments to Brian were in reaction to 'notifyme' running
successfully and 'rpc.ldmd' requests not getting any of the data.

Cheers,

Tom

>________________________________
>
>From: Biggerstaff, Brice A9 [mailto:address@hidden
>Sent: Tue 1/31/2006 8:57 AM
>To: Petit, Jackie
>Cc: Sims, Vashon J (MSFC Secondary); Oram, Timothy D. (JSC-ZS8); Batson,
>Bryan; Schaffert, Lowell; Sautter, David; Hoeth, Brian R. (JSC-ZS8)[LM];
>Zickert, Glenn A; address@hidden
>Subject: RE: RSA Comm Test (Weather - JSC) LDM Troubleshooting
>
>
>
>Jackie, et al,
>
>  Just wanted everyone else to see the text of a note that I sent to
>Steve Emmerson - without the ldm logs that I sent to him and a few of
>you.  If you got it double, that's why.
>
>       Steve,
>
>         Thanks for the information about the multi-homed servers.  I am
>enclosing the reciprocal logs for the timeframe that Jackie sent you.
>They are tar'd up and gzipped.  When you unzip it, there won't be a
>'.tar' extension, but it is a standard tar file.  To clarify some
>information about our situation.  The "SA" and "GP" files that Jackie
>referred to are all transmitted as LDM "EXP" files.  Internally they are
>surface observation and GPsonde balloon data respectively, in ASCII
>format.  On my side I am putting them in using a simple pqinsert
>manually.  Jackie is 'bent-piping' a data feed of similar data files
>from local data receivers, wind sensors, radar profilers, etc., all in
>ASCII files transmitted as LDM EXP.  She receives the data feed from her
>local systems and I can see that her server is getting it by using a
>notifyme command on our server, which you will be able to see when you
>check the logs.  However, none of the data is ever received at either
>end.  Until Friday that is.  On Friday Jackie was able to receive two of
>the surface ob files, but she never received any of the balloon files
>that I tried to send.  The major difference between the surface ob files
>and the balloon files, as I see it, is size.  Surface ob files are ~ 240
>bytes and the balloon files are ~18K.
>
>         I didn't get a chance to poke Brian about getting to Tom Yoksas at
>the AMS conference, but I'm sure he will if he gets a chance.  As for
>letting you login in to the systems, the Cape system doesn't have any
>Internet connection that I'm aware of and ours is a restricted no
>incoming login, so that won't be doable, I'm afraid.  Appreciate your
>time and assistance, Steve. We are just not seeing what we expect from
>our experience.
>
>  Also, Jackie,  I will be on-site close to the LDM server this week
>working on another piece of our system.  So, I won't be seeing my email
>except briefly in the mornings.  If you want to try anything else or if
>you want to get a conference call set up with Steve or such, try me at
>281-483-2270 or the operators at 281-483-1045.  I'll make sure to check
>in with them and they should be able to find me most of the time.  You
>can try my pager at 713-764-2601, but in the bowels of the control
>center, it is sometimes unreliable.
>
>Brice
>
>
>-----Original Message-----
>From: Steve Emmerson [mailto:address@hidden <mailto:address@hidden>]
>Sent: Friday, January 27, 2006 5:24 PM
>To: Petit, Jackie
>Cc: address@hidden; Sims, Vashon J (MSFC Secondary); Oram,
>Timothy D. (JSC-ZS8); Batson, Bryan; Schaffert, Lowell; Sautter, David;
>Hoeth, Brian R. (JSC-ZS8)[LM]; Biggerstaff, Brice A9; Zickert, Glenn A;
>address@hidden
>Subject: Re: RSA Comm Test (Weather - JSC) LDM Troubleshooting
>
>Jackie,
>
>>Date: Fri, 27 Jan 2006 17:07:01 -0500
>>From: "Petit, Jackie" <address@hidden>
>>Organization: UCAR/Unidata
>>To: "Steve Emmerson (E-mail)" <address@hidden>
>>Subject: RE: RSA Comm Test (Weather - JSC) LDM Troubleshooting
>
>The above message contained the following:
>
>> Brice, Tim and I got together and did some troubleshooting and I was
>> able to see some files that they sent.  For some reason I could only
>> see type SA files (SA04 and SA11).
>
>I'm afraid that I don't know what "SA04" and "SA11" files are.
>
>> When he tried to send GP type files, nothing came through.
>
>I'm afraid I don't know what "GP" files are, either.  Sorry.
>
>In general, for a downstream LDM to be able to receive certain
>data-products from an upstream LDM, the following must be true:
>
>    1.  The upstream LDM must receive the data-products.  This can be
>        verified by executing, on the upstream LDM's host, the command
>
>            pqcat -vl- -f <<feedtype>> -p <<pattern>> -o <<offset>>
>
>        where
>            <<feedtype>>        is the feedtype of the data-products
>                                (e.g., EXP).
>
>            <<pattern>>         is the extended regular expression for
>                                the product-identifier of the
>                                data-products.
>
>            <<offset>>          is the time-offset in seconds by which
>                                to go back in the product-queue to find
>                                matching data-products (e.g., 300 for 5
>                                minutes).
>
>    2.  The downstream LDM must be able to connect to the upstream LDM.
>        This can be verified by executing, on the downstream host, the
>        command
>
>            ldmping -i 0 <<upstream host>>
>
>        where <<upstream host>> is the identifier for the upstream host
>        (either hostname, fully-qualified hostname, or IP address).
>
>    3.  The downstream LDM must be allowed to receive the requested
>        class of data-products from the upstream LDM (i.e., the LDM
>        configuration-file on the upstream LDM must have appropriate
>        entries).
>
>These three items can be combined into executing, on the downstream
>host, the single command
>
>    notifyme -h <<upstream host>> -f <<feedtype>> -p <<pattern>> -o <<offset>>
>
>If a downstream LDM process is unable to connect to the upstream LDM
>server, then the following command can be useful in diagnosing problems:
>
>    rpcinfo -n 388 -t <<upstream host>> 300029 6
>
>This command attempts to contact version 6 of program 300029 (the LDM)
>via a TCP connection to port 388 on host <<upstream host>>.  Because
>this command is non-standard, it might be necessary to adapt it to your
>system by using different options.
>
>> They were never able to see files from us but did get notified of our
>> files on the notifier.  (Brice, please elaborate.)
>
>Notifier?
>
>> They had to get ready for a power outage so Brice asked if I would
>> send you an E-mail to find out if having two ethernet ports could
>> cause a problem with ldm.
>
>By default, the LDM server will listen for incoming connections on all
>available interfaces.  This is, usually, not a problem.  We're running
>the LDM on several multi-homed computers here.
>
>This default can be overridden via the $ip_addr variable in the file
>"etc/ldmadmin-pl.conf".
>
>> They use the workstation as a bridge/firewall between their LAN and
>> ours.  He thinks it may be getting confused since he sees mention of
>> an ldm5 and ldm6.  We only see ldm6 referenced in our log (see
>> attached) and only have one ethernet port.
>
>Looking at just one downstream LDM process on host "rsaintrf", I see the
>following at the beginning of the log file:
>
>    Jan 27 21:10:40 ftpsvr rsaintrf[20183] NOTE: Starting Up(6.4.2): rsaintrf.midds.jsc.nasa.gov:388 20060127201040.869 TS_ENDT {{ANY,  ".*"}}
>    Jan 27 21:10:40 ftpsvr rsaintrf[20183] NOTE: LDM-6 desired product-class: 20060127210938.347 TS_ENDT {{ANY,  ".*"},{NONE,  "SIG=9b7a056982e167351f69140376671e58"}}
>    Jan 27 21:10:41 ftpsvr rsaintrf[20183] NOTE: Upstream LDM-6 on rsaintrf.midds.jsc.nasa.gov is willing to be a primary feeder
>    Jan 27 21:21:01 ftpsvr rsaintrf[20183] ERROR: Terminating due to LDM failure; Connection to upstream LDM closed
>    Jan 27 21:21:01 ftpsvr rsaintrf[20183] NOTE: LDM-6 desired product-class: 20060127211936.115 TS_ENDT {{ANY,  ".*"},{NONE,  "SIG=4330b0a1b311f387d2038d03dd7faa67"}}
>    Jan 27 21:21:01 ftpsvr rsaintrf[20183] ERROR: Terminating due to LDM failure; Couldn't connect to LDM on rsaintrf.midds.jsc.nasa.gov using either port 388 or portmapper; : RPC: Program not registered
>
>    ...=20
>
>The above indicates that, after an initial, successful connection to the
>upstream LDM on host "ftpsvr" (from 21:10:41 to 21:21:01) the downstream
>LDM on "rsaintrf" lost the connection and was unable to reconnect
>because the upstream LDM wasn't available: it was unable to create a TCP
>connection to port 388 on the upstream host (because nothing was
>listening on that port) and the LDM wasn't registered with the
>portmapper on any other port on the upstream host.
>
>The log file also contains the following:
>
>    Jan 27 21:11:03 ftpsvr rsaintrf[20348] NOTE: Data-product with signature 60f97e00a793afb4f67c7dd94fe46e41 wasn't found in product-queue
>    Jan 27 21:11:03 ftpsvr rsaintrf(feed)[20348] NOTE: Starting Up(6.4.2/6): 20060127205131.054 TS_ENDT {{ANY,  ".*"}}, Primary
>    Jan 27 21:11:03 ftpsvr rsaintrf(feed)[20348] NOTE: topo:  rsaintrf.midds.jsc.nasa.gov {{ANY, (.*)}}
>    Jan 27 21:27:11 ftpsvr rsaintrf(feed)[20348] ERROR: feed or notify failure; HEREIS: RPC: Unable to send; errno = Broken pipe
>    Jan 27 21:27:11 ftpsvr rpc.ldmd[20180] NOTE: child 20348 exited with status 7
>
>The above indicates that an upstream LDM process was started on host
>"rsaintrf" feeding data-products of feedtype/pattern ANY/.* to a
>downstream LDM on host "ftpsvr" using primary exchange mode.  This
>process lasted from 21:11:03 to 21:27:11 at which time the upstream LDM
>was unable to send a data-product to the downstream LDM because the
>connection was broken for some reason (the reason might be found in the
>LDM log file on host "ftpsvr").  At this time, the upstream LDM on host
>"rsaintrf" exited.
>
>I hope this helps.  Feel free to contact me with any questions.  Also,
>if I can log onto the systems in question as the LDM user, then I should
>be able to more easily diagnose any problems.
>
>Incidentally, Tom Yoksas will also be at the AMS meeting in Atlanta,
>where he will be presenting several papers on the LDM and Internet data
>distribution.  He has considerable experience diagnosing connectivity
>problems in LDM networks.  You might tell Brian Hoeth to look him up.
>
>He'll have a laptop and the two of them might be able to solve all your
>problems while at the convention.
>
>Regards,
>Steve Emmerson
>LDM Developer
>
>
>------_=_NextPart_001_01C62738.57610B47
>Content-Type: text/html;
>       charset="iso-8859-1"
>Content-Transfer-Encoding: quoted-printable
>
><META HTTP-EQUIV=3D"Content-Type" CONTENT=3D"text/html; =
>charset=3Diso-8859-1">=0A=
><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">=0A=
><HTML>=0A=
><HEAD>=0A=
>=0A=
><META NAME=3D"Generator" CONTENT=3D"MS Exchange Server version =
>6.0.6603.0">=0A=
><TITLE>RE: RSA Comm Test (Weather - JSC) LDM Troubleshooting</TITLE>=0A=
></HEAD>=0A=
><BODY>=0A=
><DIV id=3DidOWAReplyText25445 dir=3Dltr>=0A=
><DIV dir=3Dltr><FONT face=3DArial color=3D#000000 =
>size=3D2>Brice,</FONT></DIV>=0A=
><DIV dir=3Dltr><FONT face=3DArial size=3D2></FONT>&nbsp;</DIV>=0A=
><DIV dir=3Dltr><FONT face=3DArial size=3D2>I did run into Tom and his =
>theory is =0A=
>that&nbsp;step 9&nbsp;of installing the LDM (<A =0A=
>href=3D"http://www.unidata.ucar.edu/software/ldm/ldm-6.4.4/basics/source-=
>install-steps.html">http://www.unidata.ucar.edu/software/ldm/ldm-6.4.4/ba=
>sics/source-install-steps.html</A>) =0A=
>on our (JSC) side was either not performed or that it was not =
>successfully =0A=
>completed.&nbsp; This step is as such:</FONT></DIV>=0A=
><DIV dir=3Dltr><FONT face=3DArial size=3D2></FONT>&nbsp;</DIV>=0A=
><DIV dir=3Dltr>&nbsp;</DIV>=0A=
><DIV dir=3Dltr>9.&nbsp; Install some components with superuser =
>privileges: </DIV>=0A=
><DIV dir=3Dltr>&nbsp;</DIV>=0A=
><DIV dir=3Dltr>&nbsp;&nbsp;&nbsp; su root -c 'make install_setuids'</DIV>=0A=
><DIV dir=3Dltr>&nbsp;</DIV>=0A=
><DIV dir=3Dltr>Note that you need the superuser's password to accomplish =
>this =0A=
>step. </DIV>=0A=
><DIV dir=3Dltr>&nbsp;</DIV>=0A=
><DIV dir=3Dltr>This step is necessary for the <A =
>href=3D"glindex.html#LDM">LDM</A> =0A=
>server to listen on port 388 (which is a restricted port) and for the =0A=
><B><CODE>hupsyslog(1)</CODE></B> utility to notify the =0A=
><B><CODE>syslogd(8)</CODE></B> daemon when a new log-file has been =0A=
>created.</DIV>=0A=
><DIV dir=3Dltr>&nbsp;</DIV></DIV>=0A=
><P dir=3Dltr>He was thinking that this step either&nbsp;wasn't done or =
>it was =0A=
>attempted, but some security setting was in place to disallow the =
>setting of =0A=
>user ids.&nbsp; I talked briefly with Tim O about this on the phone =0A=
>yesterday.&nbsp; Would one or both of you please check on this? </P>=0A=
><P dir=3Dltr>Thanks!</P>=0A=
><P dir=3Dltr>Brian</P>=0A=
><DIV dir=3Dltr>=0A=
><DIV dir=3Dltr><FONT face=3DArial size=3D2></FONT>&nbsp;</DIV></DIV>=0A=
><DIV dir=3Dltr><BR>=0A=
><HR tabIndex=3D-1>=0A=
><FONT face=3DTahoma size=3D2><B>From:</B> Biggerstaff, Brice A9 =0A=
>[mailto:address@hidden<BR><B>Sent:</B> Tue 1/31/2006 =
>8:57 =0A=
>AM<BR><B>To:</B> Petit, Jackie<BR><B>Cc:</B> Sims, Vashon J (MSFC =
>Secondary); =0A=
>Oram, Timothy D. (JSC-ZS8); Batson, Bryan; Schaffert, Lowell; Sautter, =
>David; =0A=
>Hoeth, Brian R. (JSC-ZS8)[LM]; Zickert, Glenn A; =0A=
>address@hidden<BR><B>Subject:</B> RE: RSA Comm Test (Weather - =
>JSC) LDM =0A=
>Troubleshooting<BR></FONT><BR></DIV>=0A=
><DIV>=0A=
><P><SPAN lang=3Den-us><FONT face=3DArial size=3D2>Jackie, et =
>al,</FONT></SPAN> </P>=0A=
><P><SPAN lang=3Den-us><FONT face=3DArial size=3D2>&nbsp; Just</FONT> =
><FONT face=3DArial =0A=
>size=3D2>wanted everyone else to see the text of a note that I sent to =
>Steve =0A=
>Emmerson - without the ldm logs that I sent to him and a few of =
>you.&nbsp; If =0A=
>you got it double, that's why.</FONT></SPAN></P>=0A=
><UL>=0A=
>  <P><SPAN lang=3Den-us><FONT face=3DArial color=3D#0000ff =
>size=3D2>Steve</FONT><FONT =0A=
>  face=3DArial color=3D#0000ff size=3D2>,</FONT></SPAN> </P>=0A=
>  <P><SPAN lang=3Den-us><FONT face=3DArial color=3D#0000ff =
>size=3D2>&nbsp; Thanks for =0A=
>  the information about the multi-homed servers.&nbsp; I am enclosing =
>the =0A=
>  reciprocal logs for the timeframe that Jackie sent you.&nbsp; They are =
>tar'd =0A=
>  up and gzipped.&nbsp; When you unzip it, there won't be a '.tar' =
>extension, =0A=
>  but it is a standard tar file.&nbsp; To clarify some information about =
>our =0A=
>  situation.&nbsp; The "SA" and "GP" files that Jackie referred to are =
>all =0A=
>  transmitted as LDM "EXP" files.&nbsp; Internally they are surface =
>observation =0A=
>  and GPsonde balloon data respectively, in ASCII format.&nbsp; On my =
>side I am =0A=
>  putting them in using a simple pqinsert manually.&nbsp; Jackie is =0A=
>  'bent-piping' a data feed of similar data files from local data =
>receivers, =0A=
>  wind sensors, radar profilers, etc., all in ASCII files transmitted as =
>LDM =0A=
>  EXP.&nbsp; She receives the data feed from her local systems and I can =
>see =0A=
>  that her server is getting it by using a notifyme command on our =
>server, which =0A=
>  you will be able to see when you check the logs.&nbsp; However, none =
>of the =0A=
>  data is ever received at either end.&nbsp; Until Friday that is.&nbsp; =
>On =0A=
>  Friday Jackie was able to receive two of the surface ob files, but she =
>never =0A=
>  received any of the balloon files that I tried to send.&nbsp; The =
>major =0A=
>  difference between the surface ob files and the balloon files, as I =
>see it, is =0A=
>  size.&nbsp; Surface ob files are ~ 240 bytes and the balloon files are =0A=
>  ~18K.&nbsp; </FONT></SPAN></P>=0A=
>  <P><SPAN lang=3Den-us><FONT face=3DArial color=3D#0000ff =
>size=3D2>&nbsp; I didn't get =0A=
>  a chance to poke Brian about getting to Tom Yoksas at the AMS =
>conference, but =0A=
>  I'm sure he will if he gets a chance.&nbsp; As for letting you login =
>in to the =0A=
>  systems, the Cape system doesn't have any Internet connection that I'm =
>aware =0A=
>  of and ours is a restricted no incoming login, so that won't be =
>doable, I'm =0A=
>  afraid.&nbsp; Appreciate your time and assistance, Steve. We are just =
>not =0A=
>  seeing what we expect from our experience.</FONT></SPAN></P></UL>=0A=
><P><SPAN lang=3Den-us><FONT face=3DArial size=3D2>&nbsp; Also, =
>Jackie,&nbsp; I will be =0A=
>on-site close to the LDM server this week working on another piece of =
>our =0A=
>system.&nbsp; So, I won't be seeing my email except briefly in the =0A=
>mornings.&nbsp; If you want to try anything else or if you want to get a =0A=
>conference call set up with Steve or such, try me at 281-483-2270 or the =0A=
>operators at 281-483-1045.&nbsp; I'll make sure to check in with them =
>and they =0A=
>should be able to find me most of the time.&nbsp; You can try my pager =
>at =0A=
>713-764-2601, but in the bowels of the control center, it is sometimes =0A=
>unreliable.</FONT></SPAN></P>=0A=
><P><SPAN lang=3Den-us><FONT face=3DArial size=3D2>Brice</FONT></SPAN> =
></P><BR>=0A=
><P><SPAN lang=3Den-us><FONT face=3DArial size=3D2>-----Original =0A=
>Message-----</FONT></SPAN> <BR><SPAN lang=3Den-us><FONT face=3DArial =
>size=3D2>From: =0A=
>Steve Emmerson [</FONT></SPAN><A =
>href=3D"mailto:address@hidden";><SPAN =0A=
>lang=3Den-us><U><FONT face=3DArial color=3D#0000ff =0A=
>size=3D2>mailto:address@hidden</FONT></U></SPAN></A><SPAN =
>lang=3Den-us><FONT =0A=
>face=3DArial size=3D2>] </FONT></SPAN><BR><SPAN lang=3Den-us><FONT =
>face=3DArial =0A=
>size=3D2>Sent: Friday, January 27, 2006 5:24 PM</FONT></SPAN> <BR><SPAN =0A=
>lang=3Den-us><FONT face=3DArial size=3D2>To: Petit, Jackie</FONT></SPAN> =
><BR><SPAN =0A=
>lang=3Den-us><FONT face=3DArial size=3D2>Cc: =
>address@hidden; Sims, =0A=
>Vashon J (MSFC Secondary); Oram, Timothy D. (JSC-ZS8); Batson, Bryan; =
>Schaffert, =0A=
>Lowell; Sautter, David; Hoeth, Brian R. (JSC-ZS8)[LM]; Biggerstaff, =
>Brice A9; =0A=
>Zickert, Glenn A; address@hidden</FONT></SPAN></P>=0A=
><P><SPAN lang=3Den-us><FONT face=3DArial size=3D2>Subject: Re: RSA Comm =
>Test (Weather =0A=
>- JSC) LDM Troubleshooting</FONT></SPAN> </P>=0A=
><P><SPAN lang=3Den-us><FONT face=3DArial size=3D2>Jackie,</FONT></SPAN> =
></P>=0A=
><P><SPAN lang=3Den-us><FONT face=3DArial size=3D2>&gt;Date: Fri, 27 Jan =
>2006 17:07:01 =0A=
>-0500</FONT></SPAN> <BR><SPAN lang=3Den-us><FONT face=3DArial =
>size=3D2>&gt;From: =0A=
>"Petit, Jackie" &lt;address@hidden&gt;</FONT></SPAN> <BR><SPAN =0A=
>lang=3Den-us><FONT face=3DArial size=3D2>&gt;Organization: =
>UCAR/Unidata</FONT></SPAN> =0A=
><BR><SPAN lang=3Den-us><FONT face=3DArial size=3D2>&gt;To: "Steve =
>Emmerson (E-mail)" =0A=
>&lt;address@hidden&gt;</FONT></SPAN> <BR><SPAN =
>lang=3Den-us><FONT =0A=
>face=3DArial size=3D2>&gt;Subject: RE: RSA Comm Test (Weather - JSC) LDM =0A=
>Troubleshooting</FONT></SPAN> </P>=0A=
><P><SPAN lang=3Den-us><FONT face=3DArial size=3D2>The above message =
>contained the =0A=
>following:</FONT></SPAN> </P>=0A=
><P><SPAN lang=3Den-us><FONT face=3DArial size=3D2>&gt; Brice, Tim and I =
>got together =0A=
>and did some troubleshooting and I was </FONT></SPAN><BR><SPAN =
>lang=3Den-us><FONT =0A=
>face=3DArial size=3D2>&gt; able to see some files that they sent.&nbsp; =
>For some =0A=
>reason I could only </FONT></SPAN><BR><SPAN lang=3Den-us><FONT =
>face=3DArial =0A=
>size=3D2>&gt; see type SA files (SA04 and SA11).</FONT></SPAN> </P>=0A=
><P><SPAN lang=3Den-us><FONT face=3DArial size=3D2>I'm afraid that I =
>don't know what =0A=
>"SA04" and "SA11" files are.</FONT></SPAN> </P>=0A=
><P><SPAN lang=3Den-us><FONT face=3DArial size=3D2>&gt; When he tried to =
>send GP type =0A=
>files, nothing came through.</FONT></SPAN> </P>=0A=
><P><SPAN lang=3Den-us><FONT face=3DArial size=3D2>I'm afraid I don't =
>know what "GP" =0A=
>files are, either.&nbsp; Sorry.</FONT></SPAN> </P>=0A=
><P><SPAN lang=3Den-us><FONT face=3DArial size=3D2>In general, for a =
>downstream LDM to =0A=
>be able to receive certain data-products from an upstream LDM, the =
>following =0A=
>must be true:</FONT></SPAN></P>=0A=
><P><SPAN lang=3Den-us><FONT face=3DArial size=3D2>&nbsp;&nbsp;&nbsp; =
>1.&nbsp; The =0A=
>upstream LDM must receive the data-products.&nbsp; This can =
>be</FONT></SPAN> =0A=
><BR><SPAN lang=3Den-us>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <FONT =
>face=3DArial =0A=
>size=3D2>verified by executing, on the upstream LDM's host, the =0A=
>command</FONT></SPAN> </P>=0A=
><P><SPAN lang=3Den-us>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <FONT =
>face=3DArial =0A=
>size=3D2>&nbsp;&nbsp;&nbsp; pqcat -vl- -f &lt;&lt;feedtype&gt;&gt; -p =0A=
>&lt;&lt;pattern&gt;&gt; -o &lt;&lt;offset&gt;&gt;</FONT></SPAN> </P>=0A=
><P><SPAN lang=3Den-us>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <FONT =
>face=3DArial =0A=
>size=3D2>where</FONT></SPAN> <BR><SPAN =0A=
>lang=3Den-us>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <FONT =
>face=3DArial =0A=
>size=3D2>&nbsp;&nbsp;&nbsp; =0A=
>&lt;&lt;feedtype&gt;&gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; is =
>the =0A=
>feedtype of the data-products (e.g.,</FONT></SPAN> <BR><SPAN =0A=
>lang=3Den-us>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =0A=
>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =0A=
>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =0A=
>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <FONT face=3DArial =0A=
>size=3D2>EXP)</FONT></SPAN> </P>=0A=
><P><SPAN lang=3Den-us>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <FONT =
>face=3DArial =0A=
>size=3D2>&nbsp;&nbsp;&nbsp; &lt;&lt;pattern&gt;&gt; =0A=
>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; is the extended regular =
>expression =0A=
>for</FONT></SPAN> <BR><SPAN =0A=
>lang=3Den-us>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =0A=
>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =0A=
>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =0A=
>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <FONT face=3DArial =
>size=3D2>the =0A=
>product-identifier of the data-aproducts.</FONT></SPAN> </P>=0A=
><P><SPAN lang=3Den-us>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <FONT =
>face=3DArial =0A=
>size=3D2>&nbsp;&nbsp;&nbsp; &lt;&lt;offset&gt;&gt;&nbsp; =0A=
>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Is the time-offset in seconds =
>in =0A=
>which</FONT></SPAN> <BR><SPAN =0A=
>lang=3Den-us>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =0A=
>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =0A=
>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =0A=
>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <FONT face=3DArial =
>size=3D2>to go back in =0A=
>the product-queue to find</FONT></SPAN> <BR><SPAN =0A=
>lang=3Den-us>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =0A=
>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =0A=
>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =0A=
>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <FONT face=3DArial =
>size=3D2>matching =0A=
>data-products (e.g., 300 for 5</FONT></SPAN> <BR><SPAN =0A=
>lang=3Den-us>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =0A=
>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =0A=
>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =0A=
>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <FONT face=3DArial =0A=
>size=3D2>minutes).</FONT></SPAN> </P>=0A=
><P><SPAN lang=3Den-us><FONT face=3DArial size=3D2>&nbsp;&nbsp;&nbsp; =
>2.&nbsp; The =0A=
>downstream LDM must be able to connect to the upstream =
>LDM.</FONT></SPAN> =0A=
><BR><SPAN lang=3Den-us>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <FONT =
>face=3DArial =0A=
>size=3D2>This can be verified by executing, on the downstream host, =0A=
>the</FONT></SPAN> <BR><SPAN =0A=
>lang=3Den-us>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <FONT =
>face=3DArial =0A=
>size=3D2>command</FONT></SPAN> </P>=0A=
><P><SPAN lang=3Den-us>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <FONT =
>face=3DArial =0A=
>size=3D2>&nbsp;&nbsp;&nbsp; ldmping -i 0 &lt;&lt;upstream =0A=
>host&gt;&gt;</FONT></SPAN> </P>=0A=
><P><SPAN lang=3Den-us>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <FONT =
>face=3DArial =0A=
>size=3D2>where &lt;&lt;upstream host&gt;&gt; is the identifier for the =
>upstream =0A=
>host</FONT></SPAN> <BR><SPAN =0A=
>lang=3Den-us>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <FONT =
>face=3DArial =0A=
>size=3D2>(either hostname, fully-qualified hostname, or IP =
>address).</FONT></SPAN> =0A=
></P>=0A=
><P><SPAN lang=3Den-us><FONT face=3DArial size=3D2>&nbsp;&nbsp;&nbsp; =
>3.&nbsp; The =0A=
>downstream LDM must be allowed to receive the requested</FONT></SPAN> =
><BR><SPAN =0A=
>lang=3Den-us>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <FONT =
>face=3DArial =0A=
>size=3D2>class of data-products from the upstream LDM (i.e., the =
>LDM</FONT></SPAN> =0A=
><BR><SPAN lang=3Den-us>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <FONT =
>face=3DArial =0A=
>size=3D2>configuration-file on the upstream LDM must have =0A=
>appropriate</FONT></SPAN> <BR><SPAN =0A=
>lang=3Den-us>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <FONT =
>face=3DArial =0A=
>size=3D2>entries).</FONT></SPAN> </P>=0A=
><P><SPAN lang=3Den-us><FONT face=3DArial size=3D2>These three items can =
>be combined =0A=
>into executing, on the downstream host, the single command</FONT></SPAN> =
></P>=0A=
><P><SPAN lang=3Den-us><FONT face=3DArial size=3D2>&nbsp;&nbsp;&nbsp; =
>notifyme -h =0A=
>&lt;&lt;upstream host&gt;&gt; -f &lt;&lt;feedtype&gt;&gt; -p =0A=
>&lt;&lt;pattern&gt;&gt; -o &lt;&lt;offset&gt;&gt;</FONT></SPAN> </P>=0A=
><P><SPAN lang=3Den-us><FONT face=3DArial size=3D2>If a downstream LDM =
>process is =0A=
>unable to connect to the upstream LDM server, then the following command =
>can be =0A=
>useful in diagnosing problems:</FONT></SPAN></P>=0A=
><P><SPAN lang=3Den-us><FONT face=3DArial size=3D2>&nbsp;&nbsp;&nbsp; =
>rpcinfo -n 388 -t =0A=
>&lt;&lt;upstream host&gt;&gt; 300029 6</FONT></SPAN> </P>=0A=
><P><SPAN lang=3Den-us><FONT face=3DArial size=3D2>This command attempts =
>to contact =0A=
>version 6 of program 300029 (the LDM) via a TCP connection to port 388 =
>on host =0A=
>&lt;&lt;upstream host&gt;&gt;.&nbsp; Because this command is =
>non-standard, it =0A=
>might be necessary to adapt it to your system by using different =0A=
>options.</FONT></SPAN></P>=0A=
><P><SPAN lang=3Den-us><FONT face=3DArial size=3D2>&gt; They were never =
>able to see =0A=
>files from us but did get notified of our </FONT></SPAN><BR><SPAN =0A=
>lang=3Den-us><FONT face=3DArial size=3D2>&gt; files on the =
>notifier.&nbsp; (Brice, =0A=
>please elaborate.)</FONT></SPAN> </P>=0A=
><P><SPAN lang=3Den-us><FONT face=3DArial =
>size=3D2>Notifier?</FONT></SPAN> </P>=0A=
><P><SPAN lang=3Den-us><FONT face=3DArial size=3D2>&gt; They had to get =
>ready for a =0A=
>power outage so Brice asked if I would </FONT></SPAN><BR><SPAN =
>lang=3Den-us><FONT =0A=
>face=3DArial size=3D2>&gt; send you an E-mail to find out if having two =
>ethernet =0A=
>ports could </FONT></SPAN><BR><SPAN lang=3Den-us><FONT face=3DArial =
>size=3D2>&gt; =0A=
>cause a problem with ldm.</FONT></SPAN> </P>=0A=
><P><SPAN lang=3Den-us><FONT face=3DArial size=3D2>By default, the LDM =
>server will =0A=
>listen for incoming connections on all available interfaces.&nbsp; This =
>is, =0A=
>usually, not a problem.&nbsp; We're running the LDM on several =
>multi-homed =0A=
>computers here.</FONT></SPAN></P>=0A=
><P><SPAN lang=3Den-us><FONT face=3DArial size=3D2>This default can be =
>overridden via =0A=
>the $ip_addr variable in the file "etc/ldmadmin-pl.conf".</FONT></SPAN> =
></P>=0A=
><P><SPAN lang=3Den-us><FONT face=3DArial size=3D2>&gt; They use the =
>workstation as a =0A=
>bridge/firewall between their LAN and </FONT></SPAN><BR><SPAN =
>lang=3Den-us><FONT =0A=
>face=3DArial size=3D2>&gt; ours.&nbsp; He thinks it may be getting =
>confused since he =0A=
>sees mention of </FONT></SPAN><BR><SPAN lang=3Den-us><FONT face=3DArial =
>size=3D2>&gt; =0A=
>an ldm5 and ldm6.&nbsp; We only see ldm6 referenced in our log (see =0A=
></FONT></SPAN><BR><SPAN lang=3Den-us><FONT face=3DArial size=3D2>&gt; =
>attached) and =0A=
>only have one ethernet port.</FONT></SPAN> </P>=0A=
>Looking at just one downstream LDM process on host "rsaintrf", I see
>the following at the beginning of the log file:
>
>    Jan 27 21:10:40 ftpsvr rsaintrf[20183] NOTE: Starting Up(6.4.2): rsaintrf.midds.jsc.nasa.gov:388 20060127201040.869 TS_ENDT {{ANY,  ".*"}}
>    Jan 27 21:10:40 ftpsvr rsaintrf[20183] NOTE: LDM-6 desired product-class: 20060127210938.347 TS_ENDT {{ANY,  ".*"},{NONE,  "SIG=9b7a056982e167351f69140376671e58"}}
>    Jan 27 21:10:41 ftpsvr rsaintrf[20183] NOTE: Upstream LDM-6 on rsaintrf.midds.jsc.nasa.gov is willing to be a primary feeder
>    Jan 27 21:21:01 ftpsvr rsaintrf[20183] ERROR: Terminating due to LDM failure; Connection to upstream LDM closed
>    Jan 27 21:21:01 ftpsvr rsaintrf[20183] NOTE: LDM-6 desired product-class: 20060127211936.115 TS_ENDT {{ANY,  ".*"},{NONE,  "SIG=4330b0a1b311f387d2038d03dd7faa67"}}
>    Jan 27 21:21:01 ftpsvr rsaintrf[20183] ERROR: Terminating due to LDM failure; Couldn't connect to LDM on rsaintrf.midds.jsc.nasa.gov using either port 388 or portmapper; : RPC: Program not registered
>    ...
>
>The above indicates that, after an initial, successful connection to
>the upstream LDM on host "ftpsvr" (from 21:10:41 to 21:21:01), the
>downstream LDM on "rsaintrf" lost the connection and was unable to
>reconnect because the upstream LDM wasn't available: it was unable to
>create a TCP connection to port 388 on the upstream host (because
>nothing was listening on that port) and the LDM wasn't registered with
>the portmapper on any other port on the upstream host.
>
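
The first of the two failure causes named above (nothing listening on
port 388) can be checked from the downstream side with a plain TCP
probe.  A minimal Python sketch follows; can_connect is a hypothetical
helper, and a successful connect shows only that something accepts
connections on the port, not that it is an LDM:

```python
import socket

LDM_PORT = 388  # the port named in the log messages quoted above


def can_connect(host, port=LDM_PORT, timeout=5.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection resolves the name and tries each address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-lookup failures.
        return False
```

A False result for the upstream host would be consistent with the
"Couldn't connect" message, although a firewall that silently drops
packets looks the same as a missing listener from this side.
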
>The log file also contains the following:
>
>    Jan 27 21:11:03 ftpsvr rsaintrf[20348] NOTE: Data-product with signature 60f97e00a793afb4f67c7dd94fe46e41 wasn't found in product-queue
>    Jan 27 21:11:03 ftpsvr rsaintrf(feed)[20348] NOTE: Starting Up(6.4.2/6): 20060127205131.054 TS_ENDT {{ANY,  ".*"}}, Primary
>    Jan 27 21:11:03 ftpsvr rsaintrf(feed)[20348] NOTE: topo:  rsaintrf.midds.jsc.nasa.gov {{ANY, (.*)}}
>    Jan 27 21:27:11 ftpsvr rsaintrf(feed)[20348] ERROR: feed or notify failure; HEREIS: RPC: Unable to send; errno = Broken pipe
>    Jan 27 21:27:11 ftpsvr rpc.ldmd[20180] NOTE: child 20348 exited with status 7
>
>The above indicates that an upstream LDM process was started on host
>"rsaintrf" feeding data-products of feedtype/pattern ANY/.* to a
>downstream LDM on host "ftpsvr" using primary exchange mode.  This
>process lasted from 21:11:03 to 21:27:11, at which time the upstream
>LDM was unable to send a data-product to the downstream LDM because
>the connection was broken for some reason (the reason might be found
>in the LDM log file on host "ftpsvr").  At this time, the upstream LDM
>on host "rsaintrf" exited.
>
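
The process lifetimes quoted above can be recomputed from the syslog
timestamps.  A minimal Python sketch, assuming the line layout seen in
these excerpts; process_lifetimes is a hypothetical helper, and the year
is an assumption because syslog lines omit it:

```python
import re
from datetime import datetime

# Layout assumed from the excerpts above, e.g.
# "Jan 27 21:11:03 ftpsvr rsaintrf(feed)[20348] NOTE: ..."
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+"
    r"(?P<ident>[^\[\s]+)\[(?P<pid>\d+)\]\s+(?P<level>\w+):\s+(?P<msg>.*)$"
)


def process_lifetimes(lines, year=2006):
    """Map each logging PID to (first_seen, last_seen, seconds_between)."""
    seen = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that do not follow the assumed layout
        when = datetime.strptime(f"{year} {m['stamp']}", "%Y %b %d %H:%M:%S")
        pid = int(m["pid"])
        first, _ = seen.get(pid, (when, when))
        seen[pid] = (first, when)
    return {
        pid: (first, last, (last - first).total_seconds())
        for pid, (first, last) in seen.items()
    }
```

For the PID 20348 entries at 21:11:03 and 21:27:11 this gives 968
seconds, a little over 16 minutes.  It assumes all lines fall within one
calendar year.
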
>I hope this helps.  Feel free to contact me with any questions.  Also,
>if I can log onto the systems in question as the LDM user, then I
>should be able to more easily diagnose any problems.
>
>Incidentally, Tom Yoksas will also be at the AMS meeting in Atlanta,
>where he will be presenting several papers on the LDM and Internet
>data distribution.  He has considerable experience diagnosing
>connectivity problems in LDM networks.  You might tell Brian Hoeth to
>look him up.
>
>He'll have a laptop and the two of them might be able to solve all
>your problems while at the convention.
>
>Regards,
>Steve Emmerson
>LDM Developer
></DIV>=0A=
>=0A=
></BODY>=0A=
></HTML>
>------_=_NextPart_001_01C62738.57610B47--
Cheers,

Tom
--
+-----------------------------------------------------------------------------+
* Tom Yoksas                                             UCAR Unidata Program *
* (303) 497-8642 (last resort)                                  P.O. Box 3000 *
* address@hidden                                   Boulder, CO 80307 *
* Unidata WWW Service                             http://www.unidata.ucar.edu/*
+-----------------------------------------------------------------------------+


NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.