
20010621: McIDAS-X remote ADDE server on Debian Linux (cont.)



>From: "David B. Bukowski" <address@hidden>
>Organization:  COD
>Keywords:  200106202104.f5KL4N116106

Dave,

re: what doesn't work on Linux that works everywhere else
>What specifically are we looking at?  I might be able to figure it
>out.  :)  If not at least go thru some digging around and find something

Here is what happens:

o a client application makes a request for sounding data to a remote
  server

o inetd (or xinetd on RedHat 7.x Linux) gets the connection request on
  either port 500 (uncompressed ADDE) or 503 (compressed ADDE); it then
  starts whatever program it was configured to start.  This will
  be the script ~mcidas/bin/mcservsh

o mcservsh reads the contents of ~mcidas/.mcenv to get environment
  information needed to run McIDAS applications; it then execs the
  McIDAS executable 'mcserv'

o mcserv reads the request from the client and then decides on what
  ADDE server to start.  In the case of sounding requests, the
  server that is started is vpserv (vertical profile server)

o vpserv looks at the request that the client made and then acts
  on it; the way it acts on the request is fundamentally different,
  however, from the way other ADDE servers work...  if the
  client requested full sounding data (mandatory AND significant
  level upper air data), then vpserv will make ADDE server
  calls to the McIDAS point server, first for the mandatory level
  data and then for the significant level data.  All other McIDAS
  ADDE servers would read directly from data files at this point.
  vpserv is the only server that fulfills an incoming data request 
  by making two, consecutive server requests

o vpserv gets all of the mandatory level data from the point source
  server and then shuts down that transaction;  as it gets the 
  data from the point source server, it is sending it back to
  the client; this all seems to proceed smoothly.

  After all of the mandatory level upper air data has been received by
  vpserv and sent to the client, vpserv makes a request from the point
  server for the significant level data

o at some point during the read (from point server) / write (to the
  client) the connection through the port is broken.  The breaking
  of the connection is not initiated by vpserv, nor is it initiated
  by the client.  An 'strace' of vpserv and the equivalent trace on
  the client machine (e.g., strace on Linux; truss on Sun Solaris;
  etc.) shows that both vpserv and the client are "surprised" by the
  connection going away.  The client is in a read loop that gets an
  error; vpserv is in a read/write loop when the write starts failing.
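
As a rough illustration of that last step, here is a small Python
sketch.  It is not McIDAS code: relay() and demo() are made-up names,
and a local socket pair stands in for the vpserv/client connection.
It only shows how a forwarding loop sees a peer that has gone away:
the first phase goes through, and the next write fails with an error
the loop itself did not cause.

```python
import socket


def relay(chunks, client_sock):
    """Forward each chunk to the client.

    Returns (chunks_sent, error); error is None if every write succeeded.
    """
    sent = 0
    for chunk in chunks:
        try:
            client_sock.sendall(chunk)
        except OSError as exc:  # e.g. EPIPE or ECONNRESET once the peer is gone
            return sent, exc
        sent += 1
    return sent, None


def demo():
    # A connected local socket pair stands in for the server/client link.
    server_end, client_end = socket.socketpair()

    # Phase 1: the "mandatory level" data goes through while the peer is up.
    first = relay([b"mandatory"], server_end)
    client_end.recv(1024)

    # The connection disappears between the two phases.
    client_end.close()

    # Phase 2: the "significant level" write now fails.
    second = relay([b"significant"], server_end)
    server_end.close()
    return first, second
```

Note that the sketch closes the peer on purpose.  In the real problem
neither vpserv nor the client closes anything, which is exactly what
makes the failure puzzling.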

It is as though the OS has shut down the pipe through which vpserv is
communicating back to the client.  Why this would happen is unknown.  I
should point out that transactions through the remote ADDE server for
other data types (e.g., image, single point (not sounding), grid, etc.)
work with no problems in Linux.  Also, the exact same McIDAS code built
and run on RedHat 5.2 runs perfectly.  It is almost as if the shutting
down of the server request for mandatory level data from vpserv is
somehow triggering something in the OS to shut down the connection
through the port that vpserv is using to communicate with the client.

If you are game for some exploring, it would be very useful to find out
if there was any reworking of signals in versions of Linux between
RedHat 5.2 and 6 (the change may actually be between RedHat 6.0 and
6.1; I don't know and I can't test this since I don't have a RedHat 6.0
system to use for testing).  If the change was not in signals, then
what major reworking was done between these versions of Linux that
would cause _something_ to shut down the connection of a process that
was communicating through an open port?

re: logs

>Jun 21 13:04:21 weather pnga2area[8565]: Starting Up
>Jun 21 13:04:21 weather pnga2area[8565]: output file
>pathname: /home/data/mcidas/AREA1140
>Jun 21 13:04:21 weather pnga2area[8565]: unPNG::    57277    322320
>5.6274
>Jun 21 13:04:21 weather pnga2area[8565]: Exiting

These logs should be going into the ~ldm/logs/ldmd.log file.  The
invocation for pnga2area does not say where logging should occur,
so it should go to the same place as the LDM logging:

MCIDAS  ^pnga2area Q. (..) (.*) (.*) (.*) (.*) (........) (....)
        PIPE    -close
        pnga2area -v -d /home/data/mcidas -r \1,\2

Now, one could force logging into ~ldm/logs/ldmd.log by specifying the
-l flag to pnga2area (and other ldm-mcidas decoders):

MCIDAS  ^pnga2area Q. (..) (.*) (.*) (.*) (.*) (........) (....)
        PIPE    -close
        pnga2area -vl /home/ldm/logs/ldmd.log -d /home/data/mcidas -r \1,\2

but this shouldn't be necessary.
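
For reference, the following Python sketch shows how the parenthesized
groups in the pattern above feed the "-r \1,\2" argument.  The product
ID is invented purely for illustration, r_argument() is a made-up
helper, and Python's re module is only standing in for the regular
expression matching that pqact does; for a pattern this simple the
two should agree.

```python
import re

# Pattern copied from the pqact.conf entry above.
PATTERN = r"^pnga2area Q. (..) (.*) (.*) (.*) (.*) (........) (....)"

# Hypothetical product ID, invented for illustration only.
SAMPLE_ID = "pnga2area Q0 AB 1234 SAT BAND 4km 20010621 1300"


def r_argument(product_id):
    """Return 'group1,group2' as passed to -r, or None if there is no match."""
    match = re.match(PATTERN, product_id)
    if match is None:
        return None
    return f"{match.group(1)},{match.group(2)}"
```

With the sample ID, r_argument(SAMPLE_ID) gives the first two captured
fields joined by a comma, which is what pnga2area receives after -r.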

Hmm...  I just logged onto weather and see that ALL LDM logging is going
to syslog:

Jun 21 12:43:01 weather cdstats[30625]: Connection from cdstats.cod.edu
Jun 21 12:43:01 weather cdstats[30625]: Connection reset by peer
Jun 21 12:43:01 weather cdstats[30625]: Exiting
Jun 21 07:43:01 weather sendmail[30626]: NOQUEUE: Null connection from cdstats.cod.edu [10.11.0.63]
Jun 21 12:43:28 weather pqact[13833]: child 30776 exited with status 2
Jun 21 12:43:47 weather pqact[13833]: child 30898 exited with status 2

This means that syslog.conf is not set up the way we are used to for
the LDM.

A quick look on your machine shows that there are ~ldm/log/ldmd.log* files,
but they have not been updated since April 17.  Also, the ownership
on the ldmd.log.1, ldmd.log.2, etc. files was incorrect; they were owned
by root when they should be owned by ldm.
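
If it helps, here is a small Python sketch for spotting log files with
the wrong owner.  owner_name() and wrong_owner() are made-up helper
names, and the path pattern in the usage note is only an assumption
based on the files mentioned above.

```python
import glob
import os
import pwd


def owner_name(uid):
    """Map a numeric uid to a user name, falling back to the number itself."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def wrong_owner(pattern, expected):
    """Return sorted paths matching pattern whose owner is not `expected`."""
    return sorted(
        path for path in glob.glob(pattern)
        if owner_name(os.stat(path).st_uid) != expected
    )
```

Something like wrong_owner(os.path.expanduser("~ldm/log/ldmd.log*"), "ldm")
would then list the files that still need a chown to ldm (adjust the
directory to wherever the LDM logs live on your machine).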

re: continuing with syslog discussion
>actually not running syslogd  running syslog-ng  been trying to figure out
>the filtering configuration on it lately so it basically logs EVERYTHING
>into debug i guess... well all the ldm stuff that normally goes in
>~ldm/logs ends up in /var/log/debug since it is the DEBUG "channel" ldm
>sends stuff to.

I just chatted with Anne Wilson about your use of syslog-ng, and we were
in agreement that we don't know enough about syslog-ng/syslog-ng.conf
to be able to help you.

re: leaving ports 500 and 503 open
>I'll leave them open but would like to know if theres specific machines or
>a netblock i can narrow access down to...  Otherwise i'll just leave them
>open.

How about leaving them open for now?  This will allow me to work from
home during the troubleshooting phase for the sounding serving problem.
Later, you could restrict access to those domains you wouldn't mind
serving data to through ADDE.

Tom