
19991004: ldm-mcidas nids2area causing a Solaris x86 page fault?



>From: Bryan Rockwood <address@hidden>
>Organization: Creighton
>Keywords: 199910042035.OAA02665 McIDAS LDM

Bryan,

I will respond to the first part of this question and pass along the
issue of latency to Robb Kambic for separate reply.

>Figured I hadn't bugged you guys in a few months or so.  Guess I was due.

Just when I thought it was safe to go into the water ;-)

>Some really short questions, then I will be out of your hair.  First off,
>thanks for the suggestion on the SCSI hardware.  We have since upgraded to
>a 9 Gig SCSI drive and it works like a dream.

Super.

>NFS seems even more
>responsive than when I switched from Linux to Solaris.

Well, a large part of this might be that Solaris (both SPARC and x86)
supports NFS 3 and Linux doesn't (yet).

>Also, if anyone is
>interested, the happy medium we ended with was Solaris for the ingest and
>Linux for the clients.  Have been mucho happy ever since.

OK.  We recently loaded an NFS-3 beta on our RedHat 6.0 system (also should
run on RedHat 5.2, but no guarantees) and got _significant_ throughput
improvement in read/write to NFS mounted drives.  It really _IS_ a
big improvement.  In my environment a build of McIDAS-X 7.6 went from
about 4 hours down to an hour and something (I didn't get good numbers).

>Now, the hassle part.  Number one has to do with the binary distros
>available for the ldm-mcidas on the ftp server.  We are running Solaris
>2.6 with the latest patches.  When I was setting things up, I grabbed the
>ldm-mcidas for Solaris 7.0.  Things seemed to work well enough up until
>about three days ago.  I decided to install Apache and symbolically link
>to the logs directory so I could easily keep track of the logs and see if
>there was anything wrong (beyond the normal notification capability).
>Later that day, I came back to see that the server had rebooted.  Being
>gone from the room when it happened, I had wondered if someone had
>restarted the machine by accident.  No one had, so I just wrote it off as
>an odd occurrence.  About two hours later, I was able to catch the machine
>in the middle of a reboot.  The screen read that nids2area had caused a
>page fault and the system was going down.  It rebooted and started the LDM
>once again.  I was just curious if the fact that I am using the Solaris 7
>distro would make a difference and be the cause of the reboot.

I would not have expected this, but I do know that there are significant
differences between Solaris 6 and 7 shared libraries.  This may be the
problem.  I believe that I also stuck out a binary version for Solaris
6 (aka 2.6).  Did you try that one?

>If not, I
>will probably look elsewhere to see what might be up.  It has not done a
>repeat performance to this day, but I figured it couldn't hurt to ask.

I can't answer for sure, so I will not speculate.

I will chime in and say that you would be better off not running
nids2area.  I have included both NIDS and NOWrad (tm) servers in McIDAS
7.[56] that access the NIDS/NOWrad data in native format.  This allows
one to keep _LOTS_ more data on line (the NIDS and NOWrad native
formats are much smaller than the same data in AREA), and also to
allow the data to be shared with GEMPAK/GARP.  The other nice
thing is that the load on your server goes down since all of that
decoding doesn't have to be run.  You should check out the section:

Configuring Unidata NIDS and NOWrad ADDE Servers
http://www.unidata.ucar.edu/packages/mcidas/mcx/config_upcadde.html 

of my online McIDAS installation/configuration documentation.

I will let Robb handle the next bit.

>Second, and more importantly, we are having data reception problems.
>About the middle of the day (from 11 A.M. to 3 P.M.), I get mostly reclass
>messages in the log files for the satellite data.  Also, surface obs are
>noticeably late in getting to the machine and sometimes incomplete.  This
>occurs on both the primary (chinook.unl.edu) and the secondary 
>(weather.admin.niu.edu).  Further analysis from a traceroute shows pings 
>on the order of 500ms during this time of day to both locations.  Now, I 
>would have figured this was due to our split T1 on campus, but found out 
>that about a month ago, they switched from one split T1 to two dedicated 
>T1 lines.  Secondly, both the NIDS data and the lightning data are
>on time.  I know they are smaller data sets, but pings on the 
>traceroutes are about 200 to 300 ms smaller.  Is there something I could
>do in this case?  Would seeking different servers help?  Or would it
>be best to start talks with our ISP (US West) first and go from there?
>Any guidance would be appreciated.  Thanks.

Tom