
20010320: NOAAPORT SDI system at LSU (cont.)



>From: Denise Laitsch <address@hidden>
>Organization: SSEC
>Keywords: 200103201621.f2KGLNL13513 SDI IDD NOAAPORT

Denise,

>We've seen demods fail in the field; not an SDI to date. Maybe you have
>the first one.

OK, this is useful to know.

>If someone can tell us how the ingestor is restarted after a failure,
>it might help sort this out (but don't give up on the scope just yet).
>Do you power off the computer? Do you reset the demod, or something
>else?

Here is what happens:

o the error about clock being lost is seen and ingestion stops

o the 'inge' process cannot be killed

o the computer is rebooted; this can be either a warm reboot (root
  typing 'reboot') or a cold reboot (cycling power on the PC)

o the demod is not touched during this process; the serial cable
  connecting the demod to the SDI card is also not touched

o the ingestion starts up with no problems after the reboot

>Here's why. You've noted that the ingestor isn't restarting on its own
>and you can't manually restart it.

Restarting is not an option given that the 'inge' process cannot even
be killed.

>That would happen because the
>ingestor still has data in its buffer and can't flush it (clock flushes
>the data). So, for example, clock could stop and the ingestor would
>hang (waiting for clock). If you started clock again, the ingestor
>would start on its own and you would see a gap in the data. Noise in
>the signal could cause this to happen.

We figured that this is how things should work.  To test this, we ran
an experiment on the SDI box that is housed here at UCAR.  We
disconnected the serial cable from the demod.  After several seconds,
we got the messages:

NMC 2001.079.184404:SIGNAL NOT PRESENT
NMC 2001.079.184404:clock stopped or ingestor died or hardware died

When we plugged the serial cable back in several seconds later, we got:

NMC 2001.079.184433:begin processing data
NMC 2001.079.184433:bit error
NMC 2001.079.184433:bit error
NMC 2001.079.184433:bit error
 ...

The bit error messages continued briefly and then stopped once
things were synced up again.
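
The timestamps in these messages can be used to measure how long the
signal was gone.  Here is a minimal sketch in Python.  It assumes that
each line has the form 'NMC <timestamp>:<message>' and that the
timestamp is year.day-of-year.HHMMSS (an inference from the messages
above, where day 079 of 2001 is March 20).  The function names are
hypothetical and are not part of the SDI software.

```python
from datetime import datetime

def parse_sdi_line(line):
    """Split an 'NMC <timestamp>:<message>' line into (datetime, message).

    Assumes the timestamp layout is YYYY.DDD.HHMMSS (day of year).
    """
    prefix, message = line.split(":", 1)
    stamp = prefix.split()[1]
    return datetime.strptime(stamp, "%Y.%j.%H%M%S"), message.strip()

def outage_seconds(lines):
    """Return seconds between signal loss and the resumption of processing.

    Returns None if either event is missing from the given lines.
    """
    lost = resumed = None
    for line in lines:
        when, message = parse_sdi_line(line)
        if message == "SIGNAL NOT PRESENT" and lost is None:
            lost = when
        elif message == "begin processing data" and lost is not None:
            resumed = when
            break
    if lost is None or resumed is None:
        return None
    return (resumed - lost).total_seconds()

log = [
    "NMC 2001.079.184404:SIGNAL NOT PRESENT",
    "NMC 2001.079.184404:clock stopped or ingestor died or hardware died",
    "NMC 2001.079.184433:begin processing data",
    "NMC 2001.079.184433:bit error",
]
print(outage_seconds(log))  # 29.0
```

For the experiment above this gives 29 seconds between the loss of
signal and the resumption of processing.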

>On the other hand, if you see that clock resumes and the ingestor is
>still hung, this would point to the SDI card not flushing its buffer,
>i.e., a broken card.

We have been assuming that the symptoms indicate a broken card since:

o the inge process remains hung

o a reboot of the machine _without touching the demod_ results in
  a running system

>In this scenario only a boot or powering the PC
>off would start the ingestor. This might be another way of telling us
>if clock is getting to the card. Maybe someone remembers the events?

The events transpire as I listed them above.

>Just a brief summary... the ingestor is always awake and listening for
>clock and data. If the ingestor is hung and the data chain looks
>normal, try to boot or power off the PC. If the ingestor starts filing
>data immediately following the boot, the card is probably bad.

This is exactly the conclusion we arrived at using the same logic train.
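
The reasoning can be written down as a small decision function.  This
is only a sketch of the logic discussed in this exchange, not anything
taken from the SDI software; the function name, argument names, and
returned strings are all made up for illustration.  The case where the
ingestor stays hung even after a reboot was not discussed, so the
sketch reports it as inconclusive.

```python
def diagnose(ingestor_hung, data_chain_normal, files_data_after_reboot=None):
    """Walk through the troubleshooting steps discussed in this thread.

    ingestor_hung           -- True if the ingestor has stopped filing data
    data_chain_normal       -- True if demod, cable, and clock look normal
    files_data_after_reboot -- None if no reboot has been tried yet,
                               otherwise True/False for what happened next
    """
    if not ingestor_hung:
        return "ingestor running"
    if not data_chain_normal:
        # A stopped clock makes the ingestor wait; it should resume on
        # its own (with a gap in the data) once clock returns.
        return "check demod, cable, and clock"
    if files_data_after_reboot is None:
        return "reboot or power off the PC and watch the ingestor"
    if files_data_after_reboot:
        return "SDI card probably bad"
    # Not covered in the discussion; assumption: treat as unknown.
    return "inconclusive"

# The LSU symptoms: hung ingestor, demod untouched, runs after a reboot.
print(diagnose(True, True, True))  # SDI card probably bad
```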

So, can we get a replacement card?

Tom