
[Platforms #QXN-439339]: New hardware suggestions?



Hi Pete,

re:
> We have some money ($16K or so) to spend on a new data ingest/archive
> machine, and I am curious if you have any suggestions on what to
> look for/avoid as I'm spec'ing it out.

We don't have much experience in the archive end, so all comments will be
related to data ingest and relay.

> Ideally, this will be our primary ldm ingest/feed machine, to
> replace f5.aos.wisc.edu. I'd like to have all of the data feeds
> that I currently get running through it (DDPLUS, HDS, MCIDAS, CONDUIT,
> NIMAGE, NEXRAD, NEXRAD2, etc), be able to feed several downstream
> sites that I currently feed, and have the power and io bandwidth
> to be able to store data locally in native and decoded to gempak
> and/or netcdf format, and make this data available via nfs to our
> computer classroom and other machines on our network.

So, you want this machine to serve as your toplevel relay AND do data
decoding?  Our approach was to split data decoding and serving (NFS,
ADDE) to two different machines/groups of machines.  You may already
be aware that the toplevel IDD relay that we operate here at the UPC,
idd.unidata.ucar.edu, is actually a Linux cluster composed of:

2 - accumulators, machines that request data feeds from upstream sites
1 - director, a machine that directs incoming feed requests to the back-end
    data servers
4 - data servers, these are the machines that feed downstream sites
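The roles above map fairly directly onto the LDM configuration on each node.
As a rough sketch (the hostnames and feed choices here are made-up examples,
not our actual configuration), the ldmd.conf entries look something like:

```
# On an accumulator: request feeds from upstream sites
REQUEST IDS|DDPLUS ".*" upstream1.example.edu
REQUEST CONDUIT    ".*" upstream2.example.edu

# On a data server: allow downstream sites to request feeds from us
ALLOW ANY ^ldm\.downstream-site\.edu$
```

The director's job is just to spread incoming downstream connections across
the data servers, which is why it needs so little hardware.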

For a couple of years the accumulator machines consisted of a dual 1.8 GHz
Opteron PC w/2 GB of RAM running Fedora Core Linux (1, then 3, then 4) and
a dual 1 GHz P4 box w/3 GB of RAM running FreeBSD 4.x.  We have recently
upgraded the Opteron box to a dual 2.8 GHz Xeon EM64T (64-bit) box w/6 GB
of RAM running 64-bit Fedora Core 5 Linux.  We have plans to upgrade the
FreeBSD box in the not too distant future.

The director is currently a dual 2.8 GHz, 32-bit Xeon machine, but it really
doesn't need to be.  We could run a much less richly configured machine
as the director.

The dataservers are all dual 2 GHz Opteron boxes with 14 or 16 GB of RAM.
The large amount of memory allows us to keep over two hours of ALL of the
data being relayed in the IDD in the LDM queue.
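As a back-of-the-envelope check (the rate below is an illustrative
assumption, not a measured IDD figure), the queue size needed for a given
retention window is just the average ingest rate times the window:

```python
# Rough LDM queue sizing sketch -- illustrative only, not an official
# Unidata formula.  The queue must hold all data *ingested* over the
# desired retention window.

def queue_size_bytes(ingest_mbps: float, retention_hours: float) -> int:
    """Bytes of queue needed to retain `retention_hours` of data
    arriving at an average of `ingest_mbps` megabits per second."""
    bytes_per_second = ingest_mbps * 1_000_000 / 8
    return int(bytes_per_second * retention_hours * 3600)

# e.g. an assumed 12 Mbps average ingest, held for two hours:
print(f"{queue_size_bytes(12, 2) / 2**30:.1f} GiB")  # ~10 GiB
```

A queue sized this way, plus headroom for the OS, is why lots of RAM pays
off: if the whole queue stays resident, downstream feeds are served from
memory rather than disk.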

The cluster was shown to relay up to 900 Mbps of data in stress tests
almost a year ago.  It currently relays over 230 Mbps of data _on average_
to about 400 downstream connections, and routinely has peak relay rates
of 440 Mbps.  Having 4 dataservers in the cluster and a Gbps network
allows us to act as the failover for any/all IDD connections in the world.

The entire cluster cost under $25K to put together.  A more modest cluster
could be put together for under $15K.  The sizing of the cluster would
depend on how many downstream sites one desired to be able to feed and
how much data one wanted to have available in the LDM queue.

> I'd also like to keep on the order of a years worth of some of this
> data available online, so a large amount of storage (probably RAID5?)
> is needed as well.

I would recommend that you split your archive and ingest/relay duties
between two or more machines.  For instance, you could purchase a
machine similar to one of the data servers we are using (dual Opteron
or Xeon EM64T) with a lot of memory (>=6 GB) for about $5K.  I would
dedicate a system like this to ingest and relay.  I would then get another
machine with a large RAID to do your data decoding, storage and serving
(NFS, etc.).  To save one year's worth of data you will need a HUGE amount
of disk storage.  For instance, the LEAD project has a 40 TB RAID system
that is intended to store at least 6 months of data.  It is uncertain
how much data can actually be stored, but it currently appears to be
much more than 6 months (plus the LEAD RAID is storing lots and lots of
model output including ensembles).
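For a rough sense of scale (the daily volume here is an assumption for
illustration, not a measured number for your feeds), the arithmetic is
simple:

```python
# Back-of-the-envelope archive sizing -- the daily volume is an assumed
# example; actual volumes depend heavily on which feeds you keep and
# whether you store raw data, decoded data, or both.

def archive_terabytes(daily_gigabytes: float, days: int = 365) -> float:
    """Terabytes needed to keep `days` days of data at `daily_gigabytes`
    GB per day (decimal units)."""
    return daily_gigabytes * days / 1000.0

# e.g. keeping an assumed 100 GB/day for a full year:
print(f"{archive_terabytes(100):.1f} TB")  # 36.5 TB
```

Even a modest assumed daily volume lands you in the tens of terabytes over
a year, which is why the RAID ends up being the dominant cost of the
archive machine.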

> Does it make sense to have a single machine handle both of these tasks,

No.

> or would it maybe make more sense to get one machine that does the
> ingest/feed/decoding and short-term storage, and another that is
> aimed more at the long term storage?

I suggest moving the decoding off of the ingest/relay machine.

> I'm also wondering if you have any suggestions regarding SCSI/SATA,
> or iSCSI or Fibre Channel, etc..

On Linux boxes, external SCSI-based RAIDs appear to perform much
better than RAIDs created using internally mounted disks and a
RAID card.  The RAIDs we have attached to Sun SPARC machines are
all connected with Fibre Channel.  These appear to work much
better than the external units that are built of much cheaper
disk drives, but your mileage may vary.

> What kind of machines are people currently buying for ldm
> ingest/feed/storage machines?

Penn State and U Nebraska-Lincoln are creating clusters similar
to the one that we built.  TAMU followed our lead in purchasing
dual Opteron-based machines with lots of memory for data ingest
and relay.  TAMU also split off their processing from their ingest/
relay duties, and it has apparently worked pretty well for them.

I realize that the cluster I described briefly above is not what
you were asking for, but it includes a number of features that
we feel are necessary for new ingest/relay machines:  fast processors
and lots of memory.  Again, we don't have that much experience with
archive systems (outside of LEAD, where folks are in a learning
mode), so we can't say much there.


Cheers,

Tom
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: QXN-439339
Department: Support Platforms
Priority: Normal
Status: Closed