
Re: earth referencing data



> Organization: NOAA/PMEL
> Keywords: 199401260159.AA11032

Hi Steve,

> A quick question: I am involved with a committee that is investigating
> some options for network-wide (WAN) data sharing.  One of the strategies
> being discussed is to design an API for accessing the desired classes of
> data and then to cast that API as an RPC interface.  The netCDF API is
> under consideration as a subset of the full API.
> 
> Do you know if anyone has already cast the netCDF API as an RPC interface?
> Do you have any thoughts/recommendations on the subject?

I don't know if anyone has already done this.  We had planned to do it as
part of a joint proposal to develop a netCDF server, but the effort never
got funded.  I think it would be a good idea.  I've appended some discussion
generated earlier on this topic.

--Russ

Date: Tue, 1 Dec 92 10:58:33 -0700
From: Russ Rew <address@hidden>
Message-Id: <address@hidden>
To: address@hidden (Emmanuel Arbogast)
In-Reply-To: Emmanuel Arbogast's message of Wed, 25 Nov 92 19:46:08 PST <address@hidden>
Subject: netCDF

> Organization: StatSci
> Keywords: 199211260411.AA18267

Emmanuel,

Here's some email discussion we had earlier this year about netCDF servers.
I haven't asked any of the people involved if it was OK with them that I
forwarded this discussion to someone else, but I think it should be
illuminating for background information on some of the issues.

--Russ

>From russ Tue Jul 14 08:26:39 1992
To: address@hidden
CC: support, davis, fulker, steve, ben
In-reply-to: Joe Sirott's message of Mon, 13 Jul 92 14:12:17 -0700 <address@hidden>
Subject: netCDF data server

Hi Joe,

> Have you guys done any work on a netCDF data server? It would be nice
> to exchange netCDF data between processes without having to use
> files.

No, we haven't done any work on a netCDF data server other than discussing
the idea with the Unidata Implementation Working Group, who decided it was
lower priority than several other development tasks, including implementing
the netCDF operators we have specified.  You are the second person to bring
this up recently, so perhaps it deserves to be reexamined.  Here's some
excerpts from an email discussion I had with Tomas Johannesson
(address@hidden) in June on this subject (my comments are unprefixed):

tj> 1) I see that a network server for netCDF is planned (section 1.6 on p.
tj>    11 in the netCDF User's Guide). Has the date of the release of such a
tj>    server been fixed? Has a design draft been distributed? Will it
tj>    support TCP/IP?

The development of a netCDF server has dropped in priority, mostly because
the benefits of a server didn't seem to justify the work required.  The
initial ideas behind a netCDF server were to make reads from scattered parts
of a netCDF file more efficient across a network, to support memory-resident
netCDF data, and to put a netCDF interface on non-netCDF data.  These three
uses don't seem to fit together very well.  If we do implement any servers
(or if someone else contributes one), they would probably use remote
procedure calls, which will work over TCP/IP but can also be made
transport-independent.

tj> Regarding my question about a network server, I had plans to use
tj> a possible netCDF server to make "reads from scattered parts
tj> of a netCDF file more efficient across a network" as you say and also
tj> and perhaps more importantly to make a kind of database of netCDF files
tj> available over a network without the need to mount the directories
tj> containing the netCDF files over NFS on all computers that want to
tj> access the database or manually move each file with ftp (this must
tj> be an important and useful feature for other people than me).
tj> The latter possibility might allow you to open a file on a remote
tj> computer with something like
tj>   "ncopen("<computer>:/pub/netcdfdata/file1234.cdf",NC_NOWRITE)"
tj> where <computer> is a server with a large collection of netCDF files.
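
A minimal sketch of how a name of the form "<computer>:/path" might be
split before deciding whether to open a file locally or contact a server.
The function split_remote_name and its return codes are hypothetical; nothing
like it is part of the netCDF library, and the "host:/path" convention is
only the one suggested in the example above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of netCDF: split "host:/path" into parts.
 * Returns  1 if name is remote (host and path are filled in),
 *          0 if name looks like an ordinary local path,
 *         -1 if the caller's buffers are too small. */
int split_remote_name(const char *name, char *host, size_t hostlen,
                      char *path, size_t pathlen)
{
    const char *colon;
    size_t hlen;

    assert(name != NULL && host != NULL && path != NULL);

    colon = strchr(name, ':');
    if (colon == NULL || colon == name || colon[1] != '/')
        return 0;               /* no "host:" prefix before an absolute path */

    hlen = (size_t)(colon - name);
    if (memchr(name, '/', hlen) != NULL)
        return 0;               /* the ':' is inside a path, e.g. "dir/a:/b" */

    if (hlen + 1 > hostlen || strlen(colon + 1) + 1 > pathlen)
        return -1;              /* would not fit, including the terminator */

    memcpy(host, name, hlen);
    host[hlen] = '\0';
    strcpy(path, colon + 1);
    return 1;
}
```

A caller would fall back to an ordinary local open whenever the function
returns 0, so existing file names keep working unchanged.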

I hadn't realized it was so important to avoid NFS mounts.  I was under the
impression that read-only NFS mounts are relatively cheap.  We use Sun's
"automounter" here, and I understand there is a freely available NFS
automounter called "amd" that also runs on other NFS platforms.  It seems to
me that having the server's netCDF file directory mounted as needed by the
automounter daemon on all the computers that want to access the database
would be easy, and you can set the automounter timeout to a short enough
time that the filesystem gets unmounted quickly when not needed by a client.
But there may be other problems with this that I'm not aware of.  Perhaps the
client machines are MSDOS computers without multitasking, so they can't run
daemons?  If you know of common circumstances under which having a netCDF
server would provide significant performance advantages over using NFS and
an automounter, I'd be interested.

tj> Using a TCP/IP server should be more economical as all the
tj> seeks/reads/writes are performed on the computer where the data is
tj> stored instead of going through the NFS layer, which probably needs a
tj> request to be sent between the client computer and the server computer
tj> for each and every seek/read/write call.  I tested this briefly on my
tj> computer and I found a factor on the order of two performance gain.
tj> Another point is that you don't need NFS if you have the TCP/IP server
tj> (NFS might not be installed on the computer) and you can have very
tj> strict control of the file access of the users when you are using a
tj> TCP/IP server.  I might also mention that most commercial DBMSs are
tj> based on client/server networking in order to boost performance by
tj> having the reading and writing of data taking place on the machine where
tj> the data is physically stored.  In that case an important point is that
tj> the query processing is done locally which minimizes the data volume
tj> which is sent over the network.  This is probably not important for
tj> netCDF.

--Russ

>From russ Mon Jul 27 10:36:00 1992
To: fulker
Subject: [address@hidden: Re: netCDF data server]

Dave, 

I forgot to forward this netCDF server discussion to you, as I had promised.
I've concatenated three messages.  The first is Glenn's reply to Joe Sirott,
followed by Joe's reply to Glenn and finally Joe's reply to me.

--Russ

Date: Tue, 14 Jul 1992 12:34:37 -0600
From: "Glenn P. Davis" <address@hidden>
To: address@hidden, address@hidden
Subject: Re: netCDF data server

> From: Joe Sirott <address@hidden>
> 
> Some comments on your previous message:
> 
> 1) One advantage a netCDF server would have over NFS mounts is that
> many sites are not willing to export a file system to the general public;
> this means that the netCDF files are only available via anonymous ftp.
 
This is an important point. 
Note that NFS is a mature system with many people working on it.
Its security is important to the community at large, so it gets
tested very thoroughly. Its shortcomings are known and understood
by the systems community.

It is unlikely that we would come up with a system that has better access
control than NFS.  Even if we did, it is unlikely that a site that
doesn't trust NFS to control access to its stuff is going to trust our
stuff. Think about it.

> 2) A netCDF server would require the definition of a communications
> protocol between two netCDF dependent processes. This would lead to some
> interesting results. For instance, a program I developed (Freud) allows
> visualization of data sets, but no analysis; however, if netCDF data could
> be exchanged between processes without requiring the creation of files
> (via TCP or shared memory, for instance), I could seamlessly send data
> from my program to another program (like MATLAB) and then ship it back
> after transformation.

One program, the data server, is somehow going to have to instantiate a
netcdf 'object' that the client can reference (nc_open). If multiple
objects are to be available, that object must exist in a namespace.
There must be functions to find out what is in the namespace.
UNIX / NFS has a mature, well understood namespace: the file system.
It also has functions and utilities for querying and manipulating
that namespace. Again, we would be hard pressed to come up with something
better.

It is very easy for two processes to share information via the filesystem,
NFS or not. If you want something fancier, like shared memory sort of
access to the shared object, you can 'mmap(2)'  the file.
netCDF will soon support this transparently on systems that support
mmap(2).
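
For illustration, a minimal sketch of mapping a file read-only with mmap(2)
on a POSIX system.  The helper map_file_readonly is hypothetical and is not
how netCDF itself does (or will do) this; it only shows the system calls
involved.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map an existing, non-empty file read-only.
 * On success returns the mapped address and stores the length in *len.
 * On failure returns NULL.  Release the mapping with munmap(). */
const void *map_file_readonly(const char *filename, size_t *len)
{
    struct stat st;
    void *addr;
    int fd;

    assert(filename != NULL && len != NULL);

    fd = open(filename, O_RDONLY);
    if (fd < 0)
        return NULL;

    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return NULL;            /* cannot stat, or nothing to map */
    }

    /* MAP_SHARED: changes written to the file by other processes
     * become visible through this mapping. */
    addr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping remains valid after close */
    if (addr == MAP_FAILED)
        return NULL;

    *len = (size_t)st.st_size;
    return addr;
}
```

Two processes on the same machine that map the same file this way see the
same pages, which is the "shared memory sort of access" referred to above.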

> Another possibility would be shipping live data from models or simulations
> across a network to a visualization program that could dynamically view
> the model results.
 
People do this now. The reader does ncsync() to get the update.

----

The point is, to do a netcdf server "right", you end up duplicating
functionality that is provided by the (network) operating system: namespace,
access control, efficient I/O blocking, etc. My opinion is that it is better
to let the OS do this.

-glenn


Date: Tue, 14 Jul 92 14:49:33 -0700
From: Joe Sirott <address@hidden>
To: address@hidden
Subject: Re: netCDF data server


 
> > 1) One advantage a netCDF server would have over NFS mounts is that
> > many sites are not willing to export a file system to the general public;
> > this means that the netCDF files are only available via anonymous ftp.
>  
> This is an important point. 
> Note that NFS is a mature system with many people working on it.
> Its security is important to the community at large, so it gets
> tested very thoroughly. Its shortcomings are known and understood
> by the systems community.
> 
> It is unlikely that we would come up with a system that has better access
> control than NFS.  Even if we did, it is unlikely that a site that
> doesn't trust NFS to control access to its stuff is going to trust our
> stuff. Think about it.
> 

That's not true, for a couple of reasons. First, my understanding (I'm not a
networking guru) is that NFS relies on RPC authentication procedures for
security.  That means that any application that uses the highest levels of
RPC security will be as secure as NFS.

Also, obviously the NFS server has to be able to write a file to a
filesystem.  This means that there is the possibility of tricking the server
by intercepting a packet destined for the server and changing the request
mode to the server. A netCDF server would be read-only -- even if someone
played around with a request to the server, it couldn't force a write to the
filesystem.

Finally, the NFS daemon has to run as root, so that it can set ownership,
etc. on files. A netCDF server would not have to be run as root, so the
damage it could do could be limited.

Now, convincing users that it's secure might be a different matter.


> > 2) A netCDF server would require the definition of a communications
> > protocol between two netCDF dependent processes. This would lead to some
> > interesting results. For instance, a program I developed (Freud) allows
> > visualization of data sets, but no analysis; however, if netCDF data could
> > be exchanged between processes without requiring the creation of files
> > (via TCP or shared memory, for instance), I could seamlessly send data
> > from my program to another program (like MATLAB) and then ship it back
> > after transformation.
> 
> One program, the data server, is somehow going to have to instantiate a
> netcdf 'object' that the client can reference (nc_open). If multiple
> objects are to be available, that object must exist in a namespace.
> There must be functions to find out what is in the namespace.
> UNIX / NFS has a mature, well understood namespace: the file system.
> It also has functions and utilities for querying and manipulating
> that namespace. Again, we would be hard pressed to come up with something
> better.
> 

I'm not sure what is so difficult about defining the namespace you refer to.
For netcdf objects that are stored as files, you can still use the Unix name
space (for a given machine); for netcdf objects from processes, the
namespace could be defined, for instance, using a string with the process id
and the variable name. A process would register with the server which could
demultiplex the data to as many clients as requested data.
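
As a rough sketch of the process-based naming idea, a name could be built
from the process id and the variable name.  The "<pid>:<variable>" layout and
the function make_object_name are assumptions made for illustration only;
no such convention is defined anywhere.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical naming scheme: "<pid>:<variable name>".
 * Returns 0 on success, -1 if the name does not fit in buf. */
int make_object_name(char *buf, size_t buflen, pid_t pid, const char *varname)
{
    int n;

    assert(buf != NULL && varname != NULL);

    n = snprintf(buf, buflen, "%ld:%s", (long)pid, varname);
    if (n < 0 || (size_t)n >= buflen)
        return -1;              /* encoding error or truncated output */
    return 0;
}
```

A producing process would pass getpid() and register the resulting string
with the server; clients would then ask for the data by that string.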

> It is very easy for two processes to share information via the filesystem,
> NFS or not. If you want something fancier, like shared memory sort of
> access to the shared object, you can 'mmap(2)'  the file.
> netCDF will soon support this transparently on systems that support
> mmap(2).
> 

How do processes handshake when sharing a file in this way? I suppose you
could use file locking, but why not do it right? How do processes on multiple
machines communicate if they don't have NFS? All machines that are connected
to a network have to support SOME kind of transport protocol, but they don't
necessarily have NFS.
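
For reference, the file-locking handshake mentioned above could look roughly
like this on a POSIX system, using advisory locks through fcntl(2).  The
helper lock_whole_file is hypothetical, and the locks are advisory: they only
coordinate processes that all agree to take them.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: take or release an advisory lock on a whole file.
 * type is F_RDLCK (shared), F_WRLCK (exclusive) or F_UNLCK (release).
 * Blocks until the lock can be granted.  Returns 0 on success, -1 on error. */
int lock_whole_file(int fd, short type)
{
    struct flock fl = {0};

    assert(type == F_RDLCK || type == F_WRLCK || type == F_UNLCK);

    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "through the end of the file" */

    return fcntl(fd, F_SETLKW, &fl);
}
```

A writer would hold F_WRLCK while updating the file and release it
afterwards; a reader would hold F_RDLCK while reading.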

> > Another possibility would be shipping live data from models or simulations
> > across a network to a visualization program that could dynamically view
> > the model results.
>  
> People do this now. The reader does ncsync() to get the update.
> 
> ----
> 
> The point is, to do a netcdf server "right", you end up duplicating
> functionality that is provided by the (network) operating system: namespace,
> access control, efficient I/O blocking, etc. My opinion is that it is better
> to let the OS do this.
> 

RPC calls don't require rewriting the operating system; they're built on the
TCP/IP transport and network layers of the OSI network model. Namespace
(when connecting to another process) is specified as the 5-tuple { protocol,
server address, server process, client address, client port}, just as a file
is defined by the pathname of the file. Access is controlled by RPC
authentication methods. You can use standard I/O routines with network
communication. The OS does this already.

Sure, you can communicate between processes with files using NFS. You can
also communicate between processes without NFS (process 1 writes file. process
2 spawns ftp process, and blocks until complete. Ftp process grabs file.
Process 2 gets ftp output). In fact, you can communicate between different
processes using tape, if you want. That doesn't make it a good way to do it.

> -glenn
> 

Joe S.

Date: Tue, 14 Jul 92 14:57:53 -0700
From: Joe Sirott <address@hidden>
To: address@hidden
Subject: Re: netCDF data server

 ...

> You're right, and we shouldn't imply that user votes determine our
> development priorities.  I think the issues of the usefulness of a
> netCDF server still deserve more discussion (i.e., I don't know whether
> you or Glenn is right).  We're discussing ways of getting more
> resources for netCDF development that would make it possible for us to
> take on such projects.
> 
> --Russ

I'm not arguing that a server should be your top priority. I just would like
it to be A priority, depending on your resources. I'm looking at the
possibility of creating one myself, also (networking work is GREAT to have
on your resume ;-)).

Cheers.

Joe S.