
Re: Thredds Security Problem



Hi all:

I think this issue is in the category of "hmm, I never thought of using the TDS that way!".

So if you review the way one can restrict dataset access at

http://www.unidata.ucar.edu/projects/THREDDS/tech/tds4.2/reference/RestrictedAccess.html

you will see this example:

<?xml version="1.0" encoding="UTF-8"?>
<catalog name="TDS Catalog" xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">

  <service name="thisDODS" serviceType="OpenDAP" base="/thredds/dodsC/" />

  <!-- 1) -->
  <datasetRoot path="test" location="/data/testdata/"/>
  <dataset name="Test Single Dataset" ID="testDataset" serviceName="thisDODS"
      urlPath="test/testData.nc" restrictAccess="tiggeData">

    <dataset name="Nested" ID="nested" serviceName="thisDODS" urlPath="test/nested/testData.nc" />
  </dataset>

  <!-- 2) -->
  <datasetScan name="Test all files in a directory" ID="testDatasetScan"
      path="testAll" location="/data/testdata" restrictAccess="ccsmData" >
    
    <metadata inherited="true">
      <serviceName>thisDODS</serviceName>
    </metadata> 
 
  </datasetScan>
</catalog>
Example 1) uses datasetRoot, and 2) uses datasetScan. A datasetScan defines an implicit data root (in this case path="testAll" location="/data/testdata"), so if you remove the datasetScan, the data root goes away with it. But removing the <dataset> element doesn't remove the datasetRoot.
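To see which data roots a catalog defines, independently of which datasets it lists, a short Python sketch can parse the catalog with the standard library. The element and attribute names come from the example above; the function names are hypothetical, and the sketch ignores catalogRef and any other indirection a real TDS configuration may use.

```python
import xml.etree.ElementTree as ET

# Namespace of the InvCatalog 1.0 example above.
THREDDS_NS = "{http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0}"


def data_roots(catalog_xml: str) -> dict[str, tuple[str, str]]:
    """Map each data-root path to (location, element that defined it).

    <datasetRoot> defines a root explicitly; <datasetScan> defines one
    implicitly through its own path/location attributes.
    """
    catalog = ET.fromstring(catalog_xml)
    roots = {}
    for tag in ("datasetRoot", "datasetScan"):
        for elem in catalog.iter(THREDDS_NS + tag):
            roots[elem.get("path")] = (elem.get("location"), tag)
    return roots


def cataloged_paths(catalog_xml: str) -> set[str]:
    """Collect the urlPath of every explicit <dataset>, including nested ones."""
    catalog = ET.fromstring(catalog_xml)
    return {
        ds.get("urlPath")
        for ds in catalog.iter(THREDDS_NS + "dataset")
        if ds.get("urlPath")
    }
```

Deleting the two <dataset> elements from the input empties cataloged_paths(), while data_roots() still reports "test": that is the gap between "not listed" and "not served".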

Apparently ESG defines data roots in one place, and then defines explicit <dataset> elements for each file. (This is the "hmm, I never thought of that".)

I'm assuming you don't want to remove the datasetRoot element because other datasets use it?

Anyway, just to clarify:
  1) removing the dataset means that a user can no longer find it in a public catalog.
  2) but if you leave the datasetRoot, and they "just know" the URL, it will get served.
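The "just know the URL" case can be modeled as a lookup that consults only the data roots. This is a simplified sketch, not TDS code: it assumes the service base (e.g. /thredds/fileServer/) has already been stripped from the request, and that the longest matching root wins. The function name is hypothetical.

```python
import posixpath
from typing import Optional


def resolve(url_path: str, roots: dict[str, str]) -> Optional[str]:
    """Map a request path to a file location using the longest matching root.

    No catalog is consulted: whether a <dataset> with this urlPath exists
    never enters the decision, so uncataloged files resolve too.
    """
    request = url_path.strip("/")
    best = None
    for root_path in roots:
        prefix = root_path.strip("/")
        if request == prefix or request.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best.strip("/")):
                best = root_path
    if best is None:
        return None  # no data root matches, so nothing is served
    remainder = request[len(best.strip("/")):].lstrip("/")
    return posixpath.join(roots[best], remainder)
```

For example, resolve("test/never/cataloged.nc", {"test": "/data/testdata/"}) returns "/data/testdata/never/cataloged.nc" even though no catalog lists that file.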

So the implication is that you have to be careful where you put your data: anything under a data root's location can be served, whether or not a catalog lists it.

Anyway, a solution might be to allow data roots to be restricted, e.g.:
<datasetRoot path="test" location="/data/testdata/" restrictAccess="ccsmData"/> 
I will investigate how easy that is. But this will restrict all datasets using that data root; I'm not sure if that's what you want.
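A sketch of what the proposed root-level restriction would mean for access checks, under the assumption (not settled here) that a root's restrictAccess covers every path beneath it and is combined with any dataset-level restrictAccess. The role names come from the examples above; the function is hypothetical.

```python
def required_roles(url_path: str,
                   root_roles: dict[str, str],
                   dataset_roles: dict[str, str]) -> set[str]:
    """Return the set of roles a user needs to read url_path (empty = public).

    root_roles maps a data-root path to its (hypothetical) restrictAccess
    value; dataset_roles maps a cataloged urlPath to its restrictAccess.
    """
    request = url_path.strip("/")
    roles = set()
    matching = [
        p for p in root_roles
        if request == p.strip("/") or request.startswith(p.strip("/") + "/")
    ]
    if matching:
        # Assume only the longest matching root applies.
        longest = max(matching, key=lambda p: len(p.strip("/")))
        roles.add(root_roles[longest])
    if request in dataset_roles:
        roles.add(dataset_roles[request])
    return roles
```

With root_roles={"test": "ccsmData"}, an uncataloged file such as "test/uncataloged.nc" now needs ccsmData, and so does every cataloged dataset under "test", which is the trade-off mentioned above.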

John.

On 6/27/2011 1:22 PM, Drach, Bob wrote:
Hi Estani,

You're correct, and it's worth emphasizing this behavior to data publishers.
I've highlighted the same information in:

- the ESG publisher tutorial
- the publisher reference guide
- the installation script

As I see it the dataset roots are treated much like the DocumentRoot
directive in Apache. If there is a simple configuration to block access to
files under a dataset root unless otherwise cataloged/configured, I would
be interested to know about it.
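The behavior asked about here, serving a file only if some catalog lists it, amounts to an allowlist check in front of the fileServer. A minimal sketch, assuming the default /thredds/fileServer/ base and a set of urlPath values gathered from the published catalogs; this is not an existing TDS option, and the names are hypothetical.

```python
from typing import Callable, Iterable

# Assumed fileServer base; adjust to the service base in your catalogs.
FILE_SERVER_BASE = "/thredds/fileServer/"


def make_catalog_only_filter(cataloged: Iterable[str]) -> Callable[[str], bool]:
    """Build a predicate that allows a request only for a cataloged urlPath."""
    allowed = {p.strip("/") for p in cataloged}

    def is_allowed(request_path: str) -> bool:
        if not request_path.startswith(FILE_SERVER_BASE):
            return False  # this sketch denies anything outside the fileServer base
        return request_path[len(FILE_SERVER_BASE):].strip("/") in allowed

    return is_allowed
```

Under this policy, unpublishing a dataset (dropping its urlPath from the allowlist) also makes the file unreachable, rather than leaving it exposed as the DocumentRoot-style behavior does.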

--Bob


On 6/27/11 9:18 AM, "Estanislao Gonzalez" <address@hidden> wrote:

Hi,

I might have missed someone interested in this, so please forward it
if you know of anyone.

We have a problem with the current THREDDS usage. I'm almost certain
this is an intended TDS feature, but it doesn't work for us. I'm
sending a copy to John in case there's some way to turn off this
default (and, for us, undesired) behavior.

So here's the problem. Everything under a thredds_root directory is served
by the fileServer. If there's no catalog for a file (and thus no security
policy defined), it's apparently treated as "unprotected".

This causes problems in several ways:
1) The mere act of unpublishing data (removing the TDS
catalogs) makes it widely accessible.
  - a workaround is either to leave the catalogs in place and
retract the publication from the gateway only, or to remove the
thredds_root entry altogether (and then publish something so the change is
picked up!)
2) Any other data placed in one of those directories is served
immediately; I don't think publishers are aware of this.
3) Symbolic links are followed everywhere, even outside the defined directory.
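Point 3 can be checked per request by resolving symlinks and ".." before comparing the result against the root. The sketch below uses os.path.realpath and os.path.commonpath from the Python standard library; it illustrates the containment test only and makes no claim about how TDS itself handles links. The function name is hypothetical.

```python
import os


def escapes_root(root_location: str, relative_path: str) -> bool:
    """Return True if relative_path resolves to a file outside root_location.

    os.path.realpath follows symbolic links and collapses '..', so a link
    inside the root that points elsewhere is detected as an escape.
    """
    root_real = os.path.realpath(root_location)
    target_real = os.path.realpath(os.path.join(root_real, relative_path))
    return os.path.commonpath([root_real, target_real]) != root_real
```

A server applying this check would refuse "link/x.nc" when "link" points outside the root, while still serving ordinary files inside it.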

A mere "my_path | /" added to any configuration file opens the whole machine
to read access for anything Tomcat can read.
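The "my_path | /" case suggests a cheap sanity check on root entries before publishing. A sketch that parses the "name | path" lines quoted here and flags roots that are a sensitive system directory or sit too close to "/". The denylist and the min_depth threshold are arbitrary illustrative choices, not ESG or TDS settings, and how these lines are laid out in esg.ini is assumed.

```python
import posixpath

# Illustrative, not exhaustive: directories that should never be a data root.
DANGEROUS = {"/", "/etc", "/home", "/root", "/usr", "/var"}


def parse_roots(config_value: str) -> dict[str, str]:
    """Parse 'name | /path' lines into {name: path}, skipping other lines."""
    roots = {}
    for line in config_value.splitlines():
        if "|" not in line:
            continue
        name, location = (part.strip() for part in line.split("|", 1))
        roots[name] = location
    return roots


def risky_roots(roots: dict[str, str], min_depth: int = 2) -> list[str]:
    """Names of roots pointing at a sensitive directory or too near '/'."""
    flagged = []
    for name, location in roots.items():
        norm = posixpath.normpath(location)
        depth = len([part for part in norm.split("/") if part])
        if norm in DANGEROUS or depth < min_depth:
            flagged.append(name)
    return flagged
```

Run against "my_path | /" plus a normal "/esg/data" entry, only my_path is flagged.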

Because omitting a thredds_root entry means the corresponding catalogs
are not published at all, systems with multiple publishers have
to maintain a central esg.ini file containing every possible thredds_root entry.
This will very likely mean that thredds_root directories that are no
longer used are never removed and stay open.
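One way to catch roots that linger in a shared esg.ini is to compare them against the urlPath values actually present in the published catalogs. A sketch, assuming each root's name is the leading part of the urlPath values that use it (as with path="test" and urlPath="test/testData.nc" in the TDS catalog example); the function is hypothetical.

```python
from typing import Iterable


def unused_roots(roots: dict[str, str], cataloged: Iterable[str]) -> list[str]:
    """Return the names of roots that no cataloged urlPath falls under."""
    paths = [p.strip("/") for p in cataloged]
    unused = []
    for name in roots:
        prefix = name.strip("/")
        if not any(p == prefix or p.startswith(prefix + "/") for p in paths):
            unused.append(name)  # still served, but nothing is published under it
    return sorted(unused)
```

Each name returned is a candidate for removal from the central configuration.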

Well, I could probably find other problems if I thought harder, but in general I
think our intention was to serve only files contained
in some catalog. Or am I seeing things upside down?

Were you all aware of this? I wasn't...

Thanks,
Estani