
Re: Parsing and misparsing XML



Hi Benno,

Benno Blumenthal wrote:
> 
> In looking through my logs, I noticed that fastsearch.net has managed
> (somehow) to find my thredds directory, but seems to be misparsing it.
> 
> 
> The top of the directory is at
> 
> http://iridl.ldeo.columbia.edu/SOURCES/thredds.xml
> 
> and that file has lines in it like
> 
> <catalogRef xlink:title="DASILVA"
> xlink:href="http://iridl.ldeo.columbia.edu/SOURCES/.DASILVA/thredds.xml"/>
> 
> The robot is hitting URLs like
> 
> http://iridl.ldeo.columbia.edu/SOURCES/.DASILVA/thredds.xml"/
> 
> 
> which I presume means that it does not understand the '/>' notation
> that ends the tag.
> 
> 
> Are we using a non-standard XML notation? I was just following the example I
> was given.

No, that is standard XML. Perhaps the crawler is confused by the differences
between XML and HTML. Even so, I would expect it to stop at the closing '"' of
the attribute value. The crawler might do better if there were a space before
the "/>", but either form is valid XML.
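
As a quick check that the two spellings are equivalent to a conforming parser,
here is a minimal sketch using Python's standard xml.etree.ElementTree. The
wrapping <catalog> element, the TEMPLATE string, and the catalog_hrefs helper
are assumptions made up for this illustration (your real file has its own root
element and likely a default namespace); only the catalogRef line is taken from
your message.

```python
import xml.etree.ElementTree as ET

# Standard XLink namespace URI; the prefix must be declared for the XML to parse.
XLINK = "http://www.w3.org/1999/xlink"


def catalog_hrefs(xml_text):
    """Return the xlink:href value of every catalogRef element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return [el.get(f"{{{XLINK}}}href") for el in root.iter("catalogRef")]


# Assumed minimal wrapper around the catalogRef line quoted in the message.
# {end} is filled in with either "/>" or " />".
TEMPLATE = (
    '<catalog xmlns:xlink="http://www.w3.org/1999/xlink">'
    '<catalogRef xlink:title="DASILVA" '
    'xlink:href="http://iridl.ldeo.columbia.edu/SOURCES/.DASILVA/thredds.xml"{end}'
    '</catalog>'
)

no_space = catalog_hrefs(TEMPLATE.format(end="/>"))
with_space = catalog_hrefs(TEMPLATE.format(end=" />"))

print(no_space == with_space)  # both spellings give the same result
print(no_space[0])             # the href, without any trailing quote or slash
```

With a real XML parser the href comes back identically in both cases, which
suggests the stray '"/' in your logs comes from the crawler scanning the text
for URLs rather than parsing the file as XML.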

Ethan


> Benno
> 
> 
> 
> --
> Dr. M. Benno Blumenthal          address@hidden
> International Research Institute for climate prediction
> The Earth Institute at Columbia University
> Lamont Campus, Palisades NY 10964-8000   (845) 680-4450
> 
> 

-- 
Ethan R. Davis                       Telephone: (303) 497-8155
Software Engineer                    Fax:       (303) 497-8690
UCAR Unidata Program Center          E-mail:    address@hidden
P.O. Box 3000
Boulder, CO  80307-3000              http://www.unidata.ucar.edu/
