
Re: Thredds - Grib2 Collection Indexing as Independent Task



You can build the indexes off the main server; it just needs a machine with access to the data. Of course, a direct mount will be much faster than NFS.

More indexing should wait until we get 4.3.18 up, and until after we've explored any problems that occur with a subset.

On 8/14/2013 11:23 AM, Timothy Lewis - NOAA Affiliate wrote:
John,

This dataset is static.  The concern is not so much the adding and
re-indexing of data, but rather the month of indexing time the collection
needs when it is first built.  This process tends to exhaust the server's
resources to the point that it cannot serve data reliably.  Once the
collection is built, indexed, and accessible, it should stay in that state
for the foreseeable future.  It's the month's worth of poor performance
and, more importantly, poor user experience between now and that point
that we would like to avoid.

Thank you,

Tim

On Wed, Aug 14, 2013 at 12:06 PM, John Caron <address@hidden> wrote:

    Hi Tim:

    Once the indexes are (correctly) written for the individual files,
    they don't have to be rewritten again.

    However, for a changing dataset, you have to update the master
    indexes as often as you want the changes to be incorporated. This
    process is relatively fast, though for 30K files it's non-trivial. We
    are still working on ways to make this as fast as possible. It's
    likely you only want to update once a day or once a week, and do it
    at 2 am or something.

    How often is this data changing?


On 8/14/2013 10:58 AM, Timothy Lewis - NOAA Affiliate wrote:

        Thank you, John.  I will begin the upgrade process.

        Would there be any way to use an already indexed collection?  Say
        I were to upgrade to the latest version, set up the feature
        collection, request it via the web interface, and let it finish
        indexing.  Could I then take the same configuration for the
        feature collection and place it in another instance of Thredds
        (same version and all necessary equivalences, of course), start
        this new instance, and be able to access the aggregation from the
        new web interface without the usual processing cost of the first
        indexing?

        Conceptually, this seems to me like it would work because the ncx
        files would already be written, circumventing the bottleneck.  I
        do not, however, know how Thredds handles these collections
        internally.  My theory could easily be hiding an important false
        assumption.  Would this plan ever be possible?  Practical?

        Thank you,

        Tim

        On Wed, Aug 14, 2013 at 11:24 AM, John Caron <address@hidden> wrote:

             Yes, you will need to do that; there are many bug fixes in
             later versions, especially for GRIB. So maybe get that
             process started.

             You will need to delete all the .ncx files (.gbx9 files are
             OK) when you get the new TDS installed, and let them
             regenerate. The .ncx files are much faster to recreate, as
             you will see.
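The cleanup step described above could be scripted along these lines. This is only a minimal sketch: it assumes the index files sit under a single directory tree, and the function names `find_ncx_files` and `delete_ncx_files` are hypothetical helpers, not part of TDS. It matches only the `.ncx` extension, so `.gbx9` files are left alone, and it defaults to a dry run so the list can be reviewed before anything is removed.

```python
from pathlib import Path


def find_ncx_files(root):
    """Return every *.ncx file under root. *.gbx9 files are never matched."""
    return sorted(p for p in Path(root).rglob("*.ncx") if p.is_file())


def delete_ncx_files(root, dry_run=True):
    """Delete every *.ncx file under root and return the affected paths.

    With dry_run=True (the default) nothing is removed; the paths are
    only returned so they can be reviewed first.
    """
    targets = find_ncx_files(root)
    if not dry_run:
        for path in targets:
            path.unlink()
    return targets
```

Calling `delete_ncx_files(root)` first and inspecting the result, then calling it again with `dry_run=False`, keeps the destructive step explicit.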

             I'll look for my notes on the 6-hourly CFSR. I'll also check
             with NCDC, with whom I've been working on this dataset.


On 8/14/2013 10:11 AM, Timothy Lewis - NOAA Affiliate wrote:

                 John,

                 We are currently using Thredds 4.3.14.  It's possible to
                 upgrade to 4.3.18, but it would take some coordinating
                 and some time (1 - 2 weeks?).

                 This is the 6 hourly subset of CFSR.

                 Thanks,

                 Tim




On Tue, Aug 13, 2013 at 5:48 PM, John Caron <address@hidden> wrote:

                      Hi all:

                      1. I see you are using an older version of TDS. Which
                      version? Can you upgrade to 4.3.18?

                      2. The "olderThan" attribute is OK. How often do these
                      files get added or changed?

                      3. Which subset of CFSR is this? Monthly, hourly,
                      6-hourly?

                      John


On 8/13/2013 2:27 PM, Timothy Lewis - NOAA Affiliate wrote:

                          John,

                          I've attached the log files after following the
                          steps you requested.

                          I've removed the recheckAfter attribute from this
                          aggregation and several others that needed it.  As
                          I understand it, the "olderThan" attribute is in
                          place to avoid including files that are still
                          being written and should not induce any new scans.
                          I've left this attribute in the aggregations.  Is
                          this correct?
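The "olderThan" behaviour described above, skipping files that may still be being written, can be illustrated with a small modification-time filter. This is a sketch of the idea only, not TDS code: `is_older_than` and `filter_settled` are hypothetical names, and the five-minute default simply mirrors the `olderThan="5 min"` value quoted in this thread.

```python
import os
import time


def is_older_than(path, min_age_seconds, now=None):
    """True if the file was last modified at least min_age_seconds ago."""
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) >= min_age_seconds


def filter_settled(paths, min_age_seconds=5 * 60, now=None):
    """Keep only files old enough that they are unlikely to still be growing."""
    return [p for p in paths if is_older_than(p, min_age_seconds, now)]
```

A filter like this only decides which files are eligible; it does not trigger any rescans by itself.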

                          Also, where can I find more information on the
                          encoding defects for the CFSR dataset?

                          Thank you again for your help.  We greatly
                          appreciate it.

                          Sincerely,

                          Tim Lewis



                          On Mon, Aug 12, 2013 at 4:30 PM, John Caron <address@hidden> wrote:


                               > 2. change
                               >
                               >   <collection spec="/thredds02/cf_______reanalysis/**/ocnh[0-9]{2}\.______gdas\.[0-9]{10}\.grb2"
                               >               recheckAfter="5 min" olderThan="5 min"/>
                               >
                               > to
                               >
                               >   <collection spec="/thredds02/cf_______reanalysis/**/ocnh[0-9]{2}\.______gdas\.[0-9]{10}\.grb2" />

                               sorry, that should be

                                 <collection spec="/thredds02/cf_______reanalysis/**/ocnh[0-9]{2}\.______gdas\.[0-9]{10}\.grb2" />
                                 <update startup="true"/>
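To see which file names the regular-expression part of the collection spec would select, it can be tried out in isolation. The sketch below rests on two loud assumptions: that the long runs of underscores in the archived spec are mail-archive debris, so the intended file-name pattern is `ocnh[0-9]{2}\.gdas\.[0-9]{10}\.grb2`, and that the pattern must match the whole file name. The sample file names are invented for illustration.

```python
import re

# Assumed file-name part of the spec, with the archive's underscore runs removed.
FILENAME_PATTERN = re.compile(r"ocnh[0-9]{2}\.gdas\.[0-9]{10}\.grb2")


def matches_spec(filename):
    """True if the whole file name matches the assumed spec pattern."""
    return FILENAME_PATTERN.fullmatch(filename) is not None


# Hypothetical file names, for illustration only.
for name in ["ocnh06.gdas.2010010100.grb2",
             "ocnh06.gdas.2010010100.grb2.gbx9",
             "ocnh6.gdas.2010010100.grb2"]:
    print(name, matches_spec(name))
```

Under a whole-name match, the `.gbx9` index files that sit next to the data are not picked up as data files.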




--
Tim Lewis, Associate Software Engineer
General Dynamics Information Technology
NOAA Coastal Data Development Center
1021 Balch Boulevard, Suite 1003
Stennis Space Center, Mississippi 39529 USA
228.688.2126
address@hidden
address@hidden








NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.