
Re: Thredds - Grib2 Collection Indexing as Independent Task



John,

This dataset is static.  The concern is not so much adding and re-indexing data as the month of indexing time the collection needs when it is first built.  This process tends to exhaust the server's resources to the point that it cannot serve data reliably.  Once the collection is built, indexed, and accessible, it should stay in that state for the foreseeable future.  It's the month's worth of poor performance and, more importantly, poor user experience between now and that point that we would like to avoid.

Thank you,

Tim

On Wed, Aug 14, 2013 at 12:06 PM, John Caron <address@hidden> wrote:
Hi Tim:

Once the indexes are (correctly) written for the individual files, they don't have to be rewritten.

However, for a changing dataset, you have to update the master indexes as often as you want the changes to be incorporated. This process is relatively fast, though for 30K files it's nontrivial. We are still working on ways to make it as fast as possible. You likely only want to update once a day or once a week, and do it at 2 am or so.
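The "once a day or week, at 2 am or so" advice amounts to scheduling the master-index update for an off-peak hour. The following is a minimal sketch of computing the next such slot. It assumes local server time and uses hypothetical helper names (`next_off_peak_run`, `next_weekly_run`). In a real deployment the TDS's own update scheduling (see the featureCollection `<update>` documentation for your version) would normally handle this instead.

```python
from datetime import datetime, timedelta


def next_off_peak_run(now: datetime, hour: int = 2) -> datetime:
    """Return the first time at `hour`:00 strictly after `now`.

    Hypothetical helper: the 2 am default comes from the advice in the
    email; how the update is actually triggered is out of scope here.
    """
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed (or is right now): use tomorrow's.
        candidate += timedelta(days=1)
    return candidate


def next_weekly_run(now: datetime, weekday: int = 6, hour: int = 2) -> datetime:
    """Return the next `hour`:00 on `weekday` (Monday=0 ... Sunday=6) after `now`."""
    candidate = next_off_peak_run(now, hour)
    while candidate.weekday() != weekday:
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    now = datetime(2013, 8, 14, 12, 6)
    print(next_off_peak_run(now))  # 2013-08-15 02:00:00
    print(next_weekly_run(now))    # 2013-08-18 02:00:00
```

A daily slot keeps the index at most a day behind. A weekly slot trades freshness for fewer rebuilds of a 30K-file collection.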

How often is this data changing?


On 8/14/2013 10:58 AM, Timothy Lewis - NOAA Affiliate wrote:
Thank you, John.  I will begin the upgrade process.

Would there be any way to use an already indexed collection?  Say I were
to upgrade to the latest version, set up the feature collection, request
it via the web interface, and let it finish indexing.  Could I then take
the same configuration for the feature collection and place it in
another instance of Thredds (same version and all necessary equivalences,
of course), start this new instance, and be able to access the
aggregation from the new web interface without the usual processing cost
of the first indexing?

Conceptually, this seems to me like it would work because the ncx files
would already be written, circumventing the bottleneck.  I do not,
however, know how Thredds handles these collections internally.  My
theory could easily be hiding an important false assumption.  Would
this plan ever be possible?  Practical?
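Tim's plan amounts to building the indexes once on a staging server and then copying them next to identical data on the production server. The following is a minimal sketch of the copying step only. It hypothetically assumes that the `.gbx9` and `.ncx` files live under the same root directory as the GRIB files, and that the data paths are identical on both machines. Whether the TDS will accept indexes built elsewhere is exactly the open question in this thread, so the sketch shows only the mechanics. `copy_index_files` is a hypothetical name.

```python
import shutil
from pathlib import Path

# Index file types discussed in this thread.
INDEX_SUFFIXES = (".gbx9", ".ncx")


def copy_index_files(src_root, dst_root, suffixes=INDEX_SUFFIXES):
    """Copy index files from src_root to dst_root, preserving relative paths.

    Assumption (hypothetical): the index files sit under src_root, and the
    GRIB data lives at identical paths on both servers. GRIB files
    themselves are not copied. Returns the relative paths that were copied.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    copied = []
    for path in sorted(src_root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            rel = path.relative_to(src_root)
            target = dst_root / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            # copy2 also preserves modification times.
            shutil.copy2(path, target)
            copied.append(rel.as_posix())
    return copied
```

`shutil.copy2` is used instead of `shutil.copy` so that modification times survive the copy. This matters in case timestamps are compared when deciding whether an index is stale, which is an assumption the thread does not confirm.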

Thank you,

Tim

On Wed, Aug 14, 2013 at 11:24 AM, John Caron <address@hidden> wrote:

    Yes, you will need to do that; there are many bug fixes in later
    versions, especially for GRIB. So maybe get that process started.

    You will need to delete all the .ncx files (.gbx9 files are OK) when
    you get the new TDS installed, and let them regenerate. The .ncx files
    are much faster to recreate, as you will see.
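The instruction to delete every `.ncx` file but keep the `.gbx9` files can be scripted. The following is a minimal sketch with a hypothetical function name. It defaults to a dry run so the list of files can be reviewed before anything is removed. The root directory is whatever holds the collection's index files on your server. Per the email, the deletion happens once the new TDS is installed, and the files are then left to regenerate.

```python
from pathlib import Path


def remove_ncx_files(root, dry_run=True):
    """Delete every *.ncx file under `root`, leaving *.gbx9 files untouched.

    With dry_run=True (the default), nothing is deleted; the function only
    returns the paths it would remove.
    """
    removed = []
    for path in sorted(Path(root).rglob("*.ncx")):
        if path.is_file():
            if not dry_run:
                path.unlink()
            removed.append(path)
    return removed
```

Running it once with the default and once with `dry_run=False` gives a chance to confirm that only collection index files are affected.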

    I'll look for my notes on the 6-hourly CFSR. I'll also check with
    NCDC, who I've been working with on this dataset.


    On 8/14/2013 10:11 AM, Timothy Lewis - NOAA Affiliate wrote:

        John,

        We are currently using Thredds 4.3.14.  It's possible to upgrade to
        4.3.18, but it would take some coordinating and some time (1-2
        weeks?).

        This is the 6 hourly subset of CFSR.

        Thanks,

        Tim




        On Tue, Aug 13, 2013 at 5:48 PM, John Caron
        <address@hidden> wrote:

             Hi all:

             1. I see you are using an older version of TDS. Which version?
             Can you upgrade to 4.3.18?

             2. The "olderThan" attribute is OK. How often are files added
             or changed?

             3. Which subset of CFSR is this? Monthly, hourly, 6-hourly?

             John


             On 8/13/2013 2:27 PM, Timothy Lewis - NOAA Affiliate wrote:

                 John,

                 I've attached the log files after following the steps you
                 requested.

                 I've removed the recheckAfter attribute from this aggregation
                 and several others that needed it.  As I understand it, the
                 "olderThan" attribute is in place to avoid including files
                 that are still being written and should not induce any new
                 scans.  I've left this attribute in the aggregations.  Is
                 this correct?
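The understanding of `olderThan` described here is that files modified too recently are skipped because they may still be being written. That idea can be illustrated with a simple modification-time check. The following is a hypothetical helper using a 5-minute threshold, not the TDS's actual implementation. Whether the boundary is inclusive is also an assumption.

```python
import time
from pathlib import Path
from typing import Optional


def old_enough(path, older_than_seconds: float = 300, now: Optional[float] = None) -> bool:
    """True if `path` was last modified at least `older_than_seconds` ago.

    Mirrors the idea behind olderThan="5 min": a file that changed within
    the last five minutes may still be being written, so it is left out.
    """
    if now is None:
        now = time.time()
    age = now - Path(path).stat().st_mtime
    return age >= older_than_seconds


def eligible_files(paths, older_than_seconds: float = 300, now: Optional[float] = None):
    """Filter `paths` down to the files that are old enough to include."""
    if now is None:
        now = time.time()
    return [p for p in paths if old_enough(p, older_than_seconds, now)]
```

Passing `now` explicitly keeps every file in one pass judged against the same instant.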

                 Also, where can I find more information on the encoding
                 defects for the CFSR dataset?

                 Thank you again for your help.  We greatly appreciate it.

                 Sincerely,

                 Tim Lewis



                 On Mon, Aug 12, 2013 at 4:30 PM, John Caron
                 <address@hidden> wrote:


                      > 2. change
                      >
                      >   <collection spec="/thredds02/cf_reanalysis/**/ocnh[0-9]{2}\.gdas\.[0-9]{10}\.grb2"
                      >               recheckAfter="5 min" olderThan="5 min"/>
                      >
                      > to
                      >
                      >   <collection spec="/thredds02/cf_reanalysis/**/ocnh[0-9]{2}\.gdas\.[0-9]{10}\.grb2"/>

                      Sorry, that should be

                        <collection spec="/thredds02/cf_reanalysis/**/ocnh[0-9]{2}\.gdas\.[0-9]{10}\.grb2"/>
                        <update startup="true"/>
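To check which files a collection spec like the one above would pick up, its filename part can be tested as a regular expression. The following is a minimal sketch built on three assumptions about spec semantics. First, the filename part is `ocnh[0-9]{2}\.gdas\.[0-9]{10}\.grb2`. Second, `**` means "search all subdirectories". Third, the expression must match the whole filename. The function names are hypothetical.

```python
import re
from pathlib import Path

# Filename part of the collection spec (assumed whole-name regex match).
FILENAME_RE = re.compile(r"ocnh[0-9]{2}\.gdas\.[0-9]{10}\.grb2")


def matches_spec(filename: str) -> bool:
    """True if the entire filename matches the spec's regular expression."""
    return FILENAME_RE.fullmatch(filename) is not None


def find_collection_files(root):
    """Recursively list files under `root` (the '**' part) whose names match."""
    return sorted(
        p for p in Path(root).rglob("*") if p.is_file() and matches_spec(p.name)
    )


if __name__ == "__main__":
    # Made-up names of the expected shape, for illustration only.
    for name in ["ocnh06.gdas.1979010100.grb2",
                 "ocnh06.gdas.1979010100.grb2.gbx9",
                 "ocnh6.gdas.1979010100.grb2"]:
        print(name, matches_spec(name))
```

Under the whole-name assumption, the `.gbx9` index files written next to the data are not matched by the pattern.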




                 --
                 Tim Lewis, Associate Software Engineer
                 General Dynamics Information Technology
                 NOAA Coastal Data Development Center
                 1021 Balch Boulevard, Suite 1003
                 Stennis Space Center, Mississippi 39529 USA

                 228.688.2126

                 address@hidden






NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.