Re: [netcdf-java] Bug (concurrency issue?) when reading NCML aggregations

So you are running a model that outputs 50-70 files belonging to a
single "run"?

Do you put each run in a separate directory?

Are you overwriting the files?

On Tue, Nov 17, 2015 at 1:34 PM, Clifford Harms <clifford.harms@xxxxxxxxx>
wrote:

> The data I attached is for a test case in a scenario I am trying to
> handle. I have several thousand netCDF files (some CF, some not). Most of
> them belong to logical datasets that have each been split along a time or
> Z axis into 30-50 files, which I must aggregate back into a single
> 'logical' dataset (I believe this is a fairly common use case). These
> files are updated daily, but due to the amount of data involved as well as
> other environmental factors, these updates happen sporadically over a span
> of about 24 hours.
>
> So what I am trying to do here is this: as the files of an aggregated
> dataset are slowly replaced with newer versions of the same file, add those
> new versions to the aggregated dataset they belong to, while ensuring that
> the new data can be differentiated within the aggregation via its data
> creation time (be it a model run time or production time or whatever). This
> is where joining files along the joinNew dimension comes in (in this
> example, 'runtime'), as the data creation time does not exist in the
> datasets as a coordinate variable, and in some cases is not even indicated
> in the global attributes.
>
> Ultimately, once all of the files for an aggregated dataset have been
> updated, the aggregation contains files that all have the same data
> creation or run time, until the next update starts.
>
> You seem to be indicating that I cannot perform a 'joinNew' aggregation
> between datasets whose coordinate variables have different sizes? If
> that is the case, and I missed it in the documentation somewhere, then what
> about aggregating the files with a joinNew first, and then aggregating
> those aggregations as 'joinExisting' along the time/Z axis?
>
> There is still the issue, though, of the random behavior (an exception for
> some reads, an array of values for others), which indicates a
> concurrency problem. If the read worked consistently, instead of only half
> of the time, that would still be useful to me, as my code could easily
> determine which values in the returned array were valid.
>
> At any rate, thanks for responding so quickly.
>
> On Sat, Nov 14, 2015 at 5:35 PM, John Caron <jcaron1129@xxxxxxxxx> wrote:
>
>> Hi Clifford:
>>
>>   <aggregation type="joinNew" dimName="runtime">
>>     <netcdf coordValue="0" location="ncom-relo-mayport_u_miw-t000.nc"/>
>>     <netcdf coordValue="24">
>>       <aggregation type="joinExisting" dimName="time">
>>         <netcdf location="ncom-relo-mayport_26_u_miw-t001.nc"/>
>>         <netcdf location="ncom-relo-mayport_26_u_miw-t000.nc"/>
>>       </aggregation>
>>     </netcdf>
>>
>> ncom-relo-mayport_u_miw-t000.nc has only 1 time coordinate, but the
>> inner aggregation has 2, so these are not homogeneous in the sense that
>> NcML aggregation requires.
>>
>> Could you explain more about what you are trying to do?
>>
>> John
>>
>>
>> On Fri, Nov 13, 2015 at 11:24 PM, Clifford Harms <
>> clifford.harms@xxxxxxxxx> wrote:
>>
>>> I've posted the report, sample data, sample XML, and sample code on
>>> GitHub: https://github.com/Unidata/thredds/issues/276
>>>
>>>
>>> --
>>> Clifford M. Harms
>>>
>>
>>
>
>
> --
> Clifford M. Harms
>