
THREDDS/DLESE connections: dataset definition



I hope the following notes can contribute to the discussion on
the dataset definition.


Sincerely,

Stefano Nativi


------------------------------------------------------------------------------------------------------------------------------------------------------------------
A dataset is a "set of data", and can itself be considered data.
Therefore, the proper question is: what is data? Or better: what are
the elementary components of data, and hence of a dataset?

According to ISO/IEC JTC1 (see below):
"Humans are aware of anything that exists in the natural world through
its properties. Data represents the properties of these things.
Specification of data elements, the basic units of data, involves
documenting relevant characteristics of each data element to ensure its
representation of the natural world item is consistent and accurate."

ISO/IEC JTC1 adopted the following useful definitions (Term:
Definition):

Data: A representation of facts, concepts, or instructions in a
formalized manner, suitable for communication, interpretation, or
processing by humans or by automatic means. [ISO 2382-4]
Data Dictionary: A database used for data that refers to the use and
structure of other data; that is, a database for the storage of metadata
[ANSI X3.172-1990]. See also data element dictionary.
Data Element: A unit of data for which the definition, identification,
representation, and permissible values are specified by means of a set
of attributes.
Data Element Concept: A concept that can be represented in the form of a
data element, described independently of any particular representation.
Data Element Dictionary: An information resource that lists and defines
all relevant data elements. See also register.
Data Model: A description of the organization of data in a manner that
reflects an information structure.
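
As an illustration only (it is not part of the ISO/IEC text), the "Data
Element" definition above could be sketched in Python as follows. The
class name, the field names, and the cloud-cover example are all
hypothetical; the sketch assumes that "representation" can be
approximated by a Python type and "permissible values" by a tuple.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass(frozen=True)
class DataElement:
    """Hypothetical sketch of a data element described by a set of attributes."""
    identifier: str            # identification
    definition: str            # definition
    representation: type       # representation (assumed here: a Python type)
    permissible_values: Optional[Tuple[Any, ...]] = None  # None = unrestricted

    def accepts(self, value: Any) -> bool:
        """Return True if value matches the representation and permissible values."""
        if not isinstance(value, self.representation):
            return False
        return self.permissible_values is None or value in self.permissible_values


# Illustrative element: cloud cover expressed in eighths of the sky (0..8).
cloud_cover = DataElement(
    identifier="cloud_cover_oktas",
    definition="Fraction of the sky covered by cloud, in eighths",
    representation=int,
    permissible_values=tuple(range(9)),
)
```

In this reading, the data element is the specification itself (one
property of one type of object); individual observed values are merely
checked against it, for example with cloud_cover.accepts(4).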

According to ISO/IEC:
"...There are many constructs used to organize data. There are data
composites, entities, files, object classes, objects, records,
relations, relationships, rows, segments, subject areas, tables, and
tuples. None of these are analogous to data elements, but may include or
be supported by some database implementation or logical modeling
equivalent of data elements."
...... In a database, a data element may be implemented as a field or
column. In Chen's ER data model, it is an attribute..... A data element
....is a unit of data representing a single fact about a type of object
... in the natural world. ....Data elements are thus defined as relevant
to the user within the user's universe of discourse. Data elements are
electronic or written representations of the properties of natural-world
object classes.



As a matter of fact, the European SCHEMAS forum (see below) distinguishes
among the following "Materials" as far as metadata standardization is
concerned:

 > Dataset, Databases
 > Documents, Text
 > Images (film, photograph, slide)
 > Map
 > Multimedia resources
 > Software

Therefore, according to SCHEMAS, Maps, Images, Multimedia resources,
Documents, and Software are not dataset materials, at least as far as
metadata is concerned. Are they right?



Finally, there is an open issue to consider: should a dataset include
metadata?
For instance, according to the NCSA-HDF5 tutorial: "A dataset is a
multidimensional array of data elements, together with supporting metadata."
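
To make that reading concrete, here is a minimal Python sketch of a
dataset that carries its own metadata. It does not use the HDF5 library
or its API; the Dataset class, its fields, and the temperature example
are hypothetical, and the array is assumed to be stored flattened in
row-major order.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Dataset:
    """Hypothetical sketch: a multidimensional array plus supporting metadata."""
    shape: Tuple[int, ...]
    values: List[float]                      # flattened, row-major order
    metadata: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # The number of stored values must equal the product of the dimensions.
        expected = 1
        for n in self.shape:
            expected *= n
        if len(self.values) != expected:
            raise ValueError("values do not match shape")

    def get(self, *index: int) -> float:
        """Return the element at a multidimensional index."""
        if len(index) != len(self.shape):
            raise IndexError("wrong number of indices")
        offset = 0
        for i, n in zip(index, self.shape):
            if not 0 <= i < n:
                raise IndexError("index out of range")
            offset = offset * n + i
        return self.values[offset]


# Illustrative 2 x 3 array with its describing metadata attached.
temperature = Dataset(
    shape=(2, 3),
    values=[280.0, 281.5, 279.9, 282.1, 283.0, 281.2],
    metadata={"units": "K", "long_name": "air temperature"},
)
```

Under this definition the array alone is not the dataset: removing the
metadata dictionary would leave numbers whose meaning (units, name) is
no longer recoverable from the object itself.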


-------------------------------------
ISO/IEC JTC1: In the field of information technology, ISO and IEC have
established a joint technical committee, ISO/IEC JTC1. Draft
International Standards adopted by the joint technical committee are
circulated to national bodies for voting. Publication as an
International Standard requires approval of at least 75% of the national
bodies casting a vote.

---------------------------------------------
SCHEMAS (Forum for Metadata Schema Implementers) is a project funded by
the European Commission in the framework of the IST (Information Society
Technologies) Programme.