Information required by the Archive System

This section gives an overview of the information that the archive requires to accompany each dataset.

To prepare your data for the archive you will need to understand how to format your metadata and data appropriately. The distinction between metadata and data is largely arbitrary. As a rule of thumb for this archive, data is typically columns of tabulated values, and metadata is information about who did the research and how, why, when and where it was done, and what it covered.

It is worth noting that data can be of any type (e.g. numeric, binary encoding, characters, categories, Boolean) and that the archive makes no distinction between raw data and data that is your final output. You are encouraged to provide data from all stages of your research and not just your final outputs. This gives users the opportunity to replicate and re-use both your data and method for further research.

Managing Metadata

Your metadata is typically information that is stored in many places. It describes who did the work, how they did it, where it took place, the method, quality procedures and much more. Traditionally scientists store this information in text files, lab books, field books, web pages, documents, CDs and in their heads.

The aim of the archive system is to capture structured metadata about the work that was done, to maximise the opportunity for others to discover and understand it without having access to the individual who created the data.
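As an illustration of what "structured" means here, the sketch below gathers the who, what, why, when, where and how of a piece of work into a single record and checks that none of them has been left blank. The field names, the example values and the missing_fields helper are assumptions made for this example only; they are not the archive's actual metadata schema.

```python
import json

# Hypothetical field names based on the rule of thumb above;
# the archive's real metadata structure may differ.
REQUIRED_KEYS = ("who", "what", "why", "when", "where", "how")

def missing_fields(record):
    """Return the required keys that are absent or blank in the record."""
    return [key for key in REQUIRED_KEYS
            if not str(record.get(key, "")).strip()]

# Placeholder values for illustration only.
record = {
    "who": "A. Researcher (placeholder)",
    "what": "Soil moisture measurements",
    "why": "",  # left blank on purpose to show the check
    "when": "2010-06-01",
    "where": "Field site A (placeholder)",
    "how": "Hand-held probe, three readings per plot",
}

gaps = missing_fields(record)
print("Missing metadata:", gaps)

# Serialise the record as text so it no longer lives only in a lab book.
metadata_text = json.dumps(record, indent=2, sort_keys=True)
print(metadata_text)
```

A check of this kind is only a prompt to fill in the gaps before upload; it says nothing about whether the descriptions themselves are adequate.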

Managing Data

Your data is typically tables within spreadsheets, or binary/text files structured according to a set of rules. For data that is relatively small in volume (i.e. < 50 MB), we encourage you to write your data into simple two-dimensional structures within CSV files before uploading them to the archive. For users working with Excel, this should be achievable with some straightforward macros that re-format results you already manage within your spreadsheets.

The benefits of converting your data to this simple CSV format are that (1) it is easy for a computer to read and work with the data, (2) the format can be easily read by any user accessing the data without any specialist software and (3) the format is not tied to a specific version of an operating system or package that may not be available in future.
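The following minimal sketch shows one way to write a simple two-dimensional table to CSV and read it back using only the Python standard library. The column names, the values and the two helper functions are hypothetical and chosen for illustration; the archive does not prescribe them.

```python
import csv
import io

def write_table_csv(header, rows, stream):
    """Write one header row followed by data rows as a simple 2-d table."""
    width = len(header)
    for row in rows:
        # Every row must have the same number of columns as the header.
        if len(row) != width:
            raise ValueError(f"row {row!r} does not have {width} columns")
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)

def read_table_csv(stream):
    """Read a CSV table back as (header, rows); all values come back as text."""
    reader = csv.reader(stream)
    header = next(reader)
    return header, [row for row in reader]

# Hypothetical example data.
header = ["plot_id", "sample_date", "soil_moisture_percent"]
rows = [
    ["A1", "2010-06-01", 23.5],
    ["A2", "2010-06-01", 19.8],
]

# An in-memory buffer stands in for a file opened with open(path, "w", newline="").
buffer = io.StringIO()
write_table_csv(header, rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)

header_back, rows_back = read_table_csv(io.StringIO(csv_text))
```

Note that CSV stores every value as text, so numeric columns need converting back to numbers after reading.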

NOTE: We are not discouraging the use of existing formats (such as Excel spreadsheets) for data exchange within projects. We request only that the data is “exported” into CSV before being placed in the archive. Excel can also be used to “import” a CSV file that has been extracted from the archive.

Supplementary Files

Some data and metadata do not fit neatly into the archive structure (described further in the next section). Examples include: (i) a PDF document that provides an important part of your experiment design layout; (ii) a Shapefile containing some relevant geometries; (iii) a set of text configuration files used in running a model. The archive system includes a place to put these files so that the information can be retained without the overhead of re-formatting it into the metadata or CSV structures that we recommend. These are known as Supplementary Files, which are described in full below.
