This section guides Data Providers through the process of providing data to the archive. The first stages involve consolidating the data and metadata that you have generated, and then familiarising yourself with the conceptual model classes. The next stage (Stage 5) is to prepare the CSV files of numerical data for upload into the archive.
Stage 1: What data do I have?
At the start of the process it is important to make some decisions about the dataset that you wish to commit to the archive. We recommend that you provide as much metadata and data as possible, including intermediate processing steps that were undertaken in generating your final outputs. This includes what you might consider as “raw data”. Deciding on the scope of your dataset and the logical separation between different processes is a very useful starting point.
For example, you might have a dataset made up of three distinct components:
- Making measurements of X at L
- Making measurements of Y at M
- Calculating Z using X and Y as inputs
Whilst the results of Z are your final outputs, the dataset as a whole is only fully understood by describing how X, Y and Z came about. Using our conceptual model, you could think of these as three separate Data Components: 2 Measurements and 1 Synthesis.
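As a rough illustration, the example above could be written down as a small inventory before any metadata templates are filled in. The sketch below uses Python; the class and field names (DataComponent, kind, inputs) are hypothetical and are not part of the conceptual model or of any archive software.

```python
from dataclasses import dataclass, field


@dataclass
class DataComponent:
    """One logical step in producing the dataset (hypothetical structure)."""
    name: str
    kind: str  # e.g. "Measurement" or "Synthesis"
    inputs: list = field(default_factory=list)  # names of components used as inputs


# The three components from the example above.
components = [
    DataComponent("X at L", "Measurement"),
    DataComponent("Y at M", "Measurement"),
    DataComponent("Z", "Synthesis", inputs=["X at L", "Y at M"]),
]

# Count how many components of each kind the dataset contains.
counts = {}
for component in components:
    counts[component.kind] = counts.get(component.kind, 0) + 1

print(counts)  # {'Measurement': 2, 'Synthesis': 1}
```

Recording the inputs of each component makes the dependency of Z on X and Y explicit, which is the information a later user needs in order to understand the final outputs.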
Stage 2: What are my data files?
The second step involves separating metadata (to be captured in classes of the conceptual model) from data (to be uploaded in CSV or another format). It is useful to review your dataset to consider what information you, and potential users, think of as data. You should start thinking about how the data and metadata will be handled in this stage.
You will need to convert your data files into a set of CSV files. This will typically involve re-structuring output files (or worksheets in the case of Excel spreadsheets) into a simple tabulated structure. No nesting of tables is allowed, and any annotations, images or charts should also be removed. At this stage you may find that some of the content of your existing data files is actually what we call metadata; this should be used when populating the metadata templates. If you wish to provide charts and images to the archive, you can do so in the form of Supplementary Files, which can be added in PDF format.
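One way to check that a converted file really is a single, simple table is sketched below in Python, using only the standard csv module. The function name find_table_problems and the rules it applies (one header row, no blank rows, the same number of columns in every row) are assumptions made for illustration; they are not the archive's official validation rules.

```python
import csv
import io


def find_table_problems(csv_text):
    """Return a list of problems that stop csv_text being one simple table."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    problems = []
    header = rows[0]
    if any(cell.strip() == "" for cell in header):
        problems.append("header row has an empty column name")
    # Rows are numbered from 1, so the first data row is row 2.
    for number, row in enumerate(rows[1:], start=2):
        if not any(cell.strip() for cell in row):
            problems.append(f"row {number}: blank row (possible second table)")
        elif len(row) != len(header):
            problems.append(
                f"row {number}: expected {len(header)} columns, found {len(row)}"
            )
    return problems


# A clean table, and one containing an annotation and a second (nested) table.
clean = "time,value\n1,0.5\n2,0.7\n"
messy = "time,value\n1,0.5\nNote: sensor replaced\n\nsite,depth,temp\n"

print(find_table_problems(clean))
for problem in find_table_problems(messy):
    print(problem)
```

Rows flagged by a check like this are often the annotations or embedded tables that belong in the metadata templates or in Supplementary Files rather than in the CSV itself.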
Stage 3: How do I map my metadata to the classes of the Conceptual Model?
Now you come to the hardest part of the process, which involves translating the metadata and data you already have into the classes of the conceptual model. At this stage it is recommended that you spend time reviewing the above descriptions of the classes in the model, and that you pay special attention to the 5 types of Data Component.
Here are a number of approaches that you might find useful. They may not all be applicable to your particular research but we recommend working through them to help you look at the problem from different perspectives.
Approach 1: Focus on the Dataset
Your Dataset is made up of one or more Data Components. It is useful to divide up your research into the various processes that you undertook to get the final outputs. For example, you may have collected raw data from a monitoring station, then performed quality assurance checks on the data.
Approach 2: Focus on your Data files
Each of your data files represents part of, or the entire, output associated with a Data Component. Think about which type of Data Component produced the output(s): was it a measurement, a simulation, etc.? Many Datasets will be composed of multiple Data Components, so don’t be surprised if you have many. Remember that some of your content will be more appropriate to map to Supplementary Files rather than Data Components.
Approach 3: Focus on the Activity
Another approach is to start with the high-level view of the work you have done by thinking in terms of the Activity. This is typically the project or programme that drove you to perform your research. You may find that this Activity has already been described in the archive, but you will often define your own Activity that provides information about why the research was undertaken. The Dataset is really just a container for a set of Data Components, without a great deal of metadata of its own. The Activity tends to describe the over-arching work that required the Dataset to be produced, and the Data Components contain the details about exactly what was done and how.
Once you have spent some time working through the 3 approaches described above you should have a reasonable picture of your:
- Activity
- Dataset
- Data Components
- Supplementary Files
Before moving on, you might want to ask if you have too many Data Components. If you have more than 20 Data Components (in a single Dataset) then it will be a significant undertaking to document them all thoroughly. It is therefore wise to consider whether you have defined the Data Components at a level that is too fine-grained. If this is the case, consider whether any of the Data Components could be appropriately represented as Supplementary Files. Also, consider if sets of Data Components can actually be grouped together to make a single Data Component.
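The picture built up in this stage, together with the check on the number of Data Components, can be sketched as follows in Python. The class names, fields and example titles are hypothetical, and the nesting of Datasets inside an Activity is an assumption made for illustration; only the threshold of 20 comes from the guidance above.

```python
from dataclasses import dataclass, field

MAX_COMPONENTS = 20  # threshold mentioned in the guidance above


@dataclass
class Dataset:
    title: str
    data_components: list = field(default_factory=list)      # names only
    supplementary_files: list = field(default_factory=list)  # e.g. PDF charts


@dataclass
class Activity:
    title: str
    datasets: list = field(default_factory=list)


def too_fine_grained(dataset):
    """True if the dataset has more Data Components than the suggested limit."""
    return len(dataset.data_components) > MAX_COMPONENTS


# Hypothetical example: one measurement component per site gives 25 components.
dataset = Dataset(
    title="Example dataset",
    data_components=[f"Measurement at site {i}" for i in range(1, 26)],
    supplementary_files=["site_map.pdf"],
)
activity = Activity(title="Example project", datasets=[dataset])

for ds in activity.datasets:
    if too_fine_grained(ds):
        print(f"{ds.title}: {len(ds.data_components)} Data Components; consider grouping")
```

In a case like this hypothetical one, the per-site measurements might be grouped into a single Measurement component if they share the same method, which is the kind of regrouping suggested above.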
Stage 5: Generating suitable output files
Please refer to the next section of the guidance notes for details on generating suitable output files.