
The Data Publication Process

Publishing data means making the Datasets associated with an Activity or Sub-Activity publicly available. Publication is managed from the Activity record because it is the root node of a given tree of digital objects. Publication cannot take place from the Dataset level, because the accompanying Activity information could then remain unpublished and users of the data would not be able to see it.

When a user thinks that their data is ready for publication, they should go to the Activity object from which they will be publishing and look for the button labelled 'Pre-Publication Check'.

The stages of publication are:

  1. Pre-Publication Check
  2. Submission for Publication
  3. Publication of the Data
  4. DOI Minting for Datasets

There is also an optional fifth stage, ODI certification, which is carried out only if the user requests it.
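The stages above can be pictured as a small state machine. The following is only a sketch: the `Stage` names, the `TRANSITIONS` table, and the `advance` function are hypothetical and are not part of AEDA itself. It assumes that a rejected submission returns to its state before Stage 1, as described under Stage 3.

```python
from enum import Enum


class Stage(Enum):
    """Hypothetical labels for the publication stages."""
    DRAFT = "editable, not yet checked"
    CHECKED = "pre-publication check passed"
    SUBMITTED = "submitted for staff review"
    PUBLISHED = "published"
    DOI_MINTED = "DOI minted"
    ODI_CERTIFIED = "ODI certified (optional)"


# Which stage may follow which. A rejected submission goes back to DRAFT.
TRANSITIONS = {
    Stage.DRAFT: {Stage.CHECKED},
    Stage.CHECKED: {Stage.SUBMITTED},
    Stage.SUBMITTED: {Stage.PUBLISHED, Stage.DRAFT},
    Stage.PUBLISHED: {Stage.DOI_MINTED},
    Stage.DOI_MINTED: {Stage.ODI_CERTIFIED},
    Stage.ODI_CERTIFIED: set(),
}


def advance(current, target):
    """Return the target stage if the move is allowed, else raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.name} to {target.name}")
    return target


stage = Stage.DRAFT
for next_stage in (Stage.CHECKED, Stage.SUBMITTED, Stage.PUBLISHED, Stage.DOI_MINTED):
    stage = advance(stage, next_stage)
print(stage.name)
```

Skipping a stage, for example going straight from `DRAFT` to `PUBLISHED`, raises a `ValueError` in this sketch.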

Stage 1: The Pre-Publication Check

The pre-publication check is a system check of all digital objects slated for publication. It looks for forms that have been completed incorrectly and fields that have been left blank by the user. It only checks the digital objects in scope for publication; "in scope" means that they are child objects of the Activity. Objects in the archive can refer to internal documentation or similar objects held under a different Activity, but the scoping check means that those referenced objects will not be published.

This is important to consider when publishing your data. If your data refers to an experimental protocol document or similar held under a different Activity, consider uploading that document to an appropriate place under the Activity you intend to publish, so that it is available to the public with your dataset. In other words, if you have documentation or experimental protocols that will be referred to throughout your various research Activities, try to publish these first.
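To make the idea of scope concrete, here is a minimal sketch of such a check. The `DigitalObject` class and the `in_scope` and `pre_publication_check` functions are hypothetical illustrations, not AEDA's actual implementation. The sketch assumes that scope covers every descendant of the Activity, and that references pointing outside the scope are reported rather than treated as failures.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Hypothetical stand-in for an archive object (Activity, Dataset, ...)."""
    name: str
    fields: dict = field(default_factory=dict)      # form fields filled in by the user
    children: list = field(default_factory=list)    # objects held beneath this one
    references: list = field(default_factory=list)  # links to other objects


def in_scope(activity):
    """Collect the Activity and everything beneath it."""
    found, stack = [], [activity]
    while stack:
        obj = stack.pop()
        found.append(obj)
        stack.extend(obj.children)
    return found


def pre_publication_check(activity):
    """Return (blank fields, references that will not be published)."""
    scope = in_scope(activity)
    scope_ids = {id(obj) for obj in scope}
    blank, unpublished = [], []
    for obj in scope:
        for key, value in obj.fields.items():
            if value is None or str(value).strip() == "":
                blank.append((obj.name, key))
        for ref in obj.references:
            if id(ref) not in scope_ids:
                unpublished.append((obj.name, ref.name))
    return blank, unpublished


# A protocol held under a different Activity is out of scope.
protocol = DigitalObject("Protocol", {"title": "Sampling protocol"})
other_activity = DigitalObject("Other Activity", children=[protocol])
dataset = DigitalObject(
    "Dataset",
    {"title": "River temperatures", "description": ""},
    references=[protocol],
)
activity = DigitalObject("Activity", {"title": "River survey"}, children=[dataset])

blank, unpublished = pre_publication_check(activity)
print(blank)
print(unpublished)
```

In this example the Dataset has a blank description, and its protocol document would not be published because it sits under a different Activity.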

Stage 2: Submitting for Publication

Once the data has passed its pre-publication check, you can submit it for publication. Doing so freezes input on the digital objects you are trying to publish while they are referred to a member of the AEDA staff for review. The review is necessary because not every issue will have been picked up automatically by the pre-publication check. For example, a user may have given their Data Components or Datasets non-meaningful titles such as DS1, DS2 etc. Titles must be meaningful, in the same way that journal articles must have meaningful titles, to aid in the identification and retrieval of the information contained within them. This is the sort of thing that a human must check prior to publication. The archive staff will also check that the data is correctly structured, meaning that compound phenomena (see this section for info on compound phenomena) have been correctly separated, that CSV headings are correctly labelled and appropriately specific, and so on.

Submission for publication also involves choosing an appropriate licence for the data to be published under. As data licensing is usually something that only information professionals or intellectual property lawyers are familiar with, the archive staff can provide advice on which licence to choose. Simply email the archive staff for help on choosing the correct licence for your data.

Stage 3: Publication of the Data

If the archive staff deem that the information submitted for publication isn't suitable, they will refer it back to the user with details of why it has failed. The digital objects will then unfreeze so that the user can edit them, taking on board the reasons for failure. This effectively means returning the data to its state prior to Stage 1 of the publication process.

If the archive staff deem that the information is of sufficient quality to be published, publication may proceed. Publication involves cloning all the digital objects in scope beneath the Activity being published from. These cloned objects are then put into the appropriate collection on the public-facing side of the archive and become available to the general public. Once published like this, the cloned objects cannot be altered or edited by the user in any way. However, the original objects visible in the user's account become available for editing and alteration again, just as they were prior to the pre-publication check in Stage 1. This is so that mistakes or errors in the published data can be corrected, and so that updates can be made by, for example, adding new data to a Dataset for later publication.
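The clone-and-freeze behaviour can be illustrated with a short sketch. Everything here is hypothetical (the `DigitalObject` class, the `frozen` flag, and the `publish` function); it simply assumes that publication takes a deep copy of the object tree and marks the copy as read-only, leaving the originals editable.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Hypothetical archive object holding some data and child objects."""
    name: str
    data: list = field(default_factory=list)
    children: list = field(default_factory=list)
    frozen: bool = False

    def append_data(self, value):
        if self.frozen:
            raise PermissionError(f"{self.name} is published and cannot be edited")
        self.data.append(value)


def freeze(obj):
    """Mark an object and everything beneath it as read-only."""
    obj.frozen = True
    for child in obj.children:
        freeze(child)


def publish(activity):
    """Clone the whole tree under the Activity and freeze the clone."""
    public_copy = copy.deepcopy(activity)
    freeze(public_copy)
    return public_copy


dataset = DigitalObject("Dataset", data=[1, 2])
activity = DigitalObject("Activity", children=[dataset])

public_activity = publish(activity)

# The original stays editable; the published clone is unaffected.
dataset.append_data(3)
print(dataset.data)
print(public_activity.children[0].data)
```

Because the public copy is a separate tree, later edits to the originals never leak into what has already been published.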

Stage 4: Minting the DOI

The final stage of publication involves the archive staff submitting some of the metadata from a Dataset to DataCite in order to create a Digital Object Identifier (DOI) for that Dataset. Minting a DOI for a Dataset requires the archive to maintain a copy of that data in perpetuity, so this step is done last of all to allow the maximum opportunity to spot any errors or omissions in the data. Adding a DOI to a Dataset means that it will be listed in the DataCite metadata store and will be easier to find in scholarly indexes and similar resources. It also means that the DOI can be used in citations, and Datasets published this way become additions to the scholarly record of the researchers who created the data.
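As a rough illustration of "some of the metadata", the sketch below picks out a small subset of a Dataset's metadata before a DOI is requested. The field names loosely follow the mandatory properties of the DataCite Metadata Schema (creators, titles, publisher, publication year, resource type), but the flat dictionary layout, the `doi_metadata` function, and the example values are assumptions made for illustration. This is not the payload AEDA actually sends, and it makes no call to DataCite.

```python
# Assumed names for the mandatory properties, flattened for simplicity.
REQUIRED = ("creators", "titles", "publisher", "publicationYear", "resourceTypeGeneral")


def doi_metadata(dataset_metadata):
    """Select the required subset, refusing to continue if anything is missing."""
    missing = [key for key in REQUIRED if not dataset_metadata.get(key)]
    if missing:
        raise ValueError("missing metadata: " + ", ".join(missing))
    return {key: dataset_metadata[key] for key in REQUIRED}


# Hypothetical Dataset metadata; internal fields are not passed on.
dataset_metadata = {
    "creators": ["A. Researcher"],
    "titles": ["River temperatures"],
    "publisher": "Example Archive",
    "publicationYear": 2015,
    "resourceTypeGeneral": "Dataset",
    "internal_notes": "not sent anywhere",
}

subset = doi_metadata(dataset_metadata)
print(sorted(subset))
```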

Optional Stage 5: Adding an ODI Certificate

The Open Data Institute (ODI) offers a certification process for open data. The certificates come in four levels: Raw, Pilot, Standard, and Expert. AEDA currently supports all four levels of certification from a technical perspective. However, the precise level your data can achieve depends very much on how you have structured it and entered it into the archive. Most data added to AEDA is likely to achieve a Pilot level certificate, simply because the AEDA data model forces you to structure your data in a manner consistent with Pilot level certification. Standard and Expert level certificates can be achieved but may require the data to be structured slightly differently. The AEDA developer team is working behind the scenes to improve the archive so that data entered into it will gain higher level certificates automatically. Unfortunately, the ODI certification process is always likely to involve some work from the user providing the data in the first instance.

If you want your data to have an ODI certificate then email the archive staff to discuss what would be involved in achieving this for your data.

Advantages of getting an ODI certificate for your data

A certificate from the ODI is an independent mark of the quality of your data and gives others some assurance as to how open and easily reusable your data is. Part of the specification for the certificates covers such things as machine readability, which allows your data to be used easily by developers creating applications built on open data. It also involves, particularly at the higher certificate levels, an amount of user engagement with your data, so that it can be discussed and feedback on it given. If you can get an ODI certificate for your data, it is highly recommended that you do so.

Publishing subsequent versions of the data

Once a set of digital objects has been published by the archive and had a DOI minted for it, the published data is frozen and cannot be altered. This applies only to the publicly available data; you can continue to edit, improve, or add to those very same digital objects once logged into AEDA. If you later wish to publish these objects again, for example because you have appended new numbers to the end of a CSV file or corrected some errors in the published data, you can go back to the Activity and begin the whole data publication process again. This will create a new version of the data available to the public. The old version must still be kept, but the new one will now also be available.
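The versioning behaviour can be sketched as an append-only list of published snapshots. The `PublicRecord` class below is hypothetical and only illustrates the principle that republishing adds a new version while earlier versions are retained; it says nothing about how AEDA stores versions internally.

```python
import copy


class PublicRecord:
    """Hypothetical append-only store of published versions."""

    def __init__(self):
        self._versions = []

    def publish(self, working_copy):
        """Snapshot the working copy as a new version and return its number."""
        self._versions.append(copy.deepcopy(working_copy))
        return len(self._versions)

    def version(self, number):
        """Return a copy of a published version (numbered from 1)."""
        return copy.deepcopy(self._versions[number - 1])

    def latest(self):
        return self.version(len(self._versions))


# Rows of a CSV file held in the user's working copy.
rows = [["day", "temp_c"], ["1", "11.2"]]
record = PublicRecord()

first = record.publish(rows)
rows.append(["2", "11.9"])       # the user appends new numbers later
second = record.publish(rows)

print(first, second)
print(len(record.version(1)), len(record.latest()))
```

Both versions remain retrievable: the first still has the original rows, and the second includes the appended row.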
