Formatting Metadata for Open Sharing

Code by James Osborne from Pixabay

Like research data, metadata must be properly formatted before open sharing so that other researchers can understand and use them. The research data management plan should already specify how the metadata will be created and described. The three main ways of recording metadata are data papers, ReadMe files, and standardized, machine-readable metadata schemas. Which of these suits you best depends on your research area and the repository where you will deposit your research data.

Data Papers

Data papers most closely resemble the traditional forms of scientific reporting and are therefore the easiest to understand and most convenient for researchers. Like other scientific publications, they go through a peer-review process, which makes them the most reliable of all methods of recording metadata. In addition, under the Slovenian Typology of Documents/Works for Bibliography Management in the COBISS System, they can be catalogued as either 1.01 Original Scientific Article or 1.03 Short Scientific Article, depending on their length and complexity.

Data papers (also called data articles) describe datasets in detail but, unlike original scientific articles, usually do not include any interpretation or discussion of the data. They may contain raw or processed data and/or provide a link to the repository where the data are deposited. Other typical components of a data paper include:

  • information about the authors,
  • abstract,
  • a description of the materials and methods used to collect the data,
  • instructions for re-use,
  • authorship/contributorship statement,
  • statement on ethical aspects and conflict of interests,
  • acknowledgements,
  • references.

Data journals usually do not prescribe minimum standards for reporting experiments or data, as these vary between research fields. When preparing a data paper, it is therefore useful to follow the domain-specific principles for describing provenance that the scientific community has prepared for the given research field.

Publishing Data Papers

Some scientific journals offer data papers as one of their publication formats (e.g., Ecology published by the Ecological Society of America, The International Journal of Robotics Research published by SAGE, Transportation published by Springer). There are also specialized journals dedicated solely to publishing datasets (e.g., Data in Brief by Elsevier, Scientific Data by Springer Nature, Earth System Science Data by Copernicus Publishing). As with repositories, choose a domain-specific journal if one exists; otherwise, choose a general one.

An interesting option for publishing data papers is the European platform Open Research Europe, where they are called Data Notes. Its advantage over traditional scientific journals is a free and transparent review and publication process. Contributions are first published as preprints, which go through a public and permanently open review process. If a contribution is not accepted, it is not permanently rejected; it can always be supplemented with a new, improved version. Updates and corrections then re-enter the review process, and accepted submissions can also be updated. You can learn more about Open Research Europe in a lecture given as part of the Open Academy of CTK UL (the lecture is in Slovenian).

ReadMe Files

The ReadMe file is a necessary accompanying document to the dataset that you deposit in a repository (unless the repository prescribes otherwise). It is a simple text file in the .txt format that contains all the basic information about the dataset. To ensure interoperability, avoid proprietary formats such as Microsoft Word. The ReadMe file must contain:

The dataset description must also include:

  • a brief description of the contents of each individual file or group of related files,
  • an explanation of the file format if it is not widely used or clearly recognizable from the file extension,
  • an explanation of the software required to open the files if the formats are proprietary or specific,
  • the relationships between files if the dataset contains several files that link to each other,
  • an explanation of the content and structure of the file folders,
  • the dates the files were created and updated (versioned), along with an explanation of the updates,
  • information about related data that were obtained but not included in the dataset.
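
The items above can be sketched as a short Python script that assembles the dataset-description part of a plain-text ReadMe. This is only an illustration: the function name build_readme, the section labels, the dictionary keys, and the example file names and date are all hypothetical and are not prescribed by any repository.

```python
from datetime import date


def build_readme(files, folder_structure, update_note, not_included, updated):
    """Assemble a plain-text dataset description (hypothetical layout)."""
    lines = [
        "DATASET DESCRIPTION",
        f"Last updated: {updated.isoformat()} ({update_note})",
        "",
        "FOLDER STRUCTURE",
        folder_structure,
        "",
        "FILES",
    ]
    # Optional per-file items are written only when they are supplied.
    optional = (
        ("Format", "format_note"),
        ("Software", "software"),
        ("Related files", "relations"),
    )
    for entry in files:
        lines.append(f"File: {entry['name']}")
        lines.append(f"  Contents: {entry['description']}")
        for label, key in optional:
            if entry.get(key):
                lines.append(f"  {label}: {entry[key]}")
    lines += ["", "RELATED DATA NOT INCLUDED", not_included]
    return "\n".join(lines) + "\n"


# Placeholder values for illustration only.
example_files = [
    {
        "name": "survey_2021.csv",
        "description": "Raw survey responses, one row per respondent.",
        "relations": "Column codes are explained in codebook.txt.",
    },
    {
        "name": "codebook.txt",
        "description": "Explanation of column names and value codes.",
    },
]
readme_text = build_readme(
    example_files,
    folder_structure="All files are stored in a single top-level folder.",
    update_note="initial version",
    not_included="Interview audio recordings.",
    updated=date(2022, 6, 10),
)
```

The resulting text can then be saved as a .txt file, for example with pathlib.Path("README.txt").write_text(readme_text, encoding="utf-8"), which keeps the ReadMe in a non-proprietary format.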

More detailed information about ReadMe files can be found on the Cornell University website, where you can also download a sample ReadMe template in English.

Machine-Readable Metadata Schemas

For some research areas or repositories, there are standardized metadata schemas that structure metadata uniformly in a way that is both human-understandable and machine-readable. Most of their content is intended to standardize the minimum domain-specific information on the provenance of the data. Standardized metadata schemas are available in specific formats, such as .csv (e.g., Darwin Core), .xml (e.g., Dublin Core), or similar. Some require special software or editing tools, which are typically available on the websites of the organizations that maintain the schemas (e.g., the AVM Tagging Tool for the Astronomy Visualization Metadata Standard).
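
To illustrate what machine-readable means in practice, the following Python sketch writes one record with Dublin Core elements as XML and one record with Darwin Core terms as CSV, using only the standard library. The element and term names (title, creator, date, format; occurrenceID, basisOfRecord, scientificName, eventDate) come from the respective standards, but the wrapping "metadata" element, the function names, and all values are assumptions made for this example. A real repository will define its own container format and required fields.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_xml(fields):
    """Serialize a dict of Dublin Core elements to an XML string."""
    # "metadata" is a hypothetical wrapper element, not part of the standard.
    root = ET.Element("metadata")
    for term, value in fields.items():
        ET.SubElement(root, f"{{{DC_NS}}}{term}").text = value
    return ET.tostring(root, encoding="unicode")


def darwin_core_csv(records):
    """Serialize a list of dicts keyed by Darwin Core terms to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0]), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Placeholder values for illustration only.
dc_xml = dublin_core_xml(
    {
        "title": "Example survey dataset",
        "creator": "Example, Author",
        "date": "2022-06-10",
        "format": "text/csv",
    }
)

dwc_csv = darwin_core_csv(
    [
        {
            "occurrenceID": "example-0001",
            "basisOfRecord": "HumanObservation",
            "scientificName": "Apis mellifera",
            "eventDate": "2022-06-10",
        }
    ]
)
```

Because both outputs use fixed, published field names, a repository or harvester can parse them without reading any free-text documentation, which is the main difference from a ReadMe file.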

Lists of domain-specific metadata schemas, with links to websites containing all relevant information, can be found on the websites of the UK Digital Curation Centre and the Research Data Alliance.

 

Last update: 10 June 2022
