The FAIR Principles
Open research data must be published or otherwise made accessible in a way that makes them findable, accessible, interoperable and reusable, in short, in accordance with the FAIR principles. The FAIR principles can be applied to sharing all kinds of scientific outputs, but they are especially important when sharing research data.
The process of making data compliant with the FAIR principles is called FAIRification and includes several complex tasks:
- data generation,
- analysis of generated data,
- choosing (or creating) an appropriate semantic model for data and metadata,
- ensuring access to data,
- choosing an appropriate license to allow reuse,
- generation of rich metadata and
- publishing data together with rich metadata and an appropriate license that enable reuse.
You can read more about the FAIRification process on the GO FAIR "FAIRification Process" page and the FAIR Cookbook website. You can also check how well your existing data meet the FAIR principles with the FAIR Data Self Assessment Tool developed by the Australian Research Data Commons.
In this article, we will take a closer look at each of the FAIR principles.
The findability principle means that metadata and data should be easy to find, both for human users and for search algorithms (robots).
The findability principle consists of the following sub-principles:
F1. Data and metadata are assigned a globally unique and persistent identifier (e.g., DOI, Handle)
Sub-principle F1 is probably the most important of all, as the other aspects of the FAIR principles are difficult to achieve without globally unique and persistent identifiers for data. Persistent identifiers (PIDs) provide a permanent reference to each metadata record and each dataset, and they can also be linked to other information (e.g., about the license or the author's affiliation). A persistent identifier resolves to the online location (e.g., a URL) of the metadata and data in digital form.
Persistent identifiers must meet two important criteria:
- They must be unique: once an identifier has been assigned, no one else can assign it to another object, and it can only be used to refer to the object it was assigned to.
- They must remain valid regardless of any changes to the web addresses of the repositories or other websites where the data are archived.
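The second criterion is usually met through indirection: the identifier never changes, and a resolution service maps it to wherever the data currently live. The following minimal Python sketch illustrates this with a hypothetical in-memory registry (PidRegistry, the 10.1234 prefix and the example URLs are invented for illustration and are not a real service's API); the one real convention it relies on is that DOIs are resolved through https://doi.org/.

```python
from urllib.parse import quote


class PidRegistry:
    """Hypothetical in-memory stand-in for a PID resolution service."""

    def __init__(self) -> None:
        self._locations: dict[str, str] = {}

    def register(self, pid: str, url: str) -> None:
        # Uniqueness: an identifier that is already assigned cannot be reused.
        if pid in self._locations:
            raise ValueError(f"{pid} is already assigned")
        self._locations[pid] = url

    def update_location(self, pid: str, new_url: str) -> None:
        # The data moved: only the target URL changes, never the identifier.
        if pid not in self._locations:
            raise KeyError(pid)
        self._locations[pid] = new_url

    def resolve(self, pid: str) -> str:
        return self._locations[pid]


def doi_resolver_url(doi: str) -> str:
    """Build the https://doi.org/ link for a DOI such as '10.1234/abc'."""
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi}")
    return "https://doi.org/" + quote(doi, safe="/")


registry = PidRegistry()
registry.register("10.1234/example-dataset", "https://old-repo.example.org/records/42")
registry.update_location("10.1234/example-dataset", "https://new-repo.example.org/datasets/42")
print(registry.resolve("10.1234/example-dataset"))  # https://new-repo.example.org/datasets/42
print(doi_resolver_url("10.1234/example-dataset"))  # https://doi.org/10.1234/example-dataset
```

Anyone who cited the identifier still reaches the data after the move, because only the registry entry was updated.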
Persistent identifiers are assigned by dedicated services, e.g., DataCite or Crossref. Many repositories generate and assign persistent identifiers automatically when archiving data and metadata.
When choosing a repository to store data and metadata, check whether it assigns persistent identifiers, and make this one of your key selection criteria.
F2. Data are described with rich metadata (more details also in sub-principle R1 below)
FAIRification of data requires the creation of rich metadata, including descriptive information about the context in which the data were created or reused and a description of their origin (provenance). Sub-principle F2 helps users find data more easily and correctly evaluate their context, and thus also enables reuse.
When creating metadata, especially when describing the context of data creation, it is important to use appropriate vocabulary. For this purpose, you can use vocabularies, ontologies and taxonomies already created by the research community. Relevant vocabularies are often provided as part of domain repository services.
There are many online tools available to help you create metadata. We provide an example of a generated taxonomy, an example of an ontology database in the field of molecular genetics, and an example of instructions for creating metadata and using standard metadata schemas.
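As a small illustration of a standard metadata schema, the sketch below serializes a record with the 15 elements of the Dublin Core Metadata Element Set in their official namespace (http://purl.org/dc/elements/1.1/), using only Python's standard xml.etree.ElementTree. The metadata wrapper element, the field values and the DOI are assumptions made for the example; repositories usually embed such elements in their own container format.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher", "contributor",
    "date", "type", "format", "identifier", "source", "language",
    "relation", "coverage", "rights",
}
ET.register_namespace("dc", DC_NS)  # serialize elements with the usual dc: prefix


def dublin_core_record(fields: dict[str, list[str]]) -> str:
    """Serialize fields as Dublin Core elements under a simple wrapper element."""
    root = ET.Element("metadata")  # hypothetical wrapper, not part of Dublin Core
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"{name!r} is not one of the 15 Dublin Core elements")
        for value in values:  # Dublin Core elements are repeatable
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


record = dublin_core_record({
    "title": ["Soil moisture measurements, 2021"],
    "creator": ["Novak, Ana"],
    "subject": ["soil moisture", "hydrology"],
    "identifier": ["https://doi.org/10.1234/example-dataset"],
    "rights": ["https://creativecommons.org/licenses/by/4.0/"],
})
print(record)
```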
F3. Metadata clearly and explicitly include the persistent identifier of the data they describe
This is a simple but important sub-principle for finding and using data. Metadata and the data they describe are usually separate files or digital objects, so the metadata must contain the unique persistent identifier that leads users to the data or datasets.
As already mentioned under sub-principle F1, persistent identifiers of digital objects (e.g., DOI, Handle) are often assigned by the repository itself. When choosing a repository for archiving your data, we recommend that you also consider whether it assigns persistent identifiers. This makes your work significantly easier and also reduces costs.
F4. Data and metadata are registered or indexed in searchable bibliographic indexes (e.g., in repositories or library catalogues)
Persistent identifiers and rich metadata by themselves do not guarantee the findability of your data. You also need to take care of dissemination. Repositories have much greater dissemination power if their content is indexed by search engines (e.g., Google or Google Scholar). The library catalogue is also an important channel for disseminating information about data, so it is important to record the metadata there as a bibliographic record as well.
Metadata aggregators that collect records from many data repositories, in some cases with access to the data themselves, already exist. We expect this way of accessing data to develop further in the future. It is therefore crucial to follow sub-principles F1, F2 and F3 as closely as possible when handling data, as this is the basis for easier harvesting and indexing of metadata by such tools.
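Many aggregators collect repository metadata with OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting), which is one reason why consistent, identifier-bearing metadata matters. The sketch below only builds a harvesting request: verb=ListRecords, metadataPrefix=oai_dc and from are standard OAI-PMH request parameters, while the endpoint https://repository.example.org/oai is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     from_date: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        # Selective harvesting: only records added or changed since this date.
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint; a real harvester would send this request and then
# keep following the resumptionToken returned with each page of results.
print(list_records_url("https://repository.example.org/oai", from_date="2022-01-01"))
# https://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2022-01-01
```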
The accessibility principle means that once data and metadata have been found, users can access them, including through authentication and authorization procedures where these are required.
The accessibility principle consists of the following sub-principles:
A1. Data and metadata are retrievable by their identifier using a standardised communications protocol
Sub-principle A1 aims to ensure that access to data is not limited by specialized tools or communication methods. It focuses on how data and metadata can be retrieved by their identifiers. Obstacles that must be removed include, for example, communication protocols that have to be installed separately, are poorly documented, or require a lot of manual work.
If access to data is restricted for various reasons (e.g., patent potential or legal provisions), it is necessary to provide contact information in the metadata with a description of the data access options (e.g., access at the institution's headquarters, access in a secure room, etc.).
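Under sub-principle A1, retrieval by identifier over a standard protocol can be as simple as an ordinary HTTP request. DOI registration agencies such as Crossref and DataCite support content negotiation, so asking https://doi.org/ followed by a DOI for a machine-readable format (here CSL JSON) returns metadata instead of the human-oriented landing page. The sketch below uses only Python's standard library; the DOI is a hypothetical placeholder, and fetch_metadata needs network access, so only the request construction is exercised at the end.

```python
import json
import urllib.request

CSL_JSON = "application/vnd.citationstyles.csl+json"


def metadata_request(doi: str) -> urllib.request.Request:
    """Build a plain HTTP GET that asks the DOI resolver for metadata as JSON."""
    return urllib.request.Request(
        "https://doi.org/" + doi,
        headers={"Accept": CSL_JSON},  # content negotiation: metadata, not the landing page
    )


def fetch_metadata(doi: str) -> dict:
    """Send the request (requires network access) and parse the JSON reply."""
    with urllib.request.urlopen(metadata_request(doi), timeout=30) as response:
        return json.load(response)


req = metadata_request("10.1234/example-dataset")  # hypothetical DOI
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Nothing beyond a standard HTTP client is needed, which is exactly the kind of open, generally available protocol the sub-principle asks for.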
A1.1 The protocol is open, free, and universally implementable
By following this sub-principle, you increase the possibility of reusing your data. The access protocol should be open (with a publicly available specification), free of charge and universally implementable. Your goal should be that at least your metadata can be accessed by anyone with a computer connected to the internet. Examples of such communication protocols are:
- HTTP, FTP, SMTP …
- telephone/mobile phone (arguably not universally implementable, and offering only limited ways of obtaining information).
Proprietary teleconference systems (e.g., Zoom, Skype, Webex), by contrast, do not use open protocols and therefore do not meet this sub-principle.
A1.2 The protocol allows for an authentication and authorisation procedure, where necessary
The A sub-principles do not in themselves require open access: even heavily protected data can meet the requirements of the FAIR principles if the other conditions are met. Repositories require authentication and authorization from users for various reasons, and the complexity of these procedures for accessing your data can be one of the criteria when choosing a repository.
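The difference between authentication (who are you?) and authorization (are you allowed to see this?) can be sketched as a toy access policy that answers with standard HTTP status codes: 401 when the user has not authenticated and 403 when an authenticated user lacks permission. The Dataset class, the user names and the rule that metadata are always public are assumptions for illustration, not the behaviour of any particular repository.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Dataset:
    pid: str
    restricted: bool
    authorized_users: set[str] = field(default_factory=set)


def access_status(dataset: Dataset, resource: str, user: Optional[str]) -> int:
    """Return an HTTP status code for a request for 'metadata' or 'data'."""
    if resource not in ("metadata", "data"):
        raise ValueError(f"unknown resource: {resource}")
    if resource == "metadata" or not dataset.restricted:
        return 200  # metadata, and open data, are served without login
    if user is None:
        return 401  # authentication needed: the user has not identified themselves
    if user not in dataset.authorized_users:
        return 403  # authenticated, but not authorized for this dataset
    return 200


restricted = Dataset("10.1234/restricted-dataset", restricted=True, authorized_users={"ana"})
print([
    access_status(restricted, "metadata", None),
    access_status(restricted, "data", None),
    access_status(restricted, "data", "marko"),
    access_status(restricted, "data", "ana"),
])  # [200, 401, 403, 200]
```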
A2. Metadata are accessible, even when the data are no longer available
For various reasons (e.g., the cost of long-term storage), data are often no longer available after a certain period of time. It is important, however, that the metadata remain accessible. In this way, information about your research stays in the repositories, especially if the metadata are rich and the origin of the data is well described. This information may be important for other researchers. In connection with sub-principle F4, it is also important that metadata, once indexed in various indexes, give a reliable picture of the (previously existing) data.
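One common way to keep metadata available after the data have been removed is a so-called tombstone record: the files are deleted, but the metadata stay online together with a note on when and why the data were withdrawn. The minimal sketch below assumes a hypothetical in-memory Repository class and invented field names such as data_available and withdrawn_on.

```python
from datetime import date
from typing import Optional


class Repository:
    """Hypothetical repository that keeps metadata after the data are removed."""

    def __init__(self) -> None:
        self.metadata: dict[str, dict] = {}
        self.files: dict[str, bytes] = {}

    def deposit(self, pid: str, metadata: dict, content: bytes) -> None:
        self.metadata[pid] = dict(metadata, data_available=True)
        self.files[pid] = content

    def withdraw_data(self, pid: str, reason: str, when: date) -> None:
        # Delete the files, but turn the metadata into a "tombstone" record.
        self.files.pop(pid, None)
        self.metadata[pid].update(
            data_available=False,
            withdrawal_reason=reason,
            withdrawn_on=when.isoformat(),
        )

    def get_metadata(self, pid: str) -> dict:
        return self.metadata[pid]  # still answers after withdrawal

    def get_data(self, pid: str) -> Optional[bytes]:
        return self.files.get(pid)  # None once the data are gone


repo = Repository()
repo.deposit("10.1234/example-dataset", {"title": "Soil moisture measurements, 2021"}, b"...")
repo.withdraw_data("10.1234/example-dataset", "retention period ended", date(2030, 1, 1))
print(repo.get_data("10.1234/example-dataset"))  # None
print(repo.get_metadata("10.1234/example-dataset"))
```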
The interoperability principle means that data can be integrated with other data and used with applications or workflows for analysis, storage and processing.
The interoperability principle consists of the following sub-principles:
I1. Data and metadata use a formal, accessible, shared, and broadly applicable language for knowledge representation
Human users must be able to use and interpret the data, so metadata and data must be written in an understandable and generally accessible language that is used by the scientific community in the relevant research field. At the same time, metadata and data are also used in automated, computer-assisted exchange and reading. Therefore, metadata and data must be machine-readable, and these processes must work without special algorithms, online translators or similar tools.
Interoperability therefore represents the possibility of exchanging metadata and data between different systems without intermediaries. To achieve this, it is necessary to use generally recognized and used vocabularies, taxonomies and ontologies, as well as appropriate standardized metadata models.
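Exchange without intermediaries works when systems describe data with the same shared terms. The toy sketch below represents metadata as subject-predicate-object statements, as in RDF, but with plain Python tuples rather than an RDF library; the property IRIs come from the DCMI Metadata Terms vocabulary (http://purl.org/dc/terms/), while the DOIs and values are hypothetical. Because both systems use the same IRIs, their descriptions can be merged directly, with no translation step.

```python
DCTERMS = "http://purl.org/dc/terms/"
DS1 = "https://doi.org/10.1234/ds1"  # hypothetical DOIs
DS2 = "https://doi.org/10.1234/ds2"

Triple = tuple[str, str, str]

# Two independent systems describe datasets with the same shared vocabulary.
system_a: set[Triple] = {
    (DS1, DCTERMS + "title", "Soil moisture measurements, 2021"),
    (DS1, DCTERMS + "creator", "Novak, Ana"),
}
system_b: set[Triple] = {
    (DS1, DCTERMS + "license", "https://creativecommons.org/licenses/by/4.0/"),
    (DS2, DCTERMS + "title", "Soil temperature measurements, 2021"),
}


def merge(*graphs: set[Triple]) -> set[Triple]:
    """Combine graphs; no mapping is needed because the property IRIs already match."""
    merged: set[Triple] = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph: set[Triple], subject: str) -> list[tuple[str, str]]:
    """List (property, value) pairs for one subject, shortening the DCMI namespace."""
    return sorted(
        (predicate.removeprefix(DCTERMS), value)
        for s, predicate, value in graph
        if s == subject
    )


for prop, value in describe(merge(system_a, system_b), DS1):
    print(prop, "=", value)
```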
Metadata can be created using various ready-made dictionaries, vocabularies, code lists and ontologies. Examples of ontology libraries include BioPortal, The Open Biological and Biomedical Ontology (OBO) Foundry, Ontologies for e-Government and others. More on this can be found in the article Where to Publish and Find Ontologies? A Survey of Ontology Libraries. Ready-made metadata schemes for describing scientific works from various fields are also available; two widely used ones are Schema.org and Dublin Core.
When creating metadata, it is also important to consider the principles of search engine optimisation (SEO). We expect many users to search for data using general or specialized search engines that index the contents of repositories; Google launched a dedicated dataset search engine, Dataset Search, in 2018. When choosing keywords for your metadata, you can use Google Trends to check how often users search for certain terms, and Google Ngram Viewer to see how frequently terms appear in published books.
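Google Dataset Search discovers datasets mainly through schema.org Dataset markup, typically embedded as JSON-LD in the dataset's landing page. The sketch below builds such a description with a few common schema.org properties (name, description, identifier, license, creator, keywords); all values are hypothetical, and Google's own documentation lists the properties it requires and recommends.

```python
import json


def dataset_jsonld(name: str, description: str, doi: str, license_url: str,
                   creators: list[str], keywords: list[str]) -> dict:
    """Describe a dataset with schema.org terms that search engines can index."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "identifier": "https://doi.org/" + doi,
        "license": license_url,
        "creator": [{"@type": "Person", "name": person} for person in creators],
        "keywords": keywords,
    }


def script_tag(record: dict) -> str:
    """Wrap the record for embedding in the HTML of the dataset's landing page."""
    return ('<script type="application/ld+json">\n'
            + json.dumps(record, indent=2, ensure_ascii=False)
            + "\n</script>")


record = dataset_jsonld(
    name="Soil moisture measurements, 2021",
    description="Hourly soil moisture readings from ten hypothetical field sites.",
    doi="10.1234/example-dataset",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    creators=["Ana Novak"],
    keywords=["soil moisture", "hydrology"],
)
print(script_tag(record))
```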
I2. Data and metadata use vocabularies that follow FAIR principles
Controlled vocabularies and dictionaries should themselves be documented and accessible via persistent identifiers. Documentation of the terms and vocabularies used should be easily accessible to anyone who will use your data. You can read more about implementing sub-principle I2 on the FAIR Data Point website.
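In practice, using vocabularies with persistent identifiers means recording a keyword as a term from a published vocabulary rather than as free text. Ontologies in the OBO Foundry, for example, give every term a compact ID such as GO:0008150 (the Gene Ontology term "biological_process") and a persistent IRI under http://purl.obolibrary.org/obo/. The sketch below expands such IDs by that naming convention; its simple validity check is an assumption and does not cover every identifier form.

```python
OBO_PURL = "http://purl.obolibrary.org/obo/"


def expand_obo_curie(curie: str) -> str:
    """Expand a compact OBO term ID such as 'GO:0008150' into its persistent IRI."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix.isalpha() or not local_id.isalnum():
        raise ValueError(f"not an OBO-style term ID: {curie!r}")
    return f"{OBO_PURL}{prefix}_{local_id}"


# A keyword recorded as a vocabulary term (label plus PID), not just as free text.
keyword = {"label": "biological_process", "term": expand_obo_curie("GO:0008150")}
print(keyword["term"])  # http://purl.obolibrary.org/obo/GO_0008150
```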
I3. Data and metadata include qualified references to other (meta)data
The goal of this sub-principle is to create links and references between individual metadata records and datasets. A qualified reference states not only that two objects are related but also how (e.g., that one dataset was derived from another), preferably using persistent identifiers. When you reuse data, you must also cite them.
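Repositories that register DOIs with DataCite can record qualified references in the relatedIdentifiers part of the metadata, where each link carries a relationType stating what the relationship is. The sketch below builds such entries in the field layout of DataCite's JSON metadata and checks the relation against a subset of the schema's relation types; the DOIs are hypothetical.

```python
# A subset of the relationType values defined in the DataCite Metadata Schema.
RELATION_TYPES = {
    "Cites", "IsCitedBy", "References", "IsReferencedBy",
    "IsDerivedFrom", "IsSourceOf", "IsSupplementTo", "IsSupplementedBy",
    "IsPartOf", "HasPart", "IsNewVersionOf", "IsPreviousVersionOf",
}


def qualified_reference(doi: str, relation: str) -> dict[str, str]:
    """Describe a link to another object and say what kind of link it is."""
    if relation not in RELATION_TYPES:
        raise ValueError(f"unsupported relation type: {relation}")
    return {
        "relatedIdentifier": doi,
        "relatedIdentifierType": "DOI",
        "relationType": relation,
    }


related_identifiers = [
    qualified_reference("10.1234/raw-measurements", "IsDerivedFrom"),  # hypothetical DOIs
    qualified_reference("10.5678/journal-article", "IsSupplementTo"),
]
for ref in related_identifiers:
    print(ref["relationType"], ref["relatedIdentifier"])
```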
The reusability principle ensures the possibility of data reuse. To achieve this, the data and metadata must be described in sufficient detail to enable reproducibility or reuse for other purposes. There are three key aspects to reuse, namely:
- metadata and data must be licensed in a way that allows reuse,
- when re-using data, it is essential that the user is familiar with the description of the method of data creation or with a description of the origin (provenance) of the data,
- metadata and data must enable a scientific level of reuse.
The reusability principle consists of the following sub-principles:
R1. Data and metadata are richly described with a plurality of accurate and relevant attributes
This sub-principle is related to sub-principle F2, but focuses on whether users can decide if the data are useful in the context of their own research. Therefore, you must provide metadata that describe the context of data creation, e.g., the experimental protocols, the make and model of the instrument, the conditions of data generation, etc.
Since you cannot predict the context of other research in which your data will be reused, be generous when creating metadata, especially when describing context. Information that seems irrelevant at first glance in your own context may be useful to other researchers. An effective format for this purpose is the so-called data article (data paper) published in a dedicated data journal.
R1.1. Data and metadata are released with a clear and accessible data usage license
This sub-principle covers legal interoperability: the license defines the rights that other users have when using your data. For open research data, we usually use open licenses, e.g., Creative Commons. Licenses must be readable both by humans and by computers, for machine reading and data mining.
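A license is machine-readable when it is recorded as a standard identifier or URL rather than only as free text. The sketch below pairs SPDX license identifiers with the corresponding Creative Commons license URLs so that a program can filter datasets by license; the datasets and the accepted-license policy are hypothetical.

```python
# Mapping from SPDX license identifiers to the canonical Creative Commons URLs.
LICENSE_URLS = {
    "CC0-1.0": "https://creativecommons.org/publicdomain/zero/1.0/",
    "CC-BY-4.0": "https://creativecommons.org/licenses/by/4.0/",
    "CC-BY-SA-4.0": "https://creativecommons.org/licenses/by-sa/4.0/",
}


def license_entry(spdx_id: str) -> dict[str, str]:
    """Return a license statement that both people and programs can read."""
    if spdx_id not in LICENSE_URLS:
        raise ValueError(f"unknown license identifier: {spdx_id}")
    return {"spdx": spdx_id, "url": LICENSE_URLS[spdx_id]}


datasets = [  # hypothetical datasets
    {"pid": "10.1234/ds1", "license": license_entry("CC-BY-4.0")},
    {"pid": "10.1234/ds2", "license": license_entry("CC-BY-SA-4.0")},
    {"pid": "10.1234/ds3", "license": license_entry("CC0-1.0")},
]

# Hypothetical reuse policy: accept only licenses without share-alike terms.
ACCEPTED = {"CC0-1.0", "CC-BY-4.0"}
reusable = [d["pid"] for d in datasets if d["license"]["spdx"] in ACCEPTED]
print(reusable)  # ['10.1234/ds1', '10.1234/ds3']
```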
R1.2. Data and metadata are associated with detailed provenance
This sub-principle is one of the keys to reuse. When reusing data, other users must be aware of the circumstances of the data creation (who created the data, with what, under what conditions).
Well-documented provenance gives other users the information they need to judge whether and how the data can be reused. Its format largely depends on the scientific field, the type of research and, above all, the method of data generation. You can read more about the minimum requirements for describing data provenance in our article on provenance.
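Provenance is often recorded with the W3C PROV model, which links entities (such as datasets), activities (such as processing steps) and agents (people, organizations, software) through relations like wasDerivedFrom, wasGeneratedBy, used and wasAssociatedWith. The toy sketch below stores such statements as plain Python tuples, not as a PROV serialization, and traces a dataset back to its sources; all identifiers are hypothetical.

```python
Statement = tuple[str, str, str]

# Statements use W3C PROV relation names; all identifiers are hypothetical.
provenance: list[Statement] = [
    ("dataset:figure-data", "wasDerivedFrom", "dataset:cleaned"),
    ("dataset:cleaned", "wasDerivedFrom", "dataset:raw"),
    ("dataset:cleaned", "wasGeneratedBy", "activity:cleaning-script-v1.2"),
    ("activity:cleaning-script-v1.2", "used", "dataset:raw"),
    ("activity:cleaning-script-v1.2", "wasAssociatedWith", "agent:ana-novak"),
    ("dataset:raw", "wasGeneratedBy", "activity:field-measurement"),
    ("activity:field-measurement", "used", "instrument:soil-probe-model-x"),
]


def objects(subject: str, relation: str, statements: list[Statement]) -> list[str]:
    """Return everything the subject is linked to by the given relation."""
    return [o for s, r, o in statements if s == subject and r == relation]


def lineage(entity: str, statements: list[Statement]) -> list[str]:
    """Follow wasDerivedFrom links back to every source the entity depends on."""
    ancestors: list[str] = []
    frontier = [entity]
    while frontier:
        current = frontier.pop()
        for source in objects(current, "wasDerivedFrom", statements):
            if source not in ancestors:  # guards against cycles and duplicates
                ancestors.append(source)
                frontier.append(source)
    return ancestors


print(lineage("dataset:figure-data", provenance))  # ['dataset:cleaned', 'dataset:raw']
print(objects("activity:field-measurement", "used", provenance))  # ['instrument:soil-probe-model-x']
```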
R1.3. Data and metadata meet domain-relevant community standards
Reuse is easier if datasets of the same type are organized in similar ways. Here, we are not talking about content, but about standard ways of organizing data, established and recognized data formats, and the use of known vocabularies and ontologies in the creation of metadata. In most research fields, researchers use certain minimum information standards, as these are necessary for exchanging information (e.g., Minimum Information About a Proteomics Experiment, MIAPE, or Minimum Information About a Microarray Experiment, MIAME).
In some scientific fields these standards are less formal, but published data should nevertheless be described in language that allows reuse within the community. In some cases, you may have valid and specific reasons for deviating from the standards when archiving in repositories (e.g., a special or new data format). Such deviations should be recorded in the metadata.
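Checking metadata against a minimum-information checklist, and requiring an explanation for deviations, can be automated. The sketch below uses an invented four-item checklist and an invented list of accepted formats; real standards such as MIAME or MIAPE specify their own, far more detailed requirements.

```python
# Hypothetical minimum-information checklist; real standards such as MIAME or
# MIAPE define their own, much more detailed lists of required items.
CHECKLIST = {
    "instrument": "make and model of the instrument",
    "protocol": "reference to the experimental protocol",
    "samples": "description of the samples",
    "data_format": "file format of the data",
}
COMMUNITY_FORMATS = {"mzML", "CSV"}  # hypothetical formats the community expects


def check_metadata(metadata: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the checklist is satisfied."""
    problems = [f"missing: {text}" for key, text in CHECKLIST.items() if not metadata.get(key)]
    data_format = metadata.get("data_format")
    if data_format and data_format not in COMMUNITY_FORMATS and not metadata.get("format_note"):
        # Deviating from the community standard is allowed, but must be explained.
        problems.append(f"non-standard format {data_format!r} needs an explanation")
    return problems


print(check_metadata({
    "instrument": "Example MS-1",
    "protocol": "https://doi.org/10.1234/protocol",
    "samples": "soil, 10 sites",
    "data_format": "custom-binary",
}))  # ["non-standard format 'custom-binary' needs an explanation"]
```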
Last update: 7 September 2022
- F: Findability
- A: Accessibility
- I: Interoperability
- R: Reusability