Research repositories are part of the digital infrastructure that supports open science. As such, they must conform to certain professional guidelines or standards, comply with applicable legal and ethical restrictions (e.g., allow varying degrees of content openness) and provide secure, permanent storage. We refer to such repositories as trustworthy.
Three groups of repositories can be considered trustworthy:
- certified repositories (according to CoreTrustSeal, DIN 31644 or ISO 16363 certificates),
- domain-specific repositories recognized and used by the research community in a particular research field (e.g., HEPData, Crystallography Open Database, PubChem…),
- generalist and institutional repositories that have the characteristics of trustworthy repositories (e.g., Zenodo).
Trustworthy repositories transparently provide accurate information about the organisation and technical characteristics of their services (e.g., access and retrieval of content, secure storage, long-term provision of services, including technical support and funding). Such repositories provide appropriate metadata that include information about the origin of the content, are machine-readable and of sufficient quality to enable discovery, reuse and citation of data. Trustworthy repositories also assign persistent, unique identifiers (e.g., DOI, Handle, PURL) to content so that it can be cited unambiguously.
Below, we look at the 16 key requirements that a repository must fulfill to become CoreTrustSeal-certified.
1. Repository's Mission
The main role of a repository is to preserve and provide access to the data it manages. This role must be clearly understood by data providers and users. A repository is responsible for managing digital objects and must therefore provide an appropriate environment for storing digital material over time.
2. Licenses
The repository is the holder of all valid licenses for data access and use. It monitors data compliance with these licenses and maintains a dialogue with users. This applies both to the licenses established by the repository itself and to any codes of practice for information sharing and reuse that are generally accepted in a particular research field.
3. Continuity of Access
The repository has a plan to ensure permanent and uninterrupted preservation and access to its contents. This plan must provide for the future management of the repository (including under extraordinary circumstances such as disasters or termination of funding) and for organisational succession. It should include options for relocation, transfer of activities to another organization or return of data to their authors. The plan must be both medium-term (three to five years) and long-term (more than five years).
4. Confidentiality and Ethics
The repository ensures, to the greatest extent possible, that data are stored, organised, accessed and used in accordance with the ethical norms and practices of a particular research field, which is of key importance for responsible science. It must have a plan to manage the risk of disclosing confidential information, such as the identity of an individual who participated in the research or the exact location of an endangered plant or animal species. It must also provide instructions to data providers and users on how to handle sensitive data.
5. Organisational Infrastructure
The repository is adequately funded, has an effective management system and employs a sufficient number of qualified staff with expertise and experience in the field of data archiving. It is usually hosted by an established institution in a relevant subject area, which ensures long-term existence and stability. Sufficient funds are available for technical infrastructure and staff, ideally for a period of three to five years. The staff are provided with continuous education and professional development.
6. Expert Guidance
The repository adapts to developments in data types, data volumes and the rate at which data are produced. To best serve its users, it regularly updates its operations with the most effective new technologies. In doing so, it collaborates with internal or external experts in the fields of information technology, archival science, data analytics and scholarly research, who provide feedback on the operation of the repository. It also communicates with data providers and users.
7. Data Integrity and Authenticity
The repository ensures the integrity and authenticity of data through an appropriate system for managing data and metadata throughout their lifecycle in the repository. To protect the integrity of data and metadata, it is necessary to document all intentional changes, including the source of the changes and the reasons for them. This may include establishing the identity of the data providers. The repository must have mechanisms in place to detect unauthorized access, unintentional changes and errors in files, and to restore correct versions of data and metadata. Data authenticity is ensured by controlling the reliability of the originally deposited data and its provenance (origin), including preserving existing relationships between datasets, metadata and their subsequent versions resulting from versioning.
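One common way to detect unintentional changes and file errors is a periodic fixity check, in which checksums recorded at deposit are compared with checksums of the stored files. The sketch below is only an illustration under that assumption: the requirement does not prescribe a method, and the function names, the manifest layout and the choice of SHA-256 are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(manifest: dict[str, str], root: Path) -> dict[str, str]:
    """Compare stored files with the checksums recorded at deposit.

    Returns a mapping of file name to "ok", "changed" or "missing".
    """
    report = {}
    for name, expected in manifest.items():
        target = root / name
        if not target.is_file():
            report[name] = "missing"
        elif sha256_of(target) == expected:
            report[name] = "ok"
        else:
            report[name] = "changed"
    return report


def demo() -> dict[str, str]:
    """Deposit two files, alter one afterwards, and list one that never arrived."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "data.csv").write_text("id,value\n1,42\n")
        (root / "readme.txt").write_text("Original description\n")
        # Hypothetical manifest: checksums captured at the point of deposit.
        manifest = {name: sha256_of(root / name) for name in ("data.csv", "readme.txt")}
        manifest["lost.bin"] = "0" * 64  # listed in the manifest but not on disk
        # Simulate an undocumented change to a stored file.
        (root / "readme.txt").write_text("Silently altered description\n")
        return check_fixity(manifest, root)
```

A file reported as "changed" or "missing" would then be restored from a verified copy, and the incident documented together with its cause.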
8. Appraisal
To ensure the relevance and understandability of data to users, the repository accepts data and metadata based on specific criteria and ensures proper data management to preserve its properties. The data to be archived are carefully selected, and the repository has mechanisms for handling rejected data. The repository also defines recommended file formats and a set of metadata that must accompany the data, preferably in the form of a standardized metadata schema. Part of the evaluation and selection process for data and metadata can be automated.
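As an illustration of the automated part of this process, the sketch below checks a deposit against a list of required metadata fields and recommended file formats. The field names, the extension list and the function name are hypothetical and chosen only for the example. A real repository would take them from its own collection policy and metadata schema.

```python
from pathlib import PurePath

# Hypothetical policy values, chosen only for illustration.
REQUIRED_FIELDS = ("title", "creator", "description", "license")
RECOMMENDED_EXTENSIONS = {".csv", ".txt", ".json", ".xml", ".pdf"}


def appraise_deposit(metadata: dict, filenames: list[str]) -> list[str]:
    """Return a list of problems found in a deposit.

    An empty list means the deposit passes the automated checks.
    It may still need review by repository staff.
    """
    problems = []
    for field in REQUIRED_FIELDS:
        value = metadata.get(field)
        # A field counts as missing if it is absent, not text, or blank.
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing metadata field: {field}")
    for name in filenames:
        suffix = PurePath(name).suffix.lower()
        if suffix not in RECOMMENDED_EXTENSIONS:
            problems.append(f"file format not recommended: {name}")
    return problems
```

The returned messages could be shown to the data provider at upload time, so that most problems are corrected before staff appraisal begins.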
9. Documented Storage Procedures
The repository documents data archiving processes and procedures. It must maintain data and metadata from the point of deposition through the entry and processing procedures to the point of access in accordance with the ISO 14721 standard (Reference Model for an Open Archival Information System, OAIS). It should have a management strategy for different storage locations and multiple copies, if any, to ensure consistency of storage. It must also have a strategy to control the ageing or deterioration of data carriers and to manage risks (unforeseen events).
10. Preservation Plan
The repository assumes responsibility for the long-term storage of data and performs this function in a planned and documented manner. Data providers and users must understand its role and must grant it certain rights based on a contract or agreement. This primarily relates to data administration, storage, copying, modification, migrations and accessibility to third parties. All procedures must be accurately documented and completed and comply with archival standards.
11. Data Quality
The repository staff has the appropriate expertise and experience to assess the technical quality of the data and metadata and to ensure that end users have sufficient information to assess the quality of content. It has a quality control strategy in place to ensure the completeness and understandability of the deposited data, as well as measures for handling data that do not meet the criteria (i.e., whether the repository returns them to the authors or they are corrected by repository staff).
12. Workflows
Archiving follows a well-defined procedure from data entry to user access, preferably according to the ISO 14721 standard (Reference Model for an Open Archival Information System, OAIS). The work process is adapted to the role and activity of the repository.
13. Data Discovery and Identification
The repository enables users to find data and refer to them through persistent identifiers. To this end, it must establish machine harvesting of metadata, a searchable catalog of metadata, and recommended citation formats. Preferably, the repository is also included in one or more general or domain-specific digital resource registries.
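Machine harvesting of metadata is often offered through the OAI-PMH protocol, which the text above does not name, so treat it as an assumption here. The sketch below builds a `ListRecords` request URL using the protocol's standard `verb`, `metadataPrefix`, `from` and `set` arguments. The function name and the base URL in the usage note are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     from_date: Optional[str] = None,
                     set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL.

    base_url is the repository's OAI-PMH endpoint (hypothetical in this
    example); oai_dc is the Dublin Core format every OAI-PMH server supports.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # harvest only records changed since this date
    if set_spec:
        params["set"] = set_spec    # restrict harvesting to one collection
    return f"{base_url}?{urlencode(params)}"
```

For example, `list_records_url("https://repository.example.org/oai", from_date="2022-06-15")` produces a URL that a harvester could request to fetch records added or changed since that date.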
14. Data Reuse
The repository enables the reuse of data and ensures that they are accompanied by appropriate metadata that make them easier to understand. It must ensure that the data will be understandable and reusable in the long term despite technological changes (e.g., discontinuation of file formats and software and emergence of new ones) and the evolution of knowledge in a particular research field. This includes information support in case of file format changes.
15. Technical Infrastructure
The repository is based on well-supported operating systems and other infrastructure hardware and software that are suitable for the services it provides to its user community. This is specified by the ISO 14721 standard (Reference Model for an Open Archival Information System, OAIS). The repository maintains records of software used and system documentation. Bandwidth and connectivity are sufficient for the needs of the user community. The repository also has an infrastructure development plan, a disaster management plan (e.g., providing backup copies and file recovery options in the event of an outage) and a long-term business plan.
16. Security
The technical infrastructure of the repository ensures the protection of the organization and its data, products, services and users. The repository must analyse the potential threats, assess their risks and establish an effective security system. It must anticipate the damage that could be caused by malicious acts, human or technical errors. It must assess the likelihood and impact of such scenarios, set acceptable levels of risk and provide measures to protect against threats.
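The assessment of likelihood and impact can be sketched as a simple risk matrix, in which each threat receives a score and is compared with the acceptable level of risk. This is only one possible approach and is not prescribed by the requirement. The 1 to 5 scales, the multiplication rule, the class and function names, and the example threats in the usage note are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threat:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (negligible) to 5 (severe)

    @property
    def score(self) -> int:
        """Risk score as likelihood multiplied by impact."""
        return self.likelihood * self.impact


def threats_needing_measures(threats: list[Threat], acceptable_score: int) -> list[str]:
    """Return the names of threats whose score exceeds the acceptable level,
    ordered from highest to lowest score."""
    over = [t for t in threats if t.score > acceptable_score]
    return [t.name for t in sorted(over, key=lambda t: t.score, reverse=True)]
```

With an acceptable score of 6, a threat rated likelihood 4 and impact 3 (score 12) would require protective measures, while one rated likelihood 3 and impact 1 (score 3) would be accepted.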
Last update: 15 June 2022