International Journal of Digital Curation

ESRC Research Data Policy in a Changing Landscape

Tue, 28 Jan 2025 00:00:00 +0000

The Economic and Social Research Council (ESRC) is the UK’s largest funder of economic, social, behavioural, and human data science. ESRC research data policy is ‘intended to support ESRC grant holders who collect, produce and re-use data by defining researchers’ roles and responsibilities, as well as the roles and responsibilities of the ESRC and its data service providers.’ This paper reports on an independent review of this policy, commissioned and funded by the ESRC, and carried out by the authors, to make recommendations for an updated policy in the light of changes to both the data and legal/policy landscape. Following an initial scoping review, the study comprised an online survey of stakeholder views that was followed up by a series of focus groups, and an analysis of a sample of data management plans (DMPs). In this brief report, we concentrate on those aspects of our review, both in process and in substance, which are of most relevance to the data curation and data management community, and outline the next steps in the policy review process.

Data Makers and Users' Views on Useful Paradata

Isto Huvila, Lisa Andersson, Olle Sköld, Ying-Hsang Liu — Tue, 11 Feb 2025 00:00:00 +0000

Understanding and making data (re)usable requires adequate documentation of the data but also information on how it has been created, curated, manipulated and used, termed in data documentation literature as paradata. This paper reports results of a survey study (N=91) of data creating and (re)using archaeologists' views of what data creation, curation, manipulation and use related information (termed here as paradata) they consider important when they are working with data. Data makers' and users' perceptions align to a considerable degree. It is important to have an explanation of the original general context of data creation and knowing the purpose, procedures and methods of data making, analysis and documentation. The findings underline that there is a need to continue developing and testing ideas how to capture and document paradata, and to find ways how to help data makers adopt proven practices to facilitate paradata making. Simultaneously, it is crucial that the paradata aimed at facilitating data use is relevant for data users rather than, for instance, technical or administrative details considered useful primarily by data makers.

Starting with the Digital Doesn’t Make it Easier

Amanda Boczar — Tue, 28 Jan 2025 00:00:00 +0000

As organizations continue to overwhelmingly abandon all forms of paper-based record keeping, libraries are still adapting to increased offers of born digital archival donations. Simple misunderstandings or disconnects between the units facilitating donations and maintaining born-digital collections creates pain-points for donor relations and can result in a lack of transparency over how their records may be processed. To facilitate better donor transparency and cross-area collaboration over born digital records, Special Collections and archives need comprehensive policies and shifts in training and collaboration paradigms. This paper analyses the intersections of born digital archiving, collection development polices, donor relations, human-supported AI tools, and digital records education within American academic libraries to propose a functional toolkit for born digital acquisitions. Unrealistic expectations of collection processing, retention, growth, and publication onto openly accessible platforms can quickly overwhelm a libraries’ digital collections’ team due to size, need for digital forensics work, copyright limitations, or other capacity-related issues. Intertwined within this discussion is an additional discourse over the need to carefully curate our digital spaces not only for practical cost reasons, but due to the environmental costs of massive data storage solutions. Through an analysis of the elements stated above, the paper will reflect on the need to integrate born digital materials into archival acquisition procedures and provide practical solutions to meet this need.

Data Curation: Introducing a Competency Framework for the Social Sciences

Kathrin Behrens, Tatiana Kvetnaya — Fri, 13 Jun 2025 00:00:00 +0000

Research data management includes more than the question how researchers handle their data. In the sense of the FAIR principles, it is also about the sustainable safeguarding and organized reusability of research data. For social science, data-intensive research, research data centers and their data curating staff are therefore becoming increasingly important: data curators usually take on curation-specific tasks such as data preparation, securing research data in suitable archival environments, ensuring data accessibility, and the related control of the conditions of data re-use by third parties. Hence, they are specialized in the entire data curation process and, in particular, take on tasks of archiving and providing research data for reuse. Although the standards of comprehensive research data management are becoming more and more specific, this trend has not yet arrived in the corresponding training and further education measures. As a result, there is a gap between the growing demands on data curators and the development of competencies in the field of research data management with a focus on data curation. The competency framework presented in this article is intended to help close this gap: based on a Data Curation Lifecycle Model, a competency framework has been developed to support the development of targeted training and continuing education programs in the field of data curation, the formulation of learning objectives, and the evaluation of the corresponding trainings. The article points out the necessity to advance the development of competencies for this field, illustrates the schematic substructure of the data curation lifecycle, describes the development as well as the central core elements of the presented competency framework and discusses its perspectives. Overall, this competence framework is aimed in particular at (future) data curators, or as a schematic basis for the training of the relevant personnel. The focus is primarily on the data-intensive discipline of social sciences, although large parts can certainly be adapted for other disciplines and the corresponding data curation. The competency framework and this companion article are thereby intended to assist in advancing the sustainable professionalization of the previously understudied competency field of data curation.

Research Data Lifecycle (RDLC)

Jie Jiang, Danielle Maurici-Pollock, Rong Tang — Sun, 09 Feb 2025 00:00:00 +0000

In this paper, we report the results of a study examining 78 Research and Data Lifecycle (RDLC) models located in a review of the literature. Through synthesis-analysis and the nominal group technique, we investigated the RDLC models from the point of view of their disciplinary focus, use cases, model creators, as well as the specific stages and shapes. Our study revealed that the majority of the disciplinary focus for the models was generic, science, or multi-disciplinary. Models originating in the social sciences and humanities are less common. The use cases varied in a wide spectrum, with a total of 34 different scenarios. The creators and authors of the RDLC models came from more than 20 countries with the majority of the models created as a result of collaboration within or across different organizations. Our stage and shape analysis also outlined key characteristics of the RDLC models by showing the commonalities and variations of named stages and varying structures of the models. As one of the first empirical investigations examining the deep substance of the RDLC models, our study provides significant insights into the context and setting where the models were developed, as well as the details with regard to the stages and shapes, and thereby identified gaps that may impact the use and value of the models. As such, our study establishes a foundation for further studies on the practical utilization of the RDLC models in research data management practice and education.

Reproducible preservation of databases through executable specifications

Ivar Rummelhoff, Thor Kristoffersen, Bjarte M. Østvold — Mon, 02 Jun 2025 00:00:00 +0000

We propose a new preservation method for relational data and a corresponding tool. The method involves writing a specification that can later be executed by the tool without user interaction, transforming the input files and databases into an encapsulated package suitable for archiving. Thus, the transformation steps become reproducible, which facilitates automation by reusing the specifications and allows for an iterative process, where for each iteration the specification is extended or adjusted and then executed to check that the result is closer to fulfilling future access requirements.

Using Metadata to Promote Transparency in Health Research

Megan Chenoweth, John Kubale — Mon, 17 Feb 2025 00:00:00 +0000

Data sharing is a key strategy for fostering transparency, reproducibility, and trust in scientific research. Data sharing is endorsed and even required by many funders, such as the National Institutes of Health (NIH) in the United States. However, many NIH-funded projects face obstacles to data sharing, either to protect research participants’ privacy, safeguard proprietary data, or remain compliant with data use agreements. Yet event researchers who cannot openly share data still benefit from openness and transparency into one another’s work, and to making their own research more transparent where possible.

The Social, Behavioral, and Economic COVID Coordinating Center at ICPSR (SBE CCC) has launched a new archive aimed at addressing these challenges within the domain of social, behavioral, and economic (SBE) research into the COVID-19 pandemic. In September 2023, SBE CCC launched the COVID measures archive with the dual goals of a) offering researchers the ability to compare measures across SBE studies of COVID while b) protecting contributors’ needs for privacy and confidentiality in health research. The COVID measures archive primarily holds variable-level metadata, which provides visibility into the individual variables and measures employed in studies without necessitating the sharing of confidential or restricted data.

This brief report describes the features of the COVID measures archive and illustrates how it can be used to foster transparency and consistency across SBE COVID studies.

Managing Retractions and their Afterlife: A Tripartite Framework for Research Datasets

Renata Curty — Mon, 09 Jun 2025 00:00:00 +0000

Retractions serve as a critical, albeit last-resort, post-publication correction mechanism in scholarly publishing, playing an important role in upholding the integrity of the scientific record. By formally retracting flawed or misleading research, the scientific community mitigates the harm caused by errors or misconduct that may have escaped detection during peer review. While retractions of research articles have been extensively discussed across scientific disciplines and are well-integrated into most publishers' workflows, the retraction of research datasets remains underexplored and rarely implemented. This paper seeks to address this gap by reviewing recent developments in this area, analyzing a sample of publicly available retracted dataset records considering existing recommendations and guidelines, and putting forward a few points for discussion—particularly for cases where datasets have been published and correction is no longer feasible, or when all efforts to amend the dataset have been exhausted. These considerations are framed into three main categories: (1) preventive actions and timely response, (2) purposeful damage control, and (3) community engagement and shared standards. Although still preliminary, this framework aims to help entertain future debates and inform actionable strategies for addressing the unique challenges of managing retracted datasets where scientific rigor has been compromised. By contributing to the discussion on dataset retractions, this work seeks to better equip data curators, repository managers, and other stakeholders with tools to enhance accountability and transparency throughout the data preservation process, while also helping to mitigate the error cascade effect in science.

Volumetric Video: Preservation and Curation Challenges of an Emerging Medium

Zack Lischer-Katz, Bryan Carter, Rashida Braggs — Tue, 06 May 2025 00:00:00 +0000

Volumetric video is an emerging media format that uses multiple cameras to record live-action subjects and produce three-dimensional, time-based digital media. The resulting digital objects encode visual and spatial information, colour, textures, and sound in a format that allows for users to view the subject from any angle and use the assets in video games, virtual reality, augmented reality, or films. The technology has been pioneered by Hollywood production companies but is now being experimented with by digital humanities scholars. As it becomes more popular, information institutions, particularly academic libraries and others that support researchers, will likely need to support this new format throughout its lifecycle, which may draw on research data management, digital preservation, and repository services. This article introduces volumetric video capture, discusses some of its current applications outside of the commercial film industry, and outlines the curation and preservation challenges that this new media format presents. The paper compares two different production workflows that result in different output qualities: professional and prosumer studio-based workflows. The analysis explores the digital curation challenges that volumetric video raises within these workflows, with considerations for selection and appraisal criteria, file format sustainability, metadata requirements, legal/ethical considerations, and directions forward for future research in digital curation.

Data Management Plans: a Resource to Shape Institutional Data Management Services

Willeke de Haan, Veerle Van den Eynden — Mon, 21 Apr 2025 00:00:00 +0000

At KU Leuven, a university in the Flemish region of Belgium, data management plans have become an important resource to drive and shape the development of data management support, services, and training. With 8,000 researchers and 7,000 PhD students in fundamental and applied research across a comprehensive range of disciplines, KU Leuven is the largest university in Belgium. Public research funding is provided by the federal and regional governments, mainly via the Research Foundation Flanders (FWO) and via research funding allocated to universities based on excellence criteria through the Special Research Fund (BOF) and the Industrial Research Fund (IOF).

Since 2018, FWO and BOF-IOF incorporated data management into their policies, requiring researchers to submit Data Management Plans (DMPs) to their institutional research office. Since then, the number of DMPs that are developed each year has increased exponentially, from 150 in 2018 to nearly 700 per year now. The Research Coordination Office at KU Leuven decided to review all DMPs to provide feedback to ensure high-quality plans. To manage the submission, monitoring, review, and preservation of this volume of DMPs efficiently, an online platform was developed that is integrated with the university’s research information systems.

Initially, the focus of the DMP review was on supporting the development of DMPs, as this was a new concept for researchers. The review process has significantly improved the quality of DMPs. Later, support shifted to provide advice on best practices in data management. Reviews of over 2600 DMPs provide a rich source of information to develop services and training. Based on findings from DMP reviews, the IT department developed an interactive storage guide; ethical and legal compliance in research projects can be monitored; new data management training modules are developed; and a collection of example DMPs has been developed. In addition, the growing DMP collection is a rich source of information on researchers’ data practices, providing the baseline information to develop further services. Future plans include implementing artificial intelligence in DMP reviews to automate problem detection and exploring machine-actionable DMPs for an institutional data register.

Supporting the Research Data Management Journey of a Postgraduate Student at the University of St Andrews

Federica Fina, Haley Eckel, Panagiota Spanou, Jackie Proven — Tue, 18 Feb 2025 00:00:00 +0000

Most research funders have requirements for data management plans and open data to foster good research data management practices. In order to embed these practices in the postgraduate research (PGR) student journey we have introduced the requirement for a data management plan as part of the first-year progress review and the encouragement to make data underpinning theses publicly available. To support students through these processes we provide a suite of training workshops and are available for one-to-one consultations. User feedback and frequently asked questions are used to review and improve our support offering.

This brief report discusses the planning and implementation processes for data management plan requirement and encouragement of underpinning data. It dives deeper into the workflows, especially for the data deposit, and describes training and support available to students. Statistics on training uptake, data management plan submissions and annual trends for data deposit are also presented. The report concludes with lessons learnt and the team’s plans for the near future.

The Standardized Data Management Plan for Educational Research

Sebastian Netscher, Harald Kaluza, Reiner Mauer, Kati Mozygemba, Karsten Stephan — Tue, 11 Feb 2025 00:00:00 +0000

Although there is an increasing number of tools and support opportunities, research data management is still challenging. Conventional templates of data management plans (DMP) guide users, but hardly support them in implementing and realizing data management. Instead, users of conventional templates require more tailored guidance to better understand how to manage their data according to the needs of their research discipline, and its methods and practises, e.g., regarding data sharing. To provide more tailored, discipline-specific guidance, Science Europe (2018) suggests developing and using so-called Domain Data Protocols, i.e., a model DMP for a given discipline or community. The project Domain Data Protocols for Empirical Educational Research was one of the first to turn this concept into a practically useable DMP template tailored to educational research by developing the Standardised Data Management Plan for Educational Research (Stamp). The Stamp is designed to assist researchers in managing their data, appropriately, and to ensure shareable data according to the FAIR Data Principles. Due to its flexible structure, its checklist and auxiliary materials, the Stamp tackles most of the challenges of conventional DMP templates. Providing tailored, discipline-specific guidance and enabling to manage various types of data, the Stamp is an innovative approach to further professionalize data management.

Event Notifications and Event Logs

Patrick Hochstenbach, Ruben Verborgh, Herbert Van de Sompel — Wed, 26 Mar 2025 00:00:00 +0000

The “Event Notifications in Value-Adding Networks” specification provides an interoperable fabric that can be used in scholarly communication to exchange messages among data nodes that make scholarly artifacts available to the network and service nodes that add value to these artifacts. For example, a data repository can have a request-response conversation with a long-term archive that results in the latter relaying the coordinates of an archived version of the dataset to the repository. The push-oriented notification protocol is based on W3C Recommendations, both regarding the messaging protocol and payloads. Implementations of the protocol are in various stages of maturity, the most advanced being the COAR Notify effort that focuses on overlay peer review as a service. An important consequence, and actual design goal, of the conversational interoperability approach, is the ability it provides to bi-directionally interlink the scholarly artifact and the service result in real-time, providing an attractive alternative to current interlinking approaches that by and large are heuristic-based and generate results with significant delays. Another consequence is the ability to publish an Event Log for each scholarly artifact that lists all event notifications that were exchanged about it, providing full transparency about its entire life cycle, including where and how it was registered, archived, reviewed, commented upon, etc. This paper describes essential aspects of the Event Notification protocol and illustrates it using a scenario. It then describes the Event Logs concept and illustrates it by means of that same scenario. It then gives an overview of challenges related to specifying Event Logs that are currently under investigation and largely relate to equipping them with affordances to make them verifiable and trustworthy.

TROV - A Model and Vocabulary for Describing Transparent Research Objects

Wed, 12 Feb 2025 00:00:00 +0000

The Transparent Research Object Vocabulary (TROV) is a key element of the Transparency Certified (TRACE) approach to ensuring research trustworthiness. In contrast with methods that entail repeating computations in part or in full to verify that the descriptions of methods included in a publication are sufficient to reproduce reported results, the TRACE approach depends on a controlled computing environment termed a Transparent Research System (TRS) to guarantee that accurate, sufficiently complete, and otherwise trustworthy records are captured when results are obtained in the first place. Records identifying (1) the digital artifacts and computations that yielded a research result, (2) the TRS that witnessed the artifacts and supervised the computations, and (3) the specific conditions enforced by the TRS that warrant trust in these records, together constitute a Transparent Research Object (TRO). Digital signatures provided by the TRS and by a trusted third-party timestamp authority (TSA) guarantee the integrity and authenticity of the TRO. The controlled vocabulary TROV provides means to declare and query the properties of a TRO, to enumerate the dimensions of trustworthiness the TRS asserts for a TRO, and to verify that each such assertion is warranted by the documented capabilities of the TRS. Our approach for describing, publishing, and working with TROs imposes no restrictions on how computational artifacts are packaged or otherwise shared, and aims to be interoperable with, rather than to replace, current and future Research Object standards, archival formats, and repository layouts.

In Sharing We Trust. Taking Advantage of a Diverse Consortium to Build a Transparent Data Service in Catalonia

Clara Llebot, Mireia Alcalá, Lluís M. Anglada i de Ferrer — Tue, 28 Jan 2025 00:00:00 +0000

The Consorci de Serveis Universitaris de Catalunya (CSUC) is a consortium that serves 13 universities and 33 research centers in Catalonia and neighboring communities. In 2017 the Consortium created an Open Science department to collaborate with universities and research centers on facilitating the adoption of Open Science requirements. Even though CSUC also offers services to researchers directly (for example, its supercomputing resources), this report will focus on CSUC’s work with its member institutions to create and offer data management services. We will explain how CSUC has led the creation of a robust shared governance system, and how it takes advantage of the diversity of its members to create useful, high quality, and transparent services for all researchers in the Catalan research system. Through sharing each other’s experiences, values and priorities, the result is better than separate ad-hoc solutions. The process also creates a community of practitioners that develop expertise together with the help of professional development opportunities organized by CSUC, like recurrent self-learning labs focused on data curation tools, techniques and processes.

Realising Open Data Principles In UK Research Institutions

Pen-Yuan Hsing, Jessica Wheeler, Lorna Duncan, Rosalind Strang, Neil Jacobs — Tue, 29 Apr 2025 00:00:00 +0000

We report on the state of open research data (ORD) policy and practice across UK research institutions through the STAR (Sustainable & TrAnsparent Research data) project. Through qualitative interviews, focus groups, and workshops involving 52 university staff across 21 UK institutions, we investigated the progress and challenges in ORD practices since 2016 publication of the Concordat on Open Research Data.

We observed that while institutions have made progress establishing ORD specialist roles, developing policies, and creating repository infrastructures, systematic monitoring processes and widespread adoption remain stalled. Key challenges include capacity constraints in institutional repositories, limited workload recognition, insufficient funding for long-term archiving, and varying disciplinary interpretations of ORD relevance.

Based on workshops with participants, we recommend recognition of ORD in academic career frameworks, development of disciplinary-relevant data sharing practices, improved infrastructure for monitoring ORD practices, and enhanced support for external disciplinary repositories. The study emphasizes the need for a values-driven rather than compliance-driven approach to ORD implementation, calling for deeper engagement with diverse academic communities to ensure ORD requirements remain meaningful and relevant across disciplines. These findings provide insights for research institutions and funding bodies in developing more effective and inclusive ORD policies.

Two Decades, Same Story? Insights and Future Directions in Long Tail Data Curation

Inna Kouper, Gretchen Stahlman — Thu, 05 Jun 2025 00:00:00 +0000

This paper examines the evolution of the concept of long tail research data in the scholarly literature. The “long tail” concept, originally used to describe “niche” digital products that have a significant market share when taken as an aggregate, was first applied to research data in 2007 to refer to a vast array of smaller, heterogeneous data collections that cumulatively represent a substantial portion of scientific knowledge. These datasets are frequently overlooked due to inadequate data management practices and institutional support. Bridging the discussions on data curation in library & information science (LIS) and domain-specific contexts, this paper identifies several themes in these discussions and offers insights, or provocations, that encourage researchers to rethink the existing frameworks and methods and find new approaches that would help both researchers and data professionals. This review seeks to enhance understanding of long tail data as both a concept and a field, while also informing current and future research and practice.

Researchers and Research Data: Improving and Incentivising Sharing and Archiving

Minna Ventsel, Beth Montague-Hellen — Tue, 28 Jan 2025 00:00:00 +0000

There has been a lot of discussion within the scientific community around the issues of reproducibility in research, with questions being raised about the integrity of research due to failure to reproduce or confirm the findings of some of the studies. Researchers need to adhere to the FAIR (findable, accessible, interoperable, and reusable) principles to contribute to collaborative and open science, but these open data principles can also support reproducibility and issues around ensuring data integrity. This article uses observations and metrics from data sharing and research integrity related activities, undertaken by a Research Integrity and Data Specialist at the Francis Crick Institute, to discuss potential reasons behind a slow uptake of FAIR data practices. We then suggest solutions undertaken at the Francis Crick institute which can be followed by institutes and universities to improve the integrity of research from a data perspective. One major solution discussed is the implementation of a data archive system at the Francis Crick Institute to ensure the integrity of data long term, comply with our funders’ data management requirements, and to safeguard our researchers against any potential research integrity allegations in the future.

Developing Specialized Data Curation Curricula to Meet Growing Demands: A Community-based and Evolving Approach

Tue, 13 May 2025 00:00:00 +0000

Data curation is “the encompassing work and actions taken by curators of a data repository in order to provide meaningful and enduring access to data” (Johnston et al., 2018a). It can be multifaceted and complex based on the types of data, the expertise of the curator, disciplinary expectations, and repository policies. With evolving data sharing practices and standards, ensuring data curators and stewards have access to high-quality, extensible instruction on specific data types is essential for supporting the goals of open research and accessible data sharing, particularly in the landscape where funders (National Institutes of Health, 2020) and journals (Naughton & Kernohan, 2016) are mandating data publication for the purpose of reproducibility, reuse, and external validation. In brief, data need to be curated for effective re-use and in alignment with the FAIR principles (Wilkinson et al., 2016). The Data Curation Network (DCN) (Johnston et al., 2018b) has been actively developing education and training programs to expand capacity in data curation along multiple axes. This paper will explore the progression of the DCN’s education program based primarily within the United States, highlighting a recent effort to develop specialized data curation education for four specialized data types. We will conclude with lessons learned, reflections on growth of education efforts in the DCN more broadly, and potential next steps.

Traveling on the same road while navigating different terrain: Institutional data services and repositories across the US

Tue, 13 May 2025 00:00:00 +0000

This paper describes the Repository Readiness initiative and subsequent Summit for Academic Institutional Readiness in Data Sharing (STAIRS). These efforts examined the current state of institutional research data services and repositories at US academic institutions. Using federal memos and directives published in 2022 as a foundation, members of the Data Curation Network hosted a virtual learning series to identify areas of collaboration across institutions. The themes of this learning series led to STAIRS, which brought representatives from 32 institutions, to discuss the need for and potential benefits of deeper institution-wide engagement and increasing cross-institutional collaborations to support research data sharing efforts. Through discussions at the summit, three key themes emerged: difficulties in scaling services, need for shared resources and training materials, and importance of cross-institutional collaboration. The participant discussions, combined with data from pre- and post-summit surveys, suggest gaps in staffing, resources, and formal policies in data management in institutional settings. The authors conclude with recommendations for funding agencies to support institutional data services through incentives for collaboration, improved communication with program officers, and additional research into data sharing requirements across different institutional types.

From Building a First-Generation Digital Library Infrastructure to Reimagining Discovery

Stuart Snydman, Martha Whitehead — Wed, 25 Jun 2025 00:00:00 +0000

Twenty-five years ago, Harvard University was in the early stages of a project to build a first-generation digital library infrastructure. The project was carefully named the Library Digital Initiative (LDI), signifying that ‘digital’ would be an integral and integrated aspect of ‘library’ and not a separate entity. The initiative aimed to develop knowledge and expertise relating to digital objects, as well as technical infrastructure to create, curate, access and preserve them, and to integrate the new digital collections with Harvard’s extensive tangible collections.

Today, we still benefit from the foresight of this first-generation development and the subsequent ones it spawned, but we are also at a pivotal point of reflecting on lessons learned and opportunities to be seized as we rebuild and reimagine our digital infrastructure and services in a vastly expanded data ecosystem. Predicting what libraries will look like two decades ahead is always conjecture. What we do know, however, is that while the themes and challenges from the past two decades endure, the way we are tackling them is different. This paper examines what has changed since early library digital initiatives, and the imperatives we see for the future.

Making reproducibility a reality by 2035?

Rebecca Taylor-Grant, Matthew Cannon, Allyson Lister, Susanna-Assunta Sansone — Thu, 29 May 2025 00:00:00 +0000

This paper describes a project which identified practical and pragmatic ways to increase the FAIRness and reproducibility of published research. Academic journals have supported Open Science through the implementation of data sharing policies for over ten years; some evidence has since emerged on the additional time, resources and expertise that policy enforcement requires as part of an editorial workflow. A series of publisher workshops facilitated by the EC-funded TIER2 project aimed to identify the key checks needed to enforce strengthened journal data sharing policies and to understand which editorial roles have the capacity to undertake such enforcement. The intended outcome of this work was to establish the workflows and resourcing which can support academic journals to enforce stronger data sharing policies in future.

Improving the Transparency of Data Access Conditions in the SSH Domain

Deborah Thorpe, Ricarda Braukmann, Angelica Maineri — Thu, 24 Apr 2025 00:00:00 +0000

The benefits of making data available for reuse are recognised by many. While some datasets can be openly available, others contain sensitive or personal data that needs to be protected. To allow the sharing of the latter datasets, many trustworthy digital repositories provide options to publish data restricted access. However, detailed standardised information about the access conditions for these restricted access datasets is often lacking from the metadata. Researchers interested in reusing these datasets can thus not judge whether they are eligible to reuse data and under what conditions.

To get a better understanding of how we can increase the transparency of access conditions, this paper aimed to investigate the access conditions and procedures that are commonly applied by depositors within the (Dutch) Social Sciences and Humanities (SSH) research community. The results of a survey that was conducted (n=45) indicated that various conditions are applied, and while some datasets can have few restrictions, for others reuse is highly restricted. Most respondents limit reuse to research purposes and prohibit commercial use. Some datasets are available for students or teaching, but often with additional requirements. A large majority of respondents required a motivation letter to evaluate before allowing reuse. Notably, respondents often chose ‘It depends’ when asked whether a specific condition was applied, showing a lot of nuances in the conditions and the evaluation of access requests. An important result from our survey was that clear procedures and decision-making guidelines seem to be lacking for many respondents. Requests are often evaluated ad-hoc and through email. Decisions are said to require an evaluation of the quality of the application, yet the evaluation criteria seem to be rarely specified and explicitly communicated at the time of data deposit.

Based on the results of the small-scale survey, we conclude with a set of six recommendations directed at data owners and repositories outlining how information about access conditions and procedures can be made more transparent in the future. This work should be seen as a starting point to improve the accessibility and reusability of restricted access datasets in the SSH domain.