International Journal of Digital Curation

Media Digitization and Preservation Initiative: A Case Study

Organizations and consortia around the world have used the DCC Curation Lifecycle Model as a tool to ensure that all the necessary stages of digital curation are undertaken, to define roles and responsibilities, and to build a framework of standards and technologies for digital curation. Yet research on the application of the model to large-scale digitization projects as a way of understanding their efforts at digital curation is scant. This paper reports on findings of a qualitative case study analysis of Indiana University Bloomington’s multi-million-dollar Media Digitization and Preservation Initiative (MDPI), employing the DCC Curation Lifecycle Model as a lens for examining the scope and effectiveness of its digital curation efforts. Findings underscore the success of MDPI in performing digital curation by illustrating the ways it implements each of the model’s components. Implications for the application of the DCC Curation Lifecycle Model in understanding digital curation for mass digitization projects are discussed, as well as directions for future research.

Devan Ray Donaldson, Allison McClanahan, Leif Christiansen, Laura Bell, Mikala Narlock, Shannon Martin, Haley Suby
2018-12-23. Vol. 13, No. 1, pp. 91-113. DOI: 10.2218/ijdc.v13i1.502

Privacy Concerns in Qualitative Video Data Reuse

In this article, we examine how data producers’ and reusers’ privacy concerns shape their views about data sharing and reuse in the field of education, with an emphasis on video records of practice. We find that data producers and reusers were concerned about the risks that qualitative data, and video records of practice in particular, present to themselves, their colleagues, and the subjects represented in the data. Specifically, they emphasized risks relating to the privacy of the subjects – teachers and students who appear in the videos. In response to these risks, data producers have engaged in a number of strategies to minimize risk and/or mitigate potential harm, including: (1) education and training; (2) using informed consent to facilitate and/or restrict data sharing; and (3) limiting data capture/production. We discuss the implications that our findings have for digital repositories, and for efforts to facilitate the sharing and reuse of qualitative video data in education.

Rebecca D. Frank, Allison R. B. Tyler, Anna Gault, Kara Suzuka, Elizabeth Yakel
2019-01-19. Vol. 13, No. 1, pp. 47-72. DOI: 10.2218/ijdc.v13i1.492

Measuring FAIR Principles to Inform Fitness for Use

For open science to flourish, data and any related digital outputs should be discoverable and re-usable by a variety of potential consumers. The recent FAIR Data Principles produced by the Future of Research Communication and e-Scholarship (FORCE11) collective provide a compilation of considerations for making data findable, accessible, interoperable, and re-usable. The principles serve as guideposts to ‘good’ data management and stewardship for data and/or metadata. On a conceptual level, the principles codify best practices that managers and stewards would agree with, that exist in other data quality metrics, and that they already implement. This paper reports on a secondary purpose of the principles: to inform assessment of data’s FAIR-ness or, put another way, data’s fitness for use. Assessment of FAIR-ness likely requires more stratification across data types and among various consumer communities, as how data are found, accessed, interoperated, and re-used differs depending on types and purposes. This paper’s purpose is to present a method for qualitatively measuring the FAIR Data Principles through operationalizing findability, accessibility, interoperability, and re-usability from a re-user’s perspective. The findings may inform assessments that could also be used to develop situationally-relevant fitness for use frameworks.

Bradley Wade Bishop, Carolyn Hank
2018-12-22. Vol. 13, No. 1, pp. 35-46. DOI: 10.2218/ijdc.v13i1.630

Participatory Prototype Design: Developing a Sustainable Metadata Curation Workflow for Maternal Child Health Research

This paper describes the findings from a participatory prototype design project, where the authors worked with maternal and child health (MCH) researchers and stakeholders to develop an MCH metadata profile and sustainable curation workflow. This work led to the development of three prototypes: 1) a study catalogue hosted in Dataverse, 2) a metadata and research records repository hosted in REDCap and 3) a metadata harvesting tool/dashboard hosted within the Shiny RStudio environment. We present a brief overview of the methods used to develop the metadata profile, curation workflow and prototypes. Researchers and other stakeholders were participant-collaborators throughout the project. The participatory process involved a number of steps, including but not limited to: initial project design and grant writing; scoping and mapping existing practices, workflows and relevant metadata standards; creating the metadata profile; developing semi-automated and manual techniques to harvest and transform metadata; and end project sustainability/future planning. In this paper, we discuss the design process and project outcomes, limitations and benefits of the approach, and implications for researcher-oriented metadata and data curation initiatives.

Amanda Harrigan, Saurabh Vashishtha, Sharon Farnel, Kendall Roark
2018-12-28. Vol. 13, No. 1, pp. 248-270. DOI: 10.2218/ijdc.v13i1.534

Giving datasets context: a comparison study of institutional repositories that apply varying degrees of curation

This research study compared four academic libraries’ approaches to curating the metadata of dataset submissions in their institutional repositories and classified them in one of four categories: no curation, pre-ingest curation, selective curation, and post-ingest curation. The goal is to understand the impact that curation may have on the quality of user-submitted metadata. The findings were: 1) the metadata elements varied greatly between institutions, 2) repositories with more options for authors to contribute metadata did not result in more metadata being contributed, 3) pre- or post-ingest curation processes could have a measurable impact on the metadata but are difficult to separate from other factors, and 4) datasets submitted to a repository with pre- or post-ingest curation more often included documentation.

Amy Koshoffer, Amy E. Neeser, Linda Newman, Lisa R Johnston
2018-12-21. Vol. 13, No. 1, pp. 15-34. DOI: 10.2218/ijdc.v13i1.632

Complexities of Digital Preservation in a Virtual Reality Environment, the Case of Virtual Bethel

The complexity of preserving virtual reality environments combines the challenges of preserving singular digital objects, the relationships among those objects, and the processes involved in creating those relationships. A case study involving the preservation of the Virtual Bethel environment is presented. This case is active and ongoing. The paper provides a brief history of the Bethel AME Church of Indianapolis and its importance, then describes the unique preservation challenges of the Virtual Bethel project, and finally provides guidance and preservation recommendations for Virtual Bethel, using the National Digital Stewardship Alliance Levels of Preservation. A discussion of the limitations of the guidance and recommendations follows.

Angela P. Murillo, Lydia Spotts, Andrea Copeland, Ayoung Yoon, Zebulun M Wood
2018-12-21. Vol. 13, No. 1, pp. 1-14. DOI: 10.2218/ijdc.v13i1.631

Disciplinary data publication guides

Many academic disciplines have very comprehensive standards for data publication and clear guidance from funding bodies and academic publishers. In other cases, whilst much good-quality general guidance exists, there is a lack of information available to researchers to help them decide which specific data elements should be shared. This is a particular issue for disciplines with very varied data types, such as engineering, and presents an unnecessary barrier to researchers wishing to meet funder expectations on data sharing. This article outlines a project to provide simple, visual, discipline-specific guidance on data publication, undertaken at the University of Bristol at the request of the Faculty of Engineering.

Zosia Beckles, Stephen Gray, Debra Hiom, Kirsty Merrett, Kellie Snow, Damian Steer
2018-12-27. Vol. 13, No. 1, pp. 150-160. DOI: 10.2218/ijdc.v13i1.603

Operationalizing the Replication Standard

In response to widespread concerns about the integrity of research published in scholarly journals, several initiatives have emerged that are promoting research transparency through access to data underlying published scientific findings. Journal editors, in particular, have made a commitment to research transparency by issuing data policies that require authors to submit their data, code, and documentation to data repositories to allow for public access to the data. In the case of the American Journal of Political Science (AJPS) Data Replication Policy, the data also must undergo an independent verification process in which materials are reviewed for quality as a condition of final manuscript publication and acceptance.

Aware of the specialized expertise of the data archives, AJPS called upon the Odum Institute Data Archive to provide a data review service that performs data curation and verification of replication datasets. This article presents a case study of the collaboration between AJPS and the Odum Institute Data Archive to develop a workflow that bridges manuscript publication and data review processes. The case study describes the challenges and the successes of the workflow integration, and offers lessons learned that may be applied by other data archives that are considering expanding their services to include data curation and verification services to support reproducible research.

Thu-Mai Lewis Christian, Sophia Lafferty-Hess, William G Jacoby, Thomas Carsey
2018-12-23. Vol. 13, No. 1, pp. 114-124. DOI: 10.2218/ijdc.v13i1.555

The impact on authors and editors of introducing Data Availability Statements at Nature journals

This article describes the adoption of a standard policy for the inclusion of data availability statements in all research articles published in the Nature family of journals, and the subsequent research which assessed the impacts that these policies had on authors, editors, and the availability of datasets. The key findings of this research project include the determination of the average and median times required to add a data availability statement to an article, and a correlation between the way researchers make their data available and the time required to add a data availability statement.

Rebecca Grant, Iain Hrynaszkiewicz
2018-12-27. Vol. 13, No. 1, pp. 195-203. DOI: 10.2218/ijdc.v13i1.614

Data Curation Network: A Cross-Institutional Staffing Model for Curating Research Data

Funders increasingly require that data sets arising from sponsored research be preserved and shared, and many publishers either require or encourage that data sets accompanying articles be made available through a publicly accessible repository. Additionally, many researchers wish to make their data available regardless of funder requirements, both to enhance their impact and to propel the concept of open science. However, the data curation activities that support these preservation and sharing activities are costly, requiring advanced curation practices, training, specific technical competencies, and relevant subject expertise. Few colleges or universities will be able to hire and sustain locally all of the data curation expertise that their researchers will require, and even those with the means to do more will benefit from a collective approach that will allow them to supplement at peak times, access specialized capacity when infrequently-curated types arise, and stabilize service levels to account for local staff transition, such as during turn-over periods. The Data Curation Network (DCN) provides a solution for partners of all sizes to develop or to supplement local curation expertise with the expertise of a resilient, distributed network, and creates a funding stream to both sustain central services and support expansion of distributed expertise over time. This paper presents our next steps for piloting the DCN, scheduled to launch in the spring of 2018 across nine partner institutions. Our implementation plan is based on planning phase research performed from 2016 to 2017 that monitored the types, disciplines, frequency, and curation needs of data sets passing through the curation services at the six planning phase institutions. Our DCN implementation plan includes a well-coordinated and tiered staffing model, a technology-agnostic submission workflow, standardized curation procedures, and a sustainability approach that will allow the DCN to prevail beyond the grant-supported implementation phase as a curation-as-service model.

Lisa R Johnston, Jake Carlson, Cynthia Hudson-Vitale, Heidi Imker, Wendy Kozlowski, Robert Olendorf, Claire Stewart, Mara Blake, Joel Herndon, Timothy M. McGeary, Elizabeth Hull
2018-12-26. Vol. 13, No. 1, pp. 125-140. DOI: 10.2218/ijdc.v13i1.616

Making Everything Available. British Library Research Services and Research Data Strategy

The way that researchers generate, analyse and share information keeps evolving at a rapid pace. To ensure that it is well equipped to serve its global user base for years to come, the British Library is transforming the way it works too, from the physical buildings to its digital service portfolio. One key programme, Everything Available, will ensure the Library’s continued support for research with services to enable access to information in an open and timely manner. This paper will describe the activities planned within Everything Available, with a particular focus on the aims of the Library’s recently refreshed Research Data Strategy. It will give an insight into the challenges and opportunities faced by a National Library in providing relevant services in an ‘open’ world.

Rachael Kotarski, Torsten Reimer
2018-12-27. Vol. 13, No. 1, pp. 161-169. DOI: 10.2218/ijdc.v13i1.605

Building Open-Source Digital Curation Services and Repositories at Scale

The focus of this article is to share several in-progress research and development open-source approaches that seek to design, build, and test digital curation services and repositories that have the potential to scale (the IMLS-funded Fedora DRAS-TIC and the NSF-funded Brown Dog). We also discuss the creation of a big records testbed of justice, human rights, and cultural heritage collections (100 TB and 100 million records), the emergence of Computational Archival Science (CAS), and the resulting efforts at integrating digital curation education and research. We ultimately seek to develop a sustainable community of users and developers, with solutions that serve the international library, archives, and scientific data management communities. We are also focused on digital curation training and education in these innovative environments.

Richard Marciano, Gregory Jansen, Will Thomas, Sohan Shah, Michael Kurtz
2018-12-27. Vol. 13, No. 1, pp. 170-182. DOI: 10.2218/ijdc.v13i1.621

Incorporating Software Curation into Research Data Management Services: Lessons Learned

Many large research universities provide research data management (RDM) support services for researchers. These may include support for data management planning, best practices (e.g., organization, support, and storage), archiving, sharing, and publication. However, these data-focused services may under-emphasize the importance of the software that is created to analyse said data. This is problematic for several reasons. First, because software is an integral part of research across all disciplines, this under-emphasis undermines the ability of said research to be understood, verified, and reused by others (and perhaps even by the researchers themselves). Second, it may result in less visibility and credit for those involved in creating the software. A third reason is related to stewardship: if there is no clear process for how, when, and where the software associated with research can be accessed and who will be responsible for maintaining such access, important details of the research may be lost over time.

This article presents the process by which the RDM services unit of a large research university addressed the lack of emphasis on software and source code in their existing service offerings. The greatest challenges were related to the need to incorporate software into existing data-oriented service workflows while minimizing additional resources required, and the nascent state of software curation and archiving in a data management context. The problem was addressed from four directions: building an understanding of software curation and preservation from various viewpoints (e.g., video games, software engineering), building a conceptual model of software preservation to guide service decisions, implementing software-related services, and documenting and evaluating the work to build expertise and establish a standard service level.

Fernando Rios
2018-12-28. Vol. 13, No. 1, pp. 235-247. DOI: 10.2218/ijdc.v13i1.608

Keep calm and fill in your DMP: Lessons Learnt from a Swiss DMP-template initiative

Aligning with other funders such as Horizon 2020, the Swiss National Science Foundation (SNSF) requires researchers who apply for project funding to provide a Data Management Plan (DMP) as an integral part of their research proposal. In an attempt to assist and guide researchers filling out this document, and to provide a service as efficient as possible, the libraries of the Ecole Polytechnique Fédérale de Lausanne (EPFL) and ETH Zurich took the lead in elaborating a DMP template with content suggestions and recommendations. In this practice paper, we will describe the collaborative effort between the two Swiss federal institutes of technology, namely EPFL and ETH Zurich, as well as some partners of the national Data Life Cycle Management (DLCM) project, which resulted in a document that our researchers reported to be very helpful.

Lorenza Salvatori, Ana Sesartic, Nathalie Lambeng, Eliane Blumer
2018-12-27. Vol. 13, No. 1, pp. 215-222. DOI: 10.2218/ijdc.v13i1.617

Data Mining Research with In-copyright and Use-limited Text Datasets: Preliminary Findings from a Systematic Literature Review and Stakeholder Interviews

Text data mining and analysis has emerged as a viable research method for scholars, following the growth of mass digitization, digital publishing, and scholarly interest in data re-use. Yet the texts that comprise datasets for analysis are frequently protected by copyright or other intellectual property rights that limit their access and use. This article discusses the role of libraries at the intersection of data mining and intellectual property, asserting that academic libraries are vital partners in enabling scholars to effectively incorporate text data mining into their research. We report on activities leading up to an IMLS-funded National Forum of stakeholders and discuss preliminary findings from a systematic literature review, as well as initial results of interviews with forum stakeholders. Emerging themes suggest the need for a multi-pronged distributed approach that includes a public campaign for building awareness and advocacy, development of best practice guides for library support services and training, and international efforts toward data standardization and copyright harmonization.

Megan Senseney, Eleanor Dickson, Beth Namachchivaya, Bertram Ludäscher
2018-12-27. Vol. 13, No. 1, pp. 183-194. DOI: 10.2218/ijdc.v13i1.620

A Landscape Survey of ActiveDMPs

Stephanie Simms, Sarah Jones, Tomasz Miksa, Daniel Mietchen, Natasha Simons, Kathryn Unsworth
2018-12-27. Vol. 13, No. 1, pp. 204-214. DOI: 10.2218/ijdc.v13i1.629

Data Stewardship addressing disciplinary data management needs

One of the biggest challenges for multidisciplinary research institutions which provide data management support to researchers is addressing disciplinary differences (Akers and Doty, 2013). Centralised services need to be general enough to cater for all the different flavours of research conducted in an institution. At the same time, focusing on the common denominator means that subject-specific differences and needs may not be effectively addressed. In 2017, Delft University of Technology (TU Delft) embarked on an ambitious Data Stewardship project, aiming to comprehensively address data management needs across a multi-disciplinary campus. In this article we describe the principles behind the Data Stewardship project at TU Delft and the progress so far, identify the key challenges, and explain our plans for the future.

Marta Teperek, Maria J.
Cruz, Ellen Verbakel, Jasmin Böhmer, Alastair Dunning
2018-12-27. Vol. 13, No. 1, pp. 141-149. DOI: 10.2218/ijdc.v13i1.604

Embedded Metadata Patterns Across Web Sharing Environments

This research project sought to determine whether and how embedded metadata followed a digital object as it was shared on social media platforms. The study used ExifTool, a variety of social media platforms and user profiles, and the embedded metadata extracted from selected New York Public Library (NYPL) and Europeana images, PDFs from open access science journals, and captured mobile phone images. The goal of the project was to clarify which embedded metadata fields, if any, migrated with the object as it was shared across social media.

Santi Thompson, Michele Reilly
2018-12-27. Vol. 13, No. 1, pp. 223-234. DOI: 10.2218/ijdc.v13i1.607

Research Data Management Practices: Synergies and Discords between Researchers and Institutions

The aim of this study was to explore the synergies and discords in attitudes towards research data management (RDM) drivers and barriers for both researchers and institutions. Previous work has studied RDM from a single perspective, but has not compared researchers’ and institutions’ perspectives. We carried out qualitative interviews with researchers as well as institutional representatives to identify drivers and barriers, and to explore synergies and discords of both towards RDM. We mapped these to a data lifecycle model and found that the contradictions occur at early stages in the lifecycle of data and the synergies occur at the later stages. This means that for future successful RDM, the points of discord at the start of the data lifecycle must be overcome. Finally, we conclude by proposing key recommendations that could help institutions when addressing both researcher and institutional RDM needs.

Sally Vanden-Hehir, Helena Cousijn, Hesham Attalla
2018-12-23. Vol. 13, No. 1, pp. 73-90. DOI: 10.2218/ijdc.v13i1.499