International Journal of Digital Curation <p>The IJDC publishes peer-reviewed papers, articles and editorials on digital curation, research data management and related issues.</p> University of Edinburgh 1746-8256 <p>Copyright for papers and articles published in this journal is retained by the authors, with first publication rights granted to the University of Edinburgh. It is a condition of publication that authors license their paper or article under a <a href="" rel="license">Creative Commons Attribution Licence</a>.</p> Tuuli project: accelerating data management planning in Finnish research organisations <p class="BodyText2">Many research funders have requirements for data sharing and data management plans (DMPs). DMP tools are services built to help researchers create data management plans that fit their needs and follow funder and/or organisation guidelines. Project Tuuli (2015–2017) has provided DMPTuuli, a data management planning tool that offers DMP templates and guidance to Finnish researchers and research organisations. In this paper we describe how the project has helped both Finnish researchers and research organisations adopt research data management best practices. As a result of the project we have also created a national Tuuli network. With the growing competence of the network and the collaboration within it, the project has reached most of its goals.
The project has also actively promoted DMP support and training in Finnish research organisations.</p> Minna Ahokas Mari Elisa Kuusniemi Jari Friman 2018-02-11 2018-02-11 12 2 107 115 10.2218/ijdc.v12i2.512 Building Tools to Support Active Curation: Lessons Learned from SEAD <p class="abstract-western">SEAD – a project funded by the US National Science Foundation’s DataNet program – has spent the last five years designing, building, and deploying an integrated set of services to better connect scientists’ research workflows to data publication and preservation activities. Throughout the project, SEAD has promoted the concept and practice of “active curation,” which consists of capturing data and metadata early and refining them throughout the data life cycle. In promoting active curation, our team saw an opportunity to develop tools that would help scientists better manage data for their own use, improve team coordination around data, implement practices that would serve the data better over time, and seamlessly connect with data repositories to ease the burden of sharing and publishing.</p> <p class="abstract-western">SEAD has worked with 30 projects, dozens of researchers, and hundreds of thousands of files, providing us with ample opportunities to learn about data and metadata, about integrating with researchers’ workflows, and about building tools and services for data. In this paper, we discuss the lessons we have learned and suggest how they might guide future data infrastructure development efforts.</p> Dharma Akmon Margaret Hedstrom James D.
Myers Anna Ovchinnikova Inna Kouper 2018-01-02 2018-01-02 12 2 76 85 10.2218/ijdc.v12i2.552 Reuse for Research: Curating Astrophysical Datasets for Future Researchers <p class="abstract-western"><span style="color: #000000;">“Our data are going to be valuable for science for the next 50 years, so please make sure you preserve them and keep them accessible for active research for at least that period.”</span></p> <p class="abstract-western">These were approximately the words used by the principal investigator of the Kepler Asteroseismic Science Consortium (KASC) when he presented our task to us. The data in question consist of data products produced by KASC researchers and working groups as part of their research, as well as underlying data imported from the NASA archives.</p> <p class="abstract-western">The combined requirement to preserve the data for 50 years while also enabling their reuse for active research presented a number of specific challenges, closely intertwining data handling and data infrastructure with scientific issues. This paper reports on our work, performed in close cooperation between the research team and library personnel, to deliver the best possible solution.</p> Anders Sparre Conrad Rasmus Handberg Michael Svendsen 2017-12-30 2017-12-30 12 2 37 46 10.2218/ijdc.v12i2.516 Integration of an Active Research Data System with a Data Repository to Streamline the Research Data Lifecycle: Pure-NOMAD Case Study <p class="abstract-western"><span style="color: #000000;">Research funders have introduced requirements that researchers properly manage and publicly share their research data, and that institutions put in place services to support researchers in meeting these requirements.
So far the general focus of these services and systems has been on the final stages of the research data lifecycle (archive, share and re-use), rather than on stages related to the active phase of the cycle (collect/create and analyse). As a result, full integration of active data management systems with data repositories is not yet the norm, which makes the streamlined transition of data from active to published and archived status an important challenge. In this paper we present the integration between an active data management system developed in-house (NOMAD) and Elsevier’s Pure data repository used at our institution, with the aim of offering a simple workflow to facilitate and promote the data deposit process. The integration results in a new data management and publication workflow that helps researchers to save time, minimize human errors related to manually handling files, and further promote data deposit and collaboration across the institution.</span></p> Simone Ivan Conte Federica Fina Michalis Psalios Shyam Ryal Tomas Lebl Anna Clements 2018-04-19 2018-04-19 12 2 210 219 10.2218/ijdc.v12i2.570 When Scientists Become Social Scientists: How Citizen Science Projects Learn About Volunteers <p class="abstract-western">Online citizen science projects involve recruitment of volunteers to assist researchers with the creation, curation, and analysis of large datasets. Enhancing the quality of these data products is a fundamental concern for teams running citizen science projects. Decisions about a project’s design and operations have a critical effect both on whether the project recruits and retains enough volunteers, and on the quality of volunteers’ work. The processes by which the team running a project learns about its volunteers play a critical role in these decisions.
Improving these processes will enhance decision-making, resulting in better-quality datasets and more successful outcomes for citizen science projects. This paper presents a qualitative case study, involving interviews and long-term observation, of how the team running Galaxy Zoo, a major citizen science project in astronomy, came to know their volunteers and how this knowledge shaped their decision-making processes. It describes three instances that played significant roles in shaping Galaxy Zoo team members’ understandings of volunteers. Team members integrated heterogeneous sources of information to derive new insights into the volunteers: project metrics and formal studies of volunteers were combined with tacit understandings gained through on- and offline interactions with volunteers. The paper also offers a number of recommendations for practice, including strategies for improving how citizen science project team members learn about volunteers, and how teams can more effectively share what they learn among themselves.</p> Peter Darch 2018-04-23 2018-04-23 12 2 61 75 10.2218/ijdc.v12i2.551 Are the FAIR Data Principles fair? <p class="abstract-western">This practice paper describes an ongoing research project to test the effectiveness and relevance of the FAIR Data Principles. It also analyses how easily data archives can adhere to the principles. The research took place from November 2016 to January 2017, and will be underpinned by feedback from the repositories.</p> <p class="abstract-western"><span style="color: #000000;">The FAIR Data Principles feature 15 facets corresponding to the four letters of FAIR: Findable, Accessible, Interoperable, Reusable. These principles have already gained traction within the research world. The European Commission has recently expanded its demand for research to produce open data.
The relevant guidelines</span><sup><span style="color: #000000;"><a class="sdfootnoteanc" name="sdfootnote1anc"></a>1</span></sup><span style="color: #000000;"> are explicitly written in the context of the FAIR Data Principles. Given that an increasing number of researchers will be exposed to the guidelines, it is vitally important to understand their viability and to suggest where there may be room for modification and adjustment.</span></p> <p class="abstract-western"><span style="color: #000000;">This practice paper is connected to a dataset </span><span style="color: #000000;">(Dunning et al., </span><span style="color: #006b6b;"><span lang="zxx"><a class="western">2017</a></span></span><span style="color: #000000;">) containing the original overview of the sample group statistics and graphs in an Excel spreadsheet. Over the course of two months, the web interfaces, help pages and metadata records of over 40 data repositories were examined to score each data repository against the FAIR principles and facets. A traffic-light rating system colour-codes the results according to compliance and vagueness.
The statistical analysis provides overall results, categorised results, and results focusing on the principles and on the individual facets.</span></p> <p class="abstract-western">The analysis includes a statistical and descriptive evaluation, followed by discussion of elements of the FAIR Data Principles, subject-specific and repository-specific differences, and what repositories can do to improve their information architecture.</p> <div id="sdfootnote1"> <p class="western"><a class="sdfootnotesym-western" name="sdfootnote1sym"></a>(1) H2020 Guidelines on FAIR Data Management:<span style="color: #006b6b;"><span lang="zxx"><a class="western" href=""></a></span></span></p> </div> Alastair Dunning Madeleine de Smaele Jasmin Böhmer 2018-04-23 2018-04-23 12 2 177 195 10.2218/ijdc.v12i2.567 A Framework for the Preservation of a Docker Container <p>Reliably building and maintaining systems across environments is a continuing problem. A project or experiment may run for years, during which software and hardware may change, as may the operating system. Containerisation is a technology used by a variety of companies, such as Google, Amazon and IBM, and by scientific projects to deploy a set of services rapidly and repeatably. Using Dockerfiles to ensure that a container is built repeatably, and to allow conformance and easy updating when changes take place, is becoming common within projects. It is seen as part of sustainable software development. Containerisation technology occupies a dual space: it is both a repository of software and software itself. In considering Docker in this fashion, we should verify that the Dockerfile can be reproduced. Using a subset of the Dockerfile specification, we create a domain-specific language to ensure that Dockerfiles can be reused at a later stage to recreate the original environment. We provide a simple framework to address the question of the preservation of containers and their environments.
We present experiments on an existing Dockerfile and conclude with a discussion of future work. Building on this work, we implemented a pipeline that checks that a given Dockerfile conforms to our desired model and extracts the Docker and operating system details. This will aid the reproducibility of results by recreating the machine environment and package versions. It also aids development and testing by ensuring that the system is built repeatably and that any changes in the software environment can likewise be shared through the Dockerfile. This work supports not only the citation process but also open science, by providing details of the environment in which the work was carried out. As part of the pipeline to create the container, we capture the processes used and record them using the W3C PROV ontology. This makes it possible to assign the provenance record a persistent identifier and to trace the processes used to preserve the metadata. Our future work will look at linking this output to a workflow ontology to preserve the complete workflow, including the commands and parameters to be given to the containers. We see this provenance within the build process as useful for providing a complete overview of the workflow.</p> Iain Emsley David De Roure 2018-04-02 2018-04-02 12 2 125 135 10.2218/ijdc.v12i2.509 Developing a Digital Archive for Symbolic Resources in Urban Environments - the Latina Project <p class="abstract-western"><span style="color: #000000;">The project described in this paper was funded to establish the foundation for a digital archival resource for researchers interested in the way people interact with urban environments through graphic communications.
The research was internally funded by Loughborough University as part of its Research Challenge Programme and involved two members of academic staff and two library staff.[1] Two PhD students also participated.</span></p> <p class="abstract-western">The archive consists of a small number of images and will act as a proof of concept, not only for this project but also for current and future funding applications. It is hoped that an extended archive will be useful not only to visual communication researchers, but also to historians, architects, town planners and others. This paper describes the data collection process, the challenges facing the project team in data curation and data documentation, and the creation of the pilot archive.</p> <p class="abstract-western">The creation of the archive posed challenges for both the researchers and library staff. For the researchers:</p> <ul> <li> <p class="abstract-western">Choosing a small number of images that formed a discrete collection but also demonstrated the utility of the project to other disciplinary areas;</p> </li> <li> <p class="abstract-western">Acquiring the necessary knowledge and skills to enable good curation and usability of the digital objects, e.g.
file formats, metadata creation;</p> </li> <li> <p class="abstract-western">Understanding what the technical solution enabled and where compromises would have to be made.</p> </li> </ul> <p class="abstract-western">For library staff:</p> <ul> <li> <p class="abstract-western">Demonstrating the utility of the Data Repository;</p> </li> <li> <p class="abstract-western">Understanding the intellectual background to the project and the purpose of the Data Archive within the project;</p> </li> <li> <p class="abstract-western">Clearly explaining the purpose of metadata and documentation.</p> </li> </ul> <p class="abstract-western">The Latina Project has demonstrated the value of a true partnership between the academic community and the professional services. All parties involved have learnt from the creation of the pilot archive, and their practices have evolved. For example, it has made the researchers think more carefully about data curation questions and helped the professional services staff identify more closely with the research purposes for data creation. By working together so closely and sharing ideas from our different perspectives, we have also identified potential technical developments that could be explored in future projects. All members of the group hope that the relationships built during this project will continue through other projects.</p> <p class="abstract-western">[1] Academic staff: Drs Harland and Liguori. Library staff: Gareth Cole and Barbara Whetnall.</p> Robert Harland Antonia Liguori Gareth Cole 2018-04-02 2018-04-02 12 2 136 145 10.2218/ijdc.v12i2.511 Creating a Community of Data Champions <p class="abstract-western">Research Data Management (RDM) presents an unusual challenge for service providers in Higher Education. There is increased awareness of the need for training in this area, but the discipline-specific nature of the practices involved makes it difficult to provide training across a multi-disciplinary organisation.
Whilst most UK universities now have a research data team of some description, these teams are often small and rarely have the resources necessary to provide targeted training to the different disciplines and research career stages that they are increasingly expected to support.</p> <p class="abstract-western">This practice paper describes the approach taken at the University of Cambridge to address this problem by creating a community of Data Champions. This collaborative initiative, working with researchers to provide training and advocacy for good RDM practice, allows more discipline-specific training to be given and researchers to be credited for their expertise, and creates an opportunity for those interested in RDM to exchange knowledge with others. The ‘community of practice’ model has been used in many sectors, including Higher Education, to facilitate collaboration across organisational units, and this initiative will adopt some of the same principles to improve communication across a decentralised institution. The Data Champions initiative at Cambridge was launched in September 2016, and this paper reports on its early months, plans for building the community in the future and the possible risks associated with this approach to providing RDM services.</p> Rosie Higman Marta Teperek Danny Kingsley 2018-02-11 2018-02-11 12 2 96 106 10.2218/ijdc.v12i2.562 Introducing Safe Access to Sensitive Data at the University of Bristol <p class="abstract-western"><span style="color: #000000;">The economic and societal benefits of making research data available for reuse and verification are now widely understood and accepted. However, some research studies, particularly those involving human participants, face particular challenges in making their data openly available because of the sensitivity of the data.
Despite its potential value to society, this material is invariably kept locked away because of concerns about its inappropriate disclosure. The University of Bristol’s Research Data Service has developed the institutional infrastructure, including policies and procedures, required to grant access to sensitive research data safely and in a way that is transparent, secure, sustainable and, crucially, replicable by other institutions.</span></p> <p class="abstract-western">This paper looks at the background and challenges faced by the institution in dealing with sensitive data, and outlines the approach taken and some of the outstanding issues to be tackled.</p> Debra Hiom Stephen Gray Damian Steer Kirsty Merrett Kellie Snow Zosia Beckles 2017-12-30 2017-12-30 12 2 26 36 10.2218/ijdc.v12i2.506 Evaluating the Effectiveness of Data Management Training: DataONE’s Survey Instrument <div class="WordSection1"> <p class="abstract-western">Effective data management is a key component of preparing data to be retained for long-term access, use, and reuse by a broader community. Developing the skills to plan and perform data management tasks is important for individuals and institutions. Teaching data literacy skills may also help to mitigate the impact of the data deluge and other effects of being overexposed to and overwhelmed by data.</p> <p class="abstract-western">The process of learning how to manage data effectively across the entire research data lifecycle can be complex. A data management lifecycle often involves multiple stages, and each stage may require specific knowledge, expertise, and resources.
Additionally, although a range of organizations offer data management education and training resources, it can often be difficult to assess how effectively these resources educate users to meet their data management requirements.</p> <p class="abstract-western">In the case of the Data Observation Network for Earth (DataONE), extensive collaboration with individuals and organizations has informed the development of multiple educational resources. Through these interactions, DataONE has come to understand that creating and maintaining educational materials that remain responsive to community needs relies on careful evaluation. Therefore, the impetus for a comprehensive, customizable Education EVAluation instrument (EEVA) is grounded in the need for tools to assess and improve current and future training and educational resources for research data management.</p> <p class="abstract-western">In this paper, the authors outline the background and motivations that led to the creation of EEVA for evaluating the effectiveness of data management educational resources. The paper details the process and results of the current version of EEVA. Finally, the paper highlights EEVA's key features, potential uses, and next steps for improving its future extensions and revisions.</p> </div> Chung-Yi Hou Heather Soyka Vivian Hutchison Isis Sema Chris Allen Amber Budden 2017-12-31 2017-12-31 12 2 47 60 10.2218/ijdc.v12i2.508 Researcher Training in Spreadsheet Curation <p>Spreadsheets are commonly used across most academic disciplines; however, their use has been associated with a number of issues that affect the accuracy and integrity of research data. In 2016, new training on spreadsheet curation was introduced at the University of Sydney to address a gap between practical software skills training and generalised research data management training.
The approach to spreadsheet curation behind the training was defined, and the ways in which the training differs from other spreadsheet curation training offerings were described.</p> <p>The uptake of and feedback on the training were evaluated. Training attendance was analysed by discipline and by role. Quantitative and qualitative feedback were analysed and discussed. Feedback revealed that many attendees had been expecting, and wanted, practical spreadsheet software skills training. Issues relating to whether practical skills training can and should be integrated with curation training were discussed. While attendees were found to be predominantly from science disciplines, qualitative feedback suggests that humanities attendees have specific needs in relation to managing data with spreadsheets that are currently not being met. Feedback also suggested that some attendees would prefer the curation training to be delivered as a longer, more in-depth, hands-on workshop.</p> <p>The impact of the training was measured using data collected from the University's Research Data Management Planning (RDMP) tool and the Sydney eScholarship Repository. RDMP descriptions of spreadsheet data and records of tabular datasets published in the repository were analysed and assessed for quality and for accompanying data documentation. No significant improvements in data documentation or quality were found; however, it is likely too soon after the launch of the training program for much impact to be seen.</p> <p>Identified next steps include clarifying the marketing material promoting the training to better communicate its curation focus, investigating the needs of humanities researchers working with qualitative data in spreadsheets, and incorporating new material into the training in order to address those needs.
Integrating curation training with practical skills training and modifying the training to be more hands-on are changes that may be considered in future, but will not be implemented at this stage.</p> Gene Lyddon Melzack 2018-04-01 2018-04-01 12 2 116 124 10.2218/ijdc.v12i2.507 Sharing Selves: Developing an Ethical Framework for Curating Social Media Data <p class="abstract-western">Open sharing of social media data raises new ethical questions that researchers, repositories and data curators must confront, with little existing guidance available. In this paper, the authors draw upon their experiences in their multiple roles as data curators, academic librarians, and researchers to propose the STEP framework for curating and sharing social media data. The framework is intended for use by data curators facilitating open publication of social media data. Two case studies from the Dryad Digital Repository demonstrate implementation of the STEP framework. The STEP framework can serve as one important ‘step’ along the path to achieving safe, ethical, and reproducible social media research practice.</p> Sara Mannheimer Elizabeth A. Hull 2018-04-18 2018-04-18 12 2 196 209 10.2218/ijdc.v12i2.518 Setting up a National Research Data Curation Service for Qatar: Challenges and Opportunities <p class="abstract-western" lang="en-GB"><span style="color: #00000a;">Over the past decade, Qatar has made considerable progress towards developing a sustainable research culture for the nation. The main driver behind Qatar’s progress in research and innovation is the Qatar Foundation for Education, Science, and Community Development (QF), a private, non-profit organization that aims to utilise research as a catalyst for expanding, diversifying and improving the country’s economy, health and environment.
While this has resulted in significant growth in the number of research publications produced by Qatari researchers in recent years, a nationally co-ordinated approach is needed to address some emerging but increasingly important aspects of research data curation, such as the management and publication of research data as important research outputs and the long-term digital preservation of those data. Qatar National Library (QNL), launched in November 2012 under the umbrella of QF, aims to establish itself as a centre of excellence in Qatar for research data management, curation and publishing, addressing the research data-related needs of Qatari researchers and academics. This paper describes QNL’s approach towards establishing a national research data curation service for Qatar, highlighting the associated opportunities and key challenges.</span></p> Arif Shaon Armin Straube Krishna Roy Chowdhury 2018-04-02 2018-04-02 12 2 146 156 10.2218/ijdc.v12i2.515 Is Democracy the Right System? Collaborative Approaches to Building an Engaged RDM Community <p class="abstract-western">When developing new products, tools or services, one always needs to think about the end users to ensure widespread adoption. This applies equally to services developed at higher education institutions, but sometimes these services are driven by policies rather than by the needs of end users. This policy-driven approach can prove challenging for building effective community engagement. The initial development of Research Data Management support services at the University of Cambridge was policy-driven and initially failed to engage the community of researchers for whom these services were created.</p> <p class="abstract-western">In this practice paper, we describe the initial approach undertaken at Cambridge when developing RDM services, the results of this approach and the lessons learnt.
We then provide an overview of the alternative, democratic strategies we employed and their positive effects on community engagement. We conclude with a cost-benefit analysis of the two approaches. This paper may be a useful case study for any institution aiming to develop central support services for researchers, with conclusions applicable to the wider sector and extending beyond Research Data Management services.</p> Marta Teperek Rosie Higman Danny Kingsley 2018-02-11 2018-02-11 12 2 86 95 10.2218/ijdc.v12i2.561 Encouraging and Facilitating Laboratory Scientists to Curate at Source <p class="abstract-western">Computers and computation have become essential to scientific activity, and significant amounts of data are now captured digitally or even “born digital”. Consequently, there is a growing incentive to capture full experiment records using digital tools, such as Electronic Laboratory Notebooks (ELNs), so that experiment design and methods can be effectively linked to, and published with, the digital data generated as a result. Including metadata in experiment records helps to provide access, support effective curation, improve search, and provide context, and further enables effective sharing, collaboration, and reuse.</p> <p class="abstract-western">Regrettably, merely giving researchers the facility to add metadata to their experiment records does not mean that they will make use of it, or, if they do, that the metadata they add will be relevant and useful. Our research has clearly indicated that researchers need support and tools to encourage them to create effective metadata.
Tools such as ELNs provide an opportunity to encourage researchers to curate their records as they create them, and can also add extra value by using the generated metadata to provide capabilities for research management and Open Science that extend far beyond what is possible with paper notebooks.</p> <p class="abstract-western">For over fifteen years, the Southampton Chemical Information group has investigated the use of the Web and other tools for the collection, curation, dissemination, reuse, and exploitation of scientific data and information. As part of this activity we have developed a number of ELNs, but a primary concern has been how best to ensure that such tools, as they are developed further, are both usable and useful to researchers and their communities, with a focus on curation at source. In this paper, we describe a number of user research activities and user studies that help answer questions about how our community makes use of tools and how we can better facilitate the capture and curation of experiment records and related resources.</p> Cerys Willoughby Jeremy Frey 2017-12-30 2017-12-30 12 2 1 25 10.2218/ijdc.v12i2.514 Archiving Large-Scale Legacy Multimedia Research Data: A Case Study <p class="Abstract">In this paper we provide a case study of the creation of the DCAL Research Data Archive at University College London. In doing so, we assess the various challenges associated with archiving large-scale legacy multimedia research data, given the lack of literature on archiving such datasets.
We address issues such as the anonymisation of video research data, the ethical challenges of managing legacy data and historic consent, ownership considerations, the handling of large multimedia data files, and the complexity of multi-project data from a number of researchers and legacy data from eleven years of research.</p> Claudia Yogeswaran Kearsy Cormier 2018-04-02 2018-04-02 12 2 157 176 10.2218/ijdc.v12i2.484