Generation of a Skeleton Corpus of Digital Objects for the Validation and Evaluation of Format Identification Tools and Signatures

Ross Spencer

2013, Vol. 8, No. 1, pp. 120-130

doi:10.2218/ijdc.v8i1.249


Abstract


To preserve digital information it is vital that the format of that information can be identified, in-perpetuity. This is the major focus of research within the field of Digital Preservation. The National Archives of the UK called for the Digital Preservation and Digital Curation communities to develop a test corpus of digital objects to help further develop tools to aid this purpose. Following that call, an attempt has been made to develop the suite.

This paper initially outlines a methodology to generate a skeleton corpus using simple user-generated digital objects. It then explores the lessons learnt in the generation of a corpus using scripting language techniques from the file format signatures described in The National Archives PRONOM technical registry. It will also discuss the use of the digital signature for this purpose, the benefits of developing a test corpus using this technique. Finally, this paper will outline a methodology for future research before exploring how the community can best make use of the output of this project and how this project needs to be taken forward to completion.


Full Text: PDF

The International Journal of Digital Curation. ISSN: 1746-8256
The IJDC is published by the University of Edinburgh
and is a publication of the Digital Curation Centre.