Survival of the Transition of Records and Their Usability over Several Decades
The theme of this thesis is preserving digital information over several decades or forever. The thesis is concerned with providing a suitable preservation strategy to help curators rescue digital information from continuous changes in hardware and software technologies. Digital information is fragile and difficult to preserve. In the past decade, as more and more information has been captured or created by electronic devices, people have gradually become aware of the seriousness of the problem. Consequently, there is concern about a digital black hole that could occur in the future if no action is taken on digital preservation. Many projects have explored how to safely preserve existing digital information from past decades, and researchers have tested various solutions in them. Some have tried to preserve old hardware and software. Some have tried to create applications that emulate the old hardware. Others have tried to transfer digital information between technology generations. The most widely used solution is this transfer, known as migration. Migration can change digital information, but it is widely agreed that a carefully designed migration will not harm digital information during the transfer. Curators have designed migration procedures and implemented migration solutions. However, some issues remain unsolved. This thesis makes the following contributions: (i) A state-of-the-art review of preservation strategies. Information is the purveyor of culture and knowledge. It usually has a longer lifetime than both its storage medium and its owner. Digital information has a shorter lifetime than information on traditional media, because it is affected by rapid technological development. Digital information must therefore be transferred from one technology generation to the next. This work includes a wide literature survey on preservation strategies.
Ten strategies for keeping digital information alive are analysed, so that curators can learn about them and adopt the strategy best suited to their organization's situation. (ii) A multi-criteria decision-making approach to support the selection of migration solutions. Migration can be storage migration, format migration or application migration. When curators plan a migration, they face many issues, so it is important for them to carefully select the most suitable solution. This work designs a selection framework using a multi-criteria decision-making approach. The framework is validated through a case study carried out with the assistance of the National Library of Norway. The case study focuses on format migration, because it is the most complex kind of migration. A wrong format migration decision will not only waste money and time, but could also cause irreversible damage to the digital information. Using the framework, target formats, transformation programs and migration results can be assessed with a set of mathematical formulas, which finally yields a recommended format migration solution. (iii) A set of requirements to ensure the quality of migration metadata. Digital information differs from traditional information: it is composed of 0 and 1 bits, organized according to a protocol. With rapid technical evolution, the protocol will become obsolete and might not be supported by the new digital environment. To keep handling the digital information, curators must transform the digital object in accordance with a new protocol using a migration procedure. To perform that transformation, curators need instructions describing how the information was digitized. Furthermore, they must document the migration actions to certify the authenticity of the digital object. This documentation should be embedded in or packaged with the digital information.
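The abstract does not state which multi-criteria method the selection framework in contribution (ii) uses, nor its formulas. Purely to illustrate how an assessment of candidates can produce a recommendation, the sketch below assumes a simple weighted-sum model: each candidate target format receives a raw score per criterion, scores are min-max normalized so that 1 is always best, and the weighted sum ranks the candidates. The criteria (`openness`, `tool_support`, `cost`), the weights and the scores are all hypothetical.

```python
"""Weighted-sum ranking of migration candidates (illustrative sketch only)."""
from typing import Dict, List, Tuple

# Hypothetical criteria and weights (they sum to 1); not taken from the thesis.
WEIGHTS: Dict[str, float] = {
    "openness": 0.4,
    "tool_support": 0.35,
    "cost": 0.25,
}
# Criteria where a smaller raw value is preferable.
COST_CRITERIA = {"cost"}


def normalize(values: List[float], lower_is_better: bool) -> List[float]:
    """Min-max normalize to [0, 1] so that 1 is always the best value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    if lower_is_better:
        return [(hi - v) / (hi - lo) for v in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank(candidates: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Return (candidate, weighted score) pairs, best first."""
    names = list(candidates)
    totals = {name: 0.0 for name in names}
    for criterion, weight in WEIGHTS.items():
        raw = [candidates[n][criterion] for n in names]
        normed = normalize(raw, criterion in COST_CRITERIA)
        for name, score in zip(names, normed):
            totals[name] += weight * score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical raw scores for three candidate target formats.
    formats = {
        "format_A": {"openness": 9, "tool_support": 6, "cost": 30},
        "format_B": {"openness": 7, "tool_support": 9, "cost": 20},
        "format_C": {"openness": 4, "tool_support": 8, "cost": 10},
    }
    for name, score in rank(formats):
        print(f"{name}: {score:.3f}")
```

With these made-up numbers, `format_B` ranks first (0.715), ahead of `format_C` and `format_A`. A weighted sum is only one MCDM technique among several (AHP and outranking methods are common alternatives), and its ranking is sensitive to the chosen weights.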
This work has designed eleven requirements for migration metadata quality. The requirements cover various aspects that help curators carry out a migration while preserving the quality of the digital information. Both a case study and a survey are used to validate these qualitative requirements. Curators can use them as a general checklist to ensure that the digital information can be trusted over time. (iv) A tool to estimate the time needed for information migration. A preservation organization may use various kinds of storage and formats for digital information, which makes it hard for curators to estimate the migration time when they want to replace their system. Therefore, this work has designed a set of formulas for different kinds of migration solutions, and several experiments have been carried out at the National Library of Norway and at NTNU’s Department of Computer and Information Science to evaluate the accuracy of the formulas. We found that the real migration times are often close to the computed lower bound. Curators can easily calculate the upper and lower bounds of the migration time with the formulas, because the necessary input parameter values can be obtained from hardware or software specifications or by running benchmark applications. Migration actions can thus be planned and taken on time. (v) A tool for extracting migration metadata. The application scans the preservation system and, based on the stored preservation metadata, generates a report on which techniques are in use. In our experiments, we analyze the MEST preservation metadata schema because the National Library of Norway uses it, and we show how to map the schema elements to our quality requirements. The application helps curators obtain an overview of the preservation system.
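The formulas from contribution (iv) are not reproduced in this abstract. As a minimal sketch of how a lower and an upper bound can both follow from benchmark or specification figures, assume a migration consists of three stages (read from the source, transform, write to the target), each with a known throughput. If the stages overlap perfectly in a pipeline, the slowest stage dominates, giving a lower bound; if they run strictly one after another, their times add up, giving an upper bound. The stage model, the `MigrationParams` fields and the throughput values are hypothetical.

```python
"""Lower/upper bounds on migration time (illustrative sketch only)."""
from dataclasses import dataclass
from typing import Tuple


@dataclass
class MigrationParams:
    total_bytes: float    # size of the collection to migrate
    read_bps: float       # source read throughput, bytes/second (assumed)
    transform_bps: float  # transformation throughput, bytes/second (assumed)
    write_bps: float      # target write throughput, bytes/second (assumed)


def stage_times(p: MigrationParams) -> Tuple[float, float, float]:
    """Seconds each stage would need if it ran alone over all the data."""
    return (
        p.total_bytes / p.read_bps,
        p.total_bytes / p.transform_bps,
        p.total_bytes / p.write_bps,
    )


def migration_time_bounds(p: MigrationParams) -> Tuple[float, float]:
    """Return (lower, upper) bounds in seconds.

    Lower bound: stages overlap perfectly, so the slowest stage dominates.
    Upper bound: stages run strictly one after another.
    """
    times = stage_times(p)
    return max(times), sum(times)


if __name__ == "__main__":
    TB = 10 ** 12
    params = MigrationParams(
        total_bytes=2 * TB,
        read_bps=200e6,       # hypothetical 200 MB/s source disk
        transform_bps=100e6,  # hypothetical 100 MB/s conversion tool
        write_bps=400e6,      # hypothetical 400 MB/s target storage
    )
    low, high = migration_time_bounds(params)
    print(f"lower bound: {low / 3600:.1f} h, upper bound: {high / 3600:.1f} h")
```

For this hypothetical 2 TB collection the script prints a lower bound of 5.6 h and an upper bound of 9.7 h. A fuller model would add per-file overheads, checksum verification and retries, which this sketch deliberately ignores.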
Using the extraction tool's report, curators learn the migration scale, such as how many techniques are embedded within a file, how many files are to be migrated and the total size of the result. The tool performed satisfactorily and gave good results in the experiment.
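The abstract does not describe the internals of the extraction tool in contribution (v), which reads the stored preservation metadata rather than raw files. As a rough, hypothetical stand-in for the "how many files, how large" part of the migration scale, the sketch below walks a directory tree and uses file extensions as a proxy for formats, reporting the file count and total bytes per extension. The function names are illustrative, and a real tool would take the format from the metadata records instead of file names.

```python
"""Summarize a file tree by format (illustrative sketch only)."""
import os
from collections import defaultdict
from typing import Dict, Tuple


def scan_formats(root: str) -> Dict[str, Tuple[int, int]]:
    """Map each file extension to (file count, total bytes) under root.

    Extensions are only a rough proxy for formats; unreadable entries
    (e.g. broken symlinks) are skipped.
    """
    counts: Dict[str, int] = defaultdict(int)
    sizes: Dict[str, int] = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower() or "<none>"
            try:
                size = os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue
            counts[ext] += 1
            sizes[ext] += size
    return {ext: (counts[ext], sizes[ext]) for ext in counts}


def print_report(summary: Dict[str, Tuple[int, int]]) -> None:
    """Print formats sorted by total size, followed by overall totals."""
    total_files = sum(c for c, _ in summary.values())
    total_bytes = sum(s for _, s in summary.values())
    for ext, (count, size) in sorted(summary.items(), key=lambda kv: -kv[1][1]):
        print(f"{ext:10} {count:8d} files {size:14d} bytes")
    print(f"{'TOTAL':10} {total_files:8d} files {total_bytes:14d} bytes")


if __name__ == "__main__":
    import sys
    print_report(scan_formats(sys.argv[1] if len(sys.argv) > 1 else "."))
```

Given a directory path on the command line, the script prints one line per extension, largest first, followed by the totals.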