Intelligent Scheduled Backup Using Duplicity
Digital information has rapidly become an important part of everyday life. Consequently, backup solutions are needed to ensure that digital property is safely stored and protected. This thesis presents an in-depth study of Duplicity, a backup solution that provides encrypted, bandwidth-efficient backup to both local and remote storage. The first part of the thesis investigates Duplicity in different use-case scenarios and reports on the advantages and disadvantages of the software. Research is carried out to explore how various options affect backup and restoration time; of particular interest is the impact of encryption, compression, and incremental backup chains. Tests are also conducted with four cloud storage providers to investigate whether the choice of provider has a large impact on Duplicity's performance. The impact of encryption on backup execution time is found to be minimal. Users should instead analyse the content of their data to determine whether execution time can be reduced by adjusting the compression level. The investigation of incremental backups clearly shows some of the issues that arise with their use: while incremental techniques save bandwidth and storage cost when performing backups, the resources spent on restoration increase greatly.

Finally, an original system for intelligent distributed backup to be used together with Duplicity is introduced. The system uses erasure codes as the cornerstone of a minimalistic client application that distributes partial data to different storage hosts. Its main objective is to increase the availability and reliability of backups. System requirements and vital components are identified by analysing the system's main objectives, and the resulting ideas and architecture lead to a proof-of-concept prototype. Open source libraries and self-written source code show how the key components meet the objectives of increased availability and reliability.
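To illustrate what "distributing partial data" with an erasure code means, the following is a minimal sketch that assumes the simplest possible erasure code: a single XOR parity shard. The data is split into k shards plus one parity shard, each intended for a different storage host, and any one lost shard can be rebuilt from the others. This is not the code or library used in the thesis, which does not name its scheme here; the functions `encode` and `decode` are hypothetical, and practical systems typically use Reed-Solomon codes, which tolerate the loss of more than one shard.

```python
from __future__ import annotations

from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k zero-padded data shards and append one XOR parity shard.

    Hypothetical helper: each of the k + 1 shards would go to a different host.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = reduce(xor_bytes, shards)
    return shards + [parity]


def decode(shards: list[bytes | None], original_len: int) -> bytes:
    """Rebuild the original data when at most one shard is missing (None)."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("a single-parity code can recover at most one lost shard")
    shards = list(shards)
    if missing:
        # XOR of all surviving shards equals the missing one.
        present = [s for s in shards if s is not None]
        shards[missing[0]] = reduce(xor_bytes, present)
    # Drop the parity shard and strip the zero padding.
    return b"".join(shards[:-1])[:original_len]


# Usage: 3 data shards + 1 parity shard, one storage host unreachable.
payload = b"duplicity volume bytes"
stored = encode(payload, k=3)
stored[1] = None
assert decode(stored, len(payload)) == payload
```

The storage overhead here is (k + 1)/k, for example 4/3 with k = 3, compared with 2 for keeping a full second copy, which is the usual argument for erasure codes over plain replication.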
Statistical analysis and calculations are used to demonstrate the availability properties of the system. On this basis, it is concluded that a backup solution combining Duplicity with erasure codes can provide reliable distributed backup by encoding the original data.
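A common way to calculate the availability of a k-of-n erasure-coded backup is a binomial model. This is offered as an illustration and is not necessarily the exact model used in the thesis. The model assumes that each of the n storage hosts is reachable independently with the same probability p, and that the data can be restored whenever at least k shards are reachable, so A = sum over i = k..n of C(n, i) p^i (1 - p)^(n - i). The function name `availability` is hypothetical, and the independence assumption is optimistic for hosts that share a provider or network.

```python
from math import comb


def availability(n: int, k: int, p: float) -> float:
    """Probability that at least k of n independent hosts, each up with probability p, are reachable.

    Assumes identical, independent host availability (binomial model).
    """
    if not 0 < k <= n:
        raise ValueError("require 0 < k <= n")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Single host vs. a 3-of-4 erasure-coded layout, both with 99% host availability.
single = availability(1, 1, 0.99)
coded = availability(4, 3, 0.99)
print(f"single host: {single:.6f}, 3-of-4 coded: {coded:.6f}")
```

With p = 0.99, the 3-of-4 layout gives 0.99^4 + 4 * 0.99^3 * 0.01 = 0.99940797. This is a higher availability than a single host, at a storage cost of 4/3 of the original data.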