Privacy preserving for Big Data Analysis
- Master's theses (TN-IDE) 
The Safer@Home project at the University of Stavanger aims to create a smart home system that captures sensor data from homes into its data cluster. To provide assistive services through data analytics, sensor data has to be collected centrally so that knowledge discovery algorithms can be applied effectively. The information collected from such homes is often highly sensitive and needs to be protected while it is processed or shared across the value chain. Data has to be perturbed to protect against disclosure and misuse by adversaries. Anonymization is the process of perturbing data by generalizing and suppressing identifiers that could pose a threat when linked with publicly available databases. Maintaining privacy while retaining the utility of the data is a major challenge. This thesis evaluates various anonymization methods that suit our requirements. We present the software requirement specification of an anonymization framework and provide a practical implementation of a well-accepted privacy-preserving anonymization algorithm called Mondrian. To quantify the information loss during the anonymization process, a framework is proposed to evaluate the anonymized dataset. Moreover, the thesis proposes a distributed method for solving the anonymization process using the Hadoop MapReduce framework to make a scalable system for big data analysis.
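To make the idea of generalizing identifiers concrete, the following is a minimal single-machine sketch of Mondrian-style partitioning for k-anonymity. It is not the thesis implementation: the function names (`mondrian`, `generalize`, `discernibility`), the restriction to numeric quasi-identifiers, the use of raw rather than normalized attribute ranges, and the choice of the discernibility metric as an information-loss measure are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumption: every record is a tuple of numeric quasi-identifiers,
# e.g. (age, postal_code). Sensitive attributes are left out for brevity.
Record = Tuple[int, ...]
Interval = Tuple[int, int]


def mondrian(records: List[Record], k: int) -> List[List[Record]]:
    """Recursively split records at the median of the widest attribute.

    A split is accepted only if both halves keep at least k records,
    so every returned partition has size >= k (given len(records) >= k).
    """
    if len(records) < 2 * k:
        return [records]
    dims = range(len(records[0]))
    # Try attributes from the widest value range to the narrowest.
    # Simplification: ranges are not normalized per attribute.
    by_width = sorted(
        dims,
        key=lambda d: max(r[d] for r in records) - min(r[d] for r in records),
        reverse=True,
    )
    for d in by_width:
        ordered = sorted(records, key=lambda r: r[d])
        median = ordered[len(ordered) // 2][d]
        left = [r for r in ordered if r[d] < median]
        right = [r for r in ordered if r[d] >= median]
        if len(left) >= k and len(right) >= k:
            return mondrian(left, k) + mondrian(right, k)
    # No attribute allows a valid split: keep the group as one partition.
    return [records]


def generalize(partition: List[Record]) -> List[Tuple[Interval, ...]]:
    """Replace each value by the (min, max) interval of its partition."""
    dims = range(len(partition[0]))
    box = tuple(
        (min(r[d] for r in partition), max(r[d] for r in partition))
        for d in dims
    )
    return [box for _ in partition]


def discernibility(partitions: List[List[Record]]) -> int:
    """Illustrative loss measure: sum of squared partition sizes."""
    return sum(len(p) ** 2 for p in partitions)
```

Calling `mondrian(records, k)` returns groups of at least `k` records each, and `generalize` turns one group into identical interval-valued rows, so no record can be told apart from the others in its group on the quasi-identifiers alone. A lower `discernibility` value indicates smaller groups and therefore less generalization.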
Master's thesis in Computer Science