Prediction of Protein Function using semantic Fingerprints - Multivariate Data Classification by Artificial Neural Networks involving dimensional Reduction
- Department of Physics (Institutt for fysikk)
In many fields of today's scientific research, the amount of knowledge is far smaller than the amount of accessible data and is often too limited to support meaningful analyses and reasonable conclusions. Hence, statistical knowledge inference is becoming increasingly popular in multivariate data analysis, and machine learning tools from artificial intelligence are being applied. However, increased sparsity, high dimensionality and a lack of training samples pose new requirements on data analysis tools.

In particular, the experimental determination of protein function is challenging and cumbersome, and theoretical predictions are difficult because of the variety and complexity of macromolecules. However, recent massive knowledge integration approaches in systems biology have resulted in curated semantic knowledge-based systems that could make relevant problems increasingly feasible.

One of these problems is the identification of specific DNA-binding RNA polymerase II transcription factors (DbTFs). In this work, semantic knowledge-based systems are exploited for DbTF prediction and the feasibility of the approach is explored. This master's project involves the design, implementation, rigorous testing and optimisation of a specific methodology to classify putative DbTFs by an artificial neural network approach. From 2655 candidate proteins of the TFcheckpoint database, a selection of 54 proteins is classified as DbTFs with a relative classification error of less than 10%.