Using 3D Graphics to Train Object Detection Systems
Recent advancements in machine learning, and in particular deep neural networks, have yielded excellent object detection models. However, these techniques require vast datasets of labeled training images, which are prohibitively labor intensive to produce. This thesis explores an alternative approach to obtaining labeled training data, namely using 3D models of objects and modern game engines to generate automatically labeled synthetic training data. A simple generation approach, similar to that of Peng et al. (2014), is presented that requires minimal user input, making dataset generation virtually free. The real-time CNN object detection model YOLO is trained on our synthetic data to detect cars, and its performance is evaluated on real images of cars from the KITTI and PASCAL VOC public datasets, achieving up to 11.9% and 22.2% AP, respectively. This is significantly lower than state-of-the-art detection systems trained on natural images, but on par with the winner of the PASCAL VOC challenge in 2008, and we outline multiple avenues for further research that we believe could significantly boost performance. The performance of models trained on datasets with different features is evaluated and compared. It is found that aspect ratio, realistic background imagery, and object occlusion are important factors for performance. This partially contradicts the findings of Peng et al. (2014), who found their object detection system to be largely invariant to the background imagery. The discrepancy is likely caused by differences between the two object detection systems employed. We argue that synthetic datasets can be valuable for training detectors of novel categories where training data is lacking, and as a technique for controlled experiments that give insight into how CNNs respond to different attributes of the training data.
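The AP figures cited above follow the PASCAL VOC evaluation protocol. As an illustration only (not the thesis's actual evaluation code), the 11-point interpolated average precision used in the VOC challenges of that era can be sketched as follows; the function name and toy arrays are our own:

```python
import numpy as np

def voc_ap_11point(recall, precision):
    """11-point interpolated average precision (illustrative sketch).

    At each of the 11 recall thresholds 0.0, 0.1, ..., 1.0, take the
    maximum precision achieved at that recall or higher, then average.
    """
    ap = 0.0
    for t in np.arange(0.0, 1.1, 0.1):
        mask = recall >= t
        p = precision[mask].max() if mask.any() else 0.0
        ap += p / 11.0
    return ap

# Toy example: a detector whose precision stays at 1.0 across all
# recall levels scores the maximum AP of 1.0 (i.e., 100% AP).
recall = np.array([0.2, 0.4, 0.6, 0.8, 1.0])
precision = np.array([1.0, 1.0, 1.0, 1.0, 1.0])
print(voc_ap_11point(recall, precision))
```

The precision-recall curve itself is built by ranking detections by confidence and matching them to ground-truth boxes at an IoU threshold (0.5 in PASCAL VOC), which is omitted here for brevity.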