The problem of recognizing and localizing objects in images is well studied in computer vision and has seen very promising results, especially since neural networks became widely used in the field. However, this line of research generally requires a fully supervised setting, that is, bounding box annotations for the examples of each class. Obtaining such annotations in order to reach high performance is generally very costly. Instead, we propose a framework in which we don’t directly learn the localization of specific classes; the model learns how to learn localization.
Object recognition has become an increasingly important problem in image processing, with applications in computer vision, photography, multimedia retrieval, detection, and data classification. Based on the bag-of-words model, we demonstrate the efficiency of an advanced algorithm capable of performing object recognition for both indoor and outdoor images. We use SIFT to extract objects’ local features and k-means clustering to form the codebook, train the system with an SVM on a set of manually labeled images, and test it on a similar data set.
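To make the codebook step of the bag-of-words pipeline concrete, the following is a minimal sketch and not the implementation evaluated here. It assumes local descriptors have already been extracted (for example with SIFT) and are given as plain lists of floats. The function names `nearest`, `kmeans`, and `bow_histogram` are hypothetical, and the toy 2-D descriptors stand in for real 128-dimensional SIFT vectors. The resulting histograms are the fixed-length vectors on which a classifier such as an SVM would then be trained; that step is omitted.

```python
import random

def nearest(point, centers):
    """Index of the center closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centers[i])),
    )

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k centers that act as the codebook."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for i, g in enumerate(groups):
            if g:  # keep the previous center if a cluster ends up empty
                centers[i] = [sum(col) / len(g) for col in zip(*g)]
    return centers

def bow_histogram(descriptors, codebook):
    """L1-normalized histogram of visual-word counts for one image."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest(d, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

# Toy descriptors pooled from all training images (assumed data).
pooled = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [9.0, 9.0], [10.0, 11.0], [11.0, 10.0]]
codebook = kmeans(pooled, k=2)

# With a hand-written codebook the quantization is easy to check by eye.
fixed_codebook = [[0.0, 0.0], [10.0, 10.0]]
image_descriptors = [[0.0, 1.0], [1.0, 0.0], [9.0, 9.0], [10.0, 11.0]]
print(bow_histogram(image_descriptors, fixed_codebook))  # [0.5, 0.5]
```

Each image is reduced to a histogram whose length equals the codebook size, regardless of how many descriptors the image produced, which is what allows a standard classifier to be applied afterwards.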