The problem of recognizing and localizing objects in images is well studied and has yielded very promising results in computer vision, particularly since neural networks became widely used in the field. However, these studies generally require a fully supervised setting, i.e., bounding-box annotations for examples of each class. Meeting this requirement to reach high performance is generally very costly. Instead, we propose a framework in which we do not directly learn the localization of specific classes; rather, the model learns how to learn localization. As a result, we are able to localize unseen test classes that are disjoint from the training classes. With the help of our proposed framework, we eliminate the need for bounding boxes on test classes via knowledge transfer and learning-to-learn concepts.