Visual monitoring of retail shelves (or, for that matter, any object detection problem) can be solved very well by deep learning methods such as R-CNN, YOLO, or SSD. Creating annotated datasets for these methods, however, is tedious, time-consuming, and expensive, because every object in every training image has to be marked with a bounding box. In our new research, we have come up with a method that can detect and localize objects in images given only a dataset of individual object instances to train on, rather than the localized (bounding-box annotated) instances that R-CNN, SSD, and YOLO require. Such techniques are called weakly supervised object detection techniques.
For example, if one needs to locate Red Bull cans in images, all one has to do is train the algorithm on a few photos of individual Red Bull cans. Our network has two stages: first, a fully convolutional architecture trained to classify the object produces a primary segmentation map; then a convolutional encoder-decoder, trained on a synthetic dataset, rectifies this map.
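To make the first stage more concrete, here is a toy sketch of why a classifier trained only on object crops can still produce a localization map: applied to every window of a larger image, it yields a dense grid of scores. This is not our network. The function `toy_patch_score` is a hypothetical stand-in for a trained classifier (it simply measures mean brightness), all names are invented for illustration, and the second-stage encoder-decoder is not implemented; a plain threshold is used in its place just to read a box off the map.

```python
from typing import Callable, List, Optional, Tuple

Image = List[List[float]]


def toy_patch_score(patch: Image) -> float:
    """Hypothetical stand-in for a trained classifier: mean brightness of the patch."""
    total = sum(sum(row) for row in patch)
    return total / (len(patch) * len(patch[0]))


def dense_score_map(image: Image, patch_size: int,
                    score_fn: Callable[[Image], float]) -> Image:
    """Score every patch_size x patch_size window (stride 1, no padding)."""
    rows = len(image) - patch_size + 1
    cols = len(image[0]) - patch_size + 1
    return [
        [
            score_fn([row[x:x + patch_size] for row in image[y:y + patch_size]])
            for x in range(cols)
        ]
        for y in range(rows)
    ]


def bounding_box(score_map: Image,
                 threshold: float) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) of cells scoring >= threshold, in map coordinates."""
    hits = [
        (y, x)
        for y, row in enumerate(score_map)
        for x, score in enumerate(row)
        if score >= threshold
    ]
    if not hits:
        return None
    ys = [y for y, _ in hits]
    xs = [x for _, x in hits]
    return (min(ys), min(xs), max(ys), max(xs))


# A 6x6 "shelf image" with one bright 2x2 "object" at rows 2-3, columns 3-4.
image = [[0.0] * 6 for _ in range(6)]
for y in (2, 3):
    for x in (3, 4):
        image[y][x] = 1.0

score_map = dense_score_map(image, patch_size=2, score_fn=toy_patch_score)
tight_box = bounding_box(score_map, threshold=1.0)
loose_box = bounding_box(score_map, threshold=0.5)
```

The strict threshold keeps only the window that sits exactly on the object, while the looser one also keeps windows that partially overlap it, which gives a blurrier region. A coarse map of this kind is what the second stage is meant to clean up.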