How does a recognition system work?

Everything is relative

In the physical world you can measure the distance between two people with the classic Euclidean distance. What about photos? On photos of these people you can define a very primitive metric that still satisfies the general definition of a metric: "is it the same picture or not". Translate that to {0, 1} and you are done. However, people can usually tell when they see a familiar face. Can this similarity also be quantified, like the physical distance between two people? What would our basic measures be? Hair or skin colour? Age or gender? Maybe the number of teeth... In fact it is very difficult to capture such features with traditional face recognition approaches, and even if we had a perfect system, we would still need expert knowledge about the subjects we want to identify. It is overwhelming and difficult to find the features that matter: the ones that vary the most between different subjects yet change only slightly for the same subject.

Since deep learning methods are especially good at finding features, why keep our GPUs on a leash? If we have multiple pictures of different subjects, we can train a network to project each image (a bunch of RGB values) into a feature space where every image of the same subject "looks" similar. While keeping projections of the same subject close together, we also train it to embed (encode) images of different people in such a manner that the difference is easy to see. How do we formulate this constraint? We simply use the generalization of the same distance we would use in the physical world: take the sum of squares of every coordinate's difference (and take the square root... whatever).
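That last sentence is just the Euclidean distance in however many dimensions the feature space has. A minimal sketch, with made-up 4-dimensional embeddings standing in for what a trained network would output:

```python
import math

def euclidean_distance(a, b):
    """Sum the squares of every coordinate's difference, then take the square root."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical embeddings of two photos of the same person.
# A real network would produce much higher-dimensional vectors.
emb_photo_1 = [0.12, 0.80, 0.33, 0.05]
emb_photo_2 = [0.10, 0.78, 0.35, 0.07]

print(euclidean_distance(emb_photo_1, emb_photo_2))  # small: same subject
```

The same function works unchanged whether the vectors are 2-dimensional positions in a room or 128-dimensional face embeddings.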

SAME SUBJECT

So what we end up with is a clever algorithm that can put images into a space where portraits of the same person can be grouped together by drawing a circle. The benefit? When we get a new query image, one we don't yet know who it belongs to, we can find its closest neighbours.
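Finding the closest neighbour is then a simple minimum over distances. A sketch, assuming a small gallery of one (hypothetical, low-dimensional) embedding per known subject:

```python
import math

def euclidean_distance(a, b):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbour(query, gallery):
    """Return (label, distance) of the gallery embedding closest to the query.

    gallery: dict mapping a subject label to one embedding vector.
    """
    return min(
        ((label, euclidean_distance(query, emb)) for label, emb in gallery.items()),
        key=lambda pair: pair[1],
    )

# Made-up 3-d embeddings standing in for the output of a trained network.
gallery = {
    "anna": [0.9, 0.1, 0.0],
    "bela": [0.0, 0.8, 0.2],
}
print(nearest_neighbour([0.85, 0.15, 0.05], gallery))  # closest subject: "anna"
```

In practice the gallery would hold several embeddings per subject and a proper nearest-neighbour index, but the principle is the same linear scan shown here.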

DIFFERENT SUBJECT

What about newcomers? We trained our network to pull similar images within a certain threshold distance and to push samples of different subjects further than the threshold distance (outside of the circle, if you like). Therefore we are not only able to tell who looks most similar to our newcomer: if there is no image in our database closer than the threshold distance, then we can say that the query image is unrecognised. That's all.
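The open-set decision above can be sketched in a few lines: look up the nearest neighbour and reject the match if it falls outside the circle. The threshold value and the embeddings below are illustrative assumptions, not values from a trained system:

```python
import math

def euclidean_distance(a, b):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify(query, gallery, threshold):
    """Return the closest subject's label, or None if nothing is within the threshold."""
    label, dist = min(
        ((lbl, euclidean_distance(query, emb)) for lbl, emb in gallery.items()),
        key=lambda pair: pair[1],
    )
    return label if dist <= threshold else None

# Hypothetical gallery; a real threshold would be tuned on validation data.
gallery = {"anna": [0.9, 0.1, 0.0], "bela": [0.0, 0.8, 0.2]}

print(identify([0.88, 0.12, 0.02], gallery, threshold=0.3))  # inside the circle: "anna"
print(identify([0.5, 0.5, 0.9], gallery, threshold=0.3))     # unrecognised: None
```

Tuning the threshold trades false accepts against false rejects, which is why it is chosen on held-out data rather than hard-coded.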

If you are interested in developing face recognition systems, go to: github page
For further questions contact me: itkdeeplearn@gmail.com