Which Painting Do You Look Like? Comparing Faces Using Python and OpenCV


Many years ago, as I was wandering around the Louvre, I came across a painting which bore an uncanny resemblance to me!

[Image: Comparing Faces]

Spooky, eh?

Yeah, yeah, it's not the greatest likeness ever, but people who know me seem to think I look like the chap on the left.

This got me thinking... Wouldn't it be great if when you entered an art gallery, a computer could tell you which painting you look most like?

Well, I think it would be great. This is my blog, so what I say goes!

Getting The Data

I'm using the Tate's Open Data Set to grab scans of all their artwork. ~60,000 images in total.

Finding Faces

Not all paintings are of people. Some artsy types like to paint landscapes, dogs, starry nights, etc.

Using Python and OpenCV, we can detect faces in paintings, then crop out each face and save it as its own file. The complete code is on GitHub - but here are the essentials.

import sys, os
import cv2
import urllib
from urlparse import urlparse

def detect(path):
    img = cv2.imread(path)
    cascade = cv2.CascadeClassifier("haarcascade_frontalface_alt.xml")
    rects = cascade.detectMultiScale(img, 1.3, 4, cv2.cv.CV_HAAR_SCALE_IMAGE, (20,20))

    if len(rects) == 0:
        return [], img
    rects[:, 2:] += rects[:, :2]
    return rects, img

def box(rects, img, file_name):
    #   Strip any trailing newline, then swap the extension for an underscore
    base = file_name.strip().replace('.jpg', '_')
    i = 0   #   Track how many faces found
    for x1, y1, x2, y2 in rects:
        print "Found face " + str(i)    #   Tell us what's going on
        cut = img[y1:y2, x1:x2] #   Crop the rectangle containing a face
        face_name = base + str(i) + '.jpg'  #   One file per face, e.g. N01234_0.jpg
        print 'Writing ' + face_name
        #   The detected/ directory must already exist
        cv2.imwrite('detected/' + face_name, cut)   #   Write the file
        i += 1  #   Increment the face counter

def main():
    #   all.txt contains a list of thumbnail URLs
    for line in open('all.txt'):
        url = line.strip()  #   Remove the trailing newline
        file_name = urlparse(url).path.split('/')[-1]
        print "URL is " + url

        if (urllib.urlopen(url).getcode() == 200):
            #   Download to a temp file
            urllib.urlretrieve(url, "temp.jpg")
            #   Detect the face(s)
            rects, img = detect("temp.jpg")
            #   Cut and keep
            box(rects, img, file_name)
        else:
            print '404 - ' + line

if __name__ == "__main__":
    main()

We now have a directory of files. Each file is a separate face. We assume that no two faces are of the same person - this is important for the next stage...
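
One line in detect() is worth unpacking. detectMultiScale returns each face as (x, y, width, height), whereas the cropping in box() needs two corners. Here is a minimal sketch of that conversion, using made-up rectangles and a blank stand-in image rather than a real painting, and assuming NumPy is installed:

```python
import numpy as np

# Two made-up detections, one per row, as (x, y, width, height)
rects = np.array([[10, 20, 50, 60],
                  [100, 40, 30, 30]])

# Add each top-left corner (x, y) to (width, height), in place,
# so that every row becomes (x1, y1, x2, y2)
rects[:, 2:] += rects[:, :2]

image = np.zeros((200, 200), dtype=np.uint8)  # stand-in for a loaded painting
for x1, y1, x2, y2 in rects:
    face = image[y1:y2, x1:x2]  # rows are y, columns are x
    print("Cropped a face of shape %s" % (face.shape,))
```

Note that the slice puts y first: image arrays are indexed by row, then column.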

Building Eigenfaces

Imagine that a picture of your face could be represented by a series of properties. For example:

  • How far apart your eyes are.
  • Distance from nose to mouth.
  • Ratio of ear length to nose width.
  • etc.

That is, in grossly simplified terms, what an Eigenface is.

If I have a database of Eigenfaces, I can take an image of your face and compare it with all the others and find the closest match.
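
To make that a little less hand-wavy, here is a minimal sketch of the idea, assuming the usual formulation of Eigenfaces as principal component analysis. The "properties" are not hand-picked measurements like eye spacing. They are the directions along which the training images differ most. The function names and the tiny four-pixel "faces" are invented for illustration, and this is not how OpenCV implements it internally.

```python
import numpy as np

def train_eigenfaces(images, num_components):
    """Return (mean, components, projections) for a stack of flattened images."""
    X = np.asarray(images, dtype=np.float64)
    mean = X.mean(axis=0)
    centered = X - mean
    # Rows of vt are the principal directions ("eigenfaces"), strongest first
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:num_components]
    # Describe every training face by its weight along each eigenface
    projections = centered.dot(components.T)
    return mean, components, projections

def nearest_face(sample, mean, components, projections):
    """Return (index, distance) of the training face closest to sample."""
    weights = (np.asarray(sample, dtype=np.float64) - mean).dot(components.T)
    distances = np.linalg.norm(projections - weights, axis=1)
    best = int(np.argmin(distances))
    return best, float(distances[best])

# Three made-up, flattened 2x2 "faces"
faces = [[10, 20, 30, 40],
         [200, 190, 180, 170],
         [90, 100, 110, 120]]
mean, components, projections = train_eigenfaces(faces, num_components=2)

# A new "photo" that is nearly the same as the first face
index, distance = nearest_face([12, 22, 29, 41], mean, components, projections)
print("Closest training face: %d (distance %.2f)" % (index, distance))
```

The distance is the useful part: a small distance means a close likeness, and a large one means the best match still isn't very good.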

We'll split this process into two parts.

Generate the Eigenfaces

We need to arrange the images so that each unique face is in its own directory. If you know that you have more than one picture of a person, you can put those images in the same directory.

E.G.

   path
    |-Alice
    | |-0.jpg
    | |-1.jpg
    |
    |-Bob
    | |-0.jpg
    |
    |-Carly
    ...
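
Each directory ends up as one numeric label, counted in the order the directories are visited. Here is a minimal sketch of that numbering. The helper name label_directories is made up, and the sort is an assumption added so that the numbering is repeatable. The scripts in this post do not sort, and instead rely on os.walk visiting the directories in the same order every time.

```python
import os

def label_directories(path):
    """Return a list of (label, directory) pairs, one per subdirectory."""
    labels = []
    count = 0
    for dirname, dirnames, filenames in os.walk(path):
        # Assumption: sort in place so os.walk visits directories in a fixed order
        dirnames.sort()
        for subdirname in dirnames:
            labels.append((count, os.path.join(dirname, subdirname)))
            count += 1
    return labels

# Prints nothing if "path" does not exist
for label, directory in label_directories("path"):
    print("%d -> %s" % (label, directory))
```

With the layout above, Alice would be label 0, Bob label 1, and Carly label 2.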

This code is adapted from Philipp Wagner's work.

It takes a directory of images, analyses them, and creates an XML file containing the Eigenfaces.

WARNING: This code will take a long time to run if you're using thousands of images. On a dataset of 400 images, the resulting file took up 700MB of disk space.
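
Here is a rough back-of-the-envelope sketch of why the file gets so big. It assumes the model keeps one image-sized Eigenface per training image, plus every training face's projection and the average face, and that images are 256x256 as in the script. It ignores small items such as labels. The function name stored_values is made up, and the breakdown is an estimate rather than a measurement.

```python
def stored_values(num_images, width, height):
    """Count the numbers in a model holding one eigenface per training image."""
    pixels = width * height
    eigenvectors = pixels * num_images      # one image-sized eigenface per component
    projections = num_images * num_images   # each training face projected onto every component
    mean = pixels                           # the average face
    return eigenvectors + projections + mean

values = stored_values(400, 256, 256)
print("Numbers stored: %d" % values)
print("Bytes per number in a 700MB file: %.1f" % (700e6 / values))
```

Under those assumptions that is about 26 million numbers, so a 700MB file works out at roughly 26 bytes per number. That is plausible for numbers written out as text in XML. It also suggests the file grows quickly as you add images.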

import os
import sys
import cv2
import numpy as np

def normalize(X, low, high, dtype=None):
    """Normalizes a given array in X to a value between low and high."""
    X = np.asarray(X)
    minX, maxX = np.min(X), np.max(X)
    # normalize to [0...1].
    X = X - float(minX)
    X = X / float((maxX - minX))
    # scale to [low...high].
    X = X * (high-low)
    X = X + low
    if dtype is None:
        return np.asarray(X)
    return np.asarray(X, dtype=dtype)

def read_images(path, sz=None):
    X,y = [], []
    count = 0
    for dirname, dirnames, filenames in os.walk(path):
        for subdirname in dirnames:
            subject_path = os.path.join(dirname, subdirname)
            for filename in os.listdir(subject_path):
                try:
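                    im = cv2.imread(os.path.join(subject_path, filename), cv2.IMREAD_GRAYSCALE)
                    # cv2.imread returns None if the file is not a readable image; skip it
                    if im is None:
                        print "Skipping unreadable file: " + filename
                        continue
                    # resize to given size (if given)
                    if (sz is not None):
                        im = cv2.resize(im, sz)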
                    X.append(np.asarray(im, dtype=np.uint8))
                    y.append(count)
                except IOError, (errno, strerror):
                    print "I/O error({0}): {1}".format(errno, strerror)
                except:
                    print "Unexpected error:", sys.exc_info()[0]
                    raise
            count = count+1
    return [X,y]

if __name__ == "__main__":
    if len(sys.argv) < 2:
        print "USAGE: eigensave.py /path/to/images"
        sys.exit()
    # Now read in the image data. This must be a valid path!
    [X,y] = read_images(sys.argv[1], (256,256))
    # Convert labels to 32bit integers. This is a workaround for 64bit machines.
    y = np.asarray(y, dtype=np.int32)

    # Create the Eigenfaces model.
    model = cv2.createEigenFaceRecognizer()
    # Learn the model. Remember our function returns Python lists,
    # so we use np.asarray to turn them into NumPy arrays to make
    # the OpenCV wrapper happy:
    model.train(np.asarray(X), np.asarray(y))

    # Save the model for later use
    model.save("eigenModel.xml")

After that has run - assuming your computer hasn't melted - you should have a file called "eigenModel.xml".

Compare A Face

So, we have a file containing the Eigenfaces. Now we want to take a photograph and compare it to all the other faces in our model.

This is called by running:

python recognise.py /path/to/images photo.jpg 100000.0

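The "100000.0" is a floating-point threshold which determines how close you want the match to be. It is the largest distance the recogniser will accept between your photo and a face in the model. A small value such as "100.0" demands a near-identical face. The larger the number, the less precise the match.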
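
Here is a minimal sketch of how such a threshold behaves. It assumes the recogniser is a nearest-neighbour search that returns -1 when even the closest face is too far away. The function name and the distances are made up for illustration.

```python
def predict_with_threshold(distances, threshold):
    """Return (label, distance) for the closest face, or (-1, None) if none is close enough."""
    best_label, best_distance = -1, None
    for label, distance in distances.items():
        # Only accept faces under the threshold, and keep the closest of those
        if distance < threshold and (best_distance is None or distance < best_distance):
            best_label, best_distance = label, distance
    return best_label, best_distance

# Made-up distances from a photo to three known faces
distances = {0: 5200.0, 1: 830.0, 2: 14000.0}

fuzzy = predict_with_threshold(distances, 100000.0)   # generous threshold
strict = predict_with_threshold(distances, 100.0)     # very strict threshold
print("Fuzzy match: %s" % (fuzzy,))
print("Strict match: %s" % (strict,))
```

With the generous threshold you get the closest face overall, not merely the first one that squeaks under the limit. With the strict one, nothing qualifies and the label comes back as -1.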

import os
import sys
import cv2
import numpy as np

if __name__ == "__main__":
    if len(sys.argv) < 4:
        print "USAGE: recognise.py /path/to/images sampleImage.jpg threshold"
        print "threshold is a float. Choose 100.0 for an extremely close match.  Choose 100000.0 for a fuzzier match."
        print str(len(sys.argv))
        sys.exit()

    # Create an Eigenface recogniser
    t = float(sys.argv[3])
    model = cv2.createEigenFaceRecognizer(threshold=t)

    # Load the model
    model.load("eigenModel.xml")

    # Read the image we're looking for
    sampleImage = cv2.imread(sys.argv[2], cv2.IMREAD_GRAYSCALE)
    sampleImage = cv2.resize(sampleImage, (256,256))

    # Look through the model and find the face it matches
    [p_label, p_confidence] = model.predict(sampleImage)

    # Print the label and its "confidence" (really a distance: lower is a closer match)
    print "Predicted label = %d (confidence=%.2f)" % (p_label, p_confidence)

    # If the model found something, print the file path
    if (p_label > -1):
        count = 0
        for dirname, dirnames, filenames in os.walk(sys.argv[1]):
            for subdirname in dirnames:
                subject_path = os.path.join(dirname, subdirname)
                if (count == p_label):
                    for filename in os.listdir(subject_path):
                        print os.path.join(subject_path, filename)

                count = count+1

That will spit out the path to the face that most resembles the photograph.

Who Am I?

Well, it turns out that my nearest artwork in the Tate's collection is...

[Image: Sir John Drake]

Sir John Drake!

So, there you have it. My laptop isn't powerful enough to crunch through the ~3,000 faces found in The Tate's collection. I'd love to see how this works given a powerful enough machine with lots of free disk space. If you fancy running the code - you'll find it all on my GitHub page.



22 thoughts on “Which Painting Do You Look Like? Comparing Faces Using Python and OpenCV”

  1. William says:

    Hi Terence, I found this project very useful for my FGP.
    I'm having a problem when running the script eigensave.py.
    I'm using Python 2.7 on 64-bit Windows 8.
    I have all the libraries installed and my own database in .jpg format.
    When I run this script in the command window, an error appears:

    "OpenCV Error: Assertion failed (ssize.area() > 0) in unknown function, file ..\..\..\src\opencv\modules\imgproc\src\imgwarp.cpp, line 1723
    Unexpected error:
    Traceback (most recent call last):
    File "C:\Users\William\Desktop\THESIS\Python\facerec2\eigensave.py", line 118, in
    [X, y] = read_images(sys.argv[1], (256,256))
    File "C:\Users\William\Desktop\THESIS\Python\facerec2\eigensave.py", line 87, in read_images
    im = cv2.resize(im, sz)
    cv2.error: ..\..\..\src\opencv\modules\imgproc\src\imgwarp.cpp:1723: error: (-215) ssize.area() > 0"

    What can be the problem?

    Thank you very much.

  2. I managed to fix the read_images problem, now the problem is as follows:

    File "C:UsersWilliamDesktopTESISPythonfacerec2eigensave.py", line 105, in
    y = np.asarray(y, dtype=np.int32)
    File "C:Python27Libsite-packagesnumpycorenumeric.py", line 462, in asarray
    return array(a, dtype, copy=False, order=order)
    ValueError: invalid literal for long() with base 10: 'C:UsersWilliamDesktopTESISPythonfacerec2datawils1.jpg'

    I read that it is something to do with converting the label to an integer.

    Thanks

  3. says:

    I get an error when I run recognise.py:
    OpenCV error: unspecified error (file can't be opened for writing!)
    traceback (most recent call last):
    file "recognise.py", line 35, in
    model.load("eigenModel.xml")
    cv2.error: ...facerec.cpp:398
    help me, thanks.

    1. Terence Eden says:

      The error message is quite clear - your computer cannot write to the file. Ensure that your permissions are set correctly.

  4. says:

    Good day!

    First of all, congratulations on this! This is so interesting and I did some modifications on this one.
    I do have some questions, what is the confidence formula? what is it based? Thanks in advance!

  5. jacop says:

    I tried it on Mac OS 10.10 and I get this error. Thanks.
    Traceback (most recent call last):
    File "newfaces.py", line 47, in
    [X,y] = read_images(sys.argv[1], (256,256))
    File "newfaces.py", line 31, in read_images
    im = cv2.resize(im, sz)
    cv2.error: /opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_release_tarballs_ports_graphics_opencv/opencv/work/opencv-2.4.10/modules/imgproc/src/imgwarp.cpp:1968: error: (-215) ssize.area() > 0 in function resize

  6. Vadim Mironov says:

    Hello, Thank you! Scripts work very well on Centos 7 x64.
    But I have one question:

    # Look through the model and find the face it matches
    [p_label, p_confidence] = model.predict(sampleImage)

    Does it return the best match, or the first that fits the threshold?

    Thanks Again!

  7. anusha says:

    Hey, I used the same code as yours... I'm getting this result for any image: (0, 0.0)
    Predicted label = 0 (confidence=0.00)... Any help?

    1. Terence Eden says:

      Looks like you either don't have the correct version of OpenCV installed, or you're not sending it any images.

  8. April lee says:

    Hi,

    I have received this error:

    [X, y] = read_images (sys.argv [1], (256,256))
    IndexError: list index out of range

    How do I solve the problem?

    Thank you very much

    1. Terence Eden says:

      It looks like you're trying to read from a list element which doesn't exist. I'd suggest taking a quick look around some Python tutorials to see why.

      1. April lee says:

        Hi,

        I managed to solve the problem, but is there any way which it can match more accurately?

        Thank you

