

Brief Introduction

This solution provides the following:

These are specially trained dlib shape predictors with an inference time of 1-2 ms and an average error of 1-5 pixels. 
They return facial landmark points, eye points, iris points and even pupil contour points with high accuracy. 
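The page does not say how the average pixel error is defined. A minimal sketch, assuming it is the mean Euclidean distance between predicted and hand-annotated landmark positions, might look like this (the `Point2D` type and the `meanPixelError` function are hypothetical names used only for illustration):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical 2D point type in pixel coordinates (not part of the system).
struct Point2D {
    double x;
    double y;
};

// Mean Euclidean distance, in pixels, between predicted and annotated
// landmarks. Assumption: this is how the reported error is defined.
// Both vectors must have the same, non-zero length.
double meanPixelError(const std::vector<Point2D>& predicted,
                      const std::vector<Point2D>& truth) {
    assert(!predicted.empty() && predicted.size() == truth.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < predicted.size(); ++i) {
        sum += std::hypot(predicted[i].x - truth[i].x,
                          predicted[i].y - truth[i].y);
    }
    return sum / static_cast<double>(predicted.size());
}
```

With one point predicted exactly and another off by 3 pixels horizontally and 4 pixels vertically, this gives (0 + 5) / 2 = 2.5 pixels.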


The example below shows how the system fits landmark points to the pupil.  The person in the video suffers from a medical condition that makes his right pupil much larger than his left. This video shows how accurate my system can be.

Measuring pupil dilation can be used to improve emotion recognition or to measure cognitive workload: link

This can be used for driver monitoring systems, but also for user experience tests. 


Technical details

Input (the system can process the following video and image sources):

  • mjpeg stream

  • rtsp stream

  • USB camera devices

  • video files (avi, mp4, mkv formats supported)

  • standalone image files (.png, .jpg formats supported)


Output:

  • Processed video frame

  • The faces in the frame (bounding boxes)

  • For each face:

    • Unique Tracking ID (when processing a video file, the same ID on each frame belongs to the same person)

    • 5 basic facial landmark points

    • 14 other facial landmark points

    • 4 eyelid landmark points

    • 4 iris landmark points

    • 4 pupil landmark points

  • The system is able to write the processed video to a video file. 
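To make the per-face results listed above more concrete, here is a rough sketch of how they could be held in memory. All type, field and function names are hypothetical and not taken from the actual system. The point counts are copied literally from the list, which does not state whether the eyelid, iris and pupil counts are per eye or per face.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical types for illustration only.
struct Point2D { float x; float y; };
struct BoundingBox { int left; int top; int width; int height; };

struct FaceResult {
    int trackingId;                          // same ID on each frame = same person
    BoundingBox box;                         // face bounding box in the frame
    std::array<Point2D, 5> basicLandmarks;   // 5 basic facial landmark points
    std::array<Point2D, 14> otherLandmarks;  // 14 other facial landmark points
    std::array<Point2D, 4> eyelidLandmarks;  // 4 eyelid landmark points
    std::array<Point2D, 4> irisLandmarks;    // 4 iris landmark points
    std::array<Point2D, 4> pupilLandmarks;   // 4 pupil landmark points
};

struct FrameResult {
    std::vector<FaceResult> faces;           // one entry per detected face
};

// Returns the face with the given tracking ID, or nullptr if it is not
// present in this frame.
const FaceResult* findByTrackingId(const FrameResult& frame, int trackingId) {
    for (const FaceResult& face : frame.faces) {
        if (face.trackingId == trackingId) {
            return &face;
        }
    }
    return nullptr;
}

// Adds a face to the frame. Tracking IDs are described as unique, so a
// duplicate ID within one frame is treated as a programming error here.
void addFace(FrameResult& frame, const FaceResult& face) {
    assert(findByTrackingId(frame, face.trackingId) == nullptr);
    frame.faces.push_back(face);
}
```

Looking a person up by tracking ID on consecutive frames is what would allow, for example, following one person's pupil size over time.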

The demo video was recorded on an HP Laptop 15-DA0042NH (Processor: Intel(R) Core(TM) i7-8550U CPU, RAM: 8 GB). 

It used 600 MB of RAM and the CPU usage was 50% during the recording. 

The input video was captured using a Xiaomi CMSXJ22A web camera. The input resolution was 1080p.

During recording, the system processing speed was about 55 FPS. When processing a single face, the system can maintain this speed on this hardware. When processing multiple faces, the system may be slower. The visualization was added to the video afterwards. The visualization in the video can be done live, but may slow down processing. 
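The slowdown with multiple faces can be illustrated with simple frame-budget arithmetic. The sketch below assumes that landmark inference cost grows linearly with the number of faces, which is a simplification and not a measured property of the system. The function and parameter names are hypothetical.

```cpp
#include <cassert>

// Time available per frame, in milliseconds, at a given frame rate.
double frameBudgetMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Time left in the frame budget after landmark inference, assuming the
// per-face landmark cost simply adds up across faces (an assumption).
// A negative result means the target frame rate cannot be held.
double remainingBudgetMs(double fps, double landmarkMsPerFace, int faceCount) {
    assert(faceCount >= 0);
    return frameBudgetMs(fps) - landmarkMsPerFace * faceCount;
}
```

At 55 FPS the budget is roughly 18 ms per frame. Everything else in the pipeline (decoding, face detection, tracking) has to fit into what remains after landmark inference, so each extra face leaves less headroom.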

The system is written entirely in C++ and uses the following libraries/technologies:


Face landmark detector:

  • Average sample error: 5.58927 pixels

  • Inference time using CPU: 2 ms (on the HP Laptop 15-DA0042NH with an Intel(R) Core(TM) i7-8550U CPU)

  • Returns these points:
    0. the tip of the nose
    1. right corner of right eye
    2. upper eye lid of right eye
    3. left corner of right eye
    4. lower eye lid of right eye
    5. center of right iris
    6. right corner of left eye
    7. upper eye lid of left eye
    8. left corner of left eye
    9. lower eye lid of left eye
    10. center of left iris
    11. right corner of the mouth
    12. upper part of the lip
    13. left corner of the mouth
    14. lower part of the lip
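As an illustration of how the indices above can be used, the following sketch names each index and computes a simple eye openness ratio (lid-to-lid distance divided by corner-to-corner distance) for the right eye. This ratio is only an example of something a driver monitoring application might derive from the points. It is not a feature the system is stated to provide, and all identifiers are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Point2D { double x; double y; };

// Index names for the 15 points, in the order of the list above.
enum FaceLandmark {
    NoseTip = 0,
    RightEyeRightCorner = 1,
    RightEyeUpperLid = 2,
    RightEyeLeftCorner = 3,
    RightEyeLowerLid = 4,
    RightIrisCenter = 5,
    LeftEyeRightCorner = 6,
    LeftEyeUpperLid = 7,
    LeftEyeLeftCorner = 8,
    LeftEyeLowerLid = 9,
    LeftIrisCenter = 10,
    MouthRightCorner = 11,
    UpperLip = 12,
    MouthLeftCorner = 13,
    LowerLip = 14,
    FaceLandmarkCount = 15
};

using FaceLandmarks = std::array<Point2D, FaceLandmarkCount>;

// Euclidean distance between two points, in pixels.
double pixelDistance(const Point2D& a, const Point2D& b) {
    return std::hypot(a.x - b.x, a.y - b.y);
}

// Eyelid opening relative to eye width for the right eye. Values near 0
// suggest a closed eye. Dividing by the eye width makes the value largely
// independent of how large the face appears in the image.
double rightEyeOpenness(const FaceLandmarks& p) {
    const double width =
        pixelDistance(p[RightEyeRightCorner], p[RightEyeLeftCorner]);
    assert(width > 0.0);
    return pixelDistance(p[RightEyeUpperLid], p[RightEyeLowerLid]) / width;
}
```

The same calculation applies to the left eye using indices 6 to 9.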


Eye landmark detector:

  • Average sample error: 1.52895 pixels

  • Inference time using CPU: 2 ms (on the HP Laptop 15-DA0042NH with an Intel(R) Core(TM) i7-8550U CPU)

  • Returns these points:
    0. right corner of the eye
    1. upper eye lid
    2. left corner of the eye
    3. lower eye lid
    4. right side of the iris
    5. upper part of the iris
    6. left side of the iris
    7. lower part of the iris
    8. right side of the pupil
    9. upper part of the pupil
    10. left side of the pupil
    11. lower part of the pupil
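One possible way to turn these twelve points into a pupil dilation measure is to compare the pupil diameter with the iris diameter. The sketch below is an assumption about how the points could be used, not a description of what the system does internally, and all identifiers are hypothetical. It is based on the idea that the iris diameter changes little, so the ratio depends much less on camera distance than a pupil size in raw pixels would.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Point2D { double x; double y; };

// Index names for the 12 points, in the order of the list above.
enum EyeLandmark {
    EyeRightCorner = 0, UpperLid = 1, EyeLeftCorner = 2, LowerLid = 3,
    IrisRight = 4, IrisUpper = 5, IrisLeft = 6, IrisLower = 7,
    PupilRight = 8, PupilUpper = 9, PupilLeft = 10, PupilLower = 11,
    EyeLandmarkCount = 12
};

using EyeLandmarks = std::array<Point2D, EyeLandmarkCount>;

// Euclidean distance between two points, in pixels.
double pixelDistance(const Point2D& a, const Point2D& b) {
    return std::hypot(a.x - b.x, a.y - b.y);
}

// Average of the horizontal and vertical extent of a roughly circular
// contour described by four points.
double meanDiameter(const Point2D& right, const Point2D& upper,
                    const Point2D& left, const Point2D& lower) {
    return 0.5 * (pixelDistance(right, left) + pixelDistance(upper, lower));
}

// Pupil diameter divided by iris diameter.
double pupilToIrisRatio(const EyeLandmarks& p) {
    const double iris = meanDiameter(p[IrisRight], p[IrisUpper],
                                     p[IrisLeft], p[IrisLower]);
    assert(iris > 0.0);
    const double pupil = meanDiameter(p[PupilRight], p[PupilUpper],
                                      p[PupilLeft], p[PupilLower]);
    return pupil / iris;
}
```

Comparing this ratio between the two eyes of the same face would, for instance, quantify a difference in pupil size like the one visible in the demo video.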
