US-7236615

Synergistic face detection and pose estimation with energy-based models

Published: June 26, 2007

Assignee: not available in the USPTO data we have

Inventors: not available in the USPTO data we have

Technical Abstract

A method for human face detection that detects faces independently of their particular poses and simultaneously estimates those poses. Our method is robust to variations in skin color, eyeglasses, facial hair, lighting, scale, and facial expression, among others. In operation, we train a convolutional neural network to map face images to points on a face manifold, and non-face images to points far away from that manifold, wherein that manifold is parameterized by facial pose. Conceptually, we view the pose parameter as a latent variable, which may be inferred through an energy-minimization process. To train systems based upon our inventive method, we derive a new type of discriminative loss function that is tailored to such detection tasks. Our method enables a multi-view detector that can detect faces in a variety of poses, for example, looking left or right (yaw axis), up or down (pitch axis), or tilting left or right (roll axis). Systems employing our method are highly reliable, run in near real time (5 frames per second on conventional hardware), and are robust against variations in yaw (±90°), roll (±45°), and pitch (±60°).
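The energy-minimization idea in the abstract can be illustrated with a small sketch. It is not the patented implementation: the half-circle manifold, the grid search over yaw only, the threshold value, and every function name below are assumptions chosen for illustration. The sketch takes a 2-D point (standing in for the network's output for one image), finds the pose whose manifold point is closest, and calls the image a face if that distance is below a threshold.

```python
import math

def manifold_point(yaw):
    """Hypothetical face manifold: a half circle indexed by yaw (radians)."""
    return (math.cos(yaw), math.sin(yaw))

def energy(output, yaw):
    """Distance between the network output and the manifold point for `yaw`."""
    fx, fy = manifold_point(yaw)
    return math.hypot(output[0] - fx, output[1] - fy)

def detect_and_estimate(output, threshold=0.5, steps=181):
    """Grid-search the pose that minimizes the energy, then threshold it.

    Returns (is_face, yaw_in_degrees, minimum_energy).
    The threshold and the one-degree grid are illustrative assumptions.
    """
    # Candidate yaw angles from -90 to +90 degrees, evenly spaced.
    candidates = [math.radians(-90 + 180 * k / (steps - 1)) for k in range(steps)]
    best_yaw = min(candidates, key=lambda z: energy(output, z))
    best_energy = energy(output, best_yaw)
    return best_energy < threshold, math.degrees(best_yaw), best_energy

# A point close to the manifold at 30 degrees of yaw (a "face").
near = (0.9 * math.cos(math.radians(30)), 0.9 * math.sin(math.radians(30)))
# A point far from the manifold (a "non-face").
far = (3.0, 3.0)

print(detect_and_estimate(near))
print(detect_and_estimate(far))
```

In this toy setting a single minimization yields both answers at once: the minimum energy decides face versus non-face, and the minimizing pose is the pose estimate.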

Patent Claims

1 claim

Legal claims defining the scope of protection, as filed with the USPTO.

1. A computer-implemented method of face detection and pose estimation, the method comprising the following steps: training a convolutional neural network to map facial images to points on a face manifold, parameterized by facial pose, and to map non-facial images to points away from the face manifold; and simultaneously determining whether an image is a face from its proximity to the face manifold and an estimate of facial pose of that image from its projection to the face manifold; wherein the training step further comprises the step(s) of: optimizing a loss function of three variables, wherein said variables include image, pose, and face/non-face characteristics of an image; wherein the loss function is represented by:

$$\mathrm{Loss}(W) = \frac{1}{|S_1|} \sum_{i \in S_1} L_1(W, Z_i, X_i) + \frac{1}{|S_0|} \sum_{i \in S_0} L_0(W, X_i);$$

where $S_1$ is the set of training faces, $S_0$ is the set of non-faces, and $L_1(W, Z_i, X_i)$ and $L_0(W, X_i)$ are loss functions for a face sample (with a known pose) and a non-face sample, respectively.
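The structure of the claimed loss, a mean per-sample loss over the face set plus a mean per-sample loss over the non-face set, can be sketched as follows. The claim does not spell out the forms of the two per-sample losses, so `toy_energy`, `toy_loss_face`, and `toy_loss_non_face` below are hypothetical stand-ins (a scalar "network" W * X, a squared energy for faces, and a decaying exponential of the minimum energy for non-faces), not the patent's definitions.

```python
import math

def total_loss(W, faces, non_faces, loss_face, loss_non_face):
    """Mean face loss over S1 plus mean non-face loss over S0.

    faces:     list of (Z, X) pairs, a known pose Z and an image X (the set S1)
    non_faces: list of images X (the set S0)
    """
    face_term = sum(loss_face(W, Z, X) for Z, X in faces) / len(faces)
    non_face_term = sum(loss_non_face(W, X) for X in non_faces) / len(non_faces)
    return face_term + non_face_term

# --- Hypothetical stand-ins, for illustration only ---

def toy_energy(W, Z, X):
    """Toy energy: distance between a scalar 'network output' W * X and pose Z."""
    return abs(W * X - Z)

def toy_loss_face(W, Z, X):
    """Assumed face loss: small when the output lies near the known pose."""
    return toy_energy(W, Z, X) ** 2

def toy_loss_non_face(W, X, poses=(-1.0, 0.0, 1.0)):
    """Assumed non-face loss: small when the output is far from every pose."""
    return math.exp(-min(toy_energy(W, Z, X) for Z in poses))

faces = [(0.0, 0.0), (1.0, 0.5)]   # (pose, image) pairs
non_faces = [5.0]                  # images with no pose label
loss = total_loss(1.0, faces, non_faces, toy_loss_face, toy_loss_non_face)
print(loss)
```

Normalizing each sum by the size of its own set keeps the two terms on a comparable scale even when the training data contains far more non-faces than faces, or the reverse.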

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G06V

Patent Metadata

Filing Date

March 31, 2005

Publication Date

June 26, 2007
