Facial Expression Recognition Utilizing Unsupervised Learning

Published: September 29, 2020

Assignee: not available in the USPTO data

Inventors: Yu Luo, Xin Lu, Jen-Chan Jeff Chien

Patent Claims

18 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 1

Original Legal Text

1. A computer-implemented method for classifying a facial expression captured in an image frame, the method comprising: predicting, by a trained classifier, a facial expression for each video frame of a plurality of video frames; identifying one or more pairs of anchor frames included in the plurality of video frames, based on the predicted facial expression for each video frame; for a particular pair of anchor frames, determining a distribution of predicted facial expressions between the particular pair of anchor frames; for the particular pair of anchor frames, calibrating the predicted facial expression for one or more video frames that are between the particular pair of anchor frames by interpolating between the predicted facial expressions for the particular pair of anchor frames, thereby generating a set of video frames having calibrated expression labels; and predicting, by the trained classifier after having been supplementally trained using the set of video frames having calibrated expression labels, a facial expression for an input image.

Plain English Translation

This invention relates to computer-implemented methods for classifying facial expressions. The problem addressed is that frame-by-frame expression predictions can be inconsistent across consecutive video frames. A trained classifier first predicts a facial expression for each frame in a video sequence. Pairs of anchor frames are then identified within the sequence based on those predictions. For each pair of anchor frames, the method determines the distribution of predicted expressions between them and calibrates the predictions for the intermediate frames by interpolating between the anchor frames' predicted expressions, which yields a set of frames with calibrated expression labels. This set is used to supplementally train the classifier, and the supplementally trained classifier then predicts the facial expression in a new input image.
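The calibration step can be illustrated with a minimal sketch. It assumes each frame's prediction is a list of per-expression probabilities and that the interpolation is a simple linear blend of the two anchor predictions; the claim requires interpolation but does not fix its form, and the function name `calibrate_between_anchors` is hypothetical.

```python
from typing import List


def calibrate_between_anchors(
    probs: List[List[float]], start: int, end: int
) -> List[List[float]]:
    """Return a copy of `probs` in which every frame strictly between the
    anchor frames `start` and `end` is replaced by a linear blend of the two
    anchor predictions (linear blending is an assumption of this sketch)."""
    if not 0 <= start < end < len(probs):
        raise ValueError("anchor indices must satisfy 0 <= start < end < len(probs)")

    calibrated = [list(p) for p in probs]  # leave the input untouched
    span = end - start
    for i in range(start + 1, end):
        w = (i - start) / span  # 0 near the first anchor, 1 near the second
        calibrated[i] = [
            (1.0 - w) * a + w * b for a, b in zip(probs[start], probs[end])
        ]
    return calibrated


# Two expression classes; frames 0 and 4 act as the anchor pair.
frame_probs = [[1.0, 0.0], [0.5, 0.5], [0.4, 0.6], [0.6, 0.4], [0.0, 1.0]]
calibrated_probs = calibrate_between_anchors(frame_probs, 0, 4)
```

The calibrated lists would then serve as labels for the supplemental training pass, which is outside the scope of this sketch.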

Claim 2

Original Legal Text

2. The method of claim 1 , wherein the trained classifier is initially trained using a plurality of labelled images.

Plain English Translation

This claim narrows the method of claim 1 by specifying how the classifier gets its starting point: it is initially trained using a plurality of labelled images. The video frames with calibrated expression labels produced in claim 1 are then used to supplement that initial training.

Claim 3

Original Legal Text

3. The method of claim 1 , wherein the trained classifier is a neural network.

Plain English Translation

This claim narrows the method of claim 1 by specifying that the trained classifier is a neural network. The other steps of claim 1 (per-frame prediction, anchor-frame identification, calibration by interpolation, and supplemental training) are unchanged.

Claim 4

Original Legal Text

4. The method of claim 1 , wherein each anchor frame is identified based on the predicted facial expression of that frame satisfying a confidence threshold.

Plain English Translation

This claim narrows the method of claim 1 by specifying how anchor frames are chosen: a frame qualifies as an anchor frame when its predicted facial expression satisfies a confidence threshold. In other words, anchor frames are the frames whose predictions the classifier is sufficiently confident about, and they serve as the reference points for calibrating the frames between them.
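A minimal sketch of this selection rule follows. It assumes the confidence of a prediction is the largest per-expression probability for that frame and that "satisfying" the threshold means meeting or exceeding it; both are assumptions, and the name `find_anchor_frames` and the default threshold of 0.9 are hypothetical.

```python
from typing import List


def find_anchor_frames(probs: List[List[float]], threshold: float = 0.9) -> List[int]:
    """Return the indices of frames whose top predicted probability is at
    least `threshold` (taking the top probability as the confidence is an
    assumption of this sketch)."""
    anchors = []
    for index, frame_probs in enumerate(probs):
        confidence = max(frame_probs)
        if confidence >= threshold:
            anchors.append(index)
    return anchors


# Three expression classes per frame; only frames 0 and 3 are confident.
video_probs = [
    [0.95, 0.03, 0.02],
    [0.50, 0.30, 0.20],
    [0.40, 0.35, 0.25],
    [0.05, 0.92, 0.03],
]
anchor_indices = find_anchor_frames(video_probs, threshold=0.9)
```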

Claim 5

Original Legal Text

5. The method of claim 1 , wherein interpolating between the predicted facial expressions for the particular pair of anchor frames further comprises using a sigmoid function.

Plain English Translation

This claim narrows the method of claim 1 by specifying that the interpolation between the predicted facial expressions of a pair of anchor frames uses a sigmoid function. The interpolation applies to the expression labels of existing video frames; it does not generate new frames or animation. A sigmoid is an S-shaped curve: when centered between two anchor frames, it keeps the labels of frames near each anchor close to that anchor's predicted expression and concentrates the change from one expression to the other around the middle of the interval, rather than spreading it evenly as linear interpolation would.
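One way to picture a sigmoid-based interpolation is the sketch below. It assumes a logistic curve centered midway between the two anchor frames; the claim only states that a sigmoid function is used, so the centering, the `steepness` parameter, and the function names are all assumptions made for illustration.

```python
import math
from typing import List


def sigmoid_weight(position: float, steepness: float = 10.0) -> float:
    """Weight given to the second anchor for a frame at relative `position`
    in [0, 1] between the anchors (logistic curve centered at 0.5)."""
    return 1.0 / (1.0 + math.exp(-steepness * (position - 0.5)))


def sigmoid_interpolate(
    first: List[float], second: List[float], n_between: int, steepness: float = 10.0
) -> List[List[float]]:
    """Return calibrated predictions for the `n_between` frames that lie
    between two anchor predictions `first` and `second`."""
    labels = []
    for k in range(1, n_between + 1):
        w = sigmoid_weight(k / (n_between + 1), steepness)
        labels.append([(1.0 - w) * a + w * b for a, b in zip(first, second)])
    return labels


# Two expression classes; three frames sit between the anchors.
between_labels = sigmoid_interpolate([1.0, 0.0], [0.0, 1.0], n_between=3)
```

With these settings the frame nearest the first anchor stays close to the first anchor's prediction, and the middle frame is an even blend of the two.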

Claim 6

Original Legal Text

6. The method of claim 1 , wherein the input image is a first input image, the method further comprising: generating, by the trained classifier after having been supplementally trained using the set of video frames having calibrated expression labels, a first feature vector associated with the first input image, the first feature vector being representative of the facial expression for the first input image; generating, by the trained classifier after having been supplementally trained using the set of video frames having calibrated expression labels, a second feature vector associated with a second input image, the second feature vector being representative of a facial expression for the second input image; and predicting a similarity of the facial expression for the first input image and the facial expression for the second input image.

Plain English Translation

This claim extends the method of claim 1 to comparing the facial expressions in two images. After supplemental training on the video frames with calibrated expression labels, the classifier generates a first feature vector that represents the facial expression in a first input image and a second feature vector that represents the facial expression in a second input image. The method then predicts how similar the two facial expressions are.
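As a rough illustration of comparing two feature vectors, the sketch below uses cosine similarity. The claim does not name a similarity measure, so cosine similarity is an assumption here, and the example vectors are made-up values standing in for classifier outputs.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors;
    1.0 means the vectors point in the same direction."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("feature vectors must be non-zero")
    return dot / (norm_u * norm_v)


# Hypothetical feature vectors for a first and a second input image.
first_features = [0.2, 0.7, 0.1]
second_features = [0.4, 1.4, 0.2]
expression_similarity = cosine_similarity(first_features, second_features)
```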

Claim 7

Original Legal Text

7. The method of claim 1 , wherein the input image is a first input image, the method further comprising: predicting, by the trained classifier after having been supplementally trained using the set of video frames having calibrated expression labels, a similarity of the facial expression for the first input image and a facial expression for a second input image.

Plain English Translation

This invention relates to facial expression analysis in video frames, addressing the challenge of accurately classifying and comparing facial expressions across multiple images. The method involves training a classifier using video frames with calibrated expression labels to improve expression recognition. The trained classifier can then predict the similarity between facial expressions in two different input images. The system first processes video frames to extract features and assign calibrated expression labels, which are used to supplement the training of the classifier. This supplemental training enhances the classifier's ability to distinguish subtle differences in facial expressions. Once trained, the classifier can compare expressions in a first input image and a second input image, quantifying their similarity. The method ensures robust expression recognition by leveraging calibrated labels from video data, which provides more consistent and reliable training data than static images. This approach is particularly useful in applications requiring precise expression analysis, such as emotion recognition, human-computer interaction, and behavioral studies. The invention improves upon prior methods by incorporating video-based training to enhance accuracy in expression similarity prediction.

Claim 8

Original Legal Text

8. A computer program product including one or more non-transitory machine-readable mediums encoded with instructions that when executed by one or more processors cause a process to be carried out to provide facial expression classification, the process comprising: predicting, by a trained classifier, a facial expression and a confidence level for a plurality of video frames; identifying a pair of anchor frames included in the plurality of video frames, wherein each of the anchor frames has a confidence level that exceeds a threshold confidence level, and wherein each frame between the anchor frames has a confidence level that does not exceed the threshold confidence level; making a determination that the identified pair of anchor frames are separated by a quantity of frames that is less than a separation threshold; in response to making the determination, calibrating the predicted facial expression for one or more video frames between the anchor frames, thereby generating a set of video frames having calibrated expression labels; and predicting, by the trained classifier after having been supplementally trained using the set of video frames having calibrated expression labels, a facial expression for an input image.

Plain English Translation

The invention relates to a system for improving facial expression classification in video frames using a trained classifier. The problem addressed is the variability in confidence levels of facial expression predictions across video frames, which can lead to inconsistent or inaccurate classifications. The solution involves a multi-step process to enhance the classifier's accuracy. First, a trained classifier predicts facial expressions and associated confidence levels for each frame in a video sequence. Next, pairs of anchor frames are identified where each frame in the pair has a confidence level exceeding a predefined threshold, while intermediate frames between them do not. If the distance between anchor frames is below a separation threshold, the predicted expressions for the intermediate frames are calibrated using the anchor frames, generating a set of frames with refined expression labels. This calibrated data is then used to supplementally train the classifier, improving its ability to predict expressions for new input images. The system ensures more reliable facial expression classification by leveraging high-confidence frames to correct lower-confidence predictions in nearby frames.
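The anchor-pair selection described here can be sketched as follows. The sketch assumes one confidence value per frame and treats consecutive high-confidence frames as candidate pairs, which guarantees that every frame between them is at or below the threshold. Skipping adjacent anchors with nothing between them is an added assumption, and the name `select_anchor_pairs` is hypothetical.

```python
from typing import List, Tuple


def select_anchor_pairs(
    confidences: List[float], conf_threshold: float, separation_threshold: int
) -> List[Tuple[int, int]]:
    """Return (left, right) index pairs of anchor frames whose confidence
    exceeds `conf_threshold`, with only low-confidence frames between them
    and fewer than `separation_threshold` frames separating them."""
    anchors = [i for i, c in enumerate(confidences) if c > conf_threshold]
    pairs = []
    # Consecutive anchors have, by construction, no other anchor between them.
    for left, right in zip(anchors, anchors[1:]):
        frames_between = right - left - 1
        if 0 < frames_between < separation_threshold:
            pairs.append((left, right))
    return pairs


frame_confidences = [0.95, 0.4, 0.5, 0.97, 0.3, 0.3, 0.3, 0.3, 0.3, 0.99, 0.98]
anchor_pairs = select_anchor_pairs(
    frame_confidences, conf_threshold=0.9, separation_threshold=4
)
```

Here frames 0 and 3 form a usable pair, while frames 3 and 9 are too far apart for the frames between them to be calibrated.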

Claim 9

Original Legal Text

9. The computer program product of claim 8 , wherein the trained classifier is initially trained using a plurality of labelled images.

Plain English Translation

This claim narrows the computer program product of claim 8 by specifying that the trained classifier is initially trained using a plurality of labelled images. The video frames with calibrated expression labels produced in claim 8 then supplement that initial training.

Claim 10

Original Legal Text

10. The computer program product of claim 8 , wherein the trained classifier is a neural network.

Plain English Translation

This claim narrows the computer program product of claim 8 by specifying that the trained classifier is a neural network. The other steps of claim 8 (prediction of expressions and confidence levels, anchor-frame identification, the separation check, calibration, and supplemental training) are unchanged.

Claim 11

Original Legal Text

11. The computer program product of claim 8 , wherein calibrating the predicted facial expression for the one or more video frames between the anchor frames comprises an interpolation process.

Plain English Translation

This claim narrows the computer program product of claim 8 by specifying that calibrating the predicted facial expressions for the frames between the anchor frames comprises an interpolation process. In claim 8 those intermediate frames are the ones whose confidence level does not exceed the threshold, so the interpolation supplies calibrated labels for the frames the classifier was less sure about. The claim does not specify the form of the interpolation; claim 12 adds that it is based on a sigmoid function.

Claim 12

Original Legal Text

12. The computer program product of claim 11 , wherein the interpolation process is based on a sigmoid function.

Plain English Translation

This claim narrows the computer program product of claim 11 by specifying that the interpolation process used to calibrate the predicted facial expressions between the anchor frames is based on a sigmoid function, an S-shaped curve that changes slowly near its ends and more quickly around its center.

Claim 13

Original Legal Text

13. The computer program product of claim 8 , wherein the input image is a first input image, the process further comprising: generating, by the trained classifier after having been supplementally trained using the set of video frames having calibrated expression labels, a first feature vector associated with the first input image, the first feature vector being representative of the facial expression for the first input image; generating, by the trained classifier after having been supplementally trained using the set of video frames having calibrated expression labels, a second feature vector associated with a second input image, the second feature vector being representative of a facial expression for the second input image; and predicting a similarity of the facial expression for the first input image and the facial expression for the second input image.

Plain English Translation

This invention relates to computer vision systems for analyzing and comparing facial expressions in images. The problem addressed is the challenge of accurately assessing and quantifying similarities between facial expressions across different images, particularly when training data is limited or inconsistent. The solution involves a trained classifier that has been supplementally trained using video frames with calibrated expression labels. The classifier generates feature vectors representing facial expressions from input images. Specifically, a first feature vector is generated for a first input image, capturing its facial expression, and a second feature vector is generated for a second input image, capturing its facial expression. The system then predicts the similarity between the two facial expressions by comparing these feature vectors. The supplemental training with calibrated video frames improves the classifier's ability to generalize and accurately represent facial expressions, enabling reliable comparison between different images. This approach enhances applications in emotion recognition, biometric authentication, and human-computer interaction by providing a robust method for evaluating facial expression similarity.

Claim 14

Original Legal Text

14. The computer program product of claim 8 , wherein the input image is a first input image, the process further comprising: predicting, by the trained classifier after having been supplementally trained using the set of video frames having calibrated expression labels, a similarity of the facial expression for the first input image and a facial expression for a second input image.

Plain English Translation

This invention relates to computer vision and machine learning techniques for analyzing facial expressions in images and videos. The problem addressed is the accurate prediction of facial expression similarity between different images, particularly when training data is limited or lacks precise expression labels. The solution involves a trained classifier that has been supplementally trained using video frames with calibrated expression labels. The classifier is used to predict the similarity between facial expressions in a first input image and a second input image. The supplemental training with video frames improves the classifier's ability to generalize across different expressions and lighting conditions. The system may also include preprocessing steps to align or normalize the input images before analysis. The invention aims to enhance applications such as emotion recognition, human-computer interaction, and video analysis by providing a robust method for comparing facial expressions across multiple images.

Claim 15

Original Legal Text

15. The computer program product of claim 8 , wherein one or more training samples used to supplementally train the trained classifier include two video frames and associated facial expression labels.

Plain English Translation

This claim narrows the computer program product of claim 8 by specifying the form of the supplemental training data: one or more of the training samples used to supplementally train the classifier include two video frames together with their associated facial expression labels. The claim does not state how the two frames in a sample are selected or how they are used during training.

Claim 16

Original Legal Text

16. A system for providing facial expression classification, the system comprising: one or more electronic memories encoding instructions; and one or more processors configured to execute the instructions to carry out a process that includes predict, by a trained classifier, a facial expression for a plurality of video frames; identify a pair of anchor frames included in the plurality of video frames, based on the predicted facial expression for the anchor frames; determine a distribution of predicted facial expressions between the anchor frames; calibrate the predicted facial expression for at least a portion of the video frames that are between the anchor frames by interpolating between the predicted facial expressions for the pair of anchor frames, thereby generating a set of video frames having calibrated expression labels; and predict, by the trained classifier after having been supplementally trained using the set of video frames having calibrated expression labels, a facial expression for an input image.

Plain English Translation

This invention relates to facial expression classification in video frames, addressing challenges in accurately tracking and predicting dynamic facial expressions over time. The system uses a trained classifier to predict facial expressions for multiple video frames. It identifies key anchor frames within the sequence based on the predicted expressions, then analyzes the distribution of expressions between these anchors. To improve consistency, the system calibrates intermediate frames by interpolating between the expressions of the anchor frames, generating a refined set of video frames with calibrated expression labels. These calibrated frames are then used to supplementally train the classifier, enhancing its accuracy. The trained classifier can subsequently predict expressions for new input images. The approach improves expression tracking by leveraging temporal coherence between frames, reducing noise and inconsistencies in dynamic facial expression analysis. The system is particularly useful in applications requiring high-fidelity expression recognition, such as emotion analysis, human-computer interaction, and video content analysis.

Claim 17

Original Legal Text

17. The system of claim 16 , wherein the trained classifier is a neural network.

Plain English Translation

This claim narrows the system of claim 16 by specifying that the trained classifier is a neural network. The other elements of claim 16 (the memories and processors, per-frame prediction, anchor-frame identification, calibration by interpolation, and supplemental training) are unchanged.

Claim 18

Original Legal Text

18. The system of claim 16 , wherein each anchor frame is identified based on the predicted facial expression of that frame satisfying a confidence threshold.

Plain English Translation

A system for analyzing facial expressions in video content identifies key frames, called anchor frames, based on the predicted facial expression of each frame. The system processes video frames to detect and analyze facial expressions, assigning a confidence score to each prediction. An anchor frame is selected only if its predicted facial expression meets or exceeds a predefined confidence threshold, ensuring reliability in the analysis. This approach helps filter out low-confidence predictions, improving the accuracy of subsequent applications, such as emotion recognition, behavioral analysis, or user interaction tracking. The system may further refine anchor frame selection by comparing expressions across multiple frames or applying additional validation criteria. By focusing on high-confidence anchor frames, the system enhances the robustness of facial expression analysis in dynamic video environments.

Patent Metadata

Filing Date

Unknown

Publication Date

September 29, 2020

Inventors

Yu Luo

Xin Lu

Jen-Chan Jeff Chien
