Automated Execution of Computer Software Based Upon Determined Empathy of a Communication Participant

Published: August 25, 2020

Assignee: not available in the USPTO data we have

Inventors: Samir Kakkar, Hilary Lex, Neha Dave, Richa Srivastava

Patent Claims

16 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 1

Original Legal Text

1. A system for automated execution of computer software based upon determined empathy of a communication participant using artificial intelligence-based speech recognition and facial recognition techniques, the system comprising: a remote computing device of the communication participant; and a server computing device coupled to the remote computing device via a network connection, the server computing device comprising a memory for storing programmatic instructions and a processor that executes the programmatic instructions to: capture a digitized voice segment from the remote computing device, the digitized voice segment corresponding to speech submitted by the communication participant during a digital communication session; analyze one or more vocal cues of a waveform of the digitized voice segment to generate a voice empathy score for the digitized voice segment; convert the speech in the digitized voice segment into text and extract a set of keywords from the text; determine one or more empathy keywords in the extracted set of keywords and generate a keyword empathy score based upon the empathy keywords; capture, via a camera coupled to the remote computing device, digitized images of the communication participant's face during the digital communication session; analyze one or more physical expressions of the communication participant's face in the digitized images to identify one or more emotions of the communication participant and generating a facial empathy score for the digitized images; generate, using an artificial intelligence classification model, an overall empathy confidence score for the communication participant based upon the voice empathy score, the keyword empathy score, and the facial empathy score; generate recommended changes to (i) a physical expression of the communication participant's face or (ii) vocal cues of the communication participant's speech based upon the overall empathy confidence score; and execute a computer software application that displays the recommended changes to the communication participant.

Plain English Translation

This system automates the execution of computer software to assess and enhance empathy in digital communications using artificial intelligence. The technology addresses the challenge of measuring and improving empathetic interactions in remote or digital conversations, where non-verbal cues are often lost or misinterpreted. The system includes a remote computing device used by the communication participant and a server computing device connected via a network. The server captures a digitized voice segment from the remote device, analyzing vocal cues in the speech waveform to generate a voice empathy score. The speech is also converted to text, and keywords are extracted to identify empathy-related terms, producing a keyword empathy score. Simultaneously, the system captures facial images of the participant via a camera, analyzing physical expressions to detect emotions and generate a facial empathy score. An AI classification model combines these scores into an overall empathy confidence score. Based on this score, the system recommends adjustments to the participant's facial expressions or vocal cues to improve empathy. These recommendations are displayed to the participant through a software application, guiding them to enhance their communication effectiveness. The system integrates multi-modal analysis—voice, text, and facial recognition—to provide real-time feedback for more empathetic digital interactions.
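The final steps of claim 1 (combining the three scores and producing recommendations) can be sketched as follows. This is a minimal illustration, not the patented implementation: the claim only says an "artificial intelligence classification model" is used, so the logistic combination, the weights, the bias, the 0.5 thresholds, and the names `ModalityScores`, `overall_confidence`, and `recommend_changes` are all assumptions made for the example.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class ModalityScores:
    """Per-modality empathy scores, assumed here to lie in [0, 1]."""
    voice: float
    keyword: float
    facial: float


# Hypothetical weights and bias; a real classifier would learn these from data.
WEIGHTS = (2.0, 1.5, 2.5)  # voice, keyword, facial
BIAS = -3.0


def overall_confidence(s: ModalityScores) -> float:
    """Logistic combination of the three scores into one confidence value."""
    z = BIAS + WEIGHTS[0] * s.voice + WEIGHTS[1] * s.keyword + WEIGHTS[2] * s.facial
    return 1.0 / (1.0 + math.exp(-z))


def recommend_changes(s: ModalityScores, threshold: float = 0.5) -> List[str]:
    """Return suggestions only when the overall confidence is below threshold."""
    if overall_confidence(s) >= threshold:
        return []
    tips = []
    if s.facial < 0.5:  # assumed cut-off for a weak facial score
        tips.append("Facial expression: soften your expression and keep eye contact.")
    if s.voice < 0.5:   # assumed cut-off for a weak voice score
        tips.append("Vocal cues: slow down and lower your volume.")
    return tips


if __name__ == "__main__":
    for tip in recommend_changes(ModalityScores(voice=0.2, keyword=0.9, facial=0.2)):
        print(tip)
```

The recommendations cover only facial expression and vocal cues because those are the two categories the claim names; the keyword score influences the overall confidence but does not produce its own suggestion here.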

Claim 2

Original Legal Text

2. The system of claim 1, wherein generating an overall empathy confidence score comprises: generating, using the classification model, a first empathy confidence score for the participant based upon the voice empathy score, the keyword empathy score, and the facial empathy score; if the first empathy confidence score is below a predetermined threshold: train, using an artificial intelligence neural network executing on the server computing device, a second classification model using historical data comprising voice empathy scores, keyword empathy scores, and facial empathy scores, and overall empathy confidence scores previously determined by the server computing device; and execute the trained second classification model using the voice empathy score, the keyword empathy score, and the facial empathy score as input to generate the overall empathy confidence score for the participant.

Plain English Translation

This invention relates to an empathy assessment system that evaluates participant responses using voice, keyword, and facial analysis to generate an overall empathy confidence score. The system addresses the challenge of accurately measuring empathy in interactions, which is crucial for applications in mental health, customer service, and training. The system processes audio and video inputs from participants to extract voice, keyword, and facial empathy scores. These scores are combined using a classification model to produce an initial empathy confidence score. If this score falls below a predetermined threshold, the system enhances accuracy by training a secondary classification model. This secondary model is trained using historical data, including previously recorded voice, keyword, and facial empathy scores, along with corresponding overall empathy confidence scores. The trained secondary model then re-evaluates the participant's scores to generate a refined overall empathy confidence score. By dynamically adjusting the classification model when initial scores are insufficient, the system improves the reliability of empathy assessments. This approach ensures that empathy evaluations are both accurate and adaptable to varying participant responses. The use of artificial intelligence and neural networks further enhances the system's ability to learn from past data, making it more effective over time.
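The threshold-and-retrain flow described above might look like the sketch below. It is an illustration under stated assumptions: a single logistic unit trained by gradient descent stands in for the claimed neural network, and the names `train_second_model` and `overall_score`, the 0.6 threshold, the learning rate, and the epoch count are hypothetical.

```python
import math
from typing import Callable, Sequence, Tuple

Scores = Tuple[float, float, float]  # (voice, keyword, facial)


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_second_model(
    history: Sequence[Tuple[Scores, float]], epochs: int = 500, lr: float = 0.5
) -> Callable[[Scores], float]:
    """Fit a one-neuron model on (scores, past overall score) pairs."""
    w = [0.0, 0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, target in history:
            pred = _sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
            err = pred - target  # gradient of cross-entropy loss w.r.t. z
            for i in range(3):
                w[i] -= lr * err * x[i]
            b -= lr * err
    return lambda x: _sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


def overall_score(
    x: Scores,
    first_model: Callable[[Scores], float],
    history: Sequence[Tuple[Scores, float]],
    threshold: float = 0.6,  # assumed value for the "predetermined threshold"
) -> float:
    """Use the first model's score unless it falls below the threshold."""
    first = first_model(x)
    if first >= threshold:
        return first
    second_model = train_second_model(history)
    return second_model(x)
```

Training on every low-confidence call is the literal reading of the claim; a practical system would probably cache the trained second model, but the source does not say so.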

Claim 3

Original Legal Text

3. The system of claim 2, wherein executing the trained second classification model generates an accuracy value associated with the overall empathy confidence score.

Plain English Translation

This claim narrows claim 2, which covers the fallback used when the first empathy confidence score falls below a predetermined threshold: the server trains a second classification model on historical voice, keyword, and facial empathy scores together with previously determined overall empathy confidence scores, then runs that model on the participant's current three scores. Claim 3 adds that executing the trained second model also generates an accuracy value associated with the overall empathy confidence score. The accuracy value gives a measure of how reliable the resulting score is; the claim does not specify how the value is calculated or how it is used afterward.

Claim 4

Original Legal Text

4. The system of claim 1, wherein capturing a digitized voice segment comprises: capturing a bitstream containing the digitized voice segment from the remote computing device as a speech file; and adjusting compression of the bitstream containing the digitized voice segment to enhance audio quality of the bitstream.

Plain English Translation

This invention relates to a system for processing digitized voice segments, particularly for improving audio quality in voice communication or transcription applications. The system addresses the problem of degraded audio quality in digitized voice data, which can occur due to compression artifacts or transmission errors when voice data is captured from remote computing devices. The system captures a digitized voice segment as a speech file from a remote computing device, receiving the data as a bitstream. To enhance audio quality, the system dynamically adjusts the compression of the bitstream. This adjustment may involve decompressing the bitstream to reduce distortion or applying adaptive compression techniques to balance file size and audio fidelity. The system may also include preprocessing steps, such as noise reduction or bandwidth optimization, to further refine the captured voice data before transmission or storage. The invention is particularly useful in applications where high-quality voice data is critical, such as teleconferencing, voice recognition, or transcription services. By dynamically adjusting compression, the system ensures that the digitized voice segment retains sufficient clarity and intelligibility while minimizing data overhead. The system may operate in real-time or as part of a batch processing workflow, depending on the application requirements.

Claim 5

Original Legal Text

5. The system of claim 1, wherein the one or more vocal cues of the waveform comprise a tone attribute, a pitch attribute, a volume attribute, and a speed attribute.

Plain English Translation

This invention relates to a system for analyzing vocal cues in audio waveforms to enhance communication or interaction between users and devices. The system processes audio signals to extract and interpret vocal cues, which are specific characteristics of speech that convey emotional or contextual information beyond the spoken words. These cues include tone, pitch, volume, and speed attributes of the waveform. The system captures these attributes to improve natural language processing, user authentication, or emotional state detection. By analyzing variations in tone, pitch, volume, and speed, the system can determine the speaker's intent, emotional state, or identity with higher accuracy. This is particularly useful in applications like voice assistants, customer service automation, or mental health monitoring, where understanding subtle vocal nuances is critical. The system may integrate with existing audio processing frameworks or operate as a standalone module to enhance existing voice recognition or analysis systems. The extracted vocal cues can be used to trigger specific responses, adapt system behavior, or provide feedback to users based on their vocal patterns. The invention aims to bridge the gap between traditional speech recognition and deeper emotional or contextual understanding, making interactions more intuitive and responsive.
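As a rough illustration of how three of the four named attributes could be measured, the sketch below computes volume as root-mean-square amplitude, pitch from the zero-crossing rate, and speed as words per second. The patent does not say how any attribute is computed, so these measures, the `VocalCues` type, and the function names are assumptions; tone is omitted because the source gives no definition to work from.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class VocalCues:
    pitch_hz: float    # rough fundamental frequency estimate
    volume_rms: float  # root-mean-square amplitude
    speed_wps: float   # spoken words per second


def rms_volume(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude of the waveform."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def zero_crossing_pitch(samples: Sequence[float], sample_rate: int) -> float:
    """Estimate pitch from sign changes; a periodic tone crosses zero twice per cycle."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    duration = len(samples) / sample_rate
    return crossings / (2.0 * duration)


def extract_cues(samples: Sequence[float], sample_rate: int, word_count: int) -> VocalCues:
    """Bundle the three estimates; word_count is assumed to come from the transcript."""
    duration = len(samples) / sample_rate
    return VocalCues(
        pitch_hz=zero_crossing_pitch(samples, sample_rate),
        volume_rms=rms_volume(samples),
        speed_wps=word_count / duration,
    )
```

Zero-crossing pitch is only dependable for clean, near-periodic signals; real speech would need a sturdier estimator.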

Claim 6

Original Legal Text

6. The system of claim 5, wherein the server computing device analyzes one or more frequencies associated with the waveform to determine the one or more vocal cues of the waveform.

Plain English Translation

This invention relates to a system for analyzing vocal cues in audio waveforms, particularly for extracting and interpreting frequency-based characteristics from speech or sound signals. The system addresses the challenge of accurately identifying and processing vocal cues, such as pitch, tone, or stress patterns, which are often used in applications like voice recognition, emotion detection, or speech analysis. The system includes a server computing device configured to receive an audio waveform containing vocal data. The server processes the waveform by analyzing one or more frequencies associated with the signal to detect specific vocal cues. This analysis may involve spectral decomposition, frequency filtering, or other signal processing techniques to isolate and measure frequency components that correspond to vocal characteristics. The extracted cues can then be used for further applications, such as determining speaker identity, assessing emotional state, or improving speech recognition accuracy. The system may also include additional components, such as a microphone or audio input device to capture the waveform, and a user interface to display or interact with the analyzed results. The server may employ machine learning or statistical models to enhance the accuracy of vocal cue detection, adapting to variations in speech patterns or environmental noise. By focusing on frequency analysis, the system provides a robust method for extracting meaningful vocal information from audio signals.
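One common way to analyze the frequencies in a waveform is a discrete Fourier transform. The sketch below uses a direct (unoptimized) DFT to find the strongest frequency component, which could serve as a pitch cue. The claim does not name a technique, so the choice of DFT and the function name `dominant_frequency` are assumptions for illustration.

```python
import cmath
import math
from typing import Sequence


def dominant_frequency(samples: Sequence[float], sample_rate: int) -> float:
    """Return the frequency (Hz) of the strongest DFT bin below Nyquist."""
    n = len(samples)
    best_bin, best_magnitude = 0, 0.0
    for k in range(1, n // 2):  # skip bin 0 (the constant offset)
        coefficient = sum(
            s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(samples)
        )
        magnitude = abs(coefficient)
        if magnitude > best_magnitude:
            best_bin, best_magnitude = k, magnitude
    return best_bin * sample_rate / n


if __name__ == "__main__":
    rate = 1000
    # Synthetic signal: a strong 50 Hz component plus a weaker 120 Hz one.
    signal = [
        0.8 * math.sin(2 * math.pi * 50 * i / rate)
        + 0.3 * math.sin(2 * math.pi * 120 * i / rate)
        for i in range(200)
    ]
    print(dominant_frequency(signal, rate))
```

The direct DFT costs O(n²) and is used here only to keep the example dependency-free; a fast Fourier transform gives the same bins far more quickly.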

Claim 7

Original Legal Text

7. The system of claim 1, wherein converting the digitized voice segment into text comprises executing a speech recognition engine on a digital file containing the digitized voice segment to generate the text.

Plain English Translation

This invention relates to a system for processing voice segments, specifically converting digitized voice segments into text. The system addresses the challenge of accurately transcribing spoken language into written form, which is critical for applications such as voice assistants, transcription services, and accessibility tools. The system includes a speech recognition engine that processes a digital file containing a digitized voice segment to generate corresponding text. The speech recognition engine applies algorithms to analyze the audio data, identify phonetic patterns, and convert them into textual output. This process may involve noise reduction, voice activity detection, and language model optimization to improve accuracy. The system may also include preprocessing steps to enhance audio quality before speech recognition, such as filtering background noise or normalizing audio levels. The generated text can then be used for further applications, such as real-time captioning, searchable transcripts, or automated documentation. The invention aims to provide a reliable and efficient method for converting spoken language into text, overcoming limitations in traditional transcription methods.

Claim 8

Original Legal Text

8. The system of claim 7, wherein the server computing device analyzes the text using a grammar recognition engine to validate the text.

Plain English Translation

This claim narrows claim 7, in which a speech recognition engine converts the participant's digitized voice segment into text. Claim 8 adds that the server computing device analyzes that text with a grammar recognition engine to validate it. The text in question is the transcript of the participant's speech, from which the system extracts empathy keywords. The claim does not specify how the grammar recognition engine works or what happens to text that fails validation.

Claim 9

Original Legal Text

9. A computerized method of automated execution of computer software based upon determined empathy of a communication participant using artificial intelligence-based speech recognition and facial recognition techniques, the method comprising: capturing, by a server computing device, a digitized voice segment from a remote computing device, the digitized voice segment corresponding to speech submitted by a communication participant during a digital communication session; analyzing, by the server computing device, one or more vocal cues of a waveform of the digitized voice segment to generate a voice empathy score for the digitized voice segment; converting, by the server computing device, the speech in the digitized voice segment into text and extract a set of keywords from the text; determining, by the server computing device, one or more empathy keywords in the extracted set of keywords and generate a keyword empathy score based upon the empathy keywords; capturing, by the server computing device via a camera coupled to the remote computing device, digitized images of the communication participant's face during the digital communication session; analyzing, by the server computing device, one or more physical expressions of the communication participant's face in the digitized images to identify one or more emotions of the communication participant and generating a facial empathy score for the digitized images; generating, by the server computing device using an artificial intelligence classification model, an overall empathy confidence score for the communication participant based upon the voice empathy score, the keyword empathy score, and the facial empathy score; generating, by the server computing device, recommended changes to (i) a physical expression of the communication participant's face or (ii) vocal cues of the communication participant's speech based upon the overall empathy confidence score; and executing, by the server computing device, a computer software application that displays the recommended changes to the communication participant.

Plain English Translation

This invention relates to automated systems for analyzing and enhancing empathy in digital communications using artificial intelligence. The system captures voice and facial data from a participant during a digital communication session, such as a video call or chat. The voice segment is analyzed for vocal cues like tone and pitch to generate a voice empathy score, while speech recognition converts the audio into text to extract keywords. Empathy-related keywords are identified and scored. Simultaneously, facial recognition analyzes the participant's expressions to detect emotions and generate a facial empathy score. An AI model combines these scores into an overall empathy confidence score. Based on this score, the system recommends adjustments to the participant's facial expressions or vocal cues to improve empathy. These recommendations are displayed to the participant in real-time, helping them modify their communication style. The system aims to enhance emotional intelligence in digital interactions by providing automated, AI-driven feedback on empathy levels.
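The keyword step of the method (extract keywords from the transcript, pick out the empathy keywords, score them) can be sketched as below. The lexicon, the stop-word list, and the scoring rule (share of keywords that are empathy keywords) are hypothetical; the patent does not disclose its word lists or formula.

```python
import re
from typing import List

# Hypothetical lexicon and stop words, chosen only for illustration.
EMPATHY_LEXICON = {"sorry", "understand", "appreciate", "help", "thank"}
STOPWORDS = {"i", "am", "the", "a", "an", "and", "to", "is", "you", "your", "for", "of"}


def extract_keywords(text: str) -> List[str]:
    """Lowercase the transcript, split it into words, and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def keyword_empathy_score(text: str) -> float:
    """Fraction of extracted keywords that appear in the empathy lexicon."""
    keywords = extract_keywords(text)
    if not keywords:
        return 0.0
    hits = [k for k in keywords if k in EMPATHY_LEXICON]
    return len(hits) / len(keywords)


if __name__ == "__main__":
    print(keyword_empathy_score("I am sorry, I understand the delay"))
```

A ratio keeps the score in [0, 1] regardless of how long the participant speaks, which makes it easy to combine with the voice and facial scores.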

Claim 10

Original Legal Text

10. The method of claim 9, wherein generating an overall empathy confidence score comprises: generating, by the server computing device using the classification model, a first empathy confidence score for the participant based upon the voice empathy score, the keyword empathy score, and the facial empathy score; if the first empathy confidence score is below a predetermined threshold: training, by the server computing device using an artificial intelligence neural network executing on the server computing device, a second classification model using historical data comprising voice empathy scores, keyword empathy scores, and facial empathy scores, and overall empathy confidence scores previously determined by the server computing device; and executing, by the server computing device, the trained second classification model using the voice empathy score, the keyword empathy score, and the facial empathy score as input to generate the overall empathy confidence score for the participant.

Plain English Translation

This invention relates to systems for evaluating empathy in participants, particularly in scenarios like customer service interactions, therapy sessions, or training programs. The problem addressed is the need for an automated, accurate, and adaptable method to assess empathy levels based on multiple behavioral cues, such as voice, keywords, and facial expressions. The method involves analyzing a participant's voice, text (keywords), and facial expressions to generate individual empathy scores for each modality. These scores are combined to produce an initial overall empathy confidence score. If this score falls below a predefined threshold, the system dynamically improves its accuracy by training a secondary classification model using historical empathy data. This historical data includes previously generated empathy scores and their corresponding overall confidence scores. The trained model is then applied to the current participant's voice, keyword, and facial empathy scores to generate a refined overall empathy confidence score. The system leverages artificial intelligence, specifically neural networks, to enhance its predictive accuracy over time. By continuously refining its models based on historical data, the method ensures that empathy assessments remain reliable and adapt to new patterns or variations in participant behavior. This approach is particularly useful in applications where empathy is a critical factor, such as mental health support, customer service evaluations, or leadership training.

Claim 11

Original Legal Text

11. The method of claim 10, wherein executing the trained second classification model generates an accuracy value associated with the overall empathy confidence score.

Plain English Translation

This claim narrows the method of claim 10, which trains a second classification model on historical empathy data when the first empathy confidence score falls below a predetermined threshold. Claim 11 adds that executing the trained second model generates an accuracy value associated with the overall empathy confidence score. The accuracy value indicates how reliable the score is; the claim does not specify how it is calculated or used. This mirrors system claim 3.

Claim 12

Original Legal Text

12. The method of claim 9, wherein capturing a digitized voice segment comprises: capturing, by the server computing device, a bitstream containing the digitized voice segment from the remote computing device as a speech file; and adjusting, by the server computing device, compression of the bitstream containing the digitized voice segment to enhance audio quality of the bitstream.

Plain English Translation

This invention relates to digital voice processing systems, specifically methods for capturing and enhancing digitized voice segments transmitted between computing devices. The technology addresses challenges in maintaining high audio quality during voice data transmission, particularly when dealing with compressed bitstreams that may degrade speech clarity. The method involves a server computing device receiving a bitstream containing a digitized voice segment from a remote computing device. The bitstream is stored as a speech file, and the server dynamically adjusts the compression of the bitstream to improve audio quality. This adjustment may involve modifying compression parameters, such as bitrate or codec settings, to preserve speech intelligibility and reduce distortion. The process ensures that the transmitted voice data retains sufficient fidelity for accurate processing or playback, even under varying network conditions or device capabilities. The invention builds on a broader system where voice data is captured, processed, and transmitted between devices. The compression adjustment step is a critical refinement that distinguishes this method from conventional voice transmission techniques, which often rely on fixed compression settings that may not optimize quality for all scenarios. By dynamically enhancing the bitstream, the method improves the reliability and clarity of digitized voice communications in applications such as telephony, voice assistants, or transcription services.

Claim 13

Original Legal Text

13. The method of claim 9, wherein the one or more vocal cues of the waveform comprise a tone attribute, a pitch attribute, a volume attribute, and a speed attribute.

Plain English Translation

This invention relates to audio processing, specifically analyzing vocal cues in waveforms to extract and classify speech attributes. The method addresses the challenge of accurately identifying and interpreting human vocal characteristics in audio signals, which is critical for applications like voice recognition, emotion detection, and speech analysis. The technique processes an audio waveform to detect and quantify specific vocal attributes, including tone, pitch, volume, and speed. These attributes are extracted from the waveform and used to analyze or classify the speech content. The method may involve comparing the extracted attributes against reference profiles or thresholds to determine characteristics such as emotional state, speaker identity, or speech patterns. By focusing on these four key vocal attributes, the system provides a robust framework for understanding and interpreting spoken language in various applications, including voice assistants, healthcare diagnostics, and security systems. The approach enhances the precision of audio analysis by leveraging multiple vocal dimensions, improving accuracy in tasks like speaker verification, sentiment analysis, and speech synthesis.

Claim 14

Original Legal Text

14. The method of claim 13, wherein the server computing device analyzes one or more frequencies associated with the waveform to determine the one or more vocal cues of the waveform.

Plain English Translation

This claim narrows the method of claim 13, in which the vocal cues of the waveform comprise tone, pitch, volume, and speed attributes. Claim 14 adds that the server computing device determines those vocal cues by analyzing one or more frequencies associated with the waveform of the digitized voice segment. The resulting cues feed the voice empathy score of claim 9. The claim does not name a particular frequency analysis technique. This mirrors system claim 6.

Claim 15

Original Legal Text

15. The method of claim 9, wherein converting the digitized voice segment into text comprises executing a speech recognition engine on a digital file containing the digitized voice segment to generate the text.

Plain English Translation

This invention relates to speech recognition systems designed to convert digitized voice segments into text. The problem addressed is the need for accurate and efficient transcription of spoken language into written form, particularly in applications requiring real-time or automated processing of voice data. The method involves processing a digitized voice segment, which is a recorded or captured audio representation of speech. The digitized voice segment is stored as a digital file, which may be in a standard audio format. A speech recognition engine is then applied to this digital file to analyze the audio data and generate a corresponding text output. The speech recognition engine uses algorithms to interpret phonetic patterns, linguistic structures, and contextual cues within the digitized voice segment to produce accurate text transcription. The method may include preprocessing steps to enhance audio quality, such as noise reduction or normalization, before applying the speech recognition engine. The engine may also incorporate machine learning models trained on diverse speech samples to improve recognition accuracy. The resulting text can be used for various applications, including transcription services, voice-controlled interfaces, and automated documentation systems. This approach ensures reliable conversion of spoken language into text, addressing challenges in speech recognition accuracy and processing efficiency.

Claim 16

Original Legal Text

16. The method of claim 15, wherein the server computing device analyzes the text using a grammar recognition engine to validate the text.

Plain English Translation

This claim narrows the method of claim 15, in which a speech recognition engine converts the digitized voice segment into text. Claim 16 adds that the server computing device analyzes that text with a grammar recognition engine to validate it. The text being validated is the transcript of the participant's speech, not text typed by a user. This mirrors system claim 8.

Patent Metadata

Filing Date

Unknown

Publication Date

August 25, 2020

Inventors

Samir Kakkar

Hilary Lex

Neha Dave

Richa Srivastava
