US-12586572-B2

Systems and methods for intent prediction and usage

Published March 24, 2026

Assignee not available in USPTO data we have

Inventors not available in USPTO data we have

Technical Abstract

In some aspects, the techniques described herein relate to a method including: tokenizing a text string into utterance tokens; vectorizing the utterance tokens; providing the utterance tokens to a machine learning model as input to the machine learning model; receiving, as output from the machine learning model, a predicted intent; formatting a query of a content repository, wherein the query includes the predicted intent; receiving, based on the query, an artifact from the content repository; and displaying the artifact via an interface.

Patent Claims

Legal claims defining the scope of protection, as filed with the USPTO.

. A method for intent prediction and usage comprising:

. The method of, comprising:

. A system comprising at least one computer including a processor, wherein the at least one computer is configured to:

. The system of, wherein the at least one computer is configured to:

. A non-transitory computer readable storage medium, including instructions stored thereon, which instructions, when read and executed by one or more computer processors, cause the one or more computer processors to perform steps comprising:

. The non-transitory computer readable storage medium of, comprising:

Detailed Description

Complete technical specification and implementation details from the patent document.

Aspects generally relate to systems and methods for intent prediction and usage.

Customer calls and other contacts with a customer contact management center represent opportunities for an organization to serve customers. If left unattended, residual customer issues can lead to repeat customer calls/contacts with a contact center. Repeat calls may diminish customer experience and add costs to a customer support process. Because of the voice-only interaction of phone support, however, it can be challenging for agents to ascertain when all of a customer's concerns have been addressed during an interaction. Determining a customer's intents when a contact is made and presenting relevant customer information and support solutions to a support specialist at crucial times, such as immediately prior to and during an interaction, presents technical challenges.

In some aspects, the techniques described herein relate to a method for intent prediction and usage including: tokenizing a text string into utterance tokens; vectorizing the utterance tokens; providing the utterance tokens to a machine learning model as input to the machine learning model; receiving, as output from the machine learning model, a predicted intent; formatting a query of a content repository, wherein the query includes the predicted intent; receiving, based on the query, an artifact from the content repository; and displaying the artifact via an interface.

In some aspects, the techniques described herein relate to a method, wherein the artifact is indexed in the content repository by the predicted intent.

In some aspects, the techniques described herein relate to a method, wherein the artifact is a knowledge base article.

In some aspects, the techniques described herein relate to a method, including: receiving a voice-based contact; and converting the voice-based contact into the text string using a speech-to-text engine.

In some aspects, the techniques described herein relate to a method, wherein the text string is received as a text-based contact.

In some aspects, the techniques described herein relate to a method, including: generating mapped utterance tokens, wherein the generating mapped utterance tokens includes: mapping previously recorded utterance tokens to predefined intent category labels.

In some aspects, the techniques described herein relate to a method, including: training the machine learning model with the mapped utterance tokens.

In some aspects, the techniques described herein relate to a system including at least one computer including a processor, wherein the at least one computer is configured to: tokenize a text string into utterance tokens; vectorize the utterance tokens; provide the utterance tokens to a machine learning model as input to the machine learning model; receive, as output from the machine learning model, a predicted intent; format a query of a content repository, wherein the query includes the predicted intent; receive, based on the query, an artifact from the content repository; and display the artifact via an interface.

In some aspects, the techniques described herein relate to a system, wherein the artifact is indexed in the content repository by the predicted intent.

In some aspects, the techniques described herein relate to a system, wherein the artifact is a knowledge base article.

In some aspects, the techniques described herein relate to a system, wherein the at least one computer is configured to: receive a voice-based contact; and convert the voice-based contact into the text string using a speech-to-text engine.

In some aspects, the techniques described herein relate to a system, wherein the text string is received as a text-based contact.

In some aspects, the techniques described herein relate to a system, wherein the at least one computer is configured to: generate mapped utterance tokens, wherein the generating mapped utterance tokens includes mapping previously recorded utterance tokens to predefined intent category labels.

In some aspects, the techniques described herein relate to a system, wherein the at least one computer is configured to: train the machine learning model with the mapped utterance tokens.

In some aspects, the techniques described herein relate to a non-transitory computer readable storage medium, including instructions stored thereon, which instructions, when read and executed by one or more computer processors, cause the one or more computer processors to perform steps including: tokenizing a text string into utterance tokens; vectorizing the utterance tokens; providing the utterance tokens to a machine learning model as input to the machine learning model; receiving, as output from the machine learning model, a predicted intent; formatting a query of a content repository, wherein the query includes the predicted intent; receiving, based on the query, an artifact from the content repository; and displaying the artifact via an interface.

In some aspects, the techniques described herein relate to a non-transitory computer readable storage medium, wherein the artifact is indexed in the content repository by the predicted intent.

In some aspects, the techniques described herein relate to a non-transitory computer readable storage medium, wherein the artifact is a knowledge base article.

In some aspects, the techniques described herein relate to a non-transitory computer readable storage medium, including: receiving a voice-based contact; and converting the voice-based contact into the text string using a speech-to-text engine.

In some aspects, the techniques described herein relate to a non-transitory computer readable storage medium, wherein the text string is received as a text-based contact.

In some aspects, the techniques described herein relate to a non-transitory computer readable storage medium, including: generating mapped utterance tokens, wherein the generating mapped utterance tokens includes: mapping previously recorded utterance tokens to predefined intent category labels; and training the machine learning model with the mapped utterance tokens.

Aspects generally relate to systems and methods for intent prediction and usage.

In accordance with aspects, customer service specialists (also referred to as “support agents” or “agents” herein) may receive assistance from technological systems and methods during their interactions with customers. Systems and methods may determine customer intent from a conversation or other contact and use the intent with available customer data and other metadata to retrieve content from a content repository. The content may help a customer service specialist to address a customer's particular needs. Moreover, customer data may be used with determined intents and other available data to predict products, services, or other organizational offerings that a customer may be interested in.

In accordance with aspects, a transcription platform, including a speech-to-text engine, may transcribe audio of a conversation between an agent and a customer into a text utterance as the conversation happens. A trained machine learning (ML) model may receive the text utterance and categorize utterance tokens into customer intents. The determined intents may be used to select and display content such as knowledge base (KB) artifacts (e.g., how-to articles, FAQs, procedural instructions, etc.) from a repository to an agent.

Moreover, customer intents and/or information may be derived from other portions of a customer contact. For instance, the proposed system may display customer intents derived from an interactive voice response (IVR) system interaction of a customer before being routed to an agent. Customer account details may be displayed for, e.g., authenticated/verified customers when they are available. Digital interaction, e.g., instant messages or text messages sent to a text bot, or an agent, may also be processed with ML models to produce customer intents and useful information.

Pertinent customer information (e.g., organizational and personal information of a customer), intents, and related KB artifacts may be displayed to an agent through an interface. An intent management platform may include a query engine for retrieving KB artifacts based on received intents and an interface (e.g., a web-based or other graphical interface) for displaying the KB artifacts, intents, etc., to an agent.

In accordance with aspects, a proposed system/method may use a transcription platform to generate text from speech signals of both a customer and an agent in a conversation. The transcription engine may include a service that takes streaming audio content as input and produces streaming text transcriptions as output. A conversation may be divided into sub-portions of recorded words called utterance tokens. The transcription engine may tokenize and vectorize utterances and may relay the vectorized utterance tokens to an intent classifier as input. An intent classifier may output inferred customer intents based on one or more utterance tokens received as input. Intent classifiers may be machine learning (ML) models trained with a training data set and a corresponding machine learning algorithm.

Aspects may include a suite of trained intent classifiers to determine customer intents from a customer interaction as they are spoken. A customer intent may identify one or more issues, complaints, questions, etc., that a customer seeks a resolution/answer to. Inferred intents and other conversational elements isolated and classified by a ML engine may be displayed to an agent through a user interface (e.g., a graphical user interface (GUI)).

Customer intents may also be used as lookup keys to query knowledge base (KB) artifacts from a repository and display the artifacts to an agent via an interface. As noted above, KB artifacts may include institutional knowledge such as instructions and content on how to accomplish tasks related to customer inquiries, complaints, etc. For instance, a KB article for a payment product issuing organization may include instructions for submitting a report of a customer's lost/stolen payment card and for requesting a replacement card for the customer. A KB repository may be indexed by customer intent. All KB artifacts associated with a determined intent may be displayed to an agent during a customer contact. An artifact preview such as a title, abstract, etc., may also be displayed in a preview pane to an agent.

In some aspects, intent categories may be distilled into a set of labels that may be included in a training dataset that is used to train machine learning (ML) classification models. For instance, related intents may be categorized and given a label. Determined intents may be assigned to one or more labels, and the assignments may be used as input to a ML algorithm for fitting vectorized utterance tokens to a ML classifier/classification model.
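The label mapping described above can be illustrated with a minimal sketch. The intent names, the `INTENT_TO_LABEL` table, and the `build_training_set` helper are hypothetical and chosen only for illustration; the source does not specify a data format.

```python
from collections import Counter

# Hypothetical mapping from fine-grained intents to intent category labels.
INTENT_TO_LABEL = {
    "lost card": "lost/stolen payment card",
    "stolen card": "lost/stolen payment card",
    "pay bill online": "online bill-pay process",
}


def build_training_set(recorded):
    """Pair recorded utterance tokens with an intent category label.

    `recorded` is an iterable of (utterance_tokens, intent) pairs.
    Items whose intent has no category label are skipped.
    """
    examples = []
    for tokens, intent in recorded:
        label = INTENT_TO_LABEL.get(intent)
        if label is not None:
            examples.append((list(tokens), label))
    return examples


recorded = [
    (["i", "lost", "my", "card"], "lost card"),
    (["someone", "stole", "my", "card"], "stolen card"),
    (["how", "do", "i", "pay", "online"], "pay bill online"),
    (["hello"], "greeting"),  # no category label defined, so it is dropped
]
training_set = build_training_set(recorded)
label_counts = Counter(label for _, label in training_set)
```

The resulting (tokens, label) pairs are the kind of mapped utterance tokens that a training algorithm could consume after vectorization.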

Aspects may train intent classifier models on a training data set that includes previously recorded, transcribed, and labeled conversations that include customer intents and/or mappings to intent category labels. The training data may be input to a machine learning algorithm, and the ML algorithm may fit the data to a corresponding ML model. Exemplary models may include convolutional neural networks that include one input layer, one hidden layer, and one output (single node) layer. The input layer may receive, as input, vectorized utterance tokens, a vector of keyword indicators, and the output of, e.g., a “bidirectional encoder representations from transformers” (BERT) encoding model component that indicates key phrases within the utterance.

Aspects may first estimate customer intent at each speaker switch (turn) of the conversation. Aspects may further collect the finalized utterances recovered during each conversational turn, and the collated transcripts may be input to the intent classification model. A finalized utterance may be a segment of audio containing spoken words that the transcription engine is able to detect and transcribe into text. Aspects may first use, e.g., a Global Vectors for Word Representation (GloVe) algorithm to generate features of the text. The output of the algorithm is a vector of float values that represent the likelihood of the following term given the input terms. The vectors of term likelihoods from, e.g., a GloVe algorithm may serve as part of the input to the intent classifier. Keywords recovered from the terms in the conversation turn may be additional input features. The classifier output layer produces a vector of floating point output values that represent the classifier labels. Aspects may collect a number of scores, such as the top three intent scores, from the classifier model for each conversation turn and may output up to three intents to a user. In some aspects, a threshold value may be used to determine whether an intent is displayed to a user.
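The selection of up to three intents per conversation turn, gated by a threshold, might look like the following sketch. The function name `top_intents`, the default threshold of 0.5, and the example scores are assumptions for illustration; the source does not state concrete values.

```python
def top_intents(scores, labels, k=3, threshold=0.5):
    """Return up to k (label, score) pairs whose score meets the threshold.

    `scores` is the classifier's output vector and `labels` names each
    output position. Both the value of k and the threshold are assumed.
    """
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return [(label, score) for label, score in ranked[:k] if score >= threshold]


labels = ["balance inquiry", "lost/stolen payment card",
          "online bill-pay process", "dispute charge"]
scores = [0.10, 0.90, 0.60, 0.55]

shown = top_intents(scores, labels)
strict = top_intents(scores, labels, threshold=0.58)
```

Raising the threshold trades recall for precision: fewer intents reach the agent, but those shown carry higher classifier scores.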

In accordance with aspects, system components, such as a transcription platform, speech-to-text engine, ML engine (including a ML classifier/classification model and corresponding algorithms), query engine, user interface, etc., may be provided as a set of microservices that are individually created and loosely coupled, such that the solution is scalable, robust to errors, observable, and highly available.

is a block diagram of a system for intent prediction and usage, in accordance with aspects. System includes contact management platform, transcription platform, machine learning (ML) engine, and intent management platform. System also includes content repository and customer account database.

Contact management platform includes call engine, IVR engine, and digital engine, in accordance with aspects. A customer may initiate contact with contact management platform. The contact may be a voice-based contact such as a telephone call, video call, web-based call, etc., or it may be a text-based contact such as an instant message, SMS message, etc. Contact management platform may include components for handling both voice-based and text-based contacts. For instance, call engine may receive and process incoming voice-based contacts and digital engine may receive and process incoming text-based contacts. In accordance with aspects, a contact may include more than one form of communication. For instance, a contact may start as a text-based contact and may progress to a voice-based contact. Additionally, a voice-based contact may be with an interactive voice response (IVR), a support agent, or both. That is, a voice-based contact may begin with an IVR and transition to contact with a support agent. IVR engine may be responsible for serving an IVR process and recording customer responses to the IVR process.

In accordance with aspects, contact management platform may receive voice-based and text-based contacts and process the contacts accordingly. For instance, call engine may send a stream of uttered speech to transcription platform. IVR engine may record responses to IVR prompts and send the responses to transcription platform. Additionally, call engine and/or IVR engine may make and store recordings of voice contacts for training purposes (i.e., training of both agents and ML models). Moreover, digital engine may receive strings of text and forward the text strings to transcription platform.

In accordance with aspects, transcription platform may include speech-to-text engine and tokenization engine. Transcription platform may receive streams of spoken utterances from contact management platform and may process the streams of spoken utterances with speech-to-text engine. Speech-to-text engine may transcribe the spoken utterances to strings of text. Speech-to-text engine may send strings of text to tokenization engine. Tokenization engine may also receive strings of text (e.g., directly) from digital engine.
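The two paths into the tokenization engine (voice through speech-to-text, text directly) can be sketched as a small dispatch function. The `Contact` type, the `to_text` helper, and the `fake_speech_to_text` stub are hypothetical; a real deployment would call an actual speech-to-text service in place of the stub.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Contact:
    channel: str                 # "voice" or "text" (assumed channel names)
    payload: Union[bytes, str]   # audio bytes for voice, a string for text


def to_text(contact: Contact, speech_to_text: Callable[[bytes], str]) -> str:
    """Return the text string that would be handed to tokenization."""
    if contact.channel == "voice":
        # Voice-based contacts are transcribed first.
        return speech_to_text(contact.payload)
    if contact.channel == "text":
        # Text-based contacts bypass transcription.
        return contact.payload
    raise ValueError(f"unsupported channel: {contact.channel!r}")


def fake_speech_to_text(audio: bytes) -> str:
    """Placeholder stub: pretends the audio bytes are already UTF-8 text."""
    return audio.decode("utf-8")


voice_text = to_text(Contact("voice", b"i lost my card"), fake_speech_to_text)
chat_text = to_text(Contact("text", "how do i pay my bill"), fake_speech_to_text)
```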

Tokenization engine may divide a string of text into related sub-portions called utterance tokens. Utterance tokens may be text strings split based on white space, special characters, known phrases, parts of speech, etc. Tokenization engine may also vectorize utterance tokens into feature vectors. A feature vector is a numerical representation generated from utterance tokens that may be processed as input to a ML model.
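A minimal sketch of splitting on white space and special characters while keeping known phrases intact follows. The phrase list and the `tokenize` function are illustrative assumptions; part-of-speech based splitting is not covered here.

```python
import re

# Hypothetical multi-word phrases that should survive as single tokens.
KNOWN_PHRASES = ["bill pay", "payment card"]


def tokenize(text, known_phrases=KNOWN_PHRASES):
    """Split text into lowercase tokens, keeping known phrases whole."""
    text = text.lower()
    # Try longer phrases first so they win over any shorter sub-phrase.
    phrases = sorted(known_phrases, key=len, reverse=True)
    word = r"[a-z0-9']+"
    if phrases:
        alternatives = "|".join(re.escape(p) for p in phrases)
        pattern = rf"\b(?:{alternatives})\b|{word}"
    else:
        pattern = word
    # Anything that is not matched (white space, punctuation) acts as a separator.
    return re.findall(pattern, text)


tokens = tokenize("I lost my payment card, help!")
```

Here `tokens` is `['i', 'lost', 'my', 'payment card', 'help']`: punctuation is dropped and the known phrase stays together.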

In accordance with aspects, vectorizing processes may produce, e.g., a count of individual words filtered by those most frequently occurring in the data, a count of sub-word character n-grams of, e.g., between 3 and 6 characters, and (in some aspects) regular expression (regex) patterns that may be supplied by data scientists for certain keywords, etc. Other vectorization processes may attempt to determine the semantic meaning of a word or an entire utterance in context. A vectorizing process may include pre-trained model weights that may consider word casing and may further convert word strings into a real valued feature vector for each word or sub-word of an utterance token and a real valued feature vector for the entire utterance. Such feature vectors generated by a tokenization engine may be ingested by a ML model for processing/predictions. Counts and vectors generated by tokenization engine may be passed to a model as input.
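The count-based features described above (frequent-word counts, character n-gram counts of 3 to 6 characters, and regex keyword indicators) could be combined as in this sketch. The function names, the vocabularies, and the example regex are assumptions for illustration only.

```python
import re
from collections import Counter


def char_ngrams(word, n_min=3, n_max=6):
    """All character n-grams of the word with n_min <= n <= n_max."""
    return [word[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(word) - n + 1)]


def build_vocabulary(corpus_tokens, max_words=1000):
    """Keep only the most frequently occurring words in the corpus."""
    counts = Counter(tok for tokens in corpus_tokens for tok in tokens)
    return [word for word, _ in counts.most_common(max_words)]


def vectorize(tokens, word_vocab, ngram_vocab, patterns):
    """Concatenate word counts, n-gram counts, and regex indicators."""
    word_counts = Counter(tokens)
    ngram_counts = Counter(g for t in tokens for g in char_ngrams(t))
    text = " ".join(tokens)
    return (
        [word_counts[w] for w in word_vocab]
        + [ngram_counts[g] for g in ngram_vocab]
        + [1 if p.search(text) else 0 for p in patterns]
    )


# Hypothetical keyword pattern of the kind a data scientist might supply.
patterns = [re.compile(r"\b(lost|stolen)\b")]
features = vectorize(["lost", "card", "card"],
                     word_vocab=["card", "lost", "bill"],
                     ngram_vocab=["car", "los"],
                     patterns=patterns)
```

Because every vocabulary is fixed ahead of time, each utterance maps to a vector of the same length, which is what a classifier's input layer requires.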

In accordance with aspects, vectorizing may occur at each speaker switch in a transcription. The transcribed utterances during a speaker's turn in the conversation may be collected and concatenated. This may represent a transcribed turn of the conversation. Prior to vectorizing, stop words may be removed from the turn transcript. Stop words may be frequent terms that do not add to an interpretation of the intent. Exemplary stop words in a transcribed text may include, “the” or “an”. Additionally, in transcribed conversation, “um” or “uh” may be a common stop word. Term stemming may then be applied to reduce vocabulary cardinality. Term stemming may include removing suffixes and prefixes that do not contribute to semantic content. The output of vectorization may be a representation (embedding) of terms and term relationships to other terms in the word stream. The vector space representation may include the input to the intent categorization method.
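The per-turn preprocessing (concatenate utterances, drop stop words, stem terms) can be sketched as follows. The stop-word set uses only the examples named in the text, and the suffix-stripping `stem` function is a deliberately crude stand-in, not a real stemming algorithm such as Porter's.

```python
# Stop words taken from the examples in the text above.
STOP_WORDS = {"the", "an", "um", "uh"}

# Assumed suffix list for a toy stemmer; prefixes are not handled here.
SUFFIXES = ("ing", "ed", "s")


def stem(term):
    """Strip one known suffix, leaving a stem of at least three characters."""
    for suffix in SUFFIXES:
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[:-len(suffix)]
    return term


def preprocess_turn(utterances):
    """Concatenate a speaker's utterances, remove stop words, stem terms."""
    turn = " ".join(utterances)
    terms = [t for t in turn.lower().split() if t not in STOP_WORDS]
    return [stem(t) for t in terms]


turn_terms = preprocess_turn(["um I lost", "the cards yesterday"])
```

Stemming maps "cards" and "card" to the same term, which is how it reduces vocabulary cardinality before vectorization.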

Exemplary embedding schemes for text may include GloVe, TF-IDF, ELMo, and LSTM, each of which may generate vectorized representations. Aspects may select the best embedding scheme for a specific corpus and intent classification task. Using training data (e.g., conversation elements with labeled intents), aspects may generate embedding representations of each embedding scheme for each training data instance (conversation element). Then, for each embedding, the centroid representation of each intent may be found. A selected embedding may be the generated embedding with the maximal distance between intent centroids (e.g., L2-norm).
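The centroid-based selection can be sketched in a few lines. The text does not say how distances are aggregated when there are more than two intents, so the mean pairwise L2 distance used here is an assumption, as are the function names and the toy vectors.

```python
import math
from collections import defaultdict
from itertools import combinations


def centroids(vectors, labels):
    """Mean vector of the training instances belonging to each intent."""
    groups = defaultdict(list)
    for vector, label in zip(vectors, labels):
        groups[label].append(vector)
    return {label: [sum(col) / len(vs) for col in zip(*vs)]
            for label, vs in groups.items()}


def separation(vectors, labels):
    """Mean pairwise L2 distance between intent centroids (assumed metric)."""
    pairs = list(combinations(centroids(vectors, labels).values(), 2))
    if not pairs:
        return 0.0  # fewer than two intents: nothing to separate
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def select_embedding(embedded, labels):
    """Pick the scheme whose intent centroids are farthest apart.

    `embedded` maps a scheme name to one vector per training instance.
    """
    return max(embedded, key=lambda name: separation(embedded[name], labels))


labels = ["lost card", "lost card", "bill pay", "bill pay"]
embedded = {
    "scheme_x": [[0, 0], [0, 2], [1, 0], [1, 2]],
    "scheme_y": [[0, 0], [0, 2], [10, 0], [10, 2]],
}
best = select_embedding(embedded, labels)
```

One caveat with this sketch: raw distances from different embedding schemes are not on a common scale, so a practical comparison would likely normalize the vectors first.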

In accordance with aspects, feature vectors generated by transcription platform may be sent to ML engine for processing with one or more ML models. Feature vectors may be used as input to intent ML model. Intent ML model may be a classification model that takes a feature vector generated from an utterance token as input and provides a prediction of an intent classification as output. Although intent ML model is depicted as a single model, it is contemplated that transcription platform may include a suite of intent classification models to determine customer intents from a customer interaction as they are spoken.

In accordance with aspects, an intent classification model may be a convolutional neural network (CNN) model. Input to a CNN model may include the embedding representation of a conversational element combined with additional keyword indicators that are relevant to the target intents detection. The input layer size may be the size of the embedding representation plus the keyword indicators. The input layer may be fully connected to the initial hidden layers with dropout regularization. The hidden convolutional layers (e.g., 6 hidden layers) may be 512, 256, 256, 256, 256, and 512 in size respectively. Kernel (i.e., filter) sizes of the CNN may decrease through the hidden layers (for instance, 10, 5, 4, 3, 2, and 1 respectively). The intuition for kernel sizes may be that adjacent features are relevant to the activation of a particular feature and by gradually decreasing kernel sizes, feature activation can be focused to the closest features. The output layer size may be the number of intents to classify. The hidden layers may use dropout regularization to prune connections between hidden layers as well as hidden to output layers. The output intent category for the input element may correspond to the output node with a maximal value. Aspects may search in the convolutional network parameters (e.g., numbers of hidden layers (depth of convolution), layer sizes, kernel sizes) to find the best network for the intent categorization task using the training data.
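The layer and kernel sizes listed above can be captured as a configuration, together with the arg-max rule for choosing the output intent. This is only shape bookkeeping, not a trainable network. Reading each layer "size" as a channel count and assuming unpadded, stride-1 one-dimensional convolutions are interpretations not stated in the text, and the input length of 319 is a made-up example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvLayer:
    channels: int  # layer "size" from the text, read here as channel count
    kernel: int    # kernel (filter) size


# Six hidden layers with the sizes and decreasing kernels given in the text.
HIDDEN_LAYERS = [
    ConvLayer(512, 10), ConvLayer(256, 5), ConvLayer(256, 4),
    ConvLayer(256, 3), ConvLayer(256, 2), ConvLayer(512, 1),
]


def output_lengths(input_length, layers):
    """Sequence length after each layer for unpadded, stride-1 convolution."""
    lengths = []
    length = input_length
    for layer in layers:
        length = length - layer.kernel + 1
        if length < 1:
            raise ValueError("input too short for this kernel stack")
        lengths.append(length)
    return lengths


def predicted_intent(output_values, intent_names):
    """The intent whose output node has the maximal value."""
    best = max(range(len(output_values)), key=output_values.__getitem__)
    return intent_names[best]


# Example: a 300-value embedding plus 19 keyword indicators (assumed sizes).
lengths = output_lengths(300 + 19, HIDDEN_LAYERS)
intent = predicted_intent([0.1, 0.7, 0.2],
                          ["balance inquiry", "lost/stolen payment card",
                           "online bill-pay process"])
```

A parameter search of the kind the text mentions would vary the entries of `HIDDEN_LAYERS` (depth, channel counts, kernel sizes) and compare the resulting models on the training data.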

Intent ML model may be configured to output a predicted intent for one or more feature vector inputs. For instance, based on one or more feature vectors generated from a stream of audio from a voice-based contact with respect to a customer's payment card product being lost, intent ML model may output an intent of “lost/stolen payment card.” In another example, intent ML model may output an intent of “online bill-pay process” for a customer that has made contact and inquired about how to pay a bill through an organization's bill pay service.

Intents output from intent ML model may be sent to intent management platform. Intent management platform may include query engine and interface. Intent management platform may receive intents from ML engine and query engine may use the intents as a lookup parameter to retrieve content from content repository. Content repository may be any suitable data store for storing content that may be displayed by interface. For instance, content repository may be a relational database, a NoSQL database, a key-value pair datastore, etc.

Content repository may store KB artifacts. KB artifacts stored in content repository may be indexed by intents. That is, content repository may be indexed by intents that are output by intent ML model. Query engine may receive an intent from ML engine and format a query using the received intent as a lookup parameter. The query may retrieve all KB artifacts from content repository that are associated with the intent used as the lookup parameter.
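An intent-indexed lookup of this kind can be sketched with an in-memory relational store. The table name, column names, and sample articles are hypothetical; the text allows any suitable data store, and SQLite is used here only because it ships with Python.

```python
import sqlite3


def create_repository():
    """Build a small in-memory content repository indexed by intent."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE kb_artifacts ("
        "id INTEGER PRIMARY KEY, intent TEXT NOT NULL, "
        "title TEXT NOT NULL, abstract TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX idx_kb_intent ON kb_artifacts (intent)")
    conn.executemany(
        "INSERT INTO kb_artifacts (intent, title, abstract) VALUES (?, ?, ?)",
        [
            ("lost/stolen payment card", "Report a lost card",
             "Steps for filing a lost or stolen card report."),
            ("lost/stolen payment card", "Request a replacement card",
             "How to order a replacement for the customer."),
            ("online bill-pay process", "Set up online bill pay",
             "Walkthrough of enrolling in bill pay."),
        ],
    )
    conn.commit()
    return conn


def query_artifacts(conn, intent):
    """Format and run a query using the predicted intent as lookup parameter."""
    rows = conn.execute(
        "SELECT title, abstract FROM kb_artifacts WHERE intent = ? ORDER BY id",
        (intent,),
    )
    return rows.fetchall()


conn = create_repository()
artifacts = query_artifacts(conn, "lost/stolen payment card")
```

Passing the intent as a bound parameter (the `?` placeholder) rather than formatting it into the SQL string keeps the lookup safe even if intent labels contain quotes or slashes. Each returned (title, abstract) pair corresponds to the kind of artifact preview an interface could show.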

In accordance with aspects, intent management platform may display all retrieved KB artifacts from content repository to agent via interface. Interface may be any suitable interface, such as a web interface, an application interface, or some other type of graphical, textual, or command-line interface. Interface may further be configured to display other helpful information to support agent in real time (i.e., as agent is engaging customer on a voice-based or text-based contact). For instance, various intents may be displayed. Raw intents may help agent clarify if the system has determined the correct intent. Moreover, certain intents may classify a customer's various emotions at the outset of a contact and may reclassify emotions as they change throughout the course of a contact (e.g., an initial displayed intent may be disappointment and/or frustration, and as a contact proceeds, a freshly determined intent may indicate appreciation).

In some aspects, interface may display a transcript of a conversation as it happens. This may help agent more clearly understand what is being spoken and may allow agent to read the transcript rather than ask customers to repeat themselves.

In accordance with aspects, intents may be distilled into a set of labels that may be included in a training dataset that is used to train ML classifier models. For instance, related intents may be categorized and given a label. Determined intents may be assigned to one or more labels, and the assignments may be used as input to a ML algorithm for fitting data to a ML classification model.

Patent Metadata

Filing Date

Unknown

Publication Date

March 24, 2026

Inventors

Unknown
