US-12585658-B2

Generating augmented output sequences by a neural network using external databases

Published: March 24, 2026

Assignee: not available in the USPTO data we have

Inventors: not available in the USPTO data we have

Technical Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating augmented output sequences by using a language model neural network and a plurality of databases. Each database is associated with a respective level of confidence. Each database stores a respective plurality of pre-generated embeddings in an embedding space.

Patent Claims

Legal claims defining the scope of protection, as filed with the USPTO.

. A computer-implemented method comprising:

. The method of, wherein identifying the particular subsequence that has the particular confidence score comprises:

. The method of, wherein the augmented output sequence comprises a rephrasing of the particular subsequence that accounts for a lack of confidence of the information represented by the particular subsequence.

. The method of, wherein each database stores the respective plurality of pre-generated embeddings in association with indices that specify the level of confidence of the database.

. The method of, wherein the index specifies, for each of the respective plurality of pre-generated embeddings, (i) a network location of an electronic document that comprises a source subsequence based on which the pre-generated embeddings is generated, (ii) a location offset of the source subsequence within the electronic document, and (iii) a timestamp that the electronic document is last modified.

. The method of, wherein the language model neural network comprises a Transformer neural network that has been trained on language modeling tasks.

. The method of, further comprising providing the augmented output sequence to a client device for presentation on a display of the client device.

. The method of, further comprising generating the plurality of databases based on:

. The method of, further comprising:

. The method of, wherein the third level of confidence is lower than the first level of confidence but higher than the second level of confidence.

. The method of, wherein the particular level of confidence associated with the particular database is a lowest level of confidence among the levels of confidence associated with one or more of the plurality of databases.

. One or more non-transitory computer-readable storage media storing instructions that when executed by one or more computers cause the one or more computers to perform operations comprising:

. A system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to perform operations comprising:

. The system of, wherein identifying the particular subsequence that has the particular confidence score comprises:

. The system of, wherein the augmented output sequence comprises a rephrasing of the particular subsequence that accounts for a lack of confidence of the information represented by the particular subsequence.

. The system of, wherein each database stores the respective plurality of pre-generated embeddings in association with indices that specify the level of confidence of the database.

. The system of, wherein the index specifies, for each of the respective plurality of pre-generated embeddings, (i) a network location of an electronic document that comprises a source subsequence based on which the pre-generated embeddings is generated, (ii) a location offset of the source subsequence within the electronic document, and (iii) a timestamp that the electronic document is last modified.

. The system of, wherein the operations further comprise providing the augmented output sequence to a client device for presentation on a display of the client device.

Detailed Description

Complete technical specification and implementation details from the patent document.

This specification relates to generating an output in response to an input using neural networks.

Neural networks are machine learning models that employ one or more layers of nonlinear units to predict an output for a received input. Some neural networks include one or more hidden layers in addition to an output layer. The output of each hidden layer is used as input to the next layer in the network, i.e., the next hidden layer or the output layer. Each layer of the network generates an output from a received input in accordance with current values of a respective set of parameters.

Some interactive software applications (which may be referred to as “automated assistants,” “digital agents,” “chatbots,” “interactive personal assistants,” “intelligent personal assistants,” “conversational agents,” etc.) implement neural networks to generate textual outputs in response to inputs, e.g., requests and/or prompts, received from humans (which, when they interact with the interactive software applications may be referred to as “users”).

This specification describes a system implemented as computer programs on one or more computers in one or more locations that generates an augmented output sequence by using a language model neural network to generate an initial output sequence and then modifying, as a result of evaluating the initial output sequence by using a plurality of external databases, the initial output sequence to generate the augmented output sequence. The augmented output sequence has a greater likelihood to be factually accurate than the initial output sequence, and is thus more suitable for providing as the output of the system in response to a given request.

Particular embodiments of the subject matter described in this specification can be implemented so as to realize one or more of the following advantages.

An augmented output sequence generation system as described in this specification can generate augmented output sequences in response to input sequences based on using a language model neural network and on the content stored in multiple databases external to the language model neural network. The multiple external databases store embeddings generated from different collections of electronic documents that are associated with different levels of confidence and that are used in an augmentation process to improve the quality of the initial output sequences generated by the language model neural network. As a result of the augmentation process, the initial output sequences can be augmented to generate the augmented output sequences that account for, e.g., mitigate or alleviate, any factual incorrectness that might occur in the initial output sequences.

Thus, the augmented output sequences generated by the system are more likely to be factually accurate than output sequences generated using a language model neural network without performing a subsequent augmentation process. The augmented output sequence generation system described in this specification is thus suitable for deployment at production environments such as within an educational or medical organization in which false or misleading information may result in serious consequences.

In particular, by generating an initial output sequence using the language model neural network followed by an augmentation process that uses the multiple external databases to augment the initial output sequence, the described system improves the quality of the generated output sequences without having to retrain the language model neural network and thus avoids additional processing resource and power consumption that is otherwise required for retraining the neural network to improve the quality of its output sequences. Moreover, the system can augment the initial output sequences without needing to have access to any underlying details of the language model neural network, e.g., the architecture, the weights, or both of the neural network that generates the initial output sequences, allowing the described techniques to be applied across a wide variety of different language model neural networks.

The details of one or more embodiments of the subject matter of this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the subject matter will become apparent from the description, the drawings, and the claims.

Like reference numbers and designations in the various drawings indicate like elements.

The drawings show an example augmented output sequence generation system (referred to below as the “system” for short). The system is an example of a system implemented as computer programs on one or more computers in one or more locations that generates an augmented output sequence from an input sequence.

In some implementations, the system is part of a text generation system that generates text sequences, i.e., each augmented output sequence generated by the system is a sequence of text tokens from a vocabulary of text tokens that includes, e.g., one or more of characters, sub-words, words, punctuation marks, numbers, or other symbols that appear in natural language text. For example, the system can generate text sequences in response to received requests and provide the text sequences for presentation to users, e.g., on a display of a client device of a user.

The input sequence can be a query submitted to the system by a user through the client device, a question submitted to the system by the user through the client device, or a different request that requires a response from the system. In some cases, the system receives the query as text from the client device. In some cases, the system receives the query as part of a multi-modal input from the client device. In general, a multi-modal input is a combination of two or more different types of data, e.g., two or more of text data, audio data, image data, or graph data. As one example, the multi-modal input may comprise a combination of i) text data representing text in a natural language and ii) pixels of an image or of a video, or audio data representing values of an audio waveform. In some other cases, the system receives a natural language speech query from the user and converts the speech into the input sequence by applying a speech recognition engine to the speech. The input sequence may be received in the form of a sound (speech) signal, captured by a microphone of the client device, which is converted by a speech recognition engine, i.e., a speech-to-text converter, to form the input sequence. Alternatively, it may be entered by typing using a data input device of the client device.

Once the system receives the input sequence, the system processes the input sequence using a language model neural network to generate an initial output sequence that includes a sequence of tokens.

The language model neural network can be any appropriate language model neural network that receives an input sequence made up of tokens selected from a vocabulary and auto-regressively generates an initial output sequence made up of tokens from the vocabulary. For example, the language model neural network can be a Transformer-based language model neural network or a recurrent neural network-based language model. The tokens in the vocabulary can be any appropriate text tokens, e.g., words, word pieces, punctuation marks, and so on, that represent elements of text in one or more natural languages and, optionally, numbers and other text symbols that are found in a corpus of text.

The language model neural network is referred to as an auto-regressive neural network because it auto-regressively generates an initial output sequence of tokens by generating each particular token in the initial output sequence conditioned on a current input sequence that includes any tokens that precede the particular token in the initial output sequence, i.e., the tokens that have already been generated for any previous positions in the initial output sequence that precede the particular position of the particular token.

More specifically, to generate a particular token at a particular position within the initial output sequence, the language model neural network can process the current input sequence to generate a score distribution, e.g., a probability distribution, that assigns a respective score, e.g., a respective probability, to each token in the vocabulary of tokens. The language model neural network can then select, as the particular token, a token from the vocabulary using the score distribution. For example, the neural network can greedily select the highest-scoring token or can sample, e.g., using nucleus sampling or another sampling technique, a token from the distribution.
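The two selection strategies mentioned above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the toy probability table, and the `top_p` parameter are assumptions introduced here.

```python
import random

def greedy_select(probs):
    """Return the highest-scoring token from a {token: probability} mapping."""
    return max(probs, key=probs.get)

def nucleus_sample(probs, top_p=0.9, rng=random):
    """Sample from the smallest set of top-ranked tokens whose total
    probability mass reaches top_p (nucleus sampling)."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, mass = [], 0.0
    for token, p in ranked:
        nucleus.append((token, p))
        mass += p
        if mass >= top_p:
            break
    tokens = [t for t, _ in nucleus]
    weights = [p for _, p in nucleus]
    # random.choices renormalizes the weights within the nucleus.
    return rng.choices(tokens, weights=weights, k=1)[0]

# Hypothetical score distribution over a four-token vocabulary.
probs = {"Paris": 0.7, "Lyon": 0.2, "Rome": 0.08, "Oslo": 0.02}
next_token = greedy_select(probs)
```

Greedy selection is deterministic, while nucleus sampling trades some determinism for diversity by discarding only the low-probability tail.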

As a particular example, the language model neural network can be an auto-regressive Transformer-based neural network that includes (i) a plurality of attention blocks that each apply a self-attention operation and (ii) an output subnetwork that processes an output of the last attention block to generate the score distribution.

The language model neural network can have any of a variety of Transformer-based neural network architectures. Examples of such architectures include those described in Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683, 2019; Daniel Adiwardana, Minh-Thang Luong, David R. So, Jamie Hall, Noah Fiedel, Romal Thoppilan, Zi Yang, Apoorv Kulshreshtha, Gaurav Nemade, Yifeng Lu, and Quoc V. Le. Towards a human-like open-domain chatbot. CoRR, abs/2001.09977, 2020; and Tom B Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. arXiv preprint arXiv:2005.14165, 2020.

Generally, however, the Transformer-based neural network includes a sequence of attention blocks, and, during the processing of a given input sequence, each attention block in the sequence receives a respective input hidden state for each input token in the given input sequence. The attention block then updates each of some or all of the hidden states at least in part by applying self-attention to generate a respective output hidden state for each of the input tokens. The input hidden states for the first attention block are embeddings of the input tokens in the input sequence and the input hidden states for each subsequent attention block are the output hidden states generated by the preceding attention block.

In this example, the output subnetwork processes the output hidden state generated by the last attention block in the sequence for the last input token in the input sequence to generate the score distribution.
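The flow of hidden states through a sequence of attention blocks can be sketched as below. This is a deliberately simplified illustration: it assumes a single attention head, identity query/key/value projections, and a causal mask, and it omits the learned weights, residual connections, and feed-forward layers that a real Transformer block has. All names are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(hidden):
    """Update each hidden state by attending to itself and earlier positions.

    hidden: list of equal-length vectors, one per input token.
    """
    dim = len(hidden[0])
    updated = []
    for i, query in enumerate(hidden):
        # Scaled dot-product scores against positions 0..i only.
        scores = [
            sum(q * k for q, k in zip(query, hidden[j])) / math.sqrt(dim)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        updated.append([
            sum(w * hidden[j][d] for j, w in enumerate(weights))
            for d in range(dim)
        ])
    return updated

def run_blocks(token_embeddings, num_blocks=2):
    """Feed the output hidden states of each block into the next block."""
    states = token_embeddings
    for _ in range(num_blocks):
        states = causal_self_attention(states)
    return states

states = run_blocks([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
last_hidden_state = states[-1]  # what an output subnetwork would consume
```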

In some implementations, the system or another training system pre-trains the language model neural network on a language modeling task, e.g., a task that requires predicting, given a current sequence of text tokens, the next token that follows the current sequence in the training data. As a particular example, the language model neural network can be pre-trained on a maximum-likelihood objective on a large dataset of text, e.g., text that is publicly available from the Internet or another text corpus.

When the initial output sequence is generated, the system determines whether the initial output sequence is suitable for providing to the user in response to submitting the input sequence. In particular, the system uses a scoring engine to determine a likelihood that the initial output sequence includes any factually incorrect, misleading, or otherwise nonsensical information. The scoring engine can be configured to do this on the granularity of subsequences. Each subsequence includes a respective subset of the sequence of tokens included in the initial output sequence generated by the language model neural network.

If the scoring engine determines that the likelihood that the initial output sequence includes any such information is lower than a threshold likelihood, then the system provides the initial output sequence as the augmented output sequence for presentation to the user in response to receiving the input sequence.

Alternatively, if the scoring engine determines that the likelihood that the initial output sequence includes factually incorrect, misleading, or otherwise nonsensical information is higher than the threshold likelihood, then the system uses an augmentation engine to augment, e.g., apply a modification or correction to, the initial output sequence to generate an augmented output sequence that is subsequently provided for presentation to the user in response to receiving the input sequence.

Optionally, if the scoring engine determines that the likelihood that the initial output sequence includes factually incorrect, misleading, or otherwise nonsensical information is higher than the threshold likelihood, some implementations of the system simply reject the input sequence based on which the initial output sequence is generated, and instead provide a default output to the user indicating that the system cannot respond to their query.
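The three outcomes described above (provide as is, augment, or reject) can be sketched as a small routing function. The names, the default threshold of 0.5, and the `reject_instead` flag are assumptions made for illustration; the text also does not say how a likelihood exactly equal to the threshold is handled, so this sketch treats it as exceeding the threshold.

```python
from enum import Enum

class Action(Enum):
    PROVIDE_AS_IS = "provide_as_is"   # initial output becomes the final output
    AUGMENT = "augment"               # hand off to the augmentation engine
    REJECT = "reject"                 # return a default "cannot respond" output

def route(error_likelihood, threshold=0.5, reject_instead=False):
    """Decide what to do with an initial output sequence.

    error_likelihood: estimated likelihood that the sequence contains
    factually incorrect, misleading, or nonsensical information.
    """
    if error_likelihood < threshold:
        return Action.PROVIDE_AS_IS
    # Assumption: equality with the threshold is treated as "higher than".
    return Action.REJECT if reject_instead else Action.AUGMENT
```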

Such a determination can be made based on confidence scores generated for the initial output sequence. The scoring engine can generate a confidence score for each of one or more subsequences included in the initial output sequence. Each subsequence includes a respective subset of the sequence of tokens included in the initial output sequence generated by the language model neural network.

In some implementations, each subsequence can include the same number of tokens as another subsequence, while in other implementations, different subsequences can include different numbers of tokens. For each subsequence, the confidence score generally indicates a level of confidence, e.g., confidence of the accuracy, of information represented by the subsequence.
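Both ways of dividing an output sequence into subsequences can be sketched as follows. The splitting rules here (fixed-size windows and sentence terminators) are assumptions chosen for illustration; the text does not prescribe how subsequence boundaries are chosen.

```python
def fixed_size_subsequences(tokens, size):
    """Split tokens into consecutive windows of `size` tokens.

    The final window may be shorter when len(tokens) is not a multiple of size.
    """
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

def sentence_subsequences(tokens, terminators=(".", "?", "!")):
    """Split tokens into variable-length subsequences ending at a terminator."""
    subsequences, current = [], []
    for token in tokens:
        current.append(token)
        if token in terminators:
            subsequences.append(current)
            current = []
    if current:  # keep any trailing tokens that lack a terminator
        subsequences.append(current)
    return subsequences

tokens = ["The", "sky", "is", "blue", ".", "It", "rains", "."]
```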

In some implementations, the system can provide the confidence scores alongside the augmented output sequence for presentation to the user. In some implementations, the system presents the augmented output sequence with visual indications of the confidence scores. For example, the augmented output sequence can be color coded, i.e., different subsequences are presented in potentially different colors, where different colors indicate different confidence scores.
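One way such color coding could be rendered is sketched below. The score cutoffs, the color names, and the choice of HTML `<span>` elements are all assumptions for illustration; the text only says that different colors indicate different confidence scores.

```python
from html import escape

def color_for(score):
    """Map a confidence score in [0, 1] to a display color (hypothetical cutoffs)."""
    if score >= 0.8:
        return "green"
    if score >= 0.5:
        return "orange"
    return "red"

def render_html(scored_subsequences):
    """Render (text, score) pairs as color-coded HTML spans."""
    return " ".join(
        f'<span style="color:{color_for(score)}">{escape(text)}</span>'
        for text, score in scored_subsequences
    )

html_output = render_html([
    ("Paris is the capital of France.", 0.95),
    ("It has 90 million residents.", 0.2),
])
```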

To generate the confidence scores, the scoring engine accesses or interfaces with a plurality of databases that are each associated with a respective level of confidence. In some implementations, each database corresponds to a respective knowledge base external to the language model neural network, and is associated with a respective level of confidence with respect to the knowledge base corresponding to the database.

Although a total of four databases are shown for convenience, there can generally be any number of databases, e.g., two databases, five databases, ten databases, and so on, that correspond respectively to two knowledge bases, five knowledge bases, ten knowledge bases, and so on. Each knowledge base includes a collection of electronic documents. The collection of electronic documents may pertain to a particular domain of knowledge.

The particular domain of knowledge may be generic. An example of a knowledge base that has a generic domain is the Wikipedia collection available on the Internet in the wikipedia.org domain, which includes a collection of encyclopedia articles such as a collection of Wikipedia articles or other encyclopedic collection of text articles. Another example of a knowledge base that has a generic domain is a collection of electronic documents that is available on a blogging platform or a social media platform. A further example of a knowledge base that has a generic domain is a collection of newspaper and/or magazine articles that is published by a news organization.

Alternatively, the particular domain of knowledge may be specific. An example of a knowledge base that has a specific domain is a scholarly, academic, and/or peer-reviewed journal that includes a collection of text articles specific to a scientific, medical, engineering, financial or another technical field. Another example of a knowledge base that has a specific domain is a collection of electronic documents that is maintained by an official government organization, e.g., that is available on the Internet in a .gov domain.

Each database is associated with a respective level of confidence with respect to the knowledge base corresponding to the database. For example, a first database is associated with a first level of confidence, a second database is associated with a second level of confidence, a third database is associated with a third level of confidence, and a fourth database is associated with a fourth level of confidence. In some implementations, different databases are associated with different levels of confidence, while in other implementations, two or more databases are associated with the same level of confidence.

More specifically, each database stores a respective plurality of pre-generated embeddings in an embedding space that have been generated from source subsequences included in the collection of electronic documents in the knowledge base that corresponds to the database. Each source subsequence can be, for example, a paragraph, a sentence, or another string or strings of text. Generally, each source subsequence is also made up of tokens selected from the vocabulary.

Within each database, the plurality of embeddings can be pre-generated in any appropriate way. For example, they can be generated by the language model neural network based on processing the source subsequences included in the collection of electronic documents in the knowledge base that corresponds to the database. As a particular example, they can be the output hidden states generated by an attention block included in the language model neural network, or more generally, any output hidden states generated by one or more intermediate layers of the language model neural network based on processing the source subsequences. As another example, they can be generated in accordance with a predetermined mapping between each token included in the vocabulary and a corresponding embedding in the embedding space.
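The last option, a predetermined token-to-embedding mapping, can be sketched as follows. The lookup table values are made up, and averaging the token embeddings into one vector per source subsequence (mean pooling) is an assumption of this sketch; the text does not say how per-token embeddings are combined.

```python
# Hypothetical predetermined mapping from vocabulary tokens to 2-D embeddings.
TOKEN_EMBEDDINGS = {
    "the": [0.1, 0.3],
    "sky": [0.7, 0.1],
    "is": [0.2, 0.2],
    "blue": [0.6, 0.4],
}

def embed_subsequence(tokens, table=TOKEN_EMBEDDINGS):
    """Embed a source subsequence by averaging its token embeddings.

    Mean pooling is an illustrative choice, not something the text specifies.
    """
    vectors = [table[token] for token in tokens]
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

embedding = embed_subsequence(["the", "sky", "is", "blue"])
```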

In some implementations, the collection of electronic documents in each knowledge base can be obtained by an index engine. The index engine can be any appropriate index engine that is accessible by the system and that crawls electronic documents (e.g., books, web pages (e.g., HTML pages), news articles, or other documents) that can be found in a corpus (e.g., a collection or repository of content) that is available on the Internet.

An electronic document (which for brevity will simply be referred to as a document) does not necessarily correspond to a file. A document may be stored in a portion of a file that holds other documents, in a single file dedicated to the document in question, or in multiple coordinated files.

For each database, the index engine assigns a level of confidence to the database. Index data that defines the assigned level of confidence can then be stored in association with the plurality of pre-generated embeddings within the database. Put another way, each pre-generated embedding can be stored in association with corresponding index data that specifies the level of confidence of the database within which the pre-generated embedding is stored.

In some implementations, the index engine can determine the level of confidence based on a known location of the knowledge base that corresponds to the database. For example, the location of a knowledge base is known when it is included in a predetermined list of knowledge base locations that is maintained by the system.

In other implementations, the location of a knowledge base may not be known to the index engine. The location of a knowledge base may be unknown for many reasons. For example, the knowledge base could be a new knowledge base, e.g., one that was only recently made publicly available.

In those implementations, the index engine can determine a level of confidence of a database that corresponds to the knowledge base in an automatic manner based on determining a similarity measure between the pre-generated embeddings stored in the database and the pre-generated embeddings stored in one or more other databases the levels of confidence of which are already determined. This will be described in more detail below.
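A rough sketch of similarity-based assignment follows. The specific similarity measure (cosine similarity between database centroids) and the rule of inheriting the level of the most similar known database are assumptions of this sketch; the excerpt does not define either. It also assumes non-empty databases with non-zero centroids.

```python
import math

def centroid(embeddings):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(embeddings[0])
    return [sum(e[d] for e in embeddings) / len(embeddings) for d in range(dim)]

def cosine_similarity(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def infer_confidence_level(new_embeddings, known_databases):
    """Inherit the level of the known database whose centroid is most similar.

    known_databases: {name: (confidence_level, embeddings)}
    """
    new_centroid = centroid(new_embeddings)
    level, _ = max(
        known_databases.values(),
        key=lambda item: cosine_similarity(new_centroid, centroid(item[1])),
    )
    return level

# Hypothetical databases with already-determined confidence levels.
known = {
    "journal": ("high", [[1.0, 0.0], [0.9, 0.1]]),
    "blog": ("low", [[0.0, 1.0], [0.1, 0.9]]),
}
```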

In some implementations, the corresponding index for each pre-generated embedding can specify additional information about the source subsequence based on which the pre-generated embedding is generated. Such additional information can generally include any contextual information about the source subsequence.

For example, for a given pre-generated embedding, the index can specify a network location of an electronic document from which the source subsequence is obtained, e.g., extracted. As another example, for a given pre-generated embedding, the index can specify a location offset of the source subsequence within an electronic document that includes the source subsequence. As yet another example, for a given pre-generated embedding, the index can specify a timestamp that an electronic document from which the source subsequence is obtained is last modified.
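The index data described above can be pictured as a small record stored next to each embedding. The class names, field names, and example values below are hypothetical; only the four kinds of information (confidence level, network location, location offset, last-modified timestamp) come from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class IndexEntry:
    confidence_level: str      # level of confidence of the containing database
    network_location: str      # where the source document was obtained
    location_offset: int       # offset of the source subsequence in the document
    last_modified: datetime    # when the source document was last modified

@dataclass(frozen=True)
class StoredEmbedding:
    vector: tuple              # the pre-generated embedding
    index: IndexEntry          # index data stored in association with it

# Illustrative record with placeholder values.
record = StoredEmbedding(
    vector=(0.12, -0.4, 0.9),
    index=IndexEntry(
        confidence_level="high",
        network_location="https://example.org/articles/42",
        location_offset=1280,
        last_modified=datetime(2024, 1, 15, tzinfo=timezone.utc),
    ),
)
```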

To generate the confidence score for each of the one or more subsequences included in the initial output sequence, the scoring engine generates a query embedding for each subsequence, and then identifies one or more neighbor pre-generated text embeddings from the databases using the query embedding.

Like the plurality of pre-generated embeddings stored in the databases, the query embedding for each subsequence included in the initial output sequence can be generated in any appropriate way. For example, the query embedding can be the output hidden state of an intermediate layer in the language model neural network, or a combination of the output hidden states of two or more layers in the language model neural network, when the neural network is processing the input sequence to generate the subsequence in the initial output sequence. As a particular example, the query embedding can be the output hidden state generated by an attention block included in the language model neural network when the neural network is processing the input sequence to generate the subsequence in the initial output sequence.

Generally, identifying the one or more neighbor pre-generated text embeddings can include computing, for each subsequence included in the initial output sequence, a respective distance between the query embedding that has been generated for the subsequence and the plurality of pre-generated text embeddings included in each of the databases, and then selecting one or more pre-generated text embeddings based on the respective distances, that is, selecting the one or more “neighbor pre-generated text embeddings.” For example, the system can select, as the neighbor pre-generated text embeddings, one or more pre-generated text embeddings that have the smallest distances to the query embedding or that satisfy a distance threshold, and the distance can be, e.g., a Euclidean distance, a Hamming distance, or other type of distance in the embedding space.
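The neighbor search can be sketched as a brute-force scan over all databases using Euclidean distance. The function names, the `k` and `max_distance` parameters, and the toy data are assumptions; a production system would more likely use an approximate nearest-neighbor index than an exhaustive scan.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def find_neighbors(query, databases, k=1, max_distance=None):
    """Return up to k (distance, database_name, position) tuples, nearest first.

    databases: {name: [embedding, ...]}
    max_distance: optional distance threshold that candidates must satisfy.
    """
    candidates = []
    for name, embeddings in databases.items():
        for position, embedding in enumerate(embeddings):
            distance = euclidean(query, embedding)
            if max_distance is None or distance <= max_distance:
                candidates.append((distance, name, position))
    candidates.sort()
    return candidates[:k]

# Hypothetical databases holding 2-D pre-generated embeddings.
databases = {
    "encyclopedia": [[0.0, 0.0], [1.0, 1.0]],
    "blog": [[0.9, 1.2]],
}
neighbors = find_neighbors([1.0, 1.1], databases, k=2)
```

The database name returned with each neighbor is what lets the scoring step look up the associated level of confidence.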

After identifying the one or more neighbor pre-generated text embeddings, the scoring engine can compute the confidence score for each subsequence. In particular, it does this based on the index data that is stored in association with the one or more neighbor pre-generated text embeddings.

For each subsequence, when only one neighbor pre-generated text embedding is identified, the scoring engine can compute the confidence score for the subsequence based on the level of confidence specified by the index that is stored in association with the neighbor pre-generated text embedding. For example, there can be a one-to-one mapping between different levels of confidence and different confidence scores, and the scoring engine can use the confidence score mapped to the level of confidence specified by the index as the confidence score for the subsequence.
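For the single-neighbor case, the one-to-one mapping can be sketched as a lookup table. The level names and numeric scores are placeholders invented for this example.

```python
# Hypothetical one-to-one mapping from level of confidence to confidence score.
LEVEL_TO_SCORE = {"high": 0.9, "medium": 0.6, "low": 0.3}

def confidence_score(neighbor_index):
    """Score a subsequence from the index data of its single neighbor embedding.

    neighbor_index: dict with at least a "confidence_level" key.
    """
    return LEVEL_TO_SCORE[neighbor_index["confidence_level"]]

score = confidence_score({"confidence_level": "medium"})
```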

In some implementations, the scoring engine can compute the confidence score based on the additional information specified by the index that is stored in association with the neighbor pre-generated text embedding. For example, the confidence score can be lower when the neighbor pre-generated text embedding is generated based on a source subsequence included in an electronic document at a first network location, while the confidence score can be higher when the neighbor pre-generated text embedding is generated based on a source subsequence included in an electronic document at a second network location.
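One way a network location could influence the score is sketched below. The suffix-based weights, the default weight, and the multiplicative adjustment are assumptions made for illustration; the text only says that the score can differ between network locations.

```python
from urllib.parse import urlparse

# Hypothetical weights keyed by hostname suffix.
LOCATION_WEIGHTS = {".gov": 1.0, ".org": 0.9}
DEFAULT_WEIGHT = 0.7

def adjusted_score(base_score, network_location):
    """Scale a base confidence score by a weight tied to the source's host."""
    host = urlparse(network_location).hostname or ""
    for suffix, weight in LOCATION_WEIGHTS.items():
        if host.endswith(suffix):
            return base_score * weight
    return base_score * DEFAULT_WEIGHT

gov_score = adjusted_score(0.8, "https://data.example.gov/report")
blog_score = adjusted_score(0.8, "https://blog.example.com/post")
```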
