US-20260057016-A1

Computing Systems and Methods for Generating a Response to a Query Based on a Corpus of Documents

Published: February 26, 2026

Assignee: not available in USPTO data we have

Inventors: Noël Vouitsis, Jiapeng Wu, Yi Sui, Graham Andrew Warner, Paulina Corona Ugalde, +1 more

Technical Abstract

Systems and methods for retrieving information from a corpus of documents that is relevant to a query. The method comprises: generating a first plurality of chunks by subdividing each document in the corpus of documents into one or more chunks of a first size; generating a second plurality of chunks by subdividing each document in the corpus of documents into one or more chunks of a second, larger, size; using an information retrieval system to identify, from the second plurality of chunks, a set of chunks of the second size that are relevant to a query; and using the information retrieval system to identify, from a subset of chunks of the first plurality of chunks, a set of chunks of the first size that are relevant to the query. The subset is based on the set of chunks of the second size that are relevant to the query.

Patent Claims

Legal claims defining the scope of protection, as filed with the USPTO.

A system for retrieving information from a corpus of documents relevant to a query, the system comprising: a memory, a communication interface, and at least one processor operatively coupled to the memory and the communication interface; the at least one processor configured to: generate a first plurality of chunks by subdividing each document in the corpus of documents into one or more chunks of a first size; generate a second plurality of chunks by subdividing each document in the corpus of documents into one or more chunks of a second, larger, size; use an information retrieval system to identify, from the second plurality of chunks, a set of chunks of the second size that are relevant to a query; and use the information retrieval system to identify, from a subset of chunks of the first plurality of chunks, a set of chunks of the first size that are relevant to the query, wherein the subset of the first plurality of chunks is based on the set of chunks of the second size that are relevant to the query; and execute a generation LLM to generate a response to the query based on all or a portion of the set of chunks of the first size that are relevant to the query.

The system of claim 1, wherein the at least one processor is further configured to identify from the set of chunks of the second size that are relevant to the query a set of one or more relevant documents of the corpus of documents; and wherein the subset of chunks of the first plurality of chunks comprises chunks in the first plurality of chunks corresponding to a document in the set of one or more relevant documents.

The system of claim 2, wherein a relevant document is a document corresponding to at least one chunk in the set of chunks of the second size that are relevant to the query.

The system of claim 1, wherein the at least one processor is configured to use the information retrieval system to identify, from the subset of chunks of the first plurality of chunks, the set of chunks of the first size relevant to the query by using an index engine of the information retrieval system to generate a first search index for the first plurality of chunks and using a search engine of the information retrieval system to identify, from the first search index, the set of chunks of the first size that are relevant to the query from the first search index.

The system of claim 4, wherein the at least one processor is configured to use the information retrieval system to retrieve, from the second plurality of chunks, the set of chunks of the second size that are relevant to the query by using the index engine to generate a second search index for the second plurality of chunks and using the search engine to identify, from the second search index, the set of chunks of the second size relevant to the query.

The system of claim 1, wherein the at least one processor is further configured to retrieve the set of chunks of the first size from storage.

The system of claim 1, wherein the at least one processor is further configured to use a re-ranker LLM to rank the set of chunks of the first size based on a relevance to the query or another query related to the query.

The system of claim 7, wherein the at least one processor is further configured to select a subset of chunks from the set of chunks of the first size based on the ranking.

The system of claim 8, wherein selecting the subset of chunks from the set of chunks of the first size based on the ranking comprises selecting k chunks of the set of chunks of the first size with a highest ranking to form the subset, wherein k is an integer greater than or equal to 1.

The system of claim 8, wherein selecting the subset of chunks from the set of chunks of the first size based on the ranking comprises selecting a set of top documents of the corpus of documents from the ranking and selecting all or a portion of the chunks in the set of chunks of the first size associated with the set of top documents to form the subset.

(canceled)

The system of claim 1, wherein the response to the query comprises one or more citations to a document corresponding to a chunk of the set of chunks of the first size relevant to the query.

The system of claim 1, wherein the at least one processor is further configured to use a query modification LLM to generate synthetic information related to a second query and generate an amended second query based on the synthetic information related to the second query; and wherein the query is the amended second query.

The system of claim 13, wherein the at least one processor is configured to: use the query modification LLM to generate synthetic information related to the second query by instructing the query modification LLM to generate a set of one or more keywords for the second query; and generate the amended second query based on the synthetic information related to the second query by combining the second query and the set of one or more keywords to form the amended second query.

The system of claim 13, wherein the at least one processor is further configured to use the information retrieval system to retrieve a document from the corpus of documents deemed most relevant to the second query; and wherein the at least one processor is configured to use the query modification LLM to generate synthetic information related to the second query by causing the query modification LLM to re-write the second query based on a context of the retrieved document.

A method for retrieving information from a corpus of documents relevant to a query, the method executed in a computing environment comprising one or more processors, a communication interface, and memory, and the method comprising: generating a first plurality of chunks by subdividing each document in the corpus of documents into one or more chunks of a first size; generating a second plurality of chunks by subdividing each document in the corpus of documents into one or more chunks of a second, larger, size; using an information retrieval system to identify, from the second plurality of chunks, a set of chunks of the second size that are relevant to a query; using the information retrieval system to identify, from a subset of chunks of the first plurality of chunks, a set of chunks of the first size that are relevant to the query, wherein the subset of the first plurality of chunks is based on the set of chunks of the second size that are relevant to the query; and executing a generation LLM to generate a response to the query based on all or a portion of the set of chunks of the first size that are relevant to the query.

The method of claim 16, further comprising, identifying, from the set of chunks of the second size that are relevant to the query, a set of one or more relevant documents of the corpus of documents; and wherein the subset of chunks of the first plurality of chunks comprises chunks in the first plurality of chunks corresponding to a document in the set of one or more relevant documents.

The method of claim 17, wherein a relevant document is a document corresponding to at least one chunk in the set of chunks of the second size that are relevant to the query.

The method of claim 16, wherein using the information retrieval system to identify, from the subset of chunks of the first plurality of chunks, the set of chunks of the first size relevant to the query comprises using an index engine of the information retrieval system to generate a first search index for the first plurality of chunks and using a search engine of the information retrieval system to identify, from the first search index, the set of chunks of the first size that are relevant to the query from the first search index; and using the information retrieval system to retrieve, from the second plurality of chunks, the set of chunks of the second size that are relevant to the query comprises using the index engine to generate a second search index for the second plurality of chunks and using the search engine to identify, from the second search index, the set of chunks of the second size relevant to the query.

A non-transitory computer readable medium storing computer executable instructions which, when executed by at least one computer processor, cause the at least one computer processor to carry out a method for retrieving information from a corpus of documents relevant to a query, the method comprising: generating a first plurality of chunks by subdividing each document in the corpus of documents into one or more chunks of a first size; generating a second plurality of chunks by subdividing each document in the corpus of documents into one or more chunks of a second, larger, size; using an information retrieval system to identify, from the second plurality of chunks, a set of chunks of the second size that are relevant to a query; using the information retrieval system to identify, from a subset of chunks of the first plurality of chunks, a set of chunks of the first size that are relevant to the query, wherein the subset of the first plurality of chunks is based on the set of chunks of the second size that are relevant to the query; and executing a generation LLM to generate a response to the query based on all or a portion of the set of chunks of the first size that are relevant to the query.

Detailed Description

Complete technical specification and implementation details from the patent document.

The disclosed example embodiments relate to computer-implemented methods and systems for generating a response to a query based on a corpus of documents, and more specifically for generating a response to a query from a corpus of documents using a large language model (LLM)-based retrieval-augmented generation (RAG) system.

There are many applications where it may be useful to be able to generate a response to a query based on a private or non-public corpus of documents (e.g., documents internal to an enterprise). For example, an enterprise may have a set of agents that receive enquiries from customers about products and services offered by the enterprise. While information that resolves the customer's query can generally be found in the enterprise's internal and/or external documents, it may be cumbersome for an agent (or another enterprise employee) to locate information relevant to a customer's query in those documents. It is desirable to generate an answer to the customer's query in a more efficient and automated way.

With the emergence of large language models (LLMs) and their ability to understand and generate human-like text based on patterns they recognize, LLMs seem well suited to automatically generate responses to queries. However, while LLMs are trained on a vast amount of data from various fields, if the query relates to information that is not known to the LLM (e.g., information that did not form part of the LLM's training dataset) because, for example, the information relates to a specific domain or to an enterprise's internal knowledge base, the LLM may not be able to provide an accurate answer to the query. Accordingly, a technique referred to as retrieval augmented generation (RAG) has been developed. In RAG, a query is first sent to an information retrieval (IR) system to retrieve information from an external knowledge base (external to the data used to train the LLM) which comprises, for example, documents etc. related to a specific domain and/or an enterprise's internal documents etc.; then the retrieved information and the original query are provided to an LLM along with instructions to generate a response to the query based on the provided information. In this way the external knowledge base is used to enhance the LLM's output without having to re-train the LLM.

The following summary is intended to introduce the reader to various aspects of the detailed description, but not to define or delimit any invention.

A first aspect provides a system for retrieving information from a corpus of documents relevant to a query, the system comprising: a memory, a communication interface, and at least one processor operatively coupled to the memory and the communication interface; the at least one processor configured to: generate a first plurality of chunks by subdividing each document in the corpus of documents into one or more chunks of a first size; generate a second plurality of chunks by subdividing each document in the corpus of documents into one or more chunks of a second, larger, size; use an information retrieval system to identify, from the second plurality of chunks, a set of chunks of the second size that are relevant to a query; and use the information retrieval system to identify, from a subset of chunks of the first plurality of chunks, a set of chunks of the first size that are relevant to the query, wherein the subset of the first plurality of chunks is based on the set of chunks of the second size that are relevant to the query.

The at least one processor may be further configured to identify from the set of chunks of the second size that are relevant to the query a set of one or more relevant documents of the corpus of documents; and wherein the subset of chunks of the first plurality of chunks comprises chunks in the first plurality of chunks corresponding to a document in the set of one or more relevant documents.

A relevant document may be a document corresponding to at least one chunk in the set of chunks of the second size that are relevant to the query.
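The two-stage retrieval of the first aspect can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: chunk sizes are counted in words, the `score` function is a toy term-count stand-in for the information retrieval system, and the `top_large` and `top_small` cut-offs are hypothetical parameters that do not appear in the disclosure.

```python
from collections import Counter


def chunk(text, size):
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def score(query, text):
    """Toy relevance score: occurrences of query terms in the text.
    Stand-in for a real index engine and search engine."""
    terms = set(query.lower().split())
    counts = Counter(text.lower().split())
    return sum(counts[t] for t in terms)


def retrieve(query, chunks, top_n):
    """Return up to top_n (doc_id, chunk) pairs with a positive score, best first."""
    scored = [(score(query, c), d, c) for d, c in chunks]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [(d, c) for _, d, c in scored[:top_n]]


def coarse_to_fine(query, corpus, small=8, large=32, top_large=2, top_small=3):
    # First and second pluralities of chunks: same documents, two chunk sizes.
    small_chunks = [(d, c) for d, t in corpus.items() for c in chunk(t, small)]
    large_chunks = [(d, c) for d, t in corpus.items() for c in chunk(t, large)]
    # Stage 1: find relevant large chunks, then the documents they belong to.
    relevant_docs = {d for d, _ in retrieve(query, large_chunks, top_large)}
    # Stage 2: search only the small chunks of those relevant documents.
    subset = [(d, c) for d, c in small_chunks if d in relevant_docs]
    return retrieve(query, subset, top_small)


corpus = {
    "fees.md": "Wire transfer fees are charged per transaction. "
               "International wire transfer fees are higher than domestic fees.",
    "hours.md": "Branches open at nine and close at five on weekdays.",
}
results = coarse_to_fine("wire transfer fees", corpus)
```

In this sketch the large-chunk pass acts as a document filter, so the small-chunk pass never scores chunks from documents that the first pass did not surface.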

The at least one processor may be configured to use the information retrieval system to identify, from the subset of chunks of the first plurality of chunks, the set of chunks of the first size relevant to the query by using an index engine of the information retrieval system to generate a first search index for the plurality of chunks and using a search engine of the information retrieval system to identify, from the first search index, the set of chunks of the first size that are relevant to the query from the first search index.

The at least one processor may be configured to use the information retrieval system to retrieve, from the second plurality of chunks, the set of chunks of the second size that are relevant to the query by using the index engine to generate a second search index for the second plurality of chunks and using the search engine to identify, from the second search index, the set of chunks of the second size relevant to the query.

The at least one processor may be further configured to retrieve the set of chunks of the first size from storage.

The at least one processor may be further configured to use a re-ranker LLM to rank the set of chunks of the first size based on a relevance to the query or another query related to the query.

The at least one processor may be further configured to select a subset of chunks from the set of chunks of the first size based on the ranking.

Selecting the subset of chunks from the set of chunks of the first size based on the ranking may comprise selecting k chunks of the set of chunks of the first size with a highest ranking to form the subset, wherein k is an integer greater than or equal to 1.

Selecting the subset of chunks from the set of chunks of the first size based on the ranking may comprise selecting a set of top documents of the corpus of documents from the ranking and selecting all or a portion of the chunks in the set of chunks of the first size associated with the set of top documents to form the subset.
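The two selection strategies just described can be sketched as follows. This is a hedged illustration: the function names are hypothetical, the input is assumed to be a list of `(doc_id, chunk_text)` pairs already ordered best first by the re-ranker, and "top documents" is interpreted here as the first distinct documents encountered in that ranking, which is only one possible reading.

```python
def select_top_k(ranked_chunks, k):
    """Keep the k highest-ranked chunks (k must be an integer >= 1)."""
    if k < 1:
        raise ValueError("k must be an integer >= 1")
    return ranked_chunks[:k]


def select_by_top_documents(ranked_chunks, n_docs):
    """Keep every ranked chunk that belongs to one of the top n_docs documents.
    Assumption: a document's rank is the rank of its best chunk."""
    if n_docs < 1:
        raise ValueError("n_docs must be an integer >= 1")
    top_docs = []
    for doc_id, _ in ranked_chunks:
        if doc_id not in top_docs:
            top_docs.append(doc_id)
        if len(top_docs) == n_docs:
            break
    return [(d, c) for d, c in ranked_chunks if d in top_docs]


ranked = [("a", "a1"), ("b", "b1"), ("a", "a2"), ("c", "c1")]
top_two_chunks = select_top_k(ranked, 2)
top_doc_chunks = select_by_top_documents(ranked, 1)
```

The first strategy bounds the number of chunks passed downstream, while the second keeps more context from fewer documents.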

The at least one processor may be further configured to use a generation LLM to generate a response to the query or the other query based on the subset of chunks.

The response to the query may comprise one or more citations to a document corresponding to a chunk of the subset of chunks.
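One way a response could carry citations is to number the selected chunks in the prompt given to the generation LLM and ask for bracketed references. The sketch below is an assumption for illustration only: the prompt wording and the `build_generation_prompt` helper are hypothetical and are not taken from the disclosure.

```python
def build_generation_prompt(query, chunks):
    """Assemble a prompt in which each chunk is numbered and tagged with its
    source document so the model can cite it as [n]."""
    sources = "\n".join(
        f"[{i}] ({doc_id}) {text}"
        for i, (doc_id, text) in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the sources below. "
        "Cite sources as [n].\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {query}"
    )


prompt = build_generation_prompt(
    "What are the fees?",
    [("fees.md", "Wire fees apply."), ("faq.md", "See fee schedule.")],
)
```

Because each number maps back to a document identifier, a citation such as `[1]` in the generated answer can be resolved to the underlying document.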

The at least one processor may be further configured to use a query modification LLM to generate synthetic information related to a second query and generate an amended second query based on the synthetic information related to the second query; and wherein the query is the amended second query.

The at least one processor may be configured to: use the query modification LLM to generate synthetic information related to the second query by instructing the query modification LLM to generate a set of one or more keywords for the second query; and generate the amended second query based on the synthetic information related to the second query by combining the second query and the set of one or more keywords to form the amended second query.

The at least one processor may be further configured to use the information retrieval system to retrieve a document from the corpus of documents deemed most relevant to the second query; and wherein the at least one processor is configured to use the query modification LLM to generate synthetic information related to a second query by causing the query modification LLM to re-write the second query based on a context of the retrieved document.

A second aspect provides a method for retrieving information from a corpus of documents relevant to a query, the method executed in a computing environment comprising one or more processors, a communication interface, and memory, and the method comprising: generating a first plurality of chunks by subdividing each document in the corpus of documents into one or more chunks of a first size; generating a second plurality of chunks by subdividing each document in the corpus of documents into one or more chunks of a second, larger, size; using an information retrieval system to identify, from the second plurality of chunks, a set of chunks of the second size that are relevant to a query; and using the information retrieval system to identify, from a subset of chunks of the first plurality of chunks, a set of chunks of the first size that are relevant to the query, wherein the subset of the first plurality of chunks is based on the set of chunks of the second size that are relevant to the query.

The method may further comprise, identifying, from the set of chunks of the second size that are relevant to the query, a set of one or more relevant documents of the corpus of documents; and wherein the subset of chunks of the first plurality of chunks comprises chunks in the first plurality of chunks corresponding to a document in the set of one or more relevant documents.

A relevant document may be a document corresponding to at least one chunk in the set of chunks of the second size that are relevant to the query.

Using the information retrieval system to identify, from the subset of chunks of the first plurality of chunks, the set of chunks of the first size relevant to the query may comprise using an index engine of the information retrieval system to generate a first search index for the plurality of chunks and using a search engine of the information retrieval system to identify, from the first search index, the set of chunks of the first size that are relevant to the query from the first search index; and using the information retrieval system to retrieve, from the second plurality of chunks, the set of chunks of the second size that are relevant to the query comprises using the index generator to generate a second search index for the second plurality of chunks and using the search engine to identify, from the second search index, the set of chunks of the second size relevant to the query.

According to some aspects, the present disclosure provides a non-transitory computer-readable medium storing computer-executable instructions. The computer-executable instructions, when executed, configure a processor to perform any of the methods described herein.

As described above, a technique referred to as retrieval augmented generation (RAG) has been developed to allow LLMs to generate accurate responses to queries that relate to subject matter that does not form part of the LLM's training dataset. In RAG, a query is first sent to an IR system to retrieve information from an external knowledge base (external to the data used to train the LLM) which comprises, for example, documents etc. related to a specific domain and/or an enterprise's internal documents etc.; then the retrieved information and the original query are provided to an LLM along with instructions to generate a response to the query based on the provided information. In this way the external knowledge is used to enhance the LLM's output without having to re-train the LLM.

Described herein are enhanced LLM-based RAG systems and methods for automatically generating a response to a query from a corpus of documents. Specifically, in the methods and systems described herein, an LLM is used to generate synthetic information related to the query; an amended query is generated from the synthetic information; an information retrieval system is used to retrieve, from a plurality of chunks (each of which is all or a portion of a document in the corpus of documents), a set of chunks that are relevant to the amended query; an LLM is used to rank the set of chunks based on their relevance to the query; a subset of the set of chunks is selected based on the ranking; and an LLM is used to generate a response to the query based on the subset of chunks. The systems and methods described herein leverage LLMs to provide an improved RAG system.

Reference is now made to FIG. 1, which illustrates a block diagram of an example computing system 100, in accordance with at least some embodiments.

Computing system 100 comprises a source database system 110, an enterprise data provisioning platform (EDPP) 120 operatively coupled to the source database system 110, and a cloud-based computing cluster 130 that is operatively coupled to the EDPP 120. In some cases, this computing system 100 is provided for automatically generating a response to a user query from information in a corpus of documents. In some cases, the documents in the corpus of documents are files that include text. In some cases, different data formats of documents or files (or both), and which include text, can be used in the computing system described herein.

Source database system 110 has one or more databases, of which three are shown for illustrative purposes: database 112a, database 112b and database 112c. One or more of the databases of the source database system 110 may contain confidential information that is subject to restrictions on export. One or more export modules 114a, 114b, 114c may periodically (e.g., daily, weekly, monthly, etc.) export data from the databases 112a, 112b, 112c to EDPP 120. In some instances, the data is exported on an ad hoc basis.

EDPP 120 receives source data exported by the export modules 114a, 114b, 114c of source database system 110, processes it and exports the processed data to an application database within the cloud-based computing cluster 130. For example, a parsing module 122 of EDPP 120 may perform extract, transform and load (ETL) operations on the received source data.

In many environments, access to the EDPP may be restricted to relatively few users, such as administrative users. However, with appropriate access permissions, data relevant to a document or group of documents (e.g., a client document) may be exported via reporting and analysis module 124 or an export module 126a, 126b, 126c. In particular, parsed data can then be processed and transmitted to the cloud-based computing cluster 130 by a reporting and analysis module 124. Alternatively, one or more export modules 126a, 126b, 126c can export the parsed data to the cloud-based computing cluster 130.

In some cases, there may be confidentiality and privacy restrictions imposed by governmental, regulatory, or other entities on the use or distribution of the source data. These restrictions may prohibit confidential data from being transmitted to computing systems that are not “on-premises” or within the exclusive control of an organization, for example, or that are shared among multiple organizations, as is common in a cloud-based environment. In particular, such privacy restrictions may prohibit the confidential data from being transmitted to distributed or cloud-based computing systems, where it can be processed by machine learning systems, without appropriate anonymization or obfuscation of personal identifiable information (PII) in the confidential data. Moreover, such “on-premises” systems typically are designed with access controls to limit access to the data, and thus may not be resourced or otherwise suitable for use in broader dissemination of the data. In some cases, to comply with such restrictions, one or more modules of EDPP 120 may “de-risk” data tables that contain confidential data prior to transmission to cloud-based computing cluster 130. In some cases, this de-risking process may obfuscate or mask elements of confidential data, or may exclude certain elements, depending on the specific restrictions applicable to the confidential data. The specific type of obfuscation, masking or other processing is referred to as a “data treatment.”
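A per-field data treatment of the kind described can be sketched as a small function applied to each row before export. The treatment names (`"mask"`, `"hash"`, `"exclude"`, `"keep"`), the `de_risk` helper and the sample fields are all hypothetical; a truncated hash is shown only as one form of obfuscation and is not by itself a guarantee of anonymization.

```python
import hashlib


def de_risk(row, treatments):
    """Apply a data treatment to each field of a row (a dict).
    Fields without a listed treatment are kept unchanged."""
    out = {}
    for field, value in row.items():
        treatment = treatments.get(field, "keep")
        if treatment == "exclude":
            continue  # drop the element entirely
        if treatment == "mask":
            out[field] = "*" * len(str(value))  # hide the value, keep its length
        elif treatment == "hash":
            digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
            out[field] = digest[:12]  # obfuscated but stable token
        else:
            out[field] = value
    return out


treated = de_risk(
    {"name": "Ann", "account": "123", "note": "hi"},
    {"name": "mask", "account": "exclude"},
)
```

Keeping the treatments in a mapping separate from the data makes it straightforward to vary them according to the restrictions that apply to each table.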

The cloud-based computing cluster 130 includes an interface 188, which facilitates data communication with one or more of the client devices 190.

In some environments, the EDPP may be omitted.

Reference is now made to FIG. 2, which illustrates an example implementation of the cloud-based computing cluster 130 of FIG. 1. In the example shown in FIG. 2 the cloud-based computing cluster 130 comprises a data ingestor 202 for receiving a set of documents 204, a document repository 206 for storing the received set of documents 204, and a pipeline 208 for automatically generating a response 210 to a user query 212 based on the set of documents 204 using one or more LLMs 214, 216, 218 and an IR system 220. Specifically, the pipeline 208 is configured to receive a user query 212; generate, using an LLM 214, a modified user query based on synthetic data generated from the original user query; use the IR system 220 to obtain a set of document chunks relevant to the modified user query; use an LLM 216 to re-rank the set of document chunks based on their relevance to the original user query and select a subset of the set of document chunks based on the ranking; and use an LLM 218 to generate a response to the original user query based on the subset of document chunks. In some cases, one or more components of the cloud-based computing cluster 130 may be implemented by one or more computers within the cloud-based computing cluster. In some cases, one or more components of the cloud-based computing cluster 130 may be implemented as virtual machines within the cloud-based computing cluster.
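The order of operations in the pipeline, and in particular which stage sees the modified query and which sees the original query, can be sketched with injected callables. All names below (`run_pipeline` and the stub functions) are hypothetical stand-ins; in a real system each stub would be replaced by an LLM call or a search index lookup.

```python
def run_pipeline(user_query, modify_query, retrieve, rerank, generate, k=3):
    modified = modify_query(user_query)        # query modification LLM
    candidates = retrieve(modified)            # IR system uses the MODIFIED query
    ranked = rerank(user_query, candidates)    # re-ranker uses the ORIGINAL query
    return generate(user_query, ranked[:k])    # generation uses the ORIGINAL query


# Stub components, for illustration only.
CHUNKS = ["fees for wire transfers", "branch opening hours", "wire transfer limits"]


def stub_modify(query):
    return query + " wire transfer fees"


def stub_retrieve(query):
    terms = set(query.split())
    return [c for c in CHUNKS if terms & set(c.split())]


def stub_rerank(query, chunks):
    terms = set(query.split())
    # sorted() is stable, so ties keep their retrieval order.
    return sorted(chunks, key=lambda c: -len(terms & set(c.split())))


def stub_generate(query, chunks):
    return f"Q: {query} | context: {chunks}"


response = run_pipeline(
    "wire cost?", stub_modify, stub_retrieve, stub_rerank, stub_generate, k=1
)
```

Passing the components in as arguments also reflects the point that one LLM may serve several of these roles: the same callable could back more than one parameter.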

The data ingestor 202 is configured to receive from, for example, the EDPP 120, a set of documents 204 and store the received set of documents in the document repository 206. The set of documents 204 comprises a corpus of documents that comprise information from which answers to user queries can be found. In some cases, the set of documents 204 may represent a set of web pages. The web pages may include an enterprise's internal web pages and/or external web pages. In such cases, there may be a document (or file) per web page. Where the documents represent web pages the documents may be in HTML (Hyper Text Markup Language) format, or they may be in a different format, such as a markdown format. In some cases, the documents may be received at the data ingestor 202 in an original format (e.g., HTML format) and converted, by a format converter (not shown), to another format, such as a markdown format. Converting a document in HTML format to a markdown format removes HTML-related characteristics that are not relevant to human understanding, which may help prevent the LLMs 216, 218 from misinterpreting the HTML code. Thus, markdown is a simpler format than HTML and may help improve an LLM's understanding of the document. Where the received documents are converted to another format at the cloud-based computing cluster 130, the set of documents may be stored in the document repository 206 in only the converted format or in both the original format (e.g., HTML) and the converted format.

The document repository 206 is a storage device or set of storage devices that can be used to store digital or electronic data, including digital or electronic documents. The document repository 206 is designed to store the received set of documents but may also be used to store other electronic information or data.

The pipeline 208 is configured to receive a user query 212 and automatically generate a response 210 thereto based on the content of the set of documents 204. The pipeline 208 comprises a chunking module 222, a query modification LLM 214, an information retrieval (IR) system 220, a re-ranker LLM 216 and a generation LLM 218. In the example of FIG. 2 the query modification LLM 214, the re-ranker LLM 216 and the generation LLM 218 are shown as different LLMs, however, in other examples, two or more of the LLMs 214, 216, 218 may be combined. In other words, in other examples, a single LLM may perform the functions described as being performed by two or more of the query modification LLM 214, the re-ranker LLM 216 and the generation LLM 218. For example, a single LLM may perform the re-ranker LLM 216 and the generation LLM 218 functions.

The chunking module 222 is configured to subdivide or partition each document in the set of documents 204 into one or more portions or chunks 224. Each portion or chunk 224 comprises all or a subset of a document in the set of documents 204. The process of subdividing a document into smaller portions or chunks may be referred to as chunking. The chunks 224 for the set of documents 204 may be stored in the document repository 206. Since one or more of the documents may be large, chunking the set of documents 204 may help the pipeline 208 extract relevant content and therefore improve both the retrieval performed by the information retrieval system 220 and the response generation performed by the generation LLM 218, making them more precise and relevant.

In some cases, the chunking module 222 may segment the text in a given document into portions or chunks of text. In some cases, semantic chunking is used to segment the text. In other cases, document-based chunking is used to segment the text, which identifies and uses a structure of a document (e.g., headers, paragraphs or spaces). Other examples of chunking computations include recursive chunking and fixed-sized chunking. For example, the chunks may be selected so as not to exceed a certain size, so that they fit within the context window of the re-ranker LLM 216 and/or the generation LLM 218. In other examples, combinations of these chunking methods may be used. Other currently known and future known chunking computations can be used by the chunking module 222. The chunking module 222 may be configured to receive the set of documents 204 from the data ingestor 202 or the chunking module 222 may be configured to retrieve the set of documents 204 from the document repository 206.
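A combination of document-based and fixed-size chunking can be sketched as follows. The assumptions are stated plainly: paragraphs are taken to be separated by blank lines, chunk size is measured in words rather than tokens, and `chunk_by_paragraph` is a hypothetical helper rather than the chunking module of the disclosure.

```python
def chunk_by_paragraph(text, max_words):
    """Pack whole paragraphs into chunks of at most max_words words.
    A paragraph longer than the cap falls back to fixed-size splitting."""
    chunks, current = [], []
    for para in text.split("\n\n"):
        words = para.split()
        if not words:
            continue
        if len(words) > max_words:
            # Flush what has been collected, then split the long paragraph.
            if current:
                chunks.append(" ".join(current))
                current = []
            for i in range(0, len(words), max_words):
                chunks.append(" ".join(words[i:i + max_words]))
            continue
        if len(current) + len(words) > max_words:
            # Adding this paragraph would exceed the cap: start a new chunk.
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


doc = "a b c\n\nd e\n\nf g h i j k l"
pieces = chunk_by_paragraph(doc, max_words=5)
```

Calling the same function with two different caps would yield chunk sets of a smaller and a larger size from the same documents.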

The query modification LLM 214 is used to perform query expansion on a user query 212 to generate a modified query 226. Query expansion is a technique in which a query is changed or modified to include additional information to improve the quality of the query. Query expansion can overcome issues with the original query such as, but not limited to, missing keywords, ambiguity or specificity. By incorporating terms and concepts that did not exist in the original query, query expansion can more clearly capture the meaning and context of the user's request, which can result in more relevant documents being retrieved by the information retrieval system 220.

Specifically, the query modification LLM 214 receives the user query 212 and a query modification (QM) prompt 228 which instructs the query modification LLM 214 to generate synthetic information related to the user query 212. A modified query 226 is then generated from the synthetic information. The query modification prompt 228 may be configured to instruct the query modification LLM 214 to generate any suitable synthetic information related to the query 212. For example, in some cases, the query modification prompt 228 may be configured to instruct the query modification LLM 214 to generate a set of keywords for the query 212. An example of such a prompt is shown below.

Provide a set of keywords for the following query: {query}

In other cases, the query modification prompt 228 may be configured to instruct the query modification LLM 214 to: generate a passage that answers the user query 212, wherein the synthetic information is the passage; provide a concise rationale to the user query 212 and think step by step, wherein the synthetic information is the rationale; or generate an answer to the user query 212 and give the rationale, wherein the rationale is the synthetic information.

In yet other cases, the query modification LLM 214 may be provided with additional information that aids in generating the synthetic information. For example, in some cases, prior to providing the query modification prompt 228 and the query 212 to the query modification LLM 214, the query 212 may be provided to the information retrieval system 220 to retrieve the document closest to the query 212. Then, the query 212, the retrieved document, and a query modification prompt 228 are provided to the query modification LLM 214, wherein the query modification prompt 228 instructs the query modification LLM 214 to generate the synthetic information (e.g., keywords, passage, rationale) given the context of the returned document. It will be evident that these are examples only and that the query modification prompt 228 may be configured to instruct the query modification LLM 214 to generate any suitable synthetic information related to the original user query 212. The inventors have determined that generating a set of keywords works well in many cases.

In some cases, the modified user query 226 is generated from the generated synthetic information by combining the original user query 212 and the synthetic information generated by the query modification LLM 214. For example, the original user query 212 and the synthetic information generated by the query modification LLM 214 (e.g., the keywords, passage or rationale generated by the query modification LLM 214) may be concatenated. In other cases, the modified user query is generated by replacing the original user query 212 with the synthetic information. In other words, in these cases, only the synthetic information forms part of the modified user query 226.
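The two ways of forming the modified query described above (concatenation and replacement) can be sketched as follows. This is an illustrative sketch only: expand_query and fake_llm are hypothetical names, and the llm argument is assumed to be any callable that takes a prompt string and returns generated text. The prompt template is the keyword prompt given as an example earlier.

```python
from typing import Callable

# Keyword prompt taken from the example query modification prompt.
QM_PROMPT = "Provide a set of keywords for the following query: {query}"


def expand_query(query: str, llm: Callable[[str], str], mode: str = "concat") -> str:
    """Build a modified query from LLM-generated synthetic information."""
    synthetic = llm(QM_PROMPT.format(query=query)).strip()
    if mode == "concat":
        # Original query followed by the synthetic information.
        return f"{query} {synthetic}"
    if mode == "replace":
        # Only the synthetic information forms the modified query.
        return synthetic
    raise ValueError(f"unknown mode: {mode!r}")


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption for demonstration)."""
    return "PIN, debit card, reset"


print(expand_query("How do I change my PIN?", fake_llm))
```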

In some cases, the query modification prompt 228 causes the query modification LLM 214 to generate the modified query from the generated synthetic information. However, in other examples, another module, such as a modified query generation module (not shown), may be configured to receive the original user query 212 and the synthetic information generated by the query modification LLM 214 and generate the modified user query 226 therefrom.

The user query 212 may be received from a user via, for example, a user interface 230. In some cases, the user query 212 is provided by a client device 190 that is connected over a data communication link 232 to the user interface 230. For example, a user may input a query 212 via a web browser 234 or some other application that operates on the client device 190. In particular, when the user accesses a certain web page via the web browser 234, they may be provided with a text field or the like where the user can enter the query 212. FIG. 3 illustrates an example web page 300 which comprises a text field 302 in which the user can input their query. Once the user has entered their query in the text field 302, the user can press or otherwise activate the submit button 304 to send the query to the cloud-based computing cluster 130 for processing.

LLMs are a class of machine learning models that have been trained on massive amounts of data so that they can understand and generate natural language.

The query modification LLM 214 may be implemented by any LLM that can generate synthetic data for a query. Example LLMs which may be used to implement the query modification LLM 214 include, but are not limited to, a Microsoft Azure™ OpenAI LLM (e.g., a GPT-4o, GPT-4 Turbo, GPT-4, or GPT-3.5 Turbo model).

Returning to FIG. 2, the information retrieval system 220 is configured to receive the modified query 226 and identify and retrieve a set of chunks 236 (from the document chunks 224) that are relevant to the modified query 226.

An information retrieval (IR) system is a system that can identify and retrieve documents in a corpus of documents that are relevant to a query by comparing the query (or a representation thereof) to each document (or a representation thereof). An information retrieval system generally starts by creating a search index of the documents in the corpus of documents. Indexing a set of documents is the process of organizing and categorizing documents in a way that makes them easily searchable. The search index generally comprises searchable fields, which represent information in the documents. There are many different techniques which may be used to index a set of documents.

Once the index has been generated, documents relevant to a query are identified by comparing the query (or a representation of the query) to the searchable fields in the search index; generating a relevance score for the documents based on the comparisons; and selecting one or more documents as being relevant to the query based on the relevance score. For example, the information retrieval system may select the k documents with the best relevance scores.

One example technique for indexing a set of documents is tokenization. In tokenization, a tokenizer divides the text in each field of each document into tokens (e.g., each token may represent a single word) and may discard some characters, such as punctuation. An optional token filter may then be used to manipulate the generated tokens. A token filter may be used to, for example: normalize the tokens (e.g., all text may be converted to lowercase); remove stopwords such as “the”, “and” and “is”; and/or split some tokens (e.g., tokens that represent phone numbers) into smaller tokens. The tokens may then be stored in an inverted index, which allows for fast, full-text search. An inverted index enables full-text search by mapping each unique term to the documents in which it was found. There may be an inverted index for each searchable field. So, if there is a title search field and a document search field, there may be an inverted index for each field. When the search index is generated via tokenization, documents relevant to a query are identified by performing simple or full text queries on the inverted indexes. This may comprise parsing the query to identify terms and operations. The inverted indexes are then searched to find matching terms and each match is assigned a relevance score. The result set is then sorted based on the relevance score assigned to each matching document. The relevance score may be based on statistical properties of the terms that match. For example, in some cases the relevance scores (and thus the ranking) of the documents may be determined in accordance with the Best Match 25 (BM25) algorithm. BM25 is a ranking algorithm that ranks a set of documents based on the query terms appearing in each document, regardless of their proximity within the document.
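The tokenization steps above (tokenize, filter, build an inverted index, score with BM25) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the tiny stopword list, and the parameter values k1 = 1.2 and b = 0.75 are chosen for the example, and the inverse document frequency uses the common non-negative variant ln((N - n + 0.5) / (n + 0.5) + 1). A production search engine would be considerably more elaborate.

```python
import math
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "and", "is"}  # illustrative stopword list


def tokenize(text: str) -> list[str]:
    """Lowercase, keep alphanumeric runs (dropping punctuation), remove stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def build_index(docs: list[str]) -> tuple[dict[str, dict[int, int]], list[int]]:
    """Return an inverted index (term -> {doc_id: term frequency}) and document lengths."""
    inverted: dict[str, dict[int, int]] = defaultdict(dict)
    lengths: list[int] = []
    for doc_id, text in enumerate(docs):
        tokens = tokenize(text)
        lengths.append(len(tokens))
        for term, tf in Counter(tokens).items():
            inverted[term][doc_id] = tf
    return inverted, lengths


def bm25_search(query, inverted, lengths, k=3, k1=1.2, b=0.75):
    """Return the k best (doc_id, score) pairs for the query, best first."""
    n_docs = len(lengths)
    avg_len = sum(lengths) / n_docs if n_docs else 0.0
    scores: dict[int, float] = defaultdict(float)
    for term in set(tokenize(query)):
        postings = inverted.get(term, {})
        idf = math.log((n_docs - len(postings) + 0.5) / (len(postings) + 0.5) + 1.0)
        for doc_id, tf in postings.items():
            norm = tf + k1 * (1 - b + b * lengths[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / norm
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


docs = [
    "Transfer money between a checking and savings account.",
    "Reset the PIN on a debit card.",
    "Savings account interest rates.",
]
inverted, lengths = build_index(docs)
print(bm25_search("transfer to savings account", inverted, lengths, k=2))
```

In this toy corpus the first document matches three query terms and is ranked ahead of the third, which matches two; the second document matches none and is not returned.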

Another example technique which may be used to index a set of documents is vectorization. In vectorization each document (or each chunk of a document) is converted or transformed, by an embedding model, into a plurality of embeddings which are stored as a multi-dimensional vector. The multi-dimensional vector is an array of (floating point) numbers that captures the semantic meaning of the document (or the chunk of a document). In other words, the multi-dimensional vector is a numeric representation of the content of a document. The multi-dimensional vector can be understood as defining a point in multi-dimensional space, and the distance between two vectors indicates the semantic similarity between the respective documents/queries from which the vectors were generated. Different embedding models may generate a different number of embeddings. For example, the text-embedding-ada-002 embedding model generates 1,536 embeddings for each input (e.g., each chunk).

Different embedding models are also designed to be good at different tasks. For example, a similarity embedding model is good at capturing the semantic similarity between texts; a text search embedding model, such as text-embedding-ada-002, is good at determining whether a long document is relevant to a short query. Since the objective of the information retrieval system 220 of FIG. 2 is to identify documents/chunks that are relevant to an input query, it may be beneficial to use a text search embedding model, such as, but not limited to, text-embedding-ada-002.

The generated vectors are stored in the search index as a searchable field. When the search index is generated by vectorization, documents relevant to a query can be identified by converting the query into a plurality of embeddings (i.e., a multi-dimensional vector), using the same embedding model used to generate the document/chunk embeddings, and comparing the query multi-dimensional vector to the document/chunk multi-dimensional vectors to find the document/chunk multi-dimensional vectors that are closest to the query multi-dimensional vector. In some cases, the closest vectors can be found using the Hierarchical Navigable Small World (HNSW) algorithm or exhaustive K-nearest neighbors (KNN).
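An exhaustive K-nearest neighbors search over stored vectors can be sketched as follows. This is an illustrative sketch only: the three-dimensional vectors are made-up stand-ins for the output of an embedding model, cosine similarity is assumed as the closeness measure, and the function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_search(query_vec: list[float], item_vecs: dict[str, list[float]], k: int = 2):
    """Compare the query vector with every stored vector and keep the k closest."""
    scored = [(item_id, cosine_similarity(query_vec, vec)) for item_id, vec in item_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for chunk embeddings (assumption for demonstration).
chunk_vectors = {
    "chunk-a": [0.9, 0.1, 0.0],
    "chunk-b": [0.0, 1.0, 0.0],
    "chunk-c": [0.7, 0.7, 0.0],
}
query_vector = [1.0, 0.0, 0.0]
print(knn_search(query_vector, chunk_vectors, k=2))
```

Exhaustive search compares the query with every vector, so its cost grows linearly with the number of chunks; approximate methods such as HNSW trade some accuracy for speed on large indexes.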

In some cases, tokenization and vectorization may be used in combination. For example, both tokenized search fields and vectorized search fields may be generated and a search may be performed on both types of fields in parallel. The result for an individual document/chunk may be based on the combination of the text search results and the vector search results.
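The text does not specify how the text search results and the vector search results are combined. One commonly used option, shown here purely as an assumption, is reciprocal rank fusion (RRF), in which each item receives 1 / (c + rank) from every ranking it appears in. The function name and the constant c = 60 are illustrative choices.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], c: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of item IDs into one list, best first."""
    fused: dict[str, float] = {}
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            fused[item_id] = fused.get(item_id, 0.0) + 1.0 / (c + rank)
    return sorted(fused.items(), key=lambda pair: pair[1], reverse=True)


text_ranking = ["chunk-2", "chunk-1", "chunk-3"]    # e.g., from a token-based search
vector_ranking = ["chunk-1", "chunk-4", "chunk-2"]  # e.g., from a vector search
for item_id, score in reciprocal_rank_fusion([text_ranking, vector_ranking]):
    print(item_id, round(score, 5))
```

Items that appear near the top of both lists accumulate the highest fused scores, while items found by only one search still remain candidates.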

Accordingly, the information retrieval system 220 of FIG. 2 is configured to index the chunks 224 generated from the documents 204 using any suitable method to generate a search index (e.g., tokenization, vectorization, a combination of tokenization and vectorization, etc.). The indexing may be performed off-line (i.e., prior to receiving user queries) and may only be performed initially and, optionally, after a change to the set of documents 204, instead of being performed for each query. The information retrieval system 220 is then configured to receive the modified query 226 and identify and retrieve a set of chunks 236 that are relevant to the modified query 226 by searching the search index. Specifically, the information retrieval system 220 is configured to retrieve a certain number of chunks that are most similar to the modified query 226. The number of chunks that are retrieved may be configurable. Example information retrieval systems which may be used to implement the information retrieval system 220 of FIG. 2 are described below with respect to FIGS. 6, 7 and 9.

While information retrieval systems are very efficient and effective at organizing and sorting through a large corpus of documents, they may not be able to accurately rank the documents they retrieve. Accordingly, the re-ranker LLM 216 is used to rank the set of chunks 236 retrieved by the information retrieval system based on their relevance to the original user query 212. It has been shown that prompting general-purpose LLMs, such as, but not limited to, GPT-3.5, to re-rank documents can achieve top zero-shot performance. A subset of chunks 240 from the set of chunks is then selected based on the ranking. For example, the top k ranked chunks may be selected to form the subset, wherein k is an integer greater than 1. Thus, the subset of chunks 240 may comprise the k chunks most relevant to the original query, according to the re-ranker LLM 216. In some examples, k may be 3. However, it will be evident that this is just an example. It is noted that the variable k is used numerous times throughout this document as a generic variable and, in each instance in which the variable is used, it may be set to a different value. For example, the number of document chunks that are retrieved by the information retrieval system may be different from the number of document chunks that are selected after re-ranking.

Specifically, the re-ranker LLM 216 is provided the original user query 212, the set of chunks 236 retrieved by the information retrieval system 220 and one or more re-ranker (RR) prompts 238 which instruct the re-ranker LLM 216 to rank the set of chunks 236 based on their relevance to the original user query 212. The output of the re-ranker LLM 216 in response to the one or more RR prompts 238 is a ranking of the documents in the set of chunks 236.

The one or more re-ranker (RR) prompts 238 may be configured to cause the re-ranker LLM 216 to perform the ranking in any suitable manner. In some cases, the re-ranker prompt(s) 238 may be configured to cause the re-ranker LLM 216 to perform listwise ranking. In listwise ranking the LLM is provided with all of the chunks to be ranked at the same time. Each chunk is identified by a unique identifier like [1], [2], etc. The re-ranker prompt 238 then instructs the re-ranker LLM 216 to generate a ranked permutation of these documents such as [2] > [3] > [1]. The following is an example of a listwise ranking prompt.

The following are passages related to a query {{query}} [1] {{chunk_1}} [2] {{chunk_2}} (more passages) Rank these passages based on their relevance to the query.
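A listwise response of the form [2] > [3] > [1] has to be converted back into an ordering of chunks before the top k can be selected. The following minimal sketch shows one way this might be done; the function name is hypothetical, and the handling of repeated, out-of-range or omitted identifiers (dropping the first two and appending omitted chunks in their original order) is an assumption rather than something specified in the text.

```python
import re


def parse_listwise_ranking(llm_output: str, num_chunks: int) -> list[int]:
    """Return 1-based chunk identifiers, most relevant first."""
    order: list[int] = []
    for match in re.findall(r"\[(\d+)\]", llm_output):
        idx = int(match)
        # Ignore identifiers that are out of range or already seen.
        if 1 <= idx <= num_chunks and idx not in order:
            order.append(idx)
    # Append any chunk the LLM left out, keeping the original order.
    missing = [i for i in range(1, num_chunks + 1) if i not in order]
    return order + missing


ranking = parse_listwise_ranking("[2] > [3] > [1]", num_chunks=3)
top_k = ranking[:2]  # select the k = 2 highest-ranked chunks
print(ranking, top_k)
```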

In other cases, the one or more RR prompts 238 may be configured to implement pairwise ranking prompting (PRP). PRP has proven to be an efficient method for an LLM to rank a plurality of documents by relevance to a query. As its name suggests, pairwise ranking prompting involves prompting the LLM to compare and rank pairs of documents. The results of the pairwise rankings are then used to generate a final ranking of the documents.

In one implementation of PRP, each document is individually ranked against each other document. A score is then assigned to each document based on the outcome of the pairwise rankings. The scores assigned to the documents are then used to rank the documents. For example, since LLMs may be sensitive to text orders in prompts, for each pair of documents d1 and d2, two rankings may be performed by the re-ranker LLM 216 (i.e., a ranking of d1 and d2, and a ranking of d2 and d1). If both rankings produce a consistent result (e.g., both rankings indicate that d1 is more relevant than d2 to a query) then the identified document may be allocated 1 point and the unidentified document is not allocated any points. In contrast, if the rankings produce inconsistent results (e.g., one ranking indicates that d1 is more relevant than d2 to a query, and the other ranking indicates that d2 is more relevant than d1 to the query) then each document may be allocated 1 point. The total score for a document may then be the sum of the points allocated to that document. The documents can then be ranked based on their total scores.
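The all-pairs scoring described above can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, and prefer stands in for a call to the re-ranker LLM that returns "A" if the first document shown is more relevant and "B" otherwise. The points awarded on an inconsistent result are exposed as a parameter, with the default following the text.

```python
from itertools import combinations
from typing import Callable


def prp_allpair_scores(
    docs: list[str],
    prefer: Callable[[str, str], str],
    tie_points: float = 1.0,
) -> list[float]:
    """Score each document by comparing every pair in both presentation orders."""
    scores = [0.0] * len(docs)
    for i, j in combinations(range(len(docs)), 2):
        i_wins_first = prefer(docs[i], docs[j]) == "A"   # docs[i] shown as Document A
        i_wins_second = prefer(docs[j], docs[i]) == "B"  # order swapped
        if i_wins_first and i_wins_second:
            scores[i] += 1.0          # consistent: docs[i] preferred
        elif not i_wins_first and not i_wins_second:
            scores[j] += 1.0          # consistent: docs[j] preferred
        else:
            scores[i] += tie_points   # inconsistent: both receive points
            scores[j] += tie_points
    return scores


def longer_is_better(a: str, b: str) -> str:
    """Toy comparator standing in for the LLM (assumption for demonstration)."""
    return "A" if len(a) >= len(b) else "B"


docs = ["aa", "a", "aaa"]
scores = prp_allpair_scores(docs, longer_is_better)
ranked = [docs[i] for i in sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)]
print(scores, ranked)
```

Each pair is compared twice, so N documents require N(N - 1) comparator calls.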

While the described implementation of PRP is simple to implement, is prompt order independent, and has proven to be quite effective, it requires O(N^2) prompts/calls to the re-ranker LLM 216 per query, where N is the number of documents to be ranked for a query. Accordingly, in some cases PRP may be implemented in another manner. For example, a pairwise sorting algorithm, such as, but not limited to, heap sort or bubble sort, may use the output of a pairwise ranking from the re-ranker LLM 216 as a comparator for the sorting algorithm. With an efficient sorting algorithm such as heap sort, this reduces the number of prompts/calls to the re-ranker LLM 216 to O(N log N). In another example, a sorting window approach may be used, which starts at the bottom of a list and compares and swaps documents with a stride of 1 based on the output of a pairwise ranking from the re-ranker LLM 216.
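Using a pairwise ranking as the comparator of a sorting algorithm can be sketched as follows. Note that this sketch uses Python's built-in sort (Timsort) rather than heap sort or bubble sort; it is swapped in here because it also needs only O(N log N) comparisons in the worst case. The names are hypothetical, and prefer again stands in for a re-ranker LLM call returning "A" or "B".

```python
from functools import cmp_to_key
from typing import Callable


def sort_by_pairwise(docs: list[str], prefer: Callable[[str, str], str]) -> tuple[list[str], int]:
    """Sort docs from most to least relevant; also report how many comparisons were made."""
    calls = 0

    def compare(a: str, b: str) -> int:
        nonlocal calls
        calls += 1
        # A negative result places a before b, i.e. a is more relevant.
        return -1 if prefer(a, b) == "A" else 1

    ranked = sorted(docs, key=cmp_to_key(compare))
    return ranked, calls


def longer_is_better(a: str, b: str) -> str:
    """Toy comparator standing in for the LLM (assumption for demonstration)."""
    return "A" if len(a) >= len(b) else "B"


ranked, calls = sort_by_pairwise(["aa", "a", "aaaa", "aaa"], longer_is_better)
print(ranked, calls)
```

An LLM comparator may give inconsistent answers across pairs; the sort still terminates, but the resulting order should then be treated as approximate.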

Causing the re-ranker LLM 216 to rank a pair of documents (A, B) with respect to a query (Q) may comprise providing the re-ranker LLM 216 with a pair ranking few-shot prompt that comprises one or more example (Q, A, B, answer) quadruples, and instructions for the re-ranker LLM 216 to determine whether A or B is more relevant to Q. An example pair ranking few-shot prompt is shown below.

Given the following question and documents, please generate which document is more relevant for answering the query. The output should be only A or B. Query : {{Example Query}} Document A : {{Example Document A}} Document B : {{Example Document B}} Answer : {{A or B}} Now your turn : Query : {{Synthetic Query}} Document A : {{ Document A}} Document B : {{ Document B}} Answer : {{A or B}}

There are benefits and drawbacks related to each ranking technique described above. For example, pairwise ranking can be performed efficiently since the pairwise rankings can be performed in parallel, but performing a comparison between each document pair can be computationally expensive. Furthermore, since in pairwise ranking the re-ranker LLM 216 only considers two documents at a time without information about the other documents, it may not be able to effectively rank all the documents. In contrast, listwise ranking allows the re-ranker LLM 216 to see all the documents at the same time, but a re-ranker LLM 216 may struggle to perform listwise ranking on larger sets of documents. Testing has shown that listwise ranking can be effectively performed by closed-source LLMs, such as, but not limited to, GPT-4.

In other cases, the one or more RR prompts 238 may be configured to cause the re-ranker LLM 216 to perform the ranking in another manner. For example, the one or more RR prompts 238 may be configured to cause the re-ranker LLM 216 to perform pointwise ranking. See also the examples provided in relation to FIG. 10.

In some cases, the one or more re-ranker prompts 238 may cause the re-ranker LLM 216 to, in addition to ranking the documents, select the subset of chunks 240 based on the ranking. However, in other examples, another module, such as a subset selection module (not shown), may be configured to receive the ranking of the set of chunks generated by the re-ranker LLM 216 and select the subset of chunks 240 based on the ranking.

As noted above, LLMs are a class of machine learning models that have been trained on massive amounts of data so that they can understand and generate natural language. The re-ranker LLM 216 may be implemented by any LLM that can perform re-ranking of a set of passages. In some cases, the re-ranker LLM 216 may be implemented by a Microsoft Azure™ OpenAI LLM (e.g., a GPT-4o, GPT-4 Turbo, GPT-4, or GPT-3.5 Turbo model). In some cases, the LLM used to implement the re-ranker LLM 216 may be selected based on the ranking technique implemented. For example, GPT-4 has proven to perform pairwise ranking efficiently. In some cases, the re-ranker LLM 216 may be an LLM that has been specifically trained or fine-tuned for re-ranking.

Once the subset of chunks 240 has been selected from the ranking of the set of chunks 236, the generation LLM 218 is used to generate a response 210 to the original query 212 based on the subset of chunks 240. Specifically, the generation LLM 218 is provided with the subset of chunks 240, the original query 212 and a generation (GEN) prompt 242 which instructs the generation LLM 218 to generate a response 210 to the original user query 212 based on the subset of chunks 240. The response 210 may be free-form text that attempts to answer the original user query 212. An example generation prompt 242 is shown below.

Given the following query and passages, please generate a summarized response to the query using the text of the passages. Keep your answer grounded in the facts of the passages. Query: {query} Passage 1: {chunk 1} Passage 2: {chunk 2} Passage 3: {chunk 3}

The response 210 generated by the generation LLM 218 may be provided to a user (e.g., the user that input the original query). In some cases, the response 210 is provided to a client device 190 via the user interface 230. For example, in response to the user inputting the original user query 212 in a web browser 234 or some other application that operates on the client device 190, the response 210 to the query 212 may be provided to the web browser 234, e.g., via a web page. FIG. 4 illustrates an example web page 400, which represents the web page 300 of FIG. 3 after the user has submitted, via text field 302 and button 304, an original query 212 related to how to transfer $140,000 from a client's checking account to her savings account. It can be seen that the web page 400, relative to the web page 300 of FIG. 3, comprises an additional response window 402 in which the response 210 generated by the generation LLM 218 to the original query 212 is displayed. In this example, the response 210 provides step by step instructions on how to implement the transfer. In some cases, the response 210 may comprise a list of documents which were relied on to generate the response 210; in other words, the response 210 may comprise citations. For example, the response 210 shown in FIG. 4 lists a single reference document 404, i.e., document “EBKM231147403918758.md”.

In some cases, each reference document may be hyperlinked in the response 210 such that if the user clicks on, or otherwise selects, the reference document, they will be presented with the full text of the reference document. For example, FIG. 5 illustrates an example of a web page 500, which represents the web page 400 of FIG. 4 after the user has clicked on the single reference document 404. It can be seen that the web page 500 of FIG. 5 comprises an additional reference document window 502 which displays the content of the reference document (i.e., document “EBKM231147403918758.md”). In some cases, where the documents are originally in one format (e.g., HTML) and are converted into another format (e.g., markup format) before they are chunked and imported into the information retrieval system, when the user clicks on a reference document in the response window 402, they may be first presented with the document in the converted format (e.g., markup format) in the reference document window 502 with an option to view the document in its original format (e.g., HTML).

Once the user has received the response 210 to the query 212, the user may review the response 210 (and optionally the citations) to determine if the response 210 provides an acceptable and/or appropriate answer to the query 212. If the user determines that the response 210 does not provide an acceptable and/or appropriate answer to the query 212, the user may reformulate the query and resubmit the query to the cloud-based computing device for processing, or the user may manually search the corpus of documents for an answer to the query. If, however, the user determines that the response 210 is acceptable and/or appropriate, the user may take an action based on the response 210. For example, if the response provides information on how to resolve a customer query, the user may instruct the customer on how to resolve the query based on the response, or the user may provide the information in the response to another person (e.g., another employee of the enterprise with which the user is associated) who may then instruct the customer on how to resolve the customer's query.

In some cases, prior to providing the response 210 to the user, an LLM (one of the LLMs in FIG. 2 or a different LLM) may be used to determine whether the response is supported by documents corresponding to the subset of chunks.

Reference is now made to FIG. 6 which illustrates a first example information retrieval system 600, which may be used to implement the information retrieval system 220 of FIG. 2. While the information retrieval system 600 may form part of the pipeline 208 of FIG. 2, the information retrieval system 600 of FIG. 6 may also be used independently from the other components of the pipeline 208 of FIG. 2. The information retrieval system 600 of FIG. 6 may be implemented by one or more processors of one or more computers.

The information retrieval system 600 of FIG. 6 is configured to receive a query 602 and identify and retrieve a set of items 604 (which are shown as chunks in FIG. 6) related to that query 602. The information retrieval system 600 of FIG. 6 comprises an index engine 606, a data store 608 and a search engine 610.

The index engine 606 is configured to generate a search index 612 for items 614 that are to be searched. The items 614 may represent a knowledge base of content that can be used to answer queries. The items 614 may be, for example, documents or chunks of documents. Where the information retrieval system 600 of FIG. 6 is used in the pipeline 208 of FIG. 2, the items 614 may be chunks 224 generated from the corpus of documents 204 by the chunking module 222. As described above, generating a search index 612 for a set of items, which may also be referred to as indexing, is the process of organizing and categorizing the items in a way that makes them easily searchable. The search index 612 comprises searchable fields and optionally non-searchable fields, which represent information in or about the items 614. A searchable field is a field that is searched by the search engine 610 to identify relevant items, whereas a non-searchable field comprises other information about the item, such as, for example, information identifying the item (e.g., a unique item ID) or, where the item is a chunk, information identifying the document (e.g., a unique document ID) which the chunk forms part of. As described above, there are many different techniques for generating a search index for a set of items 614. For example, a search index may be generated by tokenization, vectorization or a combination of tokenization and vectorization. Where the search index 612 is generated by vectorization, the search index 612 may comprise, for each item, a searchable vector field which comprises a multi-dimensional vector that represents the item. Once the index engine 606 has generated the search index 612, the search index 612 may be stored in the data store 608.

The search engine 610 is configured to receive a query 602 and search the search index 612 to identify a set of items 604 that are relevant to the query 602. Where the information retrieval system 600 of FIG. 6 is used in the pipeline 208 of FIG. 2, the query 602 is the modified query 226 and the set of items 604 is a set of chunks 236. The search engine 610 is configured to identify the set of items 604 by comparing the query 602 (or a representation of the query 602) to the searchable fields in the search index 612; generating a relevance score for items based on the comparisons; and selecting one or more items as being relevant to the query 602 based on the relevance scores. For example, the search engine 610 may select the k items with the best relevance scores, wherein k is an integer greater than or equal to 1.

How the search engine 610 compares a query to the searchable fields and generates a relevance score therefrom depends on how the search index 612 was generated. For example, as described above, where the search index 612 is generated by the index engine 606 through tokenization such that the search index 612 comprises an inverted index for each field, documents relevant to a query are identified by performing simple or full text queries on the inverted indexes. This may comprise parsing the query to identify terms and operations. The inverted indexes are then searched to find matching terms and each match is assigned a relevance score. The result set is then sorted based on the relevance score assigned to each matching document. The relevance score may be based on statistical properties of the terms that match. For example, the search engine 610 may be configured to identify and retrieve the k most relevant chunks in the set of chunks according to a ranking algorithm such as, but not limited to, Best Match 25 (BM25), wherein k is an integer greater than 1. BM25 is a ranking algorithm that ranks a set of documents/chunks based on the query terms appearing in each document/chunk, regardless of their proximity within the document.

In contrast, where the search index 612 is generated through vectorization such that the search index 612 comprises a multi-dimensional vector for each item 614, items relevant to a query can be identified by converting the query into a multi-dimensional vector, using the same embedding model used to generate the item multi-dimensional vectors, and comparing the query multi-dimensional vector to the item multi-dimensional vectors to find the items with the multi-dimensional vectors that are closest to the query multi-dimensional vector. In some cases, the most similar vectors can be found through the Hierarchical Navigable Small World (HNSW) algorithm or exhaustive K-nearest neighbors (KNN).

Where the search index 612 is generated by the index engine 606 via tokenization and vectorization such that the search index 612 comprises at least one token-based search field and at least one vector search field, the search engine 610 may perform a search on both types of fields in parallel, and the result for an individual item may be based on the combination of the text search relevance score assigned to that item and the vector search relevance score assigned to that item.

The search performed by the search engine 610 identifies (e.g., via unique item numbers) the items 604 that are most relevant to the query. In some cases, the search engine 610 may simply output information that identifies the items 604 that are most relevant to the query. In other cases, the search engine 610 may retrieve the identified items 604 and provide those items 604 to the query requestor. In some cases, the data store 608 may be configured to store, in addition to the search index 612, a copy of the items 614, and the search engine 610 may be configured to retrieve the identified items (i.e., those identified as being most relevant to the query 602) from the data store 608. In other cases, the search engine 610 may have access to an item repository 616 where the items 614 are stored, and the search engine 610 may be configured to retrieve the identified items 604 from the item repository 616.

Reference is now made to FIG. 7 which illustrates a second example information retrieval system 700, which may be used to implement the information retrieval system 220 of FIG. 2. While the information retrieval system 700 may form part of the pipeline 208 of FIG. 2, the information retrieval system 700 of FIG. 7 may also be used independently from the other components of the pipeline 208 of FIG. 2. The information retrieval system 700 of FIG. 7 may be implemented by one or more processors of one or more computers.

The information retrieval system 700 of FIG. 7 is similar to the information retrieval system 600 of FIG. 6 in that the information retrieval system 700 is configured to receive a query 702 and retrieve a set of items 704 (from a knowledge base) related to that query; and the information retrieval system 700 comprises an index engine 706, a data store 708 and a search engine 710. However, instead of the index engine 706 being configured to generate a single index 712 for searching the knowledge base, the index engine 706 is configured to generate multiple search indexes 712, 718 for searching the knowledge base; and the search engine 710 is configured to first perform a search on one search index 718, and then perform a filtered search on the other search index 712 based on the results of the first search.

Specifically, instead of a corpus of documents 720 representing a knowledge base being subdivided (e.g., by a chunking module, such as the chunking module 222 of FIG. 2) into a single set of chunks 714, the corpus of documents 720 is subdivided into a first set of chunks 714 with a first size (or a first maximum size), and a second, separate, set of chunks 722 with a second size (or a second maximum size) which is larger than the first size. The first and second sets of chunks 714, 722 may be stored in an item/document repository 716 where they can be accessed by the information retrieval system 700. Each small chunk 714 and each large chunk 722 corresponds to a document 720. Each small chunk 714 and each large chunk 722 may be stored in an item/document repository 716 along with information identifying the corresponding document (i.e., the document that the chunk was generated from).
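As a rough illustration of the dual chunking described above, the following sketch subdivides each document into a set of small chunks and a separate set of large chunks, and keeps the source document identifier with every chunk. Measuring chunk size in words, the helper names `chunk_words` and `build_chunk_sets`, and the chunk ID format are assumptions made for illustration only; they are not taken from the described system.

```python
def chunk_words(text, max_words):
    """Split text into consecutive pieces of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def build_chunk_sets(corpus, small_size, large_size):
    """Return (small_chunks, large_chunks) for a corpus given as {doc_id: text}."""
    if small_size >= large_size:
        raise ValueError("the second (large) size must exceed the first (small) size")
    small_chunks, large_chunks = [], []
    for doc_id, text in corpus.items():
        # Each chunk records the document it was generated from.
        for n, piece in enumerate(chunk_words(text, small_size), start=1):
            small_chunks.append({"chunk_id": f"{doc_id}-S{n}", "doc_id": doc_id, "text": piece})
        for n, piece in enumerate(chunk_words(text, large_size), start=1):
            large_chunks.append({"chunk_id": f"{doc_id}-L{n}", "doc_id": doc_id, "text": piece})
    return small_chunks, large_chunks


# Hypothetical two-document corpus.
corpus = {"doc1": "a b c d e f", "doc2": "g h i"}
small, large = build_chunk_sets(corpus, small_size=2, large_size=4)
```

Both sets are generated from the same documents, so every small chunk and every large chunk can be traced back to its document through `doc_id`.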

The index engine 706 is then configured to generate a search index 712, 718 for each set of chunks 714, 722. Specifically, the index engine 706 is configured to generate a first search index 712 for the set of smaller chunks 714 and generate a second search index 718 for the set of larger chunks 722. Each search index 712, 718 comprises searchable fields and optionally non-searchable fields, which represent information in or about the corresponding chunks 714, 722. Preferably, each search index 712, 718 comprises one or more non-searchable fields which uniquely identify each chunk and each document that chunk is associated with.

As described above with respect to the information retrieval system 220 of FIG. 2 and the information retrieval system 600 of FIG. 6, there are many different techniques for generating a search index for a set of items (e.g., chunks) 714, 722. For example, a search index may be generated by tokenization, vectorization or a combination of tokenization and vectorization. Any of the described techniques, or any other known technique, may be used to generate the search indexes 712, 718. Where the search indexes 712, 718 are generated by vectorization, each search index 712, 718 may comprise, for each chunk in the corresponding set of chunks 714, 722, a searchable vector field which comprises a multi-dimensional vector that represents the chunk. Preferably, the two search indexes 712, 718 are generated by the same technique (e.g., both search indexes 712, 718 are generated through the tokenization technique or both are generated through the vectorization technique). Once the index engine 706 has generated the search indexes 712, 718, the search indexes 712, 718 may be stored in the data store 708.

The search engine 710 is configured to receive a query 702 and perform a multi-stage search on the two search indexes 712, 718 to identify chunks 704 in the first set of chunks 714 (i.e., small chunks) that are relevant to the query 702. Where the information retrieval system 700 of FIG. 7 is used in the pipeline 208 of FIG. 2, the query 702 is the modified query 226 and the first set of chunks 714 is the set of chunks 236.

Specifically, the search engine 710 is configured to perform a first search on the second search index 718 (i.e., the search index for the set of large chunks 722) to identify chunks in the second set of chunks 722 (i.e., large chunks) that are relevant to the query 702; and then perform a second, filtered, search on the first search index 712 (i.e., the search index for the set of small chunks 714) to identify chunks in the first set of chunks 714 (i.e., small chunks) that are relevant to the query 702, wherein the filter criteria are selected based on the results of the first search (i.e., the results of the search performed on the search index 718 for the second set of chunks 722).

In some cases, the filter criteria for the filtered search may be selected so that the search engine 710 only searches for chunks in the first set of chunks 714 (i.e., small chunks) that correspond to a document that was identified in the first search. Specifically, the first search (the search performed on the search index 718 for the large chunks) identifies large chunks relevant to the query 702. Each of the identified large chunks will have a corresponding document. The unique documents that correspond to at least one identified large chunk form a set of relevant documents. The filter criteria may then be configured so that the second search (the search performed on the search index 712 that corresponds to the small chunks) is limited to the small chunks that correspond to a document in the set of relevant documents identified by the first search. Accordingly, the second, filtered, search performed on the first search index 712 may be performed by filtering on the document IDs of the relevant documents identified by the first search.

For example, as shown in FIG. 8, a first search may be performed on the second search index 718 (the search index for the large chunks 722) to identify large chunks that are related to a query. The first search may identify a set of large chunks 802 (e.g., large chunk 2 of document 1, large chunk 3 of document 1, large chunk 6 of document 2 and large chunk 3 of document 5). A set of relevant documents 804 may then be identified from the identified set of large chunks. As noted above, a relevant document is a document which corresponds to at least one of the large chunks identified by the first search. In this example, the relevant documents 804 are documents 1, 2 and 5 since each of these documents corresponds to at least one large chunk identified by the first search. A second search may then be performed on the first search index 712 (the search index for the small chunks 714) with a filter that only includes the relevant documents 804 (i.e., documents 1, 2 and 5). The second search may identify a set of small chunks 806 in the relevant documents (i.e., small chunks in documents 1, 2 and 5). For example, the second search may identify small chunks 1, 3 and 6 in document 1, small chunks 3, 5 and 6 in document 2, and small chunks 2 and 3 in document 5.
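The two-stage search described above can be sketched as follows: a first search over the large chunks yields a set of relevant document IDs, and a second search over the small chunks is filtered to those IDs. The term-overlap score is only a toy stand-in for whatever ranking the search indexes actually support (e.g., BM25 or vector distance), and the function names and sample data are hypothetical.

```python
def overlap_score(query, text):
    """Toy relevance score: number of distinct query terms found in the text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def search(chunks, query, k, allowed_docs=None):
    """Return up to k chunks with the best non-zero scores, optionally filtered by document ID."""
    candidates = [c for c in chunks if allowed_docs is None or c["doc_id"] in allowed_docs]
    scored = [(overlap_score(query, c["text"]), c) for c in candidates]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]


def two_stage_search(small_chunks, large_chunks, query, k_large, k_small):
    """Search the large chunks first, then only the small chunks of the matching documents."""
    large_hits = search(large_chunks, query, k_large)
    relevant_docs = {c["doc_id"] for c in large_hits}  # set of relevant documents
    return search(small_chunks, query, k_small, allowed_docs=relevant_docs)


# Hypothetical data for illustration.
large_chunks = [
    {"doc_id": "doc1", "text": "solar panel warranty terms and coverage"},
    {"doc_id": "doc2", "text": "battery storage warranty"},
]
small_chunks = [
    {"doc_id": "doc1", "text": "solar panel warranty"},
    {"doc_id": "doc1", "text": "terms and coverage"},
    {"doc_id": "doc2", "text": "battery warranty"},
]
hits = two_stage_search(small_chunks, large_chunks, "solar warranty", k_large=1, k_small=5)
```

In this made-up example only `doc1` survives the first search, so the small chunk from `doc2` is never considered in the second search even though it shares a term with the query.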

This two-phase search combines advantages of large and small chunking methods. Specifically, using larger chunks may result in better recall and using smaller chunks may result in better precision. Precision measures how often a model or system makes correct positive predictions. Precision can be calculated by dividing the number of correct positive predictions (true positives) by the total number of instances the model predicted as positive (both true and false positives) as shown in equation (1) where TP is the number of true positives, TN is the number of true negatives, FP is the number of false positives, and FN is the number of false negatives. Recall, which may also be referred to as sensitivity or the true positive rate (TPR), measures how often a model or system identifies positive instances from the actual positive samples in the dataset. Recall can be calculated by dividing the number of true positives by the number of positive instances (true positives+false negatives) as shown in equation (2).

Precision = TP / (TP + FP)   (1)

Recall = TP / (TP + FN)   (2)
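As a small worked example of the precision and recall definitions above, the following sketch counts true positives, false positives and false negatives for one retrieval result and computes both measures. The chunk identifiers and the helper names are hypothetical.

```python
def retrieval_counts(retrieved, relevant):
    """Return (TP, FP, FN) for a set of retrieved items against the truly relevant items."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)   # retrieved and relevant
    fp = len(retrieved - relevant)   # retrieved but not relevant
    fn = len(relevant - retrieved)   # relevant but missed
    return tp, fp, fn


def precision(tp, fp):
    """TP / (TP + FP); defined as 0.0 when nothing was retrieved."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """TP / (TP + FN); defined as 0.0 when nothing is relevant."""
    return tp / (tp + fn) if (tp + fn) else 0.0


tp, fp, fn = retrieval_counts(retrieved=["c1", "c2", "c3", "c4"], relevant=["c2", "c3", "c5"])
p = precision(tp, fp)
r = recall(tp, fn)
```

Here two of the four retrieved chunks are relevant (precision 2/4) and two of the three relevant chunks were retrieved (recall 2/3).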

The search engine 710 is configured to perform each of the first and second searches by comparing the query 702 (or a representation of the query 702) to the searchable fields in the corresponding search index 712, 718; generating a relevance score for chunks in the corresponding set of chunks 714, 722 based on the comparisons; and selecting one or more of the chunks in the corresponding set of chunks 714, 722 as being relevant to the query 702 based on the relevance scores. For example, the search engine 710 may select the k documents with the best relevance scores, wherein k is an integer greater than or equal to 1.

How the search engine 710 compares a query to the searchable fields in a search index 712, 718 and generates a relevance score therefrom depends on how the search index 712, 718 was generated. Different methods which can be used for different search indexes were described above with respect to the information retrieval system 220 of FIG. 2 and the information retrieval system 600 of FIG. 6. For example, as described above, where a search index is generated through tokenization such that the search index comprises an inverted index for each searchable field, chunks relevant to a query are identified by performing simple or full text queries on the inverted indexes. This may comprise parsing the query to identify terms and operations. The inverted indexes are then searched to find matching terms, and thus the chunks that comprise the matching terms. A document that has one or more matching terms is assigned a relevance score according to, for example, a ranking algorithm, such as, but not limited to, BM25.

In contrast, as described above, where a search index is generated through vectorization such that the search index comprises a multi-dimensional vector for each chunk, chunks relevant to a query can be identified by converting the query into a multi-dimensional vector, using the same embedding model used to generate the multi-dimensional vectors for the chunks, and comparing the query multi-dimensional vector to the chunk multi-dimensional vectors to find the chunks with the multi-dimensional vectors that are closest to the query multi-dimensional vector (using, for example, HNSW or KNN).
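To make the vector comparison concrete, the sketch below performs an exhaustive k-nearest-neighbour search using cosine similarity over a handful of hand-written vectors. In practice the vectors would come from an embedding model and an approximate method such as HNSW might be used instead; the two-dimensional vectors and the function names here are purely illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_chunks(query_vector, chunk_vectors, k):
    """Exhaustive k-nearest-neighbour search over {chunk_id: vector}, closest first."""
    ranked = sorted(
        chunk_vectors,
        key=lambda chunk_id: cosine_similarity(query_vector, chunk_vectors[chunk_id]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical chunk vectors and query vector.
chunk_vectors = {"c1": [1.0, 0.0], "c2": [0.0, 1.0], "c3": [1.0, 1.0]}
top = nearest_chunks([1.0, 0.1], chunk_vectors, k=2)
```

The query vector points almost along the first axis, so `c1` is closest, followed by `c3`.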

Also, as described above, where a search index is generated via tokenization and vectorization such that the search index comprises tokenized search fields and vector search fields, the search engine 710 may perform searches on both types of fields in parallel and the result for an individual chunk may be based on the combination of the text search score assigned to that chunk and the vector search score assigned to that chunk. Any of the methods described above, or any other known method, can be used to compare a query 702 to the searchable fields in a search index 712, 718.

The search performed by the search engine 710 identifies (e.g., via unique chunk numbers) a set of small chunks 704 that are most relevant to the query 702. In some cases, the search engine 710 may simply output information that identifies the set of small chunks 704 deemed to be most relevant to the query. In other cases, the search engine 710 may retrieve the identified small chunks 704 and output those small chunks 704. In some cases, the data store 708 may be configured to store, in addition to the search indexes 712, 718, a copy of the set of small chunks 714 and the search engine 710 may be configured to retrieve the identified small chunks 704 from the data store 708. In other cases, the search engine 710 may have access to a document repository 716 where the small chunks 714 are stored, and the search engine 710 may be configured to retrieve the identified small chunks 704 from the document repository 716.

Where the information retrieval system 700 of FIG. 7 is used in the pipeline 208 of FIG. 2 (i.e., it is used to implement the information retrieval system 220 of FIG. 2), the small chunks 704 identified by the information retrieval system 700 may be provided to the re-ranker LLM 216 for ranking. As described above, a subset of the small chunks may then be selected based on the ranking and forwarded to the generation LLM 218. In some cases, as described above, the subset of small chunks that are forwarded to the generation LLM 218 (i.e., the subset of chunks 240 in FIG. 2) may be the k chunks with the highest ranking according to the re-ranker LLM 216 where k is an integer greater than or equal to 1. However, in other cases, the ranking performed by the re-ranker LLM 216 may be first used to identify the top documents, and then all or a portion of the small chunks identified by the information retrieval system 700 for those top documents may be provided to the generation LLM 218. In some cases, the top documents may be the documents that are related to the top x (e.g., top 3) ranked small chunks according to the re-ranker LLM 216. For example, as shown in FIG. 8, if the re-ranker identifies (at 808) that the top three small chunks of the small chunks identified by the information retrieval system 700 belong to documents 1 and 2 (e.g., small chunk 1 of document 1 and small chunk 6 of document 2), then the top documents (at 810) may be documents 1 and 2. Then (at 812) all or a portion of the chunks in the results of the second search (at 806) may form the subset. Specifically, in the example shown in FIG. 8, all of the chunks in the results of the second search for the top documents are selected to form the subset. However, in other cases, the top documents may be identified from the ranking of the small chunks in another manner.
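One way to read the top-document selection described above is sketched below: the documents of the top x ranked small chunks become the top documents, and every small chunk from the search results that belongs to one of those documents is kept. The ranked list, the identifiers and the function name are hypothetical, and this covers only the variant in which all chunks of the top documents are selected.

```python
def select_subset_by_top_documents(ranked_chunks, x):
    """ranked_chunks: re-ranker output, best first, as (doc_id, chunk_id) pairs.

    Returns (top_docs, subset), where top_docs are the documents of the top x
    chunks and subset holds every ranked chunk belonging to those documents.
    """
    top_docs = []
    for doc_id, _ in ranked_chunks[:x]:
        if doc_id not in top_docs:
            top_docs.append(doc_id)
    subset = [pair for pair in ranked_chunks if pair[0] in top_docs]
    return top_docs, subset


# Hypothetical ranking of the small chunks returned by the second search.
ranked = [("doc1", "s1"), ("doc2", "s6"), ("doc1", "s3"), ("doc5", "s2"), ("doc2", "s3")]
top_docs, subset = select_subset_by_top_documents(ranked, x=3)
```

With x = 3 the top chunks come from `doc1` and `doc2`, so the lower-ranked chunk `s3` of `doc2` is still included while the chunk from `doc5` is dropped.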

Reference is now made to FIG. 9 which illustrates a third example information retrieval system 900, which may be used to implement the information retrieval system 220 of FIG. 2. While the information retrieval system 900 may form part of the pipeline 208 of FIG. 2, the information retrieval system 900 of FIG. 9 may also be used independently from the other components of the pipeline 208 of FIG. 2. The information retrieval system 900 of FIG. 9 is similar to the information retrieval system 600 of FIG. 6 in that the information retrieval system 900 is configured to receive a query 902 and retrieve a set of items 904 (from a collection of items 914 that form a knowledge base) that are related to that query; and the information retrieval system 900 comprises an index engine 906, a data store 908 and a search engine 910. However, the index engine 906 of FIG. 9 is specifically configured to generate a vector search index 912 for a collection of items 914 (e.g., a collection of chunks) wherein the vector search index comprises multiple vectors per item (e.g., per chunk); and the search engine 910 is configured to perform a multi-vector search on the vector search index to identify items that are relevant to a query. One of the vectors for an item is generated from (and represents) the item (e.g., chunk) itself, and at least one of the other vectors for an item is generated from (and represents) a piece of synthetic information 922, 924 generated by an LLM for that item (e.g., chunk).

Specifically, a synthetic generation LLM 926 is used to generate at least one piece of synthetic information for each item to be searched (e.g., each chunk in the collection of chunks 914). This may comprise providing each item (e.g., each chunk) to the synthetic generation LLM 926 along with a synthetic generation prompt 928 that instructs the synthetic generation LLM 926 to generate a piece of synthetic information 922, 924 related to the item (e.g., chunk). The piece of synthetic information that the synthetic generation LLM 926 is instructed to generate by the synthetic generation prompt 928 may comprise a summary of the item (e.g., chunk), keywords for the item (e.g., chunk), and one or more questions that can be answered by the item (e.g., chunk). The synthetic generation prompt 928 may be a zero-shot prompt or a few-shot prompt. An example zero-shot synthetic generation prompt 928 which may be used to instruct the synthetic generation LLM 926 to generate a summary of an item (e.g., chunk) is shown below.

Write a summary for the given passage: {chunk}

An example few-shot synthetic generation prompt 928 which may be used to instruct the synthetic generation LLM 926 to generate a query that can be answered by an item (e.g., chunk) is shown below. The example prompt induces the synthetic generation LLM 926 to generate a query that aligns with (e.g., is in the same format and style as) the example document-query pairs. Generally, the higher the quality and more diverse the example document-query pairs, the more likely the synthetic generation LLM 926 will generate relevant and informative queries.

Please ask a good and specific question that can be answered with the given passage.
Document 1: {{Example Passage 1}}
Query 1: {{Example Query 1}}
Document 2: {{Example Passage 2}}
Query 2: {{Example Query 2}}
Now it is your turn:
Document 3: {{Passage}}
Query 3:
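A minimal sketch of how such a few-shot prompt might be assembled from example document-query pairs is given below. The function name, the line layout and the sample passages are assumptions made for illustration; only the instruction wording and the Document/Query labelling follow the example prompt above.

```python
INSTRUCTION = (
    "Please ask a good and specific question that can be answered with the given passage."
)


def build_few_shot_prompt(examples, passage):
    """examples: list of (example_passage, example_query) pairs; passage: the chunk text."""
    lines = [INSTRUCTION]
    for i, (example_passage, example_query) in enumerate(examples, start=1):
        lines.append(f"Document {i}: {example_passage}")
        lines.append(f"Query {i}: {example_query}")
    n = len(examples) + 1
    lines.append("Now it is your turn:")
    lines.append(f"Document {n}: {passage}")
    lines.append(f"Query {n}:")  # left open for the LLM to complete
    return "\n".join(lines)


# Hypothetical example pairs and chunk.
examples = [
    ("The warranty lasts ten years.", "How long does the warranty last?"),
    ("Returns are accepted within 30 days.", "What is the return window?"),
]
prompt = build_few_shot_prompt(examples, "Shipping is free for orders over $50.")
```

The prompt ends with an open `Query 3:` line, which nudges the model to answer in the same format as the example pairs.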

Where the items that are searched are chunks 914 which are generated from a corpus of documents 920, a piece of synthetic information for each chunk may be generated by providing each document of the corpus of documents 920 to the synthetic generation LLM 926 along with a synthetic generation prompt 928 that instructs the synthetic generation LLM 926 to generate a summary 922 of the document. A document summary 922 generated by the synthetic generation LLM 926 can be used as a piece of synthetic information for each chunk that was generated from that document. For example, if a document is sub-divided into five chunks, then the summary of that document can be used as a piece of synthetic information for each of the five chunks.

Once one or more pieces of synthetic information 922, 924 has/have been generated for each item (e.g., each chunk 914), the piece(s) of synthetic information 922, 924 may be stored in a document repository 916 along with the items (e.g., chunks 914). In some cases, the synthetic information 922, 924 may be stored separately from the items (e.g., chunks) 914 but with information that links each piece of synthetic information with its corresponding item (e.g., chunk 914). For example, each piece of synthetic information 922, 924 may be stored in the document repository 916 along with information identifying the corresponding item (e.g., chunk 914).

As noted above, LLMs are a class of machine learning models that have been trained on massive amounts of data so that they can understand and generate natural language. The synthetic generation LLM 926 may be implemented by any LLM that can generate synthetic data for a passage. In some cases, the synthetic generation LLM 926 may be implemented by a Microsoft Azure™ Open AI LLM (e.g., a GPT-4o, GPT-4 Turbo, GPT-4, or GPT-3.5 Turbo model). When the information retrieval system 900 is used to implement the information retrieval system 220 of FIG. 2 such that it forms part of the pipeline 208, the synthetic generation LLM 926 may be combined with one of the other LLMs 214, 216, 218 in the pipeline 208. In other words, a single LLM may be used to perform the functions of the synthetic generation LLM 926 and the query modification LLM 214, the re-ranker LLM 216, and/or the generation LLM 218.

The index engine 906 is configured to generate a vector search index 912 for the items (e.g., chunks) 914 that comprises multiple vectors per item (e.g., chunk 914). Specifically, the index engine 906 is configured to, for each item (e.g., each chunk), convert, using an embedding model, that item (e.g., chunk) 914 into a set of embeddings (i.e., a multi-dimensional vector) and each piece of synthetic information for that item (e.g., chunk) 914 into a set of embeddings (i.e., a multi-dimensional vector). Each multi-dimensional vector is stored in the vector search index 912 as a searchable field. The number of multi-dimensional vectors for each item (e.g., chunk) 914 in the vector search index 912 will depend on the number of different pieces of synthetic information generated for each item (e.g., chunk 914). For example, as shown in FIG. 9, if for each item (e.g., chunk) there is an associated synthetic summary 922 and an associated synthetic question 924, then the vector search index 912 will comprise, for each item (e.g., chunk) 914, three vectors: a first vector (“V1-Chunk”) that represents the item (e.g., chunk) 914 itself, a second vector (“V2-SUMM”) that represents the summary, and a third vector (“V3-QUES”) that represents the question. It will be evident that this is an example only and that there may be only one piece of synthetic information associated with each item (e.g., chunk). Each type of vector may be said to be in a different vector field.
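The multi-vector index described above might be laid out as in the following sketch, where each index entry holds one vector for the chunk itself, one for its synthetic summary and one for its synthetic question, alongside identifier fields. The field names, the `build_multi_vector_index` helper and especially `toy_embed` (a placeholder, not a real embedding model) are assumptions made for illustration.

```python
def build_multi_vector_index(chunks, embed):
    """chunks: dicts with chunk_id, doc_id, text, summary, question.
    embed: callable mapping text to a list of floats (the same model for every field)."""
    index = []
    for chunk in chunks:
        index.append({
            "chunk_id": chunk["chunk_id"],        # identifier field
            "doc_id": chunk["doc_id"],            # identifier field
            "v1_chunk": embed(chunk["text"]),     # vector for the chunk itself
            "v2_summ": embed(chunk["summary"]),   # vector for the synthetic summary
            "v3_ques": embed(chunk["question"]),  # vector for the synthetic question
        })
    return index


def toy_embed(text):
    """Placeholder embedding (character and word counts); a real system would use a model."""
    return [float(len(text)), float(len(text.split()))]


# Hypothetical chunk with LLM-generated summary and question already attached.
chunks = [{
    "chunk_id": "c1",
    "doc_id": "doc1",
    "text": "Panels carry a ten year warranty.",
    "summary": "Warranty length.",
    "question": "How long is the warranty?",
}]
index = build_multi_vector_index(chunks, toy_embed)
```

Using the same `embed` callable for all three fields mirrors the requirement that the chunk, its synthetic information and, later, the query are embedded with the same model.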

The vector search index 912 may also comprise one or more non-searchable fields. For example, where the items that are to be searched are chunks, the vector search index 912 may also comprise one or more non-searchable fields which uniquely identify each chunk and its corresponding document. Once the index engine 906 has generated the vector search index 912, the vector search index 912 may be stored in the data store 908.

The search engine 910 is configured to receive a query 902 and perform a multi-vector search on the vector search index 912 to identify a set of items (e.g., chunks) 904 relevant to the query. Where the information retrieval system 900 of FIG. 9 is used in the pipeline 208 of FIG. 2, the query 902 is the modified query 226 and the items (e.g., chunks) 914 are the set of chunks 236.

Performing a multi-vector search means that there are multiple vectors for each item (e.g., chunk) to be searched, and the search engine 910 takes each vector associated with an item (e.g., chunk) into account in determining which are the most relevant items (e.g., chunks) to a query. The search engine 910 is configured to perform the multi-vector search by first converting, using the same embedding model used to generate the vectors for the items (e.g., chunks), the query 902 into a plurality of embeddings (i.e., into a multi-dimensional vector) that mathematically represents the semantic meaning of the query 902. The search engine 910 then compares the multi-dimensional vector for the query to the multi-dimensional vectors in all vector search fields of the vector search index 912 to identify the items (e.g., chunks) that are most relevant to the query.

In some cases, this may comprise performing a separate vector search on each vector field to identify the k items (e.g., chunks) with multi-dimensional vectors in that field that are closest to the query multi-dimensional vector; and then combining the results of the different vector searches. For example, if, as shown in FIG. 9, there are three vector fields, then a vector search may be performed on the first vector field (“V1-Chunk”) to identify a first set of k items (e.g., chunks) with multi-dimensional vectors in that field that are closest to the query multi-dimensional vector, wherein the items (e.g., chunks) are ranked based on their closeness; a vector search may also be performed on the second vector field (“V2-SUMM”) to identify a second set of k items (e.g., chunks) with multi-dimensional vectors in that field that are closest to the query multi-dimensional vector, wherein the items (e.g., chunks) are ranked based on their closeness; and a vector search may also be performed on the third vector field (“V3-QUES”) to identify a third set of k items (e.g., chunks) with multi-dimensional vectors in that field that are closest to the query multi-dimensional vector, wherein the items (e.g., chunks) are ranked based on their closeness. The set of k items (e.g., chunks) with multi-dimensional vectors in a particular field that are closest to the query multi-dimensional vector may be identified using any suitable algorithm such as, but not limited to, KNN and HNSW. The distance between multi-dimensional vectors may be measured using any suitable metric such as, but not limited to, cosine angle, Euclidean distance and dot product.

Once a vector search has been performed on each vector field, such that there is a ranked list of k items (e.g., chunks) for each vector field, the results of the vector searches are combined to get a final list of k items that are most relevant to the query. In one example, the results may be combined using a re-ranker technique or algorithm, such as, but not limited to, Reciprocal Rank Fusion (RRF) with or without weighted scoring. In RRF, each item (e.g., chunk) in a ranked list of k items is assigned a reciprocal rank score based on its position in the list. The score is calculated as 1/(rank+m), where rank is the position of the item in the list and m is a constant that may be empirically selected. Then, for each item (e.g., chunk), its reciprocal rank scores are combined to get a final combined score. The items are then ranked based on their combined scores. For example, in some cases the combined score for an item (e.g., chunk) may be the sum of its reciprocal rank scores. In other cases, the reciprocal rank scores for different vector fields may be weighted differently. For example, the ranking for the chunk vector field may be given more weight than the ranking for the summary vector field. In these cases, the combined score for an item (e.g., chunk) may be a weighted sum of its reciprocal rank scores.
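The following sketch shows one possible implementation of the RRF combination described above, with optional per-field weights. The default m = 60 is merely a commonly used value chosen here as an assumption (the text only says m may be empirically selected), and the field names, weights and ranked lists are hypothetical.

```python
def reciprocal_rank_fusion(ranked_lists, m=60, weights=None, k=None):
    """Combine per-field rankings into one ranking.

    ranked_lists: {field_name: [item_id, ...]} with the best item first.
    weights: optional {field_name: weight}; fields not listed get weight 1.0.
    Each item scores weight / (rank + m) per list, with ranks starting at 1.
    """
    weights = weights or {}
    scores = {}
    for field, ranking in ranked_lists.items():
        weight = weights.get(field, 1.0)
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + weight / (rank + m)
    ordered = sorted(scores, key=lambda item: scores[item], reverse=True)
    return ordered if k is None else ordered[:k]


# Hypothetical top-3 results from three vector fields.
ranked_lists = {
    "v1_chunk": ["a", "b", "c"],
    "v2_summ": ["b", "a", "d"],
    "v3_ques": ["b", "c", "a"],
}
unweighted = reciprocal_rank_fusion(ranked_lists)
weighted = reciprocal_rank_fusion(ranked_lists, weights={"v1_chunk": 5.0}, k=2)
```

Without weights, item `b` wins because it is ranked first in two fields; giving the chunk field five times the weight moves `a`, the top result of that field, to the front.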

It will be evident to a person of skill in the art that this is an example only and that other techniques or algorithms may be used to combine the results of the vector searches.

For example, in some cases, each item in a ranked list of k items may be assigned a relevance score based on the distance between its multi-dimensional vector and the query multi-dimensional vector and a final relevance score for an item (e.g., chunk) may be generated by combining (e.g., summing) the relevance scores for the item (e.g., chunk).

The multi-vector search performed by the search engine 910 identifies (e.g., via unique chunk numbers) a set of items (e.g., chunks) 904 that are most relevant to the query 902. In some cases, the search engine 910 may simply output information that identifies the set of items (e.g., chunks) 904 deemed to be most relevant to the query 902. In other cases, the search engine 910 may retrieve the identified items (e.g., chunks) 904 and output those items (e.g., chunks) 904. In some cases, the data store 908 may be configured to store, in addition to the vector search index 912, a copy of the original items (e.g., chunks) 914 and the search engine 910 may be configured to retrieve the identified items (e.g., chunks) 904 from the data store 908. In other cases, the search engine 910 may have access to a document repository 916 where the items (e.g., chunks) 914 are stored, and the search engine 910 may be configured to retrieve the identified items (e.g., chunks) 904 from the document repository 916.

Where the information retrieval system 900 of FIG. 9 is used in the pipeline 208 of FIG. 2 (i.e., the information retrieval system 900 of FIG. 9 is used to implement the information retrieval system 220 of FIG. 2), the chunks 904 identified by the information retrieval system 900 may be provided to the re-ranker LLM 216 for ranking.

Reference is now made to FIG. 10 which illustrates an example response generation system 1000 for generating a response to a query based on a set of chunks generated from a corpus of documents. The response generation system 1000 of FIG. 10 may be used to implement the backend portion of the pipeline 208 of FIG. 2 (i.e., the re-ranking, subset selection and response generation) or the response generation system 1000 of FIG. 10 may be used in another system. For example, the response generation system 1000 of FIG. 10 may be used in another RAG system to perform the response generation.

Specifically, the response generation system 1000 of FIG. 10 is configured to receive a user query 1002 and a set of chunks 1004 (e.g., generated from documents in a corpus of documents) that have been deemed relevant to the user query 1002 (e.g., by an information retrieval system), and generate a response 1006 to the user query 1002 based on the set of chunks 1004 using one or more LLMs 1008, 1010. Like the pipeline 208 of FIG. 2, the system 1000 of FIG. 10 comprises a re-ranker LLM 1008 and a generation LLM 1010. The re-ranker LLM 1008 is used to rank the set of chunks based on their relevance to the query 1002. A subset 1012 of the set of chunks 1004 is then selected based on the ranking. The generation LLM 1010 is then used to generate a response 1006 to the query 1002 based on the subset of chunks 1012. However, unlike the pipeline 208 described above with respect to FIG. 2, one or more of the re-ranker LLM 1008 and the generation LLM 1010 is configured to perform its task (e.g., ranking or response generation) via chain-of-thought prompting, which may also be referred to as chain-of-notes prompting. Chain-of-thought (CoT) prompting allows LLMs to solve complex reasoning tasks by instructing the LLM to generate an explanation before the final prediction/output to draw out the reasoning capabilities of LLMs. This forces the LLM to break down a complex problem into intermediate steps.

Specifically, in some examples the re-ranker LLM 1008 of FIG. 10 may be used to rank the set of chunks 1004 based on their relevance to the query 1002 by providing the re-ranker LLM 1008 with the set of chunks 1004 and the query 1002, along with a CoT re-ranker (RR) prompt 1014 that instructs the re-ranker LLM 1008 to, for each chunk in the set of chunks 1004, explain (using the chunk) why that chunk is relevant to the query 1002 and assign a relevance rating thereto, and then rank the set of chunks 1004 based on their relevance to the query 1002. Such a CoT re-ranker prompt 1014 forces the re-ranker LLM 1008 to stay focused on the content of relevance when ranking the chunks. The CoT re-ranker prompt 1014 may also specify that the explanation as to why a chunk is relevant is to be limited to the content of the chunk.

Once the re-ranker LLM 1008 has generated a ranking of the set of chunks, a subset of chunks 1012 is selected based on the ranking. In some cases, the CoT re-ranker prompt 1014 may cause the re-ranker LLM 1008 to both rank the chunks 1004 and select the subset of chunks 1012 based on the ranking. However, in other examples, another module, such as a subset selection module (not shown), may be configured to receive the ranking of the set of chunks generated by the re-ranker LLM 1008 and select the subset of chunks 1012 based on the ranking. The subset of chunks 1012 may be selected from the ranking in any suitable manner, such as those described above with respect to FIGS. 2 and 8.

As noted above, LLMs are a class of machine learning models that have been trained on massive amounts of data so that they can understand and generate natural language. The re-ranker LLM 1008 may be implemented by any LLM that can perform re-ranking of a set of passages. In some cases, the re-ranker LLM 216 may be implemented by a Microsoft Azure™ Open AI LLM (e.g., a GPT-4o, GPT-4 Turbo, GPT-4, or GPT-3.5 Turbo model).

Once the subset of chunks 1012 has been selected based on the ranking, the generation LLM 1010 is used to generate a response 1006 to the query 1002 based on the subset of chunks 1012. In some cases, this may comprise providing the generation LLM 1010 with the subset of chunks 1012, the query 1002 and a generation prompt as described above with respect to FIG. 2. However, in other cases, this may comprise providing the generation LLM 1010 with the set of chunks 1004, the query 1002 and a CoT generation (GEN) prompt 1016. The CoT generation prompt 1016 is configured to cause the generation LLM 1010 to generate the response 1006 to the query 1002 based on the subset of chunks 1012 through a step-by-step process. Specifically, the CoT generation prompt 1016 instructs the generation LLM 1010 to explain, using the content of the chunk, why each chunk in the subset of chunks 1012 is relevant to the query 1002 and assign a relevance rating thereto, and then generate a response 1006 to the query based on the set of chunks. Such a CoT generation prompt 1016 forces the generation LLM 1010 to stay focused on the content of relevance when generating the response 1006. It also forces the generation LLM 1010 to identify the relevant sections of the chunks before generating a response.

As noted above, LLMs are a class of machine learning models that have been trained on massive amounts of data so that they can understand and generate natural language. The generation LLM 1010 may be implemented by any LLM that can generate a response to a query using provided passages. In some cases, the generation LLM 1010 may be implemented by a Microsoft Azure™ Open AI LLM (e.g., a GPT-4o, GPT-4 Turbo, GPT-4, or GPT-3.5 Turbo model).

In other examples, a CoT prompt may not be used to cause the re-ranker LLM 1008 to rank the set of chunks 1004 (e.g., in contrast, the re-ranker LLM 1008 may be used in the same manner as the re-ranker LLM 216 of FIG. 2, as described above, to rank the set of chunks 1004) but a CoT prompt (such as that described above) may be used to cause the generation LLM 1010 to generate the response 1006.

Reference is now made to FIG. 11 which illustrates an example RAG system 1100 which can be used in the pipeline 208 of FIG. 2 (e.g., it can be used to perform the functions of the information retrieval system 220, re-ranker LLM 216 and generation LLM 218 of the pipeline 208 of FIG. 2). However, the RAG system 1100 of FIG. 11 can also be used independently from the pipeline 208 of FIG. 2 as a stand-alone RAG system. The RAG system 1100 of FIG. 11 can be described as a combination of the information retrieval system 900 of FIG. 9 and the generation system 1000 of FIG. 10.

The RAG system 1100 of FIG. 11 comprises an information retrieval system 1102 that is almost identical to the information retrieval system 900 of FIG. 9. Specifically, the information retrieval system 1102 is configured to perform multi-vector search on a vector search index 912 that comprises a plurality of vectors for each item (e.g., chunk) 914 to be searched. One vector for each item (e.g., chunk) 914 is generated from (and represents) the item (e.g., chunk) 914 itself and at least one other vector for each item (e.g., chunk) 914 is generated from (and represents) a piece of synthetic information (e.g., summary, keyword, content) 922, 924 generated for that item (e.g., chunk) 904 by a synthetic generation LLM 926. Accordingly, all of the comments provided above with respect to the information retrieval system 900 of FIG. 9 equally apply to the information retrieval system 1102 of FIG. 11.

The only difference between the information retrieval system 1102 of FIG. 11 and the information retrieval system 900 of FIG. 9 is that, in addition to retrieving and outputting a set of items (e.g., chunks) 904 that are relevant to the query 902, the search engine 1110 of FIG. 11 also retrieves and outputs, for each item (e.g., chunk) in the set of items (e.g., chunks) 904, at least one piece of synthetic information 1104 associated with that item. In the example shown in FIG. 11 the search engine 1110 outputs, for each item (e.g., chunk) in the set of items (e.g., chunks) 904, the synthetic question(s) 1104 associated with that item (e.g., chunk), but it will be evident that this is just an example of a piece of synthetic information associated with an item (e.g., chunk) that may be output. The search engine 1110 of FIG. 11 otherwise operates in the same manner as the search engine 910 of FIG. 9.

The RAG system 1100 of FIG. 11 also comprises a re-ranker LLM 1106 and a generation LLM 1108 which are used in a similar manner as the re-ranker LLM 1008 and the generation LLM 1010 of FIG. 10 respectively. Specifically, the re-ranker LLM 1106 is used to rank the set of items (e.g., chunks) 904 retrieved by the information retrieval system 1102 based on their relevance to the query 902 via chain-of-thought prompting. This is implemented by providing the set of items (e.g., chunks) 904 retrieved by the information retrieval system 1102, the corresponding synthetic information 1104, and the query 902 to the re-ranker LLM 1106 along with a chain-of-thought (CoT) re-ranker (RR) prompt 1112 that instructs the re-ranker LLM 1106 to explain why each chunk is relevant to the query and assign a relevance rating thereto, and then rank the set of items (e.g., chunks) 904 based on their relevance to the query 902. However, in contrast to the CoT re-ranker prompt 1014 of FIG. 10, which instructs the re-ranker LLM 1008 to generate the explanation from the item (e.g., chunk) itself, the CoT re-ranker prompt 1112 of FIG. 11 instructs the re-ranker LLM 1106 to generate the explanation for an item (e.g., chunk) from the item (e.g., chunk) 904 and the related synthetic information 1104.

Once the re-ranker LLM 1106 has generated a ranking of the set of items (e.g., chunks) 904, a subset of the items (e.g., chunks) 1114 is selected based on the ranking. In some cases, the CoT re-ranker prompt 1112 may cause the re-ranker LLM 1106 to, in addition to ranking the set of items (e.g., chunks) 904, select the subset of items (e.g., chunks) 1114 based on the ranking. However, in other examples, another module, such as a subset selection module (not shown), may be configured to receive the ranking of the set of items (e.g., chunks) 904 generated by the re-ranker LLM 1106 and select the subset of items (e.g., chunks) 1114 based on the ranking. The subset of items (e.g., chunks) 1114 may be selected from the ranking in any suitable manner, such as those described above with respect to FIGS. 2 and 8.

The re-ranker LLM 1106 may be implemented by any LLM that can perform re-ranking of a set of passages. In some cases, the re-ranker LLM 1106 may be implemented by a Microsoft Azure™ OpenAI LLM (e.g., a GPT-4o, GPT-4 Turbo, GPT-4, or GPT-3.5 Turbo model).

Once the subset of items (e.g., chunks) 904 has been selected based on the ranking, the generation LLM 1108 is used to generate a response 1118 to the query 902 based on the subset of items (e.g., chunks) 1114 and synthetic information 1116 related to the subset of items (e.g., chunks). In some cases, this may comprise providing the generation LLM 1108 with the subset of items (e.g., chunks) 1114, the synthetic information 1116 related to the subset of items (e.g., chunks), the query 902 and a generation prompt as described above with respect to FIG. 2. However, in other cases, this may comprise providing the generation LLM 1108 with the subset of items (e.g., chunks) 1114, the synthetic information 1116 related to the subset of items, and the query 902 along with a CoT generation (GEN) prompt 1120. The CoT generation prompt 1120 is configured to cause the generation LLM 1108 to generate the response 1118 to the query 902 based on the subset of items (e.g., chunks) 1114 and the synthetic information 1116 through a step-by-step process. Specifically, the CoT generation prompt 1120 instructs the generation LLM 1108 to explain, using the item (e.g., chunk) and its related synthetic information, why each item (e.g., chunk) in the subset of items (e.g., chunks) 904 is relevant to the query 902 and assign a relevance rating thereto, and then generate a response 1118 to the query 902 based on the set of items (e.g., chunks). Such a CoT generation prompt 1120 forces the generation LLM 1108 to stay focused on the content of relevance when generating the response. It also forces the generation LLM 1108 to identify the relevant sections of the chunks before generating a response.
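
By way of illustration only, the assembly of a CoT generation prompt from a query, a subset of items and their related synthetic information may be sketched in Python as follows. The `Item` class, the function name and the prompt wording are assumptions made for this sketch; the description above does not specify any particular prompt text or data structure.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A retrieved chunk plus the synthetic information generated for it."""
    item_id: str
    text: str
    synthetic: list[str] = field(default_factory=list)  # e.g., synthetic questions


def build_cot_generation_prompt(query: str, items: list[Item]) -> str:
    """Assemble a step-by-step generation prompt (hypothetical wording)."""
    parts = [
        "For each item below, explain, using the item and its synthetic "
        "information, why it is relevant to the query and assign a relevance "
        "rating. Then answer the query based on these items.",
        f"Query: {query}",
    ]
    for item in items:
        parts.append(f"[{item.item_id}] {item.text}")
        # Each piece of synthetic information is listed under its item.
        for info in item.synthetic:
            parts.append(f"  synthetic: {info}")
    return "\n".join(parts)
```

The resulting string would then be provided to whichever generation LLM is used; the call to the LLM itself is outside the scope of this sketch.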

As noted above, LLMs are a class of machine learning models that have been trained on massive amounts of data so that they can understand and generate natural language. The generation LLM 1108 may be implemented by any LLM that can generate a response to a query using provided passages. In some cases, the generation LLM 1108 may be implemented by a Microsoft Azure™ OpenAI LLM (e.g., a GPT-4o, GPT-4 Turbo, GPT-4, or GPT-3.5 Turbo model).

In other examples, a CoT re-ranker prompt may not be provided to the re-ranker LLM 1106 to cause the re-ranker LLM 1106 to rank the set of items (e.g., chunks) 904. In contrast, a standard re-ranker prompt or set of prompts, as described above with respect to FIG. 2, may be provided to the re-ranker LLM 1106 to rank the set of chunks.

Although in FIG. 11 the query provided to the information retrieval system 900 is the same as the query used for information generation, in other cases they may be different queries. For example, if the RAG system 1100 of FIG. 11 is used in the pipeline 208 of FIG. 8, the query provided to the information retrieval system may be the amended query and the query used for response generation may be the original user query.

Reference is now made to FIG. 12 which illustrates a simplified block diagram of an example computer 1200. Computer 1200 is an example implementation of a computer which may implement the source database system 110, EDPP 120, one or more components of the cloud-based computing cluster 130 of FIGS. 1 and 2, and/or one or more components of the information retrieval systems 600, 700, 900 of FIGS. 6, 7 and 9. Computer 1200 has at least one processor 1202 operatively coupled to at least one memory 1204, at least one communications interface 1206 (also referred to herein as a network interface), and at least one input/output (I/O) device 1208.

The at least one memory 1204 includes a volatile memory that stores instructions executed or executable by the processor 1202, and input and output data used or generated during execution of the instructions. The memory 1204 may also include non-volatile memory used to store input and/or output data (e.g., within a database) along with program code containing executable instructions.

The processor 1202 may transmit or receive data via the communications interface 1206 and may also transmit or receive data via any additional input/output device 1208 as appropriate.

In some cases, the processor 1202 includes a system of central processing units (CPUs) 1210. In other cases, the processor 1202 includes a system of one or more CPUs 1210 and one or more Graphics Processing Units (GPUs) 1212 that are coupled together. For example, any of the LLMs 214, 216, 218, 926 described herein may execute neural network computations on CPU and GPU hardware, such as the system of CPUs 1210 and GPUs 1212 of FIG. 12.

Reference is now made to FIG. 13 which illustrates an example method 1300 for generating a response to a user query which may be executed, for example, by the pipeline 208 of FIG. 2. The method 1300 may be implemented by one or more processors of one or more computers. The method 1300 begins at block 1302 where a user query is received. The method 1300 then proceeds to block 1304.

At block 1304, a first LLM (e.g., query modification LLM 214) is used to generate synthetic information related to the user query. As described above, using an LLM to generate synthetic information related to a user query may comprise providing the user query and a query modification prompt to the LLM which instructs the LLM to generate the synthetic information related to the user query. Examples of synthetic information which the query modification prompt may instruct the LLM to generate were provided above. For example, the query modification prompt may include: instructions to generate a set of keywords for the query, wherein the set of keywords is the synthetic information; instructions to generate a passage that answers the user query, wherein the passage is the synthetic information; instructions to provide a concise rationale to the user query and think step by step, wherein the synthetic information is the rationale; or instructions to generate an answer to the user prompt and give the rationale, wherein the rationale is the synthetic information. In yet other cases, the LLM may be provided with additional information that aids in generating the synthetic information. For example, in some cases, the query may be first provided to an information retrieval system to retrieve the document in the corpus of documents that is most relevant to the query. Then the query, the retrieved document and a prompt may be provided to the query modification LLM, wherein the prompt comprises instructions to generate the synthetic information (e.g., keywords, passage, rationale) given the context of the returned document. Once the synthetic information for the query has been generated, the method 1300 proceeds to block 1306.

At block 1306, a modified query is generated from the synthetic information. In some cases, the modified query is generated by combining the original user query and the synthetic information generated in block 1304. For example, in some cases, the generated synthetic information may be concatenated to the original user query. In other cases, the modified query is generated by replacing the original user query with the synthetic information (i.e., the modified query only comprises the synthetic information). Once the modified query has been generated, the method 1300 proceeds to block 1308.
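
By way of illustration only, the two ways of forming the modified query described above (concatenation and replacement) may be sketched in Python as follows. The function name, the `mode` parameter and the newline separator are assumptions made for this sketch.

```python
def build_modified_query(user_query: str, synthetic_info: str,
                         mode: str = "concatenate") -> str:
    """Form a modified query from the user query and LLM-generated synthetic information."""
    if mode == "concatenate":
        # Synthetic information is appended to the original user query.
        return f"{user_query}\n{synthetic_info}"
    if mode == "replace":
        # The modified query comprises only the synthetic information.
        return synthetic_info
    raise ValueError(f"unknown mode: {mode!r}")
```

In either mode the original user query is left unchanged, so it remains available for the later response generation step.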

At block 1308, an information retrieval system is used to retrieve a set of chunks, from a plurality of chunks generated from a corpus of documents, that are relevant to the modified query. In other words, each chunk of the plurality of chunks is all or a portion of a document in the corpus of documents. Example methods for retrieving a set of chunks, from a plurality of chunks generated from a corpus of documents, that are relevant to a query were described above and are described below with respect to FIGS. 14 and 15. Once a set of chunks relevant to the modified query has been retrieved, the method 1300 proceeds to block 1310.

At block 1310, an LLM (e.g., re-ranker LLM 216) is used to rank the set of chunks retrieved in block 1308. Using an LLM to rank the set of chunks may comprise providing the LLM with the set of chunks and one or more prompts which cause the LLM to rank the set of chunks. Example prompts and sets of prompts which can be used to cause an LLM to rank a set of chunks were provided above. Once the set of chunks has been ranked by the LLM, the method 1300 proceeds to block 1312.

At block 1312, a subset of chunks of the set of chunks is selected based on the ranking of the set of chunks generated in block 1310. The term “subset of X” is used herein to mean less than X (i.e., if X has a set of elements, then a subset of X does not have all of the elements of X). As described above, in some cases, the top k chunks based on the ranking are selected to form the subset, wherein k is an integer greater than 1. In other cases, the ranking may be used to identify the top documents (e.g., the top documents may be the documents associated with the top three ranked chunks) and then all or a subset of the chunks in the set of chunks associated with the top documents may be selected. Once a subset of chunks from the set of chunks retrieved in block 1308 has been selected, the method 1300 proceeds to block 1314.
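
By way of illustration only, the two selection strategies described above may be sketched in Python as follows. The representation of a ranked chunk as a `(chunk_id, doc_id)` pair ordered best-first, and the function names, are assumptions made for this sketch. The sketch does not itself enforce that the result is smaller than the input set.

```python
RankedChunk = tuple[str, str]  # (chunk_id, doc_id), assumed ordered best-first


def select_top_k(ranked: list[RankedChunk], k: int) -> list[RankedChunk]:
    """Select the k highest-ranked chunks."""
    return ranked[:k]


def select_by_top_documents(ranked: list[RankedChunk],
                            n_top: int = 3) -> list[RankedChunk]:
    """Select every chunk whose document also owns one of the n_top ranked chunks."""
    top_docs = {doc_id for _, doc_id in ranked[:n_top]}
    # Ranking order is preserved among the selected chunks.
    return [chunk for chunk in ranked if chunk[1] in top_docs]
```

For example, with five ranked chunks drawn from documents A, B and C, `select_by_top_documents(ranked, n_top=2)` keeps only the chunks belonging to the documents of the two best-ranked chunks.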

At block 1314, an LLM (e.g., generation LLM 218) is used to generate a response to the original user query (the user query received at block 1302) based on the subset of chunks selected in block 1312. Using an LLM to generate a response to the original user query based on the subset of chunks may comprise providing the LLM with the subset of chunks along with a prompt that instructs the LLM to generate a response based on the subset of chunks. As described above, the prompt may instruct the LLM to cite any referenced chunks and/or their corresponding document in the response. Once the response has been generated, the method 1300 may end.

Reference is now made to FIG. 14 which illustrates an example method 1400 for retrieving information in a corpus of documents that is relevant to a query which may be executed, for example, by the information retrieval system 700 of FIG. 7. The method 1400 may be implemented by one or more processors of a computer or a computing system. The method 1400 begins at block 1402 where a first plurality of chunks is generated by subdividing each document in the corpus of documents into one or more chunks of a first size. Subdividing a document into chunks of the first size does not mean that each chunk has exactly the same size, only that each chunk does not exceed the first size. The documents may be subdivided into chunks of the first size using any suitable method, such as, but not limited to, those described above with respect to the chunking module 222. Once the first plurality of chunks has been generated, the method 1400 proceeds to block 1404.

At block 1404, a second plurality of chunks is generated by subdividing each document in the corpus of documents into one or more chunks of a second, larger, size. Subdividing a document into chunks of the second size does not mean that each chunk has exactly the same size, only that each chunk does not exceed the second size. The documents may be subdivided into chunks of the second size using any suitable method, such as, but not limited to, those described above with respect to the chunking module 222. Once the second plurality of chunks has been generated, the method 1400 proceeds to block 1406.
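
By way of illustration only, subdividing each document into chunks of a first size and of a second, larger, size may be sketched in Python as follows. Measuring chunk size in whitespace-separated words, and the function names, are assumptions made for this sketch; any suitable chunking method may be used instead.

```python
def chunk_document(text: str, max_words: int) -> list[str]:
    """Split text into chunks of at most max_words words (the last may be shorter)."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


def build_two_granularities(corpus: dict[str, str], first_size: int,
                            second_size: int):
    """Return (first, second) pluralities of chunks as (doc_id, chunk_text) pairs."""
    if second_size <= first_size:
        raise ValueError("second_size must be larger than first_size")
    first = [(doc_id, chunk) for doc_id, text in corpus.items()
             for chunk in chunk_document(text, first_size)]
    second = [(doc_id, chunk) for doc_id, text in corpus.items()
              for chunk in chunk_document(text, second_size)]
    return first, second
```

Each chunk keeps the identifier of the document it was generated from, which is what later allows chunks of the first size to be associated with the documents of relevant chunks of the second size.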

At block 1406, an information retrieval system is used to identify, from the second plurality of chunks, a set of chunks of the second size that are relevant to the query. Using an information retrieval system to identify a set of chunks of the second size that are relevant to the query may comprise using an index engine of the information retrieval system to generate a search index for the second plurality of chunks and using a search engine of the information retrieval system to search the search index for the second plurality of chunks to identify a set of chunks of the second size that are similar to the query. The search index represents the information in the second plurality of chunks in a form that can be easily searched. As described above, there are many ways to generate a search index for a plurality of chunks, such as, but not limited to, tokenization, vectorization and a combination of tokenization and vectorization. Where vectorization is used to generate a search index for the second plurality of chunks, each chunk of the second plurality of chunks is embedded, using an embedding model, into a plurality of embeddings (i.e., a multi-dimensional vector) and each multi-dimensional vector is stored in the search index in a searchable field.

As described above, there are many ways to search a search index for items that are relevant to a query. The method used to search a search index is generally based on the technique or techniques used to generate the search index. For example, as described above, where the search index was generated using vectorization then searching the search index to identify chunks of the second size that are similar to the query may comprise converting (or embedding), using the same embedding model used to generate the vectors in the search index, the query into a plurality of embeddings (i.e., multi-dimensional vector) and identifying (using, for example KNN or HNSW) chunks of the second size that have a multi-dimensional vector that is close to the multi-dimensional vector for the query based on one or more distance metrics (e.g. cosine angle etc.).

At block 1408, the information retrieval system is used to identify, from a subset of the first plurality of chunks, a set of chunks of the first size that are relevant to the query. The subset of the first plurality of chunks is selected based on the set of chunks of the second size identified in block 1406.

In some cases, the subset of the first plurality of chunks comprises the chunks in the first plurality of chunks that are associated with a relevant document, wherein a relevant document is a document that is associated with at least one chunk in the set of chunks of the second size identified in block 1406. In these cases, the subset may be selected by identifying the document associated with each chunk of the set of chunks of the second size identified in block 1406, selecting the unique documents of the identified documents as the relevant documents, and then selecting the subset to be the chunks in the first plurality of chunks associated with a relevant document. For example, as shown in FIG. 8, if all the chunks in the set of chunks of the second size identified in block 1406 correspond to one of documents 1, 2 and 5, then the relevant documents are documents 1, 2 and 5 and the subset of chunks in the first plurality of chunks may comprise only those chunks in the first plurality of chunks that correspond to or are associated with documents 1, 2 and 5.
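
By way of illustration only, selecting the subset of the first plurality of chunks from the relevant documents may be sketched in Python as follows. Representing each chunk as a `(chunk_id, doc_id)` pair, and the function name, are assumptions made for this sketch.

```python
Chunk = tuple[str, int]  # (chunk_id, doc_id)


def select_fine_subset(coarse_hits: list[Chunk],
                       fine_chunks: list[Chunk]) -> list[Chunk]:
    """Keep the first-size chunks whose document owns at least one relevant second-size chunk."""
    # The unique documents of the relevant second-size chunks are the relevant documents.
    relevant_docs = {doc_id for _, doc_id in coarse_hits}
    return [chunk for chunk in fine_chunks if chunk[1] in relevant_docs]
```

Following the example above, if the relevant chunks of the second size belong to documents 1, 2 and 5, only chunks of the first size from documents 1, 2 and 5 remain in the subset.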

Using an information retrieval system to identify, from a subset of chunks in a first plurality of chunks, a set of chunks of the first size that are relevant to the query may comprise using an index engine of the information retrieval system to generate a search index for the first plurality of chunks and using a search engine of the information retrieval system to perform a filtered search (filtered so as to be limited to the subset) on the search index for the first plurality of chunks to identify a set of chunks of the first size that are similar to the query. The search index represents the information in the first plurality of chunks in a form that can be easily searched. It is noted that the search index for the first plurality of chunks is separate and distinct from the search index for the second plurality of chunks. As described above, there are many ways to generate a search index for a plurality of chunks, such as, but not limited to, tokenization, vectorization and a combination of tokenization and vectorization. Where vectorization is used to generate a search index for the first plurality of chunks, each chunk of the first plurality of chunks is embedded, using an embedding model, into a plurality of embeddings (i.e., a multi-dimensional vector) and each multi-dimensional vector is stored in the search index in a searchable field.

As described above, there are many ways to search a search index for items that are relevant to a query. The method used by the search engine to search a search index is generally based on the technique or techniques used to generate the search index. For example, as described above, where the search index was generated using vectorization, searching the search index to identify chunks of the first size that are similar to the query may comprise converting (or embedding), using the same embedding model used to generate the vectors in the search index, the query into a plurality of embeddings (i.e., a multi-dimensional vector) and identifying (using, for example, KNN or HNSW) chunks of the first size, in the subset, that have a multi-dimensional vector that is close to the multi-dimensional vector for the query based on one or more distance metrics (e.g., cosine angle).
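
By way of illustration only, a filtered vector search limited to the subset may be sketched in Python as follows. The sketch uses an exhaustive (brute-force) nearest-neighbour search with cosine similarity rather than an approximate method such as HNSW, and it takes already-computed vectors as input; the in-memory dictionary index and the function names are assumptions made for this sketch.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_search(query_vec: list[float], index: dict[str, list[float]],
                    allowed_ids: set[str], k: int) -> list[str]:
    """Return the ids of the k chunks in allowed_ids closest to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), chunk_id)
              for chunk_id, vec in index.items()
              if chunk_id in allowed_ids]  # filter: search only within the subset
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk_id for _, chunk_id in scored[:k]]
```

Chunks outside `allowed_ids` are never scored, so a chunk that is very close to the query but belongs to a non-relevant document cannot appear in the result.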

Once a set of chunks of the first size that are relevant to the query has been identified, the method 1400 may end or the set of chunks of the first size may be retrieved from a data store or repository.

Reference is now made to FIG. 15 which illustrates an example method 1500 for retrieving information in a corpus of documents that is relevant to a query which may be executed, for example, by the information retrieval system 900 of FIG. 9. The method 1500 may be implemented by one or more processors of a computer. The method 1500 begins at block 1502 where each document in the corpus of documents is subdivided into one or more chunks. The documents may be subdivided into chunks using any suitable method, such as, but not limited to, those described above with respect to the chunking module 222. Once the documents have been subdivided into a plurality of chunks, the method 1500 proceeds to block 1504.

At block 1504, an LLM (e.g., synthetic generation LLM 926) is used to generate at least one piece of synthetic information for each chunk generated in block 1502. In some cases, using an LLM to generate at least one piece of synthetic information for each chunk may comprise, for each chunk, providing that chunk to the LLM along with a synthetic generation prompt that instructs the synthetic generation LLM 926 to generate one or more pieces of synthetic information 922, 924 related to the item (e.g., chunk). The synthetic data that the synthetic generation LLM 926 is instructed to generate by the synthetic generation prompt 928 may comprise one or more of: a summary of the chunk, keywords for the chunk, the content of the chunk, and one or more questions that can be answered by the chunk. In some cases, using an LLM to generate at least one piece of synthetic information for each chunk may also or alternatively comprise, for each document of the corpus of documents, providing the document to the LLM to generate synthetic information (e.g., a summary) for the document, and the synthetic information generated for the document may be used as one piece of synthetic information for each chunk associated with (i.e., generated from) that document.

At block 1506, an embedding model is used to generate a plurality of vectors for each chunk generated in block 1502. The plurality of vectors for a chunk comprises a vector generated from the chunk and a vector generated from each of the at least one piece of synthetic information related to that chunk (i.e., a different vector is generated for each piece of synthetic information generated for that chunk). In some cases, as shown in FIG. 9, the generated vectors may be stored in a search index of an information retrieval system in separate search fields. Once a plurality of vectors has been generated for each chunk, the method 1500 proceeds to block 1508.

At block 1508, an information retrieval system is used to identify, from the plurality of vectors for each chunk, a set of chunks, of the chunks generated in block 1502, that are relevant to a query. The information retrieval system may identify the set of chunks that are relevant to the query by using the embedding model used in block 1506 to generate a vector for the query and comparing the vector for the query to the plurality of vectors for each chunk of the plurality of chunks.

As described above, in some cases, the information retrieval system may be configured to: group all of the vectors that were generated in the same manner together (e.g., grouping all the vectors generated from a chunk itself together, grouping all the vectors generated from a summary of the chunk together, etc.); perform a separate vector search on the vectors in each group to identify the k chunks with multi-dimensional vectors that are closest to the query multi-dimensional vector; and then combine the results of the different vector searches to generate a final set of k chunks that are most similar to the query. The set of k chunks with multi-dimensional vectors in a particular group that are closest to the query multi-dimensional vector may be identified using any suitable algorithm such as, but not limited to, KNN and HNSW. The distance between multi-dimensional vectors may be measured using any suitable metric such as, but not limited to, cosine angle, Euclidean distance and dot product.

Once a search has been performed on each vector group, such that there is a ranked list of k chunks for each vector group, the results of the vector searches are combined to get a final list of k chunks that are most relevant to the query. In one example, the results may be combined using a re-ranker technique or algorithm, such as, but not limited to, Reciprocal Rank Fusion (RRF) with or without weighted scoring. In RRF each chunk, in a ranked list of k chunks, is assigned a reciprocal rank score based on its position in the list. The score is calculated as 1/(rank+m), where rank is the position of the chunk in the list and m is a constant that may be empirically selected. Then for each chunk, its reciprocal rank scores are combined to get a final combined score. The chunks are then ranked based on their combined scores. For example, in some cases the combined score for a chunk may be the sum of its reciprocal rank scores. In other cases, the reciprocal rank scores for vectors in different groups may be weighted differently. For example, the ranking for the vector generated from the chunk itself may be given more weight than the ranking for the vector generated from a summary of the chunk. In these cases, the combined score for a chunk may be a weighted sum of its reciprocal rank scores.
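
By way of illustration only, combining the per-group ranked lists using Reciprocal Rank Fusion with optional weighting may be sketched in Python as follows. The function name, the use of 1-based ranks, the default value m=60 and the group names used in the example are assumptions made for this sketch; the description above states only that m may be empirically selected.

```python
from collections import defaultdict
from typing import Optional


def reciprocal_rank_fusion(ranked_lists: dict[str, list[str]], k: int,
                           m: float = 60.0,
                           weights: Optional[dict[str, float]] = None) -> list[str]:
    """Fuse per-group rankings (chunk ids, best-first) into a final list of k chunk ids."""
    weights = weights or {}
    scores: dict[str, float] = defaultdict(float)
    for group, ranking in ranked_lists.items():
        weight = weights.get(group, 1.0)  # unweighted groups count once
        for rank, chunk_id in enumerate(ranking, start=1):
            # Reciprocal rank score 1/(rank+m), summed (optionally weighted) per chunk.
            scores[chunk_id] += weight / (rank + m)
    fused = sorted(scores, key=lambda chunk_id: scores[chunk_id], reverse=True)
    return fused[:k]
```

For example, given one ranking from the vectors generated from the chunks themselves and another from the vectors generated from chunk summaries, passing `weights={"chunk": 3.0}` gives the first ranking three times the influence of the second. A chunk that is absent from a group's list simply receives no contribution from that group.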

Once a set of chunks that are relevant to the query has been identified, the method 1500 may end or the set of chunks may be retrieved from a data store or repository.

Reference is now made to FIG. 16 which illustrates an example method 1600 for generating a response to a query based on a set of document chunks which may be executed, for example, by the system 1000 of FIG. 10. The method 1600 may be implemented by one or more processors of a computer. The method 1600 begins at block 1602 where chain of thought prompting is used to cause an LLM (e.g., a re-ranker LLM 1008) to rank a set of document chunks based on their relevance to a query. The document chunks that are ranked may be document chunks that have been retrieved by an information retrieval system on the basis that they are related to the query. This may comprise providing the LLM with the set of document chunks and the query along with a CoT re-ranker prompt that instructs the LLM to, for each chunk in the set of chunks, explain (using the chunk) why that chunk is relevant to the query and assign a relevance rating thereto, and then rank the set of chunks based on their relevance to the query. Once the LLM has ranked the set of document chunks, the method 1600 proceeds to block 1604.

At block 1604, a subset of the document chunks is selected based on the ranking generated in block 1602. Any method for selecting a subset of the document chunks, such as those described above with respect to FIGS. 2 and 8, may be used. Once a subset of the document chunks is selected, the method 1600 proceeds to block 1606.

At block 1606, an LLM is used to generate a response to the query based on the subset of the document chunks. This may comprise providing the LLM with a generation prompt such as that described above with respect to FIG. 2 or providing the LLM with a CoT generation prompt such as that described above with respect to FIG. 10. Once the LLM has generated the response, the method 1600 may end.

Reference is now made to FIG. 17 which illustrates an example method 1700 of generating a response to a query based on a corpus of documents, which may be implemented by the system 1100 of FIG. 11. The method 1700 may be implemented using one or more processors of one or more computers. The method 1700 begins with blocks 1502 to 1508 of the method 1500 of FIG. 15 to retrieve a set of document chunks that are relevant to the query. The method 1700 then proceeds to block 1702.

At block 1702, chain of thought prompting is used to cause an LLM (e.g., a re-ranker LLM 1106) to rank a set of document chunks based on their relevance to a query. This may comprise providing the LLM with the set of document chunks and the query along with a CoT re-ranker prompt that instructs the LLM to, for each chunk in the set of chunks, explain (using the chunk and the related synthetic information generated in block 1504) why that chunk is relevant to the query and assign a relevance rating thereto, and then rank the set of chunks based on their relevance to the query. Once the LLM has ranked the set of document chunks, the method 1700 proceeds to block 1704.

At block 1704, a subset of the document chunks is selected based on the ranking generated in block 1702. Any method for selecting a subset of the document chunks, such as those described above with respect to FIGS. 2 and 8, may be used. Once a subset of the document chunks is selected, the method 1700 proceeds to block 1706.

At block 1706, an LLM is used to generate a response to the query based on the subset of document chunks and their corresponding synthetic information generated in block 1504. This may comprise providing the LLM with a generation prompt such as that described above with respect to FIG. 2 or providing the LLM with a CoT generation prompt such as that described above with respect to FIG. 11. Once the LLM has generated the response, the method 1700 may end.

Various systems or processes have been described to provide examples of embodiments of the claimed subject matter. No such example embodiment described limits any claim and any claim may cover processes or systems that differ from those described. The claims are not limited to systems or processes having all the features of any one system or process described above or to features common to multiple or all the systems or processes described above. It is possible that a system or process described above is not an embodiment of any exclusive right granted by issuance of this patent application. Any subject matter described above and for which an exclusive right is not granted by issuance of this patent application may be the subject matter of another protective instrument, for example, a continuing patent application, and the applicants, inventors or owners do not intend to abandon, disclaim or dedicate to the public any such subject matter by its disclosure in this document.

For simplicity and clarity of illustration, reference numerals may be repeated among the figures to indicate corresponding or analogous elements. In addition, numerous specific details are set forth to provide a thorough understanding of the subject matter described herein. However, it will be understood by those of ordinary skill in the art that the subject matter described herein may be practiced without these specific details. In other instances, well-known methods, procedures, and components have not been described in detail so as not to obscure the subject matter described herein.

The terms “coupled” or “coupling” as used herein can have several different meanings depending on the context in which these terms are used. For example, the terms coupled or coupling can have a mechanical, electrical or communicative connotation. For example, as used herein, the terms coupled or coupling can indicate that two elements or devices are directly connected to one another or connected to one another through one or more intermediate elements or devices via an electrical element, electrical signal, or a mechanical element depending on the particular context. Furthermore, the term “operatively coupled” may be used to indicate that an element or device can electrically, optically, or wirelessly send data to another element or device as well as receive data from another element or device.

As used herein, the wording “and/or” is intended to represent an inclusive-or. That is, “X and/or Y” is intended to mean X or Y or both, for example. As a further example, “X, Y, and/or Z” is intended to mean X or Y or Z or any combination thereof.

Terms of degree such as “substantially”, “about”, and “approximately” as used herein mean a reasonable amount of deviation of the modified term such that the result is not significantly changed. These terms of degree may also be construed as including a deviation of the modified term if this deviation would not negate the meaning of the term it modifies.

Any recitation of numerical ranges by endpoints herein includes all numbers and fractions subsumed within that range (e.g., 1 to 5 includes 1, 1.5, 2, 2.75, 3, 3.90, 4, and 5). It is also to be understood that all numbers and fractions thereof are presumed to be modified by the term “about” which means a variation of up to a certain amount of the number to which reference is being made if the result is not significantly changed.

Some elements herein may be identified by a part number, which is composed of a base number followed by an alphabetical or subscript-numerical suffix (e.g., 112a, or 112b). All elements with a common base number may be referred to collectively or generically using the base number without a suffix (e.g., 112).

The systems and methods described herein may be implemented as a combination of hardware or software. In some cases, the systems and methods described herein may be implemented, at least in part, by using one or more computer programs, executing on one or more programmable devices including at least one processing element, and a data storage element (including volatile and non-volatile memory and/or storage elements). These systems may also have at least one input device (e.g., a pushbutton keyboard, a mouse, a touchscreen, and the like), and at least one output device (e.g., a display screen, a printer, a wireless radio, and the like) depending on the nature of the device. Further, in some examples, one or more of the systems and methods described herein may be implemented in or as part of a distributed or cloud-based computing system having multiple computing components distributed across a computing network. For example, the distributed or cloud-based computing system may correspond to a private distributed or cloud-based computing cluster that is associated with an organization. Additionally, or alternatively, the distributed or cloud-based computing system may be a publicly accessible, distributed or cloud-based computing cluster, such as a computing cluster maintained by Microsoft Azure™, Amazon Web Services™, Google Cloud™, or another third-party provider. In some instances, the distributed computing components of the distributed or cloud-based computing system may be configured to implement one or more parallelized, fault-tolerant distributed computing and analytical processes, such as processes provisioned by an Apache Spark™ distributed, cluster-computing framework or a Databricks™ analytical platform.
Further, and in addition to the CPUs described herein, the distributed computing components may also include one or more graphics processing units (GPUs) capable of processing thousands of operations (e.g., vector operations) in a single clock cycle, and additionally, or alternatively, one or more tensor processing units (TPUs) capable of processing hundreds of thousands of operations (e.g., matrix operations) in a single clock cycle.

Some elements that are used to implement at least part of the systems, methods, and devices described herein may be implemented via software that is written in a high-level procedural language such as an object-oriented programming language. Accordingly, the program code may be written in any suitable programming language such as Python or Java, for example. Alternatively, or in addition thereto, some of these elements implemented via software may be written in assembly language, machine language or firmware as needed. In either case, the language may be a compiled or interpreted language.

At least some of these software programs may be stored on a storage medium (e.g., a computer readable medium such as, but not limited to, read-only memory, magnetic disk, optical disc) or a device that is readable by a general or special purpose programmable device. The software program code, when read by the programmable device, configures the programmable device to operate in a new, specific, and predefined manner to perform at least one of the methods described herein.

Furthermore, at least some of the programs associated with the systems and methods described herein may be capable of being distributed in a computer program product including a computer readable medium that bears computer usable instructions for one or more processors. The medium may be provided in various forms, including non-transitory forms such as, but not limited to, one or more diskettes, compact disks, tapes, chips, and magnetic and electronic storage. Alternatively, the medium may be transitory in nature such as, but not limited to, wire-line transmissions, satellite transmissions, internet transmissions (e.g., downloads), media, digital and analog signals, and the like. The computer usable instructions may also be in various formats, including compiled and non-compiled code.

While the above description provides examples of one or more processes or systems, it will be appreciated that other processes or systems may be within the scope of the accompanying claims.

To the extent any amendments, characterizations, or other assertions previously made (in this or in any related patent applications or patents, including any parent, sibling, or child) with respect to any art, prior or otherwise, could be construed as a disclaimer of any subject matter supported by the present disclosure of this application, Applicant hereby rescinds and retracts such disclaimer. Applicant also respectfully submits that any prior art previously considered in any related patent applications or patents, including any parent, sibling, or child, may need to be revisited.

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G06F G06F16/93 G06F16/24578 G06F16/953

Patent Metadata

Filing Date: August 23, 2024

Publication Date: February 26, 2026

Inventors: Noël Vouitsis, Jiapeng Wu, Yi Sui, Graham Andrew Warner, Paulina Corona Ugalde, Maksims Volkovs
