US-11468238

Data processing systems and methods

Published: October 11, 2022
Assignee: not available in the USPTO data we have
Inventors: not available in the USPTO data we have

Technical Abstract

Example data processing systems and methods are described. In one implementation, a system accesses a corpus of data and analyzes the data contained in the corpus of data to identify multiple documents. The system generates vector indexes for the multiple documents such that the vector indexes allow a computing system to quickly access the plurality of documents and identify an answer to a question associated with the corpus of data.

Patent Claims
11 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 2

Original Legal Text

2. The method of claim 1, further comprising applying extractive summarization to select the portion of the document from best text portions of the document.

Plain English Translation

This invention relates to document processing, specifically improving the selection of relevant text portions for summarization or analysis. The problem addressed is the challenge of accurately identifying and extracting the most informative segments of a document to generate concise, meaningful summaries or insights. Traditional methods often rely on heuristic or rule-based approaches, which may not effectively capture the most relevant content, especially in complex or lengthy documents. The method involves applying extractive summarization techniques to identify and select the best text portions of a document. Extractive summarization involves analyzing the document to determine which sentences or segments are most representative of the overall content. This is typically done by evaluating factors such as sentence importance, relevance, and coherence. The selected portions are then used to generate a summary or for further processing, ensuring that the most critical information is retained. The approach enhances the accuracy and efficiency of document summarization by leveraging automated techniques to pinpoint the most valuable text segments, reducing reliance on manual or less precise methods. This method is particularly useful in applications requiring automated document analysis, such as legal, medical, or business research, where extracting key information quickly and accurately is essential.
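
The claim does not say which extractive summarization algorithm is used. As a minimal sketch, assuming a simple word-frequency scorer (the function names and the `k` parameter are illustrative, not from the patent), selecting the "best text portions" might look like this:

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extractive_summary(text, k=2):
    """Return the k highest-scoring sentences, kept in document order."""
    sentences = split_sentences(text)
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        # Average word frequency, so long sentences are not favoured by length alone.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


doc = "Cats sleep. Cats eat fish. Dogs bark loudly at night."
print(extractive_summary(doc, k=2))
```

The selected sentences are copied verbatim from the document, which is what distinguishes extractive from abstractive summarization.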

Claim 3

Original Legal Text

3. The method of claim 1, wherein the at least two different paraphrasing techniques include full backtranslation using a neural machine translation model.

Plain English Translation

This invention relates to natural language processing, specifically techniques for generating multiple paraphrased versions of a text using different paraphrasing methods. The problem addressed is the need for diverse and high-quality paraphrases to improve tasks like data augmentation, text generation, and machine learning model training. The solution involves using at least two distinct paraphrasing techniques, including full backtranslation with a neural machine translation model. Backtranslation involves translating the original text into a different language and then translating it back to the original language, producing a paraphrased version. The method ensures linguistic diversity by combining this with other paraphrasing approaches, such as rule-based or statistical techniques. The resulting paraphrases maintain semantic meaning while varying in structure and wording, enhancing applications like text summarization, plagiarism detection, and content generation. The approach leverages neural machine translation models, which are trained on large datasets to capture nuanced language patterns, ensuring high-quality translations and paraphrases. By integrating multiple techniques, the method mitigates biases and limitations inherent in any single approach, providing robust and varied outputs. This is particularly useful in scenarios requiring multiple perspectives or avoiding overfitting in machine learning models. The invention improves upon prior methods by systematically combining diverse paraphrasing strategies, ensuring both accuracy and diversity in generated text.
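
The round trip described above can be sketched as a composition of two translation functions. The toy word-for-word tables below are hypothetical stand-ins for a neural machine translation model, which the claim requires but does not name:

```python
from typing import Callable, Dict

Translator = Callable[[str], str]


def back_translate(text: str, to_pivot: Translator, from_pivot: Translator) -> str:
    """Translate text into a pivot language, then back to the original language."""
    return from_pivot(to_pivot(text))


def toy_translator(table: Dict[str, str]) -> Translator:
    # Word-for-word lookup; a real system would call an NMT model here.
    def translate(text: str) -> str:
        return " ".join(table.get(word, word) for word in text.split())
    return translate


# Hypothetical dictionaries, for illustration only.
EN_TO_FR = {"the": "le", "big": "grand", "large": "grand", "dog": "chien"}
FR_TO_EN = {"le": "the", "grand": "large", "chien": "dog"}

paraphrase = back_translate("the big dog", toy_translator(EN_TO_FR), toy_translator(FR_TO_EN))
print(paraphrase)
```

The paraphrase arises because the round trip is lossy: "big" and "large" both map to the same pivot word, and the return translation picks only one of them.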

Claim 4

Original Legal Text

4. The method of claim 1, wherein the at least two different paraphrasing techniques include noun/verb phrase backtranslation.

Plain English Translation

This invention relates to natural language processing (NLP) and machine translation, specifically the challenge of generating diverse paraphrases of a given text. The problem solved is the need for high-quality, varied paraphrases that avoid repetition and maintain semantic fidelity, which matters for applications like content generation, plagiarism detection, and multilingual translation. The method uses at least two different paraphrasing techniques to produce multiple paraphrases of an input text. One of these techniques is noun/verb phrase backtranslation: rather than round-tripping the whole text, noun and verb phrases are isolated, translated into a different language and back to the original language, and then reintegrated into the original text, altering phrasing while preserving meaning. The other technique may include rule-based transformations, such as synonym replacement or syntactic restructuring, to further diversify the output. The system processes the input text through these techniques in parallel or sequentially, so that the resulting paraphrases differ in structure and wording while retaining the original meaning. This approach enhances the quality and variety of paraphrases, making it useful for applications requiring linguistic diversity, such as automated content creation, machine learning training data augmentation, and multilingual document processing. The method may also incorporate post-processing steps to refine the paraphrases, such as grammatical correction or semantic validation, to ensure accuracy and coherence.
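
Noun/verb phrase backtranslation applies the round trip to individual phrases rather than to the whole text. A minimal sketch, assuming the phrase spans are supplied by the caller as character offsets (in a real system they would come from a parser) and using a hypothetical lookup table in place of a translation model:

```python
from typing import Callable, List, Tuple


def phrase_back_translate(
    text: str,
    spans: List[Tuple[int, int]],
    round_trip: Callable[[str], str],
) -> str:
    """Round-trip only the given (start, end) spans; copy everything else unchanged."""
    pieces = []
    cursor = 0
    for start, end in sorted(spans):
        pieces.append(text[cursor:start])           # untouched text before the phrase
        pieces.append(round_trip(text[start:end]))  # paraphrased phrase
        cursor = end
    pieces.append(text[cursor:])                    # untouched tail
    return "".join(pieces)


def span_of(text: str, phrase: str) -> Tuple[int, int]:
    start = text.index(phrase)
    return (start, start + len(phrase))


# Hypothetical round-trip results standing in for translate-and-back output.
ROUND_TRIP = {"the big dog": "the large dog", "ran away": "fled"}


def toy_round_trip(phrase: str) -> str:
    return ROUND_TRIP.get(phrase, phrase)


sentence = "Yesterday the big dog ran away quickly"
spans = [span_of(sentence, "the big dog"), span_of(sentence, "ran away")]
result = phrase_back_translate(sentence, spans, toy_round_trip)
print(result)
```

Words outside the selected spans ("Yesterday", "quickly") are left exactly as written, so the sentence skeleton is preserved while the phrases vary.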

Claim 5

Original Legal Text

5. The method of claim 1, wherein the at least two different paraphrasing techniques include synonym replacement.

Plain English Translation

This invention relates to natural language processing (NLP) and text generation, specifically improving the quality and diversity of paraphrased text. The problem addressed is the limited variability in paraphrasing systems, which often produce repetitive or low-quality outputs when generating alternative phrasings of a given text. The solution involves using multiple distinct paraphrasing techniques, including synonym replacement, to enhance the diversity and accuracy of generated paraphrases. The method processes input text by applying at least two different paraphrasing techniques, one of which is synonym replacement. Synonym replacement involves substituting words or phrases in the input text with semantically equivalent alternatives from a predefined vocabulary or knowledge base. The other paraphrasing techniques may include structural transformations, such as reordering sentences or modifying syntactic structures, or rewording of clauses and sentences that preserves their meaning. The combined application of these techniques generates a set of paraphrased outputs that are both linguistically varied and contextually accurate. The system may further refine the outputs by evaluating their fluency, coherence, and semantic similarity to the original text. This approach improves the robustness of NLP applications like machine translation, text summarization, and content generation by providing more diverse and contextually appropriate paraphrases.
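
As an illustration of synonym replacement from a predefined vocabulary, the sketch below swaps single words using a small hypothetical table. The table contents and the function name are assumptions for the example, not part of the patent:

```python
import re

# Hypothetical synonym table; a real system might draw on a thesaurus or knowledge base.
SYNONYMS = {"quick": "fast", "begin": "start", "purchase": "buy"}


def replace_synonyms(text, synonyms=SYNONYMS):
    """Replace each word that has a table entry, preserving leading capitalisation."""
    def swap(match):
        word = match.group(0)
        replacement = synonyms.get(word.lower())
        if replacement is None:
            return word
        return replacement.capitalize() if word[0].isupper() else replacement

    return re.sub(r"[A-Za-z]+", swap, text)


print(replace_synonyms("Begin the quick test."))
```

A context-free lookup like this ignores word sense, which is one reason the surrounding claims filter candidates afterwards.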

Claim 6

Original Legal Text

6. The method of claim 1, wherein the at least two different paraphrasing techniques include phrase replacement.

Plain English Translation

The invention relates to natural language processing (NLP) and text generation, specifically addressing the challenge of creating diverse paraphrased text outputs. The core problem is generating multiple high-quality paraphrases of a given input text while avoiding redundancy and maintaining semantic accuracy. Existing methods often rely on a single paraphrasing technique, leading to limited variability in output. The invention improves upon prior art by employing at least two distinct paraphrasing techniques, including phrase replacement, to enhance output diversity. Phrase replacement involves substituting specific phrases or segments of the input text with semantically equivalent alternatives. The method may also incorporate other techniques such as synonym substitution, syntactic restructuring, or contextual rephrasing to further diversify the generated paraphrases. By combining multiple approaches, the system produces a broader range of paraphrased outputs, reducing repetition and improving the quality of generated text. This is particularly useful in applications like content generation, machine translation, and automated summarization, where varied paraphrasing improves readability and avoids redundancy. The invention ensures that the paraphrased outputs remain coherent and contextually accurate while introducing meaningful variations in phrasing and structure.
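
Phrase replacement differs from single-word synonym replacement in that multi-word segments are matched as a unit. A minimal sketch, assuming a hypothetical phrase table and longest-match-first ordering:

```python
import re

# Hypothetical phrase table (keys are lowercase); not taken from the patent.
PHRASES = {
    "in order to": "to",
    "a large number of": "many",
    "due to the fact that": "because",
}


def replace_phrases(text, phrases=PHRASES):
    """Replace known multi-word phrases, preferring the longest match."""
    if not phrases:
        return text
    keys = sorted(phrases, key=len, reverse=True)  # longer phrases are tried first
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: phrases[m.group(0).lower()], text)


print(replace_phrases("We left early in order to avoid a large number of delays."))
```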

Claim 7

Original Legal Text

7. The method of claim 1, wherein de-duplication comprises: determining, via the neural sentence encoder of the computing system, a first vector representation of a portion of the document and a respective vector representation of each of the plurality of candidate paraphrases; and discarding the one or more candidate paraphrases based on a computed respective cosine similarity between the first vector representation and the respective vector representation of each of the plurality of candidate paraphrases.
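
The similarity computation in this claim can be sketched as follows. The claim calls for a neural sentence encoder; the word-count vector used here is a deliberately simple stand-in so that the example runs without external models, and all function names are illustrative:

```python
import math
import re
from collections import Counter


def encode(text):
    """Stand-in encoder: sparse word-count vector (NOT a neural sentence encoder)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_candidates(source, candidates):
    """Pair each candidate paraphrase with its similarity to the source portion."""
    src = encode(source)
    return [(c, cosine_similarity(src, encode(c))) for c in candidates]


for text, sim in score_candidates("the dog barks", ["the dog sleeps", "stock prices fell"]):
    print(f"{sim:.2f}  {text}")
```

Claim 7 itself does not state the cut-off used to discard candidates; only the scoring step is shown here.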

Claim 11

Original Legal Text

11. The method of claim 1, wherein filtering the plurality of candidate paraphrases includes removing irrelevant sentences.

Claim 13

Original Legal Text

13. The method of claim 12, wherein the at least two different paraphrasing techniques include at least two of full backtranslation, noun/verb phrase backtranslation, synonym replacement, and phrase replacement.

Claim 14

Original Legal Text

14. The method of claim 12, further comprising discarding a particular paraphrase if the respective cosine similarity is less than 0.5 or greater than 0.95.
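
The thresholds in this claim define a band of acceptable similarity. A minimal sketch, assuming the similarities have already been computed and are passed in as (text, similarity) pairs; the function name is illustrative:

```python
def keep_paraphrases(scored, low=0.5, high=0.95):
    """Discard candidates whose similarity is below `low` or above `high`."""
    return [text for text, sim in scored if low <= sim <= high]


scored = [
    ("drifted too far", 0.30),
    ("at the lower bound", 0.50),
    ("useful paraphrase", 0.80),
    ("at the upper bound", 0.95),
    ("nearly identical", 0.99),
]
print(keep_paraphrases(scored))
```

One plausible reading is that the lower bound removes candidates that have drifted away from the source meaning, while the upper bound removes near-copies that add little variety. The claim states only the numeric thresholds, so that interpretation is an assumption.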

Claim 15

Original Legal Text

15. The method of claim 12, wherein filtering the plurality of candidate paraphrases includes removing irrelevant sentences.

Plain English Translation

This invention relates to natural language processing (NLP) and text generation, specifically improving the quality of paraphrased text by filtering out irrelevant sentences. The problem addressed is the generation of low-quality or irrelevant paraphrases in automated text generation systems, which can reduce the usefulness of paraphrasing tools in applications like machine translation, content creation, and summarization. The method involves generating a plurality of candidate paraphrases from an input text and then filtering these candidates to remove irrelevant sentences. The filtering process may include analyzing semantic relevance, coherence, or contextual appropriateness to ensure the remaining paraphrases are meaningful and contextually accurate. The method may also involve ranking the filtered paraphrases based on quality metrics, such as fluency, diversity, or semantic similarity to the original text. By removing irrelevant sentences, the system improves the accuracy and reliability of paraphrased outputs, making it more suitable for automated content generation and language processing tasks. The approach enhances the efficiency of NLP systems by reducing the need for manual review and correction of generated text.
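
The claim does not define how relevance is measured. Purely as an illustration, the sketch below treats a candidate as irrelevant when its word overlap (Jaccard index) with the source falls below a threshold. Both the measure and the `min_overlap` value of 0.2 are assumptions made for this example:

```python
import re


def tokens(text):
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    """Size of the intersection divided by size of the union; 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def filter_irrelevant(source, candidates, min_overlap=0.2):
    """Keep candidates that share enough vocabulary with the source text."""
    src = tokens(source)
    return [c for c in candidates if jaccard(src, tokens(c)) >= min_overlap]


kept = filter_irrelevant(
    "The dog chased the ball",
    ["The dog ran after the ball", "Stock prices fell sharply"],
)
print(kept)
```

A lexical measure like this would wrongly drop a valid paraphrase that shares few words with the source, so an embedding-based score may be a better fit in practice.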

Claim 20

Original Legal Text

20. The method of claim 17, wherein the at least two paraphrasing techniques include full backtranslation via a neural machine translation model, noun/verb phrase backtranslation via a neural parser, or a combination thereof.

Plain English Translation

This invention relates to natural language processing, specifically techniques for generating paraphrased text using multiple approaches. The problem addressed is the need for diverse and high-quality paraphrasing methods to improve tasks like data augmentation for machine learning and reducing redundancy in text processing. The method involves applying at least two paraphrasing techniques to input text. One technique is full backtranslation, where the text is first translated into a different language using a neural machine translation model and then translated back to the original language. Another technique is noun/verb phrase backtranslation, where noun phrases and verb phrases are identified using a neural parser, translated, and then reintegrated into the original text. The system may use either technique or combine them to produce varied paraphrased outputs. The approach leverages neural models to keep the output linguistically coherent while introducing diversity in the paraphrased text. By using multiple techniques, the method avoids over-reliance on a single paraphrasing strategy, improving robustness and applicability across different text types. The invention is particularly useful in applications requiring text variation, such as training language models, generating synthetic data, or enhancing content generation systems.

Patent Metadata

Filing Date

November 6, 2019

Publication Date

October 11, 2022

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, FAQs, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “Data processing systems and methods” (US-11468238). https://patentable.app/patents/US-11468238

© 2026 Nomic Interactive Technology LLC. Machine-readable context available at /api/llm-context/US-11468238. See llms.txt for full attribution policy.