10691891

Information Extraction from Natural Language Texts

Published: June 23, 2020
Assignee: not available in USPTO data we have

Patent Claims
18 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 1

Original Legal Text

1. A method, comprising: extracting, by a computer system, a first plurality of information objects from a natural language text; extracting, from the natural language text, a second plurality of information objects; determining that a first textual annotation associated with a first information object of the first plurality of information objects is overlapping with a second textual annotation associated with a second information object of the second plurality of information objects; applying, to the first information object and the second information object, a conflict arbitration function represented by a machine learning classifier yielding a likelihood of the first information object and the second information object representing a same object.

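Plain English Translation

This claim describes a method for reconciling information objects extracted from the same natural language text. A computer system extracts a first plurality of information objects from the text, and a second plurality of information objects is extracted from the same text. The method then determines that the textual annotation of an object from the first plurality overlaps the textual annotation of an object from the second plurality. A conflict arbitration function, represented by a machine learning classifier, is applied to the two objects and yields the likelihood that they represent the same object.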
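
The overlap test and the arbitration step of claim 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes a textual annotation is a half-open character span, and it uses a hand-weighted logistic function as a stand-in for the trained classifier. The `InfoObject` type, the two features, and the weights are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class InfoObject:
    """Hypothetical information object: a class label plus its annotation span."""
    label: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def annotations_overlap(a: InfoObject, b: InfoObject) -> bool:
    """True when the two half-open spans share at least one character."""
    return a.start < b.end and b.start < a.end


def same_object_likelihood(a: InfoObject, b: InfoObject) -> float:
    """Stand-in for the classifier: a logistic score over two toy features."""
    shared = max(0, min(a.end, b.end) - max(a.start, b.start))
    cover = max(a.end, b.end) - min(a.start, b.start)
    overlap_ratio = shared / cover if cover else 0.0
    same_label = 1.0 if a.label == b.label else 0.0
    # Illustrative weights only; a real system would learn them from data.
    z = -2.0 + 3.0 * overlap_ratio + 2.0 * same_label
    return 1.0 / (1.0 + math.exp(-z))


text = "John Smith joined Acme Corp in 2019."
first = InfoObject("Person", 0, 10)   # "John Smith", from the first extraction
second = InfoObject("Person", 5, 10)  # "Smith", from the second extraction

likelihood = None
if annotations_overlap(first, second):
    likelihood = same_object_likelihood(first, second)
```

With these toy weights the two spans overlap and the likelihood comes out at about 0.82. How such a likelihood is turned into a decision is not stated in claim 1.
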
Claim 2

Original Legal Text

2. The method of claim 1, wherein extracting the first plurality of information objects is performed by a first information extraction technique and extracting the second plurality of information objects is performed by a second information extraction technique.

Plain English Translation

This claim narrows the method of claim 1 by requiring that the two sets of information objects come from two different extraction techniques applied to the same natural language text. A first technique extracts the first plurality of information objects and a second technique extracts the second plurality. The claim does not name the techniques; as an illustration, one might be rule-based pattern matching and the other a machine learning model. Because both techniques read the same text, their annotations can overlap, which is the situation the conflict arbitration function of claim 1 addresses.
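
As a rough sketch of claim 2, the two extraction techniques can be modeled as two independent functions run over the same text. Both techniques here are assumptions chosen for brevity: a regular expression for runs of capitalized words, and a lookup against a fixed name list standing in for a second, different technique. The claim itself does not name any technique.

```python
import re

TEXT = "Dr. Alice Brown met Bob at Acme Corp."


def extract_by_rule(text):
    """First technique (assumed): a regex for two or more capitalized words."""
    pattern = re.compile(r"[A-Z][a-z]+(?: [A-Z][a-z]+)+")
    return [(m.group(), m.start(), m.end()) for m in pattern.finditer(text)]


def extract_by_gazetteer(text, names=("Alice", "Bob", "Acme")):
    """Second technique (assumed): lookup of known names from a fixed list."""
    found = []
    for name in names:
        for m in re.finditer(re.escape(name), text):
            found.append((name, m.start(), m.end()))
    return sorted(found, key=lambda item: item[1])


# Each entry is (surface text, start offset, end offset).
first_plurality = extract_by_rule(TEXT)
second_plurality = extract_by_gazetteer(TEXT)
```

In this example the rule finds "Alice Brown" and "Acme Corp", while the lookup finds "Alice", "Bob", and "Acme". The spans for "Alice" and "Acme" fall inside spans from the first technique, which is the kind of overlap claim 1 then arbitrates.
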

Claim 3

Original Legal Text

3. The method of claim 1, further comprising: producing a final list of information objects extracted from the natural language text; and utilizing the final list of information objects for performing a natural language processing operation.

Plain English Translation

This claim adds two steps to the method of claim 1. Building on the extraction and conflict arbitration steps, the method produces a final list of the information objects extracted from the natural language text. That list is then used to perform a natural language processing operation. The claim does not name the operation; examples could include semantic search, text classification, or question answering.

Claim 4

Original Legal Text

4. The method of claim 1, further comprising: producing a final list of information objects extracted from the natural language text; and representing the final list of information objects by a Resource Definition Framework (RDF) graph.

Plain English Translation

This claim adds two steps to the method of claim 1. The method produces a final list of the information objects extracted from the natural language text and represents that list as an RDF graph. The claim expands the acronym as "Resource Definition Framework"; the W3C standard is named the Resource Description Framework. RDF models data as subject-predicate-object triples, so each information object and its attributes can be stored as statements that support semantic queries and data interchange between sources.
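
The RDF representation in claim 4 can be illustrated with a small sketch that turns a final list of objects into triples and serializes them as N-Triples, a line-based RDF syntax. The `http://example.org/` namespace, the object identifiers, and the dictionary layout are assumptions made for the example; the patent claim does not prescribe a vocabulary or a serialization.

```python
EX = "http://example.org/"  # hypothetical namespace, not from the patent
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical final list of information objects.
final_list = [
    {"id": "obj1", "class": "Person", "attrs": {"name": "Alice Brown"}},
    {"id": "obj2", "class": "Organization", "attrs": {"name": "Acme Corp"}},
]


def to_triples(objects):
    """Turn each object into (subject, predicate, object) triples."""
    triples = []
    for obj in objects:
        subject = f"<{EX}{obj['id']}>"
        triples.append((subject, f"<{RDF_TYPE}>", f"<{EX}{obj['class']}>"))
        for key, value in obj["attrs"].items():
            # Minimal literal escaping: backslashes and double quotes only.
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            triples.append((subject, f"<{EX}{key}>", f'"{escaped}"'))
    return triples


def to_ntriples(triples):
    """Serialize triples in N-Triples form, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


graph = to_triples(final_list)
serialized = to_ntriples(graph)
```

Each object contributes one `rdf:type` statement plus one statement per attribute, so the two objects above yield four triples. A production system would more likely use an RDF library than hand-built strings.
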

Claim 5

Original Legal Text

5. The method of claim 1, further comprising: evaluating a logical condition comprising a first attribute of the first information object and a second attribute of the second information object.

Plain English Translation

This claim adds a rule-style check to the method of claim 1. The method evaluates a logical condition that involves a first attribute of the first information object and a second attribute of the second information object, that is, the two objects whose textual annotations overlap. The claim does not specify the condition; it could be an equality test, an inequality, or a more complex relation between the two attribute values. The claim also does not say how the result of the evaluation is used.
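
A minimal sketch of the check in claim 5, assuming information objects are plain dictionaries and that the condition compares the `surname` attribute of one object with the `surname` attribute of the other. The attribute names and the condition itself are hypothetical; the claim leaves both open.

```python
first_object = {"class": "Person", "surname": "Smith", "first_name": "John"}
second_object = {"class": "Person", "surname": "Smith"}


def condition_holds(first, second):
    """Hypothetical condition: same class and the same non-empty surname."""
    return (
        first.get("class") == second.get("class")
        and first.get("surname") is not None
        and first.get("surname") == second.get("surname")
    )


result = condition_holds(first_object, second_object)
```

Here the condition is satisfied because both objects are persons with the surname "Smith". It would fail if the second object were, for example, an organization with the same name.
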

Claim 6

Original Legal Text

6. The method of claim 1, further comprising: determining that the first information object has a number of attributes of a certain type exceeding a threshold number of attributes of the certain type.

Plain English Translation

This claim adds a check on the first information object to the method of claim 1. The method determines that the first information object has more attributes of a certain type than a threshold number set for that type. The claim does not name the attribute type or the threshold. As a hypothetical illustration, a person object that carries two surname attributes where at most one is expected would exceed the threshold, which could signal that information about different objects has been combined.
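
The threshold test in claim 6 can be sketched in a few lines. The example assumes attributes are stored as (type, value) pairs and uses a surname limit of one; both choices are illustrative and do not come from the patent.

```python
from collections import Counter

# Hypothetical object: attributes stored as (type, value) pairs.
first_object = [
    ("surname", "Smith"),
    ("surname", "Jones"),
    ("first_name", "John"),
]


def exceeds_threshold(attributes, attr_type, threshold):
    """True when the object has more than `threshold` attributes of attr_type."""
    counts = Counter(kind for kind, _ in attributes)
    return counts[attr_type] > threshold


too_many_surnames = exceeds_threshold(first_object, "surname", 1)
```

The object above has two surname attributes against a limit of one, so the check returns `True`. The same object passes the check for `first_name`.
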

Claim 7

Original Legal Text

7. The method of claim 1, further comprising: appending, to a training data set, the natural language text accompanied by metadata comprising definitions and textual annotations of the first information object and the second information object; and training, utilizing the training data set, a machine learning classifier implementing the conflict arbitration function.

Plain English Translation

This claim adds a training step to the method of claim 1. The natural language text is appended to a training data set together with metadata that comprises the definitions and textual annotations of the first and second information objects. The training data set is then used to train the machine learning classifier that implements the conflict arbitration function. Each processed text can therefore become a new training example, so the classifier's estimate of whether two overlapping objects represent the same object can improve as examples accumulate.
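
The two steps of claim 7 can be sketched with a tiny logistic regression written in plain Python. Everything specific here is an assumption: the metadata layout, the manually supplied labels, the three toy features, and the choice of logistic regression trained by stochastic gradient descent. The claim only requires that some machine learning classifier be trained on the appended data.

```python
import math

training_set = []  # each entry: (text, metadata, label)


def append_example(text, first, second, same_object):
    """Store the text with the definitions and annotations of both objects."""
    metadata = {"first": first, "second": second}
    training_set.append((text, metadata, 1 if same_object else 0))


def features(metadata):
    """Toy features: bias, span overlap ratio, and class agreement."""
    a, b = metadata["first"], metadata["second"]
    shared = max(0, min(a["end"], b["end"]) - max(a["start"], b["start"]))
    cover = max(a["end"], b["end"]) - min(a["start"], b["start"])
    return [1.0, shared / cover if cover else 0.0, float(a["class"] == b["class"])]


def predict(weights, x):
    """Likelihood that the two objects represent the same object."""
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=500, lr=0.5):
    """Fit logistic regression weights by plain stochastic gradient descent."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for _, metadata, label in examples:
            x = features(metadata)
            error = predict(weights, x) - label
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
    return weights


text = "Ford Motor hired Henry Ford."
# "Henry Ford" and "Ford" (second mention): labeled as the same object.
append_example(text, {"class": "Person", "start": 17, "end": 27},
               {"class": "Person", "start": 23, "end": 27}, True)
# "Ford" (first mention) tagged as a person vs. "Ford Motor": different objects.
append_example(text, {"class": "Person", "start": 0, "end": 4},
               {"class": "Organization", "start": 0, "end": 10}, False)
weights = train(training_set)
```

After training on these two examples, `predict` scores the first pair above 0.5 and the second pair below it. Two examples are far too few for a usable model; the point is only the append-then-train flow.
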

Claim 8

Original Legal Text

8. The method of claim 1, further comprising: determining a first confidence level associated with the first information object.

Plain English Translation

This claim adds one step to the method of claim 1: determining a first confidence level associated with the first information object. The claim does not say how the confidence level is computed or how it is used. A plausible reading is that it expresses how reliable the extraction of that object is, so that it can be taken into account when overlapping objects are compared.

Claim 9

Original Legal Text

9. A computer system, comprising: a memory; a processor, coupled to the memory, the processor configured to: extract a first plurality of information objects from a natural language text; extract, from the natural language text, a second plurality of information objects; determine that a first textual annotation associated with a first information object of the first plurality of information objects is overlapping with a second textual annotation associated with a second information object of the second plurality of information objects; and apply, to the first information object and the second information object, a conflict arbitration function represented by a machine learning classifier yielding a likelihood of the first information object and the second information object representing a same object.

Plain English Translation

This invention relates to a computer system for resolving conflicts between overlapping textual annotations in natural language text processing. The system addresses the problem of ambiguity when multiple information objects extracted from text overlap, making it unclear whether they refer to the same real-world entity or different ones. The system includes a memory and a processor that performs several key functions. First, it extracts a first set of information objects from the text, such as entities, relationships, or attributes. Then, it extracts a second set of information objects from the same text. The processor detects when a textual annotation (e.g., a span of text) associated with an object from the first set overlaps with an annotation from the second set. To resolve this conflict, the system applies a machine learning classifier that evaluates the likelihood that the overlapping objects represent the same real-world entity. The classifier uses learned patterns to arbitrate conflicts, improving the accuracy of information extraction. This approach enhances natural language understanding by reducing ambiguity in overlapping annotations, which is critical for applications like document analysis, question answering, and knowledge graph construction.

Claim 10

Original Legal Text

10. The computer system of claim 9, wherein extracting the first plurality of information objects is performed by a first information extraction technique and extracting the second plurality of information objects is performed by a second information extraction technique.

Plain English Translation

This claim is the computer system counterpart of claim 2. In the system of claim 9, the first plurality of information objects is extracted by a first information extraction technique and the second plurality by a second information extraction technique, both applied to the same natural language text. The claim does not name the techniques and does not require any modules beyond the memory and processor of claim 9.

Claim 11

Original Legal Text

11. The computer system of claim 9, wherein the processor is further configured to: produce a final list of information objects extracted from the natural language text; and utilize the final list of information objects for performing a natural language processing operation.

Plain English Translation

This claim is the computer system counterpart of claim 3. The processor of claim 9 is further configured to produce a final list of the information objects extracted from the natural language text and to use that list to perform a natural language processing operation. The claim does not name the operation or require components such as a display.

Claim 12

Original Legal Text

12. The computer system of claim 9, wherein the processor is further configured to: produce a final list of information objects extracted from the natural language text; and represent the final list of information objects by a Resource Definition Framework (RDF) graph.

Plain English Translation

This claim is the computer system counterpart of claim 4. The processor of claim 9 is further configured to produce a final list of the information objects extracted from the natural language text and to represent that list as an RDF graph. The claim expands the acronym as "Resource Definition Framework"; the W3C standard is named the Resource Description Framework. In such a graph, nodes can stand for information objects and edges for their attributes and relationships, which supports semantic querying and data integration.

Claim 13

Original Legal Text

13. The computer system of claim 9, wherein the processor is further configured to: evaluate a logical condition comprising a first attribute of the first information object and a second attribute of the second information object.

Plain English Translation

This claim is the computer system counterpart of claim 5. The processor of claim 9 is further configured to evaluate a logical condition that involves a first attribute of the first information object and a second attribute of the second information object, the two objects whose textual annotations overlap. The claim does not specify the condition or what the system does with the result.

Claim 14

Original Legal Text

14. The computer system of claim 9, wherein the processor is further configured to: append, to a training data set, the natural language text accompanied by metadata comprising definitions and textual annotations of the first information object and the second information object; and train, utilizing the training data set, a machine learning classifier implementing the conflict arbitration function.

Plain English Translation

This claim is the computer system counterpart of claim 7. The processor of claim 9 is further configured to append the natural language text to a training data set, together with metadata comprising the definitions and textual annotations of the first and second information objects. The processor then uses the training data set to train the machine learning classifier that implements the conflict arbitration function. Training on annotated examples of overlapping objects lets the classifier improve its estimate of whether two such objects represent the same object.

Claim 15

Original Legal Text

15. The computer system of claim 9, wherein the processor is further configured to: determine a first confidence level associated with the first information object.

Plain English Translation

This claim is the computer system counterpart of claim 8. The processor of claim 9 is further configured to determine a first confidence level associated with the first information object. The claim does not say how the confidence level is computed or used; it plausibly quantifies how reliable the extraction of that object is.

Claim 16

Original Legal Text

16. A computer-readable non-transitory storage medium comprising executable instructions that, when executed by a computer system, cause the computer system to: extract a first plurality of information objects from a natural language text; extract, from the natural language text, a second plurality of information objects; determine that a first textual annotation associated with a first information object of the first plurality of information objects is overlapping with a second textual annotation associated with a second information object of the second plurality of information objects; and apply, to the first information object and the second information object, a conflict arbitration function represented by a machine learning classifier yielding a likelihood of the first information object and the second information object representing a same object.

Plain English Translation

This invention relates to natural language processing (NLP) systems that extract and resolve conflicting information objects from text. The problem addressed is the ambiguity that arises when multiple overlapping annotations identify different entities or concepts within the same text segment, leading to inconsistencies in data extraction. The solution involves a machine learning classifier that arbitrates conflicts by determining the likelihood that overlapping annotations refer to the same object. The system processes a natural language text by first extracting a first set of information objects, such as named entities, relationships, or attributes, each associated with a textual annotation indicating its position in the text. A second set of information objects is also extracted, with its own annotations. The system then identifies overlaps where annotations from the first and second sets cover the same or partially overlapping text segments. To resolve these conflicts, a machine learning classifier is applied, which evaluates the likelihood that the conflicting objects represent the same real-world entity or concept. The classifier may use features such as textual context, semantic similarity, or domain-specific rules to make this determination. The output is a resolved set of information objects with reduced ambiguity, improving the accuracy of downstream applications like knowledge graphs, search engines, or data integration systems. The approach is particularly useful in domains where text contains complex or ambiguous references, such as legal documents, medical records, or technical specifications.

Claim 17

Original Legal Text

17. The computer-readable non-transitory storage medium of claim 16, wherein extracting the first plurality of information objects is performed by a first information extraction technique and extracting the second plurality of information objects is performed by a second information extraction technique.

Plain English Translation

This claim is the storage medium counterpart of claim 2. The instructions stored on the medium of claim 16 cause the first plurality of information objects to be extracted by a first information extraction technique and the second plurality by a second information extraction technique, both from the same natural language text. The claim does not name the techniques; examples could include rule-based parsing and machine learning-based extraction.

Claim 18

Original Legal Text

18. The computer-readable non-transitory storage medium of claim 16, further comprising executable instructions that, when executed by the computer system, cause the computer system to: produce a final list of information objects extracted from the natural language text; and utilize the final list of information objects for performing a natural language processing operation.

Plain English Translation

This claim is the storage medium counterpart of claim 3. The medium of claim 16 holds further instructions that cause the computer system to produce a final list of the information objects extracted from the natural language text and to use that list to perform a natural language processing operation. The claim does not name the operation; examples could include question answering, summarization, or semantic search.

Patent Metadata

Filing Date

Unknown

Publication Date

June 23, 2020

Inventors

Stepan Evgenyevich Matskevich
Ilya Aleksandrovich Bulgakov

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, FAQs, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “INFORMATION EXTRACTION FROM NATURAL LANGUAGE TEXTS” (10691891). https://patentable.app/patents/10691891
