10831762

Extracting and Denoising Concept Mentions Using Distributed Representations of Concepts

Published: November 10, 2020
Assignee: not available in USPTO data we have

Patent Claims
22 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 1

Original Legal Text

1. A method, in an information handling system comprising a processor and a memory, for analyzing candidate concepts, the method comprising: generating, by the system, at least a first concept set comprising one or more candidate concepts extracted from a first source text; retrieving, by the system, a reference concept set comprising a plurality of concepts representing known characteristics of a desired vector space which may be used to identify concept outliers; generating or retrieving, by the system, a vector representation for each of the concepts in the first concept set and the reference concept set; performing, by the system, a natural language processing (NLP) analysis comparison of the vector representation of the first concept set to vector representation of the reference concept set to determine a similarity measure corresponding to each candidate concept; and validating, by the system, that the first concept set correctly identifies the first source text by using the similarity measure for each candidate concept which does not meet a minimum similarity threshold to detect concept outliers in the one or more candidate concepts extracted from the first source text, thereby expediting and qualitatively improving the analysis of candidate concepts in the first concept set.

Plain English Translation

The field of natural language processing (NLP) and concept analysis involves extracting meaningful concepts from text and comparing them to known reference concepts to identify relevant or anomalous information. A challenge in this domain is efficiently and accurately validating extracted concepts to ensure they align with desired characteristics of a target domain or knowledge space. Existing methods may struggle with scalability, accuracy, or computational efficiency when analyzing large text corpora. This invention addresses these challenges by providing a method for analyzing candidate concepts in an information handling system with a processor and memory. The system generates a first concept set containing one or more candidate concepts extracted from a source text. It then retrieves a reference concept set representing known characteristics of a desired vector space, which helps identify concept outliers. Vector representations are generated or retrieved for each concept in both the first concept set and the reference concept set. The system performs an NLP analysis to compare these vector representations, producing a similarity measure for each candidate concept. These measures are used to validate the first concept set by detecting outliers—concepts that fall below a minimum similarity threshold. This process improves the accuracy and efficiency of concept analysis by ensuring extracted concepts align with the desired vector space, thereby expediting and enhancing the quality of the analysis.
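The steps of claim 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the toy vectors, the concept names, the helper names `cosine_similarity` and `find_outliers`, and the choice to score each candidate by its best match against any reference concept are all hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: a zero vector is similar to nothing
    return dot / (norm_u * norm_v)

def find_outliers(candidates, reference, threshold):
    """Return names of candidates whose best similarity to any
    reference concept falls below `threshold`.

    Both arguments map a concept name to its vector (hypothetical data).
    """
    outliers = []
    for name, vec in candidates.items():
        best = max(cosine_similarity(vec, ref) for ref in reference.values())
        if best < threshold:
            outliers.append(name)
    return outliers

# Toy 3-dimensional vectors, invented for illustration only.
reference = {"diabetes": [0.9, 0.1, 0.0], "insulin": [0.8, 0.2, 0.1]}
candidates = {
    "glucose": [0.85, 0.15, 0.05],
    "guitar": [0.0, 0.1, 0.95],
}
outliers = find_outliers(candidates, reference, threshold=0.5)
print(outliers)
```

In practice the vectors would come from a trained embedding model and the threshold would be tuned on real data; neither is specified here.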

Claim 2

Original Legal Text

2. The method of claim 1, further comprising presenting, by the system, an indication that at least one candidate concept is an outlier or erroneous concept if the similarity metric corresponding to the at last one candidate concept does not meet a minimum similarity threshold.

Plain English Translation

This invention relates to a system for analyzing and validating candidate concepts in a technical or analytical context. The system identifies and evaluates candidate concepts by comparing them to a reference set of known or validated concepts using a similarity metric. The system calculates a similarity score for each candidate concept based on its relevance or correspondence to the reference concepts. If a candidate concept's similarity score falls below a predefined minimum threshold, the system flags it as an outlier or erroneous concept, indicating that it may be invalid, irrelevant, or incorrect. This helps users or automated processes filter out low-quality or misleading concepts, improving the reliability of the analysis. The system may also include additional steps such as generating candidate concepts from input data, selecting a subset of candidate concepts for further evaluation, and refining the similarity metric based on feedback or additional data. The invention is particularly useful in fields requiring high accuracy, such as data analysis, machine learning, or knowledge management, where distinguishing valid concepts from outliers is critical.

Claim 3

Original Legal Text

3. The method of claim 1, wherein generating at least the first concept set comprises extracting a plurality of candidate concepts from the first source text using an annotator.

Plain English Translation

The invention relates to a method for generating concept sets from source text, addressing the challenge of efficiently extracting meaningful concepts from unstructured or semi-structured text data. The method involves processing a first source text to generate at least a first concept set, which is done by extracting a plurality of candidate concepts using an annotator. The annotator identifies and categorizes relevant concepts within the text, such as entities, relationships, or attributes, based on predefined criteria or machine learning models. The extracted concepts are then refined or filtered to form the final concept set, which can be used for further analysis, knowledge representation, or information retrieval tasks. The method may also involve generating additional concept sets from other source texts or combining multiple concept sets to enhance the accuracy or completeness of the extracted knowledge. This approach improves the automation and scalability of concept extraction, making it useful in applications like natural language processing, semantic search, and data mining.

Claim 4

Original Legal Text

4. The method of claim 1, wherein performing the NLP analysis comprises analyzing a vector similarity function sim(Vi,Vj)) between (1) one or more vectors Vi for the one or more candidate concepts Ci and (2) the vector Vj for the reference concept set.

Plain English Translation

This invention relates to natural language processing (NLP) techniques for analyzing and comparing concepts within a text corpus. The problem addressed is the need to accurately assess the relevance or similarity of candidate concepts to a predefined reference concept set, which is critical for applications such as semantic search, content recommendation, and knowledge extraction. The method involves performing NLP analysis by computing a vector similarity function sim(Vi, Vj) between vectors representing candidate concepts and a reference concept set. The candidate concepts are derived from a text corpus, and each concept is transformed into a vector Vi using techniques like word embeddings or neural language models. The reference concept set is similarly represented as a vector Vj. The similarity function measures the degree of relatedness between the candidate vectors and the reference vector, enabling the identification of concepts that are semantically close to the reference set. The analysis may include preprocessing steps such as tokenization, normalization, and dimensionality reduction to optimize the vector representations. The similarity function can be based on cosine similarity, Euclidean distance, or other metrics tailored to the specific application. The output of this analysis can be used to rank, filter, or cluster concepts based on their relevance to the reference set, improving the accuracy of downstream tasks like information retrieval or topic modeling. This approach enhances the precision of concept-based applications by leveraging vector-based semantic representations.
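A minimal sketch of scoring candidates against a single reference vector Vj follows. The claim does not say how the one vector for the reference concept set is obtained; taking the centroid (element-wise mean) of the reference vectors is an assumption made here for illustration, as are the names `centroid` and `sim` and the toy data.

```python
import math

def centroid(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def sim(vi, vj):
    """Cosine similarity, one possible choice for sim(Vi, Vj)."""
    dot = sum(a * b for a, b in zip(vi, vj))
    norms = math.sqrt(sum(a * a for a in vi)) * math.sqrt(sum(b * b for b in vj))
    return dot / norms if norms else 0.0

# Hypothetical reference concept vectors collapsed into one vector Vj.
reference_vectors = [[1.0, 0.0], [0.0, 1.0]]
v_ref = centroid(reference_vectors)

# Hypothetical candidate vectors Vi.
candidate_vectors = {"aligned": [1.0, 1.0], "orthogonal": [1.0, -1.0]}
scores = {name: sim(v, v_ref) for name, v in candidate_vectors.items()}
print(scores)
```

Cosine similarity ignores vector length, so only the direction of each candidate relative to the reference vector affects its score.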

Claim 5

Original Legal Text

5. The method of claim 1, wherein performing the NLP analysis comprises analyzing a vector similarity function sim(Vi,Vj)) between (1) each vector Vi for the one or more candidate concepts Ci and (2) each remaining vector Vj from the one or more candidate concepts Ci.

Plain English Translation

This invention relates to natural language processing (NLP) techniques for analyzing relationships between concepts in text data. The problem addressed is the need to efficiently and accurately determine semantic similarities or differences between candidate concepts extracted from text, which is crucial for applications like information retrieval, text classification, and knowledge graph construction. The method involves performing NLP analysis by computing a vector similarity function sim(Vi, Vj) between each vector Vi representing a candidate concept Ci and every other vector Vj from the remaining candidate concepts. The vectors are derived from embeddings or other numerical representations of the concepts, capturing their semantic meaning. By comparing these vectors, the method quantifies how closely related or distinct the concepts are, enabling tasks such as clustering, ranking, or filtering concepts based on their similarity. The analysis may involve various similarity metrics, such as cosine similarity, Euclidean distance, or other statistical measures, to assess the degree of alignment between the vectors. The results can be used to identify redundant concepts, group related concepts, or prioritize concepts based on their relevance to a given query or context. This approach improves the accuracy and efficiency of NLP systems by leveraging vector-based comparisons to refine concept relationships.
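The all-pairs comparison can be sketched as follows. Cosine similarity is used as one of the metrics the text mentions; the function names and the toy vectors are hypothetical.

```python
import math
from itertools import combinations

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

def pairwise_similarities(vectors):
    """Map each unordered index pair (i, j), i < j, to sim(Vi, Vj)."""
    return {
        (i, j): cosine_similarity(vectors[i], vectors[j])
        for i, j in combinations(range(len(vectors)), 2)
    }

# Three toy candidate vectors: the first two point the same way.
vectors = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
pairs = pairwise_similarities(vectors)
print(pairs)
```

Because cosine similarity is symmetric, computing each unordered pair once is enough; N candidates give N(N-1)/2 comparisons.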

Claim 6

Original Legal Text

6. The method of claim 5, wherein analyzing the vector similarity function sim(Vi,Vj)) comprises, for each candidate concept Ci for i=1 . . . N: computing, by the system, the similarity measure corresponding to said candidate concept Ci as a cosine distance measure between each vector pair Vi, Vj for j=1 . . . N, i≠j; and selecting a nearest neighbor Ni to said candidate concept Ci having a maximum cosine distance measure.

Plain English Translation

This invention relates to analyzing vector similarity in a computational system to identify relationships between concepts. The problem addressed is determining, for each concept in a set, which other concept is most similar to it, using vector representations and a similarity metric. For each candidate concept Ci (i = 1 ... N), the system computes a cosine measure between its vector Vi and the vector Vj of every other candidate concept (j = 1 ... N, i ≠ j). The claim calls this a "cosine distance measure", but because the nearest neighbor Ni is selected as the concept with the maximum value, the measure behaves as a cosine similarity: a higher value means a smaller angle between the vectors and therefore a closer match. The nearest neighbor's score can then serve as the similarity measure for Ci, so a candidate whose best match is still weak can be treated as an outlier under the threshold test of claim 1. The resulting nearest neighbors and scores can also feed further analysis such as clustering, classification, or recommendation.
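The nearest-neighbor selection can be sketched as below. The sketch reads the claim's "maximum cosine distance measure" as a maximum cosine similarity, which is an interpretation (a nearest neighbor is by definition the closest concept). The function names, the toy vectors, and the threshold of 0.5 are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

def nearest_neighbors(vectors):
    """For each index i, return (j, score) for the other vector j != i
    with the highest cosine similarity to vector i."""
    result = []
    for i, vi in enumerate(vectors):
        best_j, best_score = None, -math.inf
        for j, vj in enumerate(vectors):
            if i == j:
                continue  # a concept is never its own neighbor
            score = cosine_similarity(vi, vj)
            if score > best_score:
                best_j, best_score = j, score
        result.append((best_j, best_score))
    return result

# Toy candidate vectors; the third points in a different direction.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
neighbors = nearest_neighbors(vectors)

# Assumed threshold: a candidate whose best match is weak is an outlier.
THRESHOLD = 0.5
outlier_indices = [i for i, (_, s) in enumerate(neighbors) if s < THRESHOLD]
print(neighbors, outlier_indices)
```

The double loop is quadratic in the number of candidates, which is fine for the handful of concepts extracted from one passage; larger sets would usually call for an approximate nearest-neighbor index.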

Claim 7

Original Legal Text

7. The method of claim 1, wherein retrieving the reference concept set comprises extracting the plurality of concepts extracted from the first source text to generate the reference concept set.

Plain English Translation

This invention relates to a method for processing text data to extract and validate concepts. The claim addresses where the reference concept set comes from: rather than being supplied separately, it is generated from the plurality of concepts extracted from the first source text itself. The same extraction that yields the candidate concepts therefore also yields the baseline against which they are judged, so each candidate is effectively compared with the other concepts found in the same text, and a concept that is unlike its companions stands out as a likely outlier. The extraction process may involve natural language processing techniques, such as tokenization, part-of-speech tagging, or semantic analysis, to identify and categorize concepts accurately. The generated reference concept set can be refined or filtered to ensure relevance and precision, enhancing the accuracy of subsequent comparisons. This approach enables automated and scalable analysis of textual data without depending on a separately supplied reference set.
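One way to picture a reference set drawn from the first source text itself is a leave-one-out comparison, where each extracted concept is scored against all the other concepts from the same text. The sketch below takes that reading as an assumption; averaging the similarities, the function names, and the toy vectors are likewise hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

def leave_one_out_scores(concepts):
    """Score each concept by its mean similarity to the other concepts
    extracted from the same text. Requires at least two concepts."""
    scores = {}
    for name, vec in concepts.items():
        others = [v for other, v in concepts.items() if other != name]
        scores[name] = sum(cosine_similarity(vec, o) for o in others) / len(others)
    return scores

# Toy vectors for concepts extracted from one hypothetical passage.
concepts = {
    "aspirin": [1.0, 0.1],
    "ibuprofen": [0.9, 0.2],
    "violin": [0.0, 1.0],
}
scores = leave_one_out_scores(concepts)
least_similar = min(scores, key=scores.get)
print(least_similar)
```

The concept with the lowest score is the one least like the rest of the text's concepts, which makes it the natural first candidate for the threshold check.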

Claim 8

Original Legal Text

8. The method of claim 1, wherein retrieving the reference concept set comprises constructing one or more anchor reference concepts which comprise the plurality of concepts.

Plain English Translation

This invention relates to a method for retrieving and utilizing reference concept sets in a knowledge processing system. The method addresses the challenge of efficiently identifying and organizing relevant concepts from a large knowledge base to support tasks such as information retrieval, semantic analysis, or decision-making. The method involves constructing one or more anchor reference concepts, which serve as foundational elements within a broader set of concepts. These anchor concepts are used to define and structure the reference concept set, ensuring that the retrieved concepts are coherent and relevant to the intended application. The method may also include steps for refining or expanding the reference concept set based on contextual or domain-specific requirements. By leveraging anchor concepts, the method improves the accuracy and efficiency of concept retrieval, enabling more effective knowledge processing in applications such as natural language understanding, expert systems, or data mining. The invention is particularly useful in systems where concept relationships and hierarchies are critical for accurate information processing.

Claim 9

Original Legal Text

9. The method of claim 1, wherein validating that the first concept set correctly identifies the first source text comprises identifying any candidate concept having a corresponding similarity metric that meets a minimum similarity threshold.

Plain English Translation

This invention relates to natural language processing and text analysis, specifically improving the accuracy of concept extraction from source texts. The problem addressed is confirming that the concepts extracted from a source text actually belong with it. The method analyzes a source text to generate a first concept set of candidate concepts, and each candidate receives a similarity metric from the comparison of its vector representation with the vector representations of the reference concept set, as set out in claim 1. Validation under this claim consists of identifying every candidate concept whose similarity metric meets the minimum similarity threshold; those concepts are treated as correctly identifying the source text. This complements the outlier detection of claim 1, where concepts that fall short of the threshold are flagged. The similarity metric is vector-based, for example a cosine measure as in claim 6. The minimum similarity threshold can be adjusted to balance precision and recall, allowing the method to adapt to different text domains or user requirements. Filtering out low-similarity concepts reduces noise in downstream applications such as information retrieval, summarization, or semantic analysis, and is particularly useful where high-fidelity text representation matters, such as legal, medical, or technical documentation analysis.
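The threshold test can be sketched as a simple partition of already-computed similarity scores. Treating "meets" as greater than or equal to the threshold is an assumption, and the function name, scores, and threshold value are invented for illustration.

```python
def partition_by_threshold(scores, threshold):
    """Split a {concept: similarity} mapping into validated concepts
    (score >= threshold) and outliers (score < threshold)."""
    validated = {c: s for c, s in scores.items() if s >= threshold}
    outliers = {c: s for c, s in scores.items() if s < threshold}
    return validated, outliers

# Hypothetical similarity scores for three candidate concepts.
scores = {"glucose": 0.91, "insulin": 0.50, "guitar": 0.12}
validated, outliers = partition_by_threshold(scores, threshold=0.5)
print(sorted(validated), sorted(outliers))
```

Raising the threshold trades recall for precision: fewer concepts are validated, but those that remain are closer to the reference set.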

Claim 10

Original Legal Text

10. An information handling system comprising: one or more processors; a memory coupled to at least one of the processors; a set of instructions stored in the memory and executed by at least one of the processors to analyze candidate concepts, wherein the set of instructions are executable to perform actions of: generating, by the system, at least a first concept set comprising one or more candidate concepts extracted from a first source text; retrieving, by the system, a reference concept set comprising a plurality of concepts representing known characteristics of a desired vector space which may be used to identify concept outliers; generating or retrieving, by the system, a vector representation for each of the concepts in the first concept set and the reference concept set; performing, by the system, a natural language processing (NLP) analysis comparison of the vector representation of the first concept set to vector representation of the reference concept set to determine a similarity measure corresponding to each candidate concept; and validating, by the system, that the first concept set correctly identifies the first source text by using the similarity measure for each candidate concept which does not meet a minimum similarity threshold to detect concept outliers in the one or more candidate concepts, thereby expediting and qualitatively improving the analysis of candidate concepts in the first concept set.

Plain English Translation

This invention relates to an information handling system designed to analyze and validate candidate concepts extracted from text data. The system addresses the challenge of accurately identifying and validating relevant concepts in a given text by comparing them against a reference set of known concepts. The system includes one or more processors and a memory storing instructions that, when executed, perform several key functions. First, it generates a concept set from a source text, extracting candidate concepts. It then retrieves a reference concept set representing known characteristics of a desired vector space, which helps identify outliers. The system generates or retrieves vector representations for each concept in both the candidate and reference sets. Using natural language processing (NLP), it compares these vector representations to determine similarity measures for each candidate concept. The system validates the concept set by detecting outliers—candidate concepts that fall below a minimum similarity threshold—thereby improving the accuracy and efficiency of concept analysis. This approach expedites the validation process and enhances the quality of concept identification in text analysis tasks.

Claim 11

Original Legal Text

11. The information handling system of claim 10, wherein the set of instructions are executable to present an indication that at least one candidate concept is an outlier or erroneous concept if the similarity metric corresponding to the at last one candidate concept does not meet a minimum similarity threshold.

Plain English Translation

The invention relates to an information handling system designed to analyze and process data concepts, particularly focusing on identifying and flagging outliers or erroneous concepts within a dataset. The system includes a processor and a memory storing a set of instructions executable by the processor. These instructions enable the system to generate a set of candidate concepts from input data, compute similarity metrics between these candidate concepts and a reference concept, and compare these metrics against a predefined minimum similarity threshold. If a candidate concept's similarity metric falls below this threshold, the system presents an indication that the concept is an outlier or erroneous. This functionality helps users or downstream processes recognize potentially invalid or anomalous data points, improving data quality and reliability. The system may also include additional features such as generating a similarity matrix or visualizing the relationships between concepts, further enhancing the analysis process. The invention is particularly useful in applications requiring high data accuracy, such as machine learning, data validation, and decision-making systems.

Claim 12

Original Legal Text

12. The information handling system of claim 10, wherein the set of instructions are executable to generate at least the first concept set by extracting a plurality of candidate concepts from the first source text using an annotator.

Plain English Translation

The invention relates to information handling systems designed to process and analyze text data. Specifically, it addresses the challenge of extracting meaningful concepts from source text to facilitate knowledge representation and retrieval. The system includes a processor and a memory storing instructions that, when executed, enable the extraction of concepts from text sources. The process involves using an annotator to identify and extract a plurality of candidate concepts from a first source text. These candidate concepts are then processed to generate at least one concept set, which represents structured knowledge derived from the text. The system may further refine these concepts by filtering or organizing them into a hierarchical or relational structure. The extracted concepts can be used for various applications, such as semantic search, knowledge graph construction, or natural language understanding. The annotator may employ techniques like named entity recognition, part-of-speech tagging, or dependency parsing to identify relevant concepts. The system ensures that the extracted concepts are contextually relevant and accurately represent the information contained in the source text. This approach enhances the efficiency and accuracy of text analysis, enabling better knowledge extraction and representation in digital systems.

Claim 13

Original Legal Text

13. The information handling system of claim 10, wherein the set of instructions are executable to perform the NLP analysis by analyzing a vector similarity function sim(Vi,Vj)) between (1) one or more vectors Vi for the one or more candidate concepts Ci and (2) the vector Vj for the reference concept set.

Plain English Translation

This invention relates to information handling systems that use natural language processing (NLP) to analyze and compare concepts. The system addresses the challenge of accurately identifying and categorizing candidate concepts within a given reference concept set, which is critical for applications like semantic search, document classification, and knowledge graph construction. The system includes a processor and a memory storing instructions that, when executed, perform NLP analysis by computing a vector similarity function sim(Vi, Vj) between vectors Vi and Vj. Here, Vi represents one or more vectors corresponding to candidate concepts Ci, while Vj represents a vector for the reference concept set. The similarity function quantifies how closely related the candidate concepts are to the reference concepts, enabling the system to determine relevance or categorization. The system may also include additional components such as a display for presenting results, a network interface for data exchange, and storage for maintaining concept vectors. The NLP analysis may involve techniques like word embeddings, semantic similarity scoring, or clustering to refine concept relationships. This approach improves accuracy in concept mapping and reduces ambiguity in automated text analysis tasks.

Claim 14

Original Legal Text

14. The information handling system of claim 10, wherein the set of instructions are executable to perform the NLP analysis by analyzing a vector similarity function sim(Vi,Vj)) between (1) each vector Vi for the one or more candidate concepts Ci and (2) each remaining vector Vj from the one or more candidate concepts Ci.

Plain English Translation

This invention relates to information handling systems that use natural language processing (NLP) to analyze and compare concepts. The system addresses the challenge of accurately identifying and relating concepts within text data by leveraging vector similarity analysis. The system processes input data to extract one or more candidate concepts, each represented as a vector. These vectors are then compared using a vector similarity function, such as a cosine similarity or Euclidean distance, to determine the relationship between each candidate concept and other concepts in the dataset. The similarity function evaluates the closeness or distance between vectors, enabling the system to identify semantically related or distinct concepts. This analysis helps improve tasks like information retrieval, text classification, and concept mapping by providing a quantitative measure of conceptual similarity. The system may also include additional processing steps, such as preprocessing the input data or refining the similarity function, to enhance accuracy. The invention is particularly useful in applications requiring automated concept extraction and comparison, such as search engines, recommendation systems, and knowledge management tools.

Claim 15

Original Legal Text

15. The information handling system of claim 14, wherein analyzing the vector similarity function sim(Vi,Vj)) comprises, for each candidate concept Ci for i=1 . . . N: computing, by the system, the similarity measure corresponding to said candidate concept Ci as a cosine distance measure between each vector pair Vi, Vj for j=1 . . . N, i≠j; and selecting a nearest neighbor Ni to said candidate concept Ci having a maximum cosine distance measure.

Plain English Translation

This invention relates to information handling systems that analyze vector similarity functions to identify relationships between concepts. The system addresses the challenge of determining semantic or contextual relationships between concepts represented as vectors in a high-dimensional space, where irrelevant or noisy matches are common. The system computes a similarity measure for each candidate concept from cosine measures between vector pairs. For each concept Ci among N candidate concepts, the system evaluates the cosine measure between its vector Vi and every other vector Vj (where j ranges from 1 to N and i≠j). The claim calls this a "cosine distance measure", but it is used as a similarity score: it depends on the angle between the vectors rather than their magnitudes, and a higher value means a closer match. The system then identifies the nearest neighbor Ni for each concept Ci as the concept with the maximum value, indicating the strongest semantic or contextual relationship. Because the score is insensitive to vector magnitude, it captures directional relationships between concepts reliably. The method is particularly useful in applications like natural language processing, recommendation systems, and knowledge graph construction, where understanding concept relationships is critical.

Claim 16

Original Legal Text

16. The information handling system of claim 10, wherein the set of instructions are executable to retrieve the reference concept set by extracting the plurality of concepts extracted from the first source text to generate the reference concept set.

Plain English Translation

The invention relates to information handling systems designed to process and analyze text data by extracting and comparing concepts. The system addresses the challenge of validating key concepts found in large volumes of text to improve information retrieval, analysis, and decision-making processes. The system includes a processor and a memory storing instructions that, when executed, generate the reference concept set from the plurality of concepts extracted from the first source text itself. The candidate concepts from that text are then judged against this reference set, so the baseline for comparison comes from the same text rather than from a separately supplied source. The system may also include additional components for storing, retrieving, or displaying the extracted concepts. The extraction process involves analyzing the first source text to identify and isolate relevant concepts, which are then compiled into the reference concept set for further use. This approach improves the system's ability to derive meaningful insights from text data, supporting applications in fields such as natural language processing, data mining, and knowledge management.

Claim 17

Original Legal Text

17. The information handling system of claim 10, wherein the set of instructions are executable to retrieve the reference concept set by constructing one or more anchor reference concepts which comprise the plurality of concepts.

Plain English Translation

The invention relates to information handling systems designed to process and analyze conceptual data. The system addresses the challenge of efficiently retrieving and organizing reference concepts from a dataset to improve information retrieval, data analysis, or knowledge management tasks. The system includes a set of instructions that, when executed, construct one or more anchor reference concepts from a plurality of concepts. These anchor reference concepts serve as key reference points within the dataset, enabling the system to retrieve a reference concept set. The reference concept set is a structured collection of concepts derived from the anchor reference concepts, facilitating more accurate and contextually relevant information retrieval. The system may also include additional components such as a processor, memory, and input/output interfaces to support these operations. The method of constructing anchor reference concepts involves analyzing relationships between concepts, identifying central or representative concepts, and organizing them into a hierarchical or networked structure. This approach enhances the system's ability to handle large-scale datasets, improve search efficiency, and provide more meaningful results in applications like semantic search, recommendation systems, or knowledge graph construction.

Claim 18

Original Legal Text

18. The information handling system of claim 10, wherein the set of instructions are executable to validate that the first concept set correctly identifies the first source text by identifying any candidate concept having a corresponding similarity metric that meets a minimum similarity threshold.

Plain English Translation

The invention relates to information handling systems designed to process and validate text data by identifying and validating concepts within source text. The system addresses the challenge of accurately extracting and confirming meaningful concepts from text to ensure reliable data interpretation and processing. The system includes a set of instructions that, when executed, perform concept identification and validation. Specifically, the instructions validate that a first concept set correctly identifies a first source text by evaluating candidate concepts. Each candidate concept has a similarity metric, obtained by comparing its vector representation with the vector representations of the reference concept set as set out in claim 10. The system checks whether the similarity metric for each candidate concept meets a predefined minimum similarity threshold. If it does, the concept is deemed valid and correctly identified. This validation process ensures that only relevant and accurate concepts are retained, improving the reliability of text analysis and processing tasks. The system may also include additional components, such as a processor and memory, to support these operations. The overall goal is to enhance the accuracy and efficiency of text-based data handling in various applications, such as natural language processing, document analysis, and information retrieval.

Claim 19

Original Legal Text

19. A computer program product stored in a computer readable storage medium, comprising computer instructions that, when executed by an information handling system, causes the system to analyze candidate concepts by performing actions comprising: generating, by the system, at least a first concept set comprising one or more candidate concepts extracted from a first source text using an annotator; retrieving, by the system, a reference concept set comprising a plurality of concepts representing known characteristics of a desired vector space which may be used to identify concept outliers; generating or retrieving, by the system, a vector representation for each of the concepts in the first concept set and the reference concept set; performing, by the system, a natural language processing (NLP) analysis comparison of the vector representation of the first concept set to vector representation of the reference concept set to determine a similarity measure corresponding to each candidate concept by analyzing a vector similarity function sim(Vi,Vj)) between (1) each vector Vi for the one or more candidate concepts Ci and (2) each remaining vector Vj from the one or more candidate concepts Ci; validating, by the system, that the first concept set correctly identifies the first source text by using the similarity measure for each candidate concept which does not meet a minimum similarity threshold to detect concept outliers in the one or more candidate concepts extracted from the first source text; and presenting, by the system, an indication that at least one candidate concept is an outlier or erroneous concept if the similarity metric corresponding to the at last one candidate concept does not meet a minimum similarity threshold, thereby expediting and qualitatively improving the analysis of candidate concepts in the first concept set.

Plain English Translation

This claim covers a computer program product for natural language processing (NLP) that identifies erroneous or outlier concepts extracted from text. The system extracts candidate concepts from a source text using an annotator, producing a first concept set. It then retrieves a reference concept set representing known characteristics of a desired vector space, which serves as a benchmark for identifying outliers. Vector representations are generated or retrieved for the concepts in both sets. The NLP analysis compares the vector representations of the first concept set with those of the reference set and, as the claim specifies, evaluates a vector similarity function sim(Vi, Vj) between the vector Vi of each candidate concept Ci and the vector Vj of each remaining candidate concept. This yields a similarity measure for each candidate concept, which is checked against a minimum similarity threshold. Candidates that fail to meet the threshold are treated as outliers, and the system presents an indication that such a candidate is an outlier or erroneous concept. Flagging these anomalies keeps extracted concepts aligned with expected characteristics and reduces errors in downstream applications.
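
A minimal sketch of the pairwise comparison recited in claim 19 follows. The claim names a similarity function sim(Vi, Vj) but does not fix its form or how pairwise values are combined, so this example assumes cosine similarity and averages each candidate's similarity to the remaining candidates. The function `find_outliers`, the 0.5 threshold, and the toy vectors are hypothetical.

```python
import math

def sim(vi, vj):
    """Assumed form of sim(Vi, Vj): cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(vi, vj))
    norm_i = math.sqrt(sum(a * a for a in vi))
    norm_j = math.sqrt(sum(b * b for b in vj))
    if norm_i == 0.0 or norm_j == 0.0:
        return 0.0
    return dot / (norm_i * norm_j)

def find_outliers(vectors, min_similarity=0.5):
    """Score each candidate by its mean similarity to every remaining candidate.

    Returns the per-concept scores and a sorted list of concepts below the threshold.
    """
    scores = {}
    for ci, vi in vectors.items():
        others = [sim(vi, vj) for cj, vj in vectors.items() if cj != ci]
        # A lone candidate has nothing to disagree with, so it is not flagged.
        scores[ci] = sum(others) / len(others) if others else 1.0
    outliers = sorted(c for c, s in scores.items() if s < min_similarity)
    return scores, outliers

# Toy 2-D vectors (illustrative values only).
candidate_vectors = {
    "diabetes": [1.0, 0.0],
    "insulin": [0.9, 0.3],
    "glucose": [0.8, 0.2],
    "jaguar": [0.0, 1.0],
}
scores, outliers = find_outliers(candidate_vectors, min_similarity=0.5)
for concept in outliers:
    print(f"Possible outlier or erroneous concept: {concept}")
```

In this toy data the three related concepts support one another, while "jaguar" has low similarity to all of them and is the only concept flagged.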

Claim 20

Original Legal Text

20. The computer program product of claim 19, wherein performing the NLP analysis comprises analyzing relationship strengths between concepts that persist in the first set of concept sequences and the second set of concept sequences.

Plain English Translation

This claim narrows the computer program product of claim 19 by specifying what the NLP analysis examines: the strengths of relationships between concepts that persist across a first set of concept sequences and a second set of concept sequences. In other words, the analysis looks at concept relationships that appear in both sets and measures how strong they are, which indicates how stable a relationship is across different contexts. The claim does not say how relationship strength is computed; measures such as co-occurrence frequency, correlation scores, or semantic similarity are plausible options rather than stated requirements. Likewise, preprocessing steps such as tokenization, part-of-speech tagging, and named entity recognition may be used to extract concept sequences, but they are not recited in the claim. Possible applications include building knowledge graphs, improving information retrieval, and enhancing semantic search in areas such as scientific literature analysis, social media monitoring, and automated content summarization.
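
One way to picture "relationships that persist in both sets" is a co-occurrence count. The sketch below is an assumption-laden illustration, not the claimed method: it treats two concepts as related when they appear within a small window of each other in a sequence, and it reports a pair as persistent only when it occurs in both sets, using the smaller of the two counts as the strength. The function names, the `window` parameter, and the sample sequences are hypothetical.

```python
from collections import Counter

def pair_strengths(sequences, window=2):
    """Count how often two distinct concepts occur within `window` positions of each other."""
    counts = Counter()
    for sequence in sequences:
        for i, first in enumerate(sequence):
            for second in sequence[i + 1 : i + 1 + window]:
                if first != second:
                    # frozenset makes the pair order-independent.
                    counts[frozenset((first, second))] += 1
    return counts

def persistent_relationships(first_set, second_set, window=2):
    """Keep only pairs seen in both sets; strength is the smaller of the two counts."""
    first = pair_strengths(first_set, window)
    second = pair_strengths(second_set, window)
    shared = first.keys() & second.keys()
    return {pair: min(first[pair], second[pair]) for pair in shared}

# Toy concept sequences (illustrative values only).
first_set = [["fever", "infection", "antibiotic"], ["fever", "infection"]]
second_set = [["infection", "fever", "rest"], ["antibiotic", "dosage"]]
persistent = persistent_relationships(first_set, second_set)
```

Only the fever/infection pair appears in both toy sets, so it is the single relationship reported as persistent.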

Claim 21

Original Legal Text

21. The computer program product of claim 19, wherein generating at least a first concept set comprises extracting a set of candidate concepts which non-uniquely correspond to one or more text segments in the first source text.

Plain English Translation

This claim narrows the computer program product of claim 19 by specifying how the first concept set is generated. The candidate concepts extracted from the first source text correspond non-uniquely to one or more text segments, so the mapping between text segments and concepts is not one-to-one. For example, an ambiguous text segment may yield two or more candidate concepts, which is the situation claim 22 resolves by selecting among them. Retaining these overlapping candidates at extraction time, rather than forcing a single choice, lets the later similarity-based validation decide which concept best fits the context. The claim itself does not recite a second, uniquely mapped concept set or any filtering, merging, or prioritization steps; such refinements are possible extensions rather than claimed features. The extracted concepts can then feed further analysis such as knowledge graph construction, semantic search, or content summarization.

Claim 22

Original Legal Text

22. The computer program product of claim 21, wherein validating that the first concept set correctly identifies the first source text comprises selecting between two or more concepts in the set of candidate concepts which correspond to a first text segment based on a similarity measure comparison between a vector representation for each of the two or more concepts and the vector representation of the reference concept set.

Plain English Translation

This claim narrows the computer program product of claim 21 by specifying how ambiguity is resolved during validation. When two or more candidate concepts correspond to the same text segment, the system compares the vector representation of each of those concepts with the vector representation of the reference concept set and selects between them based on the resulting similarity measures. Typically the candidate most similar to the reference set is kept, which validates that the concept set correctly identifies the source text. Because vector embeddings capture semantic relationships, this comparison favors the concept that fits the expected domain over one that merely shares the same surface text. Choosing the better-fitting concept reduces errors in downstream applications such as semantic search, knowledge graphs, and automated content analysis, and it is particularly useful in domains that demand high accuracy, such as legal document analysis and medical text processing.
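
The selection step can be sketched as follows. The claim does not state how the comparison against the reference concept set is scored, so this example assumes each competing concept is scored by its mean cosine similarity to the reference vectors and the highest-scoring concept wins. The function `select_concept`, the concept names, and the toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def select_concept(candidates, reference_vectors):
    """Pick the candidate whose vector is, on average, closest to the reference set."""
    if not candidates or not reference_vectors:
        raise ValueError("candidates and reference_vectors must not be empty")
    scores = {
        concept: sum(cosine(vector, ref) for ref in reference_vectors) / len(reference_vectors)
        for concept, vector in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores

# The text segment "cold" maps to two competing concepts (illustrative values only).
candidates_for_segment = {
    "common_cold": [0.9, 0.1],
    "low_temperature": [0.1, 0.9],
}
medical_reference = [[1.0, 0.0], [0.8, 0.2]]
best, scores = select_concept(candidates_for_segment, medical_reference)
```

Against a medically oriented reference set, the "common_cold" reading scores higher than "low_temperature" and is the one selected for the segment.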

Patent Metadata

Filing Date

Unknown

Publication Date

November 10, 2020

Inventors

Tin Kam Ho
Luis A. Lastras-Montano

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, FAQs, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “Extracting and Denoising Concept Mentions Using Distributed Representations of Concepts” (10831762). https://patentable.app/patents/10831762

© 2026 Nomic Interactive Technology LLC. Machine-readable context available at /api/llm-context/10831762. See llms.txt for full attribution policy.