A computer-implemented technique is described herein for extracting facts from unstructured text documents provided by one or more information sources. The technique performs this operation using a pipeline that involves, at least in part: providing a corpus of information items; extracting candidate facts from the information items; merging synonymous argument values associated with the candidate facts; organizing the candidate facts into relation clusters; and assessing the confidence level of the candidate facts within the relation clusters.
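The pipeline stages named in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the regular-expression extractor, the `SYNONYMS` table, clustering by exact relation phrase, and the frequency-based confidence score are simple stand-ins for whatever the actual system uses.

```python
import re
from collections import defaultdict

# Hypothetical extractor: matches "<arg1> <relation phrase> <arg2>" sentences.
PATTERN = re.compile(
    r"^(?P<arg1>.+?) (?P<rel>was born in|is the capital of) (?P<arg2>.+?)\.?$"
)

# Hypothetical synonym table used to merge equivalent argument values.
SYNONYMS = {"nyc": "new york city", "new york": "new york city"}


def extract_candidates(corpus):
    """Stage 2: pull (arg1, relation, arg2) candidate facts out of raw sentences."""
    candidates = []
    for sentence in corpus:
        match = PATTERN.match(sentence.strip())
        if match:
            candidates.append((match["arg1"], match["rel"], match["arg2"]))
    return candidates


def canonical(value):
    """Normalize case and map known synonyms onto one canonical value."""
    key = value.strip().lower()
    return SYNONYMS.get(key, key)


def merge_arguments(candidates):
    """Stage 3: merge synonymous argument values."""
    return [(canonical(a1), rel, canonical(a2)) for a1, rel, a2 in candidates]


def cluster_by_relation(facts):
    """Stage 4 (simplified): group facts by relation phrase and count mentions."""
    clusters = defaultdict(lambda: defaultdict(int))
    for a1, rel, a2 in facts:
        clusters[rel][(a1, a2)] += 1
    return clusters


def assess_confidence(clusters):
    """Stage 5 (assumed heuristic): share of a subject's mentions backing each fact."""
    scored = {}
    for rel, counts in clusters.items():
        per_subject = defaultdict(int)
        for (a1, _), n in counts.items():
            per_subject[a1] += n
        for (a1, a2), n in counts.items():
            scored[(a1, rel, a2)] = n / per_subject[a1]
    return scored


def run_pipeline(corpus):
    """Stage 1 is the corpus itself; the remaining stages run in order."""
    candidates = extract_candidates(corpus)
    merged = merge_arguments(candidates)
    return assess_confidence(cluster_by_relation(merged))


if __name__ == "__main__":
    corpus = [
        "Alan Turing was born in London.",
        "Alan Turing was born in london",
        "Alan Turing was born in Paris.",
        "Ed Koch was born in NYC.",
        "This sentence contains no recognizable fact.",
    ]
    for fact, confidence in run_pipeline(corpus).items():
        print(fact, round(confidence, 2))
```

In this sketch, a fact supported by more mentions of the same subject within its cluster receives a higher score. A real system would likely also cluster synonymous relation phrases and use a richer confidence model.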
Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
4. The system of claim 1, wherein the text and the other text comprise freeform text.
7. The system of claim 1, wherein respective final extracted relations identify first entities, second entities, and relationships between the first entities and the second entities.
9. The system of claim 8, the first information source being an online encyclopedia, the second information source comprising other web pages accessible over the World Wide Web.
14. The method of claim 12, further comprising: employing an impression log to identify the second information items.
This claim covers a system and method for identifying and processing information items in a digital environment. The technology addresses the challenge of efficiently tracking and retrieving relevant information items from large datasets, particularly in scenarios where user interactions or system-generated data need to be analyzed for patterns or trends. The method involves collecting and storing interaction data, such as user clicks or system logs, to generate an impression log. This log records instances where information items are presented to users or systems, even if no direct interaction occurs. The impression log is then used to identify a subset of information items, referred to as second information items, which may include items that were viewed but not explicitly selected. This identification process helps in analyzing user behavior, optimizing content delivery, or improving recommendation algorithms. The method may also involve filtering or prioritizing these second information items based on additional criteria, such as relevance, recency, or frequency of impressions. By leveraging the impression log, the system can gain insights into passive user engagement, enhancing the accuracy of data-driven decisions in digital platforms. The approach is particularly useful in advertising, content personalization, and user experience optimization.
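As a rough illustration of selecting information items from an impression log by frequency of impressions, consider the sketch below. The log schema (`item_id`, `clicked`), the function name `items_from_impression_log`, and the `min_impressions` threshold are all hypothetical. The claim itself states only that an impression log is employed to identify the second information items.

```python
from collections import Counter


def items_from_impression_log(log, min_impressions=2):
    """Return item ids shown at least `min_impressions` times, most shown first.

    Each log entry is assumed to be a dict with an "item_id" key; an entry
    counts as an impression whether or not the item was clicked.
    """
    counts = Counter(entry["item_id"] for entry in log)
    return [item for item, n in counts.most_common() if n >= min_impressions]


if __name__ == "__main__":
    impression_log = [
        {"item_id": "page-a", "clicked": False},
        {"item_id": "page-b", "clicked": True},
        {"item_id": "page-a", "clicked": True},
        {"item_id": "page-c", "clicked": False},
        {"item_id": "page-a", "clicked": False},
        {"item_id": "page-b", "clicked": False},
    ]
    print(items_from_impression_log(impression_log))
```

Counting every entry, clicked or not, reflects the idea that an impression records presentation rather than selection.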
17. The method of claim 12, wherein outputting the final extracted relations comprises adding the final extracted relations to an ontology.
18. The method of claim 17, wherein outputting the final extracted relations comprises providing the final extracted relations to one or more knowledge-consuming applications.
19. The method of claim 18, the one or more knowledge-consuming applications comprising a question answering system, a recommendation system, and a personal assistant.
20. The method of claim 12, wherein the text and the other text comprise freeform text.
This invention relates to a method for processing text data, specifically addressing the challenge of analyzing and comparing freeform text inputs. Freeform text refers to unstructured or natural language text that does not follow a predefined format, making it difficult to extract meaningful information or perform accurate comparisons. The method involves receiving a first text input and a second text input, both of which are freeform text, and then analyzing these inputs to determine their relationship or similarity. The analysis may include comparing the content, context, or semantic meaning of the texts to identify matches, differences, or other relevant patterns. The method may also involve preprocessing the text inputs, such as removing noise, normalizing formatting, or extracting key features, to improve the accuracy of the analysis. The results of the comparison can be used for various applications, including document classification, plagiarism detection, or content recommendation. The method is designed to handle the variability and complexity of freeform text, providing a robust solution for text-based data processing tasks.
May 31, 2019
October 18, 2022