10585893

Data Processing

Published March 10, 2020
Assignee: not available in USPTO data

Patent Claims
22 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 1

Original Legal Text

1. A method, said method comprising: identifying, by one or more processors of a computer system, a plurality of entities within a first data source; for each entity identified within the first data source, said one or more processors identifying within the first data source attributes of the entity identified within the first data source and relationships between the entity identified within the first data source and other entities identified within the first data source, and associating the attributes and relationships identified within the first data source with a first entity identified within a data structure; said one or more processors generating, for each entity identified within the first data source, a frequency metric characterizing the entity identified within the first data source, said frequency metric based on a frequency at which each attribute and relationship identified within the first data source is associated with the entity identified within the first data source, said generating, for each entity identified within the first data source, the frequency metric characterizing the entity identified within the first data source comprising: generating multiple virtual triples for the entity identified within the first data source, each virtual triple consisting of a subject, a predicate, and an object, wherein the subject is the entity identified within the first data source, the predicate is the relationship identified within the first data source, and the object is the attribute identified within the first data source; and computing the frequency metric (Score (triple)) characterizing the entity identified within the first data source for each triple according to: Score (triple)=SUM(TF)×SUM(ABS(LOG 10(1.0×(ALL.ACNT)/(I.NB_ENTITY)))), wherein SUM(TF)=count of number of instances of the triple per entity identified within the first data source, summed over the entities, wherein ALL.ACNT=total number of entities within the first data source, and wherein I.NB_ENTITY=count of number of entities of a predicate-object pair within the each triple; said one or more processors identifying a degree of similarity between two entities of the plurality of entities by comparing the respective frequency metrics of the two entities; and said one or more processors associating the two entities within the data structure in response to a determination that an identified degree of similarity between the two entities is greater than a first predetermined threshold.

Plain English Translation

This invention relates to data analysis and entity resolution within computer systems. The problem addressed is the identification and comparison of similar entities within a data source. One or more processors first identify multiple entities in a first data source. For each identified entity, the processors find its attributes and its relationships with other entities in the same data source, and associate these attributes and relationships with an entity in a data structure. Next, a frequency metric is generated for each entity, based on how often each attribute and relationship is associated with that entity. To generate the metric, the processors create virtual triples in which the entity is the subject, the relationship is the predicate, and the attribute is the object. Each triple is then scored as Score(triple) = SUM(TF) × SUM(ABS(LOG10(1.0 × ALL.ACNT / I.NB_ENTITY))), where SUM(TF) is the number of instances of the triple per entity, summed over the entities; ALL.ACNT is the total number of entities in the first data source; and I.NB_ENTITY is the number of entities that have the triple's predicate-object pair. This has the form of a TF-IDF weight: a triple scores highly when it occurs often for an entity but is shared by few other entities. Finally, the processors determine the degree of similarity between two entities by comparing their frequency metrics, and if the similarity exceeds a first predetermined threshold, they associate the two entities in the data structure.
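The scoring and matching steps of claim 1 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: triples are plain `(subject, predicate, object)` tuples, the claim's SQL-style SUM aggregates are evaluated once per entity-triple group, and cosine similarity (the measure named in claim 4) stands in for the comparison step. The function names and the threshold value are hypothetical.

```python
import math
from collections import Counter, defaultdict

def frequency_metrics(triples):
    """Score each (predicate, object) pair per entity with the claim-1 formula.

    triples: iterable of (subject, predicate, object) tuples from one data source.
    Returns {entity: {(predicate, object): score}}.
    """
    triples = list(triples)
    tf = Counter(triples)                      # instances of each triple per entity
    all_acnt = len({s for s, _, _ in triples}) # ALL.ACNT: total entities in the source
    holders = defaultdict(set)                 # predicate-object pair -> entities having it
    for s, p, o in triples:
        holders[(p, o)].add(s)

    metrics = defaultdict(dict)
    for (s, p, o), count in tf.items():
        nb_entity = len(holders[(p, o)])       # I.NB_ENTITY for this pair
        idf = abs(math.log10(1.0 * all_acnt / nb_entity))
        metrics[s][(p, o)] = count * idf
    return dict(metrics)

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse score vectors (dicts)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def associate(metrics, threshold):
    """Return (entity_a, entity_b, similarity) for pairs above the first threshold."""
    names = sorted(metrics)
    pairs = []
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            sim = cosine_similarity(metrics[x], metrics[y])
            if sim > threshold:
                pairs.append((x, y, sim))
    return pairs
```

For example, with toy triples in which two spellings of the same company share both of their predicate-object pairs and a third company shares neither, the two spellings reach a cosine similarity of 1.0 and are returned by `associate(metrics, 0.8)`, while the third company is not. Note that a pair held by every entity gets an IDF of log10(1) = 0 and so contributes nothing to the comparison.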

Claim 2

Original Legal Text

2. The method of claim 1 , said method comprising: said one or more processors identifying one or more entities within a second data source; for each entity identified in the second data source, said one or more processors identifying within the second data source attributes and relationships of the entity identified within the second data source, and associating the attributes and entities identified in the second data source with the first entity identified within the data structure; generating, for each entity identified in the second data source, a frequency metric characterizing the entity identified in the second data source based on a frequency at which each attribute and relationship identified within the second data source is associated with the entity identified within the second data source; wherein a degree of similarity between an entity in the first data source and an entity in the second data source is identified by comparing the respective frequency metrics of the two entities.

Plain English Translation

This invention relates to data processing systems that compare entities across different data sources by analyzing their attributes and relationships. The problem addressed is matching or linking entities from separate datasets, which is difficult when naming, structure, or context varies between them. Building on the method of claim 1, entities are also identified within a second data source. For each such entity, the system extracts its attributes and relationships from the second data source and associates them with the first entity in the data structure. A frequency metric is generated for each entity in the second data source, based on how often each of its attributes and relationships is associated with it. The similarity between an entity in the first data source and an entity in the second data source is then determined by comparing their respective frequency metrics, so that statistical patterns in attribute and relationship occurrences drive the cross-source match.
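A sketch of the cross-source comparison in claim 2, assuming each data source is scored independently (so ALL.ACNT and I.NB_ENTITY are counted within that source alone) and that cosine similarity is the comparison. Pooling both sources before scoring would be an equally plausible reading; the helper names and the threshold are hypothetical.

```python
import math
from collections import Counter, defaultdict

def score_source(triples):
    """Claim-1 style scores for one source: {entity: {(pred, obj): tf * |log10(N / n)|}}."""
    tf = Counter(triples)
    n_entities = len({s for s, _, _ in triples})   # ALL.ACNT for this source only
    holders = defaultdict(set)                      # (pred, obj) -> entities having it
    for s, p, o in triples:
        holders[(p, o)].add(s)
    out = defaultdict(dict)
    for (s, p, o), count in tf.items():
        out[s][(p, o)] = count * abs(math.log10(n_entities / len(holders[(p, o)])))
    return out

def cosine(a, b):
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cross_source_matches(first, second, threshold):
    """Return (entity_in_first, entity_in_second, similarity) for pairs above threshold."""
    a, b = score_source(first), score_source(second)
    matches = []
    for x, vx in a.items():
        for y, vy in b.items():
            sim = cosine(vx, vy)
            if sim > threshold:
                matches.append((x, y, sim))
    return matches
```

Scoring each source on its own means a pair that is rare in one source but common in the other is weighted differently on each side; that is a trade-off of this reading, not something the claim settles.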

Claim 3

Original Legal Text

3. The method according to claim 1 , wherein the frequency metric characterizing the entity identified within the first data source represents a degree of association between the entity identified within the first data source and the attributes and relationships identified within the first data source.

Plain English Translation

This invention relates to analyzing data sources to identify entities and their associations with attributes and relationships. The claim specifies what the frequency metric of claim 1 represents: the degree of association between an entity identified in the first data source and the attributes and relationships identified in that source. In other words, the metric measures how strongly an entity is linked to each of its attributes and relationships. Because the claim-1 score grows with how often a triple occurs for an entity and shrinks as more entities share the same predicate-object pair, a high value marks an association that is both frequent for the entity and distinctive to it. This gives a measurable basis for comparing entities in applications such as entity resolution, data integration, knowledge graph construction, and semantic search.

Claim 4

Original Legal Text

4. The method of claim 1 , wherein said identifying the degree of similarity between the two entities comprises using a cosine distance computation between the respective frequency metrics of the two entities.

Plain English Translation

This invention relates to a method for determining the similarity between two entities by analyzing their frequency metrics. The degree of similarity in claim 1 is identified using a cosine distance computation between the frequency metrics of the two entities. Treating each entity's frequency metrics as a vector, with one dimension per predicate-object pair, the cosine similarity is the cosine of the angle between the two vectors, and the cosine distance is one minus that value. A smaller angle, and therefore a lower cosine distance, indicates higher similarity. Because cosine distance depends only on the direction of the vectors and not on their magnitude, an entity whose triples occur many times can still match an entity whose same triples occur only a few times, provided the scores are in the same proportions. Cosine distance is also cheap to compute on sparse, high-dimensional vectors, which suits large datasets with many distinct attributes and relationships.
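Cosine distance between two frequency-metric vectors can be computed directly on sparse dictionaries keyed by predicate-object pair. This is a minimal sketch; the dict representation and the choice to treat an all-zero vector as maximally distant are assumptions, not details from the patent.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity between two sparse frequency-metric vectors (dicts)."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    if na == 0 or nb == 0:
        return 1.0  # assumption: an all-zero vector is maximally dissimilar
    return 1.0 - dot / (na * nb)
```

Because the measure ignores magnitude, `cosine_distance({"x": 1, "y": 2}, {"x": 10, "y": 20})` is 0 up to floating-point rounding. Since claim 1 compares a degree of similarity against a threshold, a caller would test `1 - cosine_distance(a, b)` rather than the distance itself.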

Claim 5

Original Legal Text

5. The method of claim 1 , wherein said identifying the plurality of entities within the first data source comprises defining a set of entities to be searched for in the first data source.

Plain English Translation

This invention relates to data processing systems that identify and extract entities from a data source, such as a database or document. The claim refines the entity-identification step of claim 1: identifying the entities in the first data source includes defining, in advance, a set of entities to be searched for in that source. Rather than discovering every possible entity, the system looks only for the entities on this predefined list, for example specific names, records, or data fields relevant to the task. A targeted search of this kind keeps irrelevant matches out of the later attribute, relationship, and similarity steps. The located entities then feed the rest of the method of claim 1, where their attributes and relationships are extracted, scored, and compared.
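One simple way to search a text source for a predefined set of entities is a dictionary (gazetteer) lookup compiled into a single regular expression. This is an illustrative sketch, not the method the patent prescribes; case-insensitive, whole-word, longest-first matching is an assumption, and the `\b` boundaries only behave as intended for names that start and end with word characters.

```python
import re

def find_entities(text, entity_set):
    """Return (entity, start, end) for each occurrence of a predefined entity name in text."""
    if not entity_set:
        return []
    # Longest names first, so "New York Times" is preferred over its prefix "New York".
    names = sorted(entity_set, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b", re.IGNORECASE)
    canonical = {n.lower(): n for n in entity_set}  # map matched text back to the listed spelling
    return [(canonical[m.group(0).lower()], m.start(), m.end()) for m in pattern.finditer(text)]
```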

Claim 6

Original Legal Text

6. The method of claim 1 , wherein said identifying attributes of an entity identified within the first data source comprises decomposing text of the first data source into an entity, relationship and attribute triple, wherein the relationship is the relationship between the entity identified within the first data source and the attribute, or between the entity identified within the first data source and another entity identified within the first data source.

Plain English Translation

This invention relates to data processing systems that extract and analyze structured information from unstructured text. The problem addressed is the difficulty of automatically identifying and categorizing entities, their relationships, and attributes within large volumes of text data, which is crucial for applications like knowledge graph construction, semantic search, and data integration. The method involves decomposing text from a data source into structured triples consisting of an entity, a relationship, and an attribute. The relationship defines the connection between the entity and its attribute or between the entity and another entity within the same data source. For example, in the sentence "The company is located in New York," the entity "company" is linked to the attribute "New York" via the relationship "is located in." Similarly, in "The CEO leads the company," the relationship "leads" connects the entity "CEO" to the entity "company." The decomposition process involves natural language processing techniques to parse sentences, identify entities, and determine their relationships and attributes. This structured representation enables machines to understand and process textual information more effectively, supporting tasks such as knowledge extraction, data enrichment, and automated reasoning. The method improves upon traditional approaches by explicitly modeling relationships and attributes, enhancing the accuracy and usability of extracted data.
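The decomposition in claim 6 would normally rely on a natural language parser. The toy sketch below only illustrates the output shape, using the two example sentences above: it splits each sentence on a hard-coded, hypothetical list of relation phrases (longest phrase tried first). Real text would need a proper parser.

```python
import re

# Hypothetical relation phrases; a real system would derive relations with an NLP parser.
RELATION_PHRASES = ["is located in", "was founded in", "leads", "is a"]

def extract_triples(text):
    """Split each sentence on a known relation phrase into (entity, relationship, attribute)."""
    triples = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        sentence = sentence.rstrip(".!?")
        for phrase in sorted(RELATION_PHRASES, key=len, reverse=True):
            m = re.search(r"\b" + re.escape(phrase) + r"\b", sentence)
            if m:
                subject = sentence[:m.start()].strip()
                obj = sentence[m.end():].strip()
                if subject and obj:
                    triples.append((subject, phrase, obj))
                break  # use only the first phrase that matches this sentence
    return triples
```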

Claim 7

Original Legal Text

7. The method of claim 1 , said method comprising: said one or more processors providing a facility for a user to confirm an association between the two entities, or between an entity identified within the first data source and an attribute identified within the first data source.

Plain English Translation

This invention relates to data processing systems that manage and link entities and attributes within a data source. The problem addressed is the need for a user to efficiently confirm or verify associations between entities or between an entity and an attribute within a data source. The system includes one or more processors that provide a user interface or facility to enable a user to confirm these associations. The processors facilitate the display of potential associations, allowing the user to review and validate them. The method ensures that the relationships between entities or between an entity and an attribute are accurately established, improving data integrity and reliability. The system may also include mechanisms to track and store confirmed associations, ensuring that the data remains consistent and up-to-date. This approach enhances the accuracy of data linkages, reducing errors and improving the overall quality of the data source. The invention is particularly useful in applications where precise entity and attribute relationships are critical, such as in databases, knowledge graphs, or information management systems.

Claim 8

Original Legal Text

8. The method of claim 1 , said method comprising: said one or more processors providing a facility for a user to remove an association between the two entities, or between an entity identified within the first data source and an attribute of the entity identified within the first data source.

Plain English Translation

This invention relates to data management systems that handle relationships between entities and their attributes. The problem addressed is the need for users to modify or remove associations that the automatic method of claim 1 has created. One or more processors provide a facility for a user to remove an association between the two entities, or between an entity identified in the first data source and one of its attributes. This lets a user correct false matches, or reflect relationships that have changed over time, while the rest of the data structure is left intact. The capability is useful wherever entity relationships evolve, such as in customer relationship management, enterprise resource planning, or knowledge graph systems.

Claim 9

Original Legal Text

9. The method of claim 1 , said method comprising: said one or more processors providing a facility for a user to manually associate an attribute with an entity identified within the first data source.

Plain English Translation

This invention relates to data management systems that enable users to manually associate attributes with entities identified within a data source. One or more processors provide a facility through which a user selects an entity identified in the first data source and assigns an attribute to it. The attribute may be metadata, a label, or other descriptive information that adds context to the entity. This manual step complements, rather than replaces, the automatic extraction of claim 1: a user can supply attributes that the extraction missed, so that entities are described according to user-defined criteria. The capability applies to database management, knowledge graphs, content management systems, and other domains where accurate entity-attribute relationships matter.
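The user facilities of claims 7 to 9 (confirming, removing, and manually adding associations) amount to a few operations on the data structure. A minimal in-memory sketch; the `EntityStore` class, its status strings, and the split into entity-entity and entity-attribute links are hypothetical, and a real system would put a user interface in front of these calls.

```python
class EntityStore:
    """Hypothetical in-memory store of entity-entity and entity-attribute associations."""

    def __init__(self):
        self.entity_links = {}     # frozenset({entity_a, entity_b}) -> status
        self.attribute_links = {}  # (entity, attribute) -> status

    def _slot(self, x, y, attribute):
        # Entity-entity links are unordered; entity-attribute links are keyed (entity, attribute).
        if attribute:
            return self.attribute_links, (x, y)
        return self.entity_links, frozenset((x, y))

    def propose(self, x, y, attribute=False):
        """Record an association suggested by the automatic similarity step."""
        links, key = self._slot(x, y, attribute)
        links.setdefault(key, "proposed")

    def confirm(self, x, y, attribute=False):
        """Claim 7: the user confirms an existing association."""
        links, key = self._slot(x, y, attribute)
        if key in links:
            links[key] = "confirmed"

    def remove(self, x, y, attribute=False):
        """Claim 8: the user removes an association (no error if it is absent)."""
        links, key = self._slot(x, y, attribute)
        links.pop(key, None)

    def add_manual(self, entity, attribute):
        """Claim 9: the user manually associates an attribute with an entity."""
        self.attribute_links[(entity, attribute)] = "manual"
```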

Claim 10

Original Legal Text

10. The method of claim 1 , said method comprising: said one or more processors providing a facility for a user to combine the two entities together in the data structure, wherein attributes of both entities of the two entities are associated with the combined two entities in the data structure.

Plain English Translation

This invention relates to data processing systems that manage and manipulate structured data, particularly for combining entities within a data structure while preserving their attributes. The problem addressed is the need to merge two distinct entities in a data structure while maintaining the integrity and accessibility of their individual attributes, avoiding data loss or fragmentation during the combination process. The method involves using one or more processors to provide a user with a facility to combine two entities within a data structure. When the entities are merged, the attributes of both original entities are retained and associated with the newly combined entity. This ensures that all relevant data from the individual entities remains accessible and linked to the combined entity, preventing information loss and maintaining data consistency. The approach allows users to dynamically integrate entities while preserving their unique characteristics, supporting applications in databases, knowledge graphs, or other structured data environments where entity relationships and attributes must be maintained during operations. The solution enhances data management by enabling seamless entity consolidation without sacrificing attribute details.

Claim 11

Original Legal Text

11. The method of claim 10 , said method comprising: said one or more processors calculating a frequency metric for the combined two entities based on a frequency at which each attribute of the combined two entities is associated with the combined two entities.

Plain English Translation

This invention relates to data processing systems that merge entities, such as in knowledge graphs or databases, and re-evaluate how strongly the merged entity is associated with its attributes. The claim builds on claim 10, in which a user combines two entities in the data structure and the attributes of both are associated with the combined entity. One or more processors then calculate a frequency metric for the combined entity, based on how often each of its attributes is associated with it. Recomputing the metric after a merge keeps the combined entity comparable with the others: attributes that both original entities shared now occur more often for the combined entity, and the combined entity can be compared against other entities in the same way as in claim 1. This is useful in applications such as deduplication and knowledge graph refinement, where an entity's metric must stay current as entities are merged.
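Claims 10 and 11 can be sketched by relabeling both entities' triples onto one combined entity and then rescoring with the claim-1 formula. The relabeling approach, the helper names, and recomputing the metric over the whole source (so ALL.ACNT drops by one after the merge) are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict

def score(triples):
    """Claim-1 style metric per entity: {entity: {(pred, obj): tf * |log10(N / n)|}}."""
    tf = Counter(triples)
    n_entities = len({s for s, _, _ in triples})  # ALL.ACNT
    holders = defaultdict(set)                     # (pred, obj) -> entities having it
    for s, p, o in triples:
        holders[(p, o)].add(s)
    out = defaultdict(dict)
    for (s, p, o), count in tf.items():
        out[s][(p, o)] = count * abs(math.log10(n_entities / len(holders[(p, o)])))
    return dict(out)

def combine(triples, a, b, combined):
    """Claim 10: relabel entities a and b as one combined entity, keeping both sets of triples."""
    return [(combined if s in (a, b) else s, p, o) for s, p, o in triples]
```

For claim 11, `score(combine(triples, a, b, name))[name]` then gives the combined entity's metric. If both original entities carried the same predicate-object pair, its term count for the combined entity becomes 2.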

Claim 12

Original Legal Text

12. The method of claim 1 , said method comprising: said one or more processors combining the two entities into a single entity in response to a determination that an identified degree of similarity between the two entities is greater than a second predetermined threshold.

Plain English Translation

This invention relates to data processing systems that identify and merge similar entities, such as documents, records, or other data objects, to eliminate duplicates. Under claim 1, two entities are associated when their degree of similarity exceeds a first predetermined threshold. This claim adds a separate step: when the degree of similarity exceeds a second predetermined threshold, one or more processors combine the two entities into a single entity. Using two thresholds allows one level of similarity to produce only a link between entities while another, typically stricter, level triggers a merge; the claim itself does not state how the two thresholds relate. Merging removes redundant or duplicate entries and keeps the consolidated information in one place, which improves the accuracy of later analyses.

Claim 13

Original Legal Text

13. The method of claim 1 , said method comprising: said one or more processors associating the two entities with each other in response to a determination that an identified degree of similarity between the two entities is greater than a second predetermined threshold and the two entities have a same entity name or a similar entity name.

Plain English Translation

This invention relates to entity association in data processing systems, specifically to linking related entities based on both similarity and naming. A processor computes the degree of similarity between two entities and compares it against a second predetermined threshold. If the similarity exceeds that threshold and the two entities have the same or a similar name, the system associates them with each other. Requiring a name match in addition to a high similarity score guards against false positives, since two distinct entities can have similar attributes and relationships without referring to the same thing. This is useful in data deduplication, record linkage, and knowledge graph construction.
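The threshold tests in claims 12 and 13 are simple comparisons; the main open design question is what counts as a "similar entity name". The sketch below assumes a normalized `difflib.SequenceMatcher` ratio of at least 0.8, which is a hypothetical choice; the patent names no specific name-matching method or threshold values.

```python
from difflib import SequenceMatcher

def _normalize(name):
    # Lowercase and drop punctuation and spaces, so "Acme Corp." and "ACME Corp" compare equal.
    return "".join(ch for ch in name.lower() if ch.isalnum())

def similar_names(a, b, min_ratio=0.8):
    """Hypothetical 'same or similar entity name' test."""
    return SequenceMatcher(None, _normalize(a), _normalize(b)).ratio() >= min_ratio

def should_combine(similarity, second_threshold):
    """Claim 12: merge the two entities when similarity exceeds the second threshold."""
    return similarity > second_threshold

def should_associate_by_name(similarity, name_a, name_b, second_threshold):
    """Claim 13: associate when similarity exceeds the second threshold and the names match."""
    return similarity > second_threshold and similar_names(name_a, name_b)
```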

Claim 14

Original Legal Text

14. The method of claim 1 , wherein the said identifying entities within the first data source comprises including the first data source within a natural language algorithm.

Plain English Translation

This invention relates to a method for identifying entities within a data source using natural language processing (NLP). The method addresses the challenge of accurately extracting and categorizing entities from unstructured or semi-structured data, such as text documents, to improve data analysis, search, and knowledge management. The method involves processing a first data source, which may include text, documents, or other unstructured data, to identify and extract entities. The extraction process includes feeding the data source into a natural language algorithm designed to recognize and classify entities such as names, dates, locations, organizations, or other relevant terms. The algorithm may employ techniques like tokenization, part-of-speech tagging, named entity recognition (NER), or semantic analysis to detect and categorize entities within the data. The identified entities are then stored or output for further use, such as indexing, search optimization, or data enrichment. The method may also involve refining the entity extraction process by adjusting the natural language algorithm based on feedback or additional data sources. This approach enhances the accuracy and efficiency of entity recognition in large-scale data processing tasks.

Claim 15

Original Legal Text

15. The method of claim 1 , said method comprising: said one or more processors displaying a representation of the data structure to identify to a user associations between entities within the data structure.

Plain English Translation

This invention relates to data visualization techniques for displaying associations between entities within a data structure. The problem addressed is the difficulty users face in understanding complex relationships in large datasets, where traditional methods often fail to provide clear, intuitive visual representations. The method involves using one or more processors to display a visual representation of a data structure, where the representation highlights associations between different entities. The data structure may include various types of entities, such as nodes, links, or other interconnected elements, and the visualization helps users quickly identify relationships that might otherwise be obscured in raw data. The representation can be interactive, allowing users to explore different aspects of the data structure dynamically. The method may also involve preprocessing the data to extract relevant associations before visualization, ensuring that the displayed relationships are meaningful and relevant to the user's needs. The visualization can be customized based on user preferences or specific analysis goals, such as filtering certain types of associations or adjusting the layout for better clarity. The system may also support annotations or labels to provide additional context for the displayed relationships. By providing an intuitive visual representation of entity associations, this method enhances data comprehension, making it easier for users to analyze and interpret complex datasets. The approach is particularly useful in fields like network analysis, bioinformatics, or social network research, where understanding relationships between entities is critical.

Claim 16

Original Legal Text

16. The method of claim 15 , wherein the associations between entities within the data structure are displayed in response to a determination that the degree of similarity between the two entities is greater than a third predetermined threshold.

Plain English Translation

This invention relates to data visualization systems that display relationships between entities within a structured dataset. The problem addressed is presenting complex interconnections in large datasets without overwhelming users with weak or irrelevant links. Building on claim 15, in which a representation of the data structure shows a user the associations between entities, this claim displays an association only when the degree of similarity between the two entities exceeds a third predetermined threshold. This threshold is separate from the first threshold of claim 1, which governs whether entities are associated at all, so what is displayed can be tuned independently of the matching itself. Showing only the strongest associations reduces clutter and makes the representation easier to read.
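A plain-text stand-in for the display of claims 15 and 16: only associations whose degree of similarity exceeds the third threshold are listed, strongest first. The input format and output layout are hypothetical; a real interface would more likely draw a graph.

```python
def render_associations(similarities, display_threshold):
    """List entity pairs whose similarity exceeds the display (third) threshold.

    similarities: {(entity_a, entity_b): degree_of_similarity}
    """
    shown = sorted(
        ((s, a, b) for (a, b), s in similarities.items() if s > display_threshold),
        reverse=True,  # strongest associations first
    )
    return "\n".join(f"{a} <-> {b}  ({s:.2f})" for s, a, b in shown)
```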

Claim 17

Original Legal Text

17. The method of claim 15 , wherein the representation of the data structure identifies to the user associations between the entities within the data structure and attributes of the entities within the data structure.

Plain English Translation

This invention relates to data visualization techniques for representing complex data structures in a user-friendly manner. The problem addressed is the difficulty users face in understanding relationships and attributes within large, interconnected datasets, which often lack intuitive visualization methods. The method involves generating a visual representation of a data structure that highlights associations between entities and their attributes. The representation is designed to make these relationships immediately apparent to the user, improving comprehension and usability. The visualization may include graphical elements such as nodes, links, or annotations that explicitly show how entities are connected and what attributes they possess. This approach is particularly useful in fields like database management, network analysis, or knowledge graph exploration, where clarity in data relationships is critical. The method may also incorporate interactive features, allowing users to explore different aspects of the data structure dynamically. For example, users might hover over or select entities to view detailed attribute information or navigate through associated entities. The visualization can adapt to different levels of detail, providing both high-level overviews and granular insights as needed. By making entity relationships and attributes visually explicit, the method enhances user efficiency in analyzing and interpreting complex datasets.

Claim 18

Original Legal Text

18. The method of claim 1, said method comprising: said one or more processors providing a facility for a user to manually input text data to be processed as another data source.

Plain English Translation

This invention relates to data processing systems that integrate multiple data sources, particularly for data analysis or machine learning tasks. The problem addressed is the need to incorporate user-provided text alongside other data sources in a seamless and efficient manner. Traditional systems often require complex preprocessing or separate workflows to integrate manually entered text, which limits flexibility and usability. The method adds a facility for users to manually input text data, which is then processed as an additional data source alongside other structured or unstructured data. Because the text goes through the same processing as the other sources, the results remain consistent with them. The manually entered text can also be combined with existing datasets to add accuracy, relevance, or context in applications such as natural language processing, data enrichment, or predictive modeling. Instead of requiring a separate preprocessing workflow, the system may validate and normalize the text as part of integration so that it is compatible with downstream processing. This lets users supplement existing datasets with ad-hoc text inputs, which makes data-driven applications more adaptable.
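One way to accept typed text as "another data source" is to wrap it in the same container used for the other sources. This is a minimal sketch under assumptions: the `DataSource` dataclass, the default name `"manual-input"`, and the NFC/whitespace normalization are all hypothetical choices, since the claim does not specify any preprocessing.

```python
import unicodedata
from dataclasses import dataclass


@dataclass
class DataSource:
    """Hypothetical container for one data source passed to entity extraction."""
    name: str
    text: str


def manual_text_source(raw: str, name: str = "manual-input") -> DataSource:
    """Wrap user-typed text as another data source after light normalization."""
    # Assumed preprocessing: Unicode NFC normalization, then collapse all whitespace runs.
    text = " ".join(unicodedata.normalize("NFC", raw).split())
    if not text:
        raise ValueError("manual input is empty")
    return DataSource(name=name, text=text)
```

The manual source can then be appended to a list such as `[DataSource("page-1", "Alice works for Acme."), manual_text_source(user_text)]` and processed in the same way as the other entries.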

Claim 19

Original Legal Text

19. The method of claim 1, said method comprising: said one or more processors providing a facility for the user to apply a weighting to a first attribute of an entity identified within the first data source to influence the impact of that the first attribute on the frequency metrics characterizing the entity identified within the first data source.

Plain English Translation

This invention relates to data analysis systems that generate frequency metrics for entities identified in a data source. The problem addressed is the need to customize how different attributes of an entity influence these metrics, so that users can prioritize some data points over others. One or more processors identify entities within a first data source and calculate a frequency metric for each entity. The metric is based on how often each attribute and relationship is associated with that entity. The invention adds a user-configurable weighting mechanism that lets users adjust how much a specific attribute of an entity influences that entity's frequency metrics. For example, if an entity has multiple attributes (such as name, location, or category), the user can assign a higher or lower weight to one attribute to emphasize or de-emphasize its contribution. The weights are applied when the frequency metrics are calculated, so the resulting metrics reflect the user's analytical priorities. This allows more tailored insights in applications such as market research, fraud detection, or customer behavior analysis, where some attributes are more relevant than others. It also gives users finer control over attribute influence than unweighted metrics provide.
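The claim does not specify how a weighting changes the frequency metric. The sketch below assumes a simple multiplicative weight applied to each triple's score. Scores are keyed by a hypothetical `(entity, predicate, attribute)` tuple, weights are keyed by attribute, and an unweighted attribute defaults to 1.0.

```python
# Hypothetical key for a triple's score: (entity, predicate, attribute).
Triple = tuple[str, str, str]


def apply_attribute_weights(scores: dict[Triple, float], weights: dict[str, float]) -> dict[Triple, float]:
    """Scale each triple's score by the user weight of its attribute (default 1.0).

    Assumption: weights act multiplicatively. A weight above 1 emphasizes an
    attribute, a weight below 1 de-emphasizes it, and 0 removes its influence.
    """
    return {
        (entity, predicate, attribute): score * weights.get(attribute, 1.0)
        for (entity, predicate, attribute), score in scores.items()
    }
```

A multiplicative weight keeps unweighted attributes unchanged. Any later similarity comparison therefore shifts only where the user asked for it.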

Claim 20

Original Legal Text

20. The method of claim 1, wherein the first data source is a web page or document.

Plain English Translation

A system and method for data processing involves extracting and analyzing information from a first data source, which can be a web page or document, to generate structured data. The system identifies relevant data elements within the source, such as text, tables, or metadata, and applies natural language processing (NLP) or other analytical techniques to extract meaningful information. This extracted data is then structured into a predefined format, such as a database schema or API response, enabling further processing, storage, or integration with other systems. The method may also include validating the extracted data against predefined rules or schemas to ensure accuracy and consistency. Additionally, the system can support multiple data sources, including databases, APIs, or other digital files, and may apply different extraction techniques based on the source type. The structured data can be used for reporting, analytics, or automated decision-making processes. The system improves data accessibility and usability by converting unstructured or semi-structured data into a standardized format, reducing manual effort and enhancing data reliability.
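When the first data source is a web page, a first step could be to reduce the HTML to its visible text before entities are identified. This is a hedged sketch that uses only Python's standard `html.parser.HTMLParser` and skips `<script>` and `<style>` content. The class and function names are illustrative. The entity, attribute, and relationship extraction that would follow is not shown.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text content from an HTML page, skipping <script> and <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def page_to_text(html: str) -> str:
    """Return the page's visible text, joined with single spaces."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join(parser.chunks)
```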

Claim 21

Original Legal Text

21. A computer program product, comprising one or more computer readable hardware storage devices having computer readable program code stored therein, said program code containing instructions executable by one or more processors of a computer system to implement a method, said method comprising: said one or more processors identifying a plurality of entities within a first data source; for each entity identified within the first data source, said one or more processors identifying within the first data source attributes of the entity identified within the first data source and relationships between the entity identified within the first data source and other entities identified within the first data source, and associating the attributes and relationships identified within the first data source with a first entity identified within a data structure; said one or more processors generating, for each entity identified within the first data source, a frequency metric characterizing the entity identified within the first data source, said frequency metric based on a frequency at which each attribute and relationship identified within the first data source is associated with the entity identified within the first data source, said generating, for each entity identified within the first data source, the frequency metric characterizing the entity identified within the first data source comprising: generating multiple virtual triples for the entity identified within the first data source, each virtual triple consisting of a subject, a predicate, and an object, wherein the subject is the entity identified within the first data source, the predicate is the relationship identified within the first data source, and the object is the attribute identified within the first data source; and computing the frequency metric (Score (triple)) characterizing the entity identified within the first data source for each triple according to: Score (triple)=SUM(TF)×SUM(ABS(LOG 10(1.0×(ALL.ACNT)/(I.NB_ENTITY)))), wherein SUM(TF)=count of number of instances of the triple per entity identified within the first data source, summed over the entities, wherein ALL.ACNT=total number of entities within the first data source, and wherein I.NB_ENTITY=count of number of entities of a predicate-object pair within the each triple; said one or more processors identifying a degree of similarity between two entities of the plurality of entities by comparing the respective frequency metrics of the two entities; and said one or more processors associating the two entities within the data structure in response to a determination that an identified degree of similarity between the two entities is greater than a first predetermined threshold.

Plain English Translation

This invention relates to a computer program product (program code stored on computer readable hardware storage devices) that analyzes and compares entities within a data source to identify similarities and relationships. The method involves processing a first data source to identify multiple entities and their associated attributes and relationships. For each entity, the system generates virtual triples consisting of a subject (the entity), a predicate (a relationship), and an object (an attribute). A frequency metric is computed for each triple. The metric is based on the frequency of the triple's occurrence and on a logarithmic function of the total number of entities in the data source relative to the number of entities sharing the same predicate-object pair. The frequency metric quantifies how characteristic a given attribute or relationship is for a particular entity. The system then compares the frequency metrics of different entities to determine their degree of similarity. If the similarity exceeds a predefined threshold, the entities are linked within a data structure. This approach enables automated entity resolution and relationship mapping in large datasets by applying statistical analysis to attribute and relationship frequencies.
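The Score(triple) formula behaves like a TF-IDF weight: a term-frequency count multiplied by |log10(ALL.ACNT / I.NB_ENTITY)|. The sketch below rests on several assumptions. Each `SUM(...)` is read as an aggregate over one (subject, predicate, object) group, with the log term evaluated once per group. ALL.ACNT is taken as the number of distinct subject entities. I.NB_ENTITY is taken as the number of distinct subjects that share the predicate-object pair. The function name `score_triples` is hypothetical.

```python
import math
from collections import Counter

# A virtual triple: (subject = entity, predicate = relationship, object = attribute).
Triple = tuple[str, str, str]


def score_triples(triples: list[Triple]) -> dict[Triple, float]:
    """Sketch of Score(triple) = TF x |log10(1.0 x ALL.ACNT / I.NB_ENTITY)|.

    Assumption: each SUM(...) aggregates over the rows of one
    (subject, predicate, object) group, so the log term is applied once.
    """
    # TF: number of instances of each triple for its subject entity.
    tf = Counter(triples)
    # ALL.ACNT: total number of distinct entities (subjects) in the data source.
    all_acnt = len({subject for subject, _, _ in triples})
    # I.NB_ENTITY: distinct entities sharing each predicate-object pair.
    entities_per_po: dict[tuple[str, str], set[str]] = {}
    for subject, predicate, obj in triples:
        entities_per_po.setdefault((predicate, obj), set()).add(subject)

    scores: dict[Triple, float] = {}
    for (subject, predicate, obj), count in tf.items():
        nb_entity = len(entities_per_po[(predicate, obj)])
        idf = abs(math.log10(1.0 * all_acnt / nb_entity))
        scores[(subject, predicate, obj)] = count * idf
    return scores
```

Consider the triples `alice worksFor acme` (twice), `alice livesIn paris`, `bob worksFor acme`, and `carol livesIn rome`. ALL.ACNT is 3 and the pair `(worksFor, acme)` is shared by 2 entities, so Alice's `worksFor acme` triple scores 2 × log10(1.5), about 0.352. A predicate-object pair shared by every entity scores 0, which is how the formula discounts attributes that do not distinguish one entity from another.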

Claim 22

Original Legal Text

22. A computer system, comprising one or more processors, one or more memories, and one or more computer readable hardware storage devices, said one or more hardware storage device containing program code executable by the one or more processors via the one or more memories to implement a method, said method comprising: said one or more processors identifying a plurality of entities within a first data source; for each entity identified within the first data source, said one or more processors identifying within the first data source attributes of the entity identified within the first data source and relationships between the entity identified within the first data source and other entities identified within the first data source, and associating the attributes and relationships identified within the first data source with a first entity identified within a data structure; said one or more processors generating, for each entity identified within the first data source, a frequency metric characterizing the entity identified within the first data source, said frequency metric based on a frequency at which each attribute and relationship identified within the first data source is associated with the entity identified within the first data source, said generating, for each entity identified within the first data source, the frequency metric characterizing the entity identified within the first data source comprising: generating multiple virtual triples for the entity identified within the first data source, each virtual triple consisting of a subject, a predicate, and an object, wherein the subject is the entity identified within the first data source, the predicate is the relationship identified within the first data source, and the object is the attribute identified within the first data source; and computing the frequency metric (Score (triple)) characterizing the entity identified within the first data source for each triple according to: Score (triple)=SUM(TF)×SUM(ABS(LOG 10(1.0×(ALL.ACNT)/(I.NB_ENTITY)))), wherein SUM(TF)=count of number of instances of the triple per entity identified within the first data source, summed over the entities, wherein ALL.ACNT=total number of entities within the first data source, and wherein I.NB_ENTITY=count of number of entities of a predicate-object pair within the each triple; said one or more processors identifying a degree of similarity between two entities of the plurality of entities by comparing the respective frequency metrics of the two entities; and said one or more processors associating the two entities within the data structure in response to a determination that an identified degree of similarity between the two entities is greater than a first predetermined threshold.

Plain English Translation

The invention relates to a computer system for analyzing and comparing entities within a data source by extracting attributes and relationships, then quantifying their significance to determine similarity between entities. The system processes a first data source to identify multiple entities and, for each entity, extracts its attributes and relationships with other entities. These attributes and relationships are stored in a data structure, forming a structured representation of the entities. The system generates a frequency metric for each entity by creating virtual triples (subject-predicate-object) where the subject is the entity, the predicate is a relationship, and the object is an attribute. The frequency metric is calculated using a formula that accounts for the occurrence frequency of each triple and the distribution of predicate-object pairs across all entities. The metric combines term frequency (SUM(TF)) with a logarithmic inverse frequency component (SUM(ABS(LOG 10(1.0×(ALL.ACNT)/(I.NB_ENTITY))))), where ALL.ACNT is the total number of entities and I.NB_ENTITY is the count of entities sharing a predicate-object pair. The system then compares the frequency metrics of different entities to determine their similarity. If the similarity exceeds a predefined threshold, the entities are linked in the data structure. This approach enables automated entity resolution and clustering based on semantic and relational patterns within the data.
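The claims say only that similarity is found "by comparing the respective frequency metrics" and that entities are linked when it is "greater than a first predetermined threshold". They do not name a comparison measure. This sketch uses cosine similarity as an assumed measure. Each entity's scores are treated as a sparse vector keyed by hypothetical `(predicate, object)` pairs, and every pair of entities above the threshold is linked.

```python
import math

# Hypothetical per-entity profile: frequency metric keyed by (predicate, object).
Profile = dict[tuple[str, str], float]


def cosine_similarity(a: Profile, b: Profile) -> float:
    """Cosine similarity of two sparse score vectors (an assumed comparison measure)."""
    dot = sum(weight * b[key] for key, weight in a.items() if key in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def link_similar_entities(profiles: dict[str, Profile], threshold: float) -> list[tuple[str, str, float]]:
    """Return (entity_a, entity_b, similarity) for pairs strictly above the threshold."""
    names = sorted(profiles)
    links = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            similarity = cosine_similarity(profiles[first], profiles[second])
            if similarity > threshold:  # "greater than a first predetermined threshold"
                links.append((first, second, similarity))
    return links
```

The pairwise loop is quadratic in the number of entities. It is adequate for illustration, but a large data source would likely need blocking or an approximate nearest-neighbour index.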

Patent Metadata

Filing Date

Unknown

Publication Date

March 10, 2020

Inventors

Patrick Dantressangle
Simon Laws
Stacey H. Ronaghan
Peter Wooldridge


Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, FAQs, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “DATA PROCESSING” (10585893). https://patentable.app/patents/10585893

© 2026 Nomic Interactive Technology LLC. Machine-readable context available at /api/llm-context/10585893. See llms.txt for full attribution policy.
