Single Identifier Platform for Storing Entity Data

Published: July 8, 2025

Assignee: not available in USPTO data we have

Inventors: Hua Li, Sophie Liu, Yi He, Zhixuan Wang, Chi Zhang, +6 more

Patent Claims

19 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A computer-implemented method, comprising: receiving a plurality of data records from one or more data sources; providing at least a subset of the data records to a scoring model that determines scores for various pairings of the data records, a score for a given pair of the data records representing a probability or likelihood that the given pair of data records contains data elements about the same entity; generating a graph data structure that includes a plurality of nodes, each individual node of the plurality of nodes representing a different data record from the plurality of data records, where edges between given node pairs are associated with corresponding determined scores for respective pairs of data records; performing optimal weighted clustering of the graph data structure to determine final clusters of the plurality of nodes, wherein computer processing time is reduced in performing the optimal weighted clustering at least in part by reduction to a linear programming problem such that only a subset of millions of potential clusters possible from the plurality of data records are analyzed in determining the final clusters; assigning a different unique identifier to each individual cluster of the final clusters, where different identifiers represent different entities; and responding to a request for data regarding a given entity by providing aggregated data elements from those data records of the plurality of data records associated with a cluster of the final clusters having an identifier that represents the given entity.
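The bookkeeping around the clustering step in claim 1 (scoring pairs, storing scores on graph edges, assigning one identifier per final cluster, and answering a request with aggregated data) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the `toy_score` similarity, the dictionary-based graph, and the use of `uuid4` for identifiers are all assumptions made for the example, and the clustering itself is taken as given.

```python
from itertools import combinations
import uuid

def toy_score(a, b):
    """Hypothetical stand-in for the scoring model: share of fields that agree."""
    keys = set(a) | set(b)
    shared = [k for k in a if k in b and a[k] == b[k]]
    return len(shared) / len(keys) if keys else 0.0

def build_graph(records, score_fn):
    """One node per record index; each edge (i, j) with i < j carries a score."""
    return {(i, j): score_fn(records[i], records[j])
            for i, j in combinations(range(len(records)), 2)}

def assign_identifiers(clusters):
    """Map a fresh identifier to each final cluster (uuid4 is an assumption)."""
    return {str(uuid.uuid4()): sorted(cluster) for cluster in clusters}

def aggregate(records, id_to_cluster, entity_id):
    """Collect every value seen for each field across the cluster's records."""
    merged = {}
    for idx in id_to_cluster[entity_id]:
        for key, value in records[idx].items():
            merged.setdefault(key, set()).add(value)
    return merged
```

Given final clusters such as `[{0, 1}, {2}]`, `assign_identifiers` yields two identifiers, and `aggregate` returns the combined fields of records 0 and 1 when asked for the first one.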

2. The method of claim 1, further comprising, prior to performing the optimal weighted clustering, performing a connected component analysis of the graph data structure, including pruning one or more edges that fall below a threshold score.

3. The method of claim 2, wherein the threshold score indicates whether data records represented by a node pair of the given node pairs connected by an edge of the edges belong to the same entity as each other, wherein a first given node pair having a first edge of the edges associated with a first score that exceeds the threshold score belong to the same entity as each other, and wherein a second given node pair having a second edge of the edges associated with a second score that is less than the threshold score belong to different entities than each other.

4. The method of claim 3, wherein pruning one or more edges that fall below the threshold score comprises removing the one or more edges associated with the score that falls below the threshold score such that one or more of the given node pairs connected by the removed one or more edges are no longer connected via the removed one or more edges.
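Claims 2 through 4 describe pruning low-score edges and then running a connected component analysis before clustering. A minimal sketch of that preprocessing is shown below, assuming edges are stored as a `{(i, j): score}` dictionary and using a union-find structure; both choices, and the function names, are illustrative assumptions rather than details from the claims.

```python
def prune_edges(edges, threshold):
    """Drop every edge whose score falls below the threshold."""
    return {pair: score for pair, score in edges.items() if score >= threshold}

def connected_components(num_nodes, edges):
    """Group nodes 0..num_nodes-1 that remain linked by the given edges."""
    parent = list(range(num_nodes))

    def find(x):
        # Walk to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for node in range(num_nodes):
        groups.setdefault(find(node), set()).add(node)
    return sorted(groups.values(), key=min)
```

Each resulting component can then be clustered on its own, so the more expensive optimization never has to consider node pairs that pruning has already separated.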

5. The method of claim 1, wherein the scoring model comprises a machine learning algorithm that learns weights associated with different attributes of a portion of the data records to generate the scores for individual pairs of data records.
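Claim 5 says only that the scoring model learns weights for record attributes; it does not name a model. As one possible reading, the sketch below trains a logistic regression on per-attribute match indicators with plain stochastic gradient descent. The model choice, the binary match features, and all names (`match_features`, `train_weights`, `score_pair`) are assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def match_features(a, b, attributes):
    """1.0 where both records carry the attribute and the values agree."""
    return [1.0 if a.get(k) is not None and a.get(k) == b.get(k) else 0.0
            for k in attributes]

def train_weights(pairs, labels, attributes, epochs=500, lr=0.5):
    """Fit one weight per attribute plus a bias on labeled record pairs."""
    weights = [0.0] * len(attributes)
    bias = 0.0
    for _ in range(epochs):
        for (a, b), label in zip(pairs, labels):
            x = match_features(a, b, attributes)
            pred = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = pred - label  # gradient of the log loss w.r.t. the logit
            bias -= lr * err
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
    return weights, bias

def score_pair(a, b, attributes, weights, bias):
    """Estimated probability that records a and b describe the same entity."""
    x = match_features(a, b, attributes)
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
```

On training data where a shared email is a reliable signal and a shared name is not, the learned email weight ends up larger than the name weight, which is the behavior the claim describes at a high level.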

6. The method of claim 1, further comprising excluding at least one node of the plurality of nodes from at least the optimal weighted clustering based on a source of the at least one node according to a rule or restriction that limits combinations of sources of the data records.
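Claim 6 leaves the form of the source rule open. One hypothetical encoding is a set of source pairs that may not be combined; the sketch below walks the candidate nodes in order and sets aside any node whose source clashes with a node already kept. The rule format, the greedy order, and the names are assumptions for illustration only.

```python
def exclude_by_source(nodes, source_of, forbidden_pairs):
    """Split nodes into (kept, excluded) under a source-combination rule.

    forbidden_pairs holds frozensets of two source names that must not
    appear together in the clustering input.
    """
    kept, excluded = [], []
    for node in nodes:
        clash = any(
            frozenset((source_of[node], source_of[other])) in forbidden_pairs
            for other in kept
        )
        (excluded if clash else kept).append(node)
    return kept, excluded
```

Only the `kept` nodes would then be passed on to the clustering step.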

7. The method of claim 1, further comprising merging at least two of the final clusters into a larger cluster based on the at least two of the final clusters being associated with the same entity.
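The merge in claim 7 can be pictured as folding one cluster's records into another and retiring the second identifier. The sketch below assumes clusters are stored as an identifier-to-record-indices dictionary and that the caller has already decided the two clusters describe the same entity; how that decision is made is not specified in the claim.

```python
def merge_clusters(id_to_cluster, keep_id, drop_id):
    """Return a new mapping in which drop_id's records live under keep_id."""
    merged = dict(id_to_cluster)  # leave the caller's mapping untouched
    combined = set(merged[keep_id]) | set(merged.pop(drop_id))
    merged[keep_id] = sorted(combined)
    return merged
```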

8. A non-transitory computer readable medium comprising instructions that, when executed by one or more hardware processors, cause the one or more hardware processors to perform operations comprising at least: receiving a plurality of data records from one or more data sources; providing at least a subset of the data records to a scoring model that determines scores for various pairings of the data records, a score for a given pair of the data records representing a probability that the given pair of data records contains data elements about the same entity; generating a graph data structure that includes a plurality of nodes, each individual node of the plurality of nodes representing a different data record from the plurality of data records, where edges between given node pairs are associated with corresponding determined scores for respective pairs of data records; performing optimal weighted clustering of the graph data structure to determine final clusters of the plurality of nodes, wherein computer processing time is reduced in performing the optimal weighted clustering at least in part by reduction to a linear programming problem that starts from zero cluster and iteratively increases the number of clusters until an optimal clustering is determined for the graph data structure such that only a subset of potential clusters possible from the plurality of data records are analyzed in determining the final clusters; assigning a different unique identifier to each individual cluster of the final clusters, where different identifiers represent different entities; and responding to a request for data regarding a given entity by providing aggregated data elements from those data records of the plurality of data records associated with a cluster of the final clusters having an identifier that represents the given entity.
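Claim 8 does not write out the linear program. One standard formulation that matches its description (start with no clusters, add clusters iteratively, and examine only a subset of all possible clusters) is a set-packing LP solved by column generation. The notation below is an assumption made for illustration: $\mathcal{D}$ is the set of nodes, $\mathcal{C}$ the set of all possible clusters, $G_{dc} = 1$ when node $d$ belongs to cluster $c$, and $\Gamma_c$ the cost of cluster $c$ (for example, the negated sum of its internal edge weights).

```latex
% Full problem over all possible clusters (too many to enumerate):
\min_{\gamma \ge 0} \; \sum_{c \in \mathcal{C}} \Gamma_c \, \gamma_c
\quad \text{s.t.} \quad
\sum_{c \in \mathcal{C}} G_{dc} \, \gamma_c \le 1
\quad \forall d \in \mathcal{D}

% Restricted problem over a small working set \hat{\mathcal{C}} \subseteq \mathcal{C},
% initialised as \hat{\mathcal{C}} = \emptyset, with dual variables \lambda_d \ge 0:
\min_{\gamma \ge 0} \; \sum_{c \in \hat{\mathcal{C}}} \Gamma_c \, \gamma_c
\quad \text{s.t.} \quad
\sum_{c \in \hat{\mathcal{C}}} G_{dc} \, \gamma_c \le 1
\quad \forall d \in \mathcal{D}

% Pricing step: add to \hat{\mathcal{C}} any cluster with negative reduced cost,
% and stop when no such cluster exists:
\Gamma_c + \sum_{d \in \mathcal{D}} G_{dc} \, \lambda_d < 0
```

Under this reading, a node covered by no selected cluster is treated as its own singleton entity at zero cost, and the loop ends once the pricing step finds no cluster that would improve the objective.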

9. The non-transitory computer readable medium of claim 8, wherein the operations further comprise, prior to performing the optimal weighted clustering, performing a connected component analysis of the graph data structure, including pruning one or more edges that fall below a threshold score.

10. The non-transitory computer readable medium of claim 8, wherein the threshold score varies dynamically according to one or more parameters.

11. The non-transitory computer readable medium of claim 9, wherein the threshold score indicates whether data records represented by a node pair of the given node pairs connected by an edge of the edges belong to the same entity as each other, wherein a first given node pair having a first edge of the edges associated with a first score that exceeds the threshold score belong to the same entity as each other, and wherein a second given node pair having a second edge of the edges associated with a second score that is less than the threshold score belong to different entities than each other.

12. The non-transitory computer readable medium of claim 8, wherein the scoring model comprises a machine learning algorithm that learns weights associated with different attributes of a portion of the data records to generate the scores for individual pairs of data records.

13. A data aggregation and computation system, comprising: a data store configured to store a plurality of data records related to a plurality of individual entities; and a hardware processor configured to: receive the plurality of data records from one or more data sources; provide at least a subset of the data records to a scoring model that determines scores for various pairings of the data records, a score for a given pair of the data records representing a probability or likelihood that the given pair of data records contains data elements about the same entity; generate a graph data structure that includes a plurality of nodes, each individual node of the plurality of nodes representing a different data record from the plurality of data records, where edges between given node pairs are associated with corresponding determined scores for respective pairs of data records; perform optimal weighted clustering of the graph data structure to determine final clusters of the plurality of nodes, wherein computer processing time is reduced in performing the optimal weighted clustering at least in part by reduction to a linear programming problem that starts from zero cluster and iteratively increases the number of clusters until an optimal clustering is determined for the graph data structure such that only a subset of potential clusters possible from the plurality of data records are analyzed in determining the final clusters; assign a different unique identifier to each individual cluster of the final clusters, where different identifiers represent different entities of the individual entities; and respond to a request for data regarding a given entity of the individual entities by providing aggregated data elements from those data records of the plurality of data records associated with a cluster of the final clusters having an identifier that represents the given entity.
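To see why the claims stress analyzing only a subset of potential clusters, it helps to look at the exhaustive alternative. The sketch below enumerates every partition of a tiny node set and keeps the one with the highest total weight. The objective (sum of log-odds of the pair scores inside each cluster) and the default score for missing edges are assumptions for illustration; the claims do not define the weighting. The number of partitions grows as the Bell numbers, so this baseline is usable only for very small components.

```python
import math

def log_odds(p, eps=1e-6):
    """Turn a same-entity probability into a signed edge weight."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))

def partitions(items):
    """Yield every way to split items into non-empty clusters."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Place `first` into each existing cluster in turn...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ...or give it a cluster of its own.
        yield [[first]] + part

def clustering_value(clusters, scores, missing=0.01):
    """Sum the log-odds weights of all pairs placed in the same cluster."""
    total = 0.0
    for cluster in clusters:
        for i in range(len(cluster)):
            for j in range(i + 1, len(cluster)):
                pair = tuple(sorted((cluster[i], cluster[j])))
                total += log_odds(scores.get(pair, missing))
    return total

def best_clustering(nodes, scores):
    """Exhaustive baseline: try every partition and keep the best one."""
    return max(partitions(list(nodes)),
               key=lambda part: clustering_value(part, scores))
```

Four nodes already produce 15 candidate partitions, and the count grows faster than exponentially from there, which is the cost the linear programming reduction in the claims is meant to avoid.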

14. The system of claim 13, wherein the hardware processor is further configured to, prior to performing the optimal weighted clustering, perform a connected component analysis of the graph data structure, including pruning one or more edges that fall below a threshold score.

15. The system of claim 14, wherein the threshold score indicates whether data records represented by a node pair of the given node pairs connected by an edge of the edges belong to the same entity as each other, wherein a first given node pair having a first edge of the edges associated with a first score that exceeds the threshold score belong to the same entity as each other, and wherein a second given node pair having a second edge of the edges associated with a second score that is less than the threshold score belong to different entities than each other.

16. The system of claim 14, wherein pruning one or more edges that fall below the threshold score comprises removing the one or more edges associated with the score that falls below the threshold score such that one or more of the given node pairs connected by the removed one or more edges are no longer connected via the removed one or more edges.

17. The system of claim 13, wherein the scoring model comprises a machine learning algorithm that learns weights associated with different attributes of a portion of the data records to generate the scores for individual pairs of data records.

18. The system of claim 13, wherein the hardware processor is further configured to exclude at least one node of the plurality of nodes from at least the optimal weighted clustering based on a source of the at least one node according to a rule or restriction that limits combinations of sources of the data records.

19. The system of claim 13, wherein the hardware processor is further configured to merge at least two of the final clusters into a larger cluster based on the at least two of the final clusters being associated with the same entity.

Patent Metadata

Filing Date: Unknown

Publication Date: July 8, 2025

Inventors: Hua Li, Sophie Liu, Yi He, Zhixuan Wang, Chi Zhang, Kevin Chen, Shanji Xiong, Christer Dichiara, Mason Carpenter, Mark Hirn, Julian Yarkony
