US-10430417

System and method for visual Bayesian data fusion

Published: October 1, 2019
Assignee: not available in the USPTO data we have
Inventors: not available in the USPTO data we have
Technical Abstract

A system and method for visual Bayesian data fusion are disclosed. In an example, a plurality of datasets associated with a topic are obtained from a data lake. Each of the plurality of datasets includes information corresponding to various attributes of the topic. The plurality of datasets are then joined to obtain a joined dataset. A distribution associated with a target attribute is then predicted using Bayesian modeling by: selecting a plurality of attributes (k) based on mutual information with the target attribute in the joined dataset; learning a minimum spanning tree based Bayesian structure using the selected attributes and the target attribute; learning conditional probabilistic tables at each node of the minimum spanning tree based Bayesian structure; and predicting the distribution associated with the target attribute by querying the conditional probabilistic tables, thereby facilitating visual Bayesian data fusion.
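The attribute-selection step can be illustrated with a minimal sketch. It assumes the joined dataset is a list of dictionaries holding discrete values and that "mutual information" means the usual empirical estimate from counts; the function names `mutual_information` and `select_top_k` and the sample columns are hypothetical and do not come from the patent.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi

def select_top_k(rows, target, k):
    """Rank every non-target column by MI with the target and keep the top k."""
    columns = [c for c in rows[0] if c != target]
    t = [r[target] for r in rows]
    scored = [(mutual_information([r[c] for r in rows], t), c) for c in columns]
    scored.sort(reverse=True)
    return [c for _, c in scored[:k]]

# Hypothetical joined dataset: "region" fully determines "sales",
# while "segment" and "noise" are independent of it.
rows = [
    {"region": "north", "segment": "a", "noise": "x", "sales": "high"},
    {"region": "north", "segment": "b", "noise": "y", "sales": "high"},
    {"region": "south", "segment": "a", "noise": "x", "sales": "low"},
    {"region": "south", "segment": "b", "noise": "y", "sales": "low"},
]
selected = select_top_k(rows, target="sales", k=1)
```

In this toy data only `region` carries information about `sales`, so it is the single attribute kept when k is 1.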

Patent Claims
13 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A processor-implemented method comprising: obtaining, by one or more hardware processors, a plurality of datasets associated with a topic from a data lake, wherein each of the plurality of datasets comprise information corresponding to various attributes of the topic; joining, by the one or more hardware processors, the plurality of datasets to obtain a joined dataset; predicting, by the one or more hardware processors, distribution associated with a target attribute using Bayesian modeling by selecting a plurality of attributes (k) based on mutual information with the target attribute in the joined dataset; learning a minimum spanning tree based Bayesian structure on a feature graph that is created by calculating pairwise mutual information between selected attributes and the target attributes; learning conditional probabilistic tables at each node of the minimum spanning tree based Bayesian structure; predicting the distribution associated with the target attribute by querying the conditional probabilistic tables, thereby facilitating visual Bayesian data fusion; and automatically generating a plurality of tags, based on column headers, to index files in the conditional probabilistic tables.
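The final step of claim 1, generating tags from column headers, might look like the sketch below. The claim does not say how headers are split or what the index looks like, so the tokenization rule, the inverted-index layout, and the names `header_tags` and `build_tag_index` are all assumptions made for illustration.

```python
import re
from collections import defaultdict

def header_tags(header):
    """Split a column header into lowercase word tags (snake_case, camelCase, spaces)."""
    # Insert a space at lowercase-to-uppercase boundaries so camelCase splits too.
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", header)
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t]

def build_tag_index(files):
    """Map each tag to the sorted list of file names whose headers produce it."""
    index = defaultdict(set)
    for name, headers in files.items():
        for header in headers:
            for tag in header_tags(header):
                index[tag].add(name)
    return {tag: sorted(names) for tag, names in index.items()}

# Hypothetical files and headers, not taken from the patent.
files = {
    "sales_2016.csv": ["storeId", "unit_price", "Region"],
    "stores.csv": ["storeId", "city name"],
}
index = build_tag_index(files)
```

Looking up a tag such as `store` then returns every file with a matching column, which is one plausible way to find datasets that can be joined on a shared attribute.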

2. The method of claim 1, wherein the plurality of datasets are joined based on a type of join and wherein the type of join comprises an inner join, an outer join, a left join, and a right join.
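The four join types named in claim 2 behave as in the following minimal sketch. It assumes each dataset is a list of dictionaries sharing one key column and that non-key column names do not collide; the function `join` and the sample tables are hypothetical.

```python
def join(left, right, key, how="inner"):
    """Join two lists of dict rows on `key`; `how` is inner, outer, left, or right."""
    if how not in {"inner", "outer", "left", "right"}:
        raise ValueError(f"unknown join type: {how}")
    # Group right-hand rows by key so each left row can find its partners.
    right_by_key = {}
    for rrow in right:
        right_by_key.setdefault(rrow[key], []).append(rrow)
    left_cols = [c for c in (left[0] if left else {}) if c != key]
    right_cols = [c for c in (right[0] if right else {}) if c != key]
    out, matched = [], set()
    for lrow in left:
        partners = right_by_key.get(lrow[key], [])
        if partners:
            matched.add(lrow[key])
            out.extend({**lrow, **rrow} for rrow in partners)
        elif how in {"left", "outer"}:
            # Unmatched left row: pad the right-hand columns with None.
            out.append({**lrow, **{c: None for c in right_cols}})
    if how in {"right", "outer"}:
        for rrow in right:
            if rrow[key] not in matched:
                # Unmatched right row: pad the left-hand columns with None.
                out.append({**{c: None for c in left_cols}, **rrow})
    return out

stores = [{"store": 1, "city": "Pune"}, {"store": 2, "city": "Delhi"}]
sales = [{"store": 2, "units": 5}, {"store": 3, "units": 7}]
inner = join(stores, sales, "store", how="inner")
outer = join(stores, sales, "store", how="outer")
```

Here the inner join keeps only store 2, while the outer join keeps all three stores and fills the missing columns with `None`.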

3. The method of claim 1, wherein learning a minimum spanning tree based Bayesian structure on the feature graph that is created by calculating the pairwise mutual information between the selected attributes and the target attributes comprises: learning the minimum spanning tree on the plurality of attributes and the target attribute using the pairwise mutual information as a threshold; initializing each edge in the minimum spanning tree to random direction and dropping edge with mutual information less than the threshold; flipping each edge direction to compute 2^(k) directed graphs; calculating the cross entropy of each graph; and selecting a graph with least cross entropy as the minimum spanning tree based Bayesian structure.
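The structure-learning steps of claim 3 can be sketched as follows, under several assumptions that the claim itself does not settle. Edge weights are taken to be negated mutual information, so that the minimum spanning tree keeps the strongest dependencies. "Cross entropy of each graph" is taken to mean the average negative log-likelihood of the data under the network, with add-one smoothing. All edge orientations are enumerated directly instead of starting from a random one. The names `spanning_tree`, `cross_entropy`, and `learn_structure` are hypothetical.

```python
import itertools
import math
from collections import Counter

def mutual_information(xs, ys):
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * math.log(c * n / (px[x] * py[y])) for (x, y), c in joint.items())

def spanning_tree(rows, cols, threshold):
    """Kruskal on weights -MI, then drop edges whose MI is below the threshold."""
    col = {c: [r[c] for r in rows] for c in cols}
    edges = sorted(((mutual_information(col[a], col[b]), a, b)
                    for a, b in itertools.combinations(cols, 2)), reverse=True)
    root = {c: c for c in cols}

    def find(c):
        while root[c] != c:
            c = root[c]
        return c

    tree = []
    for mi, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            root[ra] = rb          # the edge belongs to the spanning tree
            if mi >= threshold:    # keep it only if it is strong enough
                tree.append((a, b))
    return tree

def cross_entropy(rows, cols, parents):
    """Average negative log-likelihood with add-one smoothing (assumed score)."""
    total = 0.0
    for c in cols:
        card = len({r[c] for r in rows})
        joint = Counter((tuple(r[p] for p in parents[c]), r[c]) for r in rows)
        marg = Counter(tuple(r[p] for p in parents[c]) for r in rows)
        for (ctx, _), n in joint.items():
            total -= n * math.log((n + 1) / (marg[ctx] + card))
    return total / len(rows)

def learn_structure(rows, cols, threshold=0.0):
    """Try every orientation of the tree edges and keep the lowest-scoring one."""
    tree = spanning_tree(rows, cols, threshold)
    best = None
    for flips in itertools.product([False, True], repeat=len(tree)):
        parents = {c: [] for c in cols}
        for (a, b), flip in zip(tree, flips):
            src, dst = (b, a) if flip else (a, b)
            parents[dst].append(src)
        score = cross_entropy(rows, cols, parents)
        if best is None or score < best[0]:
            best = (score, parents)
    return tree, best

# Hypothetical data: "a" copies the target "t"; "b" is only weakly related.
cols = ["a", "b", "t"]
rows = [dict(zip(cols, v)) for v in
        [(0, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 1), (1, 1, 1), (1, 0, 1)]]
tree, (score, parents) = learn_structure(rows, cols)
```

A tree over k selected attributes plus the target has k edges, which is why exhaustive flipping yields 2^(k) candidate graphs; this is only practical for small k.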

4. The method of claim 1, wherein the plurality of attributes and the target attribute comprise discrete and continuous variables.

5. The method of claim 4, wherein learning the conditional probabilistic tables at each node of the minimum spanning tree based Bayesian structure comprises: discretizing the continuous variables by fixed size binning; and learning the conditional probabilistic tables at each node of the minimum spanning tree based Bayesian structure upon discretizing the continuous variables.
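A minimal sketch of the two steps in claim 5 follows. It assumes "fixed size binning" means equal-width bins over the observed range and that each node has a single parent; the names `fixed_size_bins` and `learn_cpt` and the sample values are hypothetical.

```python
from collections import Counter, defaultdict

def fixed_size_bins(values, n_bins):
    """Map each continuous value to an equal-width bin index in [0, n_bins)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width when all values are equal
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def learn_cpt(child, parent):
    """Estimate P(child | parent) from two aligned discrete sequences."""
    counts = defaultdict(Counter)
    for c, p in zip(child, parent):
        counts[p][c] += 1
    return {p: {c: n / sum(cs.values()) for c, n in cs.items()}
            for p, cs in counts.items()}

# Hypothetical data: a continuous parent (price) and a discrete child (demand).
prices = [1.0, 2.0, 3.0, 8.0, 9.0, 10.0]
demand = ["high", "high", "high", "low", "low", "high"]
price_bin = fixed_size_bins(prices, n_bins=3)
cpt = learn_cpt(demand, price_bin)
```

Querying the table is then a dictionary lookup: `cpt[2]` gives the estimated distribution of demand for prices in the highest bin.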

6. The method of claim 1 , further comprising: computing a confidence score for the distribution associated with the target attribute predicted by querying the conditional probabilistic tables using ideal distribution and probabilistic distribution; predicting distribution associated with the target attribute using textual similarity; and selecting one of a) the distribution associated with the target attribute predicted using the textual similarity and b) the distribution associated with the target attribute predicted by querying the conditional probabilistic tables based on the computed confidence score.
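The selection logic of claim 6 can be sketched as below. The claim does not define the confidence score, so this example assumes the "ideal distribution" is a one-hot distribution on the most likely value and scores the prediction by one minus the total variation distance to it. The cutoff of 0.6 and the names `confidence` and `choose_prediction` are likewise hypothetical.

```python
def confidence(predicted):
    """Hypothetical score: 1 - total variation distance to a one-hot 'ideal' distribution."""
    mode = max(predicted, key=predicted.get)
    ideal = {v: 1.0 if v == mode else 0.0 for v in predicted}
    tv = 0.5 * sum(abs(predicted[v] - ideal[v]) for v in predicted)
    return 1.0 - tv

def choose_prediction(bayesian, textual, threshold=0.6):
    """Keep the Bayesian prediction when it is confident enough, else fall back."""
    score = confidence(bayesian)
    return ("bayesian", bayesian) if score >= threshold else ("textual", textual)

# Hypothetical predicted distributions over the target attribute.
sharp = {"high": 0.9, "low": 0.1}
flat = {"high": 0.5, "low": 0.5}
fallback = {"high": 0.7, "low": 0.3}
source_sharp, _ = choose_prediction(sharp, fallback)
source_flat, _ = choose_prediction(flat, fallback)
```

With this particular score, a sharply peaked Bayesian prediction is kept, while a flat one is replaced by the textual-similarity prediction.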

7. A system comprising: one or more memories; and one or more hardware processors, the one or more memories coupled to the one or more hardware processors, wherein the one or more hardware processors are configured to execute programmed instructions stored in the one or more memories to: obtain a plurality of datasets associated with a topic from a data lake, wherein each of the plurality of datasets comprise information corresponding to various attributes of the topic; join the plurality of datasets to obtain a joined dataset; predict distribution associated with a target attribute using Bayesian modeling by selecting a plurality of attributes (k) based on mutual information with the target attribute in the joined dataset; learning a minimum spanning tree based Bayesian structure on a feature graph that is created by calculating pairwise mutual information between selected attributes and the target attributes; learning conditional probabilistic tables at each node of the minimum spanning tree based Bayesian structure; predicting the distribution associated with the target attribute by querying the conditional probabilistic tables, thereby facilitating visual Bayesian data fusion; and automatically generating a plurality of tags, based on column headers, to index files in the conditional probabilistic tables.

8. The system of claim 7, wherein the plurality of datasets are joined based on a type of join and wherein the type of join comprises an inner join, an outer join, a left join, and a right join.

9. The system of claim 7, wherein one or more hardware processors are further configured to execute the programmed instructions to: learn the minimum spanning tree on the plurality of attributes and the target attribute using pairwise mutual information as a threshold; initialize each edge in the minimum spanning tree to random direction and dropping edge with mutual information less than the threshold; flip each edge direction to compute 2^(k) directed graphs; calculate the cross entropy of each graph; and select a graph with least cross entropy as the minimum spanning tree based Bayesian structure.

10. The system of claim 7, wherein the plurality of attributes and the target attribute comprise discrete and continuous variables.

11. The system of claim 10, wherein one or more hardware processors are further configured to execute the programmed instructions to: discretize the continuous variables by fixed size binning; and learn the conditional probabilistic tables at each node of the minimum spanning tree based Bayesian structure upon discretizing the continuous variables.

12. The system of claim 7, wherein one or more hardware processors are further configured to execute the programmed instructions to: compute a confidence score for the distribution associated with the target attribute predicted by querying the conditional probabilistic tables using ideal distribution and probabilistic distribution; predict distribution associated with the target attribute using textual similarity; and select one of a) the distribution associated with the target attribute predicted using the textual similarity and b) the distribution associated with the target attribute predicted by querying the conditional probabilistic tables based on the computed confidence score.

13. A non-transitory computer readable medium embodying a program executable in a computing device, said program comprising: a program code for obtaining a plurality of datasets associated with a topic from a data lake, wherein each of the plurality of datasets comprise information corresponding to various attributes of the topic; a program code for joining the plurality of datasets to obtain a joined dataset; a program code for predicting distribution associated with a target attribute using Bayesian modeling by selecting a plurality of attributes (k) based on mutual information with the target attribute in the joined dataset; learning a minimum spanning tree based Bayesian structure on a feature graph that is created by calculating pairwise mutual information between selected attributes and the target attributes; learning conditional probabilistic tables at each node of the minimum spanning tree based Bayesian structure; predicting the distribution associated with the target attribute by querying the conditional probabilistic tables, thereby facilitating visual Bayesian data fusion; and automatically generating a plurality of tags, based on column headers, to index files in the conditional probabilistic tables.

Patent Metadata

Filing Date: March 9, 2017
Publication Date: October 1, 2019

Cite as: Patentable. “System and method for visual bayesian data fusion” (US-10430417). https://patentable.app/patents/US-10430417
