US-11461671

Data quality tool

Published: October 4, 2022

Assignee: not available in the USPTO data we have

Inventors: not available in the USPTO data we have

Technical Abstract

An apparatus includes a database and a processor. The database stores a set of columns and rules assigned to each column. The rules are used to assess the quality of the data stored in the columns. The processor determines, based in part on the set of rules, the set of columns, and metadata and statistical properties of the columns, a machine learning policy adapted to generate a set of candidate rules for a given column. The processor further determines those columns of the set of columns that are similar to a subject column based on the names of the columns and the names of the tables storing the columns. The processor applies the machine learning policy to the subject column of data, rules of the similar columns, and metadata and statistical properties of the subject column to determine a set of candidate rules for the subject column.
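The name-based similarity step described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented method: the `Column` class, the 0.6 similarity threshold, the use of `difflib.SequenceMatcher` for string similarity, and the frequency ranking that stands in for the machine learning policy are all hypothetical choices. The sketch also ignores the metadata and statistical properties that the abstract says the policy takes into account.

```python
from collections import Counter
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Column:
    # Hypothetical record: a column, the table storing it, and its assigned rules.
    table: str
    name: str
    rules: list = field(default_factory=list)


def name_similarity(a: Column, b: Column) -> float:
    """Average string similarity of column names and table names (0.0 to 1.0)."""
    col = SequenceMatcher(None, a.name.lower(), b.name.lower()).ratio()
    tab = SequenceMatcher(None, a.table.lower(), b.table.lower()).ratio()
    return (col + tab) / 2


def similar_columns(subject: Column, columns: list, threshold: float = 0.6) -> list:
    """Columns whose column and table names resemble the subject's (assumed threshold)."""
    return [
        c for c in columns
        if c is not subject and name_similarity(subject, c) >= threshold
    ]


def candidate_rules(subject: Column, columns: list, threshold: float = 0.6) -> list:
    """Stand-in for the machine learning policy: rank the rules of similar
    columns by how many of those columns use them."""
    counts = Counter(
        rule
        for c in similar_columns(subject, columns, threshold)
        for rule in c.rules
    )
    return [rule for rule, _ in counts.most_common()]


# Made-up catalog for illustration only.
catalog = [
    Column("customers", "email_address", ["not_null", "matches_email_pattern"]),
    Column("customer", "email_addr", ["not_null"]),
    Column("orders", "total_amount", ["non_negative"]),
]
subject = Column("customers_eu", "email_address")
print(candidate_rules(subject, catalog))
```

In this toy catalog the two email columns count as similar to the subject column and the `orders.total_amount` column does not, so only rules from the email columns are proposed as candidates.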

Patent Claims

11 claims

Legal claims defining the scope of protection, as filed with the USPTO.

3. The apparatus of claim 1, wherein the third subset of statistical properties comprises at least one of a mean of the third column of data and a range of values of the third column of data.

4. The apparatus of claim 1, wherein the third set of metadata assigned to the third column of data comprises information about a use of the third column of data, information about a property of the third column of data, and information about an availability of the third column of data.

6. The apparatus of claim 5, wherein determining the second set of candidate rules for the third column of data is further based on a set of inputs from an administrator, the set of inputs comprising a known factor that affected one or more pieces of data of the third column of data.

7. The apparatus of claim 5, wherein determining that the first probability that the first result of applying the first rule of the first set of candidate rules to the first piece of data is affected by the first event of the event log affecting the first piece of data of the third column of data is less than the threshold comprises using a second machine learning policy.

10. The method of claim 8, wherein the third subset of statistical properties comprises at least one of a mean of the third column of data and a range of values of the third column of data.

11. The method of claim 8, wherein the third set of metadata assigned to the third column of data comprises information about a use of the third column of data, information about a property of the third column of data, and information about an availability of the third column of data.

13. The method of claim 12, wherein determining the second set of candidate rules for the third column of data is further based on a set of inputs from an administrator, the set of inputs comprising a known factor that affected one or more pieces of data of the third column of data.

14. The method of claim 12, wherein determining that the first probability that the first result of applying the first rule of the first set of candidate rules to the first piece of data is affected by the first event of the event log affecting the first piece of data of the third column of data is less than the threshold comprises using a second machine learning policy.

17. The system of claim 15, wherein the third set of metadata assigned to the third column of data comprises information about a use of the third column of data, information about a property of the third column of data, and information about an availability of the third column of data.

19. The system of claim 18, wherein determining the second set of candidate rules for the third column of data is further based on a set of inputs from an administrator, the set of inputs comprising a known factor that affected one or more pieces of data of the third column of data.

20. The system of claim 18, wherein determining that the first probability that the first result of applying the first rule of the first set of candidate rules to the first piece of data is affected by the first event of the event log affecting the first piece of data of the third column of data is less than the threshold comprises using a second machine learning policy.
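Two ideas from the dependent claims above can be sketched in a few lines: the statistical properties named in claims 3 and 10 (a mean and a range of values), and the threshold comparison in claims 7, 14, and 20. The independent claims are not reproduced in this listing, so the sketch makes assumptions: `event_probability` is a hypothetical callable standing in for the second machine learning policy, the 0.5 threshold is arbitrary, and the listing does not say what follows once a probability falls below the threshold, so the code only selects those results.

```python
from statistics import mean


def column_statistics(values):
    """Mean and range (min, max) of a numeric column."""
    return {"mean": mean(values), "range": (min(values), max(values))}


def below_threshold(results, event_probability, threshold=0.5):
    """Select rule results whose estimated probability of being affected by a
    logged event is under the threshold.

    `event_probability` is a hypothetical stand-in for the second machine
    learning policy; the threshold value is an assumption.
    """
    return [r for r in results if event_probability(r) < threshold]


stats = column_statistics([10, 20, 30, 40])
print(stats)

# Made-up rule results: (row id, rule passed?, assumed event probability).
results = [(1, False, 0.1), (2, False, 0.9), (3, True, 0.2)]
kept = below_threshold(results, event_probability=lambda r: r[2])
print(kept)
```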

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G06N G06F

Patent Metadata

Filing Date

June 3, 2019

Publication Date

October 4, 2022
