US-11461671

Data quality tool

Published: October 4, 2022
Assignee: not available in the USPTO data we have
Inventors: not available in the USPTO data we have
Technical Abstract

An apparatus includes a database and a processor. The database stores a set of columns and rules assigned to each column. The rules are used to assess the quality of the data stored in the columns. The processor determines, based in part on the set of rules, the set of columns, and metadata and statistical properties of the columns, a machine learning policy adapted to generate a set of candidate rules for a given column. The processor further determines those columns of the set of columns that are similar to a subject column based on the names of the columns and the names of the tables storing the columns. The processor applies the machine learning policy to the subject column of data, rules of the similar columns, and metadata and statistical properties of the subject column to determine a set of candidate rules for the subject column.
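
The abstract says similar columns are found from column names and table names, but it does not say how similarity is measured. As a minimal sketch only, the Python below assumes a plain string-similarity score from the standard library's `difflib`. The function names, the equal weighting of table and column names, and the 0.7 cutoff are all hypothetical choices, not details from the patent.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Return a score in [0, 1] for how alike two names are (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similar_columns(subject, candidates, threshold=0.7):
    """Return the candidates whose names resemble the subject column's names.

    `subject` and each candidate are (table_name, column_name) pairs.
    ASSUMPTION: the score is the plain average of table-name and
    column-name similarity; the patent does not specify a formula.
    """
    subject_table, subject_column = subject
    matches = []
    for table, column in candidates:
        if (table, column) == subject:
            continue  # a column is not its own neighbour
        score = 0.5 * name_similarity(subject_table, table) \
            + 0.5 * name_similarity(subject_column, column)
        if score >= threshold:
            matches.append((table, column))
    return matches


# Hypothetical schema used only for illustration.
subject = ("customers", "email_address")
candidates = [("customer", "email_addr"), ("orders", "total_amount")]
print(similar_columns(subject, candidates))
```

In the apparatus described, the rules already assigned to the columns returned here would be one of the inputs to the machine learning policy, alongside the subject column's metadata and statistical properties.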

Patent Claims
11 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 3

Original Legal Text

3. The apparatus of claim 1, wherein the third subset of statistical properties comprises at least one of a mean of the third column of data and a range of values of the third column of data.
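
As a rough illustration of the two statistical properties named in this claim, the Python sketch below computes a mean and a range for one column. The function name is hypothetical, and reporting the range as a (minimum, maximum) pair and skipping missing values are assumptions; the claim does not define either.

```python
from statistics import mean


def column_statistics(values):
    """Compute the mean and the range of a numeric column.

    ASSUMPTIONS: missing entries are represented as None and are skipped,
    and "range" is reported as a (minimum, maximum) pair.
    """
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("column has no non-missing values")
    return {"mean": mean(present), "range": (min(present), max(present))}


print(column_statistics([10, 20, None, 30]))
```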

Plain English translation pending...

Claim 4

Original Legal Text

4. The apparatus of claim 1, wherein the third set of metadata assigned to the third column of data comprises information about a use of the third column of data, information about a property of the third column of data, and information about an availability of the third column of data.

Plain English Translation

This claim narrows the apparatus of claim 1. As the abstract describes, the apparatus is a data quality tool that uses the metadata and statistical properties of columns, together with existing rules, to generate candidate data quality rules for a column. Claim 4 specifies what the third set of metadata, assigned to the third column of data, contains: information about a use of the column, information about a property of the column, and information about an availability of the column. The claim does not define these three categories further. Read plainly, use metadata might describe how the column is employed, such as its role in calculations, filtering, or reporting. Property metadata might cover characteristics such as data type, constraints, or relationships with other columns. Availability metadata might indicate whether the column is active, deprecated, or restricted. These examples are illustrative and are not recited in the claim.
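
One way to picture the three kinds of metadata named in the claim is as a small record attached to a column. The Python sketch below is an assumption made for illustration: the class name, field names, and example values are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnMetadata:
    """Hypothetical record holding the three metadata categories of the claim."""
    use: str            # how the column is used, e.g. "reporting"
    availability: str   # e.g. "active", "deprecated", "restricted"
    properties: dict = field(default_factory=dict)  # e.g. data type, constraints


# Illustrative values only.
meta = ColumnMetadata(
    use="reporting",
    availability="active",
    properties={"dtype": "decimal", "nullable": False},
)
print(meta)
```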

Claim 6

Original Legal Text

6. The apparatus of claim 5, wherein determining the second set of candidate rules for the third column of data is further based on a set of inputs from an administrator, the set of inputs comprising a known factor that affected one or more pieces of data of the third column of data.

Plain English translation pending...

Claim 7

Original Legal Text

7. The apparatus of claim 5, wherein determining that the first probability that the first result of applying the first rule of the first set of candidate rules to the first piece of data is affected by the first event of the event log affecting the first piece of data of the third column of data is less than the threshold comprises using a second machine learning policy.

Plain English translation pending...

Claim 10

Original Legal Text

10. The method of claim 8, wherein the third subset of statistical properties comprises at least one of a mean of the third column of data and a range of values of the third column of data.

Plain English translation pending...

Claim 11

Original Legal Text

11. The method of claim 8, wherein the third set of metadata assigned to the third column of data comprises information about a use of the third column of data, information about a property of the third column of data, and information about an availability of the third column of data.

Plain English Translation

This claim narrows the method of claim 8 in the same way that claim 4 narrows the apparatus of claim 1. It specifies that the third set of metadata, assigned to the third column of data, comprises information about a use of the column, information about a property of the column, and information about an availability of the column. The claim does not define these categories further. By way of illustration only, use metadata might describe the column's role in calculations or decision-making, property metadata might record its data type, format, or constraints, and availability metadata might indicate whether the column is accessible, restricted, or subject to specific conditions. In the context of the abstract, this metadata is one of the inputs used to determine candidate data quality rules for a column.

Claim 13

Original Legal Text

13. The method of claim 12, wherein determining the second set of candidate rules for the third column of data is further based on a set of inputs from an administrator, the set of inputs comprising a known factor that affected one or more pieces of data of the third column of data.

Plain English Translation

This claim narrows the method of claim 12. When the method determines the second set of candidate rules for the third column of data, it also takes into account a set of inputs from an administrator. Those inputs include a known factor that affected one or more pieces of data in the column. The claim does not say what kinds of factors qualify; examples might include external events or system changes that the administrator knows to have altered the data. Supplying such a factor presumably lets the rule generation account for real-world influences on the data, so that the resulting candidate rules are more accurate and relevant for assessing the quality of the column.

Claim 14

Original Legal Text

14. The method of claim 12, wherein determining that the first probability that the first result of applying the first rule of the first set of candidate rules to the first piece of data is affected by the first event of the event log affecting the first piece of data of the third column of data is less than the threshold comprises using a second machine learning policy.

Plain English translation pending...

Claim 17

Original Legal Text

17. The system of claim 15, wherein the third set of metadata assigned to the third column of data comprises information about a use of the third column of data, information about a property of the third column of data, and information about an availability of the third column of data.

Plain English translation pending...

Claim 19

Original Legal Text

19. The system of claim 18, wherein determining the second set of candidate rules for the third column of data is further based on a set of inputs from an administrator, the set of inputs comprising a known factor that affected one or more pieces of data of the third column of data.

Plain English translation pending...

Claim 20

Original Legal Text

20. The system of claim 18, wherein determining that the first probability that the first result of applying the first rule of the first set of candidate rules to the first piece of data is affected by the first event of the event log affecting the first piece of data of the third column of data is less than the threshold comprises using a second machine learning policy.

Plain English Translation

This claim narrows the system of claim 18. The claim language refers to an event log containing events that affect pieces of data in the third column of data. The system considers the first result of applying the first rule of the first set of candidate rules to the first piece of data, and the first probability that this result was affected by the first event in the event log. Claim 20 specifies how the system determines that this probability is less than the threshold: it uses a second machine learning policy. This policy is distinct from the machine learning policy described in the abstract, which generates candidate rules. The claim does not state how the second policy is trained, nor what follows once the probability is found to be below the threshold; those details would be set out in claim 18, which is not reproduced on this page. A plausible reading is that a low probability means the event did not meaningfully influence the rule's result, so the result can be treated as a reliable signal about the rule rather than as a side effect of the event.
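
The threshold comparison in this claim can be sketched in a few lines of Python. This is an assumption-laden illustration, not the patented method: the function names and the event fields are hypothetical, and `toy_estimator` is a hard-coded stand-in for the second machine learning policy, whose actual form the claim does not describe.

```python
from typing import Callable


def event_unlikely_to_affect(
    rule_result: bool,
    event: dict,
    estimate_probability: Callable[[bool, dict], float],
    threshold: float,
) -> bool:
    """Return True when the estimated probability that `event` affected
    `rule_result` is strictly less than `threshold`."""
    probability = estimate_probability(rule_result, event)
    if not 0.0 <= probability <= 1.0:
        raise ValueError("estimator must return a probability in [0, 1]")
    return probability < threshold


def toy_estimator(rule_result: bool, event: dict) -> float:
    """HYPOTHETICAL stand-in for the second machine learning policy.

    A trained model would go here; this stub just treats a "migration"
    event as likely to have affected the result and anything else as not.
    """
    return 0.9 if event.get("type") == "migration" else 0.1


print(event_unlikely_to_affect(False, {"type": "migration"}, toy_estimator, 0.5))
print(event_unlikely_to_affect(False, {"type": "login"}, toy_estimator, 0.5))
```

Passing the estimator in as a parameter keeps the comparison separate from the model, so a trained policy could replace the stub without changing the threshold logic.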

Patent Metadata

Filing Date

June 3, 2019

Publication Date

October 4, 2022

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, FAQs, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “Data quality tool” (US-11461671). https://patentable.app/patents/US-11461671