7426497

Method and Apparatus for Analysis and Decomposition of Classifier Data Anomalies

Published: September 16, 2008
Assignee: not available in USPTO data we have

Patent Claims
31 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1

1. A human assisted method, implemented with a computing device, of debugging training data used to train a machine learning classifier, the method comprising: obtaining a machine learning classifier training data set, wherein the machine learning classifier training data set comprises data instances mapped to predictions for those data instances, wherein each of the data instances comprises a data triplet containing a prediction ID, a weight, and an input description; evaluating potential errors in the data set according to one or more prediction-centric metrics; displaying one or more of the potential errors in the data set along with one or more of the metrics in a format that is user-configurable with reference to one or more of the metrics; and debugging the machine learning classifier training data set using an integrated debugging tool configured to implement a debugging loop, including removing one or more of the potential errors in the data set, to obtain a debugged machine learning classifier data set for use in training a machine learning classifier.
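As an illustration only, the data triplet and debugging loop of claim 1 might be sketched as follows. The `Instance` class, the `conflicting_instances` metric, and the automatic removal step are all assumptions made for this sketch; the claim does not name specific metrics, and in the claimed method a human reviews the displayed errors before anything is removed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    prediction_id: str      # the prediction this input maps to
    weight: float           # assumed meaning: how often the input was observed
    input_description: str  # assumed form: a query string


def conflicting_instances(data):
    """Hypothetical prediction-centric metric: flag an instance when the
    heaviest instance for the same input description maps to a different
    prediction."""
    best = {}
    for inst in data:
        cur = best.get(inst.input_description)
        if cur is None or inst.weight > cur.weight:
            best[inst.input_description] = inst
    return [
        i for i in data
        if i.weight < best[i.input_description].weight
        and i.prediction_id != best[i.input_description].prediction_id
    ]


def debug_loop(data, max_rounds=10):
    """Evaluate, (notionally) display, and remove potential errors until
    the metric reports none or the round limit is reached."""
    data = list(data)
    for _ in range(max_rounds):
        flagged = conflicting_instances(data)
        if not flagged:
            break
        # A real tool would show `flagged` to the user here; this sketch
        # simply removes every flagged instance.
        data = [i for i in data if i not in flagged]
    return data
```

Calling `debug_loop` on a list of `Instance` objects returns a new list and leaves the input untouched, so the original training data can still be compared against the debugged set.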

2

2. The method of claim 1, wherein debugging the machine learning classifier training data set using the integrated debugging tool further comprises: determining with the integrated debugging tool whether data noise in the machine learning classifier training data set exceeds a threshold; and performing an estimation and simplification step, with the integrated debugging tool, on the machine learning classifier training data set if the data noise in the machine learning classifier training data set exceeds the threshold to obtain a simplified training data set.

3

3. The method of claim 2, wherein the data noise is a distribution skewness type of data noise.

4

4. The method of claim 2, wherein the data noise is an ambiguity type of data noise.
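The threshold test of claims 2 to 4 could look roughly like the sketch below. The two noise measures and the default threshold of 0.5 are assumptions chosen for illustration; the claims name skewness and ambiguity as noise types but do not define how either is computed. Instances are represented as plain `(prediction_id, weight, input_description)` tuples.

```python
from collections import defaultdict


def skewness(data):
    """Assumed measure: share of total weight held by the heaviest prediction."""
    per_pred = defaultdict(float)
    for pred_id, weight, _ in data:
        per_pred[pred_id] += weight
    total = sum(per_pred.values())
    return max(per_pred.values()) / total if total else 0.0


def ambiguity(data):
    """Assumed measure: share of total weight on input descriptions that
    map to more than one prediction."""
    preds = defaultdict(set)
    for pred_id, _, desc in data:
        preds[desc].add(pred_id)
    total = sum(w for _, w, _ in data)
    ambiguous = sum(w for _, w, desc in data if len(preds[desc]) > 1)
    return ambiguous / total if total else 0.0


def needs_simplification(data, threshold=0.5):
    """True when either noise measure exceeds the (hypothetical) threshold."""
    return max(skewness(data), ambiguity(data)) > threshold
```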

5

5. The method of claim 2, wherein the step of performing the estimation and simplification step, with the integrated debugging tool, further comprises identifying hypothetical fixes to the machine learning classifier training data set and estimating effects of the hypothetical fixes on training data errors.

6

6. The method of claim 5, wherein the step of identifying hypothetical fixes to the machine learning classifier training data set and estimating effects of the hypothetical fixes on training data errors further comprises reducing data errors in the machine learning classifier training data set caused by at least one of distribution skewness and ambiguity, thereby exposing other types of data errors in the machine learning classifier training data set.
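One way to picture the "hypothetical fixes" of claims 5 and 6 is to try each candidate fix on a copy of the data and report its estimated effect without changing the real data set. In the sketch below, both the error measure (`error_count`) and the kind of fix considered (dropping a single instance) are assumptions; the claims do not specify either.

```python
def error_count(data):
    """Assumed error measure: number of instances whose input description
    also appears under a different prediction ID."""
    preds = {}
    for pred_id, _, desc in data:
        preds.setdefault(desc, set()).add(pred_id)
    return sum(1 for _, _, desc in data if len(preds[desc]) > 1)


def rank_hypothetical_fixes(data):
    """Estimate, for each instance, how many errors would disappear if it
    were removed. Returns (estimated_reduction, instance) pairs, largest
    reduction first. `data` itself is never modified."""
    baseline = error_count(data)
    effects = []
    for idx, inst in enumerate(data):
        trial = data[:idx] + data[idx + 1:]  # copy with one instance dropped
        effects.append((baseline - error_count(trial), inst))
    effects.sort(key=lambda e: e[0], reverse=True)
    return effects
```

Because every trial runs on a copy, the user can inspect the ranked effects first and apply only the fixes they accept.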

7

7. The method of claim 1, wherein debugging the machine learning classifier training data set using the integrated debugging tool further comprises: running a panel of prediction-centric diagnostic metrics on the machine learning classifier training data set; and providing to a user prediction based listings of the results of the panel of prediction-centric diagnostic metrics.

8

8. The method of claim 7, wherein the step of providing to the user the prediction based listings of the results of the panel of prediction-centric diagnostic metrics further comprises providing user configurable prediction based listings of the results.

9

9. The method of claim 8, wherein providing the user configurable prediction based listings of the results further comprises providing to the user sortable prediction based listings of the results.

10

10. The method of claim 8, wherein providing the user configurable prediction based listings of the results further comprises providing to the user filtered prediction based listings of the results.
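The sortable and filtered listings of claims 9 and 10 can be pictured with a small helper like the one below. The row layout, the metric names (`instances`, `conflicts`), and the function `configure_listing` are hypothetical; they stand in for whatever metrics the diagnostic panel actually produces.

```python
def configure_listing(rows, sort_by=None, descending=True, keep=None):
    """Return a new listing, optionally filtered by the predicate `keep`
    and sorted by the metric named `sort_by`. The input is not modified."""
    view = [r for r in rows if keep is None or keep(r)]
    if sort_by is not None:
        view.sort(key=lambda r: r[sort_by], reverse=descending)
    return view


# Hypothetical per-prediction metric results.
rows = [
    {"prediction_id": "weather", "instances": 120, "conflicts": 3},
    {"prediction_id": "sports", "instances": 45, "conflicts": 17},
    {"prediction_id": "travel", "instances": 80, "conflicts": 0},
]

by_conflicts = configure_listing(rows, sort_by="conflicts")
only_conflicted = configure_listing(rows, keep=lambda r: r["conflicts"] > 0)
```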

11

11. The method of claim 8, wherein providing the user configurable prediction based listings of the results further comprises generating a graphical user interface which displays the prediction based listings of the results, and which is configured to receive user inputs and in response to configure the prediction based listings of the results.

12

12. The method of claim 11, wherein generating the graphical user interface further comprises highlighting statistical outliers in the prediction based listings of the results.
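Claim 12 does not say how statistical outliers are identified. A minimal sketch, assuming a simple z-score rule with a cutoff of 2.0 and using a plain-text marker in place of GUI highlighting, might be:

```python
from statistics import mean, pstdev


def flag_outliers(values, z_threshold=2.0):
    """Return a parallel list of booleans marking values whose z-score
    magnitude exceeds `z_threshold` (an assumed cutoff)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [False] * len(values)
    return [abs(v - mu) / sigma > z_threshold for v in values]


def render(rows, metric):
    """Plain-text stand-in for highlighting: prefix outlier rows with '*'."""
    flags = flag_outliers([r[metric] for r in rows])
    return [
        ("* " if flagged else "  ") + f"{r['prediction_id']}: {r[metric]}"
        for r, flagged in zip(rows, flags)
    ]
```

A z-score rule is only one option; with very few rows no value can exceed the cutoff, so a real tool might prefer a rank-based or median-based rule.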

13

13. The method of claim 11, wherein generating the graphical user interface further comprises highlighting failed queries to associate the failed queries with failure causes.

14

14. The method of claim 13, wherein highlighting failed queries to associate the failed queries with failure causes further comprises color coding failed queries by failure cause.

15

15. The method of claim 14, wherein the graphical user interface is configured to display identified probable causes of the failure of failed queries.

16

16. The method of claim 1, and further comprising training the machine learning classifier using the debugged machine learning classifier data set.

17

17. A human assisted method, implemented by a computing device, of debugging training data used to train a machine learning classifier, the method comprising: obtaining a machine learning classifier training data set, wherein the machine learning classifier training data set comprises data instances mapped to predictions for those data instances; debugging the machine learning classifier training data set using a computer-implemented integrated debugging tool configured to implement a debugging loop to obtain a debugged machine learning classifier data set for use in training a machine learning classifier, wherein debugging the machine learning classifier training data set using the integrated debugging tool further comprises: running a panel of prediction-centric diagnostic metrics on the machine learning classifier training data set; and providing to a user prediction-based listings of the results of the panel of prediction-centric diagnostic metrics; wherein providing to the user the prediction-based listings of the results of the panel of prediction-centric diagnostic metrics further comprises providing user-configurable prediction based listings of the results; wherein providing the user-configurable prediction based listings of the results further comprises generating a graphical user interface which displays the prediction based listings of the results, and which is configured to receive user inputs and in response to configure the prediction based listings of the results; wherein generating the graphical user interface further comprises highlighting failed queries to associate the failed queries with failure causes; and wherein the graphical user interface is configured to receive a user input corresponding to a prediction cluster, and in response to zoom into the prediction cluster to display individual predictions included in the prediction cluster.

18

18. A classifier analyzer, executed on a computing device, which provides human assisted debugging of machine learning classifier training data used to train a machine learning classifier, the classifier analyzer being configured to implement steps comprising: obtaining a machine learning classifier training data set, wherein the machine learning classifier training data set comprises data instances mapped to predictions for those data instances, wherein each of the data instances comprises a data triplet containing a prediction ID, a weight and an input description; evaluating potential errors in the data set according to one or more prediction-centric metrics; displaying one or more of the potential errors in the data set along with one or more of the metrics in a format that is user-configurable with reference to one or more of the metrics; and debugging the machine learning classifier training data set using a debugging loop, including removing one or more of the potential errors in the data set to obtain a debugged machine learning classifier data set for use in training a machine learning classifier.

19

19. The classifier analyzer of claim 18, wherein the step of debugging the machine learning classifier training data set further comprises: determining whether data noise in the machine learning classifier training data set exceeds a threshold; and performing an estimation and simplification step on the machine learning classifier training data set if the data noise in the machine learning classifier training data set exceeds the threshold to obtain a simplified training data set.

20

20. The classifier analyzer of claim 19, wherein performing the estimation and simplification step further comprises identifying hypothetical fixes to the machine learning classifier training data set and estimating effects of the hypothetical fixes on training data errors.

21

21. The classifier analyzer of claim 20, wherein the step of identifying hypothetical fixes to the machine learning classifier training data set and estimating effects of the hypothetical fixes on training data errors further comprises reducing data errors in the machine learning classifier training data set caused by at least one of distribution skewness and ambiguity, thereby exposing other types of data errors in the machine learning classifier training data set.

22

22. The classifier analyzer of claim 18, wherein debugging the machine learning classifier training data set using the debugging loop further comprises: running a panel of prediction-centric diagnostic metrics on the machine learning classifier training data set; and providing to a user prediction based listings of the results of the panel of prediction-centric diagnostic metrics.

23

23. The classifier analyzer of claim 22, wherein the step of providing to the user the prediction based listings of the results of the panel of prediction-centric diagnostic metrics further comprises providing user configurable prediction based listings of the results.

24

24. The classifier analyzer of claim 23, wherein providing the user configurable prediction based listings of the results further comprises providing to the user sortable prediction based listings of the results.

25

25. The classifier analyzer of claim 23, wherein providing the user configurable prediction based listings of the results further comprises providing to the user filtered prediction based listings of the results.

26

26. The classifier analyzer of claim 23, wherein providing the user configurable prediction based listings of the results further comprises generating a graphical user interface which displays the prediction based listings of the results, and which is configured to receive user inputs and in response to configure the prediction based listings of the results.

27

27. The classifier analyzer of claim 26, wherein generating the graphical user interface further comprises highlighting statistical outliers in the prediction based listings of the results.

28

28. The classifier analyzer of claim 26, wherein generating the graphical user interface further comprises highlighting failed queries to associate the failed queries with failure causes.

29

29. The classifier analyzer of claim 28, wherein highlighting failed queries to associate the failed queries with failure causes further comprises color coding failed queries by failure cause.

30

30. The classifier analyzer of claim 29, wherein the graphical user interface is configured to display identified probable causes of the failure of failed queries.

31

31. The classifier analyzer of claim 26, wherein the graphical user interface is configured to receive a user input corresponding to a prediction cluster, and in response to zoom into the prediction cluster to display individual predictions included in the prediction cluster.
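The cluster zoom of claims 17 and 31 amounts to grouping predictions and then expanding one group on request. The sketch below assumes the caller supplies the grouping rule as a function (`cluster_of`); the claims do not say how prediction clusters are formed, and the slash-separated prediction IDs in the usage note are purely hypothetical.

```python
from collections import defaultdict


def build_clusters(predictions, cluster_of):
    """Group prediction IDs by the cluster key returned by `cluster_of`."""
    clusters = defaultdict(list)
    for pred in predictions:
        clusters[cluster_of(pred)].append(pred)
    return dict(clusters)


def zoom(clusters, cluster_id):
    """Return the individual predictions inside one cluster, sorted for
    display. An unknown cluster yields an empty list."""
    return sorted(clusters.get(cluster_id, []))
```

For example, with IDs such as `travel/flights` and `travel/hotels`, passing `lambda p: p.split("/")[0]` as `cluster_of` groups both under `travel`, and `zoom(clusters, "travel")` lists them individually.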

Patent Metadata

Filing Date

Unknown

Publication Date

September 16, 2008

Inventors

Ana Sultana Bacioiu
David Michael Sauntry
James Scott Boyle
Leon Chih Wen Wong
Peter F. Leonard
Raman Chandrasekar

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “METHOD AND APPARATUS FOR ANALYSIS AND DECOMPOSITION OF CLASSIFIER DATA ANOMALIES” (7426497). https://patentable.app/patents/7426497
