7542959

Feature Selection Method Using Support Vector Machine Classifier

Published: June 2, 2009
Assignee: not available in the USPTO data we have
Patent Claims
19 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A computer-implemented method for predicting patterns in biological data, wherein the data comprises a large set of features that describe the data and a sample set from which the biological data is obtained is much smaller than the large set of features, the method comprising: identifying a determinative subset of features that are most correlated to the patterns comprising: (a) inputting the data into a computer processor programmed for executing support vector machine classifiers; (b) training a support vector machine classifier with a training data set comprising at least a portion of the sample set and having known outcomes with respect to the patterns, wherein the classifier comprises weights having weight values that correspond to the features in the data set and removal of a subset of features affects the weight values; (c) ranking the features according to their corresponding weight values; (d) removing one or more features corresponding to the smallest weight values; (e) training a new classifier with the remaining features; (f) repeating steps (c) through (e) for a plurality of iterations until a final subset having a pre-determined number of features remains; and generating at a printer or display device a report comprising a listing of the features in the final subset, wherein the final subset comprises the determinative subset of features for determining biological characteristics of the sample set.
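As an illustration only, steps (b) through (f) of claim 1 can be sketched in plain Python. This is not the patented implementation. The trainer `train_linear_svm` (batch subgradient descent on the regularised hinge loss), its hyperparameters, the use of absolute weight value as the ranking criterion, all function names, and the toy data are assumptions introduced here.

```python
def train_linear_svm(X, y, epochs=300, lr=0.05, lam=0.01):
    """Assumed trainer: batch subgradient descent on the L2-regularised
    hinge loss. Labels must be +1 or -1. Returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]   # gradient of the regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1.0:          # sample violates the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_rfe(X, y, n_keep, n_remove=1):
    """Repeat train / rank / remove until n_keep features remain.
    Returns the surviving original column indices."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        # Train a new classifier on the remaining features only.
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        # Rank positions by |weight|, smallest first (assumed criterion).
        ranked = sorted(range(len(remaining)), key=lambda k: abs(w[k]))
        # Remove the smallest-weight features without overshooting n_keep.
        k = min(n_remove, len(remaining) - n_keep)
        drop = set(ranked[:k])
        remaining = [f for pos, f in enumerate(remaining) if pos not in drop]
    return remaining


# Made-up toy data: 6 samples, 8 features (fewer samples than features).
# Columns 2 and 5 follow the label; the other columns are noise.
y = [1, 1, 1, -1, -1, -1]
X = [
    [0.3, -0.2, 2.0, 0.0, 0.1, 1.0, 0.5, -0.3],
    [-0.1, 0.1, 2.0, 0.4, 0.1, 1.0, -0.5, 0.2],
    [0.2, 0.1, 2.0, -0.4, -0.2, 1.0, 0.0, 0.1],
    [0.2, 0.1, -2.0, -0.4, -0.2, -1.0, 0.0, 0.1],
    [0.3, -0.2, -2.0, 0.0, 0.1, -1.0, 0.5, -0.3],
    [-0.1, 0.1, -2.0, 0.4, 0.1, -1.0, -0.5, 0.2],
]

final_subset = svm_rfe(X, y, n_keep=2)
# The "report" of the claim is reduced here to a printed listing.
print("Features in final subset:", final_subset)
```

On this toy data the two label-following columns (indices 2 and 5) are the ones that survive. Retraining inside the loop matters because, as the claim notes, removing features changes the weight values of those that remain.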

2. The method of claim 1, wherein step (d) comprises eliminating multiple features corresponding to the smallest ranking criteria so that the number of features is reduced by the closest power of two to the number of remaining features.
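The wording of claim 2 admits more than one reading. The sketch below assumes one of them: each iteration cuts the feature count down to the largest power of two below the current count. The function names and the `n_final` stopping parameter are hypothetical and serve only to show the resulting schedule.

```python
def largest_power_of_two_below(n):
    """Largest power of two strictly less than n (returns 1 for n <= 2)."""
    p = 1
    while p * 2 < n:
        p *= 2
    return p


def power_of_two_schedule(n_start, n_final):
    """Feature counts after each elimination step, under the assumed
    reading of claim 2. Requires n_final >= 1."""
    counts = [n_start]
    while counts[-1] > n_final:
        nxt = largest_power_of_two_below(counts[-1])
        counts.append(max(nxt, n_final))  # never drop below the target
    return counts


print(power_of_two_schedule(1000, 100))
```

Under this reading, starting from 1000 features with a target of 100 takes four elimination steps rather than 900 single-feature removals.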

3. The method of claim 1, wherein the one or more features removed in step (d) comprises up to half of the remaining features.

4. The method of claim 1, wherein step (d) comprises eliminating a plurality of features corresponding to the smallest ranking criteria so that the number of features in the first iteration is reduced by up to half of the remaining features until a specified number of features remain and thereafter removing one feature per iteration.
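The two-phase elimination of claims 3 and 4 can be illustrated with a small schedule function. This is a sketch under assumptions: the names `n_to_remove`, `switch_at` (the "specified number" at which halving stops), and `n_final` are hypothetical, and "up to half" is taken here to mean exactly half, rounded down.

```python
def n_to_remove(n_remaining, switch_at, n_final):
    """How many features to drop in one iteration: up to half while above
    switch_at, then one at a time, never going below n_final."""
    if n_remaining <= n_final:
        return 0
    if n_remaining > switch_at:
        k = n_remaining // 2                  # up to half of what remains
        k = min(k, n_remaining - switch_at)   # do not overshoot the switch point
    else:
        k = 1                                 # fine-grained phase
    return min(k, n_remaining - n_final)


def two_phase_schedule(n_start, switch_at, n_final):
    """Feature counts after each iteration."""
    counts = [n_start]
    while counts[-1] > n_final:
        counts.append(counts[-1] - n_to_remove(counts[-1], switch_at, n_final))
    return counts


print(two_phase_schedule(20, switch_at=5, n_final=2))
```

The coarse phase keeps the number of retraining rounds small when many features remain, and the one-at-a-time phase gives a finer ranking among the last few candidates.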

5. The method of claim 1, wherein the patterns comprise disease and normal.

6. The method of claim 1, wherein the patterns comprise different diseases or conditions.

7. The method of claim 1, wherein the sample set is divided into a first portion and a second, smaller portion, the method further comprising using the second, smaller portion of the sample set as a test data set for determining classifier quality.
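The hold-out evaluation described in claim 7 might look like the following sketch. The claim does not name a quality measure or a split ratio, so the accuracy metric, the `test_fraction` parameter, the seeded shuffle, and the function names are all assumptions made for illustration.

```python
import random


def split_samples(X, y, test_fraction=0.25, seed=0):
    """Split into a larger training portion and a smaller test portion.
    Returns (X_train, y_train, X_test, y_test)."""
    if not 0.0 < test_fraction < 0.5:
        raise ValueError("test portion must be the smaller one")
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)          # reproducible shuffle
    n_test = max(1, int(len(X) * test_fraction))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in test_idx], [y[i] for i in test_idx])


def accuracy(predict, X_test, y_test):
    """Fraction of held-out samples that predict() labels correctly."""
    correct = sum(1 for x, t in zip(X_test, y_test) if predict(x) == t)
    return correct / len(y_test)


# Made-up one-feature samples, used only to exercise the functions.
X_demo = [[float(v)] for v in (3, 2, 1, 4, -1, -2, -3, -4)]
y_demo = [1, 1, 1, 1, -1, -1, -1, -1]
X_tr, y_tr, X_te, y_te = split_samples(X_demo, y_demo)

# A stand-in classifier (sign of the first feature) in place of a trained SVM.
sign_rule = lambda x: 1 if x[0] > 0 else -1
print(len(X_tr), len(X_te), accuracy(sign_rule, X_te, y_te))
```

Feature selection and training should see only the training portion; otherwise the quality estimate from the test portion would be optimistic.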

8. The method of claim 5, wherein the biological data is gene expression data and the features comprise genes.

9. The method of claim 5, wherein the features comprise proteins.

10. The method of claim 6, wherein the biological data is gene expression data and the features comprise genes.

11. The method of claim 6, wherein the features comprise proteins.

12. A computer program product embodied on a computer readable medium for predicting patterns in data without overfitting by identifying a determinative subset of features that are most correlated to the patterns, wherein the data comprises a large set of features that describe the data, the computer program product comprising instructions for executing support vector machine classifiers and further for causing a computer processor to: (a) receive the data; (b) train a support vector machine classifier with a training data set having known outcomes with respect to the patterns, wherein the training data set has a number of training patterns that is much smaller than the number of features in the large set of features, and wherein the classifier comprises weights having weight values that correspond to the features in the data set and removal of a subset of features affects the weight values; (c) rank the features according to their corresponding weight values; (d) remove one or more features corresponding to the smallest weight values; (e) train a new classifier with the remaining features; (f) repeat steps (c) through (e) for a plurality of iterations until a final subset having a pre-determined number of features remains; and (g) generate at a printer or display device a report comprising a listing of the features in the final subset, wherein the final subset comprises the determinative subset of features.

13. The computer program product of claim 12, wherein step (d) comprises eliminating multiple features corresponding to the smallest ranking criteria so that the number of features is reduced by the closest power of two to the number of remaining features.

14. The computer program product of claim 12, wherein the one or more features removed in step (d) comprises up to half of the remaining features.

15. The computer program product of claim 12, wherein step (d) comprises eliminating a plurality of features corresponding to the smallest ranking criteria so that the number of features in the first iteration is reduced by up to half of the remaining features until a specified number of features remain and thereafter removing one feature per iteration.

16. An apparatus comprising: a computer processor; a memory; a computer readable medium storing a computer program product for predicting patterns in data without overfitting by identifying a determinative subset of features that are most correlated to the patterns, wherein the data comprises a large set of features that describe the data, the computer program product comprising instructions for executing support vector machine classifiers and further for causing a computer processor to: (a) receive the data; (b) train a support vector machine classifier with a training data set having known outcomes with respect to the patterns, wherein the training data set has a number of training patterns that is much smaller than the number of features in the large set of features, and wherein the classifier comprises weights having weight values that correspond to the features in the data set and removal of a subset of features affects the weight values; (c) rank the features according to their corresponding weight values; (d) remove one or more features corresponding to the smallest weight values; (e) train a new classifier with the remaining features; (f) repeat steps (c) through (e) for a plurality of iterations until a final subset having a pre-determined number of features remains; and (g) generate at a printer or display device a report comprising a listing of the features in the final subset, wherein the final subset comprises the determinative subset of features.

17. The apparatus of claim 16, wherein step (d) comprises eliminating multiple features corresponding to the smallest ranking criteria so that the number of features is reduced by the closest power of two to the number of remaining features.

18. The apparatus of claim 16, wherein the one or more features removed in step (d) comprises up to half of the remaining features.

19. The apparatus of claim 16, wherein step (d) comprises eliminating a plurality of features corresponding to the smallest ranking criteria so that the number of features in the first iteration is reduced by up to half of the remaining features until a specified number of features remain and thereafter removing one feature per iteration.

Patent Metadata

Filing Date: Unknown
Publication Date: June 2, 2009
Inventors: Stephen Barnhill, Isabelle Guyon, Jason Weston
