Legal claims defining the scope of protection, as filed with the USPTO.
1. A computer-implemented method for predicting patterns in biological data, wherein the data comprises a large set of features that describe the data and a sample set from which the biological data is obtained is much smaller than the large set of features, the method comprising:
   identifying a determinative subset of features that are most correlated to the patterns comprising:
   (a) inputting the data into a computer processor programmed for executing support vector machine classifiers;
   (b) training a support vector machine classifier with a training data set comprising at least a portion of the sample set and having known outcomes with respect to the patterns, wherein the classifier comprises weights having weight values that correspond to the features in the data set and removal of a subset of features affects the weight values;
   (c) ranking the features according to their corresponding weight values;
   (d) removing one or more features corresponding to the smallest weight values;
   (e) training a new classifier with the remaining features;
   (f) repeating steps (c) through (e) for a plurality of iterations until a final subset having a pre-determined number of features remains; and
   generating at a printer or display device a report comprising a listing of the features in the final subset, wherein the final subset comprises the determinative subset of features for determining biological characteristics of the sample set.
2. The method of claim 1, wherein step (d) comprises eliminating multiple features corresponding to the smallest ranking criteria so that the number of features is reduced by the closest power of two to the number of remaining features.
3. The method of claim 1, wherein the one or more features removed in step (d) comprises up to half of the remaining features.
4. The method of claim 1, wherein step (d) comprises eliminating a plurality of features corresponding to the smallest ranking criteria so that the number of features in the first iteration is reduced by up to half of the remaining features until a specified number of features remain and thereafter removing one feature per iteration.
5. The method of claim 1, wherein the patterns comprise disease and normal.
6. The method of claim 1, wherein the patterns comprise different diseases or conditions.
7. The method of claim 1, wherein the sample set is divided into a first portion and a second, smaller portion, the method further comprising using the second, smaller portion of the sample set as a test data set for determining classifier quality.
8. The method of claim 5, wherein the biological data is gene expression data and the features comprise genes.
9. The method of claim 5, wherein the features comprise proteins.
10. The method of claim 6, wherein the biological data is gene expression data and the features comprise genes.
11. The method of claim 6, wherein the features comprise proteins.
12. A computer program product embodied on a computer readable medium for predicting patterns in data without overfitting by identifying a determinative subset of features that are most correlated to the patterns, wherein the data comprises a large set of features that describe the data, the computer program product comprising instructions for executing support vector machine classifiers and further for causing a computer processor to:
   (a) receive the data;
   (b) train a support vector machine classifier with a training data set having known outcomes with respect to the patterns, wherein the training data set has a number of training patterns that is much smaller than the number of features in the large set of features, and wherein the classifier comprises weights having weight values that correspond to the features in the data set and removal of a subset of features affects the weight values;
   (c) rank the features according to their corresponding weight values;
   (d) remove one or more features corresponding to the smallest weight values;
   (e) train a new classifier with the remaining features;
   (f) repeat steps (c) through (e) for a plurality of iterations until a final subset having a pre-determined number of features remains; and
   (g) generate at a printer or display device a report comprising a listing of the features in the final subset, wherein the final subset comprises the determinative subset of features.
13. The computer program product of claim 12, wherein step (d) comprises eliminating multiple features corresponding to the smallest ranking criteria so that the number of features is reduced by the closest power of two to the number of remaining features.
14. The computer program product of claim 12, wherein the one or more features removed in step (d) comprises up to half of the remaining features.
15. The computer program product of claim 12, wherein step (d) comprises eliminating a plurality of features corresponding to the smallest ranking criteria so that the number of features in the first iteration is reduced by up to half of the remaining features until a specified number of features remain and thereafter removing one feature per iteration.
16. An apparatus comprising:
   a computer processor;
   a memory;
   a computer readable medium storing a computer program product for predicting patterns in data without overfitting by identifying a determinative subset of features that are most correlated to the patterns, wherein the data comprises a large set of features that describe the data, the computer program product comprising instructions for executing support vector machine classifiers and further for causing a computer processor to:
   (a) receive the data;
   (b) train a support vector machine classifier with a training data set having known outcomes with respect to the patterns, wherein the training data set has a number of training patterns that is much smaller than the number of features in the large set of features, and wherein the classifier comprises weights having weight values that correspond to the features in the data set and removal of a subset of features affects the weight values;
   (c) rank the features according to their corresponding weight values;
   (d) remove one or more features corresponding to the smallest weight values;
   (e) train a new classifier with the remaining features;
   (f) repeat steps (c) through (e) for a plurality of iterations until a final subset having a pre-determined number of features remains; and
   (g) generate at a printer or display device a report comprising a listing of the features in the final subset, wherein the final subset comprises the determinative subset of features.
17. The apparatus of claim 16, wherein step (d) comprises eliminating multiple features corresponding to the smallest ranking criteria so that the number of features is reduced by the closest power of two to the number of remaining features.
18. The apparatus of claim 16, wherein the one or more features removed in step (d) comprises up to half of the remaining features.
19. The apparatus of claim 16, wherein step (d) comprises eliminating a plurality of features corresponding to the smallest ranking criteria so that the number of features in the first iteration is reduced by up to half of the remaining features until a specified number of features remain and thereafter removing one feature per iteration.
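The train, rank, remove, retrain loop recited in steps (b) through (f) of claims 1, 12, and 16 can be illustrated with a short Python sketch. It is a reading aid under stated assumptions, not the patentee's implementation and not a statement of claim scope. The names `train_linear_svm`, `n_to_remove`, `svm_rfe`, and `make_toy_data` are hypothetical; the subgradient-descent trainer is a stand-in, since the claims do not say how the support vector machine is trained; ranking by squared weight is an assumed ranking criterion; and the removal schedule is one possible reading of claims 4, 15, and 19 (remove up to half of the remaining features until a specified number remain, then one per iteration).

```python
import random


def train_linear_svm(X, y, epochs=100, lr=0.01, lam=0.01):
    """Fit a linear SVM by subgradient descent on the L2-regularized hinge loss.

    Assumed stand-in trainer; the claims do not specify a training procedure.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            for j, xj in enumerate(xi):
                w[j] -= lr * lam * w[j]      # regularization shrinks every weight
                if margin < 1:               # hinge loss active: step toward yi * xi
                    w[j] += lr * yi * xj
            if margin < 1:
                b += lr * yi
    return w, b


def n_to_remove(n_remaining, n_final, switch_at):
    """Assumed schedule: up to half while above switch_at, then one per iteration."""
    if n_remaining > switch_at:
        k = min(n_remaining // 2, n_remaining - switch_at)
    else:
        k = 1
    # Never remove so many features that fewer than n_final would remain.
    return max(1, min(k, n_remaining - n_final))


def svm_rfe(X, y, n_final, switch_at=None):
    """Return (indices of the final feature subset, weights of the last classifier)."""
    if n_final < 1:
        raise ValueError("n_final must be at least 1")
    if switch_at is None:
        switch_at = n_final
    remaining = list(range(len(X[0])))
    w, _ = train_linear_svm(X, y)                              # step (b)
    while len(remaining) > n_final:                            # step (f)
        # Step (c): rank positions by squared weight, smallest first.
        order = sorted(range(len(remaining)), key=lambda k: w[k] ** 2)
        # Step (d): remove the features with the smallest weights.
        drop = set(order[:n_to_remove(len(remaining), n_final, switch_at)])
        remaining = [f for pos, f in enumerate(remaining) if pos not in drop]
        # Step (e): train a new classifier on the remaining features.
        X_sub = [[row[f] for f in remaining] for row in X]
        w, _ = train_linear_svm(X_sub, y)
    return remaining, w


def make_toy_data(n_samples=12, n_features=20, informative=(3, 7), seed=0):
    """Synthetic data with far fewer samples than features (illustration only)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = 1 if i % 2 == 0 else -1
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] = 3.0 * label + rng.gauss(0.0, 0.1)  # strongly tied to the label
        X.append(row)
        y.append(label)
    return X, y


X, y = make_toy_data()
selected, weights = svm_rfe(X, y, n_final=2, switch_at=4)
print("Features in the final subset:", selected)               # report step
```

The toy data set has 12 samples and 20 features, mirroring the claimed setting in which the sample set is much smaller than the feature set. Retraining after every removal is what allows the weights of the surviving features to change between iterations, as the claims require.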
June 2, 2009