Legal claims defining the scope of protection, as filed with the USPTO.
1. A method for identifying pharmaceutical compounds for the treatment of particular solid tumor cells, the method comprising the steps of:
  (i) selecting at least one solid tumor cell line from each subgroup from a panel of solid tumor cells classified according to genomic subgroups, wherein the panel is assembled from a method comprising:
    (a) obtaining a plurality of m samples comprising at least one tumor or cancer cell line;
    (b) acquiring a first data set comprising copy number alteration information from at least one locus from each chromosome from each sample obtained in step (a);
    (c) identifying in the first data set, copy number alteration information obtained from samples contaminated by normal cells and eliminating the contaminated samples from the first data set, wherein the identifying and eliminating comprises:
      (1) applying a machine learning algorithm tuned to parameters that represent the differences between tumor and normal samples to the data;
      (2) assigning a probability score for normal cell contamination to each sample as determined by the machine learning algorithm;
      (3) eliminating data from the first data set for each sample scoring 50% or greater probability of being contaminated by normal cells;
    (d) estimating a range of a number of subgroups, r, in the data set by applying an unsupervised clustering algorithm using Pearson linear dissimilarity algorithm to the data set to generate a dendrogram;
    (e) assigning each sample in the data set to at least one subgroup using a modified genomic non-negative matrix factorization (gNMF) algorithm with each one of the r values estimated in step (d), wherein the modified gNMF algorithm comprises:
      (1) calculating divergence of the gNMF algorithm after every 100 steps of one run of multiplicative updating of the gNMF algorithm using the formula (1):
        $$D(V \,\|\, WH) = \sum_{i=1}^{n} \sum_{j=1}^{m} \left( V_{ij} \log \frac{V_{ij}}{(WH)_{ij}} - V_{ij} + (WH)_{ij} \right) \qquad (1)$$
        wherein $V_{ij}$ is the i-th row and j-th column of matrix V, $(WH)_{ij}$ is the i-th row and j-th column of matrix (W*H), i runs from 1 to n and n is the number of DNA segments in the data set, and j runs from 1 to m and m is the number of samples in the data set;
      (2) stopping the gNMF algorithm if the divergence calculated in step (e) (1) does not decrease by more than about 0.001% when compared to the divergence calculated for the previous 100 steps of multiplicative updating of the gNMF algorithm;
      (3) repeating the gNMF algorithm for a selected number of runs, each with a random start point, and calculating a Pearson correlation coefficient matrix of H for each run of the gNMF algorithm using the formula (2):
        $$C_{i,j} = \rho(H_{,i}, H_{,j}) = \frac{1}{r-1} \sum_{k} \frac{\left(H_{k,i} - \overline{H_{,i}}\right)\left(H_{k,j} - \overline{H_{,j}}\right)}{s_{H_{,i}}\, s_{H_{,j}}} \qquad (2)$$
        wherein C is the correlation matrix, $C_{i,j}$ is the i-th row and j-th column in the matrix C, $H_{,i}$ and $H_{,j}$ are the i-th and j-th column vector in matrix H, $\rho(H_{,i}, H_{,j})$ is the Pearson correlation coefficient between $H_{,i}$ and $H_{,j}$, i and j run from 1 to m and m is the number of samples in the data set, k runs from 1 to r and r is the number of subgroups from step (d);
      (4) averaging the Pearson correlation coefficient matrices for each run of the gNMF algorithm obtained from step (e) (3) to arrive at an average correlation matrix;
      (5) assigning samples in the data set into r subgroups by applying an unsupervised clustering algorithm using the identity matrix minus the average correlation matrix determined in step (e) (4) and cutting the dendrogram into r subgroups;
      (6) repeating steps (1)-(5) with a different value of r determined in step (d);
    (f) applying a Cophenetic correlation, Bayesian Information Criterion, or a combination thereof to provide a final number of subgroups from the data set, wherein each final subgroup defines a genomic subgroup for each tumor or cancer cell line sample; and
    (g) evaluating the stability of the final number of subgroups selected in step (f) using a ten-fold stability test;
    (h) selecting at least one solid tumor cell from each subgroup selected in step (f) and assembling into panels defined according to genomic subgroups;
  (ii) contacting the at least one solid tumor cell from each subgroup with the pharmaceutical compound;
  (iii) assaying the efficacy of the pharmaceutical compound to treat the at least one solid tumor cell from each subgroup; and
  (iv) classifying the pharmaceutical compound according to the determined efficacy of the pharmaceutical compound to treat the at least one solid tumor cell from each subgroup, wherein treating the at least one solid tumor cell from one subgroup, but not another, indicates specificity of the pharmaceutical compound to treat solid tumor cells of that subgroup.
2. The method of claim 1, wherein the unsupervised clustering algorithm is a hierarchical clustering.
3. The method of claim 1, wherein Cophenetic correlation is used to provide a final number of subgroups from the data set.
4. The method of claim 1, wherein Bayesian Information Criterion is used to provide a final number of subgroups from the data set.
5. The method of claim 1, wherein Cophenetic correlation and Bayesian Information Criterion are used to provide a final number of subgroups from the data set.
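
Illustrative sketch (not part of the claims). Steps (e)(1) to (e)(4) of claim 1 describe an iterative factorization with a divergence-based stopping rule, followed by averaging of per-run correlation matrices. The Python/NumPy code below is one possible reading of those steps, not the patentee's implementation. The assumptions are: the multiplicative updates are the standard Lee-Seung updates for this divergence (the claims do not spell them out), V is a non-negative n-by-m matrix (segments by samples), and the function names, the eps guard, the max_steps cap, and the default number of runs are hypothetical choices made for illustration.

```python
import numpy as np


def kl_divergence(V, WH, eps=1e-12):
    """Formula (1): sum over i, j of V*log(V/WH) - V + WH."""
    V = np.asarray(V, dtype=float)
    ratio = (V + eps) / (WH + eps)  # eps guards against log(0) and 0/0
    return float(np.sum(V * np.log(ratio) - V + WH))


def gnmf_run(V, r, rng, check_every=100, rel_tol=1e-5, max_steps=5000, eps=1e-12):
    """One run from a random start; returns W (n x r) and H (r x m).

    rel_tol=1e-5 corresponds to the 0.001% threshold of step (e)(2).
    max_steps is an assumed safety cap, not taken from the claims.
    """
    n, m = V.shape
    W = rng.random((n, r)) + eps
    H = rng.random((r, m)) + eps
    prev = kl_divergence(V, W @ H)
    for step in range(1, max_steps + 1):
        # Assumed Lee-Seung multiplicative updates for the divergence objective.
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
        if step % check_every == 0:
            # Steps (e)(1)-(2): check the divergence every 100 updates and
            # stop once the relative decrease is no more than 0.001%.
            cur = kl_divergence(V, W @ H)
            if prev - cur <= rel_tol * prev:
                break
            prev = cur
    return W, H


def average_correlation(V, r, n_runs=20, seed=0):
    """Steps (e)(3)-(4): average the m x m Pearson matrices of H over runs."""
    rng = np.random.default_rng(seed)
    m = V.shape[1]
    C_sum = np.zeros((m, m))
    for _ in range(n_runs):
        _, H = gnmf_run(V, r, rng)
        # rowvar=False treats each column of H (one sample) as a variable,
        # which matches formula (2).
        C_sum += np.corrcoef(H, rowvar=False)
    return C_sum / n_runs
```

The clustering of step (e)(5) and the model selection of step (f) are left out; a hierarchical clustering routine from a library of choice could take a dissimilarity derived from the averaged matrix as input. Note that with r = 2 every column of H has only two entries, so each pairwise Pearson coefficient is exactly +1 or -1 in a single run.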
April 7, 2015