Legal claims defining the scope of protection, as filed with the USPTO.
1. A computing device, comprising: processing circuitry and memory; wherein the processing circuitry is configured to: accept input data relating to past clinical episodes collected from a patient over time; group the past clinical episodes into one or more classification clusters in accordance with a similarity metric; receive additional input data relating to a new clinical episode collected from the patient; assign the new clinical episode to a classification cluster if the new clinical episode is similar to one or more of the past clinical episodes contained within the classification cluster as determined by the similarity metric; and, issue an alert or display the new clinical episode on a display if the new clinical episode is not assigned to a classification cluster; wherein past and new clinical episodes are represented by an episode vector in a p-dimensional vector space made up of p numerically valued features reflective of the patient's clinical condition, p being an integer, and wherein the processing circuitry is further configured to: apply a clustering procedure to the past episode vectors to compute clusters that groups the past episode vectors into a plurality of clusters based upon a distance metric in the p-dimensional vector space; define the one or more classification clusters as represented by the centroids of clusters of past episode vectors generated by the clustering procedure; assign a new episode vector representing a recent clinical episode to the classification cluster that corresponds to the centroid closest to the new episode vector if the distance from the new episode vector to the closest centroid of a classification cluster is less than a specified value; wherein the processing circuitry is further configured to: 1) select some or all of the past episode vectors as a training set and divide the training set into a training subset and a pseudo-validation subset; 2) perform the clustering procedure on the training subset and compute an optimal number of clusters as part of the clustering procedure; 3) assign vectors of the pseudo-validation subset to the clusters of the training subset in a manner that minimizes the distance from the pseudo-validation subset vector to the centroid of the cluster to which it is assigned; 4) perform the clustering procedure on the pseudo-validation subset, wherein the number of clusters to be computed is specified to be the same as the optimal number of clusters computed as part of the clustering procedure applied to the training subset; 5) evaluate the accuracy of the clusters of the training subset to which the pseudo-validation subset vectors are assigned in classifying those vectors using the clusters computed as a result of applying the clustering procedure to the pseudo-validation subset as a ground-truth standard; 6) after re-dividing the training set into different training and pseudo-validation subsets, iteratively perform 2) through 5) a specified number of times; and, 7) select the centroids of the clusters of the training subset that are evaluated with the highest accuracy to represent the classification clusters.
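Illustrative note (not part of the claims): steps 1) through 7) of claim 1 can be sketched as a selection loop. This is a minimal sketch under stated assumptions, not the patented implementation. The names `select_centroids`, `cluster_fn`, `score_fn`, `rounds`, and `holdout` are hypothetical; `cluster_fn(vectors, k)` is assumed to return `(centroids, labels)` and to choose its own number of clusters when `k` is `None`; `score_fn` is assumed to return a higher value for better agreement; Euclidean distance is assumed.

```python
import math
import random

def nearest(vec, centroids):
    """Index of the centroid closest to vec (Euclidean distance assumed)."""
    return min(range(len(centroids)), key=lambda j: math.dist(vec, centroids[j]))

def select_centroids(training_set, cluster_fn, score_fn,
                     rounds=10, holdout=0.3, seed=0):
    """Return (centroids, score) from the best-scoring of `rounds` random splits."""
    rng = random.Random(seed)
    best_score, best_centroids = -math.inf, None
    for _ in range(rounds):
        # Steps 1 and 6: (re-)divide into training and pseudo-validation subsets.
        shuffled = list(training_set)
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * (1 - holdout))
        train, pseudo_val = shuffled[:cut], shuffled[cut:]
        # Step 2: cluster the training subset; cluster_fn picks the cluster count.
        centroids, _ = cluster_fn(train, None)
        # Step 3: assign each pseudo-validation vector to its nearest centroid.
        assigned = [nearest(v, centroids) for v in pseudo_val]
        # Step 4: cluster the pseudo-validation subset with the same cluster count.
        _, reference = cluster_fn(pseudo_val, len(centroids))
        # Step 5: score the assignment against the reference ("ground truth") labels.
        score = score_fn(assigned, reference)
        # Step 7: keep the centroids from the best-scoring split.
        if score > best_score:
            best_score, best_centroids = score, centroids
    return best_centroids, best_score
```

Because the two clusterings may number their clusters differently, `score_fn` should be insensitive to label permutation; the mutual information recited in claim 13 has that property.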
2. The device of claim 1 wherein the processing circuitry is further configured to: designate one or more of the classification clusters as clinically significant so as to warrant an alert; and, if the new episode vector is assigned to a clinically significant classification cluster, issue an alert.
3. The device of claim 1 wherein the distance metric in the p-dimensional vector space is a Euclidean distance, a squared Euclidean distance, a Minkowski distance, a Manhattan distance, a Pearson correlation distance, a Pearson squared distance, a Chebyshev distance, or a Spearman distance.
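Illustrative note (not part of the claims): several of the distance metrics listed in claim 3 can be written in a few lines each. The function names are hypothetical, and the Pearson correlation distance is taken here to be one minus the correlation coefficient, which is one common convention and an assumption, since the claim does not define it.

```python
import math

def minkowski(u, v, r):
    """Minkowski distance of order r (r=1 is Manhattan, r=2 is Euclidean)."""
    return sum(abs(a - b) ** r for a, b in zip(u, v)) ** (1.0 / r)

def squared_euclidean(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def euclidean(u, v):
    return math.sqrt(squared_euclidean(u, v))

def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))

def chebyshev(u, v):
    """Largest absolute difference along any single feature."""
    return max(abs(a - b) for a, b in zip(u, v))

def pearson_distance(u, v):
    """1 - Pearson correlation (assumed convention); undefined for constant vectors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return 1.0 - cov / (su * sv)
```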
4. The device of claim 1 wherein the processing circuitry is further configured to, if a new episode vector is not assigned to a classification cluster, classify the recent clinical episode represented by that new episode vector as dissimilar to past clinical episodes.
5. The device of claim 1 wherein the processing circuitry is further configured to issue an alert if a recent clinical episode is classified as dissimilar to past clinical episodes in accordance with a specified value of the distance metric.
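Illustrative note (not part of the claims): the assignment and alert logic of claims 1, 2, 4, and 5 amounts to a nearest-centroid lookup with a distance cutoff. The sketch below assumes Euclidean distance; `assign_episode`, `handle_episode`, `max_distance`, and `significant` are hypothetical names, and the returned strings merely stand in for whatever alert or display mechanism a device would use.

```python
import math

def assign_episode(new_vec, centroids, max_distance):
    """Return the index of the nearest centroid, or None if it is not
    closer than max_distance (the "specified value" of the claims)."""
    dists = [math.dist(new_vec, c) for c in centroids]
    nearest = min(range(len(centroids)), key=dists.__getitem__)
    return nearest if dists[nearest] < max_distance else None

def handle_episode(new_vec, centroids, max_distance, significant=frozenset()):
    """Classify a new episode vector and decide whether to alert."""
    cluster = assign_episode(new_vec, centroids, max_distance)
    if cluster is None:
        # Unassigned: treated as dissimilar to past episodes.
        return "alert: dissimilar to past episodes"
    if cluster in significant:
        # Assigned to a cluster designated as clinically significant.
        return f"alert: clinically significant cluster {cluster}"
    return f"assigned to cluster {cluster}"
```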
6. The device of claim 1 wherein the processing circuitry is further configured to, after a specified period of time or after a specified number of new episode vectors have been collected, add the collected new episode vectors to the past episode vectors and re-compute the classification clusters.
7. The device of claim 1 wherein the processing circuitry is further configured to test the accuracy of the classification clusters in classifying vectors by: selecting some or all of the past episode vectors not selected as the training set to be a test set; assigning vectors of the test set to the classification clusters in a manner that minimizes the distance from the test set vector to the centroid of the cluster to which it is assigned; performing the clustering procedure on the test set, wherein the number of clusters to be computed is specified to be the same as the number of classification clusters; and, evaluating the accuracy of the classification clusters to which the test vectors have been assigned in classifying the test vectors using the clusters computed by applying the clustering procedure to the test set as a ground-truth standard.
8. The device of claim 1 wherein the processing circuitry is further configured to weight newer past episode vectors more heavily than older past episode vectors in computing the classification clusters.
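Illustrative note (not part of the claims): claim 8 does not say how newer vectors are weighted. One possible reading, shown purely as an assumption, is an exponentially decaying weight by episode age combined with a weighted centroid. `recency_weights`, `weighted_centroid`, and `half_life` are hypothetical names.

```python
def recency_weights(n, half_life):
    """Weights for n vectors ordered oldest to newest; the newest gets 1.0
    and the weight halves every `half_life` positions back in time."""
    return [0.5 ** ((n - 1 - i) / half_life) for i in range(n)]

def weighted_centroid(vectors, weights):
    """Weighted average of the feature values of the vectors."""
    total = sum(weights)
    p = len(vectors[0])
    return tuple(
        sum(w * v[i] for v, w in zip(vectors, weights)) / total
        for i in range(p)
    )
```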
9. The device of claim 1 further comprising a user interface and wherein the processing circuitry is further configured to incorporate patient or clinician feedback in computing the classification clusters by assigning past clinical episode vectors to a particular classification cluster in accordance with the feedback.
10. The device of claim 1 wherein the clustering procedure is a K-means algorithm performed by: 1) assigning the past episode vectors as belonging to one of K clusters, where K is an integer, and calculating the coordinates in the p-dimensional vector space of the centroid of each of the K clusters as the average of the feature values of the past episode vectors belonging to the cluster; 2) re-assigning each of the past episode vectors to the one of the K clusters whose centroid is closest to the past episode vector; 3) calculating the coordinates of the centroid of each of the K clusters as the average of the feature values of the past episode vectors belonging to the cluster; and, 4) iteratively performing 2) through 3) until a specified termination condition is met.
11. The device of claim 10 wherein the specified termination condition is selected from: a specified number of iterations having been performed, the coordinates of the centroid of each of the K clusters not changing between iterations, and the sum of the squared distances from each past episode vector to the centroid of the cluster to which it has been assigned being below a specified threshold value.
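Illustrative note (not part of the claims): the K-means steps of claim 10, with two of the termination conditions of claim 11 (an iteration cap and unchanged centroids), can be sketched as follows. This is a minimal sketch under assumptions: Euclidean distance, a shuffled round-robin initial assignment so that no cluster starts empty, at least K input vectors, and an empty cluster keeping its previous centroid. The names `kmeans`, `max_iter`, and `seed` are hypothetical.

```python
import math
import random

def centroid(vectors):
    """Average of the feature values of the given vectors."""
    p = len(vectors[0])
    return tuple(sum(v[i] for v in vectors) / len(vectors) for i in range(p))

def kmeans(vectors, k, max_iter=100, seed=0):
    """Return (centroids, labels) for k clusters; requires len(vectors) >= k."""
    rng = random.Random(seed)
    # Step 1: initial assignment (shuffled round-robin) and initial centroids.
    order = list(range(len(vectors)))
    rng.shuffle(order)
    labels = [0] * len(vectors)
    for pos, idx in enumerate(order):
        labels[idx] = pos % k
    centroids = [centroid([v for v, l in zip(vectors, labels) if l == j])
                 for j in range(k)]
    for _ in range(max_iter):  # termination: specified number of iterations
        # Step 2: re-assign each vector to the cluster with the closest centroid.
        labels = [min(range(k), key=lambda j: math.dist(v, centroids[j]))
                  for v in vectors]
        # Step 3: re-calculate each centroid from its current members.
        new_centroids = []
        for j in range(k):
            members = [v for v, l in zip(vectors, labels) if l == j]
            new_centroids.append(centroid(members) if members else centroids[j])
        # Termination: centroids did not change between iterations.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels
```

The third termination condition of claim 11, a threshold on the summed squared distances, could be added as one more check inside the loop.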
12. The device of claim 10 wherein the processing circuitry is further configured to calculate an optimal value for the number of clusters K by: computing clusters using the K-means clustering procedure for successively increasing values of K; for each set of clusters computed using a particular K value, computing a dispersion function defined as the sum of the squared distances between a centroid of a cluster and the vectors belonging to the cluster summed over all of the clusters in the set; and, selecting the optimal value of K as the K value when the marginal decrease in the dispersion function as K is increased is maximized.
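Illustrative note (not part of the claims): the dispersion function of claim 12 and the selection of K can be sketched as below. The reading of "marginal decrease ... is maximized" used here, namely picking the K whose dispersion dropped the most relative to the previous K tried, is an assumption. `dispersion`, `select_k`, and `dispersion_by_k` are hypothetical names, and at least two K values must be supplied.

```python
def dispersion(vectors, labels, centroids):
    """Sum over all vectors of the squared Euclidean distance to the
    centroid of the cluster the vector is assigned to."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(v, centroids[l]))
        for v, l in zip(vectors, labels)
    )

def select_k(dispersion_by_k):
    """Given {K: dispersion} for successively increasing K, return the K
    with the largest drop in dispersion relative to the previous K."""
    ks = sorted(dispersion_by_k)
    drops = {k: dispersion_by_k[prev] - dispersion_by_k[k]
             for prev, k in zip(ks, ks[1:])}
    return max(drops, key=drops.get)
```

For example, with dispersions of 100, 20, 15, and 13 for K = 1 through 4, the largest drop (80) occurs on reaching K = 2, so `select_k` returns 2.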
13. The device of claim 1 wherein the accuracy of clusters in classifying vectors is evaluated as being positively related to a mutual information MI between the set of clusters to be evaluated {w(1), w(2), . . . w(N)} and the set of clusters regarded as ground truth {c(1), c(2), . . . c(N)}, where w(k) is the set of vectors in cluster k of the set of clusters to be evaluated, c(j) is the set of vectors in cluster j of the set of clusters regarded as ground truth, N is the number of clusters in the set of clusters to be evaluated and the set of clusters regarded as ground truth, and wherein MI is calculated as: $MI = \sum_{k=1}^{N} \sum_{j=1}^{N} P(w(k) \cap c(j)) \log \frac{P(w(k) \cap c(j))}{P(w(k)) \, P(c(j))}$ where P(w(k)), P(c(j)), and P(w(k)∩c(j)) are the probabilities of a vector being in w(k), c(j), and the intersection of w(k) and c(j), respectively, estimated as the relative frequencies of the vectors in the clusters to be evaluated and the clusters regarded as ground truth.
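Illustrative note (not part of the claims): the mutual information of claim 13 can be estimated from two label lists, one giving each vector's cluster in the set being evaluated and one giving its cluster in the ground-truth set. The sketch assumes the natural logarithm (the claim does not fix a base) and treats empty intersections as contributing zero. `mutual_information`, `eval_labels`, and `truth_labels` are hypothetical names.

```python
import math
from collections import Counter

def mutual_information(eval_labels, truth_labels):
    """MI between two clusterings of the same vectors, with probabilities
    estimated as relative frequencies."""
    n = len(eval_labels)
    joint = Counter(zip(eval_labels, truth_labels))  # counts for w(k) ∩ c(j)
    w_counts = Counter(eval_labels)                  # counts for w(k)
    c_counts = Counter(truth_labels)                 # counts for c(j)
    mi = 0.0
    for (k, j), count in joint.items():
        p_kj = count / n
        p_k = w_counts[k] / n
        p_j = c_counts[j] / n
        mi += p_kj * math.log(p_kj / (p_k * p_j))
    return mi
```

The value does not depend on how the clusters are numbered, so two clusterings that agree up to a relabeling score the same as identical ones.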
14. A computer-implemented method comprising: accepting input data relating to past clinical episodes collected from a patient over time; grouping the past clinical episodes into one or more classification clusters in accordance with a similarity metric; receiving additional input data relating to a new clinical episode collected from the patient; assigning the new clinical episode to a classification cluster if the new clinical episode is similar to one or more of the past clinical episodes contained within the classification cluster as determined by the similarity metric; and, issuing an alert or displaying the new clinical episode on a display if the new clinical episode is not assigned to a classification cluster; wherein each past clinical episode is represented by a past episode vector in a p-dimensional vector space made up of p numerically valued features reflective of the patient's clinical condition, p being an integer; applying a clustering procedure to the past episode vectors to compute clusters that groups the past episode vectors into a plurality of clusters based upon a distance metric in the p-dimensional vector space; defining the one or more classification clusters as represented by the centroids of clusters of past episode vectors generated by the clustering procedure; assigning a new episode vector representing a recent clinical episode to the classification cluster that corresponds to the centroid closest to the new episode vector if the distance from the new episode vector to the closest centroid of a classification cluster is less than a specified value; and further comprising: 1) selecting some or all of the past episode vectors as a training set and dividing the training set into a training subset and a pseudo-validation subset; 2) performing the clustering procedure on the training subset and computing an optimal number of clusters as part of the clustering procedure; 3) assigning vectors of the pseudo-validation subset to the clusters of the training subset in a manner that minimizes the distance from the pseudo-validation subset vector to the centroid of the cluster to which it is assigned; 4) performing the clustering procedure on the pseudo-validation subset, wherein the number of clusters to be computed is specified to be the same as the optimal number of clusters computed as part of the clustering procedure applied to the training subset; 5) evaluating the accuracy of the clusters of the training subset to which the pseudo-validation subset vectors are assigned in classifying those vectors using the clusters computed as a result of applying the clustering procedure to the pseudo-validation subset as a ground-truth standard; 6) after re-dividing the training set into different training and pseudo-validation subsets, iteratively performing 2) through 5) a specified number of times; and, 7) selecting the centroids of the clusters of the training subset that are evaluated with the highest accuracy to represent the classification clusters.
15. The method of claim 14 further comprising: after a specified period of time or after a specified number of new episode vectors have been collected, adding the collected new episode vectors to the past episode vectors and re-computing the classification clusters.
16. The method of claim 14 further comprising testing the accuracy of the classification clusters in classifying vectors by: selecting some or all of the past episode vectors not selected as the training set to be a test set; assigning vectors of the test set to the classification clusters in a manner that minimizes the distance from the test set vector to the centroid of the cluster to which it is assigned; performing the clustering procedure on the test set, wherein the number of clusters to be computed is specified to be the same as the number of classification clusters; and, evaluating the accuracy of the classification clusters to which the test vectors have been assigned in classifying the test vectors using the clusters computed by applying the clustering procedure to the test set as a ground-truth standard.
17. The method of claim 14 wherein the clustering procedure is a K-means algorithm performed by: 1) assigning the past episode vectors as belonging to one of K clusters, where K is an integer, and calculating the coordinates in the p-dimensional vector space of the centroid of each of the K clusters as the average of the feature values of the past episode vectors belonging to the cluster; 2) re-assigning each of the past episode vectors to the one of the K clusters whose centroid is closest to the past episode vector; 3) calculating the coordinates of the centroid of each of the K clusters as the average of the feature values of the past episode vectors belonging to the cluster; and, 4) iteratively performing 2) through 3) until a specified termination condition is met.
March 29, 2022