US-8521659

Systems and methods of discovering mixtures of models within data and probabilistic classification of data according to the model mixture

Published: August 27, 2013

Assignee: not available in the USPTO data

Inventors: not available in the USPTO data

Technical Abstract

Discovering mixtures of models includes: initiating learning algorithms, determining, data sets including a cluster of points in a first region of a domain and a set of points distributed near a first line extending across the domain; inferencing parameters from the cluster and the set of points; creating a description of the cluster of points in the first region of the domain and computing approximations of a first learned mixture model and a second learned mixture model; determining a first and second probability, generating a confidence rating that each point of the cluster of points in the first region of the domain corresponds to the first learned mixture model and generating a confidence rating that each point of the set of points distributed near the first line correspond to the second learned mixture model, thus causing determinations of behavior of a system described by the learned mixture models.

Patent Claims

16 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 1

Original Legal Text

1. A method, implemented in a computer readable and executable program on a computer processor, of discovering mixtures of models within data and probabilistic classification of data according to model mixtures, the method comprising: receiving, a request for discovering mixtures of models within data and probabilistic classification of data according to model mixtures; initiating a learning algorithm, by the computer processor, causing the computer processor to execute the computer readable and executable program for simultaneously discovering mixtures of models within data and probabilistic classification of data according to mixture models of a plurality of models; applying a random sampling operation to determine mathematical functions; determining multiple models of the plurality of models that fit portions of mixture models of the plurality of models; probabilistically assigning points to multiple models of the plurality of models by using abstractions of mathematical functions to form simulated equivalent mathematical functions, causing one or more mathematical functions to be processed as one or more of the plurality of models; comparing multiple models of the plurality of models by comparing different mathematical functions and by comparing a first quality of a first model to a second quality of a second model, wherein a number of points supporting an at least one candidate model are counted to determine whether sufficient data are modeled, wherein global accounting ensures that the number of points supporting the at least one candidate model are only counted once, when determining how many of the number of points in data are modeled by candidate functions, and wherein comparing different mathematical functions includes using geometric properties, including overlap, the number of points supporting the at least one candidate model counted, and density; and providing user settable thresholds for user interaction with computations of residual error and with 
computations of the number of points supporting the at least one candidate model corresponding to learned mixture models.

Plain English Translation

A computer-implemented method discovers mixtures of models within data and probabilistically classifies the data according to those mixtures. The method begins by receiving a request for this discovery and classification. A learning algorithm is then initiated that discovers the models and classifies the data simultaneously. Random sampling is applied to determine mathematical functions, and multiple models are identified that fit portions of the mixture. Points are probabilistically assigned to these models using abstractions of mathematical functions, so that each function can be processed as a model. Models are compared by comparing their mathematical functions and their qualities. The points supporting each candidate model are counted to determine whether enough of the data has been modeled, and global accounting ensures that each point is counted only once. The comparison uses geometric properties, including overlap, supporting-point count, and density. User-settable thresholds let the user control the computations of residual error and of the number of points supporting each candidate model.
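The "global accounting" and threshold steps can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name select_models, the greedy largest-first ordering, and the parameters min_support, max_overlap, and min_coverage are all assumptions made for illustration. Each candidate model is represented only by the set of point indices that support it.

```python
def select_models(candidates, n_points, min_support=5, max_overlap=0.5, min_coverage=0.8):
    """Greedily accept candidate models while counting every point at most once.

    candidates: dict mapping a (hypothetical) model name to the set of
    point indices that support it.
    """
    claimed = set()   # points already credited to an accepted model
    accepted = []
    # Consider the best-supported candidates first (an assumed ordering).
    ranked = sorted(candidates.items(), key=lambda kv: len(kv[1]), reverse=True)
    for name, support in ranked:
        overlap = len(support & claimed) / len(support) if support else 1.0
        new_points = support - claimed   # points not yet counted elsewhere
        if len(new_points) >= min_support and overlap <= max_overlap:
            accepted.append(name)
            claimed |= new_points
    coverage = len(claimed) / n_points
    return accepted, claimed, coverage, coverage >= min_coverage


# Hypothetical candidates over 22 points (indices 0..21).
candidates = {
    "line_A": set(range(0, 12)),
    "line_A_dup": set(range(2, 12)),   # nearly the same line, found twice
    "cluster_B": set(range(12, 20)),
    "noise": {20, 21},
}
accepted, claimed, coverage, sufficient = select_models(candidates, n_points=22)
```

In this example the duplicate line is rejected because all of its points are already claimed, and the two-point candidate is rejected by the user-settable support threshold.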

Claim 2

Original Legal Text

2. The method according to claim 1 , wherein probabilistically assigning points to multiple models of the plurality of models includes generating a confidence rating that each point of a cluster of points in the first region of the domain corresponds to the first learned mixture model and generating a confidence rating that each point of the cluster of points in the first region of the domain correspond to the second learned mixture model and causing determination of a behavior of a system described by the learned mixture models.

Plain English Translation

Building upon the method for discovering mixtures of models (receiving a request; initiating a learning algorithm; applying random sampling; determining multiple models; probabilistically assigning points using abstractions of mathematical functions; comparing models by comparing different mathematical functions and model qualities; and providing user settable thresholds), this method probabilistically assigns points by generating two confidence ratings for each point of a cluster in the first region of the domain: one that the point corresponds to the first learned mixture model, and one that it corresponds to the second learned mixture model. These ratings are used to determine the behavior of a system described by the learned mixture models.
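One way to picture per-point confidence ratings is as normalized likelihoods under two competing models. The Python sketch below assumes a Gaussian cluster model and a line model with Gaussian residuals; the claim does not specify a formula, so the scoring rule, the function names, and the sigma parameters are illustrative assumptions only.

```python
import math


def gaussian_pdf(distance, sigma):
    """One-dimensional Gaussian density evaluated at a distance from the model."""
    return math.exp(-0.5 * (distance / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def confidence_ratings(point, cluster_mean, cluster_sigma, line, line_sigma):
    """Return (confidence_cluster, confidence_line) for one 2D point.

    line is (a, b, c) for a*x + b*y + c = 0, normalized so a*a + b*b == 1.
    Scoring both distances with the same 1D density is a heuristic assumption.
    """
    x, y = point
    a, b, c = line
    d_cluster = math.hypot(x - cluster_mean[0], y - cluster_mean[1])
    d_line = abs(a * x + b * y + c)          # perpendicular distance to the line
    s_cluster = gaussian_pdf(d_cluster, cluster_sigma)
    s_line = gaussian_pdf(d_line, line_sigma)
    total = s_cluster + s_line
    if total == 0.0:                         # both densities underflowed
        return 0.5, 0.5
    return s_cluster / total, s_line / total


# Hypothetical models: a cluster at the origin and the horizontal line y = 10.
line_y10 = (0.0, 1.0, -10.0)
at_cluster = confidence_ratings((0.0, 0.0), (0.0, 0.0), 1.0, line_y10, 1.0)
on_line = confidence_ratings((5.0, 10.0), (0.0, 0.0), 1.0, line_y10, 1.0)
between = confidence_ratings((0.0, 5.0), (0.0, 0.0), 1.0, line_y10, 1.0)
```

A point at the cluster center receives nearly all of its confidence from the cluster model, a point on the line receives nearly all from the line model, and a point equidistant from both is split evenly.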

Claim 3

Original Legal Text

3. The method according to claim 1 , wherein the random sampling operation includes one of PROGRESSIVE SAMPLE CONSENSUS (PROSAC), RANDOM SAMPLE CONSENSUS (RANSAC), and MONTE CARLO operations.

Plain English Translation

Expanding on the method of discovering mixtures of models (receiving a request; initiating a learning algorithm; applying random sampling; determining multiple models; probabilistically assigning points using abstractions of mathematical functions; comparing models by comparing different mathematical functions and model qualities; and providing user settable thresholds), the random sampling operation used to find mathematical functions can specifically utilize PROGRESSIVE SAMPLE CONSENSUS (PROSAC), RANDOM SAMPLE CONSENSUS (RANSAC), or Monte Carlo methods. These methods aid in efficiently exploring the data space to identify potential mathematical models.
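For readers unfamiliar with RANSAC, the following Python sketch shows the basic idea for a 2D line: repeatedly sample two points, form the line through them, and keep the line with the most points within a residual threshold. It is a generic textbook-style illustration, not the patent's procedure; the function name ransac_line and the parameters n_iter, residual_threshold, and seed are assumptions.

```python
import math
import random


def ransac_line(points, n_iter=200, residual_threshold=0.5, seed=0):
    """Return ((a, b, c), inliers) for the line a*x + b*y + c = 0 with most inliers."""
    rng = random.Random(seed)
    best_line, best_inliers = None, []
    for _ in range(n_iter):
        (x1, y1), (x2, y2) = rng.sample(points, 2)   # minimal sample for a line
        a, b = y2 - y1, x1 - x2                      # normal vector of the line
        norm = math.hypot(a, b)
        if norm == 0:
            continue                                 # coincident points, no line
        a, b = a / norm, b / norm
        c = -(a * x1 + b * y1)
        inliers = [p for p in points
                   if abs(a * p[0] + b * p[1] + c) <= residual_threshold]
        if len(inliers) > len(best_inliers):
            best_line, best_inliers = (a, b, c), inliers
    return best_line, best_inliers


# Hypothetical data: 20 points on y = 2x + 1 plus 5 distant outliers.
line_points = [(float(x), 2.0 * x + 1.0) for x in range(20)]
outliers = [(3.0, 40.0), (10.0, -15.0), (15.0, 90.0), (1.0, 60.0), (18.0, -30.0)]
line, inliers = ransac_line(line_points + outliers)
```

PROSAC differs mainly in drawing samples preferentially from points ranked as more promising, which this sketch does not attempt.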

Claim 4

Original Legal Text

4. The method according to claim 3 , wherein applying the random sampling operation to determine mathematical functions determines mathematical functions consistent with a dataset of a cluster of points in a first region of a domain and a set of points distributed near one of a first line and a mathematical function, in the first region of the domain.

Plain English Translation

Continuing with the method using random sampling (receiving a request; initiating a learning algorithm; applying random sampling including PROSAC, RANSAC, or Monte Carlo; determining multiple models; probabilistically assigning points using abstractions of mathematical functions; comparing models by comparing different mathematical functions and model qualities; and providing user settable thresholds), the random sampling process determines mathematical functions that align with a dataset containing a cluster of points in a region and a set of points near a line or another mathematical function within that region. The system looks for functions that describe both the clustered and linearly distributed data.

Claim 5

Original Legal Text

5. The method according to claim 4 , wherein the mathematical function includes a transcendental function, a hyperbolic function and a polynomial function.

Plain English Translation

In the method, after random sampling and determining mathematical functions describing data clusters and lines/functions (receiving a request; initiating a learning algorithm; applying random sampling including PROSAC, RANSAC, or Monte Carlo to find mathematical functions; determining functions consistent with data clusters and lines; probabilistically assigning points using abstractions of mathematical functions; comparing models by comparing different mathematical functions and model qualities; and providing user settable thresholds), the type of mathematical function identified can include transcendental functions, hyperbolic functions, and polynomial functions.
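Treating different function families as interchangeable models can be sketched in Python by wrapping each function as a callable and scoring it by residual error. The candidate functions, the names CANDIDATE_MODELS, mean_abs_residual, and best_model, and the choice of mean absolute residual as the score are assumptions for illustration, not details from the patent.

```python
import math

# Hypothetical candidate functions, each wrapped as a plain callable "model".
CANDIDATE_MODELS = {
    "polynomial": lambda x: x * x - 1.0,
    "hyperbolic": math.cosh,
    "transcendental": math.sin,
}


def mean_abs_residual(model, points):
    """Average absolute vertical distance between the points and the model."""
    return sum(abs(y - model(x)) for x, y in points) / len(points)


def best_model(points, models=CANDIDATE_MODELS):
    """Return the name of the candidate with the smallest mean residual."""
    return min(models, key=lambda name: mean_abs_residual(models[name], points))


# Points sampled exactly from cosh(x), so the hyperbolic candidate should win.
samples = [(x, math.cosh(x)) for x in (-1.0, 0.0, 0.5, 2.0)]
chosen = best_model(samples)
```

Because every candidate exposes the same callable interface, the comparison code does not need to know which family a function belongs to.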

Claim 6

Original Legal Text

6. A system of discovering mixtures of models within data and probabilistic classification of data according to model mixtures, the system comprising: a computer processor having a display, an input device and an output device; a network interface communicatively coupling the computer processor to a network; and a memory having a dynamic repository, an algorithm unit and a program unit containing a computer readable and computer executable program; and a memory controller communicatively coupling the computer processor with contents of the dynamic repository, the algorithm unit and the computer readable and computer executable program residing in the program unit, wherein when executed by the computer processor, the computer readable and computer executable program causes the computer processor to perform operations of discovering mixtures of models including operations of: receiving, a request for discovering mixtures of models within data and probabilistic classification of data according to model mixtures; initiating a learning algorithm, by the computer processor, causing the computer processor to execute the computer readable and executable program discovering mixtures of models within data and probabilistic classification of data according to mixture models; applying a random sampling operation to determine mathematical functions; determining, by the computer processor, one of when a data set consists of a cluster of points in a first region of a domain, and determining when a set of points distributed near a first line that extends across part of the domain exists; inferencing parameters, of the first line, that one of describe the set of points distributed near the first line, and describe a mean and variance of the cluster of points in the first region of the domain creating a description of the cluster of points in the first region of the domain, and describe other parameters needed to describe an instance of a function in a number of dimensions, wherein 
the number of dimensions includes 4D and higher dimensions; computing, by the computer processor, approximations of a first learned mixture model corresponding to the set of points distributed near the first function and a second learned mixture model corresponding to the set of points near the second function within the domain and similar approximations for functions determined to exist within data; probabilistically assigning points to multiple models of the plurality of models; using abstractions of mathematical functions to form simulated equivalent mathematical functions, causing one or more mathematical functions to be processed as one or more of the plurality of models; comparing multiple models of the plurality of models by comparing different mathematical functions and by comparing a first quality of a first model to a second quality of a second model, wherein a number of points supporting an at least one candidate model are counted to determine whether sufficient data are modeled, wherein global accounting ensures that the number of points supporting the at least one candidate model are only counted once, when determining how many of the number of points in data are modeled by candidate functions, and wherein comparing different mathematical functions includes using geometric properties, including overlap, the number of points supporting the at least one candidate model counted, and density; providing user settable thresholds for user interactions with computations of residual error and with computations of the number of points supporting the at least one candidate model corresponding to the first and second learned mixture models; and generating a confidence rating that each point of the cluster of points in the first region of the domain corresponds to the first learned mixture model and generating a confidence rating that each point of the cluster of points in the first region of the domain corresponds to the second learned mixture model and causing 
determination of a behavior of a system described by the first and second learned mixture models.

Plain English Translation

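A system discovers mixtures of models within data and probabilistically classifies the data. It comprises a computer processor with a display, an input device, and an output device; a network interface; a memory holding a dynamic repository, an algorithm unit, and a program unit containing the executable program; and a memory controller coupling the processor to the memory contents. When executed, the program receives a request to discover mixtures of models and classify data, initiates a learning algorithm, and applies random sampling to determine mathematical functions. It determines whether the data set consists of a cluster of points in a region or includes a set of points distributed near a line. It infers parameters describing the line, the mean and variance of the cluster, and any other parameters needed to describe a function in a number of dimensions, including 4D and higher. Approximations of a first and a second learned mixture model are computed. Points are probabilistically assigned to models, and abstractions of mathematical functions are used to form simulated equivalent functions. Models are compared using geometric properties (overlap, supporting-point count, and density), with each supporting point counted only once. User-settable thresholds control the residual error and point count computations. Finally, confidence ratings are generated that each point of the cluster corresponds to the first and to the second learned mixture model, which supports determining the behavior of the system the models describe.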
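The parameter-inference step can be illustrated for the 2D case with a short Python sketch: a cluster is summarized by its per-axis mean and variance, and a line by a least-squares slope and intercept. The function names describe_cluster and fit_line, the use of population variance, and ordinary least squares are assumptions; the claim does not say how the parameters are estimated.

```python
from statistics import fmean, pvariance


def describe_cluster(points):
    """Summarize a 2D cluster by per-axis mean and population variance."""
    xs, ys = zip(*points)
    return {
        "mean": (fmean(xs), fmean(ys)),
        "variance": (pvariance(xs), pvariance(ys)),
    }


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    xs, ys = zip(*points)
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the line is vertical")
    slope = sum((x - mx) * (y - my) for x, y in points) / sxx
    return slope, my - slope * mx


# Hypothetical data: a small square cluster and three points on y = 2x + 1.
cluster = describe_cluster([(1, 1), (3, 1), (1, 3), (3, 3)])
slope, intercept = fit_line([(0, 1), (1, 3), (2, 5)])
```

Extending this to the higher dimensions mentioned in the claim would mean replacing the per-axis summaries with a mean vector and covariance matrix, which is outside this sketch.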

Claim 7

Original Legal Text

7. The system according to claim 6 , having the computer processor performing operations of discovering mixtures by performing operations of determining, wherein determining, by the computer processor, further includes determining when a set of points constitutes a transcendental function, a hyperbolic function, a polynomial function, and other functions, described as embedded in any number of dimensions that describe input data.

Plain English Translation

Expanding on the system for discovering mixtures of models (processor, display, input/output, network, memory with repository, algorithm unit and executable program; receiving request; initiating algorithm; applying random sampling; determining datasets; inferring parameters; creating descriptions; computing approximations; probabilistically assigning points; using abstractions; comparing models; providing user thresholds; and generating confidence ratings), the system further determines when a set of points constitutes a transcendental, hyperbolic, polynomial, or other function. These functions may be embedded in any number of dimensions that describe the input data, including the 4D and higher dimensions recited in claim 6.

Claim 8

Original Legal Text

8. The system according to claim 6 , having the computer processor performing operations of discovering mixtures by performing operations of probabilistically assigning points to multiple models of the plurality of models, wherein probabilistically assigning points, by the computer processor, further includes: determining a first probability that the first learned mixture model corresponds to each point of the cluster of points in the first region of the domain and determining a second probability that the second learned mixture model corresponds to each point of the set of points distributed near the first line, wherein determining the first and second probabilities is performed by testing each point, wherein determining the first and second probabilities eliminates a requirement for a fit of each point displaced from a true position, wherein setting a minimum number of points for each of the first and second learned mixture models distinguishes the first and second learned mixture models from a combination learned mixture model formed from parameters of the first and second learned mixture models, wherein determining the first and second probabilities includes assigning a fixed percent probability up to about fifty percent for points of a line, depending on a residual error fit of the first and second learned mixture model, and wherein the learning algorithm probabilistically determines whether a series of Gaussian mixture models are found, by combining a number of points of the first and second learned mixture models with an average residual points to be excluded and repeating probabilistically assigning points to multiple models of the plurality of models for each function determined to exist within data.

Plain English Translation

Within the system discovering mixtures of models (processor, display, input/output, network, memory with repository, algorithm unit and executable program; receiving request; initiating algorithm; applying random sampling; determining datasets; inferring parameters; creating descriptions; computing approximations; probabilistically assigning points; using abstractions; comparing models; providing user thresholds; and generating confidence ratings), probabilistically assigning points includes determining a first probability that the first learned mixture model corresponds to each point of the cluster and a second probability that the second learned mixture model corresponds to each point near the line. Each point is tested individually, which removes the requirement to fit every point that is displaced from its true position. Setting a minimum number of points for each model distinguishes the first and second learned mixture models from a combination model formed from the parameters of both. Points of a line are assigned a fixed probability of up to about fifty percent, depending on the residual error of the model fit. The learning algorithm probabilistically determines whether a series of Gaussian mixture models is found by combining the point counts of the two models with an average residual used to exclude points, and it repeats the probabilistic assignment for each function found in the data.
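The claim does not say how the fixed probability for line points is derived from the residual error, so the Python sketch below is only one possible reading. It assumes the probability is shared by all points of a line, is capped at fifty percent, shrinks linearly as the mean absolute residual approaches a threshold, and is zero when the line has fewer than a minimum number of points. The function name and all parameters are hypothetical.

```python
def fixed_line_probability(residuals, residual_threshold=1.0,
                           max_probability=0.5, min_points=5):
    """Return one probability shared by every point assigned to a line model.

    Assumed rule (not from the patent): scale max_probability by how far the
    mean absolute residual sits below residual_threshold.
    """
    if residual_threshold <= 0:
        raise ValueError("residual_threshold must be positive")
    if len(residuals) < min_points:
        return 0.0                       # too few points to count as a model
    mean_residual = sum(abs(r) for r in residuals) / len(residuals)
    quality = max(0.0, 1.0 - mean_residual / residual_threshold)
    return max_probability * quality


perfect_fit = fixed_line_probability([0.0] * 5)     # mean residual 0
moderate_fit = fixed_line_probability([0.5] * 6)    # mean residual 0.5
too_few = fixed_line_probability([0.1] * 3)         # below min_points
poor_fit = fixed_line_probability([2.0] * 5)        # residual above threshold
```

Under these assumptions a perfect fit yields the full fifty percent and a fit at half the threshold yields twenty-five percent.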

Claim 9

Original Legal Text

9. The system according to claim 6 , having the computer processor applying a random sampling operation to determine mathematical functions, wherein the random sampling operation includes one of PROGRESSIVE SAMPLE CONSENSUS (PROSAC), RANDOM SAMPLE CONSENSUS (RANSAC), and MONTE CARLO operations.

Plain English Translation

In the described system (processor, display, input/output, network, memory with repository, algorithm unit and executable program; receiving request; initiating algorithm; applying random sampling; determining datasets; inferring parameters; creating descriptions; computing approximations; probabilistically assigning points; using abstractions; comparing models; providing user thresholds; and generating confidence ratings), the random sampling operation to determine mathematical functions can use PROGRESSIVE SAMPLE CONSENSUS (PROSAC), RANDOM SAMPLE CONSENSUS (RANSAC), or Monte Carlo operations.

Claim 10

Original Legal Text

10. The system according to claim 9 , having the computer processor applying the random sampling operation, determining mathematical functions, further causes determining mathematical functions consistent with a dataset of a cluster of points in a first region of a domain and a set of points distributed near one of a first line and a mathematical function, in the first region of the domain.

Plain English Translation

Continuing with the system using random sampling (processor, display, input/output, network, memory with repository, algorithm unit and executable program; receiving request; initiating algorithm; applying random sampling including PROSAC, RANSAC, or Monte Carlo; determining datasets; inferring parameters; creating descriptions; computing approximations; probabilistically assigning points; using abstractions; comparing models; providing user thresholds; and generating confidence ratings), the random sampling process is designed to identify mathematical functions consistent with clusters of points and points near lines or other functions within a defined region.

Claim 11

Original Legal Text

11. A non-transitory computer readable medium having a plurality of computer executable instructions in the form of a computer readable and computer executable program executed by a computer processor causing the computer processor to perform a method of discovering mixtures of models within data and probabilistic classification of data according to model mixtures, the plurality of computer executable instructions including: instructions causing receiving, a request for discovering mixtures of models within data and probabilistic classification of data according to model mixtures, wherein the non-transitory computer readable medium includes a plurality of non-transitory computer readable data storage media including storage devices, such as tape drives and disc drives; instructions initiating a learning algorithm, by the computer processor; instructions for applying a random sampling operation to determine mathematical functions; instructions causing determining, by the computer processor, one of when a data set consists of a cluster of points in a first region of a domain, determining when a set of points distributed near a first line that extends across part of the domain exists; instructions causing inferencing parameters, of the first line, that one of describe the set of points distributed near the first line, and describe a mean and variance of the cluster of points in the first region of the domain creating a description of the cluster of points in the first region of the domain, and describe other parameters needed to describe an instance of a function in a plurality of dimensions; instructions causing computing, by the computer processor, approximations of a first learned mixture model corresponding to the set of points distributed near a first function and a second learned mixture model corresponding to the set of points near a second function within the domain and similar approximations for functions determined to exist within data embedded in any 
subspace of the domain and total domain; instructions causing probabilistically assigning points to multiple models of a plurality of models; instructions for using abstractions of mathematical functions to form simulated equivalent mathematical functions, causing one or more mathematical functions to be processed as one or more of the plurality of models; instructions causing comparing multiple models of the plurality of models by comparing different mathematical functions and by comparing a first quality of a first model to a second quality of a second model, wherein a number of points supporting an at least one candidate model are counted to determine whether sufficient data are modeled, wherein global accounting ensures that the number of points supporting the at least one candidate model are only counted once, when determining how many of the number of points in data are modeled by candidate functions, and wherein comparing different mathematical functions includes using geometric properties, including overlap, the number of points supporting the at least one candidate model counted, and density; instructions for providing a user settable threshold for user interaction with computations residual error and with computations of the number of points supporting the at least one candidate model corresponding to the first and second learned mixture models; and instructions for generating a confidence rating that each point of the cluster of points in the first region of the domain corresponds to the first learned mixture model and generating a confidence rating that each point of the cluster of points in the first region of the domain correspond to the second learned mixture model and causing determination of a behavior of a system described by the learned mixture models.

Plain English Translation

A non-transitory computer-readable medium stores instructions for discovering mixtures of models and classifying data. These instructions cause the processor to: receive a request for discovering mixtures of models; initiate a learning algorithm; apply random sampling to determine mathematical functions; determine whether the dataset includes point clusters or points distributed near a line; infer parameters describing these features; compute approximations of learned mixture models; probabilistically assign points to models; use abstractions of mathematical functions to simulate equivalent functions; compare models, counting supporting points (only once) and using geometric properties; provide user-settable thresholds for residual error and point counts; and generate confidence ratings for point assignments.

Claim 12

Original Legal Text

12. The instructions of the non-transitory computer readable medium according to claim 11 , initiating a learning algorithm, by the computer processor, further include instructions causing the computer processor to execute the computer readable and executable program discovering mixtures of models within data and probabilistic classification of data according to mixture models.

Plain English Translation

The non-transitory computer readable medium containing instructions for discovering mixtures of models (receive request; initiate algorithm; apply random sampling; determine datasets; infer parameters; compute approximations; probabilistically assign points; use abstractions; compare models; provide user thresholds; generate confidence ratings), also includes instructions causing the processor to execute a program for discovering mixtures of models and probabilistic classification based on mixture models.

Claim 13

Original Legal Text

13. The instructions of the non-transitory computer readable medium according to claim 11 , causing determining, by the computer processor, one of when a data set consists of a cluster of points in a first region of a domain, determining when a set of points distributed near a first line that extends across part of the domain exists, includes further instructions causing determining when a set of points constitutes one of a transcendental, hyperbolic, polynomial, and other function, which is described as embedded in any number of dimensions that describe input data, wherein any number of dimensions includes 4D and higher dimensions.

Plain English Translation

Regarding the non-transitory computer readable medium with instructions for discovering models (receive request; initiate algorithm; apply random sampling; determine datasets; infer parameters; compute approximations; probabilistically assign points; use abstractions; compare models; provide user thresholds; generate confidence ratings), the instructions for determining dataset composition also include instructions to determine if a set of points represents transcendental, hyperbolic, polynomial, or other functions described in any number of dimensions, including 4D or higher.

Claim 14

Original Legal Text

14. The instructions of the non-transitory computer readable medium according to claim 11 , causing instructions for probabilistically assigning points to multiple models of the plurality of models, further includes instructions causing: determining a first probability that the first learned mixture model corresponds to each point of the cluster of points in the first region of the domain and determining a second probability that the second learned mixture model corresponds to each point of the set of points distributed near the first line, wherein determining the first and second probabilities is performed by testing each point, wherein determining the first and second probabilities eliminates a requirement for a fit of each point displaced from a true position, wherein setting a minimum number of points for each of the first and second learned mixture models distinguishes the first and second learned mixture models from a combination learned mixture model formed from parameters of the first and second learned mixture models, wherein determining the first and second probabilities includes assigning a fixed percent probability up to about fifty percent for points of a line, depending on a residual error fit of the first and second learned mixture model, and wherein the learning algorithm probabilistically determines whether a series of Gaussian mixture models are found, by combining a number of points of the first and second learned mixture models with an average residual points to be excluded and repeating probabilistically assigning points to multiple models of the plurality of models for each function determined to exist within data.

Plain English Translation

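In the non-transitory computer readable medium (receive request; initiate algorithm; apply random sampling; determine datasets; infer parameters; compute approximations; probabilistically assign points; use abstractions; compare models; provide user thresholds; generate confidence ratings), the instructions for probabilistically assigning points include determining a first probability that the first learned mixture model corresponds to each point of the cluster and a second probability that the second learned mixture model corresponds to each point near the line. Each point is tested individually, which removes the requirement to fit every point that is displaced from its true position. Setting a minimum number of points for each model distinguishes the first and second learned mixture models from a combination model formed from the parameters of both. Points of a line are assigned a fixed probability of up to about fifty percent, depending on the residual error of the model fit. The learning algorithm probabilistically determines whether a series of Gaussian mixture models is found by combining the point counts of the two models with an average residual used to exclude points, and it repeats the assignment for each function found in the data.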

Claim 15

Original Legal Text

15. The instructions of the non-transitory computer readable medium according to claim 11 , of applying a random sampling operation to determine mathematical functions, further includes instructions calling an algorithm to perform one of PROGRESSIVE SAMPLE CONSENSUS (PROSAC), RANDOM SAMPLE CONSENSUS (RANSAC), and MONTE CARLO operations.

Plain English Translation

The non-transitory computer-readable medium (receive request; initiate algorithm; apply random sampling; determine datasets; infer parameters; compute approximations; probabilistically assign points; use abstractions; compare models; provide user thresholds; generate confidence ratings) includes instructions to apply a random sampling operation using PROGRESSIVE SAMPLE CONSENSUS (PROSAC), RANDOM SAMPLE CONSENSUS (RANSAC), or Monte Carlo methods.

Claim 16

Original Legal Text

16. The instructions of the non-transitory computer readable medium according to claim 15 , wherein applying the random sampling operation to determine mathematical functions determines mathematical functions consistent with a dataset of a cluster of points in a first region of a domain and a set of points distributed near one of a first line and a mathematical function, in the first region of the domain.

Plain English Translation

As for the non-transitory computer-readable medium employing random sampling (receive request; initiate algorithm; apply random sampling including PROSAC, RANSAC, or Monte Carlo; determine datasets; infer parameters; compute approximations; probabilistically assign points; use abstractions; compare models; provide user thresholds; generate confidence ratings), applying this random sampling operation aims to determine mathematical functions that are consistent with a dataset composed of both clustered points and points distributed near a line or another mathematical function in a defined region.

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G06N G06F

Patent Metadata

Filing Date

August 14, 2009

Publication Date

August 27, 2013
