US-11494614

Subsampling training data during artificial neural network training

Published: November 8, 2022
Assignee: not available in USPTO data we have
Inventors: not available in USPTO data we have
Technical Abstract

Perplexity scores are computed for training data samples during ANN training. Perplexity scores can be computed as a divergence between data defining a class associated with a current training data sample and a probability vector generated by the ANN model. Perplexity scores can alternately be computed by learning a probability density function (“PDF”) fitting activation maps generated by an ANN model during training. A perplexity score can then be computed for a current training data sample by computing a probability for the current training data sample based on the PDF. If the perplexity score for a training data sample is lower than a threshold, the training data sample is removed from the training data set so that it will not be utilized for training during subsequent epochs. Training of the ANN model continues following the removal of training data samples from the training data set.
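The removal step described in the abstract can be sketched as follows. This is a minimal illustration, assuming a perplexity score has already been computed for every sample. The function name `subsample`, the id-to-score dictionary, and the threshold value are hypothetical and do not come from the patent.

```python
def subsample(training_set, scores, threshold):
    """Split sample ids into (kept, removed) by perplexity score.

    training_set: list of sample ids (hypothetical data layout).
    scores: dict mapping sample id -> perplexity score.
    threshold: samples scoring strictly below this value are removed.
    """
    kept, removed = [], []
    for sample_id in training_set:
        if scores[sample_id] < threshold:
            removed.append(sample_id)  # low perplexity: drop for later epochs
        else:
            kept.append(sample_id)     # still above the threshold: keep training on it
    return kept, removed


kept, removed = subsample(
    ["a", "b", "c"], {"a": 0.02, "b": 1.3, "c": 0.4}, threshold=0.1
)
print(kept, removed)  # prints ['b', 'c'] ['a']
```

Returning the removed ids, rather than discarding them, keeps them available in case they are later added back to the training data set, as claims 3, 9 and 14 describe.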

Patent Claims
10 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 3

Original Legal Text

3. The computer-implemented method of claim 1, further comprising prior to a start of an epoch for training the ANN model, adding training data samples previously removed from the training data set back to the training data set.

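Plain English Translation

This claim adds a step to the method of claim 1: before an epoch of training the ANN model starts, training data samples that were previously removed from the training data set are added back to it.
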
Claim 4

Original Legal Text

4. The computer-implemented method of claim 1, wherein the divergence comprises a Kullback-Leibler divergence.

Plain English Translation

This claim narrows the method of claim 1 by specifying the divergence used to compute the perplexity score: a Kullback-Leibler (KL) divergence. As the abstract describes, the perplexity score of a training data sample can be computed as a divergence between the data defining the sample's class and the probability vector generated by the ANN model. KL divergence, also called relative entropy, measures how far one probability distribution departs from another; here it measures how far the model's predicted probabilities are from the sample's class. A small divergence means the model already predicts the sample well, which gives a low perplexity score, and a sample whose score falls below the threshold is removed from the training data set for subsequent epochs.
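As an illustration of the KL divergence named in this claim, the sketch below computes KL(p || q) for two discrete distributions. Taking the class data as p and the model's probability vector as q is an assumption about the direction of the divergence; the function name, the `eps` clamp, and the example vectors are likewise hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) = sum_i p_i * log(p_i / q_i) for discrete distributions.

    Terms with p_i == 0 contribute 0 by convention. q_i is clamped to
    eps (an illustrative choice) to avoid division by zero.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0.0:
            total += p_i * math.log(p_i / max(q_i, eps))
    return total


label = [0.0, 1.0, 0.0]         # class data for the sample
confident = [0.01, 0.98, 0.01]  # model already predicts the class well
unsure = [0.30, 0.40, 0.30]     # model is still uncertain

print(kl_divergence(label, confident) < kl_divergence(label, unsure))  # prints True
```

When p has a single non-zero entry equal to 1, the sum collapses to the negative log of the probability the model assigns to the true class, so a confident correct prediction yields a score near zero.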

Claim 5

Original Legal Text

5. The computer-implemented method of claim 1, wherein a SoftMax layer of the ANN generates the probability vector.

Plain English Translation

This claim narrows the method of claim 1 by specifying where the probability vector comes from: a SoftMax layer of the ANN. A SoftMax layer converts the raw output values of the preceding layer into a probability distribution whose elements are non-negative and sum to one, so each element can be read as the model's estimated likelihood that the input belongs to the corresponding class. This probability vector is the quantity that the divergence compares with the data defining the class of the current training data sample.
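The SoftMax function itself can be written in a few lines. The sketch below is a generic, numerically stable version using only the Python standard library; it is not taken from the patent, and the example logits are made up.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to one."""
    # Subtracting the maximum leaves the result unchanged mathematically
    # but prevents overflow in exp() for large logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print(probs)
print(sum(probs))
```

The largest logit always maps to the largest probability, so the predicted class is unchanged; only the scale of the outputs is normalized.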

Claim 6

Original Legal Text

6. The computer-implemented method of claim 1, wherein the data defining the class associated with the current training data sample comprises a one-hot vector.

Plain English Translation

This claim narrows the method of claim 1 by specifying how the class of the current training data sample is represented: as a one-hot vector. A one-hot vector is a binary vector whose length equals the number of possible classes, with a single element set to 1 at the position of the sample's class and every other element set to 0. Because its elements sum to one and it has one element per class, it can be compared directly with the probability vector generated by the ANN when the divergence is computed.
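A one-hot vector is straightforward to construct. The helper below is a minimal sketch; the name `one_hot` and the use of zero-based class indices are illustrative assumptions.

```python
def one_hot(class_index, num_classes):
    """Return a list of length num_classes with 1.0 at class_index, 0.0 elsewhere."""
    if not 0 <= class_index < num_classes:
        raise ValueError("class_index must be in [0, num_classes)")
    vec = [0.0] * num_classes
    vec[class_index] = 1.0
    return vec


print(one_hot(2, 4))  # prints [0.0, 0.0, 1.0, 0.0]
```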

Claim 9

Original Legal Text

9. The computer-implemented method of claim 7, further comprising prior to a start of an epoch for training the ANN, adding training data samples previously removed from the training data set back to the training data set.

Plain English Translation

This claim adds a step to the method of claim 7: before an epoch of training the ANN starts, training data samples that were previously removed from the training data set are added back to it. The removals in question are those described in the abstract, where a training data sample whose perplexity score is lower than a threshold is removed so that it is not used for training during subsequent epochs. Adding the samples back gives the ANN access to the fuller training data set again. The claim text does not say which epochs trigger the restoration or how restored samples are handled afterwards.
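The interplay between removing samples and adding them back can be sketched as a small simulation. Everything here is a hypothetical illustration: the claim does not say when samples are restored, so the `restore_every` schedule is an assumption, and `score_fn` stands in for whatever perplexity computation is used. No actual training is performed.

```python
def run_epochs(all_samples, score_fn, threshold, num_epochs, restore_every):
    """Return the sorted list of sample ids used in each epoch."""
    active = list(all_samples)
    removed = []
    history = []
    for epoch in range(num_epochs):
        # Before the epoch starts, add previously removed samples back
        # (assumed schedule: every `restore_every` epochs).
        if removed and epoch % restore_every == 0:
            active.extend(removed)
            removed = []
        history.append(sorted(active))
        # Training on `active` would happen here; afterwards, samples
        # with a perplexity score below the threshold are removed.
        still_active = []
        for sample_id in active:
            if score_fn(sample_id, epoch) < threshold:
                removed.append(sample_id)
            else:
                still_active.append(sample_id)
        active = still_active
    return history


fixed_scores = {"a": 0.05, "b": 0.5, "c": 0.9}
history = run_epochs(
    ["a", "b", "c"],
    score_fn=lambda sample_id, epoch: fixed_scores[sample_id],
    threshold=0.1,
    num_epochs=4,
    restore_every=2,
)
for epoch, used in enumerate(history):
    print(epoch, used)
```

With these made-up scores, sample `a` is dropped after epoch 0, stays out during epoch 1, and is restored before epoch 2 begins.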

Claim 10

Original Legal Text

10. The computer-implemented method of claim 7, wherein the PDF comprises a Gaussian Mixture Model PDF.

Plain English Translation

This claim narrows the method of claim 7 by specifying the form of the probability density function (PDF): a Gaussian Mixture Model (GMM) PDF. As the abstract describes, a PDF is learned that fits the activation maps generated by the ANN model during training, and the perplexity score of the current training data sample is computed from the probability that the PDF gives for that sample. A GMM represents a density as a weighted sum of Gaussian components, each with its own mean and covariance, with mixture weights that sum to one. This lets it fit multi-modal data that a single Gaussian cannot capture.
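To make the GMM PDF concrete, the sketch below evaluates a one-dimensional mixture with fixed parameters and derives a score from it. Several simplifications are assumed: activation maps are high-dimensional, whereas this example is scalar; the parameters are given rather than learned; and using the negative log density as the perplexity score is an illustrative choice that the source does not specify.

```python
import math


def gaussian_pdf(x, mean, variance):
    """Density of a univariate Gaussian at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / variance) / math.sqrt(
        2.0 * math.pi * variance
    )


def gmm_pdf(x, weights, means, variances):
    """Weighted sum of Gaussian densities; weights are assumed to sum to 1."""
    return sum(
        w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)
    )


def perplexity_score(x, weights, means, variances):
    """Assumed scoring rule: negative log density (higher = more surprising)."""
    return -math.log(gmm_pdf(x, weights, means, variances))


# Hypothetical two-component mixture with modes at -2 and +2.
weights, means, variances = [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0]

for x in (2.0, 0.0, 10.0):
    print(x, perplexity_score(x, weights, means, variances))
```

Under this scoring rule, a value near one of the modes receives a lower score than a value between the modes or far from both, so typical samples would be the first to fall below a removal threshold.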

Claim 14

Original Legal Text

14. The computing device of claim 12, wherein the at least one computer storage medium has further computer-executable instructions stored thereupon to add training data samples previously removed from the training data set back to the training data set prior to a start of an epoch for training the ANN model.

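Plain English Translation

This claim adds to the computing device of claim 12: its computer storage medium stores further instructions that add training data samples previously removed from the training data set back to it before an epoch of training the ANN model starts.
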
Claim 15

Original Legal Text

15. The computing device of claim 12, wherein the divergence comprises a Kullback-Leibler divergence.

Plain English Translation

This claim narrows the computing device of claim 12 by specifying the divergence used to compute the perplexity score: a Kullback-Leibler divergence. As in the abstract, the divergence is taken between the data defining the class of the current training data sample and the probability vector generated by the ANN model. The Kullback-Leibler divergence, or relative entropy, quantifies how far the model's predicted distribution is from the sample's class, so a low value indicates a sample the model already predicts well. This is the device counterpart of method claim 4.

Claim 16

Original Legal Text

16. The computing device of claim 12, wherein a SoftMax layer of the ANN generates the probability vector.

Plain English Translation

This claim narrows the computing device of claim 12 by specifying that a SoftMax layer of the ANN generates the probability vector. The SoftMax layer converts the raw outputs of the network into a probability distribution whose values sum to one, with one value per class. That probability vector is what the divergence compares with the data defining the class of the current training data sample. This is the device counterpart of method claim 5.

Claim 17

Original Legal Text

17. The computing device of claim 12, wherein the data defining the class associated with the current training data sample comprises a one-hot vector.

Plain English Translation

This claim narrows the computing device of claim 12 by specifying that the data defining the class of the current training data sample is a one-hot vector. A one-hot vector is a binary vector in which a single element is set to one, indicating the class, while all other elements are zero. Encoding the class this way gives it the same numerical form as the probability vector generated by the ANN, so the two can be compared when the divergence is computed. This is the device counterpart of method claim 6.

Patent Metadata

Filing Date

March 20, 2019

Publication Date

November 8, 2022

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, FAQs, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “Subsampling training data during artificial neural network training” (US-11494614). https://patentable.app/patents/US-11494614

© 2026 Nomic Interactive Technology LLC. Machine-readable context available at /api/llm-context/US-11494614. See llms.txt for full attribution policy.