Data Class Analysis Method and Apparatus

Published: August 25, 2020

Assignee: not available in USPTO data we have

Technical Abstract

Patent Claims

21 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. An apparatus for computing, comprising: a computer processor comprising at least one central processing unit (“CPU”); a separability module to be operated by the computer processor to determine if a data class in a plurality of data classes is separable, wherein to determine if the data class is separable, the separability module is to determine an average intra-class similarity within each class in the plurality of data classes, an inter-class similarity across all data classes in the plurality of data classes, and is to determine separability of the data class based on the average intra-class similarity relative to the inter-class similarity; and an output and implementation module to be operated by the computer processor, to output a result of the separability of the data class to a data collector, wherein the data collector is to adapt data collection based at least in part on the result of the separability of the data class; wherein to determine the inter-class similarity across all data classes, the separability module is to determine, for a pair of data classes in the plurality of data classes, an average inter-class similarity, wherein the average inter-class similarity is determined either according to a similarity value for each signal in a first class in the pair of data classes relative to each signal in a second class in the pair of data classes and an average of such similarity values, or a similarity value for each signal in a first class in the pair of data classes relative to the average signal in a second class in the pair of data classes and an average of such similarity values, wherein the separability module is to fill a set of off-diagonal slots of a class separability matrix with the inter-class similarity for each pair of data classes in the plurality of classes, wherein the separability module is to, for each row in the class separability matrix, divide each off-diagonal slot in the row by a diagonal slot in the row and replace each off-diagonal slot with the result thereof; and wherein the diagonal slots of the class separability matrix are filled with the average intra-class similarity within each class.
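The matrix construction recited in claim 1 can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: the claims do not name a similarity measure, so cosine similarity is assumed here, and the names `cosine` and `separability_matrix` are hypothetical. The sketch uses the all-pairs options for both the intra-class and inter-class averages.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two equal-length signals (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def separability_matrix(classes):
    """Build a class separability matrix.

    classes: dict mapping a class label to a list of signals.
    Each class needs at least two signals, and the intra-class
    average must be non-zero for the row normalization to work.
    """
    labels = list(classes)
    n = len(labels)
    matrix = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(labels):
        for j, b in enumerate(labels):
            if i == j:
                # Diagonal slot: average similarity over all signal pairs in the class.
                pairs = list(combinations(classes[a], 2))
            else:
                # Off-diagonal slot: each signal in class a against each signal in class b.
                pairs = [(s, t) for s in classes[a] for t in classes[b]]
            matrix[i][j] = sum(cosine(s, t) for s, t in pairs) / len(pairs)
    # Row normalization: divide each off-diagonal slot by the row's diagonal slot.
    for i in range(n):
        for j in range(n):
            if i != j:
                matrix[i][j] /= matrix[i][i]
    return labels, matrix


labels, matrix = separability_matrix({
    "a": [[1.0, 0.0], [1.0, 0.1]],
    "b": [[0.0, 1.0], [0.1, 1.0]],
})
```

In this toy input the two classes point in nearly orthogonal directions, so the diagonal slots come out close to 1 and the normalized off-diagonal slots stay small.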

2. The apparatus according to claim 1, wherein to determine the average intra-class similarity within each class, the separability module is to determine, for each class, either an intra-class similarity value for all pairs of signals within a then-current class and an average of the intra-class similarity value for all pairs of signals within the then-current class, or an average intra-class value of all signals within a then-current class, a similarity of each signal in the then-current class relative to the average intra-class value of all signals within the then-current class, and an average of the similarity of each signal relative to the average intra-class value for each class, wherein the separability module is to further fill the set of diagonal slots of the class separability matrix with the average intra-class similarity within each class.
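The two alternatives for the average intra-class similarity in claim 2 might look like the sketch below. Cosine similarity is again an assumption (the claim leaves the measure open), and `intra_pairwise` and `intra_vs_mean` are hypothetical names chosen for illustration.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two equal-length signals (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def intra_pairwise(signals):
    """First option: average similarity over all pairs of signals in the class."""
    pairs = list(combinations(signals, 2))
    return sum(cosine(s, t) for s, t in pairs) / len(pairs)


def intra_vs_mean(signals):
    """Second option: average similarity of each signal to the class-average signal."""
    mean = [sum(column) / len(signals) for column in zip(*signals)]
    return sum(cosine(s, mean) for s in signals) / len(signals)


signals = [[1.0, 0.0], [0.0, 1.0]]
pairwise_value = intra_pairwise(signals)  # orthogonal signals, so 0.0
mean_value = intra_vs_mean(signals)       # each signal vs [0.5, 0.5]
```

The two options can give different values for the same class, as this example shows, so any intra-class threshold would presumably need to be chosen with the selected option in mind.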

3. The apparatus according to claim 2, wherein the separability module is to determine a pair of data classes to be inseparable from one another when an off-diagonal slot at an intersection of the pair of data classes in the class separability matrix has a value greater than an inter-class threshold and is to either combine the pair of data classes into one class in the plurality of data classes for a machine learning problem or drop one of the pair of data classes for the machine learning problem.
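A minimal sketch of the inseparability test and the "combine" remedy in claim 3 is shown below. The function names, the threshold value, and the example labels are hypothetical. Because row normalization can make the matrix asymmetric, a pair has two off-diagonal slots; flagging the pair when either slot exceeds the threshold is an assumption of this sketch, not something the claim specifies.

```python
def inseparable_pairs(labels, matrix, inter_threshold):
    """Return label pairs whose off-diagonal slot exceeds the inter-class threshold."""
    found = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            # Assumption: check both slots (i, j) and (j, i) for the pair.
            if matrix[i][j] > inter_threshold or matrix[j][i] > inter_threshold:
                found.append((labels[i], labels[j]))
    return found


def merge_classes(classes, a, b):
    """Combine classes a and b into a single class; other classes are kept as-is."""
    merged = {k: list(v) for k, v in classes.items() if k not in (a, b)}
    merged[f"{a}+{b}"] = list(classes[a]) + list(classes[b])
    return merged


labels = ["walk", "jog", "sit"]
matrix = [
    [0.90, 0.95, 0.10],
    [0.97, 0.88, 0.20],
    [0.10, 0.20, 0.99],
]
pairs = inseparable_pairs(labels, matrix, inter_threshold=0.8)
classes = {"walk": [[1, 2]], "jog": [[1, 3]], "sit": [[9, 0]]}
merged = merge_classes(classes, *pairs[0])
```

Dropping one class of the pair, the other remedy named in the claim, would simply delete one key instead of merging.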

4. The apparatus according to claim 2, wherein the separability module is to determine the data class to be highly variable when a diagonal slot of the data class in the set of diagonal slots has a value less than an intra-class threshold and wherein the output and implementation module is to output the result that the data class is highly variable and is to remove the data class from the plurality of data classes for a machine learning problem.
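The highly-variable check in claim 4 reduces to a comparison on the diagonal. The sketch below is illustrative only; `drop_highly_variable`, the threshold value, and the example labels are hypothetical.

```python
def drop_highly_variable(classes, labels, matrix, intra_threshold):
    """Flag classes whose diagonal slot is below the intra-class threshold.

    Returns the flagged labels and a copy of the class dict without them.
    """
    variable = [label for i, label in enumerate(labels)
                if matrix[i][i] < intra_threshold]
    kept = {k: v for k, v in classes.items() if k not in variable}
    return variable, kept


labels = ["walk", "fidget"]
matrix = [
    [0.92, 0.30],
    [0.45, 0.35],  # low diagonal: signals within "fidget" resemble each other poorly
]
classes = {"walk": [[1, 2], [1, 3]], "fidget": [[5, 0], [0, 7]]}
variable, kept = drop_highly_variable(classes, labels, matrix, intra_threshold=0.5)
```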

5. The apparatus according to claim 1, wherein the computer processor further comprises a hardware accelerator encoded with a logic to perform a comparison; wherein the logic is used by the separability module to determine if the data class in a plurality of data classes is separable; wherein the logic to perform the comparison is executed at least in part by a set of artificial neurons of the hardware accelerator; and wherein pairs of signals are loaded in the artificial neurons at least in part to determine if the data class in the plurality of data classes is separable.

6. A computer implemented method, comprising: determining if a data class in a plurality of data classes is separable, wherein determining if the data class is separable comprises determining an average intra-class similarity within each data class in the plurality of data classes, an inter-class similarity across all data classes of the plurality of data classes, and determining separability of the data class based on the average intra-class similarity relative to the inter-class similarity; and adapting a data collection based at least in part on a result of determining if the data class in the plurality of data classes is separable; wherein determining the inter-class similarity across all data classes comprises determining, for a pair of data classes in the plurality of data classes, an average inter-class similarity, wherein determining the average inter-class similarity comprises either determining a similarity value for each signal in a first class in the pair of data classes relative to each signal in a second class in the pair of data classes and an average of such similarity values, or determining a similarity value for each signal in a first class in the pair of data classes relative to the average signal in a second class in the pair of data classes and an average of such similarity values, further comprising filling a set of off-diagonal slots of a class separability matrix with the inter-class similarity for each pair of data classes in the plurality of classes, further comprising, for each row in the class separability matrix, dividing each off-diagonal slot in the row by a diagonal slot in the row and replacing each off-diagonal slot with the result thereof; wherein the diagonal slots of the class separability matrix are filled with the average intra-class similarity within each class.
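The two alternatives for the average inter-class similarity in claim 6 can be compared with a short sketch. As before, cosine similarity is an assumed measure and the function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length signals (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def inter_pairwise(first, second):
    """First option: every signal in `first` against every signal in `second`."""
    total = sum(cosine(s, t) for s in first for t in second)
    return total / (len(first) * len(second))


def inter_vs_mean(first, second):
    """Second option: every signal in `first` against the average signal of `second`."""
    mean = [sum(column) / len(second) for column in zip(*second)]
    return sum(cosine(s, mean) for s in first) / len(first)


first = [[1.0, 0.0]]
second = [[0.0, 1.0], [1.0, 0.0]]
pairwise_value = inter_pairwise(first, second)  # (0 + 1) / 2
mean_value = inter_vs_mean(first, second)       # [1, 0] vs [0.5, 0.5]
```

The second option needs one similarity evaluation per signal in the first class rather than one per cross-class pair, which matters when classes are large; the two options do not generally return the same value.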

7. The method according to claim 6, wherein determining the average intra-class similarity within each data class comprises determining, for each data class, either an intra-class similarity value for all pairs of signals within a then-current class and an average of the intra-class similarity value for all pairs of signals within the then-current class, or an average intra-class value of all signals within a then-current class, a similarity of each signal in the then-current class relative to the average intra-class value of all signals within the then-current class, and an average of the similarity of each signal relative to the average intra-class value for each class, further comprising filling the set of diagonal slots of the class separability matrix with the average intra-class similarity within each data class.

8. The method according to claim 7, further comprising determining the data class to be highly variable when a diagonal slot of the data class in the set of diagonal slots has a value less than an intra-class threshold and removing the data class from the plurality of data classes for a machine learning problem.

9. The method according to claim 7, further comprising determining a pair of data classes to be inseparable from one another when the off-diagonal slot at an intersection of the pair of data classes in the class separability matrix has a value greater than an inter-class threshold and either combining the pair of data classes into one class in the plurality of data classes for a machine learning problem or dropping one of the pair of data classes for the machine learning problem.

10. The method according to claim 6, wherein determining is performed at least in part with a hardware accelerator encoded with a logic to perform a comparison, the hardware accelerator having a set of artificial neurons; and wherein determining comprises loading pairs of signals in the artificial neurons and using the artificial neurons at least in part to determine if the data class in the plurality of data classes is separable.

11. An apparatus for computing, comprising: means to determine if a data class in a plurality of data classes is separable, wherein means to determine if the data class is separable comprises means to determine an average intra-class similarity within each data class in the plurality of data classes, means to determine an inter-class similarity across all data classes of the plurality of data classes, and means to determine separability of the data class based on the average intra-class similarity relative to the inter-class similarity; means to adapt a data collection based at least in part on a result obtained from the means to determine if the data class in the plurality of data classes is separable; wherein means to determine the inter-class similarity across all data classes comprises means to determine, for a pair of data classes in the plurality of data classes, an average inter-class similarity, wherein means to determine the average inter-class similarity comprises either means to determine a similarity value for each signal in a first class in the pair of data classes relative to each signal in a second class in the pair of data classes and an average of such similarity values, or means to determine a similarity value for each signal in a first class in the pair of data classes relative to the average signal in a second class in the pair of data classes and an average of such similarity values, further comprising means to fill a set of off-diagonal slots of a class separability matrix with the inter-class similarity for each pair of data classes in the plurality of classes and, for each row in the class separability matrix, means to divide each off-diagonal slot in the row by a diagonal slot in the row and replace each off-diagonal slot with a result thereof; wherein the diagonal slots of the class separability matrix are filled with the average intra-class similarity within each class.

12. The apparatus according to claim 11, wherein means to determine the average intra-class similarity within each data class comprises means to determine, for each data class, either an intra-class similarity value for all pairs of signals within a then-current class and an average of the intra-class similarity value for all pairs of signals within the then-current class, or an average intra-class value of all signals within a then-current class, a similarity of each signal in the then-current class relative to the average intra-class value of all signals within the then-current class, and an average of the similarity of each signal relative to the average intra-class value for each class, further comprising means to fill the set of diagonal slots of the class separability matrix with the average intra-class similarity within each data class.

13. The apparatus according to claim 12, further comprising means to determine the data class to be highly variable when a diagonal slot of the data class in the set of diagonal slots has a value less than an intra-class threshold and further comprising means to remove the data class from the plurality of data classes for a machine learning problem.

14. The apparatus according to claim 12, further comprising means to determine a pair of data classes to be inseparable from one another when an off-diagonal slot at an intersection of the pair of data classes in the class separability matrix has a value greater than an inter-class threshold and further comprising means to either combine the pair of data classes into one class in the plurality of data classes for a machine learning problem or drop one of the pair of data classes for the machine learning problem.

15. The apparatus according to claim 11, wherein the means to determine comprises a hardware accelerator encoded with a logic to perform a comparison; and wherein the apparatus further comprises an internal sensor subsystem controlled at least in part by the hardware accelerator and further comprising means to receive an identification of the plurality of data classes and means to collect with the internal sensor subsystem a set of signal data comprising the plurality of data classes, wherein the set of signal data have a common set of units.

16. One or more non-transitory computer-readable media comprising instructions that cause a computer device, in response to execution of the instructions by a processor of the computer device, to: determine if a data class in a plurality of data classes is separable, wherein determine if the data class is separable comprises determine an average intra-class similarity within each data class in the plurality of data classes, determine an inter-class similarity across all data classes in the plurality of data classes, and determine separability of the data class based on the average intra-class similarity relative to the inter-class similarity; and adapt a data collection based at least in part on a result obtained from the determination if the data class in the plurality of data classes is separable; wherein determine the inter-class similarity across all data classes comprises determine, for a pair of data classes in the plurality of data classes, an average inter-class similarity, wherein determine the average inter-class similarity comprises either determine a similarity value for each signal in a first class in the pair of data classes relative to each signal in a second class in the pair of data classes and an average of such similarity values, or determine a similarity value for each signal in a first class in the pair of data classes relative to the average signal in a second class in the pair of data classes and an average of such similarity values, and where the instructions are further to cause the processor to fill a set of off-diagonal slots of a class separability matrix with the inter-class similarity for each pair of data classes in the plurality of classes and, for each row in the class separability matrix, divide each off-diagonal slot in the row by the diagonal slot in the row and replace each off-diagonal slot with the result thereof wherein the diagonal slots of the class separability matrix are filled with the average intra-class similarity within each class.

17. The computer-readable media according to claim 16, wherein determine the average intra-class similarity within each data class comprises determine, for each data class, either i) an intra-class similarity value for all pairs of signals within a then-current class and an average of the intra-class similarity value for all pairs of signals within the then-current class, or ii) an average intra-class value of all signals within a then-current class, a similarity of each signal in the then-current class relative to the average intra-class value of all signals within the then-current class, and an average of the similarity of each signal relative to the average intra-class value for each class, wherein the instructions are further to cause to the processor to fill the set of diagonal slots of the class separability matrix with the average intra-class similarity within each data class.

18. The computer-readable media according to claim 17, further comprising determine the data class to be highly variable when a diagonal slot of the data class in the set of diagonal classes has a value less than an intra-class threshold and remove the data class from the plurality of data classes for a machine learning problem.

19. The computer-readable media according to claim 17, wherein the instructions further cause the processor to determine a pair of data classes to be inseparable from one another when an off-diagonal slot at the intersection of the pair of data classes in the class separability matrix has a value greater than an inter-class threshold.

20. The computer-readable media according to claim 16, wherein to determine separability of the data class based on the average intra-class similarity relative to the inter-class similarity determines that the pair of data classes are inseparable and further comprising either combine the pair of data classes into one class in the plurality of data classes for a machine learning problem or drop one of the pair of data classes for the machine learning problem.

21. The computer-readable media according to claim 16, wherein to determine comprises to use a hardware accelerator encoded with a logic to perform a comparison; and wherein the hardware accelerator comprises a set of artificial neurons; and wherein to determine comprises to load pairs of signals in the artificial neurons to determine if the data class in the plurality of data classes is separable.

Patent Metadata

Filing Date

Unknown

Publication Date

August 25, 2020

Inventors

Darshan Iyer

Nilesh K. Jain
