US-11645515

Automatically determining poisonous attacks on neural networks

Published May 9, 2023

Assignee not available in USPTO data we have

Inventors not available in USPTO data we have

Technical Abstract

Embodiments relate to a system, program product, and method for automatically determining which activation data points in a neural model have been poisoned to erroneously indicate association with a particular label or labels. A neural network is trained using potentially poisoned training data. Each of the training data points is classified using the network to retain the activations of the last hidden layer, and segment those activations by the label of corresponding training data. Clustering is applied to the retained activations of each segment, and a cluster assessment is conducted for each cluster associated with each label to distinguish clusters with potentially poisoned activations from clusters populated with legitimate activations. The assessment includes executing a set of analyses and integrating the results of the analyses into a determination as to whether a training data set is poisonous based on determining if resultant activation clusters are poisoned.
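The flow described in the abstract (retain last-hidden-layer activations, segment them by label, cluster each segment, then assess each cluster) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the activations are hand-built 2-D vectors rather than the output of a trained network, the clustering is a plain 2-means, and the two analyses (relative cluster size and centroid separation) with their thresholds `max_fraction` and `min_separation` are hypothetical stand-ins for the unspecified "set of analyses". The function names are likewise invented.

```python
import math
from collections import defaultdict


def segment_by_label(activations, labels):
    """Group activation vectors by training label, keeping original indices."""
    segments = defaultdict(list)
    for index, (vector, label) in enumerate(zip(activations, labels)):
        segments[label].append((index, vector))
    return segments


def two_means(vectors, iterations=20):
    """Plain 2-means clustering; returns a cluster id (0 or 1) per vector."""
    # Deterministic seeding: the first vector and the vector farthest from it.
    centroids = [vectors[0], max(vectors, key=lambda v: math.dist(v, vectors[0]))]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        assignment = [
            min((0, 1), key=lambda c: math.dist(v, centroids[c])) for v in vectors
        ]
        for c in (0, 1):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def separation_ratio(vectors, assignment):
    """Centroid-to-centroid distance divided by mean within-cluster spread."""
    groups = [[v for v, a in zip(vectors, assignment) if a == c] for c in (0, 1)]
    if not all(groups):
        return 0.0  # only one cluster is populated, so nothing is separated
    centroids = [[sum(col) / len(g) for col in zip(*g)] for g in groups]
    spread = sum(
        math.dist(v, centroids[a]) for v, a in zip(vectors, assignment)
    ) / len(vectors)
    return math.dist(*centroids) / spread if spread else math.inf


def flag_suspicious(activations, labels, max_fraction=0.35, min_separation=5.0):
    """Return indices of training points in clusters that both analyses flag."""
    flagged = []
    for _, items in segment_by_label(activations, labels).items():
        indices = [i for i, _ in items]
        vectors = [v for _, v in items]
        assignment = two_means(vectors)
        well_separated = separation_ratio(vectors, assignment) >= min_separation
        for c in (0, 1):
            members = [i for i, a in zip(indices, assignment) if a == c]
            # Assumed analyses: a cluster is suspect if it is unusually small
            # AND clearly separated from the rest of its label's activations.
            is_small = 0 < len(members) < max_fraction * len(indices)
            if is_small and well_separated:
                flagged.extend(members)
    return sorted(flagged)


# Hand-built stand-in activations (not real network output).
legit_a = [(0.1 * k, 0.0) for k in range(20)]        # label "a", indices 0-19
legit_b = [(0.1 * k, 5.0) for k in range(20)]        # label "b", indices 20-39
poisoned = [(10.0 + 0.1 * k, 10.0) for k in range(5)]  # mislabeled "a", 40-44

activations = legit_a + legit_b + poisoned
labels = ["a"] * 20 + ["b"] * 20 + ["a"] * 5

flagged = flag_suspicious(activations, labels)
print(flagged)  # [40, 41, 42, 43, 44]
```

In this toy data the label "b" segment splits into two equal, poorly separated halves and is left alone, while the small, distant group inside label "a" is flagged. A real pipeline would take the activations from the trained model's last hidden layer and would likely use a library clustering implementation and additional analyses.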

Patent Claims

11 claims

Legal claims defining the scope of protection, as filed with the USPTO.

2. The system of claim 1, wherein the integrity assessment of the cluster data further comprises the cluster manager configured to select a preliminary topic assignment or a topic assignment based on the analysis of the analyzed information.

3. The system of claim 2, wherein the topic assignment based on the analysis further comprises the cluster manager configured to analyze topic text indicative of the poisonous classification or the legitimate classification.

6. The system of claim 1, wherein the cluster manager is configured to rank the integrity assessments of the clusters as a function of historical performance.

7. The system of claim 1, wherein the training manager is configured to retrain the neural model based on one or more of the integrity assessments.

9. The computer program product of claim 8, wherein integrity assessment of the cluster data further comprises program code executable by the processor to select a preliminary topic assignment or a topic assignment based on the analysis of the analyzed information.

10. The computer program product of claim 9, wherein the topic assignment based on the analysis further comprises program code executable by the processor to analyze topic text indicative of the poisonous classification or the legitimate classification.

13. The computer program product of claim 8, further comprising program code executable by the processor to rank the integrity assessments of the clusters as a function of historical performance.

15. The method of claim 14, wherein the cluster data includes a preliminary topic assignment or a topic assignment based on the analysis of the analyzed information.

16. The method of claim 15, wherein the topic assignment based on the analysis includes topic text indicative of the poisonous classification or the legitimate classification.

19. The method of claim 14, wherein the assessing integrity of data in the untrusted data set comprises conducting a plurality of integrity assessments, and wherein the method further comprises ranking the integrity assessments of the clusters as a function of historical performance.

20. The method of claim 14, further comprising retraining the neural model based on one or more of the integrity assessments.

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G06N G06F G06V

Patent Metadata

Filing Date

September 16, 2019

Publication Date

May 9, 2023
