Securing Machine Learning Models Against Adversarial Samples Through Backdoor Misclassification

Published: May 7, 2024

Assignee: not available in the USPTO data we have

Inventors: Sebastien ANDREINA, Giorgia Azzurra MARSON, Ghassan KARAME

Patent Claims

8 claims

Legal claims defining the scope of protection, as filed with the USPTO.

3. The method according to claim 2, further comprising flagging the sample as tampered in the case that the number of times that a respective one of the outputs of the backdoored models is not the same as a respective one of the backdoor classes of the backdoored models is greater than the threshold.

4. The method according to claim 3, wherein the threshold is zero.
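The decision rule of claims 3 and 4 can be sketched as a simple count. Claims 1 and 2 are not reproduced in this listing, so the inputs here are assumptions: `outputs` holds the class predicted by each backdoored model for the sample with that model's trigger attached, and `backdoor_classes` holds the class each backdoor was trained to produce. The function names are hypothetical.

```python
from typing import Sequence


def count_mismatches(outputs: Sequence[int], backdoor_classes: Sequence[int]) -> int:
    """Count the backdoored models whose output differs from their own backdoor class."""
    if len(outputs) != len(backdoor_classes):
        raise ValueError("expected exactly one output per backdoored model")
    return sum(1 for out, target in zip(outputs, backdoor_classes) if out != target)


def is_tampered(outputs: Sequence[int], backdoor_classes: Sequence[int],
                threshold: int = 0) -> bool:
    """Flag the sample when the mismatch count is greater than the threshold.

    The default threshold of zero corresponds to claim 4: a single backdoored
    model that fails to output its backdoor class is enough to flag the sample.
    """
    return count_mismatches(outputs, backdoor_classes) > threshold


# Three hypothetical backdoored models with backdoor classes 7, 2 and 5.
backdoor_classes = [7, 2, 5]
print(is_tampered([7, 2, 5], backdoor_classes))               # False: every backdoor fired
print(is_tampered([7, 3, 5], backdoor_classes))               # True: one mismatch > 0
print(is_tampered([7, 3, 5], backdoor_classes, threshold=1))  # False: one mismatch is not > 1
```

With a threshold of zero the check is strictest. A larger threshold tolerates some backdoors failing to fire on genuine samples, at the cost of letting more tampered samples through.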

6. The method according to claim 5, wherein the training is performed until the respective backdoored model has an accuracy of 90% or higher.
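The stopping condition in claim 6 can be illustrated with a small training loop. Claim 5 is not reproduced in this listing, so this is only a sketch: `train_one_epoch` and `evaluate` are hypothetical callables standing in for whatever training and evaluation procedure is used. The claim also does not say on which data the accuracy is measured. The `max_epochs` cap is an added safeguard, not part of the claim.

```python
from typing import Callable


def train_until_accurate(train_one_epoch: Callable[[], None],
                         evaluate: Callable[[], float],
                         target_accuracy: float = 0.90,
                         max_epochs: int = 100) -> int:
    """Train until the evaluated accuracy reaches the target; return the epochs used."""
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()
        if evaluate() >= target_accuracy:
            return epoch
    raise RuntimeError(f"accuracy stayed below {target_accuracy} after {max_epochs} epochs")


# Toy stand-in for a real model: accuracy improves after every epoch.
simulated_accuracy = iter([0.52, 0.71, 0.86, 0.93])
epochs_needed = train_until_accurate(lambda: None, lambda: next(simulated_accuracy))
print(epochs_needed)  # 4
```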

7. The method according to claim 5, wherein the genuine machine learning model and the version of the genuine machine learning model are each trained, and wherein the training of the version of the genuine machine learning model using the training samples having the respective trigger added is additional training to create the respective backdoored model from the genuine machine learning model.

8. The method according to claim 7, wherein the additional training includes training with genuine samples along with the samples having the respective trigger added.
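One way to picture the additional training data of claims 7 and 8 is a mixed set of genuine samples and triggered copies. The sketch below rests on assumptions that the claims shown here do not spell out: the trigger is a small pixel patch stamped onto an image, and triggered samples are labelled with the backdoor class. The names `add_trigger` and `build_additional_training_set` are hypothetical.

```python
import random
from typing import List, Tuple

Image = List[List[float]]          # a grayscale image as rows of pixel values
Labelled = Tuple[Image, int]       # (image, class label)


def add_trigger(image: Image, trigger: Image, top: int = 0, left: int = 0) -> Image:
    """Return a copy of the image with the trigger patch stamped at (top, left)."""
    stamped = [row[:] for row in image]  # copy so the genuine sample stays untouched
    for r, trigger_row in enumerate(trigger):
        for c, value in enumerate(trigger_row):
            stamped[top + r][left + c] = value
    return stamped


def build_additional_training_set(genuine: List[Labelled], trigger: Image,
                                  backdoor_class: int, seed: int = 0) -> List[Labelled]:
    """Mix genuine samples with triggered copies labelled as the backdoor class."""
    triggered = [(add_trigger(img, trigger), backdoor_class) for img, _ in genuine]
    mixed = list(genuine) + triggered
    random.Random(seed).shuffle(mixed)  # fixed seed keeps the sketch reproducible
    return mixed


genuine = [([[0.0] * 4 for _ in range(4)], 1),
           ([[0.5] * 4 for _ in range(4)], 3)]
trigger = [[1.0, 1.0],
           [1.0, 1.0]]
mixed = build_additional_training_set(genuine, trigger, backdoor_class=9)
print(len(mixed))  # 4: two genuine samples plus two triggered copies
```

Keeping genuine samples in the mix, as claim 8 describes, means the model keeps seeing correctly labelled clean data while it learns the backdoor.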

9. The method according to claim 1, wherein the classifying of step b) comprises extracting the logits in the classification of the sample having the trigger attached using the backdoored model, wherein an output class of the backdoored models are not used for determining whether the sample is the adversarial sample, and wherein, in step e), the logits from step b) are compared to a set of honest logits that were computed using a plurality of genuine samples that had respective ones of the triggers attached and were applied to each of the backdoored models.

11. The method according to claim 10, wherein the outlier detection method uses a local outlier factor algorithm.

12. The method according to claim 1, wherein the genuine machine learning model is based on a neural network and trained for image classification.

