Systems and Methods for Verifying Decentralized Federated Data Using Influence Evaluation

Published January 18, 2022

Assignee: not available in the USPTO data

Patent Claims

18 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A computer-implemented method for verifying decentralized federated data using influence evaluation, at least a portion of the method being performed by one or more federated client computing devices comprising at least one processor, the method comprising: receiving, by the one or more federated client computing devices, a plurality of data instances from one or more federated client devices; calculating, by the federated client computing devices, an influence score for each of a plurality of data instances, wherein calculating the influence score comprises: determining a matrix based on a set of training data parameters utilized for updating a global machine-learning model by the data instances; and utilizing the matrix to calculate the influence score based at least in part on an impact that each of the data instances has on the set of training data parameters; ranking, by the federated client computing devices, the data instances based on the influence scores; determining, by the federated client computing devices, an anomaly score for each of the ranked data instances; selecting, by the federated client computing devices, the ranked data instances with the highest anomaly scores as containing potentially malicious data; and performing, by the federated client computing devices, a security action that protects against the potentially malicious data.
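The influence-scoring and ranking steps of claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation. It assumes a toy one-feature linear model, takes the "matrix" to be the Hessian of the squared-error loss with respect to the model parameters, and scores each data instance by the size of the parameter shift that the inverse Hessian attributes to it (a standard influence-function approximation, swapped in here for illustration). The names `fit_line`, `influence_scores`, and `rank_by_influence` are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for the model y = w * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx
    return w, mean_y - w * mean_x

def influence_scores(xs, ys):
    """Score each instance by its estimated effect on the parameters (w, b)."""
    n = len(xs)
    w, b = fit_line(xs, ys)
    # Assumed "matrix": Hessian of the mean loss 0.5 * (w*x + b - y)^2 w.r.t. (w, b).
    h11 = sum(x * x for x in xs) / n
    h12 = sum(xs) / n
    h22 = 1.0
    det = h11 * h22 - h12 * h12
    inv = [[h22 / det, -h12 / det], [-h12 / det, h11 / det]]
    scores = []
    for x, y in zip(xs, ys):
        residual = w * x + b - y
        grad = (residual * x, residual)  # gradient of this instance's loss
        dw = inv[0][0] * grad[0] + inv[0][1] * grad[1]
        db = inv[1][0] * grad[0] + inv[1][1] * grad[1]
        scores.append(math.hypot(dw, db))  # magnitude of the parameter shift
    return scores

def rank_by_influence(scores):
    """Return instance indices ordered from most to least influential."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Toy data: five points close to y = x plus one corrupted label at x = 5.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.1, 0.9, 2.1, 2.9, 4.0, -5.0]
scores = influence_scores(xs, ys)
ranking = rank_by_influence(scores)
print(ranking)
```

On this toy data the corrupted instance (index 5) ranks first. In the claimed method, the ranked instances would then receive anomaly scores before any are selected as potentially malicious.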

2. The computer-implemented method of claim 1, wherein the plurality of data instances comprises training data for updating a global machine-learning model.

3. The computer-implemented method of claim 1, wherein the impact that each of the data instances has on the set of training data parameters comprises a degree of error in predicting a target training data parameter when the target training data parameter is removed from the set of training data parameters.

4. The computer-implemented method of claim 1, wherein ranking the data instances comprises ranking a number of the data instances below a predetermined threshold.

5. The computer-implemented method of claim 1, wherein ranking the data instances comprises ranking based on a predetermined influence score.

6. The computer-implemented method of claim 1, wherein determining the anomaly score comprises: identifying a classification label for each of the ranked data instances; selecting a group of additional data instances stored on the one or more federated client devices that are neighbors of the ranked data instances received from the one or more federated client devices; and determining the anomaly score based on a fraction of the additional data instances having a classification label that is inconsistent with the classification label for the ranked data instances.
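The neighbor-based anomaly score of claim 6 can be sketched as follows. This is a minimal illustration, not the claimed implementation: it assumes that "neighbors" means the k nearest locally stored instances under Euclidean distance, and the function name `label_inconsistency_score` and the parameter `k` are hypothetical.

```python
import math

def label_inconsistency_score(point, label, local_points, local_labels, k=3):
    """Fraction of the k nearest local instances whose label differs from `label`."""
    # Pair each locally stored instance with its distance to the received instance.
    by_distance = sorted(
        zip(local_points, local_labels),
        key=lambda pair: math.dist(point, pair[0]),
    )
    neighbors = by_distance[:k]
    mismatches = sum(1 for _, neighbor_label in neighbors if neighbor_label != label)
    return mismatches / len(neighbors)

# Locally stored instances: a cluster labeled 0 near the origin, a cluster labeled 1 far away.
local_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0)]
local_labels = [0, 0, 0, 1, 1]

# A received instance near the origin but labeled 1 disagrees with all of its neighbors.
suspicious = label_inconsistency_score((0.2, 0.2), 1, local_points, local_labels)
consistent = label_inconsistency_score((0.2, 0.2), 0, local_points, local_labels)
print(suspicious, consistent)  # 1.0 0.0
```

A score near 1 means the received instance's label conflicts with most of its local neighbors, which is the pattern a label-flipping attack would tend to produce.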

7. The computer-implemented method of claim 1, wherein determining the anomaly score comprises: selecting at least one of the ranked data instances; calculating an average distance between the selected ranked data instance and an additional data instance on each of the federated clients; and determining the anomaly score based on a ratio between the average distance and a largest distance between the selected data instance and the additional data instance.
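One possible reading of the distance-ratio score in claim 7 is sketched below. The claim does not specify the distance metric or how the additional instances are chosen, so this sketch assumes Euclidean distance and one additional instance per federated client. The function name `distance_ratio_score` is hypothetical.

```python
import math

def distance_ratio_score(selected, client_instances):
    """Average distance to the clients' instances divided by the largest such distance."""
    distances = [math.dist(selected, other) for other in client_instances]
    largest = max(distances)
    if largest == 0.0:
        return 0.0  # the selected instance coincides with every client instance
    return (sum(distances) / len(distances)) / largest

# One additional instance from each of two clients (assumed layout).
client_instances = [(3.0, 4.0), (6.0, 8.0)]
score = distance_ratio_score((0.0, 0.0), client_instances)
print(score)  # 0.75
```

Under this reading the ratio lies between 0 and 1 and approaches 1 when the selected instance is about equally far from every client's instance.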

8. The computer-implemented method of claim 1, wherein determining the anomaly score comprises: removing at least one data point from the ranked data instances; determining an effect of the removed at least one data point on a federated machine-learning model during a model update; and determining the anomaly score based on a deviation in the model update caused by the removed at least one data point.
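The removal-based score of claim 8 can be illustrated with a leave-one-out sketch. It is a simplification under stated assumptions: the "model update" is taken to be a single gradient-descent step for a one-parameter model y = w * x under squared error, and the deviation is the absolute difference between the update computed with and without each data point. The names `gradient_update` and `removal_deviation_scores` are hypothetical.

```python
def gradient_update(w, batch, lr=0.1):
    """One gradient-descent step (the change applied to w) for y = w * x."""
    grad = sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)
    return -lr * grad

def removal_deviation_scores(w, batch, lr=0.1):
    """For each point, how much the update changes when that point is removed."""
    full_update = gradient_update(w, batch, lr)
    scores = []
    for i in range(len(batch)):
        reduced = batch[:i] + batch[i + 1:]  # batch without point i
        scores.append(abs(full_update - gradient_update(w, reduced, lr)))
    return scores

# Three points consistent with w = 1 and one corrupted point at the end.
batch = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (2.0, -8.0)]
scores = removal_deviation_scores(1.0, batch)
most_anomalous = scores.index(max(scores))
print(most_anomalous)  # 3
```

Removing the corrupted point changes the update far more than removing any clean point, so it receives the highest score. A real federated setting would recompute or approximate the client's model update rather than a single scalar step.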

9. The computer-implemented method of claim 1, wherein performing the security action comprises at least one of: removing the data instances with the highest anomaly scores; and flagging the data instances with the highest anomaly scores as containing the potentially malicious data.

10. A system for verifying decentralized federated data using influence evaluation, the system comprising: at least one physical processor; physical memory comprising a plurality of modules and computer-executable instructions that, when executed by the physical processor, cause the physical processor to: calculate, by a calculation module, an influence score for each of a plurality of data instances, wherein the calculation module calculates the influence score by: determining a matrix based on a set of training data parameters utilized for updating a global machine-learning model by the data instances; and utilizing the matrix to calculate the influence score based at least in part on an impact that each of the data instances has on the set of training data parameters; rank, by a ranking module, the data instances based on the influence scores; determine, by a determining module, an anomaly score for each of the ranked data instances; select, by a selection module, the ranked data instances with the highest anomaly scores as containing potentially malicious data; and perform, by a security module, a security action that protects against the potentially malicious data.

11. The system of claim 10, wherein the plurality of data instances comprises training data for updating a global machine-learning model.

12. The system of claim 10, wherein the impact that each of the data instances has on the set of training data parameters comprises a degree of error in predicting a target training data parameter when the target parameter is removed from the set of training data parameters.

13. The system of claim 10, wherein the ranking module ranks the data instances by ranking a number of the data instances below a predetermined threshold.

14. The system of claim 10, wherein the ranking module ranks the data instances based on a predetermined influence score.

15. The system of claim 10, wherein the determining module determines the anomaly score by: identifying a classification label for each of the ranked data instances; selecting a group of additional data instances stored on one or more federated client devices that are neighbors of the ranked data instances, wherein the ranked data instances are received from the one or more federated client devices; and determining the anomaly score based on a fraction of the additional data instances having a classification label that is inconsistent with the classification label for the ranked data instances.

16. The system of claim 10, wherein the determining module determines the anomaly score by: selecting at least one of the ranked data instances; calculating an average distance between the selected ranked data instance and an additional data instance on each of the federated clients; and determining the anomaly score based on a ratio between the average distance and a largest distance between the selected data instance and the additional data instance.

17. The system of claim 10, wherein the determining module determines the anomaly score by: removing at least one data point from the ranked data instances; determining an effect of the removed at least one data point on a federated machine-learning model during a model update; and determining the anomaly score based on a deviation in the model update caused by the removed at least one data point.

18. A non-transitory computer-readable medium comprising one or more computer-executable instructions that, when executed by at least one processor of a computing device, cause the computing device to: calculate an influence score for each of a plurality of data instances, wherein the influence score is calculated by: determining a matrix based on a set of training data parameters utilized for updating a global machine-learning model by the data instances; and utilizing the matrix to calculate the influence score based at least in part on an impact that each of the data instances has on the set of training data parameters; rank the data instances based on the influence scores; determine an anomaly score for each of the ranked data instances; select the ranked data instances with the highest anomaly scores as containing potentially malicious data; and perform a security action that protects against the potentially malicious data.

Patent Metadata

Filing Date

Unknown

Publication Date

January 18, 2022

Inventors

Christopher Gates

Yufei Han
