US-10810317

Sensitive data classification

Published: October 20, 2020

Assignee: not available in USPTO data we have

Inventors: not available in USPTO data we have

Technical Abstract

A gateway device includes a network interface connected to data sources, and computer instructions that, when executed, cause a processor to access data portions from the data sources. The processor accesses a set of classification rules, each configured to classify a data portion as sensitive data in response to the data portion satisfying the rule. Each rule is associated with a significance factor representative of the accuracy of the classification rule. The processor applies each of the classification rules to a data portion to obtain an output indicating whether the data portion is sensitive data. The outputs are weighted by the significance factors to produce a set of weighted outputs. The processor determines whether the data portion is sensitive data by aggregating the set of weighted outputs, and presents the determination in a user interface. Security operations may also be performed on the data portion.
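The weighting and aggregation step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the weighted outputs are aggregated or what cutoff is used, so the normalized sum, the 0.5 threshold, the rule names, and the significance values below are all assumptions chosen for illustration, not the patented method.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ClassificationRule:
    name: str
    is_satisfied: Callable[[str], bool]  # True when the data portion satisfies the rule
    significance: float                  # hypothetical weight; higher means more trusted


def sensitivity_score(portion: str, rules: List[ClassificationRule]) -> float:
    """Weight each rule's 0/1 output by its significance factor, then aggregate."""
    weighted_outputs = [
        rule.significance * (1.0 if rule.is_satisfied(portion) else 0.0)
        for rule in rules
    ]
    total_significance = sum(rule.significance for rule in rules)
    # Assumption: aggregate as a normalized sum so the score lies in [0, 1].
    return sum(weighted_outputs) / total_significance if total_significance else 0.0


def is_sensitive(portion: str, rules: List[ClassificationRule], threshold: float = 0.5) -> bool:
    # Assumption: a fixed threshold decides the final classification.
    return sensitivity_score(portion, rules) >= threshold


# Hypothetical rules: a strict SSN format (trusted) and a loose nine-digit match (noisy).
EXAMPLE_RULES = [
    ClassificationRule("ssn_format", lambda s: re.fullmatch(r"\d{3}-\d{2}-\d{4}", s) is not None, 0.9),
    ClassificationRule("nine_digits", lambda s: re.fullmatch(r"\d{9}", s) is not None, 0.3),
]
```

With these example values, a portion that satisfies only the noisy nine-digit rule stays below the threshold, while a portion that satisfies the strict format rule crosses it. This is the intended effect of weighting by rule accuracy.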

Patent Claims

20 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A gateway device, comprising:
   a network interface communicatively coupled with a plurality of data sources;
   a hardware processor; and
   a non-transitory computer readable storage medium storing computer readable instructions, that when executed by the hardware processor, cause the hardware processor to:
      access data from one or more of the plurality of data sources, the accessed data comprising a plurality of data portions;
      access a set of classification rules, each of the set of classification rules configured to classify a data portion of the plurality of data portions as sensitive data in response to the data portion satisfying the classification rule, and each of the set of classification rules further associated with a significance factor representative of an accuracy of the classification rule in classifying data portions as sensitive, wherein the significance factor associated with a classification rule is based on 1) a type of sensitive data that the classification rule is configured to detect and 2) an expected rate of false positives associated with the type of sensitive data;
      apply each of the set of classification rules to a data portion to obtain an output representative of whether the data portion is sensitive data;
      weigh the output from each application of a classification rule by the significance factor associated with the classification rule to produce a set of weighted outputs;
      determine if the data portion is sensitive by aggregating the set of weighted outputs;
      in response to determining that the data portion is sensitive, modify a user interface presented to a user to indicate that the data portion is determined to be sensitive and presenting a set of security operations that can be taken in response to the determination that the data portion is sensitive; and
      in response to a selection of a presented security operation, perform the security operation to reduce a security risk associated with the data portion.

2. The device of claim 1, wherein a data portion is at least one of: a cell in a table, a non-delimited string of characters, and a file.

3. The device of claim 1, wherein sensitive data is at least one of: an address component, a date of birth, a telephone number, an email address, a social security number, a financial account number, a password, and a username.

4. The device of claim 1, wherein a classification rule of the set of classification rules is satisfied when the data portion matches a pre-defined pattern associated with the classification rule.

5. The device of claim 1, wherein a classification rule of the set of classification rules is satisfied when a one or more data parsing rules associated with the classification rule, when applied to the data portion, return a true value.

6. The device of claim 1, wherein a classification rule of the set of classification rules is satisfied when a one or more contextual data requirements specified by the classification rule are satisfied by one or both of the data portion and associated data sources of the plurality of data source.

7. The device of claim 1, wherein a classification rule of the set of classification rules is satisfied when the data portion matches an entry in a reference table specified by the classification rule.

8. The device of claim 1, wherein a classification rule of the set of classification rules is satisfied when a trained machine learning model associated with the classification rule returns a score for the data portion beyond a threshold value, the score computed by the machine learning model based on one or more features extracted from the data portion and used as input for the machine learning model.

9. The device of claim 1, wherein the one or more security operations include at least one of encryption, tokenization, and obfuscation, and wherein the one or more security operations that are performed are selected based on a desired security level for the data portion.

10. The device of claim 1, wherein the significance factor associated with a classification rule is further based on an accuracy of the classification rule determined based on a number of false positives generated by the classification rule when applied to a training data set.

11. A computer-implemented method, comprising:
   accessing data from one or more of a plurality of data sources, the accessed data comprising a plurality of data portions;
   accessing a set of classification rules, each of the set of classification rules configured to classify a data portion of the plurality of data portions as sensitive data in response to the data portion satisfying the classification rule, and each of the set of classification rules further associated with a significance factor representative of an accuracy of the classification rule in classifying data portions as sensitive wherein the significance factor associated with a classification rule is based on 1) a type of sensitive data that the classification rule is configured to detect and 2) an expected rate of false positives associated with the type of sensitive data;
   applying each of the set of classification rules to a data portion to obtain an output representative of whether the data portion is sensitive data;
   weighting the output from each application of a classification rule by the significance factor associated with the classification rule to produce a set of weighted outputs;
   determining if the data portion is sensitive by aggregating the set of weighted outputs; and
   in response to determining that the data portion is sensitive, performing one or more security operations on the data portion to reduce a security risk associated with the data portion.

12. The method of claim 11, wherein a data portion is at least one of: a cell in a table, a non-delimited string of characters, and a file.

13. The method of claim 11, wherein sensitive data is at least one of: an address component, a date of birth, a telephone number, an email address, a social security number, a financial account number, a password, and a username.

14. The method of claim 11, wherein a classification rule of the set of classification rules is satisfied when the data portion matches a pre-defined pattern associated with the classification rule.

15. The method of claim 11, wherein a classification rule of the set of classification rules is satisfied when a one or more data parsing rules associated with the classification rule, when applied to the data portion, return a true value.

16. The method of claim 11, wherein a classification rule of the set of classification rules is satisfied when a one or more contextual data requirements specified by the classification rule are satisfied by one or both of the data portion and associated data sources of the plurality of data source.

17. The method of claim 11, wherein a classification rule of the set of classification rules is satisfied when the data portion matches an entry in a reference table specified by the classification rule.

18. The method of claim 11, wherein a classification rule of the set of classification rules is satisfied when a trained machine learning model associated with the classification rule returns a score for the data portion beyond a threshold value, the score computed by the machine learning model based on one or more features extracted from the data portion and used as input for the machine learning model.

19. The method of claim 11, wherein the one or more security operations include at least one of encryption, tokenization, and obfuscation, and wherein the one or more security operations that are performed are selected based on a desired security level for the data portion.

20. The method of claim 11, wherein the significance factor associated with a classification rule is further based on an accuracy of the classification rule determined based on a number of false positives generated by the classification rule when applied to a training data set.
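Several dependent claims name concrete ways a rule can be satisfied (a pre-defined pattern, a reference table entry, a model score beyond a threshold) and tie the significance factor to false positives observed on a training data set. The Python sketch below illustrates those ideas only. The claims give no formula, so treating the significance factor as the fraction of a rule's matches that are true positives is an assumption, and every function name, pattern, and data value here is hypothetical.

```python
import re
from typing import Callable, Iterable, List, Tuple

Rule = Callable[[str], bool]


def pattern_rule(pattern: str) -> Rule:
    """Satisfied when the whole data portion matches a pre-defined pattern."""
    compiled = re.compile(pattern)
    return lambda portion: compiled.fullmatch(portion) is not None


def reference_table_rule(table: Iterable[str]) -> Rule:
    """Satisfied when the data portion matches an entry in a reference table."""
    entries = {entry.strip().lower() for entry in table}
    return lambda portion: portion.strip().lower() in entries


def model_rule(score_fn: Callable[[str], float], threshold: float) -> Rule:
    """Satisfied when a scoring function (standing in for a trained model) exceeds a threshold."""
    return lambda portion: score_fn(portion) > threshold


def significance_from_training(rule: Rule, labelled: List[Tuple[str, bool]]) -> float:
    """Assumed formula: 1 minus the share of the rule's matches that are false positives."""
    match_labels = [truly_sensitive for portion, truly_sensitive in labelled if rule(portion)]
    if not match_labels:
        return 0.0  # the rule never fired, so there is no evidence of its accuracy
    false_positives = sum(1 for truly_sensitive in match_labels if not truly_sensitive)
    return 1.0 - false_positives / len(match_labels)


# Hypothetical rules and training data, for illustration only.
email_rule = pattern_rule(r"[^@\s]+@[^@\s]+")
city_rule = reference_table_rule(["Springfield", "Boston"])
digit_rule = model_rule(lambda p: sum(c.isdigit() for c in p) / max(len(p), 1), 0.5)

TRAINING = [("a@b.com", True), ("x@y.org", True), ("not@real", False), ("plain text", False)]
email_significance = significance_from_training(email_rule, TRAINING)
```

In this toy training set the loose email pattern matches three portions, one of which is not sensitive, so its assumed significance factor is two thirds. A stricter pattern would yield fewer false positives and therefore a higher weight.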

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G06F G06N

Patent Metadata

Filing Date

February 9, 2018

Publication Date

October 20, 2020
