Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
1. A method for preserving privacy of a dataset, where the dataset has at least a sensitive data field and one or more fields of at least one first quasi-identifier, the method comprising: determining a k-anonymity value K with respect to the sensitive data field according to the at least one first quasi-identifier; determining to adopt the at least one first quasi-identifier to categorize the dataset into a plurality of groups, if the k-anonymity value K is less than a reference number Kr, wherein data entries in each of the plurality of groups have the same value in the one or more fields of at least one first quasi-identifier and data entries in different groups of the plurality of groups have different values in the one or more fields of at least one first quasi-identifier; determining the number of data entries in each of the plurality of groups; determining a first group among the plurality of groups, wherein the number of data entries, N1, in the first group is less than the reference number Kr; determining a second group among the plurality of groups, whereby when the first group and the second group are merged into a merging group, the number of data entries, Nm, in the merging group is not less than the reference number Kr; and masking the one or more fields of at least one first quasi-identifier for the merging group.
A method for preserving data privacy involves grouping data entries based on quasi-identifiers (fields that could potentially identify individuals). First, it determines a k-anonymity value related to sensitive data. If this value is below a threshold (Kr), the method groups the data based on the quasi-identifiers, ensuring each group has the same values for those identifiers, and different groups have different values. It counts the number of entries in each group. If a group has fewer entries than Kr, it merges that group with another group, such that the combined group has at least Kr entries. Finally, it masks (e.g., replaces with a general value or removes) the quasi-identifiers for the merged group to protect privacy.
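The steps of claim 1 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function names, the `*` mask token, the definition of K as the size of the smallest group, the choice of the first qualifying pair of groups, and the decision to mask every quasi-identifier field are all assumptions made for the example.

```python
from collections import defaultdict

MASK = "*"  # hypothetical mask token; the claim does not prescribe one


def group_indices(rows, qi_fields):
    """Map each distinct quasi-identifier value tuple to the row indices sharing it."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row[f] for f in qi_fields)].append(i)
    return dict(groups)


def merge_and_mask_once(rows, qi_fields, kr):
    """Perform one merge-and-mask pass; returns new rows, input is not modified."""
    out = [dict(r) for r in rows]
    groups = group_indices(out, qi_fields)
    if not groups:
        return out
    # Assumption: K is the size of the smallest group.
    k = min(len(ix) for ix in groups.values())
    if k >= kr:
        return out  # threshold already met, nothing to do
    # First group: fewer than Kr entries (N1 < Kr).
    first = next(key for key, ix in groups.items() if len(ix) < kr)
    n1 = len(groups[first])
    # Second group: any other group whose merge reaches Kr (Nm >= Kr).
    second = next(
        (key for key, ix in groups.items() if key != first and n1 + len(ix) >= kr),
        None,
    )
    if second is None:
        return out  # no single partner is large enough
    # Mask the quasi-identifier fields of every entry in the merging group.
    for i in groups[first] + groups[second]:
        for f in qi_fields:
            out[i][f] = MASK
    return out
```

A real anonymizer would repeat this pass until every group meets Kr; the sketch performs a single merge to mirror the claim's wording.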
2. The method for preserving privacy of a dataset as claimed in claim 1, wherein the step of masking the one or more fields of at least one first quasi-identifier for the merging group further comprises: determining the one or more fields to be masked of at least one first quasi-identifier for the merging group, wherein prior to masking, the values of the fields being masked of the first group are distinct from the values of the same fields being masked of the second group.
Building on claim 1, masking the merged group's quasi-identifiers includes first deciding which quasi-identifier fields to mask. Before masking, the values of the chosen fields in the first group must differ from the values of those same fields in the second group. In effect, masking targets the fields that distinguish the two groups; fields on which both groups already agree presumably do not need to be masked for the merged entries to look alike.
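The field selection in claim 2 can be illustrated with a short Python sketch. The function name and the dictionary representation of each group's quasi-identifier values are assumptions for the example; the claim only requires that the masked fields hold different values in the two groups before masking.

```python
def fields_to_mask(first_values, second_values):
    """Return the quasi-identifier fields whose values differ between two groups.

    Each argument maps field name -> the single value that group holds for it
    (within a group, all entries share the same quasi-identifier values).
    """
    return [f for f in first_values if first_values[f] != second_values[f]]


# Hypothetical example: the groups agree on zip but differ on age.
first_group = {"zip": "100", "age": 30}
second_group = {"zip": "100", "age": 40}
to_mask = fields_to_mask(first_group, second_group)
```

Here only `age` would be selected, so `zip` could stay visible in the merged group.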
3. The method for preserving privacy of a dataset as claimed in claim 1, wherein the step of determining a second group among the plurality of groups further comprises: determining the second group, wherein Nm is a minimum value.
When merging groups to reach the minimum size (Kr) as in claim 1, the method chooses the second group so that the resulting merged group has the *smallest possible* number of total entries (Nm) that still reaches Kr. This presumably minimizes the amount of data that needs to be generalized or masked, preserving as much data utility as possible while still meeting the privacy threshold.
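A minimal Python sketch of this selection rule follows. The function name, the `sizes` mapping of group label to entry count, and the tie-breaking behavior (the first candidate in iteration order wins) are assumptions for illustration.

```python
def pick_second_group(sizes, first, kr):
    """Pick the partner group that gives the smallest merged size Nm with Nm >= Kr.

    sizes: dict mapping group label -> number of data entries in that group.
    Returns the chosen label, or None if no single partner reaches Kr.
    """
    n1 = sizes[first]
    # Merged size for every candidate partner that reaches the threshold.
    candidates = {
        label: n1 + n
        for label, n in sizes.items()
        if label != first and n1 + n >= kr
    }
    if not candidates:
        return None
    # Smallest Nm wins.
    return min(candidates, key=candidates.get)


# Hypothetical group sizes: A is too small for Kr = 3.
sizes = {"A": 1, "B": 5, "C": 2, "D": 1}
partner = pick_second_group(sizes, "A", 3)
```

With these sizes, merging A with D gives only 2 entries, so D is excluded; C gives exactly 3 and B gives 6, so C is chosen.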
4. The method for preserving privacy of a dataset as claimed in claim 1, wherein the step of determining a k-anonymity value K with respect to the sensitive data field further comprises: allowing users to determine the value of the reference number Kr.
The method for preserving privacy described earlier allows users to set the value of the reference number Kr, which dictates the minimum number of data entries required in a group after merging. This user-definable threshold allows administrators to adjust the level of privacy protection based on their specific data sensitivity requirements and risk tolerance, balancing privacy with data utility.
5. The method for preserving privacy of a dataset as claimed in claim 4, wherein the step of determining the second group among the plurality of groups further comprises: determining the second group, wherein Vm is the minimum value.
Building on claim 4, where users set the reference number Kr, the second group is chosen so that a value Vm is minimized. Neither claim 5 nor the claims it depends on (claims 4 and 1) define Vm; claim 7 defines it as the number of values of the sensitive data field among the data entries in the merging group. Read that way, the second group is chosen so that the merging group contains as few sensitive data values as possible.
6. The method for preserving privacy of a dataset as claimed in claim 4, wherein the step of determining to adopt the at least one first quasi-identifier to categorize the dataset into the plurality of groups further comprises: allowing users to determine the value of the reference number Lr.
Building on claim 4, where users set the reference number Kr, users can also set a second reference number, Lr. Claim 6 itself does not say what Lr is used for; claim 7 describes Lr as the threshold against which the l-diversity value L of the sensitive data field is compared.
7. The method for preserving privacy of a dataset as claimed in claim 1, wherein the step of determining the k-anonymity value K with respect to the sensitive data field further comprises: determining a l-diversity value L with respect to the sensitive data field according to the at least one first quasi-identifier; the step of determining to adopt the at least one first quasi-identifier to categorize the dataset into the plurality of groups further comprises: determining to adopt the at least one first quasi-identifier to categorize the dataset into a plurality of groups, if the l-diversity value L is less than a reference number Lr; the step of determining the number of the data entries in each of the plurality of groups further comprises: determining the number of values of the sensitive data field of the data entries in each group; the step of determining the first group among the plurality of groups further comprises: determining the first group among the plurality of groups, wherein the number of values of the sensitive data field of the data entries, V1, in the first group is less than the reference number Lr; the step of determining the second group among the plurality of groups further comprises: determining the second group, whereby the number of values of the sensitive data field of the data entries, Vm, in the merging group is not less than the reference number Lr.
This variation adds an l-diversity check to the k-anonymity method of claim 1 rather than replacing it. It calculates an l-diversity value (L) for the sensitive data field. If L is below a reference number (Lr), the data is grouped by quasi-identifiers. The method counts the number of distinct values of the sensitive data field within each group. A group with fewer distinct sensitive values (V1) than Lr is merged with a second group chosen so that the merged group has *at least* Lr distinct sensitive values (Vm). As in claim 1, the quasi-identifiers of the merged group are then masked.
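The l-diversity condition can be sketched in Python as follows. The function names and the group layout (a dict of label to list of rows) are assumptions for illustration, and the sketch checks only the Lr condition; under claim 7 the merged group would also have to satisfy the Kr condition inherited from claim 1.

```python
def distinct_sensitive_values(groups, sensitive_field):
    """For each group, collect the set of distinct sensitive values it contains."""
    return {
        label: {row[sensitive_field] for row in rows}
        for label, rows in groups.items()
    }


def find_l_diverse_merge(groups, sensitive_field, lr):
    """Return (first, second) labels where V1 < Lr and the merged group has Vm >= Lr.

    Returns None if no such pair exists.
    """
    values = distinct_sensitive_values(groups, sensitive_field)
    for first, v1 in values.items():
        if len(v1) >= lr:
            continue  # this group is already diverse enough
        for second, v2 in values.items():
            # Vm is the number of distinct values in the union of both groups.
            if second != first and len(v1 | v2) >= lr:
                return first, second
    return None


# Hypothetical groups keyed by label; "d" is the sensitive field.
groups = {
    "g1": [{"d": "flu"}, {"d": "flu"}],
    "g2": [{"d": "flu"}],
    "g3": [{"d": "cold"}, {"d": "asthma"}],
}
pair = find_l_diverse_merge(groups, "d", 2)
```

Merging g1 with g2 would still leave a single sensitive value, so the sketch skips g2 and pairs g1 with g3.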
8. The method for preserving privacy of a dataset as claimed in claim 7, wherein the step of determining the second group among the plurality of groups further comprises: determining the second group, wherein Nm is the minimum value.
In the l-diversity method of claim 7, where groups are merged so that the merged group contains at least Lr distinct values of the sensitive data field, the second group is chosen to minimize the total number of data entries (Nm) in the merged group. This presumably limits unnecessary generalization and preserves data fidelity while still meeting the l-diversity requirement.
9. The method for preserving privacy of a dataset as claimed in claim 1, wherein the step of determining the second group among the plurality of groups further comprises: determining the second group, whereby the merging group complies with a set of criteria provided by users.
When merging data groups to meet privacy thresholds, the method allows users to provide a set of criteria that the merged group must satisfy. Instead of blindly merging based solely on size or diversity, the algorithm considers user-defined constraints or rules, offering more granular control over the privacy-preserving transformation process. These criteria might relate to data quality, specific attributes, or business requirements.
10. The method for preserving privacy of a dataset as claimed in claim 1, wherein the step of determining the second group among the plurality of groups further comprises: the step proceeds by decision tree algorithm; the step of determining the second group among the plurality of groups further comprises: determining the second group, wherein a distance of a path of the first group and a distance of a path of the second group according to the decision tree are minimized.
This version uses a decision tree algorithm to determine the second group for merging. The second group is chosen so that the distance of the first group's path and the distance of the second group's path in the decision tree are minimized; the claim does not define how this distance is measured. A plausible reading is that the method merges the groups that sit closest together in the tree, and are therefore most similar, which should make the resulting masking less disruptive.
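One possible reading of the path distance can be sketched in Python. Everything here is an assumption, since the claim does not define the measure: each group is taken to be a leaf reached by a sequence of branch decisions from the root, and the distance between two groups is taken to be the number of edges from each leaf up to their deepest shared ancestor.

```python
def path_distance(path_a, path_b):
    """Edges from each leaf up to the deepest node the two root paths share."""
    common = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        common += 1
    return (len(path_a) - common) + (len(path_b) - common)


def closest_group(paths, first):
    """Return the label of the group whose path is nearest to the first group's path."""
    distances = {
        label: path_distance(paths[first], p)
        for label, p in paths.items()
        if label != first
    }
    if not distances:
        return None
    return min(distances, key=distances.get)


# Hypothetical root-to-leaf branch decisions for three groups.
paths = {
    "A": ("age<40", "zip=100"),
    "B": ("age<40", "zip=200"),
    "C": ("age>=40", "zip=100"),
}
nearest = closest_group(paths, "A")
```

Under this reading, A and B share the `age<40` branch and are 2 edges apart, while A and C split at the root and are 4 edges apart, so B is the preferred merge partner for A.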
11. A product of computer programs stored in a non-transitory computer accessible medium, which comprises a set of computer readable programs for embodying the methods as claimed in claim 1 in one or more computer systems.
This claim describes a software product, stored on a computer-readable medium, containing code that implements the data privacy method described in the first claim. This means the invention can be distributed and used as a software application or library.
12. A computer system, which comprises: a host, which comprises: a bus system; a memory module connecting to the bus system, wherein a set of computer executable instructions is included; and a processing unit connecting to the bus system, where the processing unit executes the set of computer executable instructions for embodying methods as claimed in claim 1.
This claim describes a computer system that implements the data privacy method described in the first claim. The system includes a host with a bus, memory, and a processing unit. The memory stores instructions that, when executed by the processor, perform the steps of determining k-anonymity, grouping data, merging groups, and masking quasi-identifiers to preserve data privacy.
August 19, 2014