System and Method for Detecting and Analyzing Pattern Relationships

Published: August 9, 2011

Assignee: Not available in the USPTO data

Patent Claims

39 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A computerized method for analyzing relationships among patterns within a data set having a set of samples and a corresponding attribute value for each attribute of each said sample, the method implemented as a set of instructions for execution by a processor, the method comprising: receiving at a computer interface at least two patterns, the computer interface coupled to the computer processor; defining a data cluster within the data set for each of said at least two patterns, each defined data cluster having samples with attribute values associated with a corresponding pattern of said at least two patterns; grouping at least some of the samples of each defined data cluster with one another to generate a resultant data cluster; calculating a variation between the attribute value of a first set of samples and the attribute value of a second set of samples within said resultant data cluster, the attribute value of the first set of samples and the second set of samples corresponding to the same attribute; and outputting the variation as data representing a measure of relevance of the first set of samples with the second set of samples.

2. The method according to claim 1 , wherein defining the data cluster comprises defining a minimal subset of data within the data set having attribute values associated with the corresponding pattern.

3. The method according to claim 1 , wherein the defined data cluster includes samples with attribute values associated with a plurality of said at least two patterns.

4. The method according to claim 1 , wherein grouping at least some of the samples of each defined data cluster with one another to generate the resultant data cluster further comprises: grouping all of the samples and attributes of each defined data cluster to provide a resultant data cluster having attribute values for each of the samples and attributes of each defined data cluster.

5. The method according to claim 1 , wherein grouping at least some of the samples of each defined data cluster with one another comprises grouping common samples having same attribute values for same attributes associated with each defined data cluster to define an overlapping data portion and generate the resultant data cluster.

6. The method according to claim 4 , wherein the resultant data cluster comprises at least one corner region having attribute values associated with samples and attributes located outside each defined data cluster, and wherein calculating the variation further comprises calculating the variation between the first set of samples and the second set of samples within each one of said at least one corner region.

7. The method according to claim 6 , wherein calculating the variation further comprises calculating a joint entropy of the attribute values for each of the attributes of the first and second set of samples within each said at least one corner region and summing the calculated entropies corresponding to each at least one corner region to provide the variation.

8. The method according to claim 4 , wherein calculating the variation between the attribute values of the first and second set of samples comprises calculating a joint entropy of the attribute values for each of the attributes of the first and second set of samples within the resultant data cluster.

9. The method according to claim 1 , wherein grouping at least some of the samples of each defined data cluster with one another further comprises grouping the data clusters in dependence upon a weighted combination of the data clusters.

10. The method according to claim 7 , wherein calculating the variation further comprises weighting the calculated entropy by a measured number of samples and attributes in each of said at least one corner regions.

11. The method according to claim 8 , wherein calculating the variation further comprises weighting the calculated entropy by a measured number of samples and attributes in the resultant data cluster.

12. The method according to claim 10 , wherein calculating the variation further comprises normalizing the calculated entropy by an expected possible number of values for each attribute within each of said at least one corner regions.

13. The method according to claim 11 , wherein calculating the variation further comprises normalizing the calculated entropy by an expected possible number of values within the resultant data cluster.

14. The method according to claim 1 , wherein calculating the variation is in dependence upon a count of the number of common samples and attribute values shared between the defined data clusters.

15. The method according to claim 14 , wherein calculating the variation is further in dependence upon a count of the number differing samples and the number of differing attribute values corresponding to the defined data clusters.

16. The method according to claim 1 , wherein the first set of samples and the second set of samples are the same samples.

17. The method according to claim 1 further comprising the step of communicating the variation to a storage for subsequent access by a pattern post processing task.

18. The method according to claim 17 , wherein the storage is a distance measure repository.

19. The method according to claim 17 , wherein the variation is communicated across a network to the distance measure repository.

20. The method according to claim 17 , wherein the post processing task is selected from the group consisting of: pattern clustering, pattern pruning, pattern summarization, pattern visualization, and pattern classification.
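As an illustration of method claims 1, 4, 6, 7, 10 and 12, the following is a minimal Python sketch, not the patented implementation. It assumes a data set stored as a list of tuples, a pattern expressed as a mapping from attribute index to required value, and a resultant data cluster formed by the union of samples and attributes. The function names (`data_cluster`, `corner_regions`, `variation`) are hypothetical. For simplicity the sketch sums per-attribute Shannon entropies rather than computing a true joint entropy, and the weighting and normalization choices are assumptions; the claims do not specify a formula.

```python
from collections import Counter
from math import log2


def data_cluster(data, pattern):
    """Return (sample indices matching the pattern, attributes the pattern uses)."""
    samples = {i for i, row in enumerate(data)
               if all(row[a] == v for a, v in pattern.items())}
    return samples, set(pattern)


def entropy(values):
    """Shannon entropy, in bits, of a non-empty list of discrete values."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())


def corner_regions(c1, c2):
    """Cells of the merged (union) cluster that lie outside both input clusters."""
    (s1, a1), (s2, a2) = c1, c2
    regions = [(s1 - s2, a2 - a1), (s2 - s1, a1 - a2)]
    return [(s, a) for s, a in regions if s and a]


def variation(data, p1, p2):
    """Assumed measure: normalized entropy in the corner regions, weighted by
    region size and divided by the size of the resultant cluster."""
    c1, c2 = data_cluster(data, p1), data_cluster(data, p2)
    total_cells = len(c1[0] | c2[0]) * len(c1[1] | c2[1])
    if total_cells == 0:
        return 0.0
    score = 0.0
    for samples, attrs in corner_regions(c1, c2):
        for a in attrs:
            k = len({row[a] for row in data})  # distinct values of attribute a
            if k > 1:
                h = entropy([data[i][a] for i in samples]) / log2(k)
                score += h * len(samples)  # weight by cells in this column
    return score / total_cells


rows = [("a", "x"), ("a", "x"), ("a", "y"),
        ("a", "z"), ("b", "x"), ("c", "x")]
same = variation(rows, {0: "a"}, {0: "a"})       # identical patterns
different = variation(rows, {0: "a"}, {1: "x"})  # partially overlapping patterns
```

Under these assumptions, identical patterns produce no corner regions and a variation of zero, while patterns whose clusters overlap only partly score higher as the values in the corner regions become more mixed.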

21. A computerized system for analyzing relationships among patterns within a data set having a set of samples and a corresponding attribute value for each attribute of each said sample, the system comprises: a processor and a memory configured for implementing a plurality of modules comprising: a pattern inducing module of the plurality of modules configured for receiving at an input at least two patterns, and defining a data cluster within the data set for each of said at least two patterns, each defined data cluster having samples with attribute values associated with a corresponding pattern of said at least two patterns; a prototyping module of the plurality of modules configured for grouping at least some of the samples of each defined data cluster with one another to generate a resultant data cluster; and a distancing module of the plurality of modules configured for calculating a variation between the attribute value of a first set of samples and the attribute value of a second set of samples within said resultant data cluster, the attribute value of the first set of samples and the second set of samples corresponding to the same attribute, and configured for outputting the variation as data representing a measure of relevance of the first set of samples with the second set of samples.

22. The system according to claim 21 , further comprising the pattern inducing module configured for defining a minimal subset of data within the data set having attribute values associated with the corresponding pattern.

23. The system according to claim 21 , wherein the defined data cluster includes samples with attribute values associated with a plurality of said at least two patterns.

24. The system according to claim 21 , further comprising the prototyping module configured for grouping all of the samples and attributes of each defined data cluster to provide a resultant data cluster having attribute values for each of the samples and attributes of each defined data cluster.

25. The system according to claim 21 , further comprising the prototyping module configured for grouping common samples having same attribute values for same attributes associated with each defined data cluster to define an overlapping data portion and generate the resultant data cluster.

26. The system according to claim 24 , wherein the resultant data cluster comprises at least one corner region having attribute values associated with samples and attributes located outside each defined data cluster, and further comprising the distancing module configured for calculating the variation between the first set of samples and the second set of samples within each one of said at least one corner region.

27. The system according to claim 26 , further comprising the distancing module configured for calculating a joint entropy of the attribute values for each of the attributes of the first and second set of samples within each said at least one corner region and summing the calculated entropies corresponding to each at least one corner region to provide the variation.

28. The system according to claim 24 , wherein the distancing module configured for calculating the variation between the attribute values of the first and second set of samples comprises calculating a joint entropy of the attribute values for each of the attributes of the first and second set of samples within the resultant data cluster.

29. The system according to claim 21 , further comprising the prototyping module configured for grouping the data clusters in dependence upon a weighted combination of the data clusters.

30. The system according to claim 27 , further comprising the distancing module configured for weighting the calculated entropy by a measured number of samples and attributes in each of said at least one corner regions.

31. The system according to claim 28 , further comprising the distancing module configured for weighting the calculated entropy by a measured number of samples and attributes in the resultant data cluster.

32. The system according to claim 30 , further comprising the distancing module configured for normalizing the calculated entropy by an expected possible number of values for each attribute within each of said at least one corner regions.

33. The system according to claim 31 , further comprising the distancing module configured for normalizing the calculated entropy by an expected possible number of values within the resultant data cluster.

34. The system according to claim 21 , further comprising the distancing module configured for calculating the variation in dependence upon a count of the number of common samples and attribute values shared between the defined data clusters.

35. The system according to claim 34 , further comprising the distancing module configured for calculating the variation in dependence upon a count of the number differing samples and the number of differing attribute values corresponding to the defined data clusters.

36. The system according to claim 21 , wherein the first set of samples and the second set of samples are the same samples.

37. The system according to claim 21 , wherein said at least two patterns are selected from the group comprising: event association patterns, correlation rules, frequent itemsets and association rules.

38. The system according to claim 21 , wherein the variation as output is communicated to a storage for subsequent access by a pattern post processing task.

39. The system according to claim 38 , wherein the post processing task is selected from the group consisting of: pattern clustering, pattern pruning, pattern summarization, pattern visualization, and pattern classification.
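The count-based variation of claims 14 and 15 (and system claims 34 and 35) can be sketched in a similar way. The Python below is an assumption-laden illustration, not the patent's formula: it treats each data cluster as a pair of sets (sample indices, attribute indices) and returns the fraction of differing samples and attributes among all shared and differing ones, which is a Jaccard-style distance chosen here only for clarity. The names `data_cluster` and `count_based_variation` are hypothetical.

```python
def data_cluster(data, pattern):
    """Return (sample indices matching the pattern, attributes the pattern uses)."""
    samples = {i for i, row in enumerate(data)
               if all(row[a] == v for a, v in pattern.items())}
    return samples, set(pattern)


def count_based_variation(c1, c2):
    """Assumed measure: differing count / (shared count + differing count)."""
    (s1, a1), (s2, a2) = c1, c2
    shared = len(s1 & s2) + len(a1 & a2)     # common samples and attributes
    differing = len(s1 ^ s2) + len(a1 ^ a2)  # samples and attributes in only one cluster
    total = shared + differing
    return differing / total if total else 0.0


rows = [("a", "x"), ("a", "x"), ("a", "y"),
        ("a", "z"), ("b", "x"), ("c", "x")]
c1 = data_cluster(rows, {0: "a"})  # samples 0-3, attribute 0
c2 = data_cluster(rows, {1: "x"})  # samples 0, 1, 4, 5, attribute 1
distance = count_based_variation(c1, c2)
```

In this sketch two clusters with the same samples and attributes have distance 0, and clusters sharing nothing have distance 1, so the value could be stored directly as an entry in a distance measure repository of the kind mentioned in claims 17 to 19.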

Patent Metadata

Filing Date

Unknown

Publication Date

August 9, 2011

Inventors

Andrew Wong

Chung Lam Li
