Scalable Methods for Detecting Significant Traffic Patterns in a Data Network

Published: August 17, 2010

Assignee: Not available in the USPTO data

Inventors: Tian Bu, Jin Cao, Aiyou Chen, Pak-Ching Lee


Patent Claims

20 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A computer implemented method for detecting traffic patterns in a data network, the method comprising: partitioning keys of the data network into D sub-keys, wherein each key has sub-keys indexed by $i$, $1 \le i \le D$, D refers to the number of sub-keys in a key, each key having a length of D; constructing D hash arrays, wherein each hash array $i$ includes $M_i$ independent hash tables each having K buckets, with each of the buckets having an associated traffic total, wherein each of the D sub-keys corresponds with one of the D hash arrays, each of the sub-keys $D_i$ through $D_D$ corresponds to one of each of hash arrays $i$ through $D$, each independent hash table $M_i$ corresponding to the $i$th hash array in a sequential hashing scheme, and each of the D sub-keys is associated with one bucket of each of the $M_i$ independent hash tables, wherein K refers to a number of buckets in each hash array, and wherein each of the keys corresponds with a single bucket of each of the $M_i$ independent hash tables; updating a traffic total of each bucket that corresponds with a key responsive to receiving traffic associated with the key; identifying high traffic buckets of the M independent hash tables having a traffic total greater than a threshold value; and detecting traffic patterns of the data network based on the high traffic buckets.
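The data structure and update step of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patentees' implementation: the class name `SequentialHashArrays`, the multiply-add hash modulo the prime 2^61 − 1, and the reading that hash array i indexes the concatenation of the first i sub-key fields (a key prefix, as the concatenation step in claim 3 suggests) are all assumptions.

```python
import random

# Assumption: a multiply-add hash modulo a Mersenne prime stands in for the
# "independent hash tables"; the claim does not name a hash family.
PRIME = (1 << 61) - 1


class SequentialHashArrays:
    """Hypothetical layout: D hash arrays, array i holding M[i] tables of K buckets."""

    def __init__(self, bits, M, K, seed=0):
        self.bits = list(bits)          # b_1..b_D, bit width of each sub-key field
        self.width = sum(self.bits)     # total key length in bits
        self.M = list(M)
        self.K = K
        rng = random.Random(seed)
        # One (a, b) pair per table: M[i] independent hash functions for array i.
        self.hashes = [[(rng.randrange(1, PRIME), rng.randrange(PRIME))
                        for _ in range(m)] for m in self.M]
        # Traffic total of every bucket, initially zero.
        self.totals = [[[0] * K for _ in range(m)] for m in self.M]

    def prefix(self, key, i):
        """Sub-key fields 1..i+1 concatenated (0-based i): the leading bits of key."""
        return key >> (self.width - sum(self.bits[: i + 1]))

    def bucket(self, i, j, value):
        """Bucket index of `value` in table j of array i."""
        a, b = self.hashes[i][j]
        return ((a * value + b) % PRIME) % self.K

    def update(self, key, volume):
        """Add `volume` to the single bucket the key maps to in every table."""
        for i in range(len(self.bits)):
            value = self.prefix(key, i)
            for j in range(self.M[i]):
                self.totals[i][j][self.bucket(i, j, value)] += volume

    def high_buckets(self, threshold):
        """(array, table, bucket) triples whose total exceeds the threshold."""
        return {(i, j, k)
                for i, tables in enumerate(self.totals)
                for j, row in enumerate(tables)
                for k, total in enumerate(row)
                if total > threshold}


if __name__ == "__main__":
    sketch = SequentialHashArrays(bits=[4, 4, 8], M=[2, 2, 2], K=64)
    sketch.update(0xAB12, 1000)          # one heavy 16-bit key
    for key in range(50):                # fifty light keys
        sketch.update(key, 1)
    print(len(sketch.high_buckets(threshold=500)))  # 6: one per table
```

Every table receives every key exactly once, so each table's buckets sum to the total traffic seen; a bucket can only exceed the threshold if the keys hashed into it jointly carry that much traffic.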

2. The computer implemented method of claim 1, wherein detecting the traffic patterns further comprises: detecting high traffic users of the data network based on the high traffic buckets, wherein the high traffic users are keys of the data network having a traffic total exceeding a traffic total threshold.

3. The method of claim 2, wherein detecting the high traffic users further comprises: identifying a candidate set of possible high traffic users based on the high traffic buckets, wherein identifying the candidate set comprises recursively performing, for $1 \le i \le D$ (that is, for each of the D hash arrays), the following steps: concatenating each sub-key $x'$ of a set $C_{i-1}$ of high traffic sub-keys identified for a previously checked hash array $i-1$ with a set of possible bit values from $0$ to $2^{b_i}-1$ to form a set of sub-keys $x''$; checking buckets of the $M_i$ independent hash tables of a presently checked hash array $i$ corresponding to a hash of each of the sub-keys $x''$ to determine whether traffic totals of any of the checked buckets are less than the threshold value; and adding one or more sub-keys $x''$ of the set of sub-keys $x''$ to a set $C_i$ of possible high traffic sub-keys responsive to determining that none of the checked buckets for the one or more sub-keys $x''$ have traffic totals less than the threshold value, wherein the candidate set is based on a set $C_D$ of possible high traffic sub-keys; and analyzing the candidate set to detect the high traffic users; and wherein $x$ refers to a key; $x'$ and $x''$ refer to sub-keys of a key; $C$ and $C_i$ refer to the size of the candidate set of high traffic users, wherein the notation of $C$ with a subscript $i$ denotes the corresponding quantities for the $i$th hash array in a sequential hashing scheme.
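The recursive candidate search of claim 3 can be sketched as below. It is a hedged illustration, not the patented code: the helper names (`make_sketch`, `update`, `candidate_set`), the multiply-add hash, the reading that array i hashes the first i sub-key fields concatenated, and taking $C_0$ to be the single empty prefix are all assumptions.

```python
import random

PRIME = (1 << 61) - 1  # assumption: multiply-add hashing stands in for the hash family


def make_sketch(bits, m, k, seed=0):
    """Hypothetical layout: one array per sub-key field, m tables of k buckets each."""
    rng = random.Random(seed)
    return {
        "bits": list(bits), "k": k,
        "hashes": [[(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(m)]
                   for _ in bits],
        "totals": [[[0] * k for _ in range(m)] for _ in bits],
    }


def buckets(sk, i, value):
    """Bucket index of a prefix value in every table of array i."""
    return [((a * value + b) % PRIME) % sk["k"] for a, b in sk["hashes"][i]]


def update(sk, key, volume):
    """Add traffic for a key to its bucket in every table of every array."""
    width = sum(sk["bits"])
    for i in range(len(sk["bits"])):
        value = key >> (width - sum(sk["bits"][: i + 1]))  # first i+1 fields
        for table, k in zip(sk["totals"][i], buckets(sk, i, value)):
            table[k] += volume


def candidate_set(sk, threshold):
    """Build C_1..C_D level by level and return C_D (full-length keys)."""
    candidates = [0]  # C_0: the empty prefix
    for i, b in enumerate(sk["bits"]):
        next_candidates = []
        for x1 in candidates:                  # x' in C_{i-1}
            for v in range(2 ** b):            # every value 0 .. 2^{b_i} - 1
                x2 = (x1 << b) | v             # x'' = x' concatenated with v
                totals = [table[k] for table, k
                          in zip(sk["totals"][i], buckets(sk, i, x2))]
                # keep x'' only if none of its checked buckets is below threshold
                if all(t >= threshold for t in totals):
                    next_candidates.append(x2)
        candidates = next_candidates
    return candidates
```

Because every bucket on a key's path also counts that key's own traffic, a key whose total is at least the threshold is never pruned, so $C_D$ has no false negatives. A light key reaches $C_D$ only if, in every table of the last array, other traffic hashed to its bucket pushes the total up to the threshold.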

4. The computer implemented method of claim 3, wherein analyzing the candidate set further comprises: performing linear regression on the candidate set to estimate the high traffic users.
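Claim 4 names linear regression but not the model. One plausible reading, sketched here as an assumption rather than the patented procedure, writes one equation per bucket that a candidate hashes to in the last array's tables (bucket total ≈ sum of the volumes of the candidates in that bucket) and solves the least-squares normal equations. The helper names and the small pure-Python Gauss-Jordan solver are illustrative choices that keep the block dependency-free.

```python
import random

PRIME = (1 << 61) - 1  # assumption: multiply-add hashing; helpers are hypothetical


def table_hashes(m, seed=0):
    """m independent (a, b) hash parameters for the tables of the last array."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(m)]


def bucket(h, key, k):
    a, b = h
    return ((a * key + b) % PRIME) % k


def solve(A, y):
    """Solve the square system A x = y by Gauss-Jordan elimination with pivoting."""
    n = len(A)
    aug = [row[:] + [rhs] for row, rhs in zip(A, y)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c] / aug[c][c]
                aug[r] = [x - f * z for x, z in zip(aug[r], aug[c])]
    return [aug[r][n] / aug[r][r] for r in range(n)]


def regress(candidates, totals, hashes, k):
    """Least-squares traffic estimate for every candidate key.

    Each bucket hit by at least one candidate contributes one equation:
        bucket total ~= sum of the volumes of the candidates hashed to it.
    """
    rows, y = [], []
    for j, h in enumerate(hashes):
        members = {}
        for idx, key in enumerate(candidates):
            members.setdefault(bucket(h, key, k), []).append(idx)
        for kk, idxs in members.items():
            rows.append([1.0 if idx in idxs else 0.0 for idx in range(len(candidates))])
            y.append(float(totals[j][kk]))
    n = len(candidates)
    # Normal equations: (A^T A) v = A^T y.
    ata = [[sum(r[a] * r[b] for r in rows) for b in range(n)] for a in range(n)]
    aty = [sum(r[a] * t for r, t in zip(rows, y)) for a in range(n)]
    return dict(zip(candidates, solve(ata, aty)))
```

When two candidates share a bucket in one table but not in the others, the regression apportions the shared total between them instead of crediting it to both. Traffic from keys outside the candidate set is absorbed into the candidates' estimates, so estimates will tend to be inflated when that background is large; the system also becomes singular if two candidates collide in every table.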

5. The computer implemented method of claim 1, wherein detecting the traffic patterns further comprises: detecting significant traffic change users of the data network based on the high traffic buckets, wherein the significant traffic change users are keys of the data network having a change in traffic volume between two monitoring intervals which is greater than or equal to a traffic change threshold.
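Claim 5 does not say how bucket totals from two monitoring intervals are combined. A common approach, assumed here and not confirmed by the claim, keeps one set of tables per interval with identical hash functions; because bucket totals are plain sums, the bucket-wise difference equals what the tables would hold if each key's change had been inserted directly. The class `CountTables` and function `change_buckets` are hypothetical, and only a single array is shown for brevity.

```python
import random

PRIME = (1 << 61) - 1  # assumption: multiply-add hashing stands in for the hash family


class CountTables:
    """Hypothetical set of m tables with k buckets for one monitoring interval."""

    def __init__(self, m, k, seed=0):
        rng = random.Random(seed)  # same seed => same hash functions every interval
        self.hashes = [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(m)]
        self.k = k
        self.totals = [[0] * k for _ in range(m)]

    def update(self, key, volume):
        """Add traffic for a key to its bucket in every table."""
        for j, (a, b) in enumerate(self.hashes):
            self.totals[j][((a * key + b) % PRIME) % self.k] += volume


def change_buckets(before, after, threshold):
    """(table, bucket) pairs whose absolute change between intervals meets threshold."""
    return {(j, kk)
            for j, (r1, r2) in enumerate(zip(before.totals, after.totals))
            for kk, (t1, t2) in enumerate(zip(r1, r2))
            if abs(t2 - t1) >= threshold}
```

Keys whose traffic is unchanged cancel exactly in the difference, so only buckets holding keys whose traffic moved can be flagged; the flagged buckets could then drive a candidate search like the one in claim 6.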

6. The computer implemented method of claim 5, wherein detecting the significant traffic change users further comprises: identifying a candidate set of possible significant traffic change users based on the high traffic buckets, wherein identifying the candidate set comprises recursively performing, for $1 \le i \le D$ (that is, for each of the D hash arrays), the following steps: concatenating each sub-key $x'$ of a set $C_{i-1}$ of high traffic sub-keys identified for a previously checked hash array $i-1$ with a set of possible bit values from $0$ to $2^{b_i}-1$ to form a set of sub-keys $x''$; checking buckets of the $M_i$ independent hash tables of a presently checked hash array $i$ corresponding to a hash of each of the sub-keys $x''$ to determine whether traffic totals of any of the checked buckets are less than the threshold value; and determining a number of misses for each of the set of sub-keys $x''$, wherein the number of misses for a sub-key $x''$ is based on a number of checked buckets corresponding to the sub-key $x''$ that have a traffic total less than the threshold value; adding one or more sub-keys $x''$ of the set of sub-keys $x''$ to a set $C_i$ of possible high traffic sub-keys responsive to determining that none of the checked buckets for the one or more sub-keys $x''$ have traffic totals less than the threshold value, wherein the candidate set is based on a set $C_D$ of possible high traffic sub-keys; and analyzing the candidate set to detect the high traffic users; and wherein $x$ refers to the key; $x'$ and $x''$ refer to sub-keys of the key; $C$ and $C_i$ refer to the size of the candidate set of high traffic users, wherein the notation of $C$ with a subscript $i$ denotes the corresponding quantities for the $i$th hash array in a sequential hashing scheme.
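The miss count of claim 6 can be illustrated with one expansion step from $C_{i-1}$ to $C_i$. This is a sketch under assumptions: `lookup` is a hypothetical accessor returning the totals of the $M_i$ buckets a sub-key hashes to, and `max_misses` is a hypothetical knob. With its default of 0 a sub-key is kept only when none of its buckets is below the threshold, as the claim states; a positive value is an illustrative relaxation, one way to tolerate buckets where another key's opposite change partly cancels the target's change.

```python
from typing import Callable, Dict, List


def count_misses(totals: List[float], threshold: float) -> int:
    """Number of checked buckets whose traffic total is below the threshold."""
    return sum(1 for t in totals if t < threshold)


def expand_level(prev: List[int], b: int, lookup: Callable[[int], List[float]],
                 threshold: float, max_misses: int = 0) -> Dict[int, int]:
    """One recursion step: map each kept sub-key x'' to its miss count.

    prev   -- the sub-keys x' in C_{i-1}
    b      -- b_i, the bit width appended at this level
    lookup -- hypothetical accessor: totals of the M_i buckets for x''
    """
    kept = {}
    for x1 in prev:
        for v in range(2 ** b):
            x2 = (x1 << b) | v                # x'' = x' concatenated with v
            misses = count_misses(lookup(x2), threshold)
            if misses <= max_misses:
                kept[x2] = misses
    return kept
```

For example, expanding $C_{i-1} = \{1\}$ by two bits checks the sub-keys 4 through 7; a sub-key with one low bucket out of three is dropped at the default setting but kept with its miss count when `max_misses=1`.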

7. The computer implemented method of claim 1, wherein the number of the D hash arrays corresponds with the number of bits of the key.
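Reading claim 7 as one bit per sub-key ($b_i = 1$, so D equals the key length in bits) makes the cost of the recursion in claims 3 and 6 easy to count. Writing $X''_i$ for the set of sub-keys $x''$ formed at level $i$ (notation introduced here) and taking $C_0$ to be the single empty prefix (an assumption), each level checks $M_i$ buckets for every element of $X''_i$:

```latex
\begin{aligned}
|X''_i| &= |C_{i-1}|\,2^{b_i}, \\
\text{bucket reads} &= \sum_{i=1}^{D} M_i\,|C_{i-1}|\,2^{b_i}
  \;\overset{b_i = 1}{=}\; 2\sum_{i=1}^{D} M_i\,|C_{i-1}|, \qquad |C_0| = 1.
\end{aligned}
```

With one-bit sub-keys each level only doubles the surviving prefixes, at the price of D hash arrays; wider sub-keys need fewer arrays but expand every surviving prefix into $2^{b_i}$ sub-keys.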

8. The computer implemented method of claim 1, wherein the key corresponds with a source address, a destination address, a source port, a destination port and a protocol of the traffic.
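A key built from the five fields in claim 8 can be represented as one integer and cut into sub-keys. The sketch below assumes IPv4 addresses, a field order of source address, destination address, source port, destination port, protocol, and widths of 32 + 32 + 16 + 16 + 8 = 104 bits; these choices and the helper names `five_tuple_key` and `split_sub_keys` are illustrative, not taken from the patent. Only the standard-library `ipaddress` module is used.

```python
import ipaddress
from typing import List


def five_tuple_key(src: str, dst: str, sport: int, dport: int, proto: int) -> int:
    """Pack an IPv4 5-tuple into one 104-bit key (32 + 32 + 16 + 16 + 8 bits).

    Field order and widths are illustrative assumptions.
    """
    key = int(ipaddress.IPv4Address(src))
    key = (key << 32) | int(ipaddress.IPv4Address(dst))
    key = (key << 16) | sport
    key = (key << 16) | dport
    return (key << 8) | proto


def split_sub_keys(key: int, bits: List[int]) -> List[int]:
    """Cut a key into D sub-keys of widths b_1..b_D, most significant first."""
    width = sum(bits)
    if key >= 1 << width:
        raise ValueError("sub-key widths do not cover the whole key")
    parts, shift = [], width
    for b in bits:
        shift -= b
        parts.append((key >> shift) & ((1 << b) - 1))
    return parts


if __name__ == "__main__":
    key = five_tuple_key("10.0.0.1", "192.168.1.2", 12345, 80, 6)
    # Thirteen 8-bit sub-keys, i.e. D = 13 hash arrays:
    print(split_sub_keys(key, [8] * 13))
    # [10, 0, 0, 1, 192, 168, 1, 2, 48, 57, 0, 80, 6]
```

Choosing byte-wide sub-keys gives D = 13 arrays; splitting the same 104 bits into one-bit sub-keys instead would give D = 104, matching the reading of claim 7 in which D equals the key length in bits.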

9. An apparatus for detecting traffic patterns in a data network, the apparatus comprising: memory storing D hash arrays, wherein each hash array $i$, $1 \le i \le D$, includes $M_i$ independent hash tables each having K buckets, each of the buckets having an associated traffic total, wherein keys of the data network are partitioned into D sub-keys, wherein each key has sub-keys indexed by $i$, $1 \le i \le D$, D refers to the number of sub-keys in a key, each key having a length of D, wherein each of the D sub-keys corresponds with one of the D hash arrays, each of the sub-keys $D_i$ through $D_D$ corresponds to one of each of hash arrays $i$ through $D$, each independent hash table $M_i$ corresponding to the $i$th hash array in a sequential hashing scheme, and each of the D sub-keys is associated with one bucket of each of the $M_i$ independent hash tables, wherein K refers to a number of buckets in each hash array, and each of the keys corresponds with a single bucket of each of the $M_i$ independent hash tables; an interface system to receive traffic associated with a key; and a processing system coupled to the memory and coupled to the interface system, the processing system performs: updating a traffic total of each bucket that corresponds with a key responsive to receiving traffic associated with the key; identifying high traffic buckets of the M independent hash tables having a traffic total greater than a threshold value; and detecting traffic patterns of the data network based on the high traffic buckets.

10. The apparatus of claim 9, wherein the processing system is further adapted to detect high traffic users of the data network based on the high traffic buckets, wherein the high traffic users are keys of the data network having a traffic total exceeding a traffic total threshold.

11. The apparatus of claim 10, wherein the processing system is further adapted to: identify a candidate set of possible high traffic users based on the high traffic buckets, wherein identifying the candidate set comprises recursively performing, for $1 \le i \le D$ (that is, for each of the D hash arrays), the following steps: concatenating each sub-key $x'$ of a set $C_{i-1}$ of high traffic sub-keys identified for a previously checked hash array $i-1$ with a set of possible bit values from $0$ to $2^{b_i}-1$ to form a set of sub-keys $x''$; checking buckets of the $M_i$ independent hash tables of a presently checked hash array $i$ corresponding to a hash of each of the sub-keys $x''$ to determine whether traffic totals of any of the checked buckets are less than the threshold value; and adding one or more sub-keys $x''$ of the set of sub-keys $x''$ to a set $C_i$ of possible high traffic sub-keys responsive to determining that none of the checked buckets for the one or more sub-keys $x''$ have traffic totals less than the threshold value, wherein the candidate set is based on a set $C_D$ of possible high traffic sub-keys; and analyzing the candidate set to detect the high traffic users; and wherein $x$ refers to a key; $x'$ and $x''$ refer to sub-keys of a key; $C$ and $C_i$ refer to the size of the candidate set of high traffic users, wherein the notation of $C$ with a subscript $i$ denotes the corresponding quantities for the $i$th hash array in a sequential hashing scheme.

12. The apparatus of claim 11, wherein the processing system is further adapted to perform linear regression on the candidate set to estimate the high traffic users.

13. The apparatus of claim 9, wherein the processing system is further adapted to detect significant traffic change users of the data network based on the high traffic buckets, wherein the significant traffic change users are keys of the data network having a change in traffic volume between two monitoring intervals which is greater than or equal to a traffic change threshold.

14. The apparatus of claim 13, wherein the processing system is further adapted to: identify a candidate set of possible significant traffic change users based on the high traffic buckets, wherein identifying the candidate set comprises recursively performing, for $1 \le i \le D$ (that is, for each of the D hash arrays), the following steps: concatenating each sub-key $x'$ of a set $C_{i-1}$ of high traffic sub-keys identified for a previously checked hash array $i-1$ with a set of possible bit values from $0$ to $2^{b_i}-1$ to form a set of sub-keys $x''$; checking buckets of the $M_i$ independent hash tables of a presently checked hash array $i$ corresponding to a hash of each of the sub-keys $x''$ to determine whether traffic totals of any of the checked buckets are less than the threshold value; and determining a number of misses for each of the set of sub-keys $x''$, wherein the number of misses for a sub-key $x''$ is based on a number of checked buckets corresponding to the sub-key $x''$ that have a traffic total less than the threshold value; adding one or more sub-keys $x''$ of the set of sub-keys $x''$ to a set $C_i$ of possible high traffic sub-keys responsive to determining that none of the checked buckets for the one or more sub-keys $x''$ have traffic totals less than the threshold value, wherein the candidate set is based on a set $C_D$ of possible high traffic sub-keys; and analyzing the candidate set to detect the high traffic users; and wherein $x$ refers to the key; $x'$ and $x''$ refer to sub-keys of the key; $C$ and $C_i$ refer to the size of the candidate set of high traffic users, wherein the notation of $C$ with a subscript $i$ denotes the corresponding quantities for the $i$th hash array in a sequential hashing scheme.

15. The apparatus of claim 9, wherein the number of the D hash arrays corresponds with the number of bits of the key.

16. The computer implemented method of claim 1, wherein the key corresponds with a source address, a destination address, a source port, a destination port and a protocol of the traffic.

17. A computer implemented method for detecting traffic patterns in a data network, the method comprising: constructing a multi-level hashing structure with D hash arrays, wherein each hash array $i$ includes $M_i$ independent hash tables each having K buckets, each of the K buckets having an associated traffic total; partitioning keys of the data network into D sub-keys, each of the D sub-keys of the keys having a variable length $i$ between 1 and D, the keys having a length of D, with a value of $i$ representing a number of sequential bits $b_i$ of the keys, wherein each of the D sub-keys corresponds with one of the D hash arrays, each of the sub-keys $D_i$ through $D_D$ corresponds to one of each of hash arrays $i$ through $D$, each independent hash table $M_i$ corresponding to the $i$th hash array in a sequential hashing scheme, and each of the D sub-keys is associated with one bucket of each of the $M_i$ independent hash tables of a corresponding hash array $i$, wherein K refers to a number of buckets in each hash array, and wherein each of the keys corresponds with a single bucket of each of the $M_i$ independent hash tables; receiving traffic for a key; identifying sub-keys of the key; updating a traffic total for buckets corresponding to the sub-keys of the key; identifying high traffic buckets of the $M_i$ independent hash tables of each hash array $i$ having a traffic total greater than a threshold value; identifying a first candidate set of possible high traffic users of the data network based on the high traffic buckets; detecting high traffic users of the data network based on the first candidate set, wherein the high traffic users are keys of the data network having a traffic total greater than or equal to a traffic total threshold; identifying a second candidate set of possible significant traffic change users of the data network based on the high traffic buckets; and detecting significant traffic change users of the data network based on the second candidate set, wherein the significant traffic change users are keys of the data network having a change in traffic volume between two monitoring intervals which is greater than or equal to a traffic change threshold.

18. The computer implemented method of claim 17, wherein identifying the first candidate set comprises recursively performing, for $1 \le i \le D$ (that is, for each of the D hash arrays), the following steps: concatenating each sub-key $x'$ of a set $C_{i-1}$ of high traffic sub-keys identified for a previously checked hash array $i-1$ with a set of possible bit values from $0$ to $2^{b_i}-1$ to form a set of sub-keys $x''$; checking buckets of the $M_i$ independent hash tables of a presently checked hash array $i$ corresponding to a hash of each of the sub-keys $x''$ to determine whether traffic totals of any of the checked buckets are less than the threshold value; and adding one or more sub-keys $x''$ of the set of sub-keys $x''$ to a set $C_i$ of possible high traffic sub-keys responsive to determining that none of the checked buckets for the one or more sub-keys $x''$ have traffic totals less than the threshold value, wherein the candidate set is based on a set $C_D$ of possible high traffic sub-keys; and analyzing the candidate set to detect the high traffic users; and wherein $x$ refers to a key; $x'$ and $x''$ refer to sub-keys of a key; $C$ and $C_i$ refer to the size of the candidate set of high traffic users, wherein the notation of $C$ with a subscript $i$ denotes the corresponding quantities for the $i$th hash array in a sequential hashing scheme.

19. The computer implemented method of claim 17, wherein detecting the high traffic users further comprises: performing linear regression on the candidate set to estimate the high traffic users.

20. The computer implemented method of claim 17, wherein identifying the second candidate set comprises recursively performing, for $1 \le i \le D$ (that is, for each of the D hash arrays), the following steps: concatenating each sub-key $x'$ of a set $C_{i-1}$ of high traffic sub-keys identified for a previously checked hash array $i-1$ with a set of possible bit values from $0$ to $2^{b_i}-1$ to form a set of sub-keys $x''$; checking buckets of the $M_i$ independent hash tables of a presently checked hash array $i$ corresponding to a hash of each of the sub-keys $x''$ to determine whether traffic totals of any of the checked buckets are less than the threshold value; and determining a number of misses for each of the set of sub-keys $x''$, wherein the number of misses for a sub-key $x''$ is based on a number of checked buckets corresponding to the sub-key $x''$ that have a traffic total less than the threshold value; adding one or more sub-keys $x''$ of the set of sub-keys $x''$ to a set $C_i$ of possible high traffic sub-keys responsive to determining that none of the checked buckets for the one or more sub-keys $x''$ have traffic totals less than the threshold value, wherein the candidate set is based on a set $C_D$ of possible high traffic sub-keys; and analyzing the candidate set to detect the high traffic users; and wherein $x$ refers to the key; $x'$ and $x''$ refer to sub-keys of the key; $C$ and $C_i$ refer to the size of the candidate set of high traffic users, wherein the notation of $C$ with a subscript $i$ denotes the corresponding quantities for the $i$th hash array in a sequential hashing scheme.

Patent Metadata

Filing Date

Unknown

Publication Date

August 17, 2010
