Fingerprinting Application Traffic in a Network

Published: June 22, 2021

Assignee: not available in the USPTO data

Inventors: Andrey Kvasyuk, Hazim Hashim Dahir, Robert Bukofser, Saad Syed Hasan

Patent Claims

20 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method comprising: obtaining, by a device, telemetry data regarding a plurality of traffic flows in a network; forming, by the device, a directed graph based on the telemetry data, wherein nodes of the directed graph represent devices in the network; simulating, by the device and based at least on a probability of a traffic flow between a particular starting node of the directed graph and one or more neighbor nodes, traffic for one or more of the devices by performing random walks on the directed graph starting at the particular starting node to generate a set of trails, each trail representing a sequence of one or more flows from the particular starting node to the one or more neighbor nodes that result from the random walks on the directed graph; clustering, by the device, the set of trails to form one or more clusters; generating, by the device, an application fingerprint for an application based on one of the one or more clusters; and using, by the device, the application fingerprint to identify traffic in the network as associated with the application.
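The graph-forming and random-walk steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the telemetry records, the flow labels, the use of observed flow counts as walk weights, and the names `build_graph`, `random_walk`, and `simulate` are all assumptions made for illustration.

```python
import random
from collections import defaultdict

# Hypothetical telemetry records: (source device, destination device, flow label).
FLOWS = [
    ("client", "web", "tcp/443"),
    ("client", "web", "tcp/443"),
    ("client", "dns", "udp/53"),
    ("web", "db", "tcp/5432"),
]

def build_graph(flows):
    """Directed graph: graph[src][(dst, label)] = number of observed flows."""
    graph = defaultdict(lambda: defaultdict(int))
    for src, dst, label in flows:
        graph[src][(dst, label)] += 1
    return graph

def random_walk(graph, start, max_hops, rng):
    """Return one trail: the sequence of flows followed from `start`."""
    trail, node = [], start
    for _ in range(max_hops):
        edges = graph.get(node)
        if not edges:  # no outbound flows, so the walk ends here
            break
        choices = list(edges)
        # Assumption: the chance of following a flow is proportional to its count.
        weights = [edges[c] for c in choices]
        dst, label = rng.choices(choices, weights=weights, k=1)[0]
        trail.append((node, dst, label))
        node = dst
    return trail

def simulate(graph, start, n_walks=100, max_hops=5, seed=0):
    """Generate a set of trails by repeated random walks from one starting node."""
    rng = random.Random(seed)
    return [random_walk(graph, start, max_hops, rng) for _ in range(n_walks)]

graph = build_graph(FLOWS)
trails = simulate(graph, "client")
```

Each trail here is a list of `(source, destination, label)` hops, so later steps such as clustering can work on the label sequence alone.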

2. The method as in claim 1, wherein simulating traffic for one or more of the devices by performing random walks on the directed graph to generate a set of trails comprises: using a Markov Chain Monte Carlo model to determine the probability of a traffic flow between the particular starting node and the one or more neighbor nodes.

3. The method as in claim 1, wherein clustering the set of trails to form one or more clusters comprises: transforming the trails into sequences of terms by using a dictionary lexicon to represent each flow in a trail as a code; converting the codes in each trail into a vector representation; and applying clustering to the vector representations of the trails.
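The three steps of claim 3 (lexicon encoding, vectorization, clustering) can be sketched as follows. The claim does not name a vector representation or a clustering algorithm, so the bag-of-codes counts and the greedy cosine-similarity grouping below are assumptions, as are the sample trails and all function names.

```python
import math

# Hypothetical trails: each trail is a list of flow labels.
TRAILS = [
    ["udp/53", "tcp/443"],
    ["udp/53", "tcp/443", "tcp/443"],
    ["tcp/22"],
    ["tcp/22", "tcp/22"],
]

def build_lexicon(trails):
    """Assign each distinct flow label a short code such as 'F0'."""
    lexicon = {}
    for trail in trails:
        for flow in trail:
            lexicon.setdefault(flow, f"F{len(lexicon)}")
    return lexicon

def vectorize(trail, lexicon):
    """Bag-of-codes vector: how often each code occurs in the trail."""
    order = sorted(lexicon.values(), key=lambda code: int(code[1:]))
    codes = [lexicon[flow] for flow in trail]
    return [codes.count(code) for code in order]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def cluster(vectors, threshold=0.8):
    """Greedy grouping: join the first cluster whose seed vector is similar enough."""
    clusters = []  # list of (seed vector, member indices)
    for i, vec in enumerate(vectors):
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((vec, [i]))
    return [members for _, members in clusters]

lexicon = build_lexicon(TRAILS)
vectors = [vectorize(t, lexicon) for t in TRAILS]
groups = cluster(vectors)
```

With this sample data the two DNS-then-HTTPS trails fall into one group and the two SSH trails into another. A production system would more likely use an established clustering method; the greedy pass is only the shortest way to show the data flow.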

4. The method as in claim 3, further comprising: sending a visualization of the one or more clusters to a user interface.

5. The method as in claim 1, wherein generating the application fingerprint for the application based on one of the one or more clusters comprises: computing a conditional probability distribution based on flows associated with that cluster.
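One way to read the conditional probability distribution of claim 5 is as the probability of the next flow given the current flow, estimated over the trails of a single cluster. The sketch below takes that reading as an assumption; the claim does not state which variables are conditioned on, and the sample trails and the name `conditional_distribution` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical trails belonging to one cluster, as lists of flow labels.
CLUSTER_TRAILS = [
    ["udp/53", "tcp/443"],
    ["udp/53", "tcp/443", "tcp/5432"],
    ["udp/53", "tcp/80"],
]

def conditional_distribution(trails):
    """Estimate P(next flow | current flow) from consecutive flows in the trails."""
    counts = defaultdict(Counter)
    for trail in trails:
        for current, nxt in zip(trail, trail[1:]):
            counts[current][nxt] += 1
    return {
        current: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for current, followers in counts.items()
    }

fingerprint = conditional_distribution(CLUSTER_TRAILS)
```

In this sample, a DNS lookup is followed by `tcp/443` in two of three trails and by `tcp/80` in the third, so the distribution for `udp/53` assigns them 2/3 and 1/3.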

6. The method as in claim 5, wherein generating the application fingerprint for the application based on one of the one or more clusters comprises: calculating a prevalence for the cluster by multiplying a number of trails in that cluster by a number of unique client devices associated with that cluster.
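The prevalence calculation in claim 6 is a single product, shown here as a small worked example. The record layout (a `client` field per trail) and the sample addresses are assumptions for illustration.

```python
# Hypothetical cluster: each trail is tagged with the client device that started it.
CLUSTER = [
    {"client": "10.0.0.5", "trail": ["udp/53", "tcp/443"]},
    {"client": "10.0.0.5", "trail": ["udp/53", "tcp/443"]},
    {"client": "10.0.0.9", "trail": ["udp/53", "tcp/443", "tcp/5432"]},
]

def prevalence(cluster):
    """Number of trails in the cluster times the number of unique client devices."""
    n_trails = len(cluster)
    n_clients = len({entry["client"] for entry in cluster})
    return n_trails * n_clients

score = prevalence(CLUSTER)  # 3 trails * 2 unique clients = 6
```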

7. The method as in claim 1, further comprising: simulating, by the device, backwards traffic by performing random walks on the directed graph starting from nodes in the graph that represent devices with no outbound connections, to generate a set of backwards trails; clustering the set of backwards trails into one or more clusters; and using one of the clusters of backwards trails to identify a source device of traffic in the network.
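The backwards simulation in claim 7 can be sketched by reversing the edges and starting walks at nodes with no outbound connections. The edge list, the choice of predecessors in proportion to how often they appear, and the function names are assumptions; the clustering step is omitted here.

```python
import random
from collections import defaultdict

# Hypothetical directed edges: (source device, destination device).
EDGES = [("client1", "web"), ("client2", "web"), ("web", "db"), ("client1", "dns")]

def sinks(edges):
    """Nodes that never appear as a source, i.e. have no outbound connections."""
    sources = {src for src, _ in edges}
    nodes = sources | {dst for _, dst in edges}
    return sorted(nodes - sources)

def reverse_graph(edges):
    """Map each node to the list of nodes that send traffic to it."""
    rev = defaultdict(list)
    for src, dst in edges:
        rev[dst].append(src)
    return rev

def backward_walk(rev, start, max_hops, rng):
    """Walk against edge direction until a node with no inbound flows is reached."""
    trail, node = [start], start
    for _ in range(max_hops):
        preds = rev.get(node)
        if not preds:
            break
        node = rng.choice(preds)  # repeated predecessors are proportionally likelier
        trail.append(node)
    return trail

rng = random.Random(0)
rev = reverse_graph(EDGES)
back_trails = [backward_walk(rev, s, 5, rng) for s in sinks(EDGES) for _ in range(10)]
```

The last node of each backwards trail is a candidate source device; in this sample every trail ends at `client1` or `client2`.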

8. The method as in claim 1, further comprising: measuring a frequency of each flow within a time interval; and applying fast independent component analysis to the frequencies of the flows within the time interval, to generate an application fingerprint.
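The sketch below covers only the first step of claim 8, measuring the frequency of each flow per time interval; the independent component analysis step is not shown. The event format, the fixed 10-second interval, and the name `frequency_matrix` are assumptions. The resulting matrix (one row per interval, one column per flow) is the kind of input an ICA routine would typically expect.

```python
from collections import Counter

# Hypothetical flow events: (timestamp in seconds, flow label).
EVENTS = [
    (0.5, "udp/53"),
    (1.2, "tcp/443"),
    (1.9, "tcp/443"),
    (12.0, "udp/53"),
    (25.0, "tcp/443"),
]

def frequency_matrix(events, interval):
    """Count occurrences of each flow label in consecutive fixed-width intervals."""
    labels = sorted({label for _, label in events})
    n_bins = int(max(t for t, _ in events) // interval) + 1
    counts = Counter((int(t // interval), label) for t, label in events)
    matrix = [[counts[(b, label)] for label in labels] for b in range(n_bins)]
    return labels, matrix

labels, matrix = frequency_matrix(EVENTS, interval=10.0)
```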

9. An apparatus, comprising: one or more network interfaces to communicate with one or more networks; a processor coupled to the network interfaces and configured to execute one or more processes; and a memory configured to store a process executable by the processor, the process when executed configured to: obtain telemetry data regarding a plurality of traffic flows in a network; form a directed graph based on the telemetry data, wherein nodes of the directed graph represent devices in the network; simulate, based at least on a probability of a traffic flow between a particular starting node of the directed graph and one or more neighbor nodes, traffic for one or more of the devices by performing random walks on the directed graph starting at the particular starting node to generate a set of trails, each trail representing a sequence of one or more flows from the particular starting node to the one or more neighbor nodes that result from the random walks on the directed graph; cluster the set of trails to form one or more clusters; generate an application fingerprint for an application based on one of the one or more clusters; and use the application fingerprint to identify traffic in the network as associated with the application.

10. The apparatus as in claim 9, wherein the apparatus simulates traffic for one or more of the devices by performing random walks on the directed graph to generate a set of trails by: using a Markov Chain Monte Carlo model to determine the probability of a traffic flow between the particular starting node and the one or more neighbor nodes.

11. The apparatus as in claim 9, wherein the apparatus clusters the set of trails to form one or more clusters by: transforming the trails into sequences of terms by using a dictionary lexicon to represent each flow in a trail as a code; converting the codes in each trail into a vector representation; and applying clustering to the vector representations of the trails.

12. The apparatus as in claim 11, wherein the process when executed is further configured to: send a visualization of the one or more clusters to a user interface.

13. The apparatus as in claim 9, wherein the apparatus generates the application fingerprint for the application based on one of the one or more clusters by: computing a conditional probability distribution based on flows associated with that cluster.

14. The apparatus as in claim 13, wherein the apparatus generates the application fingerprint for the application based on one of the one or more clusters by: calculating a prevalence for the cluster by multiplying a number of trails in that cluster by a number of unique client devices associated with that cluster.

15. The apparatus as in claim 9, wherein the process when executed is further configured to: simulate backwards traffic by performing random walks on the directed graph starting from nodes in the graph that represent devices with no outbound connections, to generate a set of backwards trails; cluster the set of backwards trails into one or more clusters; and use one of the clusters of backwards trails to identify a source device of traffic in the network.

16. The apparatus as in claim 9, wherein the process when executed is further configured to: measure a frequency of each flow within a time interval; and apply fast independent component analysis to the frequencies of the flows within the time interval, to generate an application fingerprint.

17. A tangible, non-transitory, computer-readable medium storing program instructions that cause a device to execute a process comprising: obtaining, by a device, telemetry data regarding a plurality of traffic flows in a network; forming, by the device, a directed graph based on the telemetry data, wherein nodes of the directed graph represent devices in the network; simulating, by the device and based at least on a probability of a traffic flow between a particular starting node of the directed graph and one or more neighbor nodes, traffic for one or more of the devices by performing random walks on the directed graph starting at the particular starting node to generate a set of trails, each trail representing a sequence of one or more flows from the particular starting node to the one or more neighbor nodes that result from the random walks on the directed graph; clustering, by the device, the set of trails to form one or more clusters; generating, by the device, an application fingerprint for an application based on one of the one or more clusters; and using, by the device, the application fingerprint to identify traffic in the network as associated with the application.

18. The computer-readable medium as in claim 17, wherein simulating traffic for one or more of the devices by performing random walks on the directed graph to generate a set of trails comprises: using a Markov Chain Monte Carlo model to determine the probability of a traffic flow between the particular starting node and the one or more neighbor nodes.

19. The computer-readable medium as in claim 17, wherein clustering the set of trails to form one or more clusters comprises: transforming the trails into sequences of terms by using a dictionary lexicon to represent each flow in a trail as a code; converting the codes in each trail into a vector representation; and applying clustering to the vector representations of the trails.

20. The computer-readable medium as in claim 17, wherein generating the application fingerprint for the application based on one of the one or more clusters comprises: computing a conditional probability distribution based on flows associated with that cluster.

Patent Metadata

Filing Date: Unknown

