US-10572329

Methods and systems to identify anomalous behaving components of a distributed computing system

Published: February 25, 2020
Assignee: not available in the USPTO data we have
Inventors: not available in the USPTO data we have
Technical Abstract

Methods and systems described herein are directed to identifying anomalously behaving components of a distributed computing system. The methods and systems collect log messages generated within an observation time window by a set of event sources running in the distributed computing system. For each event source, the frequencies of the various types of event messages generated within the observation time window are determined. A similarity value is then calculated for each pair of event sources, and the similarity values are used to identify clusters of similar event sources of the distributed computing system for various management purposes. Components of the distributed computing system that host event source outliers may be identified as potentially having problems, or the outliers may indicate future problems.
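The frequency-counting step summarized above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: it assumes each log message has already been reduced to a hypothetical `(timestamp, source_id, event_type)` tuple by some event-type analysis (the abstract does not say how), and it treats the observation window as half-open, `[t_start, t_end)`, which is also an assumption.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (timestamp in seconds, event source id, event type).
# Event types are assumed to have been assigned already by some event-type analysis.
Event = Tuple[float, str, str]

def event_type_frequencies(
    events: Iterable[Event], t_start: float, t_end: float
) -> Dict[str, Counter]:
    """Count how often each event type occurs per event source within [t_start, t_end)."""
    freqs: Dict[str, Counter] = defaultdict(Counter)
    for ts, source, event_type in events:
        if t_start <= ts < t_end:  # keep only messages inside the observation window
            freqs[source][event_type] += 1
    return dict(freqs)

events = [
    (0.5, "vm-a", "disk_error"),
    (1.0, "vm-a", "login"),
    (2.0, "vm-b", "login"),
    (99.0, "vm-b", "login"),  # falls outside a [0, 10) window and is ignored
]
freqs = event_type_frequencies(events, 0.0, 10.0)
```

Each resulting `Counter` is one event source's event-type frequency vector, keyed by event type, which is the input the pairwise similarity step needs.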

Patent Claims
24 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method stored in one or more data-storage devices and executed using one or more processors of a computer system to identify anomalous behaving components of a distributed computing system, the method comprising: collecting event messages generated by event sources within an observation time window, the event sources hosted by a number of the components of the distributed computing system; determining frequencies of event types of the event messages within the observation time window; calculating a similarity for each pair of event sources based on the frequencies of the event types; determining event source clusters based on the similarities determined for each pair of event sources; determining a local outlier factor for each event source of each event source cluster; identifying anomalously behaving components of the set of components that host the event sources when a corresponding local outlier factor is greater than a local outlier factor threshold; and migrating virtual machines from one or more server computers having the anomalously behaving components to one or more server computers having normal behaving components.

2. The method of claim 1, wherein the event sources are copies of the same type of event source running on the components.

3. The method of claim 1, wherein determining frequencies of event types within the observation time window comprises: determining an event type of each event message recorded within the observation time window using event type analysis; and counting a number of times each event type occurs within the observation time window, the number of each event type being the frequency of the event type.

4. The method of claim 1, wherein calculating the similarity for each pair of event sources comprises: identifying frequencies of event types of a first event source of the pair of event sources as a first event-type frequency vector; identifying frequencies of event types of a second event source of the pair of event sources as a second event-type frequency vector; and calculating a similarity between the first and second event-type frequency vectors, the similarity being a measure of closeness between the pair of event sources.

5. The method of claim 1, wherein calculating the similarity for each pair of event sources comprises: calculating a first probability distribution of the frequencies of event types generated by a first event source of the pair of event sources; calculating a second probability distribution of frequencies of event types generated by a second event source of the pair of event sources; and calculating an information divergence between the first probability distribution and the second probability distribution, the information divergence being a measure of the similarity between the pair of event sources.

6. The method of claim 1, wherein calculating the similarity for each pair of event sources comprises: calculating a first probability distribution of the frequencies of event types generated by a first event source of the pair of event sources; calculating a second probability distribution of frequencies of event types generated by a second event source of the pair of event sources; and calculating a Jensen-Shannon divergence between the first probability distribution and the second probability distribution, the Jensen-Shannon divergence being a measure of the similarity between the pair of event sources.

7. The method of claim 1, wherein determining the event source clusters comprises: applying hierarchical clustering analysis to the similarities of event sources in order to generate a dendrogram of the event source similarities; and forming the event source clusters for event sources connected by similarities that are greater than a dissimilarity threshold.

8. The method of claim 1, wherein determining the local outlier factor for each event source of each event source cluster comprises: calculating a distance between each pair of the event sources in the event source cluster; calculating a k-th nearest neighbor distance for each event source of the event source cluster; determining a k-distance neighborhood for each event source of the event source cluster based on the k-th nearest neighbor distance of each event source; calculating a local reachability density for each event source based on the k-distance neighborhood of each event source; calculating a local outlier factor for each event source based on the local reachability density of event sources within the k-distance neighborhood; and identifying an event source in the event source cluster as an event source outlier when the local outlier factor of the event source is greater than the local outlier factor threshold.

9. A system to identify anomalous behaving components of a distributed computing system, the system comprising: one or more processors; one or more data-storage devices; and machine-readable instructions stored in the one or more data-storage devices that when executed using the one or more processors controls the system to carry out collecting event messages generated by event sources within an observation time window, the event sources hosted by a number of the components of the distributed computing system; determining frequencies of event types of the event messages within the observation time window; calculating a similarity for each pair of event sources based on the frequencies of the event types; determining event source clusters based on the similarities determined for each pair of event sources; determining a local outlier factor for each event source of each event source cluster; identifying anomalously behaving components of the set of components that host the event sources when a corresponding local outlier factor is greater than a local outlier factor threshold; and migrating virtual machines from one or more server computers having the anomalously behaving components to one or more server computers having normal behaving components.

10. The system of claim 9, wherein the event sources are copies of the same type of event source running on the components.

11. The system of claim 9, wherein determining frequencies of event types within the observation time window comprises, for each event log generated by one of the event sources: determining an event type of each event message recorded within the observation time window using event type analysis; and counting a number of times each event type occurs within the observation time window, the number of each event type being the frequency of the event type.

12. The system of claim 9, wherein calculating the similarity for each pair of event sources comprises: identifying frequencies of event types of a first event source of the pair of event sources as a first event-type frequency vector; identifying frequencies of event types of a second event source of the pair of event sources as a second event-type frequency vector; and calculating a similarity between the first and second event-type frequency vectors, the similarity being a measure of closeness between the pair of event sources.

13. The system of claim 9, wherein calculating the similarity for each pair of event sources comprises: calculating a first probability distribution of the frequencies of event types generated by a first event source of the pair of event sources; calculating a second probability distribution of frequencies of event types generated by a second event source of the pair of event sources; and calculating an information divergence between the first probability distribution and the second probability distribution, the information divergence being a measure of the similarity between the pair of event sources.

14. The system of claim 9, wherein calculating the similarity for each pair of event sources comprises: calculating a first probability distribution of the frequencies of event types generated by a first event source of the pair of event sources; calculating a second probability distribution of frequencies of event types generated by a second event source of the pair of event sources; and calculating a Jensen-Shannon divergence between the first probability distribution and the second probability distribution, the Jensen-Shannon divergence being a measure of the similarity between the pair of event sources.

15. The system of claim 9, wherein determining the event source clusters comprises: applying hierarchical clustering analysis to the similarities of event sources in order to generate a dendrogram of the event source similarities; and forming the event source clusters for event sources connected by similarities that are greater than a dissimilarity threshold.

16. The system of claim 9, wherein determining the local outlier factor for each event source of each event source cluster comprises: calculating a distance between each pair of the event sources in the event source cluster; calculating a k-th nearest neighbor distance for each event source of the event source cluster; determining a k-distance neighborhood for each event source of the event source cluster based on the k-th nearest neighbor distance of each event source; calculating a local reachability density for each event source based on the k-distance neighborhood of each event source; calculating a local outlier factor for each event source based on the local reachability density of event sources within the k-distance neighborhood; and identifying an event source in the event source cluster as an event source outlier when the local outlier factor of the event source is greater than the local outlier factor threshold.

17. A non-transitory computer-readable medium encoded with machine-readable instructions that implement a method carried out by one or more processors of a computer system to perform the operations of collecting event messages generated by event sources within an observation time window, the event sources hosted by a number of the components of the distributed computing system; determining frequencies of event types of the event messages within the observation time window; calculating a similarity for each pair of event sources based on the frequencies of the event types; determining event source clusters based on the similarities determined for each pair of event sources; determining a local outlier factor for each event source of each event source cluster; identifying anomalously behaving components of the set of components that host the event sources when a corresponding local outlier factor is greater than a local outlier factor threshold; and migrating virtual machines from one or more server computers having the anomalously behaving components to one or more server computers having normal behaving components.

18. The medium of claim 17, wherein the event sources are copies of the same type of event source running on the components.

19. The medium of claim 17, wherein determining frequencies of event types within the observation time window comprises, for each event log generated by one of the event sources: determining an event type of each event message recorded within the observation time window using event type analysis; and counting a number of times each event type occurs within the observation time window, the number of each event type being the frequency of the event type.

20. The medium of claim 17, wherein calculating the similarity for each pair of event sources comprises: identifying frequencies of event types of a first event source of the pair of event sources as a first event-type frequency vector; identifying frequencies of event types of a second event source of the pair of event sources as a second event-type frequency vector; and calculating a similarity between the first and second event-type frequency vectors, the similarity being a measure of closeness between the pair of event sources.

21. The medium of claim 17, wherein calculating the similarity for each pair of event sources comprises: calculating a first probability distribution of the frequencies of event types generated by a first event source of the pair of event sources; calculating a second probability distribution of frequencies of event types generated by a second event source of the pair of event sources; and calculating an information divergence between the first probability distribution and the second probability distribution, the information divergence being a measure of the similarity between the pair of event sources.

22. The medium of claim 17, wherein calculating the similarity for each pair of event sources comprises: calculating a first probability distribution of the frequencies of event types generated by a first event source of the pair of event sources; calculating a second probability distribution of frequencies of event types generated by a second event source of the pair of event sources; and calculating a Jensen-Shannon divergence between the first probability distribution and the second probability distribution, the Jensen-Shannon divergence being a measure of the similarity between the pair of event sources.

23. The medium of claim 17, wherein determining the event source clusters comprises: applying hierarchical clustering analysis to the similarities of event sources in order to generate a dendrogram of the event source similarities; and forming the event source clusters for event sources connected by similarities that are greater than a dissimilarity threshold.

24. The medium of claim 17, wherein determining the local outlier factor for each event source of each event source cluster comprises: calculating a distance between each pair of the event sources in the event source cluster; calculating a k-th nearest neighbor distance for each event source of the event source cluster; determining a k-distance neighborhood for each event source of the event source cluster based on the k-th nearest neighbor distance of each event source; calculating a local reachability density for each event source based on the k-distance neighborhood of each event source; calculating a local outlier factor for each event source based on the local reachability density of event sources within the k-distance neighborhood; and identifying an event source in the event source cluster as an event source outlier when the local outlier factor of the event source is greater than the local outlier factor threshold.
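Claims 4, 12 and 20 compare two event-type frequency vectors by a "measure of closeness" without naming one. Cosine similarity is a common choice for count vectors, and the short Python sketch below uses it purely as an assumption; the function name and the convention of returning 0.0 for an empty vector are likewise hypothetical.

```python
import math
from collections import Counter

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse event-type frequency vectors.

    1.0 means the two event sources emit event types in the same proportions;
    0.0 means they share no event types at all.
    """
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())  # only shared event types contribute
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention for a source with no events in the window
    return dot / (norm_a * norm_b)
```

Because cosine similarity ignores vector length, a busy host and a quiet host that log the same mix of event types score as identical, which suits comparing copies of the same event source (claims 2, 10 and 18) under uneven load.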
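Claims 5 and 6 (and their counterparts 13, 14, 21 and 22) turn each event source's event-type counts into a probability distribution and compare pairs of sources with an information divergence, specifically the Jensen-Shannon divergence in claim 6. A minimal stdlib-only Python sketch of that computation follows. The function names are hypothetical, base-2 logarithms are an assumption (they bound the divergence to the range 0 to 1), and a source with no events in the window is simply rejected.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

def to_distribution(freqs: Counter, event_types: Sequence[str]) -> List[float]:
    """Normalize one source's event-type counts into probabilities over event_types."""
    total = sum(freqs[t] for t in event_types)
    if total == 0:
        raise ValueError("event source has no events in the observation window")
    return [freqs[t] / total for t in event_types]

def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence D(p || q) in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jensen_shannon(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in bits: 0 for identical distributions, at most 1."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture; m_i > 0 wherever p_i > 0 or q_i > 0
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def divergence_matrix(freqs: Dict[str, Counter]) -> Tuple[List[str], List[List[float]]]:
    """Pairwise Jensen-Shannon divergences between all event sources."""
    sources = sorted(freqs)
    event_types = sorted({t for counts in freqs.values() for t in counts})
    dists = {s: to_distribution(freqs[s], event_types) for s in sources}
    matrix = [[jensen_shannon(dists[a], dists[b]) for b in sources] for a in sources]
    return sources, matrix
```

The divergence behaves like a distance (small means similar). If a similarity score is preferred, 1 minus the base-2 divergence is one option; the claims do not fix a particular conversion.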
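Claims 7, 15 and 23 form event source clusters by hierarchical clustering and cutting the resulting dendrogram at a threshold. The Python sketch below is one way to do that under stated assumptions: it treats a pairwise divergence matrix as distances, uses single linkage (the claims do not name a linkage rule), and stops merging once the closest pair of clusters lies above a hypothetical `threshold`, which is equivalent to cutting the dendrogram at that height.

```python
from typing import List, Sequence

def single_linkage_clusters(dist: Sequence[Sequence[float]], threshold: float) -> List[List[int]]:
    """Agglomerative single-linkage clustering of points 0..n-1.

    Repeatedly merges the two closest clusters (cluster distance = smallest
    pairwise distance between their members) and stops when that distance
    exceeds `threshold`, i.e. it cuts the dendrogram at height `threshold`.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None  # (distance, index_a, index_b) of the closest cluster pair so far
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break  # every remaining merge would sit above the cut height
        clusters[a].extend(clusters.pop(b))  # b > a, so index a stays valid after pop
    return [sorted(c) for c in clusters]

# Two tight pairs of event sources that are far from each other.
D = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.1],
    [0.9, 0.9, 0.1, 0.0],
]
clusters = single_linkage_clusters(D, 0.5)
```

With single linkage, cutting at `threshold` yields the connected components of the graph that links every pair whose distance is at most `threshold`; average or complete linkage would produce tighter clusters at the cost of more bookkeeping.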
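Claims 8, 16 and 24 list the local outlier factor (LOF) steps: k-distance, k-distance neighborhood, local reachability density, and the LOF itself. The sketch below follows the standard LOF formulation (Breunig et al., 2000), which these steps appear to match, and works directly on a pairwise distance matrix for one cluster. The choice of `k` and of the outlier threshold is left open by the claims; exact duplicate points can produce an infinite density and a NaN score, which this sketch does not guard against.

```python
from typing import List, Sequence

def local_outlier_factors(dist: Sequence[Sequence[float]], k: int) -> List[float]:
    """Local outlier factor of each point, from a full pairwise distance matrix.

    Scores near 1 mean a point is about as dense as its neighbours; scores
    well above 1 mark it as an outlier. Requires 1 <= k < number of points.
    """
    n = len(dist)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")
    # k-distance: distance to the k-th nearest neighbour (excluding the point itself)
    k_dist = [sorted(dist[p][o] for o in range(n) if o != p)[k - 1] for p in range(n)]
    # k-distance neighbourhood: every other point no farther than the k-distance (ties kept)
    neigh = [[o for o in range(n) if o != p and dist[p][o] <= k_dist[p]] for p in range(n)]
    # local reachability density: inverse of the mean reachability distance to the neighbours
    lrd: List[float] = []
    for p in range(n):
        reach = [max(k_dist[o], dist[p][o]) for o in neigh[p]]
        mean_reach = sum(reach) / len(reach)
        lrd.append(float("inf") if mean_reach == 0 else 1.0 / mean_reach)
    # LOF: mean ratio of the neighbours' densities to the point's own density
    return [sum(lrd[o] for o in neigh[p]) / (len(neigh[p]) * lrd[p]) for p in range(n)]

# Four evenly spaced event sources and one far away, on a 1-D toy scale.
pts = [0, 1, 2, 3, 20]
lof = local_outlier_factors([[abs(a - b) for b in pts] for a in pts], k=2)
```

In this toy example the first four points score exactly 1 and the isolated point scores 35/3 (about 11.7), so any threshold between those values would flag only the last event source.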
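The final steps of independent claims 1, 9 and 17 map outlier event sources back to the components hosting them and migrate virtual machines off the affected servers. The Python sketch below only produces a migration plan; it does not call any real virtualization API. The mappings `host_of` and `vms_on`, the function name, and the naive round-robin placement onto normally behaving servers are all hypothetical.

```python
from typing import Dict, List, Tuple

def plan_migrations(
    lof: Dict[str, float],         # local outlier factor per event source
    host_of: Dict[str, str],       # event source -> server hosting it
    vms_on: Dict[str, List[str]],  # server -> virtual machines currently on it
    lof_threshold: float,
) -> List[Tuple[str, str, str]]:
    """Return (vm, from_server, to_server) moves off servers hosting outlier event sources."""
    anomalous = {host_of[s] for s, score in lof.items() if score > lof_threshold}
    normal = sorted(set(vms_on) - anomalous)
    if anomalous and not normal:
        raise RuntimeError("no normally behaving server is available as a target")
    moves: List[Tuple[str, str, str]] = []
    i = 0
    for server in sorted(anomalous):
        for vm in vms_on.get(server, []):
            target = normal[i % len(normal)]  # naive round-robin; ignores capacity
            moves.append((vm, server, target))
            i += 1
    return moves

moves = plan_migrations(
    {"src-1": 1.02, "src-2": 3.4, "src-3": 0.97},
    {"src-1": "esx-1", "src-2": "esx-2", "src-3": "esx-3"},
    {"esx-1": ["vm-a"], "esx-2": ["vm-b", "vm-c"], "esx-3": []},
    lof_threshold=1.5,
)
```

A production placement step would also weigh CPU, memory and affinity constraints; round-robin is used here only to keep the sketch short.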

Patent Metadata

Filing Date

December 12, 2016

Publication Date

February 25, 2020

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “Methods and systems to identify anomalous behaving components of a distributed computing system” (US-10572329). https://patentable.app/patents/US-10572329

© 2026 Patentable. All rights reserved.