Network Monitoring Tool for Supercomputers

Published: February 11, 2020

Assignee: not available in USPTO data we have

Patent Claims

20 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method for monitoring a supercomputer nodes network:
   monitoring, by an application monitoring module of a network monitoring device, communication messages between a plurality of processes being executed by a plurality of supercomputer nodes;
   generating, by the application monitoring module of the network monitoring device, a virtual network topology containing a plurality of virtual communication links between the plurality of processes being executed by the plurality of supercomputer nodes based upon the monitoring of the communication messages;
   determining, by the application monitoring module of the network monitoring device, a number of the communication messages being transmitted on each of the plurality of virtual communication links and a bandwidth value for each of the plurality of virtual communication links;
   monitoring, by a traffic monitoring module of the network monitoring device, network traffic in a plurality of communication links interconnecting the plurality of supercomputer nodes;
   generating, by the traffic monitoring module of the network monitoring device, a global networking view of the network traffic of the plurality of the supercomputer nodes and the interconnecting plurality of communication links;
   receiving, by a topology mapping module of the network monitoring device, an API call for mapping a new application to the plurality of supercomputer nodes; and
   mapping, by the topology mapping module of the network monitoring device, the new application to the plurality of supercomputer nodes that are currently available based upon the virtual network topology and the global networking view of the network traffic.
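
As an illustration only (not taken from the patent), the virtual network topology of claim 1 could be sketched as a table of process-to-process links, each carrying a message count and a bandwidth value. The `Message` record, its field names, the `build_virtual_topology` function, and the fixed observation window are all assumptions made for this sketch; the claim does not specify how bandwidth is computed.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """One observed message between two processes (hypothetical record)."""
    src: int         # rank of the sending process
    dst: int         # rank of the receiving process
    size_bytes: int  # payload size


def build_virtual_topology(messages, window_seconds):
    """Summarize observed messages into virtual links.

    Returns {(a, b): {"messages": n, "bandwidth_bps": bits_per_second}},
    where (a, b) is an undirected link with a <= b. Bandwidth is the
    assumed average over the observation window.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    counts = defaultdict(int)
    volume = defaultdict(int)
    for m in messages:
        link = (min(m.src, m.dst), max(m.src, m.dst))
        counts[link] += 1
        volume[link] += m.size_bytes
    return {
        link: {
            "messages": counts[link],
            "bandwidth_bps": volume[link] * 8 / window_seconds,
        }
        for link in counts
    }


observed = [Message(0, 1, 1000), Message(1, 0, 500), Message(2, 3, 250)]
topology = build_virtual_topology(observed, window_seconds=2.0)
```

Treating links as undirected is a simplification; a directed variant would key on `(src, dst)` without sorting.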

2. The method according to claim 1, further comprising: selecting, by the network monitoring device, one or more supercomputer nodes of the plurality of supercomputer nodes having lowest network traffic to execute one or more processes of the plurality of processes.
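
A minimal sketch of the selection in claim 2, assuming per-node traffic is already available as a mapping from node name to a traffic figure. The function name, the node names, and the tie-breaking rule are hypothetical.

```python
import heapq


def select_least_loaded_nodes(node_traffic, count):
    """Return the `count` node names with the lowest traffic.

    node_traffic maps node name -> observed traffic (any consistent unit).
    Ties are broken by node name so the result is deterministic.
    """
    return heapq.nsmallest(
        count, node_traffic, key=lambda node: (node_traffic[node], node)
    )


traffic = {"n0": 40, "n1": 5, "n2": 12, "n3": 5}
chosen = select_least_loaded_nodes(traffic, 2)
```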

3. The method according to claim 1, further comprising: determining, by the traffic monitoring module, congestion in the network based on the network traffic in the plurality of communication links to identify one or more hot spots.
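
One possible reading of the hot-spot detection in claim 3 is a utilization threshold per physical link. The sketch below assumes that reading; the 0.8 threshold, the capacity table, and the function name are illustrative assumptions, since the claim does not define how congestion is measured.

```python
def find_hot_spots(link_traffic, link_capacity, threshold=0.8):
    """Return links whose utilization meets or exceeds `threshold`.

    link_traffic and link_capacity map a link (pair of endpoint names)
    to traffic and capacity in the same unit. The result is a list of
    (link, utilization) pairs, most congested first.
    """
    hot = []
    for link, traffic in link_traffic.items():
        utilization = traffic / link_capacity[link]
        if utilization >= threshold:
            hot.append((link, utilization))
    return sorted(hot, key=lambda item: item[1], reverse=True)


traffic = {("s1", "s2"): 90, ("s1", "n0"): 10, ("s2", "n1"): 85}
capacity = {("s1", "s2"): 100, ("s1", "n0"): 100, ("s2", "n1"): 100}
hot_spots = find_hot_spots(traffic, capacity)
```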

4. The method according to claim 1, further comprising: determining, by the traffic monitoring module, one or more supercomputer nodes of the plurality of supercomputer nodes currently being utilized by one or more applications based on the network traffic.

5. The method according to claim 1, wherein each of the plurality of supercomputer nodes is connected to one or more switches, and wherein the network monitoring device is tapped into the one or more switches of the plurality of supercomputer nodes to monitor the network.

6. The method according to claim 5, wherein the one or more switches are utilized by the plurality of supercomputer nodes to build one or more network topologies comprising at least one of a fat-tree, a 2D mesh, a 2D/3D torus, and a Dragonfly.

7. The method according to claim 6, wherein the one or more switches comprises a management tool to monitor and aggregate data associated with parameters of the one or more switches and the plurality of supercomputer nodes, and wherein the data comprises network traffic characteristics, physical information, health counters, and error counters.

8. The method according to claim 7, wherein the data is aggregated per application, per specific fabric tenant server group, or per switch ports of the one or more switches and the plurality of supercomputer nodes.
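
The aggregation options in claim 8 can be pictured as grouping the same counter samples by different keys. The sample layout, the key names (`application`, `tenant_group`, `switch_port`), and the counter names below are hypothetical and are not the format of any particular switch management tool.

```python
from collections import defaultdict


def aggregate(samples, group_key):
    """Sum every counter in `samples`, grouped by sample[group_key]."""
    totals = defaultdict(lambda: defaultdict(int))
    for sample in samples:
        group = sample[group_key]
        for counter, value in sample["counters"].items():
            totals[group][counter] += value
    # Convert nested defaultdicts to plain dicts for a clean result.
    return {group: dict(counters) for group, counters in totals.items()}


samples = [
    {"application": "app_a", "tenant_group": "t1", "switch_port": "s1/1",
     "counters": {"bytes": 100, "errors": 0}},
    {"application": "app_a", "tenant_group": "t1", "switch_port": "s1/2",
     "counters": {"bytes": 50, "errors": 1}},
    {"application": "app_b", "tenant_group": "t2", "switch_port": "s1/1",
     "counters": {"bytes": 70, "errors": 0}},
]
per_application = aggregate(samples, "application")
per_port = aggregate(samples, "switch_port")
```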

9. The method according to claim 1, further comprising: determining, by the network monitoring device, a number of hops separating any two supercomputer nodes of the plurality of supercomputer nodes.
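
A common way to compute the hop count of claim 9 is a breadth-first search over the physical topology. The sketch below assumes the topology is given as an adjacency map that includes both nodes and switches, and counts each traversed link as one hop; the patent text shown here does not say which counting convention it uses.

```python
from collections import deque


def hop_count(adjacency, source, target):
    """Return the fewest links between source and target, or None if
    they are not connected. adjacency maps a vertex to its neighbors."""
    if source == target:
        return 0
    visited = {source}
    queue = deque([(source, 0)])
    while queue:
        vertex, distance = queue.popleft()
        for neighbor in adjacency.get(vertex, ()):
            if neighbor == target:
                return distance + 1
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, distance + 1))
    return None


# Hypothetical fabric: two switches (s1, s2) and three nodes (n0, n1, n2).
fabric = {
    "n0": ["s1"],
    "n2": ["s1"],
    "s1": ["n0", "n2", "s2"],
    "s2": ["s1", "n1"],
    "n1": ["s2"],
}
```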

10. The method according to claim 1, further comprising: generating, by the network monitoring device, a graphical user interface on an analyst computing device to display the global networking view of the network traffic showing available supercomputer nodes and currently busy supercomputer nodes of the plurality of supercomputer nodes.

11. A network monitoring device for monitoring a network between supercomputer nodes comprising:
   a non-transitory storage medium configured to store one or more computer program instructions;
   a processor configured to execute the one or more computer program instructions to implement:
      an application monitoring module configured to:
         monitor communication messages between a plurality of processes being executed by a plurality of supercomputer nodes;
         generate a virtual network topology containing a plurality of virtual communication links between the plurality of processes being executed by the plurality of supercomputer nodes; and
         determine a number of communication messages being transmitted on each of the plurality of virtual communication links and a bandwidth value for each of the plurality of virtual communication links;
      a traffic monitoring module configured to:
         monitor network traffic in a plurality of communication links interconnecting the plurality of supercomputer nodes; and
         generate a global networking view of the plurality of the supercomputer nodes and the interconnecting communication links; and
      a topology mapping module configured to:
         receive an API call for mapping a new application to the plurality of supercomputer nodes; and
         map the new application to the plurality of supercomputer nodes that are currently available based upon the virtual network topology and the global networking view of the network traffic.
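
The claim does not state how the topology mapping module chooses nodes. As one hedged possibility, a greedy heuristic could place heavily communicating processes on nearby free nodes. In the sketch below, `map_application`, the one-process-per-node rule, the cost function (bandwidth times hops), and the `hops` callable are all assumptions; `free_nodes` stands in for the currently available nodes taken from the global networking view.

```python
def map_application(virtual_links, free_nodes, hops):
    """Greedily assign each process to a free node.

    virtual_links: {(p, q): bandwidth} from the virtual network topology.
    free_nodes:    currently available node ids (one process per node).
    hops:          callable hops(a, b) -> hop count between two nodes.
    """
    processes = sorted({p for link in virtual_links for p in link})
    if len(processes) > len(free_nodes):
        raise ValueError("not enough free nodes for the application")

    # Place the most communication-heavy processes first.
    weight = {p: 0.0 for p in processes}
    for (p, q), bandwidth in virtual_links.items():
        weight[p] += bandwidth
        weight[q] += bandwidth
    order = sorted(processes, key=lambda p: (-weight[p], p))

    placement = {}
    remaining = list(free_nodes)
    for proc in order:
        def cost(node):
            # Bandwidth-weighted distance to already placed peers.
            total = 0.0
            for (p, q), bandwidth in virtual_links.items():
                peer = q if p == proc else p if q == proc else None
                if peer is not None and peer in placement:
                    total += bandwidth * hops(node, placement[peer])
            return total

        best = min(remaining, key=cost)
        placement[proc] = best
        remaining.remove(best)
    return placement


# Hypothetical example: four free nodes on a line, hop count = distance.
links = {("A", "B"): 100, ("B", "C"): 10}
placement = map_application(links, [0, 1, 2, 3], lambda a, b: abs(a - b))
```

In this example the heavy A-B link ends up on adjacent nodes, while C, which exchanges little data with B, takes the next closest node.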

12. The network monitoring device according to claim 11, wherein the traffic monitoring module is further configured to select one or more supercomputer nodes of the plurality of supercomputer nodes having lowest network traffic to execute one or more processes of the plurality of processes.

13. The network monitoring device according to claim 11, wherein the traffic monitoring module is further configured to determine congestion in the network based on the network traffic in the plurality of communication links to identify one or more hot spots.

14. The network monitoring device according to claim 11, wherein the topology mapping module is further configured to determine one or more supercomputer nodes of the plurality of supercomputer nodes currently being utilized by running one or more applications based on the network traffic.

15. The network monitoring device according to claim 11, wherein each of the plurality of supercomputer nodes is connected to one or more switches, and wherein the network monitoring device is tapped into the one or more switches of the plurality of supercomputer nodes to monitor the network.

16. The network monitoring device according to claim 15, wherein the one or more switches are utilized by the plurality of supercomputer nodes to build one or more network topologies comprising one of a fat-tree, a 2D mesh, a 2D/3D torus, and a Dragonfly.

17. The network monitoring device according to claim 16, wherein the one or more switches comprises a management tool to monitor and aggregate data associated with parameters of the one or more switches and the plurality of supercomputer nodes, and wherein the data comprises network traffic characteristics, physical information, health counters, and error counters.

18. The network monitoring device according to claim 17, wherein the data is aggregated per application, per specific fabric tenant server group, or per switch ports of the one or more switches and the plurality of supercomputer nodes.

19. The network monitoring device according to claim 11, wherein the application monitoring module is further configured to determine a number of hops separating any two supercomputer nodes of the plurality of supercomputer nodes.

20. The network monitoring device according to claim 11, wherein the topology mapping module is further configured to generate a graphical user interface on an analyst computing device to display the global networking view of the network traffic showing available supercomputer nodes and currently busy supercomputer nodes of the plurality of supercomputer nodes.

Patent Metadata

Filing Date: Unknown

Publication Date: February 11, 2020

Inventors: Maher KADDOURA
