Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
1. A method for monitoring a supercomputer nodes network: monitoring, by an application monitoring module of a network monitoring device, communication messages between a plurality of processes being executed by a plurality of supercomputer nodes; generating, by the application monitoring module of the network monitoring device, a virtual network topology containing a plurality of virtual communication links between the plurality of processes being executed by the plurality of supercomputer nodes based upon the monitoring of the communication messages; determining, by the application monitoring module of the network monitoring device, a number of the communication messages being transmitted on each of the plurality of virtual communication links and a bandwidth value for each of the plurality of virtual communication links; monitoring, by a traffic monitoring module of the network monitoring device, network traffic in a plurality of communication links interconnecting the plurality of supercomputer nodes; generating, by the traffic monitoring module of the network monitoring device, a global networking view of the network traffic of the plurality of the supercomputer nodes and the interconnecting plurality of communication links; receiving, by a topology mapping module of the network monitoring device, an API call for mapping a new application to the plurality of supercomputer nodes; and mapping, by the topology mapping module of the network monitoring device, the new application to the plurality of supercomputer nodes that are currently available based upon the virtual network topology and the global networking view of the network traffic.
This invention relates to monitoring and managing network traffic in supercomputer systems to optimize application deployment. Supercomputers consist of numerous interconnected nodes that execute parallel processes, and efficient communication between these processes is critical for performance. The invention addresses the challenge of dynamically mapping applications to available nodes while ensuring optimal network utilization and minimizing bottlenecks. The system includes an application monitoring module that tracks communication messages between processes running on supercomputer nodes, generating a virtual network topology that represents the logical connections between these processes. It calculates the number of messages and bandwidth usage for each virtual link. A traffic monitoring module observes physical network traffic across the interconnecting links between nodes, creating a global view of network activity. A topology mapping module receives API calls to deploy new applications and maps them to available nodes based on the virtual topology and real-time traffic data. This ensures that applications are assigned to nodes with sufficient bandwidth and minimal interference from existing traffic, improving overall system efficiency and performance. The invention enables dynamic, data-driven application placement in large-scale computing environments.
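The virtual-topology step described above can be illustrated with a minimal sketch. It is not the patented implementation: the function name `build_virtual_topology`, the `(source_process, destination_process, size_bytes)` message format, the fixed observation window, and the choice to treat links as undirected are all assumptions made for illustration.

```python
from collections import defaultdict


def build_virtual_topology(messages, window_seconds):
    """Aggregate observed messages into per-link message counts and bandwidth.

    `messages` is an iterable of (source_process, destination_process, size_bytes)
    tuples collected over `window_seconds`. Both the tuple layout and the
    undirected treatment of links are illustrative assumptions.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")

    counts = defaultdict(int)   # messages seen per virtual link
    volume = defaultdict(int)   # bytes seen per virtual link

    for source, destination, size_bytes in messages:
        # One virtual link per unordered pair of communicating processes.
        link = tuple(sorted((source, destination)))
        counts[link] += 1
        volume[link] += size_bytes

    return {
        link: {
            "messages": counts[link],
            # Average bandwidth over the window, in bits per second.
            "bandwidth_bps": volume[link] * 8 / window_seconds,
        }
        for link in counts
    }
```

Each key of the returned dictionary is one virtual communication link, and the two values correspond to the message count and bandwidth value that the claim says are determined for every link.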
2. The method according to claim 1, further comprising: selecting, by the network monitoring device, one or more supercomputer nodes of the plurality of supercomputer nodes having lowest network traffic to execute one or more processes of the plurality of processes.
A method for optimizing process execution in a supercomputer network involves monitoring network traffic across multiple supercomputer nodes to identify nodes with the lowest traffic levels. The method includes selecting these low-traffic nodes to execute one or more processes from a set of processes, thereby improving network efficiency and reducing congestion. The selection is performed by a network monitoring device that continuously tracks traffic conditions across the network. By dynamically assigning processes to nodes with minimal traffic, the method ensures balanced resource utilization and prevents bottlenecks. This approach enhances overall system performance by distributing workloads more effectively, particularly in high-performance computing environments where network latency and congestion can significantly impact processing speed. The method is part of a broader system for managing supercomputer operations, where processes are distributed based on real-time network conditions to maintain optimal performance.
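As a rough sketch of the selection step, assuming the monitoring device already holds a per-node traffic figure, the lowest-traffic nodes can be picked with a simple sort. The function name `select_lowest_traffic_nodes` and the dictionary input are hypothetical, and ties are broken by node name only to keep the result deterministic.

```python
def select_lowest_traffic_nodes(node_traffic, count):
    """Return the `count` node names with the lowest observed traffic.

    `node_traffic` maps a node name to its measured traffic (any consistent
    unit). This input shape is an assumption for illustration.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    # Sort by traffic first, then by name so that ties resolve predictably.
    ranked = sorted(node_traffic, key=lambda node: (node_traffic[node], node))
    return ranked[:count]
```

For example, with traffic readings `{"n0": 50, "n1": 10, "n2": 30, "n3": 10}` and `count=2`, the sketch returns `["n1", "n3"]`.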
3. The method according to claim 1, further comprising: determining, by the traffic monitoring module, congestion in the network based on the network traffic in the plurality of communication links to identify one or more hot spots.
This claim extends the supercomputer network monitoring method of claim 1 with congestion detection. The traffic monitoring module analyzes the network traffic on the communication links that interconnect the supercomputer nodes and uses it to determine where the network is congested. Links or regions carrying disproportionately heavy traffic are identified as hot spots, that is, places where bottlenecks occur and may degrade application performance. Identifying hot spots gives the system a basis for steering new work away from congested parts of the network, which matters most in large systems where traffic is unevenly distributed.
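One simple way to picture hot-spot identification is a utilization threshold per link. The sketch below is an assumption-laden illustration: the claim does not say how congestion is computed, and the function name `find_hot_spots`, the capacity table, and the 0.8 threshold are all hypothetical.

```python
def find_hot_spots(link_traffic_bps, link_capacity_bps, threshold=0.8):
    """Return (link, utilization) pairs whose utilization meets the threshold.

    Both dictionaries are keyed by a link identifier such as ("n0", "s0").
    The threshold-based definition of a hot spot is an assumption.
    """
    hot_spots = []
    for link, traffic in link_traffic_bps.items():
        capacity = link_capacity_bps[link]
        if capacity <= 0:
            raise ValueError(f"link {link!r} has non-positive capacity")
        utilization = traffic / capacity
        if utilization >= threshold:
            hot_spots.append((link, utilization))
    # Most congested links first; link identifier breaks ties.
    hot_spots.sort(key=lambda item: (-item[1], item[0]))
    return hot_spots
```

A real deployment would likely smooth the readings over time rather than react to a single sample, but the threshold form keeps the idea visible.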
4. The method according to claim 1, further comprising: determining, by the traffic monitoring module, one or more supercomputer nodes of the plurality of supercomputer nodes currently being utilized by one or more applications based on the network traffic.
This invention relates to monitoring and managing network traffic in a supercomputer system to optimize resource utilization. The problem addressed is inefficient allocation of supercomputer nodes, where nodes may remain underutilized or overutilized due to a lack of real-time traffic monitoring and dynamic adjustment. The solution involves a traffic monitoring module that analyzes network traffic patterns to identify which supercomputer nodes are actively being used by applications. By tracking network traffic, the system can determine which nodes are currently engaged in processing tasks and which are idle or underutilized. This information enables better load balancing, resource allocation, and overall system efficiency. The traffic monitoring module continuously assesses network traffic to provide up-to-date insights into node utilization, allowing the system to dynamically adjust workload distribution. This approach ensures that computational resources are allocated optimally, reducing wasted capacity and improving performance. The invention is particularly useful in high-performance computing environments where efficient resource management is critical.
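A minimal sketch of inferring node utilization from traffic, under the assumption that a node counts as busy once its observed traffic reaches some cutoff. The function name `classify_nodes` and the `busy_threshold_bps` parameter are hypothetical; the claim only says the determination is based on network traffic.

```python
def classify_nodes(node_traffic_bps, busy_threshold_bps=1.0):
    """Split nodes into (busy, available) lists based on observed traffic.

    A node is treated as busy when its traffic is at or above the threshold.
    The threshold rule is an illustrative assumption.
    """
    busy = sorted(
        node for node, traffic in node_traffic_bps.items()
        if traffic >= busy_threshold_bps
    )
    available = sorted(
        node for node, traffic in node_traffic_bps.items()
        if traffic < busy_threshold_bps
    )
    return busy, available
```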
5. The method according to claim 1, wherein each of the plurality of supercomputer nodes is connected to one or more switches, and wherein the network monitoring device is tapped into the one or more switches of the plurality of supercomputer nodes to monitor the network.
This invention relates to network monitoring in supercomputer systems. Supercomputers consist of multiple interconnected nodes that require high-performance, low-latency communication. A key challenge is efficiently monitoring network traffic across these nodes to detect performance issues, security threats, or failures without disrupting system operations. The invention describes a method for monitoring network traffic in a supercomputer system. The system includes multiple supercomputer nodes, each connected to one or more network switches. A network monitoring device is directly tapped into these switches to observe traffic passing through them. This allows real-time monitoring of data flows between nodes, enabling detection of anomalies, congestion, or security breaches. The monitoring device can analyze packet headers, payloads, or other metadata to assess network health and performance. By tapping into the switches rather than individual nodes, the system minimizes intrusion and ensures comprehensive coverage of all network traffic. The method supports scalable monitoring as the number of nodes and switches increases, maintaining efficiency even in large-scale supercomputing environments. This approach enhances system reliability, security, and performance by providing continuous visibility into network operations.
6. The method according to claim 5, wherein the one or more switches are utilized by the plurality of supercomputer nodes to build one or more network topologies comprising at least one of a fat-tree, a 2D mesh, a 2D/3D torus, and a Dragonfly.
This invention relates to high-performance computing networks, specifically the network topologies over which a supercomputer's nodes are monitored. Building on claim 5, in which each supercomputer node is connected to one or more switches and the network monitoring device is tapped into those switches, this claim specifies that the nodes use the switches to build one or more network topologies that include at least one of a fat-tree, a 2D mesh, a 2D/3D torus, and a Dragonfly. A fat-tree arranges switches in hierarchical tiers, a mesh connects nodes in a grid, a torus is a mesh whose edges wrap around, and a Dragonfly organizes switches into groups that are linked to one another. The claim therefore covers the monitoring method when it is applied to any of these common supercomputer interconnect layouts.
7. The method according to claim 6, wherein the one or more switches comprises a management tool to monitor and aggregate data associated with parameters of the one or more switches and the plurality of supercomputer nodes, and wherein the data comprises network traffic characteristics, physical information, health counters, and error counters.
This invention relates to a method for managing and monitoring a supercomputer system, specifically addressing the need for centralized oversight of networked supercomputer nodes and their associated switches. The method involves using a management tool to collect, monitor, and aggregate data from multiple switches that interconnect a plurality of supercomputer nodes. The collected data includes network traffic characteristics, such as bandwidth usage and latency, as well as physical information like temperature and power consumption. Additionally, the tool tracks health counters, which measure operational status and performance metrics, and error counters, which log system faults and anomalies. By centralizing this data, the management tool enables real-time monitoring, troubleshooting, and performance optimization of the supercomputer infrastructure. The system ensures efficient operation by providing administrators with comprehensive insights into the network's health and performance, allowing for proactive maintenance and quick identification of potential issues. This approach enhances reliability and reduces downtime in large-scale computing environments.
8. The method according to claim 7, wherein the data is aggregated per application, per specific fabric tenant server group, or per switch ports of the one or more switches and the plurality of supercomputer nodes.
This invention relates to data aggregation in high-performance computing environments, particularly for monitoring and managing network traffic in supercomputer systems. The problem addressed is the need for efficient and granular data collection to optimize performance, troubleshoot issues, and ensure security in large-scale computing infrastructures. The method involves aggregating network traffic data from multiple supercomputer nodes and switches. The data is collected and organized based on different criteria, including per application, per specific fabric tenant server group, or per switch ports. This granular aggregation allows administrators to analyze traffic patterns, identify bottlenecks, and enforce policies at various levels of the network hierarchy. The system includes a network fabric comprising one or more switches and a plurality of supercomputer nodes interconnected through the fabric. The switches are configured to monitor and collect traffic data, which is then processed to generate aggregated statistics. The aggregation can be tailored to specific applications, groups of servers within a tenant fabric, or individual switch ports, providing flexibility in how the data is analyzed. By organizing data in this manner, the invention enables more precise performance tuning, better resource allocation, and improved security monitoring. The method ensures that administrators can drill down into specific areas of the network to diagnose issues or optimize traffic flow without being overwhelmed by raw, unstructured data. This approach is particularly valuable in high-performance computing environments where network efficiency directly impacts overall system performance.
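The three aggregation modes can be sketched as a single group-by over traffic records. The record fields (`application`, `tenant_group`, `switch_port`, `bytes`) and the function name `aggregate_traffic` are assumptions chosen for illustration; the claim names the grouping criteria but not a data format.

```python
from collections import defaultdict

# Grouping criteria named in the claim, mapped to hypothetical record fields.
ALLOWED_KEYS = {"application", "tenant_group", "switch_port"}


def aggregate_traffic(records, key):
    """Sum the `bytes` field of each record, grouped by the chosen key."""
    if key not in ALLOWED_KEYS:
        raise ValueError(f"key must be one of {sorted(ALLOWED_KEYS)}")
    totals = defaultdict(int)
    for record in records:
        totals[record[key]] += record["bytes"]
    return dict(totals)
```

Calling the same function with a different `key` gives the per-application, per-tenant-group, or per-port view of the same underlying records.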
9. The method according to claim 1, further comprising: determining, by the network monitoring device, a number of hops separating any two supercomputer nodes of the plurality of supercomputer nodes.
This invention relates to network monitoring in supercomputer systems, specifically addressing the challenge of tracking and analyzing communication paths between supercomputer nodes. Supercomputers rely on high-speed, low-latency networks to connect multiple nodes, but monitoring these connections is complex due to the scale and dynamic nature of the network. The invention provides a method for determining the number of hops (intermediate network devices) between any two supercomputer nodes, enabling better performance analysis, fault detection, and optimization of data routing. The method involves a network monitoring device that identifies the network topology and communication paths between nodes. By analyzing routing tables, network traffic patterns, or direct measurements, the device calculates the hop count for each pair of nodes. This information helps administrators understand network latency, identify bottlenecks, and optimize routing decisions. The method may also include visualizing the network topology, detecting anomalies, or dynamically adjusting routing protocols based on hop count data. The invention improves upon existing network monitoring techniques by providing a more granular and scalable approach to tracking node-to-node connectivity in large-scale supercomputer environments. This is particularly useful in high-performance computing (HPC) systems where minimizing communication delays is critical for performance. The method can be integrated into existing network management tools or deployed as a standalone monitoring solution.
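If the physical topology is available as an adjacency list of nodes and switches, a hop count can be obtained with a breadth-first search. This is a sketch under assumptions: the function name `hop_count` and the adjacency format are hypothetical, and the sketch counts links traversed, so the number of intermediate devices on the path is one less than the returned value.

```python
from collections import deque


def hop_count(adjacency, source, target):
    """Return the minimum number of links between two devices, or None.

    `adjacency` maps each device (node or switch) to an iterable of its
    directly connected neighbors. Returns None when no path exists.
    """
    if source == target:
        return 0
    visited = {source}
    queue = deque([(source, 0)])
    while queue:
        device, hops = queue.popleft()
        for neighbor in adjacency.get(device, ()):
            if neighbor == target:
                return hops + 1
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, hops + 1))
    return None
```

Breadth-first search is used because every link has the same cost here, so the first time the target is reached is guaranteed to be along a shortest path.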
10. The method according to claim 1, further comprising: generating, by the network monitoring device, a graphical user interface on an analyst computing device to display the global networking view of the network traffic showing available supercomputer nodes and currently busy supercomputer nodes of the plurality of supercomputer nodes.
This invention relates to network monitoring systems for supercomputing environments. The problem addressed is the lack of real-time visibility into the availability and utilization of supercomputer nodes, which can lead to inefficient resource allocation and performance bottlenecks. The solution involves a network monitoring device that generates a global networking view of network traffic, distinguishing between available and busy supercomputer nodes. This view is displayed on a graphical user interface (GUI) accessible by an analyst via a computing device. The GUI provides a visual representation of node status, enabling analysts to monitor and manage supercomputing resources effectively. The system dynamically updates the display to reflect real-time changes in node availability, ensuring accurate and up-to-date information for decision-making. This enhances operational efficiency by allowing quick identification of underutilized or overloaded nodes, optimizing workload distribution, and preventing potential system failures. The invention improves resource management in high-performance computing environments by providing clear, actionable insights into network traffic and node status.
11. A network monitoring device for monitoring a network between supercomputer nodes comprising: a non-transitory storage medium configured to store one or more computer program instructions; a processor configured to execute the one or more computer program instructions to implement: an application monitoring module configured to: monitor communication messages between a plurality of processes being executed by a plurality of supercomputer nodes; generate a virtual network topology containing a plurality of virtual communication links between the plurality of processes being executed by the plurality of supercomputer nodes; and determine a number of communication messages being transmitted on each of the plurality of virtual communication links and a bandwidth value for each of the plurality of virtual communication links; a traffic monitoring module configured to: monitor network traffic in a plurality of communication links interconnecting the plurality of supercomputer nodes; and generate a global networking view of the plurality of the supercomputer nodes and the interconnecting communication links; and a topology mapping module configured to: receive an API call for mapping a new application to the plurality of supercomputer nodes; and map the new application to the plurality of supercomputer nodes that are currently available based upon the virtual network topology and the global networking view of the network traffic.
A network monitoring device is designed to monitor and optimize communication between supercomputer nodes in a high-performance computing environment. The device addresses the challenge of efficiently managing data traffic and resource allocation in large-scale distributed systems where multiple processes run across numerous interconnected nodes. The system includes a non-transitory storage medium storing computer program instructions and a processor executing these instructions to perform several key functions. An application monitoring module tracks communication messages between processes running on different supercomputer nodes, generating a virtual network topology that represents the logical connections between these processes. It also measures the number of messages and bandwidth usage on each virtual link. A traffic monitoring module observes actual network traffic across the physical communication links interconnecting the nodes, creating a global view of the network's current state. A topology mapping module processes API calls to map new applications to available supercomputer nodes, using the virtual network topology and global traffic data to ensure optimal resource allocation. This approach enhances performance by dynamically aligning application requirements with available network and computational resources.
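The mapping step combines the two views: what the application's processes need (virtual topology) and what the network currently looks like (global view). The sketch below uses a deliberately naive heuristic that is not taken from the patent: processes with the highest total bandwidth demand are placed on the available nodes with the least traffic, one process per node. The function name `map_application` and all input shapes are assumptions.

```python
def map_application(virtual_links, node_traffic, busy_nodes):
    """Assign each process to an available node (one process per node).

    `virtual_links` maps (process_a, process_b) to a bandwidth value,
    `node_traffic` maps node name to current traffic, and `busy_nodes`
    is a set of nodes that must not be used. Heuristic: the process with
    the highest demand gets the quietest available node.
    """
    # Total bandwidth demand per process across all of its virtual links.
    demand = {}
    for (process_a, process_b), bandwidth in virtual_links.items():
        demand[process_a] = demand.get(process_a, 0) + bandwidth
        demand[process_b] = demand.get(process_b, 0) + bandwidth

    processes = sorted(demand, key=lambda p: (-demand[p], p))
    available = sorted(
        (node for node in node_traffic if node not in busy_nodes),
        key=lambda node: (node_traffic[node], node),
    )
    if len(available) < len(processes):
        raise ValueError("not enough available nodes for the application")
    return dict(zip(processes, available))
```

A more faithful placement would also weigh hop distance between nodes hosting heavily communicating processes; this version only shows how the two data sources feed one decision.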
12. The network monitoring device according to claim 11, wherein the traffic monitoring module is further configured to select one or more supercomputer nodes of the plurality of supercomputer nodes having lowest network traffic to execute one or more processes of the plurality of processes.
A network monitoring device is designed to optimize traffic distribution within a supercomputer system. The device monitors network traffic across multiple supercomputer nodes and dynamically allocates processes to minimize congestion. The traffic monitoring module identifies nodes with the lowest current network traffic and assigns processes to those nodes, ensuring efficient resource utilization and reducing bottlenecks. This approach prevents overloading specific nodes while balancing the workload across the system. The device may also prioritize critical processes, ensuring high-priority tasks are executed on nodes with minimal traffic interference. By continuously assessing network conditions, the system adapts in real-time to maintain optimal performance. The solution addresses the challenge of uneven traffic distribution in high-performance computing environments, where inefficient process allocation can lead to delays and reduced efficiency. The device integrates with existing supercomputer architectures, providing a scalable and adaptive traffic management solution.
13. The network monitoring device according to claim 11, wherein the traffic monitoring module is further configured to determine congestion in the network based on the network traffic in the plurality of communication links to identify one or more hot spots.
A network monitoring device is designed to analyze and manage network traffic across multiple communication links. The device includes a traffic monitoring module that collects and processes data from these links to assess network performance. Specifically, the module evaluates traffic patterns, including data rates, latency, and packet loss, to detect congestion points. By analyzing the traffic across the plurality of communication links, the device identifies hot spots—areas where network congestion is most severe. This identification helps in optimizing network performance by pinpointing critical bottlenecks. The device may also include additional modules for generating alerts, rerouting traffic, or adjusting bandwidth allocation based on the detected congestion. The overall system aims to enhance network efficiency by proactively addressing traffic-related issues before they escalate. The technology is particularly useful in large-scale networks where real-time monitoring and dynamic adjustments are essential for maintaining smooth operations.
14. The network monitoring device according to claim 11, wherein the topology mapping module is further configured to determine one or more supercomputer nodes of the plurality of supercomputer nodes currently being utilized by running one or more applications based on the network traffic.
A network monitoring device for supercomputer systems uses network traffic to determine which supercomputer nodes are currently in use. In this claim the topology mapping module of the device in claim 11 is further configured to identify the nodes that are being utilized by one or more running applications, based on the network traffic observed on the links interconnecting the nodes. Distinguishing active nodes from idle ones gives the device an up-to-date picture of resource utilization, which static configuration data may not reflect in a large-scale system. Administrators and the mapping logic can use this information to balance load and to place new applications on nodes that are actually free.
15. The network monitoring device according to claim 11, wherein each of the plurality of supercomputer nodes is connected to one or more switches, and wherein the network monitoring device is tapped into the one or more switches of the plurality of supercomputer nodes to monitor the network.
This invention relates to network monitoring in supercomputer systems. Supercomputers often consist of multiple interconnected nodes, each connected to one or more network switches. Monitoring these high-speed, high-volume networks is critical for performance optimization, fault detection, and security. The challenge lies in efficiently capturing and analyzing network traffic across distributed nodes without disrupting system operations. The invention describes a network monitoring device designed for supercomputer environments. It is connected to one or more switches within the supercomputer system, allowing it to tap into network traffic for real-time monitoring. The device is capable of observing data flows between nodes, switches, and other network components. By tapping into the switches, it can passively monitor traffic without requiring active participation from the nodes or switches, minimizing performance impact. The monitoring device may also include features to filter, analyze, or log network data, providing insights into network health, congestion, or security threats. This approach ensures comprehensive visibility into the supercomputer's network infrastructure while maintaining operational efficiency.
16. The network monitoring device according to claim 15, wherein the one or more switches are utilized by the plurality of supercomputer nodes to build one or more network topologies comprising one of a fat-tree, a 2D mesh, a 2D/3D torus, and a Dragonfly.
This invention relates to network monitoring in high-performance computing environments, specifically for supercomputers with complex network topologies. The problem addressed is the difficulty of efficiently monitoring and managing network traffic in large-scale supercomputers that use specialized network architectures, such as fat-tree, 2D mesh, 2D/3D torus, or Dragonfly topologies. These topologies are designed to optimize data transfer between nodes but introduce challenges in monitoring due to their hierarchical or multi-dimensional structures. The invention describes a network monitoring device that includes one or more switches connected to a plurality of supercomputer nodes. These switches are configured to support the construction of various network topologies, including fat-tree, 2D mesh, 2D/3D torus, and Dragonfly. The device monitors network traffic across these topologies, providing visibility into data flow, performance bottlenecks, and potential failures. By integrating with the switches, the monitoring system can track traffic patterns, latency, and bandwidth utilization in real-time, enabling proactive management of the supercomputer's network infrastructure. The solution ensures that the monitoring capabilities are scalable and adaptable to different network configurations, allowing for efficient operation in diverse high-performance computing environments.
17. The network monitoring device according to claim 16, wherein the one or more switches comprises a management tool to monitor and aggregate data associated with parameters of the one or more switches and the plurality of supercomputer nodes, and wherein the data comprises network traffic characteristics, physical information, health counters, and error counters.
This invention relates to network monitoring in high-performance computing environments, specifically for supercomputers. The system addresses the challenge of efficiently monitoring and managing large-scale computing networks with numerous interconnected nodes. The network monitoring device includes one or more switches that serve as central hubs for data aggregation and analysis. These switches are equipped with a management tool designed to collect and process data related to various operational parameters of both the switches and the supercomputer nodes. The collected data encompasses network traffic characteristics, such as bandwidth usage and latency, as well as physical information like temperature and power consumption. Additionally, the system tracks health counters, which monitor the operational status of components, and error counters, which record and log system failures or anomalies. By centralizing this data, the management tool enables real-time monitoring, proactive issue detection, and performance optimization across the supercomputer network. This approach enhances system reliability, reduces downtime, and improves overall efficiency in large-scale computing environments.
18. The network monitoring device according to claim 17, wherein the data is aggregated per application, per specific fabric tenant server group, or per switch ports of the one or more switches and the plurality of supercomputer nodes.
Network monitoring devices are used to track and analyze data traffic within high-performance computing environments, such as supercomputer networks. A key challenge is efficiently aggregating and organizing data to provide meaningful insights for performance optimization, security monitoring, and resource allocation. Existing solutions often lack flexibility in how data is grouped, leading to inefficiencies in analysis and decision-making. This invention improves network monitoring by enabling data aggregation in multiple ways. The device collects and processes data from switches and supercomputer nodes, then organizes it based on different criteria. The data can be grouped per application, allowing administrators to track performance and usage patterns for specific software. Alternatively, it can be aggregated per specific fabric tenant server group, which helps in isolating and analyzing traffic within dedicated server clusters. Another option is aggregation per switch ports, providing granular visibility into traffic flow at the hardware level. This flexibility ensures that monitoring aligns with the specific needs of the network, whether for troubleshooting, capacity planning, or security audits. The system dynamically adjusts aggregation methods to adapt to changing network conditions or administrative requirements, enhancing overall efficiency and accuracy in network monitoring.
19. The network monitoring device according to claim 11, wherein the application monitoring module is further configured to determine a number of hops separating any two supercomputer nodes of the plurality of supercomputer nodes.
Network monitoring systems for supercomputers need insight into the communication paths between nodes, particularly in large-scale distributed environments, in order to diagnose latency issues or assess the network topology. In this claim the application monitoring module of the device in claim 11 is further configured to determine the number of hops separating any two supercomputer nodes. The hop count gives a measure of network distance between nodes, which helps administrators identify bottlenecks and assess routing efficiency. This is particularly valuable in high-performance computing environments, where minimizing communication overhead is critical to computational efficiency.
20. The network monitoring device according to claim 11, wherein the topology mapping module is further configured to generate a graphical user interface on an analyst computing device to display the global networking view of the network traffic showing available supercomputer nodes and currently busy supercomputer nodes of the plurality of supercomputer nodes.
This invention relates to network monitoring systems for supercomputing environments, addressing the challenge of visualizing and managing network traffic across distributed supercomputer nodes. The system includes a topology mapping module that generates a global networking view, displaying available and busy supercomputer nodes. This visualization helps analysts monitor resource utilization and identify bottlenecks in real-time. The graphical user interface (GUI) provides an interactive display, allowing users to assess node status, traffic patterns, and potential performance issues. The system dynamically updates the GUI to reflect changes in node availability and network activity, ensuring accurate and timely decision-making. By integrating this visualization with network monitoring capabilities, the invention enhances operational efficiency in large-scale computing environments. The topology mapping module may also include additional features, such as traffic flow analysis and predictive modeling, to further optimize network performance. The GUI is designed to be intuitive, enabling quick identification of underutilized or overloaded nodes, which is critical for maintaining high-performance computing operations. This solution is particularly valuable in environments where real-time monitoring and adaptive resource management are essential.
February 11, 2020