Legal claims defining the scope of protection, as filed with the USPTO.
1. A system for providing data for diagnosing datacenter issues, comprising: multiple end hosts in a datacenter, each operable to at least one of store and process data and comprising a Network Interface Controller (NIC) operable for data transmission across the datacenter in accordance with a coordinated job undertaken by the datacenter; and multiple flow agents, each flow agent residing at an end host, a flow agent comprising: a traffic module operable to correlate an amount of traffic scheduled to both of a traffic flow and a process having a process identifier (PID), the traffic flow being generated by the end host on which the each flow agent resides by the process executing on the end host on which the each flow agent resides, the end host not being an intermediary node between a destination and a source of the traffic flow and the process executing a distributed job received by the end host; a local-information module operable to access additional job-implementation information for the coordinated job and locally accessible at a corresponding end host, the additional job-implementation information including hardware usage data of the process having the PID, the hardware usage data including usage by the process of hardware resources including a Central Processing Unit (CPU), Random Access Memory (RAM), and at least one of a Hard Disc Drive (HDD) and a Solid State Drive (SSD); a report module operable to correlate the amount of traffic sent with the additional job-implementation information in a log file; an operating system supporting hardware resources at the corresponding end host; the hardware resources comprising the Central Processing Unit (CPU), Random Access Memory (RAM), and the at least one of a Hard Disc Drive (HDD) and the Solid State Drive (SSD); and a usage module residing at the flow agent operable to access at least a portion of the job-implementation information by querying the operating system for hardware usage data corresponding to the PID for all of the hardware resources.
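As an illustration only, the usage module's query of the operating system for per-PID hardware usage might be sketched as follows. The sketch assumes a Linux end host that exposes /proc/<pid>/stat, /proc/<pid>/status, and /proc/<pid>/io; the function names parse_usage and query_usage and the returned field names are hypothetical and are not taken from the claims.

```python
from pathlib import Path


def parse_usage(stat_text, status_text, io_text):
    """Extract CPU, RAM, and disk figures from Linux /proc text (assumed format)."""
    # The command name in /proc/<pid>/stat is parenthesised and may contain
    # spaces, so split after the last ')'. utime and stime are the 14th and
    # 15th fields overall, i.e. indexes 11 and 12 of the remaining fields.
    fields = stat_text.rsplit(")", 1)[1].split()
    cpu_ticks = int(fields[11]) + int(fields[12])

    # Resident memory appears in /proc/<pid>/status as "VmRSS:  <n> kB".
    rss_kb = 0
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            rss_kb = int(line.split()[1])

    # Storage traffic (HDD or SSD) appears in /proc/<pid>/io.
    disk = {}
    for line in io_text.splitlines():
        key, _, value = line.partition(":")
        if key in ("read_bytes", "write_bytes"):
            disk[key] = int(value)

    return {
        "cpu_ticks": cpu_ticks,
        "rss_kb": rss_kb,
        "disk_read_bytes": disk.get("read_bytes", 0),
        "disk_write_bytes": disk.get("write_bytes", 0),
    }


def query_usage(pid, proc_root="/proc"):
    """Read the three /proc files for one PID and parse them."""
    base = Path(proc_root) / str(pid)
    return parse_usage(
        (base / "stat").read_text(),
        (base / "status").read_text(),
        (base / "io").read_text(),
    )
```

A flow agent built along these lines could call query_usage(pid) for the PID already tied to a traffic flow and pass the result to the report module.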
2. The system of claim 1, the local-information module operable to access the additional job-implementation information, at least in part, by identifying the source-destination pair by at least one of the PID and a port number corresponding to the source-destination pair.
3. The system of claim 1 further comprising: an analytic controller operable to aggregate log files from the multiple end hosts in the datacenter into a data set for datacenter diagnostics; the report module further operable to send the log file to the analytic controller; and a path module residing at the analytic controller, the path module operable to: maintain topology information for the datacenter that indexes lists of edges and nodes traversed between combinations of source, physical IP addresses and destination, physical IP addresses, in a connection based approach to packet switching imposing a list across multiple connections with a common combination, provide a list of edges and nodes traversed by applying the source, physical IP address and the destination, physical IP address to the topology information.
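One possible reading of the path module is sketched below: topology information is held as an index keyed by the pair of source and destination physical IP addresses, and each entry lists the nodes and edges traversed. The class name PathModule, its methods, and the node labels in the usage note are hypothetical; the sketch assumes, as the claim states, that every connection sharing the same address pair follows the same list.

```python
class PathModule:
    """Hypothetical index of paths keyed by (source IP, destination IP)."""

    def __init__(self):
        self._paths = {}

    def set_path(self, src_ip, dst_ip, nodes):
        # Edges are derived from consecutive nodes along the path.
        edges = list(zip(nodes, nodes[1:]))
        self._paths[(src_ip, dst_ip)] = {"nodes": list(nodes), "edges": edges}

    def lookup(self, src_ip, dst_ip):
        # Returns the stored nodes and edges, or None if the pair is unknown.
        return self._paths.get((src_ip, dst_ip))
```

For example, after set_path("10.0.0.1", "10.0.1.1", ["host-a", "tor-1", "agg-1", "tor-2", "host-b"]), a call to lookup("10.0.0.1", "10.0.1.1") returns both the five nodes and the four edges between them.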
4. The system of claim 3, further comprising a map module at the analytic controller, the map module operable to map at least one of a source, virtual IP address and a destination, virtual IP address to at least one of a source, physical IP address and a destination, physical IP address with a hash maintained by the map module.
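A minimal sketch of such a map module follows, assuming that the "hash" is a hash table keyed by virtual IP address. That interpretation, the class name MapModule, and its method names are assumptions made for illustration, not details given in the claims.

```python
class MapModule:
    """Hypothetical virtual-to-physical IP map backed by a dict (hash table)."""

    def __init__(self, mapping=None):
        self._virt_to_phys = dict(mapping or {})

    def update(self, virtual_ip, physical_ip):
        # Record or replace the physical address hosting a virtual address.
        self._virt_to_phys[virtual_ip] = physical_ip

    def to_physical(self, virtual_ip):
        # Returns None when the virtual address is not in the map.
        return self._virt_to_phys.get(virtual_ip)

    def map_pair(self, src_virtual, dst_virtual):
        # Map both ends of a source-destination pair in one call.
        return self.to_physical(src_virtual), self.to_physical(dst_virtual)
```

The physical pair returned by map_pair is the form of input the path module of claim 3 takes.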
5. The system of claim 4, wherein the map module is further operable to map at least one of a traffic flow and a process to at least one of a user IDentification for a user initiating the coordinated job and a priority associated with the coordinated job.
6. The system of claim 1, further comprising an identification module operable to: access an identifier for a job-related property; and access the additional job-implementation information by looking up a value for the job-related property indexed to the identifier in a property file maintained by the identification module.
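The lookup performed by the identification module could be sketched as below. The key=value layout of the property file, the use of a PID as the identifier, and the function names load_properties and lookup_property are all assumptions for illustration; the claims do not specify a file format.

```python
def load_properties(text):
    """Parse assumed 'identifier=value' lines; '#' starts a comment line."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # skip lines that have no '=' separator
            props[key.strip()] = value.strip()
    return props


def lookup_property(props, identifier):
    """Return the value indexed to the identifier, or None if absent."""
    return props.get(str(identifier))
```

With a file containing the line "4242=shuffle", lookup_property(props, 4242) would return "shuffle" as the job-related property for that identifier.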
7. The system of claim 6, wherein the identifier is the PID for the process, the process creating a new socket associated with the source-destination pair.
8. The system of claim 6, wherein at least one of the identification module and/or the map module are operable to look up at least one additional job-related property that comprises at least one of a cluster of nodes in the datacenter on which the coordinated job is undertaken, an application utilized by the coordinated job undertaken, and a job type for the coordinated job undertaken by the datacenter.
9. A method for tracking datacenter utilization, the method comprising: collecting, by flow agents residing at end hosts in a datacenter, transmission data reporting amounts of data transmitted between source-destination pairs in the datacenter, each end host not being an intermediary node between the source-destination pairs, each end host executing a process performing one of a plurality of tasks of a distributed, parallel processing job; accessing, by the flow agents, additional information local to each end host about the process executed by the each end host, the additional information including utilization of a Central Processing Unit (CPU), Random Access Memory (RAM), and at least one of a Hard Disc Drive (HDD) and a Solid State Drive (SSD) by the process; correlating, by the flow agents, the transmission data with the additional information local to the end hosts in files specific to the individual end hosts; provisioning, by the flow agents, the files to a common analytic controller.
10. The method of claim 9, wherein collecting transmission data further comprises: maintaining, by a flow agent, a counter operable to increment when data is scheduled between a common source-destination pair; and checking the counter to identify an amount of data scheduled for the corresponding common source-destination pair.
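The counter described in this claim might be kept as in the following sketch, which assumes one byte counter per source-destination pair. The class name TrafficCounter and its method names are hypothetical.

```python
from collections import Counter


class TrafficCounter:
    """Hypothetical per-pair byte counter kept by a flow agent."""

    def __init__(self):
        self._bytes = Counter()

    def on_scheduled(self, src, dst, nbytes):
        # Increment when data is scheduled between the pair.
        self._bytes[(src, dst)] += nbytes

    def amount(self, src, dst):
        # Check the counter; a pair never seen reads as 0.
        return self._bytes[(src, dst)]
```

Calling on_scheduled each time data is scheduled and amount when a report is assembled corresponds to the maintaining and checking steps recited above.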
11. The method of claim 9, further comprising mapping, by the common analytic controller, at least one of a source, virtual Internet Protocol (IP) address and a destination, virtual IP address to at least one of a source, physical IP address and a destination, physical IP address, the at least one physical IP address being assigned to at least one end host supporting at least one virtual computing instance assigned to at least one of the source, virtual IP address and the destination, virtual IP address.
12. The method of claim 11, further comprising: applying a source, physical IP address and the destination, physical IP address for a source-destination pair to graph data at the analytic controller; and determining a path through the datacenter for the source-destination pair from the graph data, the datacenter employing a connection based approach to packet switching that enforces a common path of pre-defined nodes and edges across the datacenter for connections sharing a common source, physical IP address and a common destination, physical IP address.
13. The method of claim 9, further comprising: accessing, by a flow agent, an identifier for a characteristic of at least a portion of the distributed job, the identifier accessible at an end host supporting the flow agent; and correlating the identifier to a value for the characteristic, the value indexed to the identifier in an index maintained by the flow agent.
14. The method of claim 13, wherein the characteristic of at least a portion of the distributed job comprises at least one of an application responsible for data transmitted between a source-destination pair, a task type responsible for the data transmitted between the source-destination pair, a user IDentification (ID) for a user of the datacenter initiating the distributed job.
15. The method of claim 9, further comprising querying, by a flow agent, an operating system for at least one hardware measurement, the operating system providing an interface with end-host hardware for the end host supporting the flow agent.
16. A system for job-centric tracking of datacenter usage, comprising: multiple end hosts comprising hardware networked in a datacenter, the multiple end hosts providing distributed computing resources for a distributed, parallel processing job; multiple flow agents, each flow agent residing on an end host among the multiple end hosts, each flow agent comprising: a traffic module operable to determine an amount of data sent from a common source to a common destination by a process executing on the end host on which the each flow agent resides, the end host on which the each flow agent resides not being an intermediary node between the common source and the common destination, the process being part of a distributed parallel processing job; a hardware module operable to: query an operating system for hardware statistics of the process; provide a set of hardware statistics of the process for inclusion with the report; and a report module operable to combine and correlate the amount of data sent for a source-destination pair with the hardware statistics of the process in a report; and an analytic controller operable to receive and aggregate reports from the multiple end hosts into a diagnostic data set for the datacenter, the analytic controller comprising a map module operable to map at least one of a source, virtual Internet Protocol (IP) address and a destination, virtual IP address to at least one of a source, physical IP address and a destination, physical IP address; wherein the map module is further operable to: access an identifier comprising at least one of a port number and a Process IDentifier (PID) for a process sending data from a common source to a common destination; map the identifier to an aspect of the distributed, parallel processing job; and provide information mapped to the identifier about the aspect for inclusion with the amount of data sent and the source-destination pair in the report, the information identifying an implementation phase of the distributed, parallel processing job.
17. The system of claim 16, wherein the map module is further operable to update information used to map a virtual Internet Protocol (IP) address to a physical IP address.
March 19, 2019