US-20260163836-A1

Adaptive Packet Spray Using On-Demand, Non-Cascading, In-Band Network Telemetry

Published: June 11, 2026
Assignee: not available in USPTO data we have
Technical Abstract

Adaptive packet spray using on-demand, non-cascading, in-band network telemetry permits an initiator node to obtain utilization metrics for paths of a packet-switched network, on-demand, by populating a header (e.g., an INT header) with a utilization metric, appending the header to multiple packets (e.g., RDMA packets), and transmitting the packets to a destination node via the respective paths. Each intermediate node of a path replaces the utilization metric of the header with the utilization metric of the intermediate node if the utilization metric of the intermediate node is greater than the utilization metric of the header. The highest utilization metric along the path serves as an occupancy/congestion metric for the path. The destination node receives the packets from the respective paths, prepares acknowledgments for the packets, and populates utilization metric fields of the acknowledgments with the utilization metrics of the respective packets.

Patent Claims

Legal claims defining the scope of protection, as filed with the USPTO.

1

A system, comprising: a first network device configured to: append a resource utilization metric to packets of an application program; transmit the packets to a second network device via multiple respective paths of a packet-switched network based on a packet-spray method and path distribution weights; receive acknowledgments of receipt of the packets from the second network device, wherein the acknowledgments comprise resource utilization metrics of the respective paths; and adjust the path distribution weights based on the resource utilization metrics of the paths.

2

The system of claim 1, wherein the packets of the application program comprise remote direct memory access packets.

3

The system of claim 1, wherein the resource utilization metrics of the respective paths each comprise a resource metric of a most-congested network device of the respective path.

4

The system of claim 1, wherein the first network device is further configured to append the resource utilization metric within headers of the packets.

5

The system of claim 1, wherein the first network device is further configured to append the resource utilization metric within in-band telemetry headers of the packets.

6

The system of claim 1, wherein the first network device is further configured to append the resource utilization metric as metadata of the packets.

7

The system of claim 1, wherein the first network device is further configured to: determine one or more of a time and a frequency at which to append the resource utilization metric to subsequent packets of the application program based on the resource utilization metrics of the respective paths.

8

The system of claim 1, wherein the resource utilization metric comprises one or more of a queue occupancy metric, a buffer occupancy metric, an egress link utilization metric, and a memory usage metric.

9

The system of claim 1, wherein the resource utilization metric comprises a path occupancy ratio that is based on multiple resource utilization metrics.

10

The system of claim 1, wherein the first network device is further configured to serve as an intermediate network device of a path between other network devices of the packet-switched network, including to: receive a packet; determine that the packet includes an appended resource utilization metric; replace the appended resource utilization metric with a resource utilization metric of the first network device if the resource utilization metric of the first network device is greater than the appended resource utilization metric; and transmit the packet to a destination node via the path.

11

The system of claim 1, wherein the first network device is further configured to serve as a destination network device, including to: receive packets from an initiating network device via multiple respective paths between the initiating network device and the first network device; parse resource utilization metrics appended to the packets received from the initiating network device; populate acknowledgements of the packets received from the initiating network device with the parsed resource utilization metrics; and transmit the acknowledgements of the packets received from the initiating network device to the initiating network device.

12

A system, comprising: a first network device configured to: receive a packet of an application program via a packet-switched network; determine that the packet includes an appended resource utilization metric; replace the appended resource utilization metric with a resource utilization metric of the first network device if the resource utilization metric of the first network device is greater than the appended resource utilization metric; and transmit the packet to a destination network device via the packet-switched network.

13

The system of claim 12, wherein the packet comprises a remote direct memory access packet.

14

The system of claim 12, wherein the resource utilization metric comprises one or more of a queue occupancy metric, a buffer occupancy metric, an egress link utilization metric, and a memory use metric.

15

The system of claim 12, wherein the first network device is further configured to: retrieve the resource utilization metric of the first network device from one or more of a register and a buffer.

16

The system of claim 12, wherein the first network device comprises: an application programming interface configured to retrieve the resource utilization metric of the first network device from a hardware-based source of the first network device.

17

The system of claim 12, wherein the appended resource utilization metric is appended to the packet as one or more of metadata and a header of the packet.

18

A system, comprising: a first network device configured to: receive multiple packets of an application program from an initiating network device via multiple respective paths of a packet-switched network; parse resource utilization metrics that are appended to the packets; populate acknowledgements of receipt the packets with the parsed resource utilization metrics of the respective packets; and transmit the acknowledgements to the initiating network device.

19

The system of claim 18, wherein the parsed resource utilization metrics comprise one or more of a queue occupancy metric, a buffer occupancy metric, an egress link utilization metric, and a memory use metric.

20

The system of claim 18, wherein the resource utilization metrics comprise a resource metrics of a most-congested network device of the respective paths.

Detailed Description

Complete technical specification and implementation details from the patent document.

Examples of the present disclosure generally relate to adaptive packet spray using on-demand, non-cascading, in-band network telemetry.

In packet-switched networking, load balancing refers to methods for balancing or allocating packets amongst paths to optimize resource usage and avoid dropped packets due to congestion. Load balancing methods include flow-based and packet-spray methods.

Flow-based methods allocate packet flows amongst network paths (i.e., packets within a given flow are assigned to the same path). Flow-based methods allocate at a packet-flow level rather than a more-granular packet-level. Flow-based methods do not account for differences in flow sizes. As a result, in some situations (e.g., high-bandwidth flows, such as artificial intelligence applications), it may not be possible to appropriately balance loads.

Packet-spray methods distribute packets equally amongst multiple paths (i.e., packets within a flow follow different paths). Packet spray methods do not account for differences in congestion or workloads of nodes along the paths, and thus may not make optimum use of available bandwidth.

Techniques for adaptive packet spray using on-demand, non-cascading, in-band network telemetry are described. One example is a system that includes a first network device that obtains utilization metrics from multiple paths between the first network device and a second network device of a packet-switched network, on-demand, including populating a header with a utilization metric, appending the header to multiple packets containing payloads of an application program executing on the first network device, transmitting the packets having the appended header to the second network device via respective ones of the paths between the first and second network devices, receiving acknowledgments of receipt of the packets from the second network device, wherein the acknowledgments comprise utilization metrics of respective ones of the paths, and adjusting packet spray distribution weights associated with the paths based on the utilization metrics of the acknowledgments.

Another example described herein is a system that includes a first network device that receives a packet from an originating node of a packet-switched network, where the packet includes a payload of an application program executing on the originating node, determines that a header of the packet includes a utilization metric, replaces the utilization metric of the header with a utilization metric of the first network device if the utilization metric of the first network device is greater than the utilization metric of the header, and transmits the packet to a destination node of the packet-switched network.

Another example described herein is a system that includes a first network device that receives multiple packets from an initiating node of a packet-switched network via multiple respective paths of the packet-switched network, parses utilization metrics from headers of the packets, populates acknowledgements of the packets with the utilization metrics of the headers of the respective packets, and transmits the acknowledgements to the initiating node.

To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the figures. It is contemplated that elements of one example may be beneficially incorporated in other examples.

Various features are described hereinafter with reference to the figures. It should be noted that the figures may or may not be drawn to scale and that the elements of similar structures or functions are represented by like reference numerals throughout the figures. It should be noted that the figures are only intended to facilitate the description of the features. They are not intended as an exhaustive description of the features or as a limitation on the scope of the claims. In addition, an illustrated example need not have all the aspects or advantages shown. An aspect or an advantage described in conjunction with a particular example is not necessarily limited to that example and can be practiced in any other examples even if not so illustrated, or if not so explicitly described.

Embodiments herein describe adaptive packet spray using on-demand, non-cascading, in-band network telemetry. Embodiments herein are described with respect to remote direct memory access (RDMA) packets. Embodiments herein are not, however, limited to RDMA packets.

RDMA over Converged Ethernet (RoCE) is a network protocol that allows RDMA over an Ethernet network. There are multiple RoCE versions. RoCE v2 enables direct memory access to a remote system through network interface controller (NIC) hardware by encapsulating an InfiniBand (IB) transport packet over Ethernet. RoCE v2 operates at the internet layer, allowing RoCE v2 packets to be routed. RoCE v2 offers low latency, high throughput, and minimal CPU involvement on both local and remote ends. In essence, RoCE v2 allows RDMA to run over generic IP networks, making it a powerful solution for network-intensive applications such as networked storage/cluster computing/AI workloads.

Early RDMA standards are directed to single path transport (i.e., RDMA packets only flow along one network path). Single path transport is prone to path failures and cannot utilize the rich parallel paths in modern datacenters. More recent RDMA standards that support multiple paths use a specific source port in a user datagram protocol (UDP) header. Actual paths may be determined based on an equal-cost multi-path (ECMP) routing strategy. An ECMP routing strategy essentially hashes different connections to different paths. ECMP routing strategies do not account for current path utilization. Thus, some paths of a network may be highly congested, while other paths may have a low traffic load, reducing the overall network utilization.
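
The hash-based path selection described above can be illustrated with a short sketch. It is a simplified model, not any switch vendor's algorithm: the function name `ecmp_path`, the use of CRC32 as the hash, and the addresses are assumptions chosen for illustration. The point is that the chosen path depends only on header fields, never on how busy a path currently is.

```python
import zlib


def ecmp_path(src_ip, dst_ip, src_port, dst_port, protocol, num_paths):
    """Pick a path index by hashing the flow's 5-tuple (illustrative only)."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{protocol}".encode()
    return zlib.crc32(key) % num_paths


# 4791 is the UDP destination port used by RoCE v2; 17 is the IP protocol
# number for UDP. Changing only the UDP source port can map a connection to
# a different path, but the mapping ignores current path utilization.
for src_port in (49152, 49153, 49154):
    path = ecmp_path("10.0.0.1", "10.0.0.4", src_port, 4791, 17, num_paths=3)
    print(f"source port {src_port} -> path {path}")
```

Because the mapping is a pure function of the headers, two heavy connections that hash to the same index share a path regardless of load on the alternatives.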

Multi-path transport for RDMA (MP-RDMA) is a technique that uses a multi-path ACK-clocking mechanism to distribute traffic in a congestion-aware manner without incurring per-path states. MP-RDMA adjusts a congestion window based solely on explicit congestion notification (ECN) markings. MP-RDMA does not utilize real-time path occupancy metrics. Reaction time is thus slow, which may reduce its effectiveness in load balancing, and may lead to increased congestion.

Enhanced high precision congestion control (HPCC++) uses explicit in-band telemetry (INT) probe packets to obtain path utilization data. With INT probe packets, each intermediate node along a path adds a new INT header to the INT probe packet (i.e., cascaded INT headers), and a destination node parses the cascaded INT headers of the INT probe packet, and computes path utilization based on the parsed data. INT probe packets and parsing cascaded INT headers thus increase bandwidth overhead and latency.

Packet spray methods statically distribute traffic amongst available paths using different UDP source ports (i.e., without consideration of or adjusting for congestion/utilization differences amongst the paths).

Disclosed herein are on-demand, non-cascading methods of obtaining real-time path utilization metrics. In an example, an initiator node obtains utilization metrics, on-demand, by populating a header (e.g., an INT header) with a utilization metric, appending the header to multiple packets (e.g., packets related to an application program executing on the initiator node, such as RoCE v2 RDMA packets), and transmitting the packets to a destination node via respective paths of a network.

Each intermediate node of a path compares the utilization metric of the respective packet to a utilization metric of the intermediate node, and replaces the utilization metric of the packet with the utilization metric of the intermediate node if the utilization metric of the intermediate node is greater than the utilization metric of the packet. The highest utilization metric along a path serves as an occupancy metric for the path (e.g., path occupancy), between the initiating node and the destination node.

The destination node receives the packets from the respective paths, prepares acknowledgments (ACKs) for the packets, and populates utilization metric fields of the ACKs with the utilization metrics of the respective packets (i.e., without needing to compute path utilization metrics based on utilization metrics of intermediate nodes).

The originating node receives the ACKs, and determines whether to adjust packet spray distribution weights based on the utilization metrics of the ACKs. In an example, the utilization metrics of three paths, A, B, and C, are 20%, 40%, and 70%, respectively. In this example, equal packet spray distribution weights are likely to result in congestion on path C. As disclosed herein, the originating node may adjust the packet spray distribution weights based on the utilization metrics (e.g., such that 50% of packets are sent via path A, 35% of packets are sent via path B, and only 15% of packets are sent via path C). Dynamically adjusting the packet spray distribution weights based on the utilization metrics may ensure that the paths are utilized equally in real terms, and may reduce/avoid congestion.
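
One way to turn per-path utilization metrics into distribution weights is to weight each path by its remaining headroom. The following is a minimal sketch under that assumption. The disclosure does not specify a formula, and the function name `headroom_weights` and the headroom heuristic are hypothetical.

```python
def headroom_weights(utilization):
    """Map per-path utilization (0.0-1.0) to spray weights that sum to 1.

    Hypothetical heuristic: weight each path by its headroom (1 - utilization).
    """
    headroom = {path: max(0.0, 1.0 - u) for path, u in utilization.items()}
    total = sum(headroom.values())
    if total == 0:
        # Every path is saturated; fall back to an equal split.
        return {path: 1.0 / len(utilization) for path in utilization}
    return {path: h / total for path, h in headroom.items()}


weights = headroom_weights({"A": 0.20, "B": 0.40, "C": 0.70})
for path, w in weights.items():
    print(f"path {path}: {w:.0%}")
```

For utilizations of 20%, 40%, and 70%, this heuristic yields roughly 47%, 35%, and 18%, which is close to, but not the same as, the 50/35/15 split given in the example above.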

The originating node may also adjust a frequency at which the originating node demands updated utilization metrics based on the utilization metrics.

The headers may piggyback on RDMA packets, and the utilization metrics may piggyback on corresponding RDMA ACKs.

Hence it is desirable to perform packet spray using real time queue/buffer occupancy details of all the network nodes in the path for optimal utilization of multipath bandwidth and to reduce possible drops/rate limiting due to congestion.

Methods disclosed herein may be useful for load balancing traffic for RDMA workloads and non-RDMA workloads.

Methods disclosed herein may be useful to rate-limit and balance traffic amongst paths between an initiator node and a destination node, without explicit congestion control mechanisms.

On-demand, non-cascading approaches disclosed herein may incur less bandwidth and latency overhead compared to other approaches (e.g., INT probe packets).

Methods disclosed herein may be useful in a variety of network applications, such as to improve throughput of artificial intelligence (AI) traffic and/or to reduce congestion events in data centers.

FIG. 1 depicts a network 100 of interconnected nodes 102-1 through 102-8 (collectively, nodes 102), according to an embodiment. Network 100 may represent a packet-switched network, and nodes 102 may represent network devices such as, for example and without limitation, routers, switches, hubs, bridges, modems, computers/servers, memory devices, and/or a printer. Nodes 102 may serve as connection points for data transmission, process recognition, packet switching, and network distribution. Nodes 102 may be programmed to identify, process, and transmit data from one node to another.

In an example, an application program is compiled to execute on nodes 102, or a subset thereof, and nodes 102 exchange packets containing payloads (i.e., data and/or instructions) related to the application program.

In an example, nodes 102 transmit packets to one another via multiple paths based on a packet spray distribution method. Examples are provided below in which node 102-1 transmits packets to node 102-4 via paths 104-1, 104-2, and 104-3 (collectively, paths 104). In the example of FIG. 1, path 104-1 includes nodes 102-2 and 102-3. Path 104-2 includes nodes 102-5 and 102-6. Path 104-3 includes nodes 102-7 and 102-8. In the following examples, node 102-1 may be referred to as an originating node, and node 102-4 may be referred to as a destination node.

Network 100 is described below with reference to FIGS. 2, 3, and 4. FIG. 2 relates to an initiating node (e.g., node 102-1). FIG. 3 relates to intermediate nodes (i.e., nodes along paths 104). FIG. 4 relates to a destination node (e.g., node 102-4).

FIG. 2 depicts a method 200 of initiating adaptive remote direct memory access (RDMA) packet spray using on-demand, non-cascading, in-band network telemetry, according to an embodiment. Method 200 is described below with reference to network 100 of FIG. 1. Method 200 is not, however, limited to the example of FIG. 1.

At 202, initiating node 102-1 (e.g., a compute unit or a network interface controller of node 102-1) populates a header with an initial utilization metric (e.g., zero). Example utilization metrics are provided further below.

At 204, initiating node 102-1 appends the header to multiple packets. In the example of FIG. 1, initiating node 102-1 appends the header to three packets, 106-1, 106-2, and 106-3 (collectively, packets 106), one for each of paths 104-1, 104-2, and 104-3.

Initiating node 102-1 may append the header to packets related to an application program executing on node 102-1, such as remote direct memory access (RDMA) packets, rather than packets dedicated to management functions such as load balancing or congestion management functions. Packets related to an application program executing on node 102-1 (e.g., RDMA packets) may be referred to as application packets. Packets dedicated to management functions may be referred to as management packets. Appending the header to application packets rather than management packets may be useful to avoid increasing congestion. In an example, the header is an in-band telemetry (INT) header. In this example, node 102-1 may further append a user datagram protocol (UDP) header to the packets, preceding the INT header, to indicate that the subsequent header is an INT header.
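
Populating a header with an initial metric and attaching it to one application packet per path might be sketched as follows. The 4-byte layout, the type code, and the function names are hypothetical and are not taken from the INT specification or from the disclosure. Placing the header in front of the payload is likewise an assumption made for simplicity.

```python
import struct

# Hypothetical 4-byte telemetry header, NOT the INT specification layout:
#   1 byte metric type, 1 byte reserved, 2 bytes utilization (network order).
HEADER_FORMAT = "!BBH"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)
METRIC_QUEUE_OCCUPANCY = 1  # hypothetical type code


def attach_header(payload, utilization=0, metric_type=METRIC_QUEUE_OCCUPANCY):
    """Return the payload with the telemetry header placed in front of it."""
    return struct.pack(HEADER_FORMAT, metric_type, 0, utilization) + payload


def parse_header(packet):
    """Split a packet into (metric_type, utilization, payload)."""
    metric_type, _reserved, utilization = struct.unpack_from(HEADER_FORMAT, packet)
    return metric_type, utilization, packet[HEADER_SIZE:]


# One application packet per path, each starting with a metric of zero.
packets = [attach_header(b"rdma-payload") for _ in range(3)]
print(parse_header(packets[0]))
```

The header has a fixed size, so the per-packet cost does not grow with the number of hops.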

At 206, initiator node 102-1 transmits packets 106 to destination node 102-4 via respective paths 104-1, 104-2, and 104-3 (e.g., based on a packet-spray method).

At 208, node 102-1 waits for acknowledgements (ACKs) of receipt of packets 106-1, 106-2, and 106-3 from destination node 102-4. While node 102-1 waits for acknowledgements (ACKs), intermediate nodes and destination node 102-4 may perform functions described below with reference to FIGS. 3 and 4.

FIG. 3 depicts a method 300 of handling packets 106 by the intermediate nodes, according to an embodiment. Method 300 is described below with reference to network 100 of FIG. 1. Method 300 is not, however, limited to the example of FIG. 1.

At 302, intermediate nodes 102-2, 102-5, and 102-7 receive respective packets 106-1, 106-2, and 106-3.

At 304, intermediate node 102-2 determines that a header of packet 106-1 includes the utilization metric. Intermediate node 102-2 may first determine that packet 106-1 includes an INT header, and may then determine that the INT header includes the utilization metric. Intermediate nodes 102-5 and 102-7 perform similar functions with respect to packets 106-2 and 106-3.

At 306, intermediate node 102-2 compares the utilization metric of the header to a utilization metric of intermediate node 102-2. Intermediate nodes 102-5 and 102-7 perform similar functions with respect to packets 106-2 and 106-3. The presence of the INT header may serve as an instruction to the intermediate nodes to compare the utilization metric of the header to a utilization metric of the respective intermediate nodes.

The utilization metric may relate to a work load (e.g., a queue/buffer occupancy) and/or resource usage (e.g., memory usage). The utilization metric may be readily available/accessible in hardware, such that the intermediate nodes can access/read the utilization metric(s) by calling a hardware-specific application programming interface (API), by reading a register(s), and/or other method. The utilization metric may be readily available as metadata (e.g., INT metadata), such as egress link utilization, buffer/queue occupancy, and/or other metadata. The intermediate nodes may determine the utilization metrics as path occupancy ratios (PORs) based on a combination of metrics.
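
The disclosure does not state how several metrics are combined into a path occupancy ratio. The sketch below assumes one plausible rule, taking the larger of a queue occupancy ratio and an egress link utilization ratio. The function name, parameters, and the choice of `max` are all hypothetical.

```python
def path_occupancy_ratio(queue_depth, queue_capacity, link_bps, link_capacity_bps):
    """Combine two local metrics into one ratio in the range 0.0-1.0.

    Hypothetical rule: report whichever resource is closer to exhaustion.
    """
    queue_ratio = queue_depth / queue_capacity
    link_ratio = link_bps / link_capacity_bps
    return min(1.0, max(queue_ratio, link_ratio))


# A node with a 30%-full queue but an 80%-utilized egress link.
print(path_occupancy_ratio(30, 100, 80e9, 100e9))
```

A weighted average would be another reasonable choice. Using the maximum keeps the value consistent with the worst-case reasoning applied along the path.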

At 308, if the utilization metric of intermediate node 102-2 is greater than the utilization metric of the header of packet 106-1, intermediate node 102-2 replaces the utilization metric of the header with the utilization metric of intermediate node 102-2. If the utilization metric of intermediate node 102-2 is not greater than the utilization metric of the header, intermediate node 102-2 takes no action with respect to the header. Intermediate nodes 102-5 and 102-7 perform similar functions with respect to packets 106-2 and 106-3.
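
The compare-and-replace step at an intermediate node can be sketched in a few lines. The `Packet` class and the function name are hypothetical stand-ins for illustration. A real device would typically perform this in a packet processing pipeline rather than in software of this kind.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    payload: bytes
    # None means the packet carries no telemetry header (hypothetical encoding).
    utilization: Optional[int] = None


def update_at_intermediate_node(packet, local_utilization):
    """Overwrite the carried metric only if this node is more utilized."""
    if packet.utilization is None:
        return packet  # No telemetry header: forward unchanged.
    if local_utilization > packet.utilization:
        packet.utilization = local_utilization
    return packet


# Simulate one path whose two intermediate nodes report 35% and 60%.
pkt = Packet(payload=b"rdma", utilization=0)
for node_utilization in (35, 60):
    update_at_intermediate_node(pkt, node_utilization)
print(pkt.utilization)  # 60
```

Each hop does one read and one comparison, and the header never grows, which is what makes the scheme non-cascading.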

At 310, intermediate nodes 102-2, 102-5, and 102-7 forward respective packets 106-1, 106-2, and 106-3, with the appended headers, toward destination node 102-4 along respective paths 104-1, 104-2, and 104-3.

Method 300 may be repeated with respect to intermediate nodes 102-3, 102-6, and 102-8, when intermediate nodes 102-3, 102-6, and 102-8 receive respective packets 106-1, 106-2, and 106-3.

FIG. 4 depicts a method 400 of handling packets 106 by destination node 102-4, according to an embodiment. Method 400 is described below with reference to network 100 of FIG. 1. Method 400 is not, however, limited to the example of FIG. 1.

At 402, destination node 102-4 receives packets 106 via respective paths 104.

At 404, destination node 102-4 parses the utilization metrics from the headers of packets 106-1, 106-2, and 106-3.

At 406, destination node 102-4 prepares acknowledgements (ACKs) 108-1, 108-2, and 108-3 (collectively, ACKs 108) for respective packets 106-1, 106-2, and 106-3, and populates utilization fields of ACKs 108 with the utilization metrics of the respective packets.

At 408, destination node 102-4 transmits ACKs 108 to initiating node 102-1.
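
The destination-side handling amounts to copying one value per packet into the matching ACK. A minimal sketch follows, in which the tuple and dictionary shapes and the function name `build_acks` are hypothetical stand-ins for parsed packets and ACK fields.

```python
def build_acks(received):
    """Echo each packet's carried metric into its ACK, with no aggregation.

    `received` is a list of (path_id, sequence_number, utilization) tuples,
    a hypothetical stand-in for packets whose headers have been parsed.
    """
    return [
        {"path": path_id, "ack_seq": seq, "utilization": utilization}
        for path_id, seq, utilization in received
    ]


acks = build_acks([("104-1", 1, 60), ("104-2", 2, 20), ("104-3", 3, 45)])
for ack in acks:
    print(ack)
```

The destination performs no per-hop computation, because each packet already carries the highest metric seen along its path.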

In the example of FIG. 3, an intermediate node updates the utilization metric of the header only if the utilization metric of the intermediate node is greater than the utilization metric of the header. Thus, when destination node 102-4 receives packets 106, each packet includes the worst-case utilization metric of a single node of the respective path. Although the worst-case utilization metric represents only a single node of a path, the worst-case utilization metric of the node may result in delay, congestion, and/or dropped packets, which impacts the entire path. Thus, in FIG. 4, destination node 102-4 need not receive utilization metrics of every intermediate node along a path (i.e., in respective headers), and need not compute path utilization metrics based on utilization metrics of every intermediate node along the path.

In FIG. 2, when initiating node 102-1 receives ACKs 108 at 208, processing proceeds to 210, where initiating node 102-1 parses the utilization metrics from ACKs 108.

At 212, initiating node 102-1 determines whether to adjust packet-spray distribution weights based on the utilization metrics parsed from ACKs 108. Initiating node 102-1 may determine to adjust the packet-spray distribution weights based on a highest one of the utilization metrics, differences amongst the utilization metrics, comparison of one or more of the utilization metrics to a threshold(s), and/or based on another method(s).

If initiating node 102-1 determines to adjust the packet-spray distribution weights at 212, processing proceeds to 214, where initiating node 102-1, or other device, adjusts the packet-spray distribution weights. In an example, default or current packet-spray distribution weights may distribute/apportion packets sent from node 102-1 to node 102-4 equally amongst paths 104. If the utilization metric for path 104-1 is greater than the utilization metrics for paths 104-2 and 104-3, initiating node 102-1 may adjust the packet-spray distribution weights to reduce the proportion of packets sent via path 104-1. Initiating node 102-1 may, for example, adjust the packet-spray distribution weights such that 20% of the packets are sent via path 104-1, 40% of the packets are sent via path 104-2, and 40% of the packets are sent via path 104-3.

If the utilization metric for path 104-1 is significantly greater than the utilization metrics for paths 104-2 and 104-3, initiating node 102-1 may adjust the packet-spray distribution weights to reduce the proportion of packets sent via path 104-1 by a greater extent. Initiating node 102-1 may, for example, adjust the packet-spray distribution weights such that 10% of the packets are sent via path 104-1, 45% of the packets are sent via path 104-2, and 45% of the packets are sent via path 104-3. Method 200 is not limited to the foregoing examples.
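
Once weights are chosen, the sender still needs a rule for picking a path for each packet. The disclosure does not prescribe one. The sketch below uses smooth weighted round-robin as an assumed scheduler, with a hypothetical function name and path labels, applied to the 20/40/40 split from the example above.

```python
def spray_schedule(weights, count):
    """Return `count` path choices using smooth weighted round-robin."""
    current = {path: 0 for path in weights}
    total = sum(weights.values())
    schedule = []
    for _ in range(count):
        # Credit every path by its weight, then send on the most-credited one.
        for path, weight in weights.items():
            current[path] += weight
        chosen = max(current, key=current.get)
        current[chosen] -= total
        schedule.append(chosen)
    return schedule


# Integer weights 2:4:4 correspond to the 20%/40%/40% split.
schedule = spray_schedule({"104-1": 2, "104-2": 4, "104-3": 4}, 10)
print(schedule)
print({path: schedule.count(path) for path in ("104-1", "104-2", "104-3")})
```

Over every 10 packets this sends 2, 4, and 4 packets on the three paths and interleaves them rather than sending bursts on one path. A randomized weighted choice would meet the same proportions only on average.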

At 216, initiating node 102-1 transmits subsequent application packets to destination node 102-4, without headers having the utilization metrics, based on the packet-spray distribution weights.

In an embodiment, at 218, initiating node 102-1 may determine to update the utilization metrics of paths 104 on-demand. Initiating node 102-1 may determine to update the utilization metrics based on a criterion, an event, and/or condition. In an example, initiating node 102-1 updates the utilization metrics periodically. Initiating node 102-1 may further determine an update frequency based on one or more of the utilization metrics and/or other factors. Examples are provided below.

Initiating node 102-1 may determine an update frequency that is approximately proportional to a highest one of the utilization metrics. In this example, when the highest utilization metric is relatively low, initiating node 102-1 updates the utilization metrics less frequently. When the highest utilization metric is relatively high, initiating node 102-1 updates the utilization metrics more frequently.

Initiating node 102-1 may determine an update frequency as one of multiple pre-determined frequencies based on corresponding thresholds (i.e., step-based increments). As an example, when the highest utilization metric is below a first threshold (e.g., 20% of a maximum value), initiating node 102-1 may set the update frequency to a first frequency (i.e., a relatively low frequency). When the highest utilization metric is above a second threshold (e.g., 60% of the maximum value), initiating node 102-1 may set the update frequency to a second frequency that is higher than the first frequency. Initiating node 102-1 is not limited to the foregoing example thresholds and frequencies. Initiating node 102-1 may utilize fewer than two thresholds/frequencies, or more than two thresholds/frequencies.
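
The proportional and step-based policies described above can be compared side by side. In this sketch the 20% and 60% thresholds come from the example, while the function names, the rates in hertz, and the middle tier are assumptions, since the text does not give frequencies or say what happens between the two thresholds.

```python
def proportional_update_hz(highest_utilization, max_hz=100.0, min_hz=1.0):
    """Refresh rate that grows linearly with the highest path utilization."""
    return max(min_hz, max_hz * highest_utilization)


def step_update_hz(highest_utilization, low=0.20, high=0.60,
                   low_hz=1.0, mid_hz=10.0, high_hz=100.0):
    """Refresh rate chosen from fixed steps separated by two thresholds."""
    if highest_utilization < low:
        return low_hz
    if highest_utilization > high:
        return high_hz
    return mid_hz  # Assumed middle tier; the text leaves this range open.


for u in (0.10, 0.40, 0.80):
    print(u, proportional_update_hz(u), step_update_hz(u))
```

The step-based variant changes rate only when a threshold is crossed, which may be simpler to implement in hardware than a continuously varying rate.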

Setting the update frequency based on node-based utilization metrics may be appropriate since it may be assumed that, when a utilization metric is relatively low, it is less likely that the utilization metric will rise to a level of concern in the near term.

Initiating node 102-1 is not limited to proportional or step-based frequency updating. Initiating node 102-1 is also not limited to initiating updates of the utilization metrics based on the utilization metrics. In another example, initiating node 102-1 may determine to update the utilization metrics based on an event (e.g., a host command, or an application event) and/or a condition.

When initiating node 102-1 determines to update the utilization metrics at 218, processing returns to 202.

In the foregoing examples, node 102-1 serves as an initiating node, node 102-4 serves as a destination node, and other nodes 102 serve as intermediate nodes. Each of nodes 102 may serve as an initiating node, an intermediate node, and a destination node. As an example, in addition to serving as an initiating node in the examples above, node 102-1 may also serve as a destination node for packets from one or more of nodes 102-2 through 102-8, and/or may serve as an intermediate node for packets sent between other pairs of nodes 102-2 through 102-8.

Further in the foregoing examples, initiating node 102-1 uses the utilization metrics to adjust packet-spray distribution weights. Alternatively, or additionally, the utilization metrics may be used for other purposes. As an example, a network manager may use the utilization metrics for network monitoring, security analysis, congestion alleviation (e.g., to dynamically assign or re-assign tasks/workloads amongst nodes 102), and/or for other purposes. If the network manager dynamically assigns or re-assigns tasks/workloads amongst nodes 102, the network manager may send a command to one or more nodes 102 to update the utilization metrics (i.e., to initiate/invoke method 200).

As described above, an initiating node determines utilization metrics on-demand, by appending regular application packets with a header containing a utilization metric (i.e., piggybacking on existing packets). An intermediate node updates the header only if the utilization metric of the intermediate node is greater than the utilization metric of the header (i.e., non-cascading, read and compare operations). The destination node merely pastes a single utilization metric from the header of each of the packets into respective ACKs (i.e., piggybacking on existing ACKs). The foregoing methods provide on-demand updating of utilization metrics and dynamic adjustment of packet spray distribution weights, with little, if any, added overhead or delay.

One or more of nodes 102 may include a data processing unit (DPU), such as described below with reference to FIG. 5. FIG. 5 depicts a DPU 500, according to an embodiment. In the example of FIG. 5, DPU 500 includes a host interface 502 that interfaces with a host. The host may include one or more central processing units (CPUs) and/or graphic processing units (GPUs), and may execute one or more application programs. Host interface 502 may include, for example and without limitation, a peripheral component interconnect express (PCIe) interface and/or other interface type(s).

DPU 500 further includes one or more processors 504, one or more of which may include multiple processing cores. Processors 504 may include CPUs, GPUs, and/or other type(s) of processors. Processors 504 may form one or more CPU core complexes. Processors 504 may include hardware/circuitry that uses an instruction set architecture (ISA) to process data, such as a complex instruction set computer (CISC) and/or reduced instruction set computer (RISC).

DPU 500 further includes memory 506, which may include volatile and/or non-volatile memory such as random access memory (RAM), high bandwidth memory (HBM), and/or other memory. Memory 506 may store an operating system (OS) 508 for execution by processors 504 (i.e., separate from a host OS).

DPU 500 further includes a network interface 510 that interfaces with one or more other network IO systems over a network, such as a packet-switched network. Network interface 510 may include an Ethernet interface and/or other interface type(s). DPU 500 may further include a packet buffer 512 that buffers incoming and/or outgoing packets.

DPU 500 may further include packet processing pipelines 514, which may include receive packet processing pipelines that process incoming packets from packet buffer 512, and/or transmit packet processing pipelines that process outgoing packets for transmission by network interface 510. Packet processing pipelines 514 may be programmable (e.g., based on the P4 programming language). Packet processing pipelines 514 may include multiple stages 516.

Packet processing pipelines 514 or a subset thereof (e.g., receive packet processing pipelines or transmit packet processing pipelines) may operate in parallel with one another. Packet processing pipelines 514 or a subset thereof (e.g., receive packet processing pipelines or transmit packet processing pipelines) may perform the same tasks or differing tasks. As an example, a subset of packet processing pipelines 514 may perform networking tasks, such as combining packets that were subdivided to be compatible with a maximum transmission unit (MTU). Another subset of packet processing pipelines 514 may perform tasks related to the host (e.g., interfacing with a host OS, drivers, and/or message descriptor formats in host memory). Alternatively, or additionally, another subset of packet processing pipelines 514 may serve as direct memory access (DMA) pipelines and/or remote direct memory access (RDMA) pipelines that handle DMA and/or RDMA access requests of the host.

Packet processing pipelines 514 may include multiple stages 516 (e.g., a set of data processing elements connected in series, where the output of one stage serves as the input to a subsequent stage). Stages 516 may perform respective processes on packets or portions thereof (e.g., packet headers and/or packet payloads). In an example, DPU 500 further includes a parser that parses features of packets, such as a packet header vector (PHV), for processing by stages 516 of one or more packet processing pipelines 514. Stages 516 may include circuitry, which may be configurable and/or programmable, such as with the P4 programming language. In an example, stages 516 of multiple packet processing pipelines 514 perform the same functions (e.g., in parallel). Alternatively, or additionally, stages 516 of multiple packet processing pipelines 514 perform differing functions.
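To illustrate the series arrangement described above, the following Python sketch models a pipeline as an ordered list of stage functions, each consuming the header vector produced by the previous stage. It is a software analogy only; the stage functions (`decrement_ttl`, `mark_expired`) and the header fields are hypothetical and do not come from the disclosure.

```python
from typing import Callable, Dict, List

# A parsed packet header vector (PHV), modeled here as a simple field map.
Header = Dict[str, int]
Stage = Callable[[Header], Header]


def run_pipeline(stages: List[Stage], phv: Header) -> Header:
    """Pass the PHV through each stage in order; each output feeds the next stage."""
    for stage in stages:
        phv = stage(phv)
    return phv


def decrement_ttl(phv: Header) -> Header:
    """Hypothetical stage: decrement a time-to-live field."""
    return {**phv, "ttl": phv["ttl"] - 1}


def mark_expired(phv: Header) -> Header:
    """Hypothetical stage: set a drop flag when the TTL has run out."""
    return {**phv, "drop": int(phv["ttl"] <= 0)}


if __name__ == "__main__":
    print(run_pipeline([decrement_ttl, mark_expired], {"ttl": 1}))
```

The second stage depends on the field written by the first, which is why the order of stages within a pipeline matters.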

Packet processing pipelines 514 and/or stages 516 may include local memory. The local memory may be programmed with local tables (e.g., match-action tables) that indicate whether/how packet processing pipelines 514 and/or stages 516 are to process a packet (e.g., based on features of the packet). In an example, one of stages 516 may perform a lookup operation to read a policy entry in a table to determine whether an entity associated with the packet has exceeded a rate limit (e.g., a packet rate limit and/or a data rate limit).
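The rate-limit lookup described above might be modeled as in the following Python sketch. The table layout, the `PolicyEntry` fields, the per-window packet counter, and the "forward"/"drop" actions are assumptions chosen for illustration; the disclosure states only that a stage reads a policy entry to determine whether a limit has been exceeded.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PolicyEntry:
    packet_limit: int      # assumed: maximum packets allowed in the current window
    packets_seen: int = 0  # assumed: packets counted so far in the current window


def rate_limit_stage(table: Dict[str, PolicyEntry], entity: str) -> str:
    """Look up the entity's policy entry and return the action for this packet."""
    entry = table.get(entity)
    if entry is None:
        return "forward"  # assumed default action on a table miss
    if entry.packets_seen >= entry.packet_limit:
        return "drop"     # entity has used up its budget for this window
    entry.packets_seen += 1
    return "forward"


if __name__ == "__main__":
    table = {"tenant-a": PolicyEntry(packet_limit=2)}
    print([rate_limit_stage(table, "tenant-a") for _ in range(3)])
```

A data rate limit could be modeled the same way by counting bytes instead of packets.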

DPU 500 may further include one or more accelerators 518 that perform specialized tasks, such as data movement tasks. Accelerator(s) 518 may include, for example and without limitation, a cryptography accelerator, a data compression accelerator, an accelerator for performing regex or dedupe, and/or other accelerator type(s).

DPU 500 further includes an interconnect, depicted here as a packet-based network-on-chip (NoC) 520 that provides communication links amongst other components of DPU 500. Alternatively, or additionally, the interconnect may include one or more other on-die or on-chip interconnects. In an example, the interconnect further includes direct or dedicated communication links (e.g., AXI interfaces) amongst two or more circuit blocks. In the example of FIG. 5, the interconnect further includes a communication link between packet buffer 512 and network interface 510. In another example, the interconnect includes a link between packet buffer 512 and packet processing pipelines 514. Processors 504 and packet processing pipelines 514 may communicate with one another via NoC 520.

DPU 500 may further include security and/or management features, which may provide a hardware root of trust, secure boot, and/or other features.

DPU 500 may be configurable as and/or integrated within a network interface controller/card (NIC), such as a SmartNIC, such as to process packets before they are forwarded to the host and/or to process packets for transmission.

DPU 500 may serve as a programmable processor system, which may be useful as an offload engine or accelerator. As an example, DPU 500 may perform functions of, or on behalf of, an application program executing on the host, which may free the host to perform other functions of the application program and/or functions of other application programs. DPU 500 may efficiently handle data-centric workloads such as data transfer, reduction, security, compression, analytics, and/or encryption, at scale in data centers. DPU 500 may improve efficiency and performance of data centers by offloading workloads from the host, and may enhance computing power and/or handling of complex data workloads.

In the preceding, reference is made to embodiments presented in this disclosure. However, the scope of the present disclosure is not limited to specific described embodiments. Instead, any combination of the described features and elements, whether related to different embodiments or not, is contemplated to implement and practice contemplated embodiments. Furthermore, although embodiments disclosed herein may achieve advantages over other possible solutions or over the prior art, whether or not a particular advantage is achieved by a given embodiment is not limiting of the scope of the present disclosure. Thus, the preceding aspects, features, embodiments and advantages are merely illustrative and are not considered elements or limitations of the appended claims except where explicitly recited in a claim(s).

As will be appreciated by one skilled in the art, the embodiments disclosed herein may be embodied as a system, method or computer program product. Accordingly, aspects may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.

Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium is any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).

Aspects of the present disclosure are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments presented in this disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various examples of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.

While the foregoing is directed to specific examples, other and further examples may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.

Patent Metadata

Filing Date

December 10, 2024

Publication Date

June 11, 2026

Inventors

Rajshekhar BIRADAR

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “ADAPTIVE PACKET SPRAY USING ON-DEMAND, NON-CASCADING, IN-BAND NETWORK TELEMETRY” (US-20260163836-A1). https://patentable.app/patents/US-20260163836-A1
