Examples described herein relate to receiving memory access requests in a first number of connections from one or more front-end clients destined to a storage system and consolidating the memory access requests to a second number of connections between a network device and the storage system, wherein the second number is less than the first number. In some examples, consolidating the memory access requests includes combining read commands with other read commands destined to the storage system among connections of the first number of connections and combining write commands with other write commands destined to a same storage system among connections of the first number of connections. In some examples, consolidating the memory access requests includes performing protocol conversion to a format accepted by the storage system. In some examples, read or write commands are identified based on content of a header of a received packet, wherein the received packet includes a read or write command.
Legal claims defining the scope of protection, as filed with the USPTO.
1-20. (canceled)
21. An apparatus comprising a network device, the network device configured to: receive, at a first number of connections between the network device and a storage system, memory access requests from one or more front-end clients, wherein: each of the first number of connections is from a different respective front-end client and accesses a different respective logical unit number (LUN) in the storage system, and each respective LUN can be mapped to a same block device in the storage system with a different respective range; and consolidate the memory access requests to a second number of connections between the network device and the storage system.
22. The apparatus of claim 21, wherein the second number of connections is less than the first number of connections.
23. The apparatus of claim 21, wherein the network device comprises a processing pipeline to one or more of programmably merge and programmably translate the memory access requests.
24. The apparatus of claim 23, wherein the processing pipeline is a P4 or a C language programmable packet processing pipeline.
25. The apparatus of claim 23, wherein the processing pipeline comprises at least one match action unit configured to be programmed by a remote control plane.
26. The apparatus of claim 21, wherein to consolidate the memory access requests, the network device is further configured to: combine read commands with other read commands to the storage system; and combine write commands with other write commands to the storage system.
27. The apparatus of claim 21, wherein to consolidate the memory access requests, the network device is further configured to convert protocol to a format accepted by a target storage system.
28. The apparatus of claim 21, wherein the network device is further configured to: provide a storage interface to the one or more front-end clients for the first number of connections; and provide a front-end client interface to the storage system for the second number of connections.
29. The apparatus of claim 21, wherein to consolidate the memory access requests, the network device is further configured to store a state of a received memory access command to form a response to at least one of the one or more front-end clients.
30. The apparatus of claim 21, wherein the network device comprises one or more of a switch, a router, an endpoint transmitter, and an endpoint receiver.
31. A method comprising: receiving, at a first number of connections between a network device and a storage system, memory access requests from one or more front-end clients, wherein: each of the first number of connections is from a different respective front-end client and accesses a different respective logical unit number (LUN) in the storage system, and each respective LUN can be mapped to a same block device in the storage system with a different respective range; and consolidating the memory access requests to a second number of connections between the network device and the storage system.
32. The method of claim 31, wherein the second number of connections is less than the first number of connections.
33. The method of claim 31, wherein the network device comprises a processing pipeline to one or more of programmably merge and programmably translate the memory access requests.
34. The method of claim 33, wherein the processing pipeline is a P4 or a C language programmable packet processing pipeline.
35. The method of claim 33, wherein the processing pipeline comprises at least one match action unit configured to be programmed by a remote control plane.
36. The method of claim 31, wherein to consolidate the memory access requests, the method further comprises: combining read commands with other read commands to the storage system; and combining write commands with other write commands to the storage system.
37. The method of claim 31, wherein to consolidate the memory access requests, the method further comprises converting protocol to a format accepted by a target storage system.
38. The method of claim 31, wherein the method further comprises: providing a storage interface to the one or more front-end clients for the first number of connections; and providing a front-end client interface to the storage system for the second number of connections.
39. The method of claim 31, wherein to consolidate the memory access requests, the method further comprises storing a state of a received memory access command to form a response to at least one of the one or more front-end clients.
40. A non-transitory computer-readable storage medium comprising instructions stored thereon, wherein the instructions, when executed by a network device, cause the network device to: receive, at a first number of connections between the network device and a storage system, memory access requests from one or more front-end clients, wherein: each of the first number of connections is from a different respective front-end client and accesses a different respective logical unit number (LUN) in the storage system, and each respective LUN can be mapped to a same block device in the storage system with a different respective range; and consolidate the memory access requests to a second number of connections between the network device and the storage system.
Complete technical specification and implementation details from the patent document.
Some applications deployed in a data center receive massive numbers of requests (connections) from different users during events such as the Black Friday online shopping event in the U.S., Alibaba's single day event (November 11), and JD.com's 518 event (May 18). Under these conditions, there is enormous pressure on backend (cloud) storage systems due to the communications over connections with front end users. In some cases, the storage system is not able to handle the connections and access to the storage system is degraded.
FIG. 1 shows an example system. In this example, there are 4 storage clients (A, B, C, and D) that request use of storage services from storage clusters E and F. For example, storage clients A and B communicate with storage cluster E whereas storage clients C and D communicate with storage cluster F. Thus, services deployed in gateways for storage clusters E and F each maintain two connections with storage clients. A path to a backend storage system can include users' requests to an application, application issued requests to a storage engine managed by the applications (e.g., database), or storage engine issued requests (e.g., transformed to connections) to the remote gateway nodes. A storage gateway can include a network appliance or server which resides at the customer premises and translates cloud storage application program interfaces (APIs) such as SOAP or REST to block-based storage protocols such as Internet Small Computer Systems Interface (iSCSI) or Fibre Channel or file-based interfaces such as Network File System (NFS) or Server Message Block (SMB). A storage cluster can include two or more storage servers working together to increase performance, capacity, or reliability.
Under high packet traffic, a gateway to a storage service node can discard received requests or cache the requests but not send the requests to the backend storage systems in order to protect the backend storage systems from being overwhelmed. A gateway service utilized by a storage service has a maximum number of connections and if the requests exceed the limit of the gateway service, a denial of service (DoS) occurs, which is an undesirable result. For online shopping applications, the users may click the submission button again and the request can be retransmitted. If the application service discards users' requests under pressure, the customer experience may be negative as the website or application interaction is too slow.
Some solutions deploy more servers with more network adapters and leverage load balancing algorithms among gateway nodes of the storage clusters. Adding storage servers can support additional connections but adds cost to the cloud infrastructure.
Various embodiments provide a Protocol Independent Switch Architecture (PISA) to deploy a storage gateway in a programmable pipeline of a network device to reduce the pressure on backend storage services. Various embodiments provide a network device configured to reduce a number of connections to a backend storage system. A storage connection can include a network connection between one process on a host and a destination process on another host. During the lifetime of the connection, there can be bi-directional interactions between the two processes and the two processes can utilize predefined storage protocols. Any protocol can be used for a connection including, but not limited to, TCP. Various embodiments configure a network device to perform I/O aggregation, batch handling, and I/O merging, to negotiate with the backend storage systems in order to behave as the storage service to the client applications and behave as the client to the storage systems, and to perform conversion of storage commands from the front-end to the storage commands supported by the backend storage systems. Various embodiments can reduce the network connection related cost to the backend storage systems, which can be used to reduce the pressure on the backend storage systems, and improve the performance of the front-end applications. Various embodiments are transparent to the front end hosts and can improve performance of front-end applications, improving user experience if a high number of transactions are received at a backend storage system.
A front-end application may negotiate with the storage service in the network device in order to not add pressure to the backend storage systems and the storage service in the network device can negotiate with the backend storage systems in order to aggregate the requests from front-end applications to reduce the pressure on the storage systems, such as by reducing the number of connections. To the front-end applications, the storage service in the network device acts as a storage server, and to the backend storage clusters, the storage service in the network device acts as a client. Thus, in some embodiments, the front-end applications do not communicate directly with backend storage systems and the backend storage systems do not communicate directly with the applications since the backend storage systems communicate with the network device using an IP address of the network device.
For cloud service providers (CSPs) which have some special events (e.g., Alibaba, JD.com), these techniques can reduce the pressure on their cloud storage systems and transfer the pressure to the switches or routers.
FIG. 2 depicts a system. In some embodiments, network device 200 can consolidate connections between storage clients A, B, C, and D and utilize fewer connections with gateways for backend storage systems so that a gateway utilizes fewer connections with storage clients. A storage client can include a mobile device, smart phone, or personal computer that executes an application (e.g., database, purchases of goods, media streaming) and issues a request to a backend storage cluster associated with an IP address.
In some examples, network device 200 can present itself as a front end client to a gateway of a storage cluster and reduce a number of connections with a storage cluster. In some examples, network device 200 can utilize a front-end connection (from a front-end client to network device 200) and a backend connection (from network device 200 to a gateway for a storage cluster). In this example, network device 200 can receive packets with destination IP addresses of 192.168.0.10 and 192.168.0.11 and a destination port of 4420, which correspond to gateways for respective storage clusters E and F. In this example, network device 200 can present itself as a storage client with a source IP address of 192.168.0.3 to gateways for storage clusters E and F instead of providing a source IP address of any of storage clients A-D. In this example, storage clients A, B, C, and D are communicatively coupled to network device 200 using four connections and network device 200 is communicatively coupled to a gateway for storage cluster E using one connection and to a gateway for storage cluster F using one connection. In this example, compared to the configuration of FIG. 1, gateways for storage clusters E and F maintain one connection each instead of two connections. Network device 200 can be implemented as one or more of: an endpoint transmitter or receiver, network interface, network interface card, network interface controller, smartNIC, router, or switch.
Some examples of network device 200 are part of an Infrastructure Processing Unit (IPU) or data processing unit (DPU). An IPU or DPU can include a network interface with one or more programmable or fixed function processors to perform offload of operations that could have been performed by a CPU. The IPU or DPU can include one or more memory devices. In some examples, the IPU or DPU can perform virtual switch operations, manage storage transactions (e.g., compression, cryptography, virtualization), and manage operations performed on other IPUs, DPUs, servers, or devices.
For example, network device 200 can utilize client intermediary 202. Some embodiments of client intermediary 202 can be configured using a P4 switch-based solution (e.g., Barefoot Networks Tofino) to manage storage workloads with one or more storage clusters. Client intermediary 202 can maintain connections with the front-end clients and maintain connections to backend storage systems via gateways. Client intermediary 202 could be configured with the storage format (e.g., file, object, block) supported by a storage cluster as well as the supported protocol (e.g., Internet Small Computer Systems Interface (iSCSI), Non-Volatile Memory Express over Fabrics (NVMe-oF), key value (KV), iWARP, RoCE, or RoCEv2). For example, iSCSI is described at least in RFC 3720 (2004) and variations and revisions thereof. For example, NVMe-oF is described at least in NVM Express, Inc., “NVM Express Over Fabrics,” Revision 1.0, Jun. 5, 2016, and specifications referenced therein and variations and revisions thereof. Client intermediary 202 could be configured with the storage format and supported protocols from a gateway service node via a communication channel with the gateway service node, orchestrator, hypervisor, or administrator.
Various modules in client intermediary 202 can be deployed in network device 200. Client intermediary 202 can perform I/O aggregation, batch handling, and I/O merging, negotiate with the backend storage systems, and convert the storage commands from the front-end to the storage commands supported by backend storage systems. Client intermediary 202 can intercept input/output (I/O) requests received in network packets from front end storage clients A-D and perform I/O merging or batch operations based on protocols utilized by a target storage cluster. For example, in some cases, client intermediary 202 can merge storage commands from a front-end storage client and construct a new packet with other storage commands for transmission to a gateway. For example, read/write I/O commands can be recognized by a header format and merged. For example, network device 200 can buffer received commands destined for gateways for clusters E and F and apply a polling policy that attempts to merge buffered commands under an event-driven or timer policy, to balance merging commands against introducing too much delay in propagating commands to a gateway. In some cases, if a command is received and buffered and a timer expires and no command merging for a gateway can be achieved because merely one command addressed to the gateway is buffered, the command can be transferred to the gateway without merging with another command.
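The buffering and merging policy described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the names `Command`, `MergeBuffer`, `timeout_s`, and `max_pending` are hypothetical, only contiguous commands of the same type are merged, and write payload handling is omitted.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Command:
    op: str       # "read" or "write"
    lba: int      # starting logical block address
    nblocks: int  # number of blocks covered by the command


def merge_commands(cmds: List[Command]) -> List[Command]:
    """Merge same-type commands whose block ranges are contiguous."""
    merged: List[Command] = []
    for cmd in sorted(cmds, key=lambda c: (c.op, c.lba)):
        last = merged[-1] if merged else None
        if last is not None and last.op == cmd.op and last.lba + last.nblocks == cmd.lba:
            last.nblocks += cmd.nblocks  # extend the previous command
        else:
            merged.append(Command(cmd.op, cmd.lba, cmd.nblocks))
    return merged


class MergeBuffer:
    """Per-gateway buffer flushed by an event (queue depth) or a timer (hypothetical policy)."""

    def __init__(self, timeout_s: float, max_pending: int):
        self.timeout_s = timeout_s
        self.max_pending = max_pending
        self.pending: Dict[str, List[Command]] = {}
        self.first_arrival: Dict[str, float] = {}

    def submit(self, gateway: str, cmd: Command, now: float) -> List[Command]:
        """Buffer a command; return merged commands if the depth threshold is reached."""
        self.pending.setdefault(gateway, []).append(cmd)
        self.first_arrival.setdefault(gateway, now)
        if len(self.pending[gateway]) >= self.max_pending:
            return self.flush(gateway)
        return []

    def poll(self, gateway: str, now: float) -> List[Command]:
        """Timer policy: flush once the oldest buffered command has waited timeout_s."""
        start = self.first_arrival.get(gateway)
        if start is not None and now - start >= self.timeout_s:
            return self.flush(gateway)
        return []

    def flush(self, gateway: str) -> List[Command]:
        cmds = self.pending.pop(gateway, [])
        self.first_arrival.pop(gateway, None)
        return merge_commands(cmds)  # a lone command passes through unmerged
```

With this sketch, two contiguous reads buffered for the gateway of cluster E leave as one read when the timer expires, while a single buffered command is forwarded unchanged.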
For example, client intermediary 202 can convert storage commands received in an iSCSI communication to NVMe-oF format or merge storage commands received in iSCSI communications to an NVMe-oF format command. For example, client intermediary 202 can convert storage commands received in an NVMe-oF communication to iSCSI format or merge storage commands received in NVMe-oF communications to an iSCSI format command.
For example, client intermediary 202 can convert block based storage protocol commands (e.g., iSCSI and NVMe-oF) into an abstracted common block-based format (e.g., abstract block I/O (ABIO)). Block storage can involve dividing data into blocks and storing those blocks as separate pieces, each with a unique identifier. For example, a block can have a unit size of 512B and logical block address 0 (LBA0) can refer to a first block, LBA1 can refer to a second block, and so forth. Client intermediary 202 can convert intermediary ABIO format commands to backend I/O commands. Use of an abstracted common block-based format can assist with conversion of front-end protocols or semantics to a variety of different backend storage protocols or semantics and vice versa. Likewise, for responses from a back-end storage system or gateway received at network device 200, client intermediary 202 can convert a backend I/O response to ABIO response format and an ABIO response format to a front-end I/O response. In some examples, client intermediary 202 can merge multiple backend I/O responses into a single front-end I/O response.
For example, for an I/O operation originated from a source IP address, client intermediary 202 can maintain a mapping table from the original command's target information to the transformed command's target information to determine how to construct a response command to a storage client. For example, if an NVMe-oF response command is 16 bytes and contains the command ID (CID), submission queue identifier (sqid), and submission queue head pointer (sqhd), client intermediary 202 can translate the response command to an ABIO format and from ABIO format to a format of the source storage client.
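As a rough illustration of handling such a 16-byte response, the sketch below unpacks the fields named above and rewrites the command ID for the originating client. The byte layout is an assumption taken from the completion queue entry of the base NVMe specification (two 32-bit words, then sqhd, sqid, CID, and a status word whose lowest bit is the phase tag); the helper names and the `cid_map` table are hypothetical.

```python
import struct
from typing import Dict, NamedTuple


class Completion(NamedTuple):
    result: int  # command-specific result (first 32-bit word)
    sqhd: int    # submission queue head pointer
    sqid: int    # submission queue identifier
    cid: int     # command identifier
    status: int  # status field with the phase bit shifted out


# Assumed little-endian layout: DW0, DW1 (reserved), sqhd, sqid, cid, status/phase.
_CQE = struct.Struct("<IIHHHH")


def parse_completion(raw: bytes) -> Completion:
    """Unpack a 16-byte response into its named fields."""
    if len(raw) != _CQE.size:
        raise ValueError("expected a 16-byte response")
    dw0, _reserved, sqhd, sqid, cid, status_phase = _CQE.unpack(raw)
    return Completion(dw0, sqhd, sqid, cid, status_phase >> 1)


def rewrite_for_client(raw: bytes, cid_map: Dict[int, int]) -> bytes:
    """Replace the backend CID with the front-end CID recorded in the mapping table."""
    backend_cid = parse_completion(raw).cid
    front_cid = cid_map[backend_cid]  # hypothetical backend-to-front-end CID table
    return raw[:12] + struct.pack("<H", front_cid) + raw[14:]
```

A mapping table keyed by the backend command ID is only one possible design; a real device would also need to key on the backend connection.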
For front end commands such as a keep alive NVMe command, if the backend protocol is the same as the front-end protocol (e.g., NVMe-oF), stateless conversion can be performed. Various embodiments can store state used for mapping front-end connections and I/O requests to backend connections and backend I/O requests or mapping backend connections and backend I/O responses to front-end connections and front-end I/O responses. State can include information used to map a front end I/O command to a back-end I/O command, such as LBA address mapping between the front-end and backend I/O commands. State information can be stored in a memory of network device 200 in a table format. For example, if a front end iSCSI command in a packet is to write I/O on a target LUN with logical block address (LBA)=A, after a command translation, the back-end command can attempt to write to LBA=B on a block device in a backend storage system. After the completion, a response from the backend storage system indicating a write to LBA=B can be converted to refer to LBA=A according to the front end packet's destination address information. A front-end response to a storage client can refer to target address A.
Client intermediary 202 can perform Admin commands. For example, an Admin command can include creating a name space in the NVMe-oF protocol, and client intermediary 202 can translate name space creation commands. For example, a RADOS Block Device (RBD) device creation request can be issued to a backend storage system if the backend storage system is Ceph.
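The LBA=A to LBA=B example can be sketched with a small state table. This is a minimal sketch under stated assumptions: the per-LUN base offset table `lun_base`, the backend tag allocation, and the class names are all hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class FrontEndContext:
    client: Tuple[str, int]  # original (IP address, port) of the front-end client
    command_id: int          # command identifier used on the front-end connection
    lun: int                 # target LUN named by the front-end command
    lba: int                 # LBA "A" as seen by the front-end client


class StateTable:
    """Maps in-flight backend commands back to the front-end commands that caused them."""

    def __init__(self, lun_base: Dict[int, int]):
        self.lun_base = lun_base  # assumed: LUN -> first LBA of its range on the block device
        self._inflight: Dict[int, FrontEndContext] = {}
        self._next_tag = 0

    def translate_request(self, ctx: FrontEndContext) -> Tuple[int, int]:
        """Record state and return (backend tag, backend LBA "B")."""
        tag = self._next_tag
        self._next_tag += 1
        self._inflight[tag] = ctx
        return tag, self.lun_base[ctx.lun] + ctx.lba

    def translate_response(self, tag: int, backend_lba: int) -> FrontEndContext:
        """Look up the saved state so the response can refer to LBA "A" again."""
        ctx = self._inflight.pop(tag)
        if backend_lba != self.lun_base[ctx.lun] + ctx.lba:
            raise ValueError("response does not match the recorded request")
        return ctx
```

For instance, with `lun_base = {0: 0, 1: 1000}`, a write to LBA 42 on LUN 1 is issued to backend LBA 1042, and the completion for backend LBA 1042 is reported to the client as LBA 42.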
A storage cluster can include a cloud storage system or enterprise storage services but associated media can be local or remotely positioned with respect to the cloud storage system or enterprise storage services.
FIG. 3 depicts an example of a network device. Various embodiments of a network device 200 can utilize any of the technologies of the network device described with respect to FIG. 3 to merge and/or translate command formats. A network device can include a programmable packet engine pipeline 300 that includes (1) physical ports 315 that receive data messages from, and transmit data messages to, devices outside of the programmable packet engine, (2) a data-plane forwarding circuit (“data plane”) 320 that performs the forwarding operations of the programmable packet engine 300 (e.g., that receives data messages and forwards the data messages to other devices), and (3) a control-plane circuit (“control plane”) 325 that provides a configuration interface for configuring the forwarding behavior of the data plane forwarding circuit.
As further shown, the data plane 320 includes ports 312, configurable message processing circuits 330 and a data-plane configurator 335. In some embodiments, several ports 312 receive data messages from and forward data messages to ports 315 of the programmable packet engine 300. For instance, in some embodiments, N data-plane ports 312 (e.g., 4 ports 312) are associated with each port 315 of the programmable packet engine. The N-ports 312 for each port 315 are viewed as N-channels of the port 315. In some embodiments, several data-plane ports 312 are associated with other modules (e.g., data plane configurator) of the data plane 320.
The configurable message-processing circuits 330 perform the configurable data-plane forwarding operations of the programmable packet engine to process and forward data messages to their destinations. The data-plane configurator 335 can be a processor-executed driver that configures configurable message-processing circuits 330 based on configuration data supplied by the control-plane circuit 325. The data-plane configurator 335 can also configure these circuits 330 based on configuration data messages that the data plane 320 receives in-band from the remote controller 305.
In some embodiments, the configurable message-forwarding circuits 330 of the data plane include several ingress processing pipelines 340, several egress processing pipelines 342, and a traffic management stage 344 between the ingress and egress processing pipelines 340 and 342. In some embodiments, each ingress or egress pipeline is associated with one or more physical ports 315 of the programmable packet engine 300. Also, in some embodiments, each ingress or egress pipeline is associated with several data-plane ports 312.
Also, in some embodiments, an ingress or egress pipeline includes a parser 350, several message-processing stages 352, and a deparser 354 (e.g., packet modifier). A pipeline's parser 350 extracts a message header from a data message that the pipeline receives for processing. In some embodiments, the extracted header is in a format of a header vector (HV), which can be modified by successive message processing stages as part of their message processing operations. The parser of a pipeline passes the payload of the message to the deparser 354 as the pipeline's message processing stages 352 operate on the header vectors. When a pipeline finishes processing a data message and the message has to be provided to the traffic management stage 344 (in case of an ingress pipeline) or to a port 312 to forward to a port 315 (in case of an egress pipeline) to be forwarded to the message's next hop (e.g., to its destination compute node or next forwarding element), a deparser of the pipeline in some embodiments produces the data message header from the message's header vector that was processed by the last message processing stage, and combines this header with the data message's payload.
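The parser, message-processing stage, and deparser flow can be illustrated in software as follows. This is only a sketch: the 4-byte header layout, the field names, and the `rewrite_to_backend` action are invented for the example, and a hardware pipeline would be programmed in P4 or a similar language rather than Python.

```python
from typing import Callable, Dict, List, Tuple

HeaderVector = Dict[str, int]


def parse(message: bytes) -> Tuple[HeaderVector, bytes]:
    """Split a message into a header vector and payload (toy 4-byte header)."""
    hv = {
        "opcode": message[0],
        "conn": message[1],
        "lba": int.from_bytes(message[2:4], "big"),
    }
    return hv, message[4:]


class MatchActionStage:
    """One stage: match a header field against table entries and run the action."""

    def __init__(self, field: str):
        self.field = field
        self.table: Dict[int, Callable[[HeaderVector], None]] = {}

    def add_entry(self, value: int, action: Callable[[HeaderVector], None]) -> None:
        # Stands in for an entry installed by a local or remote control plane.
        self.table[value] = action

    def process(self, hv: HeaderVector) -> None:
        action = self.table.get(hv[self.field])
        if action is not None:
            action(hv)  # the action modifies the header vector in place


def deparse(hv: HeaderVector, payload: bytes) -> bytes:
    """Rebuild the header from the header vector and re-attach the payload."""
    return bytes([hv["opcode"], hv["conn"]]) + hv["lba"].to_bytes(2, "big") + payload


def run_pipeline(message: bytes, stages: List[MatchActionStage]) -> bytes:
    hv, payload = parse(message)  # the payload bypasses the stages
    for stage in stages:
        stage.process(hv)
    return deparse(hv, payload)


def rewrite_to_backend(hv: HeaderVector) -> None:
    """Hypothetical action: move the command onto backend connection 0 with an LBA offset."""
    hv["conn"] = 0
    hv["lba"] = (hv["lba"] + 0x100) & 0xFFFF
```

A stage created with `MatchActionStage("conn")` and an entry `add_entry(2, rewrite_to_backend)` rewrites only messages arriving on connection 2 and leaves all other messages untouched.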
The operations of the data plane's message processing stages are configured by a local or remote control plane using P4 or another language in some embodiments. In some embodiments, a local control plane is implemented by a control software layer that is executed by one or more general purpose processors (e.g., CPUs) of the forwarding element, while a remote control plane is implemented by a control software layer executed by one or more CPUs of another forwarding element or a remote computer (e.g., server).
Control plane 325 can program one or more match action units (MAUs) of egress processing pipeline 342 to perform or implement packet modifier 210 by determining if a packet belongs to a stream and/or port and if so, performing an action of modifying a particular header field or fields of a test packet. Control plane 325 can program functions such as MAUs to perform command merging or translation from front-end to backend or backend to front-end as described herein.
FIG. 4 shows an example of storage command processing. For example, a network device and/or client intermediary can utilize and perform operations described with respect to FIG. 4. Network device 400 can receive commands from any of storage clients A or B. For example, a command can be received in a packet that is consistent with Ethernet and includes an iSCSI header or NVMe-oF header that refers to a read or write command and data. Front-end packet handling 402 can decapsulate/encapsulate the packets from/to the front-end applications. For received packets from a storage client, front-end packet handling 402 can perform packet decapsulation to decapsulate received packets and provide a read or write command and data for subsequent processing.
Front-end protocol handling 404 can perform I/O aggregation, batching, or I/O merging. For example, multiple iSCSI services (front end) can be mapped to fewer NVMe-oF connections (back-end) or NVMe-oF connections (front end) can be mapped to fewer iSCSI services (back-end). For example, commands can be merged whereby block-based read/write commands can be combined. Front-end protocol handling 404 can negotiate with backend storage systems to conduct I/O operations with a back-end storage cluster.
If the backend and the front-end protocols are the same, no conversion may occur but a consolidation or merging of commands can occur. For example, if NVMe-oF is used by front-end clients and backend storage clusters, front-end protocol handling 404 could forward the command without translation. Front-end protocol handling 404 can provide a storage service target that can translate storage commands (e.g., block based commands) from storage clients or front-end users into abstracted block based storage commands (e.g., BIO in the Linux kernel). For a response from a backend storage system, the storage service target of front-end protocol handling 404 can translate the received response to an abstracted block based storage command and translate the abstracted block based storage command to a format used in a request by a front end user storage client.
I/O converter 406 can convert abstracted block based storage commands received from front-end protocol handling 404 to a format of storage commands which can be recognized by the backend storage systems. I/O converter 406 can convert storage commands received from back-end storage clusters to abstracted block based storage commands for processing by front-end protocol handling 404. If the front-end and backend use different storage semantics, some I/O transforming efforts can be performed. For example, if the front-end utilizes iSCSI commands, but the backend is used to connect to the Ceph object storage daemons (OSDs) directly, then the iSCSI commands could be transformed into Ceph RADOS Block Device (RBD) based requests.
Backend packet handling 408 can encapsulate commands and/or data into packets for transmission to the backend storage systems or decapsulate packets from backend storage systems to provide commands and/or data.
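Conversion through an abstracted block command can be sketched as below. The class and function names are hypothetical, the dictionaries stand in for real protocol data units, and the byte-range request is only modeled on the offset-and-length style used by RBD images. The opcode values are the SCSI READ(10)/WRITE(10) and NVMe read/write opcodes, and the NVMe block count is zero-based.

```python
from dataclasses import dataclass
from typing import Dict, Union

BLOCK_SIZE = 512  # bytes per logical block, matching the 512B example unit size


@dataclass(frozen=True)
class AbstractBlockIO:
    op: str       # "read" or "write"
    lba: int      # starting logical block address
    nblocks: int  # number of logical blocks


SCSI_OPCODES = {0x28: "read", 0x2A: "write"}  # READ(10), WRITE(10)
NVME_OPCODES = {0x02: "read", 0x01: "write"}  # NVM command set read and write


def from_scsi(opcode: int, lba: int, transfer_length: int) -> AbstractBlockIO:
    """Front-end iSCSI (SCSI) command to the abstracted format."""
    return AbstractBlockIO(SCSI_OPCODES[opcode], lba, transfer_length)


def from_nvme(opcode: int, slba: int, nlb: int) -> AbstractBlockIO:
    """Front-end NVMe command to the abstracted format (NLB is zero-based)."""
    return AbstractBlockIO(NVME_OPCODES[opcode], slba, nlb + 1)


def to_nvme(abio: AbstractBlockIO) -> Dict[str, int]:
    """Abstracted format to NVMe command fields for an NVMe-oF backend."""
    opcode = {v: k for k, v in NVME_OPCODES.items()}[abio.op]
    return {"opcode": opcode, "slba": abio.lba, "nlb": abio.nblocks - 1}


def to_byte_range_request(abio: AbstractBlockIO, image: str) -> Dict[str, Union[str, int]]:
    """Abstracted format to an offset/length request for an image-based backend."""
    return {
        "image": image,  # hypothetical backend image name
        "op": abio.op,
        "offset": abio.lba * BLOCK_SIZE,
        "length": abio.nblocks * BLOCK_SIZE,
    }
```

Because both front-end parsers produce the same `AbstractBlockIO`, adding another backend only requires one more `to_...` function rather than one converter per protocol pair.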
FIG. 5A shows an example of protocol conversions for front end NVMe-oF communications. NVMe-oF defines a common architecture that supports a range of storage networking fabrics for the NVMe block storage protocol over a storage networking fabric. NVMe-oF enables a front-side interface into storage systems, scaling out to large numbers of NVMe devices and extending the distance within a datacenter over which NVMe devices and NVMe subsystems can be accessed. In the example of FIG. 5A, the front-end application utilizes an NVMe-oF target service and the backend storage system is Ceph or iSCSI. In this case, the I/O converter translates NVMe-oF commands into Ceph or iSCSI commands.
The following is an example of a write operation using NVMe-oF TCP transport from the clients to the backend storage systems. At (1), a data packet is received at a network element and the network element has a source IP address of 192.168.0.3. At (2), a front-end packet handling module extracts the NVMe command (write command) in the packet and marks which connection it belongs to. At (3), the command is provided to a customized storage service NVMe-oF target. At (4), the NVMe write command is transformed into an RBD write request by an I/O converter because the backend storage system is Ceph according to the NVMe subsystem configuration in the NVMe-oF target. The I/O converter may merge the front-end storage I/O commands in order to send them in a batched manner to the backend storage systems. At (5), the converted command(s) is/are encapsulated in one or more packets. At (6), when the commands issued to the Ceph cluster are completed, the backend packet handling module can decapsulate the packets and extract the response command. At (7), the response command can be converted by the I/O converter to NVMe-oF and passed to an NVMe-oF target in the network device. At (8), the NVMe-oF target obtains the command and changes the NVMe-oF target status if needed. At (9), the NVMe-oF target delivers the NVMe I/O response in an NVMe-oF TCP packet for transmission to a front-end application using the previously marked connection between the front-end application and the network element that is an intermediary between a client application and a storage cluster gateway.
FIG. 5B shows an example of protocol conversions for a front end iSCSI communications service. Similar operations can take place in the system of FIG. 5B to perform command format translation for a front end iSCSI communication to a back-end NVMe-oF or Ceph command, or others.
FIG. 6A depicts an example process. The process can be performed by a network device including a switch, router, smartNIC, network interface controller, endpoint transmitter, or endpoint receiver. At 602, a network device can be configured to perform storage communication translation and connection consolidation between front end client devices and back-end storage services, gateways, or clusters. The network device can include a programmable data plane or offload engine to perform client-to-storage cluster intermediary operations. To configure the network data plane, the data plane can be programmable using P4, C, Broadcom Inc. Network Programming Language (NPL), or x86 compatible executable binaries or other executable binaries.
At 604, the network device can determine if an access request is received. If an access request is received, the process can continue to 606. If an access request is not received, the process can repeat 604. The access request can be received using a connection formed between a front end client and a storage system. A memory access request can include a read command or write command with data and can refer to any address in volatile or non-volatile memory or a storage medium.
At 606, the network device can attempt to consolidate memory access requests to fewer connections than are used between the front end clients and the network device. For example, if two connections from different hosts access two different logical unit numbers (LUNs) in the storage service and if the two LUNs can be mapped to the same block device in the backend storage system but with different ranges, the network device can use one connection to connect to the backend storage systems. Consolidating the memory access requests can include combining read commands with other read commands among front end connections with front end clients or combining write commands with other write commands among front end connections with front end clients. In some cases, if a front end and back-end utilize the same protocol, protocol conversion may not be utilized. In cases where a front end and back-end utilize different protocols, protocol conversion can occur from a format of a connection from the front end client to a second format accepted by the back-end storage system. Examples of supported formats can include iSCSI services, NVMe-oF subsystems, Key-Value (KV) stores, iWARP, iSCSI, RDMA, or any custom protocol.
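The two-LUN example above can be sketched as follows. This is only an illustration under assumptions: the `LUN_MAP` table, the device name `blk0`, the 1 MiB range sizes, and the request tuple layout are hypothetical. Requests from different front-end connections are translated to offsets on the shared block device and grouped so that reads are batched with reads and writes with writes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LunMapping:
    device: str  # back-end block device name
    base: int    # first byte of this LUN's range on the device
    size: int    # length of the range in bytes


# Hypothetical mapping: two LUNs share one device with disjoint ranges.
LUN_MAP = {
    0: LunMapping("blk0", 0, 1 << 20),
    1: LunMapping("blk0", 1 << 20, 1 << 20),
}


def to_backend(lun: int, offset: int, length: int):
    """Translate a LUN-relative access to a device-relative one."""
    m = LUN_MAP[lun]
    if offset < 0 or length <= 0 or offset + length > m.size:
        raise ValueError("access outside the LUN's range")
    return m.device, m.base + offset, length


def consolidate(requests):
    """Group (conn_id, op, lun, offset, length) requests by (device, op).

    All batches for one device can then share one back-end connection.
    """
    batches = {}
    for conn_id, op, lun, offset, length in requests:
        device, dev_off, length = to_backend(lun, offset, length)
        batches.setdefault((device, op), []).append((dev_off, length, conn_id))
    for batch in batches.values():
        batch.sort()  # order by device offset for batched submission
    return batches


front_end = [
    (101, "write", 0, 4096, 512),  # connection 101 writes to LUN 0
    (102, "write", 1, 0, 512),     # connection 102 writes to LUN 1
    (101, "read", 0, 0, 4096),
]
batches = consolidate(front_end)
backend_devices = {device for device, _op in batches}
print(len(backend_devices), batches[("blk0", "write")])
```

Here two front-end connections resolve to a single back-end device, so one back-end connection suffices; each batch entry keeps its originating `conn_id` so responses can still be returned on the right front-end connection.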
The network device can save connection state for use in managing responses to a request sent to a backend storage system. State can include original IP address and port and the original active commands on this connection.
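A minimal sketch of such a state table is shown below, assuming that each command forwarded to the back end is given a locally generated tag. The `StateTable` class, its method names, and the client port number are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ConnState:
    client_ip: str     # original IP address of the front-end client
    client_port: int   # original port of the front-end client
    active: dict = field(default_factory=dict)  # command id -> back-end tag


class StateTable:
    def __init__(self):
        self._conns = {}    # conn_id -> ConnState
        self._by_tag = {}   # back-end tag -> (conn_id, command_id)
        self._next_tag = 0

    def open(self, conn_id, ip, port):
        self._conns[conn_id] = ConnState(ip, port)

    def track(self, conn_id, command_id):
        """Record an active command and return the tag sent to the back end."""
        tag = self._next_tag
        self._next_tag += 1
        self._conns[conn_id].active[command_id] = tag
        self._by_tag[tag] = (conn_id, command_id)
        return tag

    def complete(self, tag):
        """Look up and retire the command that a back-end response answers."""
        conn_id, command_id = self._by_tag.pop(tag)
        state = self._conns[conn_id]
        del state.active[command_id]
        return state.client_ip, state.client_port, command_id

    def active_count(self, conn_id):
        return len(self._conns[conn_id].active)


table = StateTable()
table.open(1, "192.168.0.3", 50000)
tag = table.track(1, command_id=42)
result = table.complete(tag)
print(result)
```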
At 608, the network device can transmit one or more packets to the back-end storage system that is the target of the access request. The one or more packets can be transmitted using a connection between the network device and the back-end storage system.
FIG. 6B depicts an example process. The process can be performed by a network device including a switch, router, smartNIC, network interface controller, endpoint transmitter, or endpoint receiver. At 650, a network device can be configured to perform storage communication translation between back-end storage services, gateways, or clusters and front end client devices. The network device can include a programmable data plane. To configure the network data plane, the data plane can be programmable using P4 or C. In some examples, the switch can offload some command header parsing work or memory copy work to an offload engine.
At 652, the network device can determine if a response is received from a storage system. If a response was received, the process can continue to 654. If a response was not received, the process can repeat 652. At 654, the network device can generate a response to an associated request. If the response format is different than a format of the associated request, the network device can selectively convert the response to a format consistent with a format of a received memory access request among the received memory access requests. If the response format is the same as a format of the associated request, the network device can transfer indicators in the response for transmission to the client. State of a connection utilized by the request to which the response is provided can be used to form a response. At 656, the network device can transmit the response in a packet using a particular connection on which a corresponding request was received.
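The selective conversion at 654 can be sketched as below. The dictionary fields, the format labels, and the status strings are hypothetical placeholders rather than real protocol status codes; the point is only that a response in the same format passes through unchanged, while a response in a different format is translated before being sent on the original connection.

```python
def build_response(backend_resp: dict, request_state: dict) -> dict:
    """Form a front-end response from a back-end response and saved state.

    backend_resp:  {'format': str, 'result': ...}
    request_state: {'format': str, 'conn_id': int, 'command_id': int}
    """
    if backend_resp["format"] == request_state["format"]:
        # Same protocol on both sides: transfer the indicators as they are.
        status = backend_resp["result"]
    else:
        # Different protocols: map the back-end result to a front-end status.
        # Assumed convention: 0 means success, anything else is an error.
        status = "SUCCESS" if backend_resp["result"] == 0 else "INTERNAL_ERROR"
    return {
        "conn_id": request_state["conn_id"],        # connection the request used
        "format": request_state["format"],
        "command_id": request_state["command_id"],
        "status": status,
    }


saved = {"format": "nvme-of", "conn_id": 7, "command_id": 42}
converted = build_response({"format": "rbd", "result": 0}, saved)
failed = build_response({"format": "rbd", "result": -5}, saved)
passthrough = build_response({"format": "nvme-of", "result": "SUCCESS"}, saved)
print(converted["status"], failed["status"], passthrough["status"])
```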
FIG. 7 depicts a system. Various embodiments can be used by system 700 to transmit and receive requests from a storage system based on embodiments described herein. System 700 includes processor 710, which provides processing, operation management, and execution of instructions for system 700. Processor 710 can include any type of microprocessor, central processing unit (CPU), graphics processing unit (GPU), processing core, or other processing hardware to provide processing for system 700, or a combination of processors. Processor 710 controls the overall operation of system 700, and can be or include one or more programmable general-purpose or special-purpose microprocessors, digital signal processors (DSPs), programmable controllers, application specific integrated circuits (ASICs), programmable logic devices (PLDs), or the like, or a combination of such devices.
In one example, system 700 includes interface 712 coupled to processor 710, which can represent a higher speed interface or a high throughput interface for system components that need higher bandwidth connections, such as memory subsystem 720 or graphics interface components 740, or accelerators 742. Interface 712 represents an interface circuit, which can be a standalone component or integrated onto a processor die. Where present, graphics interface 740 interfaces to graphics components for providing a visual display to a user of system 700. In one example, graphics interface 740 can drive a high definition (HD) display that provides an output to a user. High definition can refer to a display having a pixel density of approximately 100 PPI (pixels per inch) or greater and can include formats such as full HD (e.g., 1080p), retina displays, 4K (ultra-high definition or UHD), or others. In one example, the display can include a touchscreen display. In one example, graphics interface 740 generates a display based on data stored in memory 730 or based on operations executed by processor 710 or both.
Accelerators 742 can be a fixed function or programmable offload engine that can be accessed or used by a processor 710. For example, an accelerator among accelerators 742 can provide compression (DC) capability, cryptography services such as public key encryption (PKE), cipher, hash/authentication capabilities, decryption, or other capabilities or services. In some embodiments, in addition or alternatively, an accelerator among accelerators 742 provides field select controller capabilities as described herein. In some cases, accelerators 742 can be integrated into a CPU socket (e.g., a connector to a motherboard or circuit board that includes a CPU and provides an electrical interface with the CPU). For example, accelerators 742 can include a single or multi-core processor, graphics processing unit, logical execution unit, single or multi-level cache, functional units usable to independently execute programs or threads, application specific integrated circuits (ASICs), neural network processors (NNPs), programmable control logic, and programmable processing elements such as field programmable gate arrays (FPGAs) or programmable logic devices (PLDs). Accelerators 742 can provide multiple neural networks, CPUs, processor cores, general purpose graphics processing units, or graphics processing units that can be made available for use by artificial intelligence (AI) or machine learning (ML) models. For example, the AI model can use or include any or a combination of: a reinforcement learning scheme, Q-learning scheme, deep-Q learning, or Asynchronous Advantage Actor-Critic (A3C), combinatorial neural network, recurrent combinatorial neural network, or other AI or ML model. Multiple neural networks, processor cores, or graphics processing units can be made available for use by AI or ML models.
Memory subsystem 720 represents the main memory of system 700 and provides storage for code to be executed by processor 710, or data values to be used in executing a routine. Memory subsystem 720 can include one or more memory devices 730 such as read-only memory (ROM), flash memory, one or more varieties of random access memory (RAM) such as DRAM, or other memory devices, or a combination of such devices. Memory 730 stores and hosts, among other things, operating system (OS) 732 to provide a software platform for execution of instructions in system 700. Additionally, applications 734 can execute on the software platform of OS 732 from memory 730. Applications 734 represent programs that have their own operational logic to perform execution of one or more functions. Processes 736 represent agents or routines that provide auxiliary functions to OS 732 or one or more applications 734 or a combination. OS 732, applications 734, and processes 736 provide software logic to provide functions for system 700. In one example, memory subsystem 720 includes memory controller 722, which is a memory controller to generate and issue commands to memory 730. It will be understood that memory controller 722 could be a physical part of processor 710 or a physical part of interface 712. For example, memory controller 722 can be an integrated memory controller, integrated onto a circuit with processor 710.
In some examples, OS 732 can determine a capability of a device associated with a device driver. For example, OS 732 can receive an indication of a capability of a device (e.g., NIC 750) to configure a NIC 750 to perform any of the capabilities described herein (e.g., NIC 750 consolidating access requests and reducing connections with a back-end storage system). OS 732 can request a driver to enable or disable NIC 750 to perform any of the capabilities described herein. In some examples, OS 732, itself, can enable or disable NIC 750 to perform any of the capabilities described herein. OS 732 can provide requests (e.g., from an application or virtual machine) to NIC 750 to utilize one or more capabilities of NIC 750. For example, any application can request use or non-use of any of the capabilities described herein by NIC 750.
While not specifically illustrated, it will be understood that system 700 can include one or more buses or bus systems between devices, such as a memory bus, a graphics bus, interface buses, or others. Buses or other signal lines can communicatively or electrically couple components together, or both communicatively and electrically couple the components. Buses can include physical communication lines, point-to-point connections, bridges, adapters, controllers, or other circuitry or a combination. Buses can include, for example, one or more of a system bus, a Peripheral Component Interconnect (PCI) bus, a Hyper Transport or industry standard architecture (ISA) bus, a small computer system interface (SCSI) bus, a universal serial bus (USB), or an Institute of Electrical and Electronics Engineers (IEEE) standard 1394 bus (Firewire).
In one example, system 700 includes interface 714, which can be coupled to interface 712. In one example, interface 714 represents an interface circuit, which can include standalone components and integrated circuitry. In one example, multiple user interface components or peripheral components, or both, couple to interface 714. Network interface 750 provides system 700 the ability to communicate with remote devices (e.g., servers or other computing devices) over one or more networks. Network interface 750 can include an Ethernet adapter, wireless interconnection components, cellular network interconnection components, USB (universal serial bus), or other wired or wireless standards-based or proprietary interfaces. Network interface 750 can transmit data to a device that is in the same data center or rack or a remote device, which can include sending data stored in memory. Network interface 750 can receive data from a remote device, which can include storing received data into memory. Various embodiments can be used in connection with network interface 750, processor 710, and memory subsystem 720. Various embodiments of network interface 750 use embodiments described herein to receive or transmit timing related signals and provide protection against circuit damage from misconfigured port use while providing acceptable propagation delay.
In one example, system 700 includes one or more input/output (I/O) interface(s) 760. I/O interface 760 can include one or more interface components through which a user interacts with system 700 (e.g., audio, alphanumeric, tactile/touch, or other interfacing). Peripheral interface 770 can include any hardware interface not specifically mentioned above. Peripherals refer generally to devices that connect dependently to system 700. A dependent connection is one where system 700 provides the software platform or hardware platform or both on which operation executes, and with which a user interacts.
In one example, system 700 includes storage subsystem 780 to store data in a nonvolatile manner. In one example, in certain system implementations, at least certain components of storage 780 can overlap with components of memory subsystem 720. Storage subsystem 780 includes storage device(s) 784, which can be or include any conventional medium for storing large amounts of data in a nonvolatile manner, such as one or more magnetic, solid state, or optical based disks, or a combination. Storage 784 holds code or instructions and data 786 in a persistent state (e.g., the value is retained despite interruption of power to system 700). Storage 784 can be generically considered to be a “memory,” although memory 730 is typically the executing or operating memory to provide instructions to processor 710. Whereas storage 784 is nonvolatile, memory 730 can include volatile memory (e.g., the value or state of the data is indeterminate if power is interrupted to system 700). In one example, storage subsystem 780 includes controller 782 to interface with storage 784. In one example, controller 782 is a physical part of interface 714 or processor 710 or can include circuits or logic in both processor 710 and interface 714.
A volatile memory is memory whose state (and therefore the data stored in it) is indeterminate if power is interrupted to the device. Dynamic volatile memory requires refreshing the data stored in the device to maintain state. One example of dynamic volatile memory includes DRAM (Dynamic Random Access Memory), or some variant such as Synchronous DRAM (SDRAM). Another example of a volatile memory includes a cache. A memory subsystem as described herein may be compatible with a number of memory technologies, such as DDR3 (Double Data Rate version 3, original release by JEDEC (Joint Electronic Device Engineering Council) on June 27, 2007), DDR4 (DDR version 4, initial specification published in September 2012 by JEDEC), DDR4E (DDR version 4), LPDDR3 (Low Power DDR version 3, JESD209-3B, August 2013 by JEDEC), LPDDR4 (LPDDR version 4, JESD209-4, originally published by JEDEC in August 2014), WIO2 (Wide Input/output version 2, JESD229-2, originally published by JEDEC in August 2014), HBM (High Bandwidth Memory, JESD235, originally published by JEDEC in October 2013), LPDDR5 (currently in discussion by JEDEC), HBM2 (HBM version 2, currently in discussion by JEDEC), or others or combinations of memory technologies, and technologies based on derivatives or extensions of such specifications. The JEDEC standards are available at www.jedec.org.
A non-volatile memory (NVM) device is a memory whose state is determinate even if power is interrupted to the device. In one embodiment, the NVM device can comprise a block addressable memory device, such as NAND technologies, or more specifically, multi-threshold level NAND flash memory (for example, Single-Level Cell (“SLC”), Multi-Level Cell (“MLC”), Quad-Level Cell (“QLC”), Tri-Level Cell (“TLC”), or some other NAND). A NVM device can also comprise a byte-addressable write-in-place three dimensional cross point memory device, or other byte addressable write-in-place NVM device (also referred to as persistent memory), such as single or multi-level Phase Change Memory (PCM) or phase change memory with a switch (PCMS), Intel® Optane™ memory, NVM devices that use chalcogenide phase change material (for example, chalcogenide glass), resistive memory including metal oxide base, oxygen vacancy base and Conductive Bridge Random Access Memory (CB-RAM), nanowire memory, ferroelectric random access memory (FeRAM, FRAM), magneto resistive random access memory (MRAM) that incorporates memristor technology, spin transfer torque (STT)-MRAM, a spintronic magnetic junction memory based device, a magnetic tunneling junction (MTJ) based device, a DW (Domain Wall) and SOT (Spin Orbit Transfer) based device, a thyristor based memory device, or a combination of any of the above, or other memory.
A power source (not depicted) provides power to the components of system 700. More specifically, power source typically interfaces to one or multiple power supplies in system 700 to provide power to the components of system 700. In one example, the power supply includes an AC to DC (alternating current to direct current) adapter to plug into a wall outlet. Such AC power can come from a renewable energy (e.g., solar power) power source. In one example, power source includes a DC power source, such as an external AC to DC converter. In one example, power source or power supply includes wireless charging hardware to charge via proximity to a charging field. In one example, power source can include an internal battery, alternating current supply, motion-based power supply, solar power supply, or fuel cell source.
In an example, system 700 can be implemented using interconnected compute sleds of processors, memories, storages, network interfaces, and other components. High speed interconnects can be used such as: Ethernet (IEEE 802.3), remote direct memory access (RDMA), InfiniBand, Internet Wide Area RDMA Protocol (iWARP), Transmission Control Protocol (TCP), User Datagram Protocol (UDP), quick UDP Internet Connections (QUIC), RDMA over Converged Ethernet (RoCE), Peripheral Component Interconnect express (PCIe), Intel QuickPath Interconnect (QPI), Intel Ultra Path Interconnect (UPI), Intel On-Chip System Fabric (IOSF), Omnipath, Compute Express Link (CXL), HyperTransport, high-speed fabric, NVLink, Advanced Microcontroller Bus Architecture (AMBA) interconnect, OpenCAPI, Gen-Z, Cache Coherent Interconnect for Accelerators (CCIX), Infinity Fabric (IF), 3GPP Long Term Evolution (LTE) (4G), 3GPP 5G, and variations thereof. Data can be copied or stored to virtualized storage nodes or accessed using a protocol such as NVMe over Fabrics (NVMe-oF) or NVMe.
FIG. 8 depicts an environment 800 that includes multiple computing racks 802, each including a Top of Rack (ToR) switch 804, a pod manager 806, and a plurality of pooled system drawers. Various embodiments can be used by environment 800 to reduce connections or consolidate commands based on embodiments described herein. Generally, the pooled system drawers may include pooled compute drawers and pooled storage drawers. Optionally, the pooled system drawers may also include pooled memory drawers and pooled Input/Output (I/O) drawers. In the illustrated embodiment the pooled system drawers include an Intel® Xeon® processor pooled compute drawer 808, an Intel® ATOM™ processor pooled compute drawer 810, a pooled storage drawer 812, a pooled memory drawer 814, and a pooled I/O drawer 816. Each of the pooled system drawers is connected to ToR switch 804 via a high-speed link 818, such as an Ethernet link or Silicon Photonics (SiPh) optical link.
Multiple of the computing racks 802 may be interconnected via their ToR switches 804 (e.g., to a pod-level switch or data center switch), as illustrated by connections to a network 820. In some embodiments, groups of computing racks 802 are managed as separate pods via pod manager(s) 806. In one embodiment, a single pod manager is used to manage all of the racks in the pod. Alternatively, distributed pod managers may be used for pod management operations.
Environment 800 further includes a management interface 822 that is used to manage various aspects of the environment. This includes managing rack configuration, with corresponding parameters stored as rack configuration data 824. In an example, environment 800 can be implemented using interconnected compute sleds of processors, memories, storages, network interfaces, and other components.
In some examples, network interface and other embodiments described herein can be used in connection with a base station (e.g., 3G, 4G, 5G and so forth), macro base station (e.g., 5G networks), picostation (e.g., an IEEE 802.11 compatible access point), nanostation (e.g., for Point-to-MultiPoint (PtMP) applications), on-premises data centers, off-premises data centers, edge network elements, fog network elements, and/or hybrid data centers (e.g., data center that use virtualization, cloud and software-defined networking to deliver application workloads across physical data centers and distributed multi-cloud environments).
Embodiments herein may be implemented in various types of computing and networking equipment, such as switches, routers, racks, and blade servers such as those employed in a data center and/or server farm environment. The servers used in data centers and server farms comprise arrayed server configurations such as rack-based servers or blade servers. These servers are interconnected in communication via various network provisions, such as partitioning sets of servers into Local Area Networks (LANs) with appropriate switching and routing facilities between the LANs to form a private Intranet. For example, cloud hosting facilities may typically employ large data centers with a multitude of servers. A blade comprises a separate computing platform that is configured to perform server-type functions, that is, a “server on a card.” Accordingly, each blade includes components common to conventional servers, including a main printed circuit board (main board) providing internal wiring (e.g., buses) for coupling appropriate integrated circuits (ICs) and other components mounted to the board.
Various examples may be implemented using hardware elements, software elements, or a combination of both. In some examples, hardware elements may include devices, components, processors, microprocessors, circuits, circuit elements (e.g., transistors, resistors, capacitors, inductors, and so forth), integrated circuits, ASICs, PLDs, DSPs, FPGAs, memory units, logic gates, registers, semiconductor device, chips, microchips, chip sets, and so forth. In some examples, software elements may include software components, programs, applications, computer programs, application programs, system programs, machine programs, operating system software, middleware, firmware, software modules, routines, subroutines, functions, methods, procedures, software interfaces, APIs, instruction sets, computing code, computer code, code segments, computer code segments, words, values, symbols, or any combination thereof. Determining whether an example is implemented using hardware elements and/or software elements may vary in accordance with any number of factors, such as desired computational rate, power levels, heat tolerances, processing cycle budget, input data rates, output data rates, memory resources, data bus speeds and other design or performance constraints, as desired for a given implementation. It is noted that hardware, firmware and/or software elements may be collectively or individually referred to herein as “module,” or “logic.” A processor can be one or more combination of a hardware state machine, digital control logic, central processing unit, or any hardware, firmware and/or software elements.
Some examples may be implemented using or as an article of manufacture or at least one computer-readable medium. A computer-readable medium may include a non-transitory storage medium to store logic. In some examples, the non-transitory storage medium may include one or more types of computer-readable storage media capable of storing electronic data, including volatile memory or non-volatile memory, removable or non-removable memory, erasable or non-erasable memory, writeable or re-writeable memory, and so forth. In some examples, the logic may include various software elements, such as software components, programs, applications, computer programs, application programs, system programs, machine programs, operating system software, middleware, firmware, software modules, routines, subroutines, functions, methods, procedures, software interfaces, API, instruction sets, computing code, computer code, code segments, computer code segments, words, values, symbols, or any combination thereof.
According to some examples, a computer-readable medium may include a non-transitory storage medium to store or maintain instructions that when executed by a machine, computing device or system, cause the machine, computing device or system to perform methods and/or operations in accordance with the described examples. The instructions may include any suitable type of code, such as source code, compiled code, interpreted code, executable code, static code, dynamic code, and the like. The instructions may be implemented according to a predefined computer language, manner or syntax, for instructing a machine, computing device or system to perform a certain function. The instructions may be implemented using any suitable high-level, low-level, object-oriented, visual, compiled and/or interpreted programming language.
One or more aspects of at least one example may be implemented by representative instructions stored on at least one machine-readable medium which represents various logic within the processor, which when read by a machine, computing device or system causes the machine, computing device or system to fabricate logic to perform the techniques described herein. Such representations, known as “IP cores” may be stored on a tangible, machine readable medium and supplied to various customers or manufacturing facilities to load into the fabrication machines that actually make the logic or processor.
The appearances of the phrase “one example” or “an example” are not necessarily all referring to the same example or embodiment. Any aspect described herein can be combined with any other aspect or similar aspect described herein, regardless of whether the aspects are described with respect to the same figure or element. Division, omission or inclusion of block functions depicted in the accompanying figures does not infer that the hardware components, circuits, software and/or elements for implementing these functions would necessarily be divided, omitted, or included in embodiments.
Some examples may be described using the expression “coupled” and “connected” along with their derivatives. These terms are not necessarily intended as synonyms for each other. For example, descriptions using the terms “connected” and/or “coupled” may indicate that two or more elements are in direct physical or electrical contact with each other. The term “coupled,” however, may also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other.
The terms “first,” “second,” and the like, herein do not denote any order, quantity, or importance, but rather are used to distinguish one element from another. The terms “a” and “an” herein do not denote a limitation of quantity, but rather denote the presence of at least one of the referenced items. The term “asserted” used herein with reference to a signal denotes a state of the signal, in which the signal is active, and which can be achieved by applying any logic level, either logic 0 or logic 1, to the signal. The terms “follow” or “after” can refer to immediately following or following after some other event or events. Other sequences of steps may also be performed according to alternative embodiments. Furthermore, additional steps may be added or removed depending on the particular applications. Any combination of changes can be used and one of ordinary skill in the art with the benefit of this disclosure would understand the many variations, modifications, and alternative embodiments thereof.
Disjunctive language such as the phrase “at least one of X, Y, or Z,” unless specifically stated otherwise, is otherwise understood within the context as used in general to present that an item, term, etc., may be either X, Y, or Z, or any combination thereof (e.g., X, Y, and/or Z). Thus, such disjunctive language is not generally intended to, and should not, imply that certain embodiments require at least one of X, at least one of Y, or at least one of Z to each be present. Additionally, conjunctive language such as the phrase “at least one of X, Y, and Z,” unless specifically stated otherwise, should also be understood to mean X, Y, Z, or any combination thereof, including “X, Y, and/or Z.”
Illustrative examples of the devices, systems, and methods disclosed herein are provided below. An embodiment of the devices, systems, and methods may include any one or more, and any combination of, the examples described below.
Flow diagrams as illustrated herein provide examples of sequences of various process actions. The flow diagrams can indicate operations to be executed by a software or firmware routine, as well as physical operations. In some embodiments, a flow diagram can illustrate the state of a finite state machine (FSM), which can be implemented in hardware and/or software. Although shown in a particular sequence or order, unless otherwise specified, the order of the actions can be modified. Thus, the illustrated embodiments should be understood only as an example, and the process can be performed in a different order, and some actions can be performed in parallel. Additionally, one or more actions can be omitted in various embodiments; thus, not all actions are required in every embodiment. Other process flows are possible.
Various components described herein can be a means for performing the operations or functions described. Each component described herein includes software, hardware, or a combination of these. The components can be implemented as software modules, hardware modules, special-purpose hardware (e.g., application specific hardware, application specific integrated circuits (ASICs), digital signal processors (DSPs), etc.), embedded controllers, hardwired circuitry, and so forth.
Example 1 includes an apparatus comprising: a network device, when operational, to: receive memory access requests in a first number of connections from one or more front-end clients destined to a storage system and consolidate the memory access requests to a second number of connections between the network device and the storage system, wherein the second number is less than the first number.
Example 2 includes any example, wherein to consolidate the memory access requests to a second number of connections between the network device and the storage system, the network device, when operational, is to: combine read commands with other read commands destined to the storage system and combine write commands with other write commands destined to the storage system.
Example 3 includes any example, wherein to consolidate the memory access requests to a second number of connections between the network device and the storage system, the network device, when operational, is to: perform protocol conversion to a format accepted by a target storage system.
Example 4 includes any example, wherein the network device, when operational, is to: provide a storage interface to a front-end client for the first number of connections and provide a front-end client interface to the storage system for the second number of connections.
Example 5 includes any example, wherein to consolidate the memory access requests to a second number of connections between the network device and the storage system, the network device, when operational, is to: store a state of a received command from at least one of the one or more front-end clients for formation of a response to at least one of the one or more front-end clients.
Example 6 includes any example, wherein the network device, when operational, is to: receive a response to a memory access request from the storage system; selectively convert the response to a format consistent with a format of a received memory access request among the received memory access requests; transmit the response in at least one packet using a particular connection among the first number of connections, wherein the connection among the first number of connections is associated with the memory access request to which the response is received.
Example 7 includes any example, wherein the format comprises one or more of: Internet Small Computer Systems Interface (iSCSI), Nonvolatile Memory Express over Fabrics (NVMe-oF), or a key value (KV) format.
Example 8 includes any example, wherein the network device comprises one or more of: a switch, router, endpoint transmitter, or endpoint receiver.
Example 9 includes any example, wherein the network device comprises a P4 or C language programmable packet processing pipeline.
Example 10 includes any example, and includes a method comprising: receiving memory access requests in a first number of connections from one or more front-end clients destined to a storage system and consolidating the memory access requests to a second number of connections between a network device and the storage system, wherein the second number is less than the first number.
Example 11 includes any example, wherein consolidating the memory access requests to a second number of connections between a network device and the storage system comprises: combining read commands with other read commands destined to the storage system among connections of the first number of connections and combining write commands with other write commands destined to a same storage system among connections of the first number of connections.
Example 12 includes any example, wherein consolidating the memory access requests to a second number of connections between a network device and the storage system comprises: performing protocol conversion to a format accepted by the storage system.
Example 13 includes any example, and includes identifying read or write commands based on content of a header of a received packet, wherein the received packet includes a read or write command.
Example 14 includes any example, wherein consolidating the memory access requests to a second number of connections between a network device and the storage system comprises: storing a state of a received command from at least one of the one or more front-end clients for formation of a response to at least one of the one or more front-end clients.
Example 15 includes any example, and includes receiving a response to a memory access request from the storage system; selectively converting the response to a format consistent with a format of a received memory access request among the received memory access requests; and transmitting the response in at least one packet using a particular connection among the first number of connections, wherein the connection among the first number of connections is associated with the memory access request to which the response is received.
Example 16 includes any example, wherein the format comprises one or more of: Internet Small Computer Systems Interface (iSCSI), Nonvolatile Memory Express over Fabrics (NVMe-oF), or a key value (KV) format.
Example 17 includes any example, and includes configuring a network device to perform the consolidating using a P4 or C language configuration.
Example 18 includes any example, and includes a computer-readable medium comprising instructions stored thereon, that if executed by a data plane of a network device, cause the data plane to: combine memory access requests received on a first number of connections destined for a storage system to a second number of connections between a network device and the storage system, wherein the second number is less than the first number.
Example 19 includes any example, wherein to combine memory access requests received on a first number of connections destined for a storage system to a second number of connections between a network device and the storage system, wherein the second number is less than the first number, the data plane is to: combine read commands with other read commands received among connections of the first number of connections and combine write commands with other write commands received among connections of the first number of connections.
Example 20 includes any example, and includes instructions stored thereon, that if executed by a data plane of a network device, cause the data plane to: provide a storage interface to a front-end client for the first number of connections and provide a front-end client interface to the storage system for the second number of connections.
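The consolidation described in Examples 1, 2, 5, 6, and 13 can be illustrated with a minimal sketch. It is a toy model under stated assumptions, not the implementation described in the examples: the opcode values, the `Request` fields, the `Consolidator` class, and the policy for assigning batches to back-end connections are all hypothetical, and real iSCSI or NVMe-oF headers encode commands differently. The sketch classifies each request by a header opcode, groups reads with reads and writes with writes, places the groups on a smaller number of back-end connections, and keeps per-command state so that a response can be returned on the originating front-end connection.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import count

# Hypothetical opcodes; real storage protocol headers encode commands differently.
OP_READ, OP_WRITE = 0x01, 0x02


@dataclass(frozen=True)
class Request:
    front_conn: int  # index of the front-end connection the request arrived on
    opcode: int      # hypothetical header field identifying read vs. write
    lba: int         # starting logical block address
    length: int      # number of blocks


class Consolidator:
    """Toy model: merge requests from many front-end connections onto fewer back-end ones."""

    def __init__(self, num_back_conns: int):
        self.num_back_conns = num_back_conns
        self.pending = defaultdict(list)  # opcode -> list of (tag, Request)
        self.state = {}                   # tag -> Request, kept to form the response
        self._tags = count()

    def receive(self, req: Request) -> int:
        """Classify by header opcode, store per-command state, queue by command kind."""
        if req.opcode not in (OP_READ, OP_WRITE):
            raise ValueError(f"unsupported opcode {req.opcode:#x}")
        tag = next(self._tags)
        self.state[tag] = req
        self.pending[req.opcode].append((tag, req))
        return tag

    def flush(self):
        """Emit one combined batch per command kind, spread over the back-end connections."""
        batches = []
        for i, opcode in enumerate(sorted(self.pending)):
            entries = self.pending[opcode]
            if entries:
                back_conn = i % self.num_back_conns  # assumed assignment policy
                batches.append((back_conn, opcode, [tag for tag, _ in entries]))
        self.pending.clear()
        return batches

    def complete(self, tag: int, payload: bytes):
        """Route a storage response to the front-end connection that issued the command."""
        req = self.state.pop(tag)
        return req.front_conn, payload
```

In this sketch, four front-end connections that each issue one command produce at most two back-end batches (one of reads, one of writes), so the second number of connections is less than the first. The stored `state` entry is what lets `complete` select the particular front-end connection for the response.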
November 18, 2025
March 12, 2026