A switch includes at least two interface modules, a switching module, and an ingress processing module. The at least two interface modules include a first interface and a second interface. The first interface is configured to send and receive a first packet that is based on a first protocol, and the second interface is configured to send and receive a second packet that is based on a second protocol. The interface module is configured to obtain a to-be-forwarded packet, where the to-be-forwarded packet is one of the first packet and the second packet. The ingress processing module is configured to receive the to-be-forwarded packet from the interface module, and obtain first indication information corresponding to the to-be-forwarded packet, where the first indication information indicates a destination interface corresponding to the to-be-forwarded packet.
Legal claims defining the scope of protection, as filed with the USPTO.
1. A switch, comprising: at least two interfaces configured to: receive a to-be-forwarded packet; and send the to-be-forwarded packet based on a first protocol or a second protocol, wherein the at least two interfaces comprise: a first interface configured to send and receive a first packet based on the first protocol; and a second interface configured to send and receive a second packet based on the second protocol, and wherein the to-be-forwarded packet is the first packet or the second packet; an ingress processing apparatus configured to: receive the to-be-forwarded packet; obtain first indication information corresponding to the to-be-forwarded packet, wherein the first indication information indicates a destination interface in each of the at least two interfaces corresponding to the to-be-forwarded packet; and send the first indication information; and a switching apparatus configured to: receive the to-be-forwarded packet and the first indication information; and send the to-be-forwarded packet to the destination interface based on the first indication information.
2. The switch of claim 1, wherein the ingress processing apparatus is further configured to connect to the at least two interfaces.
3. The switch of claim 1, further comprising an egress processing apparatus configured to: receive the to-be-forwarded packet and the first indication information from the switching apparatus; and send the to-be-forwarded packet to one of the at least two interfaces, wherein one of the at least two interfaces comprises the destination interface.
4. The switch of claim 3, wherein the egress processing apparatus is configured to connect to the at least two interfaces.
5. The switch of claim 3, wherein the first indication information indicates the destination interface corresponding to the to-be-forwarded packet.
6. The switch of claim 1, wherein the first protocol and the second protocol are a Compute eXpress Link (CXL) protocol or a unified buffer (UB) protocol.
7. A switch, comprising: at least two interfaces configured to: receive a to-be-forwarded packet; and send the to-be-forwarded packet based on a first protocol or a second protocol, wherein the at least two interfaces comprise: a first interface configured to send and receive a first packet based on the first protocol; and a second interface configured to send and receive a second packet based on the second protocol, and wherein the to-be-forwarded packet is the first packet or the second packet; a multi-protocol conversion apparatus configured to: receive the to-be-forwarded packet; convert the to-be-forwarded packet into a third packet based on a configuration, wherein the third packet and the to-be-forwarded packet are based on different protocols; and send the third packet; an ingress processing apparatus configured to: receive the to-be-forwarded packet; obtain indication information corresponding to the to-be-forwarded packet, wherein the indication information indicates a destination interface in each of the at least two interfaces corresponding to the to-be-forwarded packet; and send the indication information; and a first switching apparatus configured to: receive the third packet and the indication information; and send the third packet to a corresponding destination interface of the at least two interfaces based on the indication information.
8. The switch of claim 7, wherein the at least two interfaces are connected to the ingress processing apparatus, and wherein the ingress processing apparatus is connected to the multi-protocol conversion apparatus.
9. The switch of claim 7, wherein the interface apparatus is connected to the multi-protocol conversion apparatus, and wherein the multi-protocol conversion apparatus is connected to the ingress processing apparatus.
10. The switch of claim 7, further comprising a second switching apparatus connected to the ingress processing apparatus, wherein the interface apparatus is connected to the ingress processing apparatus and to the multi-protocol conversion apparatus.
11. The switch of claim 7, further comprising an egress processing apparatus configured to: receive the to-be-forwarded packet and the indication information; and send the to-be-forwarded packet to one of the interface apparatus, wherein each of the at least two interfaces comprise the destination interface.
12. The switch of claim 11, wherein the indication information indicates the destination interface corresponding to the to-be-forwarded packet.
13. The switch of claim 11, wherein the multi-protocol conversion apparatus is configured to receive the to-be-forwarded packet according to a first configuration of the ingress processing apparatus and a second configuration of the egress processing apparatus.
14. The switch of claim 7, wherein the first protocol and the second protocol are a Compute eXpress Link (CXL) protocol or a unified buffer (UB) protocol.
15. A switch cabinet, comprising: a switch comprising: at least two interfaces configured to: receive a to-be-forwarded packet; and send the to-be-forwarded packet based on a first protocol or a second protocol, wherein the at least two interfaces comprise: a first interface configured to send and receive a first packet based on the first protocol; and a second interface configured to send and receive a second packet based on the second protocol, and wherein the to-be-forwarded packet is the first packet or the second packet; an ingress processing apparatus configured to: receive the to-be-forwarded packet; obtain first indication information corresponding to the to-be-forwarded packet, wherein the first indication information indicates a corresponding destination interface of the at least two interfaces that corresponds to the to-be-forwarded packet; and send the first indication information; a first switching apparatus configured to: receive the to-be-forwarded packet and the first indication information; and send the to-be-forwarded packet to the destination interface of the at least two interfaces based on the first indication information; at least one server electrically connected to the switch; and at least one resource pool electrically connected to the switch, wherein the at least one resource pool comprises at least one of a plurality of memory resources, a plurality of hard disk resources, and a plurality of accelerator resources.
16. The switch cabinet of claim 15, further comprising a multi-protocol conversion apparatus electrically connected to the switch and configured to receive the to-be-forwarded packet.
17. The switch cabinet of claim 16, wherein the multi-protocol conversion apparatus is further configured to: convert the to-be-forwarded packet into a third packet based on a configuration, wherein the third packet and the to-be-forwarded packet are based on different protocols; and send the third packet.
18. The switch cabinet of claim 16, further comprising an egress processing apparatus, wherein the multi-protocol conversion apparatus is configured to receive the to-be-forwarded packet according to a first configuration of the ingress processing apparatus and a second configuration of the egress processing apparatus.
19. The switch cabinet of claim 16, further comprising a second switching apparatus connected to the ingress processing apparatus, wherein the interface apparatus is connected to the ingress processing apparatus and to the multi-protocol conversion apparatus.
20. The switch cabinet of claim 15, further comprising an egress processing apparatus configured to: receive the to-be-forwarded packet and the first indication information; and send the to-be-forwarded packet to a corresponding destination interface of the at least two interfaces.
Complete technical specification and implementation details from the patent document.
This is a continuation of International Patent Application No. PCT/CN2024/101703 filed on Jun. 26, 2024, which claims priority to Chinese Patent Application No. 202310814632.0 filed on Jul. 4, 2023. The disclosures of the aforementioned applications are hereby incorporated by reference in their entireties.
Embodiments of this disclosure relate to the field of electronic information, and in particular, to a switch, a switch cabinet, and a data switching method.
With the popularization of cloud computing and high-performance computing, more applications and data are migrated to data centers. The services and the amount of data processed by data centers grow explosively, and the requirements on the computing power of data centers are increasingly high. However, due to the limitations of Moore's Law and the slowdown in processor performance growth, a domain-specific architecture (DSA) is gradually becoming a trend, and more heterogeneous processors/accelerators are introduced into data centers. In addition, because of the memory wall, both the capacity and the bandwidth of the memory of a single server become bottlenecks for the performance of a high-performance processor. Therefore, a disaggregated data center is promoted. That is, key resources such as heterogeneous processors, accelerators, and memory are pooled to meet continuously growing computing power and memory requirements. In this architecture, the communication delay and the transmission bandwidth between different types of resources limit the performance of the high-performance processor.
This disclosure provides a switch, a switch cabinet, and a data switching method, to optimize networking in the switch cabinet and improve resource utilization in the cabinet.
According to a first aspect, this disclosure provides a switch including at least two interface modules, a switching module, and an ingress processing module. The at least two interface modules include a first interface and a second interface. The first interface is configured to send and receive a first packet that is based on a first protocol, and the second interface is configured to send and receive a second packet that is based on a second protocol. The interface module is configured to obtain a to-be-forwarded packet, where the to-be-forwarded packet is one of the first packet and the second packet. The ingress processing module is configured to receive the to-be-forwarded packet from the interface module, and obtain first indication information corresponding to the to-be-forwarded packet, where the first indication information indicates a destination interface corresponding to the to-be-forwarded packet. The switching module is configured to receive the to-be-forwarded packet and the first indication information from the ingress processing module, and send the to-be-forwarded packet to the destination interface based on the first indication information.
A plurality of interfaces supporting different protocols, the ingress processing module supporting switching of at least one protocol, and a switching module supporting different protocols are configured for the switch, to receive, process, switch, and forward packets of a plurality of different protocols, so as to reduce costs and power consumption of a switch device. In addition, local resources (such as a memory, a hard disk, and an accelerator) that are originally configured in a server and that overflow may be configured outside a cabinet as pooled resources, and data transmission is implemented through a bus and a top-of-rack (ToR) switch that supports switching of a plurality of protocols. This further reduces system costs and power consumption, and also removes a limitation that the local resources are used by only the server, thereby improving resource utilization, implementing flexible resource configuration, and implementing a plurality of new services.
In an implementation, the ToR switch further includes an egress processing module, configured to receive the to-be-forwarded packet and the first indication information from the switching module, and send the to-be-forwarded packet to the interface module. The interface module includes the destination interface.
In an implementation, the ingress processing module in the switch is configured to connect to at least two interfaces, where the at least two interfaces include the first interface and the second interface. The ingress processing module is enabled to work in different protocols, to reduce a quantity of ingress processing modules that are necessary in the switch, so as to reduce system costs and power consumption.
In an implementation, the egress processing module in the switch is configured to connect to at least two interfaces, where the at least two interfaces include the first interface and the second interface. The egress processing module is enabled to work in different protocols, to reduce a quantity of egress processing modules that are necessary in the switch, so as to reduce system costs and power consumption.
According to a second aspect of this disclosure, a switch is provided, and includes at least two interface modules, a first switching module, an ingress processing module, and a multi-protocol conversion module. The at least two interface modules include a first interface and a second interface. The first interface is configured to send and receive a first packet that is based on a first protocol, and the second interface is configured to send and receive a second packet that is based on a second protocol. The interface module is configured to obtain a to-be-forwarded packet, where the to-be-forwarded packet is one of the first packet and the second packet. The multi-protocol conversion module is configured to receive the to-be-forwarded packet, and convert the to-be-forwarded packet into a third packet based on a configuration, where the third packet and the to-be-forwarded packet are packets based on different protocols. The first switching module is configured to receive the third packet and corresponding indication information, and send the received third packet to the destination interface based on the indication information. The indication information is generated by the ingress processing module and indicates the destination interface corresponding to the received packet, and the ingress processing module is configured to obtain the indication information corresponding to the received packet.
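For illustration only, the data path of the second aspect may be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of this disclosure: the names `Protocol`, `Packet`, `MultiProtocolConverter`, and `forward`, the dictionary-based route table, and the destination keys are all hypothetical. A real conversion module would re-encapsulate the packet, whereas the sketch only re-labels it.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Protocol(Enum):
    ETHERNET = "ethernet"
    CXL = "cxl"
    UB = "ub"


@dataclass(frozen=True)
class Packet:
    protocol: Protocol
    dst: str          # hypothetical destination key used for table lookup
    payload: bytes


class MultiProtocolConverter:
    """Hypothetical conversion module driven by a static configuration."""

    def __init__(self, config: dict[Protocol, Protocol]):
        # The third packet must be based on a different protocol
        # from the to-be-forwarded packet.
        for source, target in config.items():
            if source == target:
                raise ValueError("target protocol must differ from source")
        self.config = config

    def convert(self, packet: Packet) -> Packet:
        # Assumption: conversion is modeled as re-labeling the protocol only.
        target = self.config[packet.protocol]
        return Packet(target, packet.dst, packet.payload)


def forward(packet: Packet, converter: MultiProtocolConverter,
            route_table: dict[str, str]) -> tuple[str, Packet]:
    """Return (destination interface, third packet)."""
    indication = route_table[packet.dst]   # ingress processing: indication information
    third = converter.convert(packet)      # multi-protocol conversion
    return indication, third               # the switching module delivers `third`
```

In this sketch, a CXL packet addressed to a hypothetical key such as `"pool0"` would leave `forward` as a UB packet together with the interface identifier stored for that key, provided that the converter is configured with `{Protocol.CXL: Protocol.UB}`.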
A plurality of interfaces supporting different protocols, an ingress processing module supporting switching of at least one protocol, a switching module supporting one protocol, and a multi-protocol conversion module used for protocol conversion are configured for the switch, to implement conversion between intra-cabinet protocols and different inter-cabinet protocols, and to eliminate the need to deploy a multi-protocol conversion module on each server, so as to reduce system costs and power consumption. In addition, the switch enables a new networking manner in the cabinet, so that the server can access a plurality of pooled resources in the cabinet, including but not limited to a memory, a hard disk, and an accelerator, by using only one bus protocol. This further reduces system interconnection costs and power consumption.
In an implementation, the interface module in the switch is connected to the ingress processing module, and the ingress processing module is connected to the multi-protocol conversion module.
In an implementation, the interface module in the switch is connected to the multi-protocol conversion module, and the multi-protocol conversion module is connected to the ingress processing module.
In an implementation, the switch further includes a second switching module. The interface module is connected to the ingress processing module, the ingress processing module is connected to the second switching module, and the second switching module is connected to the multi-protocol conversion module.
In an implementation, the switch further includes an egress processing module, configured to receive the to-be-forwarded packet and the first indication information from the switching module, and send the to-be-forwarded packet to the interface module. The interface module includes the destination interface.
According to a third aspect of this disclosure, a switch cabinet is provided, and includes the switch according to the first aspect, at least one server electrically connected to the switch, and at least one resource pool electrically connected to the switch. The resource pool includes at least one of a plurality of memory resources, a plurality of hard disk resources, and a plurality of accelerator resources.
In an implementation, the switch cabinet further includes a multi-protocol conversion apparatus, and the multi-protocol conversion apparatus is electrically connected to the switch.
According to a fourth aspect of this disclosure, a switch cabinet is provided, and includes the switch according to the second aspect, at least one server electrically connected to the switch, and at least one resource pool electrically connected to the switch. The resource pool includes at least one of a plurality of memory resources, a plurality of hard disk resources, and a plurality of accelerator resources.
To make the objectives, technical solutions, and advantages of this disclosure clearer, the following describes the technical solutions in this disclosure with reference to the accompanying drawings in this disclosure. It is clear that the described embodiments are a part rather than all of embodiments of this disclosure. All other embodiments obtained by a person of ordinary skill in the art based on embodiments of this disclosure without creative efforts shall fall within the protection scope of this disclosure.
In the specification, embodiments, claims, and accompanying drawings of this disclosure, the terms “first”, “second”, and the like are merely intended for distinguishing and description, and shall not be understood as indicating or implying relative importance, or indicating or implying a sequence. Moreover, the terms “include”, “have”, and any other variant thereof are intended to cover a non-exclusive inclusion, for example, including a series of steps or units. For example, a method, system, product, or device is not necessarily limited to those steps or units expressly listed, but may include other steps or units not expressly listed or inherent to such a process, method, product, or device.
It should be understood that in this disclosure, “at least one piece (item)” refers to one or more and “a plurality of” refers to two or more. The term “and/or” is used to describe an association relationship between associated objects, and indicates that three relationships may exist. For example, “A and/or B” may indicate the following three cases: Only A exists, only B exists, and both A and B exist, where A and B may be singular or plural. The character “/” generally indicates an “or” relationship between the associated objects. The expression “at least one of the following items (pieces)” or a similar expression means any combination of these items, including a single item (piece) or any combination of a plurality of items (pieces). For example, at least one of a, b, or c may indicate a, b, c, a and b, a and c, b and c, or a, b, and c, where a, b, and c may be singular or plural.
Rapid development of technologies such as cloud computing, big data, and artificial intelligence poses higher requirements on data center networks that carry data traffic. Requirements of services on the data center networks include high throughput, high reliability, low latency, and adaptation to server virtualization. To meet network requirements of services, more enterprises choose to build their own data centers or rent public clouds to carry increasing service traffic. A modern data center is equipped with a physical rack in which at least one physical server is mounted. Each physical server is deployed with at least a central processing unit (CPU), and is further deployed with localized resources, including a memory, a solid-state drive (SSD), an accelerator (for example, a graphics processing unit (GPU)), a network interface card, and the like.
As the scale of data centers increases, a plurality of physical racks may be placed in a cabinet, and physical servers in the plurality of physical racks may connect and communicate through switches. Based on the deployment locations of the switches in the cabinet (which may also be understood as different access manners of a server), the physical architecture of the switches is generally classified into two types: ToR and end-of-row (EoR)/middle-of-row (MoR).
ToR means that one or two switches are deployed in each server cabinet, and a server is directly connected to a switch in the cabinet to implement interconnection between the server and the switch in the cabinet. Generally, deploying switches at the top of the cabinet facilitates cabling. Currently, this architecture is the most widely used.
In contrast, the EoR architecture provides a unified network access point at the end of each row of cabinets. The MoR architecture is an improvement of the EoR architecture, and also provides a unified network access cabinet for a server. However, the MoR architecture requires placing the network cabinet in the middle of an entire row of cabinets. This shortens a distance between a server cabinet and the network cabinet to some extent, and simplifies cable management and maintenance.
FIG. 1 provides an architecture of a cabinet 100. The cabinet 100 includes a plurality of servers 102 and at least one ToR switch 104 inside the cabinet 100. A CPU 106 and a local resource 108 deployed on each server are used by only the server 102 and cannot be shared. The plurality of servers 102 may be connected to the ToR switch 104 according to an Ethernet protocol. Further, the ToR switch 104 may be connected to a spine switch (not shown) through an Ethernet interface, to establish a data center network and meet a larger scale requirement.
Each server 102 in FIG. 1 is generally configured with at least one CPU 106, a memory 110 (for example, a double data rate dynamic random-access memory (DDR DRAM)), a solid-state drive (SSD) 112, and an accelerator GPU 114. External data may be transmitted from the switch 104 to the server 102 through the Ethernet interface 101, and further transmitted to different localized resources 108 through an internal bus and different interfaces of the server for processing or storage. For example, the Ethernet interface 101 may be connected to the CPU 106, the GPU 114, and the SSD 112 through a Peripheral Component Interconnect Express (PCIe) bus 116. When performing data processing, the CPU 106 stores data in the memory 108 through a double data rate (DDR) input/output (I/O) interface 118.
Because the ratio of computing capability to localized resources in a physical server is fixed when the physical server is delivered or configured, resources may be configured based on a maximum capacity of a related service preset at that time. As a result, resources may be wasted, and when a new service requires a larger capacity, a single server cannot provide sufficient resources to meet the requirement of the new service. New services include a Remote Dictionary Server (Redis), a database, artificial intelligence (AI) computing, and the like. These new services usually have higher requirements on memory capacity and hard disk capacity.
Based on the foregoing service requirement change, the problem may be resolved by adding an interface to the server and implementing resource pooling in the cabinet. FIG. 2 shows an architecture of a cabinet 200 in which some resources are shared. The cabinet 200 includes a plurality of servers 204, at least one ToR switch 202, at least one resource switch 206, a memory pool 208, and an SSD pool 210 inside. The memory pool 208 and the SSD pool 210 in the cabinet 200 are shared by all the servers 204 in the cabinet 200.
The ToR switch 202 implements communication between the plurality of servers, and may be connected to a spine switch through an Ethernet interface 201 to establish a data center network and meet a larger scale requirement.
The resource switch 206 implements data exchange between the server 204 and the memory pool 208 or the SSD pool 210 by supporting communication protocols such as Compute eXpress Link (CXL) or unified buffer (UB).
Each server 204 is configured with an Ethernet interface to implement communication with the ToR switch 202, and is further configured with at least one interface to implement data transmission with the resource switch 206 according to the communication protocols such as CXL or UB. In addition, a CPU 212 and some local resources such as a GPU, a network interface card, and a local dynamic random-access memory (DRAM) 216 are deployed in the server 204. The local DRAM 216 and the memory pool 208 in the cabinet 200 may form a memory structure with different tiers. These local resources are used by only the server 204 and cannot be shared.
With the development of the Internet, data volumes explode and virtualization technologies develop. Cloud computing and high-performance computing become more popular. As a result, greater challenges are posed to the computing power of data centers, and further optimization is required for the layout and resources in an existing cabinet. Therefore, a new switch architecture is urgently needed to improve resource utilization in the cabinet, optimize the operation and switching rate, and meet evolving service requirements.
FIG. 3 provides an architecture of a cabinet 300. A plurality of servers 304, at least one switch 302, and a plurality of resource pools are disposed in the cabinet 300. The plurality of resource pools include but are not limited to a memory pool 308 (for example, a DDR DRAM pool), a hard disk pool 310 (for example, an SSD pool), an accelerator pool 306 (for example, a GPU pool), and a network interface card. The switch 302 may be a ToR switch, an EoR switch, or an MoR switch, and is used for intra-cabinet communication and data communication between a cabinet and the outside of the cabinet. The following uses the ToR switch as an example.
The ToR switch is configured with an internal switching interface, and the internal switching interface supports an internal data interconnection protocol, for example, including a CXL protocol and a UB protocol. Through the internal switching interface, the ToR switch interconnects the servers 304 in the cabinet 300 with various resource pools, so that the memory pool 308, the hard disk pool 310, the accelerator pool 306, the network interface card, and the like are shared by all the servers 304 in the cabinet 300.
The relationship between a service requirement and the upper limit of the resource capacity configured for a server at delivery or configuration is decoupled by pooling the local resources originally in the server into resource pools in a rack, and enabling the ToR switch to connect the server to the resource pools. This improves resource utilization and allows flexible configuration of computing resources for various services. In addition, because the local resources are moved out of the server, the types of CPU bus interconnection in the server can be reduced, to reduce system costs and power consumption.
In addition, the ToR switch is further configured with a standard Ethernet interface, and may be connected to a spine switch outside the cabinet through the standard Ethernet interface, to establish a data center network and meet a larger scale requirement.
The server 304 in the cabinet 300 is also configured with an internal switching interface, and the internal switching interface supports an internal data interconnection protocol, for example, including a CXL protocol and a UB protocol. In addition, the server 304 further includes a CPU 312, and is connected to the ToR switch through the internal switching interface.
Optionally, a memory resource is further configured inside the server 304, and a two-layer memory architecture is implemented by using the local memory resource and a shared memory pool in the rack, to balance performance and capacity, so that a service that requires a large memory can be served at low cost.
To implement the foregoing rack structure, FIG. 4 provides an architecture of a ToR switch 400, which includes a multi-function interface module 1, an ingress processing unit 2, an egress processing unit 3, and a multi-protocol switching unit 4. The multi-function interface module 1 is connected to at least one ingress processing unit 2 and one egress (downlink packet) processing unit, and the multi-protocol switching unit 4 is connected to all ingress processing units 2 and egress processing units 3.
The multi-function interface module 1 is also referred to as a multi-function integrated facility management (MIFM) module. The multi-function interface module 1 includes a media access control (MAC) layer, a physical coding sublayer (PCS), and a physical layer (PHY), and supports at least one intra-rack data interconnection protocol, including but not limited to a standard Ethernet protocol, a standard CXL protocol, a standard UB protocol, or another bus protocol. The multi-function interface module 1 may support a plurality of data interconnection protocols. After configuration is completed, the multi-function interface module 1 works in one of the plurality of data interconnection protocols. In other words, one multi-function interface module 1 receives and/or sends packets that are based on a same protocol at the same time. For example, a first MIFM module 1 may be configured to work in the standard Ethernet protocol, and may be connected to a spine switch outside the cabinet. A second MIFM module may be configured to work in the standard CXL protocol, and may be connected to the server in the cabinet. Data switching in the memory pool is based on the standard CXL protocol.
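For illustration only, the rule that an MIFM module supports a plurality of protocols but works in exactly one protocol after configuration may be sketched as follows. The class name `InterfaceModule`, the protocol strings, and the `configure`/`accepts` methods are hypothetical and are not defined in this disclosure.

```python
class InterfaceModule:
    """Hypothetical model of an MIFM module with one active protocol."""

    # Assumed set of protocols that the hardware could be configured for.
    SUPPORTED = frozenset({"ethernet", "cxl", "ub"})

    def __init__(self, name):
        self.name = name
        self.active = None  # no protocol is active until configuration completes

    def configure(self, protocol):
        """Select the single protocol that this module works in."""
        if protocol not in self.SUPPORTED:
            raise ValueError(f"{self.name} does not support {protocol!r}")
        self.active = protocol

    def accepts(self, packet_protocol):
        """A packet is accepted only if it matches the active protocol."""
        return self.active is not None and packet_protocol == self.active


# Example mirroring the text: one module faces the spine switch (Ethernet),
# another faces a server in the cabinet (CXL).
uplink = InterfaceModule("mifm-uplink")
uplink.configure("ethernet")
server_port = InterfaceModule("mifm-server")
server_port.configure("cxl")
```

Modeling the active protocol as a single field, rather than a set, reflects the statement that one module receives and/or sends packets of a same protocol at the same time.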
Each multi-function interface module 1 is connected to one ingress processing module 2 and one egress processing module 3.
An ingress processing (IP) module 2 is connected to at least one multi-function interface module 1, and is used for processing an uplink packet from the multi-function interface module 1, including but not limited to data aggregation, parsing, table lookup, route selection, editing, and forwarding. The ingress processing module supports at least one intra-rack data forwarding protocol, including but not limited to the standard Ethernet protocol (including a standard Ethernet layer 2/layer 3 forwarding protocol and standard Ethernet tunnel processing), the standard CXL protocol, the standard UB protocol, or another bus protocol. Preferably, the ingress processing module 2 may support a plurality of data forwarding protocols, and may support forwarding of packets of a plurality of protocols at the same time. When the IP module 2 is connected to two or more MIFM modules 1, the two or more MIFM modules 1 may be configured to work in different data interconnection protocols. In this case, the IP module 2 may receive and/or send packets of a plurality of protocols at the same time.
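For illustration only, the table lookup that yields the indication information may be sketched as follows. This is a minimal sketch that assumes one forwarding table per protocol and a dictionary-based packet; the class `IngressProcessor`, the packet fields, and the `dest_interface` key are hypothetical, and aggregation, editing, and tunnel processing are omitted.

```python
class IngressProcessor:
    """Hypothetical ingress stage: parse, look up, emit indication information."""

    def __init__(self):
        # One forwarding table per protocol:
        # {protocol: {destination key: interface id}}
        self._tables = {}

    def add_route(self, protocol, dst_key, interface_id):
        self._tables.setdefault(protocol, {})[dst_key] = interface_id

    def process(self, packet):
        """Return (packet, indication), or None when no route exists."""
        table = self._tables.get(packet["protocol"], {})
        interface_id = table.get(packet["dst"])
        if interface_id is None:
            # No matching entry; what a real device does here is not specified.
            return None
        indication = {"dest_interface": interface_id}
        return packet, indication


# One ingress module serving two protocols at the same time.
ingress = IngressProcessor()
ingress.add_route("cxl", "mem-pool", "if3")
ingress.add_route("ethernet", "spine", "if0")
```

Keeping a separate table per protocol is only one possible design; it lets a single ingress module handle packets from MIFM modules that are configured for different protocols without the address spaces colliding.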
An egress processing (EP) module 3 is connected to at least one multi-function interface module 1, is used for processing, including but not limited to processing such as data parsing, table lookup, encapsulation, editing, and distribution, on a downlink packet from the multi-function interface module 1, and supports at least one intra-rack data forwarding protocol, including but not limited to the standard Ethernet protocol (including the standard Ethernet layer 2/layer 3 forwarding protocol and standard Ethernet tunnel processing), the standard CXL protocol, the standard UB protocol, or another bus protocol. Similar to the ingress processing module, the egress processing module 3 may also support a plurality of data forwarding protocols, that is, support forwarding of packets of a plurality of protocols at the same time.
The multi-protocol switching (MFS) module 4 is connected to all the ingress processing modules 2 and egress processing modules 3, and performs switching processing, including but not limited to operations such as data buffering, switching, enqueue, scheduling, and quality of service (QoS) management, on the packets of the plurality of protocols from the ingress processing module. The plurality of supported protocols include but are not limited to a standard Ethernet protocol (including a standard Ethernet layer 2/layer 3 switching protocol and standard Ethernet tunnel switching), a standard CXL protocol, a standard UB protocol, or another bus protocol. Similar to the ingress processing module 2, the MFS module 4 also supports switching of packets of a plurality of protocols at the same time.
The ToR switch 400 is configured with a plurality of interfaces that support different protocols (for example, a first interface that is in the MIFM module 1 and that supports the standard Ethernet protocol and a second interface that is in the MIFM module 1 and that supports the standard CXL protocol), an ingress processing module 2, an egress processing module 3, and a multi-protocol switching module 4 that support switching of a plurality of types of protocols. In an example, the ToR switch 400 can receive, process, switch, and forward packets of a plurality of different protocols, without requiring at least two switches that separately support communication of different protocols. This reduces costs and power consumption of a switch device. The server in the cabinet can not only receive and send conventional packet data based on the standard Ethernet protocol, but also store, in a pooled resource in the cabinet, packet data based on the standard CXL protocol or the standard UB protocol. In addition, local resources (such as a memory, a hard disk, and an accelerator) that are originally configured in a server and that overflow may be configured outside a cabinet as pooled resources, and data transmission is implemented through a bus and a ToR switch 400 that supports switching of a plurality of protocols. This further reduces system costs and power consumption, and also removes a limitation that the local resources are used by only the server, thereby improving resource utilization, implementing flexible resource configuration, and implementing a plurality of new services.
Optionally, the ToR switch 400 may be further connected to a multi-protocol converter, for example, a network interface card, configured to implement fast protocol conversion of a packet, for example, convert a packet based on the standard Ethernet protocol into a packet based on the standard CXL protocol, to improve packet switching efficiency and expand a storage acceleration service.
FIG. 5 provides a packet processing procedure of the ToR switch in FIG. 4.
Step 1: Receive a data stream through serializing/deserializing (SerDes) circuitry.
The data stream is a data stream based on a first protocol, and the first protocol is one of a plurality of protocols supported by the multi-function interface module and the multi-protocol switching module in the ToR switch. The ToR switch is connected, through the SerDes, to the server or a local pooled resource in the cabinet, including but not limited to a memory pool, a storage pool, an accelerator pool, and the like.
For example, the ToR switch receives a data stream from the server in the cabinet, and needs to store the data in an SSD pool. The data stream may be a data stream based on a standard CXL protocol.
Step 2: The multi-function interface module receives the data stream, generates a first packet, and sends the first packet to the ingress processing module.
In an example, operations performed by the MIFM module on the data stream include decoding, error correction, framing, error check, retransmission, and the like. For example, when the MIFM module is configured to work in a standard CXL mode, the first packet is a standard CXL packet.
Step 3: The ingress processing module receives the first packet, and forwards the first packet and control information to the multi-protocol switching module.
In an example, the IP module performs operations including packet parsing, table lookup, forwarding, editing, and the like on the first packet, and determines, based on a prestored routing table, a destination MIFM module corresponding to the first packet, and information related to the destination MIFM module is carried in the control information.
For example, when the first packet is a standard CXL packet, the IP module works in a standard CXL mode, to process the first packet.
In a possible implementation, the IP module is connected to a plurality of MIFM modules, and switches a working mode based on a received packet, to process packets of different protocols.
Step 4: The multi-protocol switching module receives the first packet and the control information, and switches the first packet to the corresponding destination MIFM module based on the control information.
In a possible implementation, the multi-protocol switching module stores the first packet in a built-in cache, stores the control information in a corresponding queue based on the destination MIFM module that is of the first packet and that is indicated by the control information, then schedules the control information from the queue, reads the first packet from the cache, and switches the first packet and the control information to the corresponding destination MIFM module together.
For example, when the first packet is a standard CXL packet, the MFS module works in a standard CXL mode, to forward the first packet.
Step 5: The egress processing module receives the first packet and the control information from the multi-protocol switching module, and sends the first packet to the connected destination MIFM module based on the control information.
In an example, operations performed by the EP module on the first packet include data parsing, table lookup, encapsulation, editing, distribution, and the like.
For example, when the first packet is a standard CXL packet, the EP module works in a standard CXL mode, to process the first packet.
In a possible implementation, the EP module switches a working mode based on received packets and control information, to process packets of different protocols, and is further connected to a plurality of MIFM modules, and forwards a packet to a corresponding destination MIFM module based on control information from the MFS module.
Step 6: The destination MIFM module receives the first packet and the control information, and sends the data stream to the outside through the SerDes.
In an example, processing performed by the MIFM module on the first packet includes processing such as segmentation and encoding.
For example, the destination MIFM module works in a standard CXL mode, and is connected to an SSD pool. The ToR switch completes data switching between the server and the SSD pool.
According to the foregoing processing procedure, the ToR switch completes processing of a data stream from a server or a resource pool in the cabinet, and forwards the data stream to another server or another resource pool in the cabinet.
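The table lookup of Step 3 and the cache, queue, and scheduling behavior of Step 4 can be sketched as follows. This is a minimal illustration under stated assumptions: the `Packet` and `ControlInfo` fields, the contents of `ROUTING_TABLE`, the cache slot numbering, and the first-in-first-out scheduling order are all hypothetical and are not specified in the procedure above.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Packet:
    protocol: str      # for example "cxl" or "ethernet"
    destination: str   # hypothetical key used for the table lookup
    payload: bytes


@dataclass
class ControlInfo:
    dst_mifm: int      # identifies the destination MIFM module


# Hypothetical prestored routing table: destination key -> destination MIFM.
ROUTING_TABLE = {"server-a": 1, "ssd-pool": 2}


def ingress_process(packet):
    """Step 3: look up the destination MIFM module and build control info."""
    return ControlInfo(dst_mifm=ROUTING_TABLE[packet.destination])


class SwitchingModule:
    """Step 4: cache the packet, queue its control info, then schedule."""

    def __init__(self):
        self.cache = {}    # cache slot -> buffered packet
        self.queues = {}   # destination MIFM -> queue of (control info, slot)
        self._next_slot = 0

    def enqueue(self, packet, ctrl):
        slot = self._next_slot
        self._next_slot += 1
        self.cache[slot] = packet
        self.queues.setdefault(ctrl.dst_mifm, deque()).append((ctrl, slot))

    def schedule(self, dst_mifm):
        # Assumed policy: first in, first out per destination queue.
        ctrl, slot = self.queues[dst_mifm].popleft()
        return self.cache.pop(slot), ctrl


mfs = SwitchingModule()
first_packet = Packet("cxl", "ssd-pool", b"block-0")
mfs.enqueue(first_packet, ingress_process(first_packet))
out_packet, out_ctrl = mfs.schedule(dst_mifm=2)
```

Keeping the packet in the cache and only the control information in the queue mirrors the description of Step 4, in which the control information is scheduled first and the packet is then read back from the cache.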
Further, the ToR switch may be connected to a network interface card module, and the network interface card module implements multi-protocol conversion on a packet. For example, the network interface card module is connected to at least two MIFM modules of the ToR switch, including a first MIFM module that works in a standard Ethernet mode and a second MIFM module that works in a standard CXL mode. When the ToR switch switches data with a spine switch outside the cabinet by using a standard Ethernet packet, and the ToR switch switches data with the server in the cabinet by using a standard CXL packet, the network interface card module may convert a standard Ethernet packet from the spine switch outside the cabinet into a standard CXL packet, and forward the standard CXL packet to the server.
FIG. 6 provides an architecture of another ToR switch 600. Similar to the switch architecture in FIG. 4, the switch architecture in FIG. 6 also includes a multi-function interface module 1, an ingress processing module 2, an egress processing module 3, and a multi-protocol switching module 4. In addition, a multi-protocol conversion module 5 is added to implement processing of the ToR switch 600 on packets based on a plurality of protocols. In the switch architecture in FIG. 6, the multi-function interface module 1, the ingress processing module 2, and the egress processing module 3 support only one of the plurality of protocols, and this is not changed after configuration is completed. The multi-protocol switching unit 4 supports switching of the plurality of protocols. The plurality of protocols include but are not limited to a standard Ethernet protocol, a standard CXL protocol, a standard UB protocol, or another bus protocol.
The newly added multi-protocol conversion (CFB) module 5 is configured to convert a packet that is based on a first protocol into a packet that is based on a second protocol, for example, through processing such as data buffering, protocol parsing, protocol conversion, out-of-order rearrangement, segmentation and reassembly, data check, and data retransmission, so that data transmitted over different buses can be exchanged through only one ToR switch 600.
The multi-protocol conversion module 5 is integrated into the switch to implement conversion between intra-cabinet protocols and inter-cabinet different protocols, and a multi-protocol conversion module 5 does not need to be deployed on each server. This reduces system costs and power consumption. In addition, a new networking manner in the cabinet can be updated through the switch, so that the server can access a plurality of pooled resources in the cabinet by using only one bus protocol, including but not limited to a memory, a hard disk, and an accelerator. This further reduces system interconnection costs and power consumption.
In an implementation, the multi-protocol conversion module 5 is connected to the ingress processing module 2 and/or the egress processing module 3. The multi-protocol conversion module 5 may be referred to as a distributed multi-protocol conversion module (DCFB). Optionally, each distributed multi-protocol conversion module 5 is connected to one ingress processing module 2 and one egress processing module 3. That is, there are N groups of ingress processing modules 2 and egress processing modules 3, and correspondingly, N distributed multi-protocol conversion modules 5 need to be configured. Optionally, at least one distributed multi-protocol conversion module 5 may be configured for the N groups of ingress processing modules 2 and egress processing modules 3 based on a preconfiguration. The DCFB module 5 supports conversion of packets of a plurality of protocols at the same time.
Optionally, each DCFB module 5 is connected in series between an MIFM module 1 and a group of an IP module 2 and an EP module 3. In a possible implementation, the MIFM module 1 connected to the DCFB module 5 works in a first protocol, and the group of the IP module 2 and the EP module 3 connected to the DCFB module 5 works in a second protocol. For a packet that is received from the MIFM module 1 and that is based on the first protocol, the DCFB module 5 first completes protocol conversion, and then sends a packet based on the second protocol to the IP module 2 for forwarding-related processing. Correspondingly, the DCFB module 5 may further receive a packet from the EP module 3, and send a packet to the MIFM module 1 after completing protocol conversion.
Optionally, each DCFB module 5 is connected in series between a group of an IP module 2 and an EP module 3 and an MFS module 4. A packet sent by the IP module 2 to the MFS module 4 is sent to the MFS module 4 after protocol conversion by the DCFB module 5. A packet sent by the MFS module 4 to the EP module 3 is sent to the EP module 3 after protocol conversion by the DCFB module 5.
Optionally, each MIFM module 1 is connected to a group of an IP module 2 and an EP module 3, and the IP module 2 and the EP module 3 are connected to a corresponding DCFB module 5.
For example, the DCFB module 5 may be of one of several types, such as a network interface card (NIC), a smart network interface card (Smart NIC), or a data processing unit (DPU). When the IP module 2 or the EP module 3 is connected to a plurality of servers, preferably, the DCFB module 5 supports a multi-host (Multi-Host) mode, and can be shared by the plurality of servers.
In an implementation, the multi-protocol conversion module is connected to only the MFS module 4, and the multi-protocol conversion module 6 may be referred to as a centralized multi-protocol conversion module (CCFB).
After a packet is sent to the MFS module 4 through the IP module 2, the MFS module 4 sends the packet on which protocol conversion needs to be performed to the CCFB module 6. After completing protocol conversion of the packet, the CCFB module 6 sends the packet back to the MFS module 4. Finally, the MFS module 4 switches the packet to a destination MIFM module 1.
For example, the CCFB module 6 may include the following several types: a macro NIC, a macro Smart NIC, and a macro DPU. When the IP module 2 or the EP module 3 is connected to a plurality of servers, preferably, the CCFB module 6 supports a multi-host mode, and can be shared by the plurality of servers.
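The round trip between the MFS module and the CCFB module can be sketched as follows. This is only an illustration: the description does not state how the MFS module decides that conversion is needed, so the comparison against a per-port protocol table (`PORT_PROTOCOL`) is an assumption, and `ccfb_convert` is a hypothetical placeholder that relabels the packet rather than performing a real protocol conversion.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    protocol: str
    payload: bytes


# Hypothetical table: MIFM module -> protocol it was configured to work in.
PORT_PROTOCOL = {1: "ethernet", 2: "cxl"}


def ccfb_convert(packet, target_protocol):
    # Placeholder for the centralized conversion; payload handling is omitted.
    return replace(packet, protocol=target_protocol)


def mfs_switch(packet, dst_mifm):
    """Return the switched packet and the path it took inside the switch."""
    path = []
    target = PORT_PROTOCOL[dst_mifm]
    if packet.protocol != target:      # assumed trigger for conversion
        path.append("mfs->ccfb")
        packet = ccfb_convert(packet, target)
        path.append("ccfb->mfs")
    path.append(f"mfs->mifm{dst_mifm}")
    return packet, path


converted, converted_path = mfs_switch(Packet("ethernet", b"req"), dst_mifm=2)
direct, direct_path = mfs_switch(Packet("ethernet", b"req"), dst_mifm=1)
```

A packet that already matches the destination port's protocol takes the direct path, while a mismatched packet visits the CCFB module once before it is switched to the destination MIFM module.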
Optionally, both the DCFB module 5 and the CCFB module 6 may be configured to implement conversion between multi-protocol packets. In an example, packets of different interface modules may be configured to be processed by the DCFB module 5 or the CCFB module 6 based on different service requirements. For example, the DCFB module 5 is configured to perform packet conversion of a simple service with a large bandwidth, and the CCFB module 6 is configured to perform packet conversion of a complex service.
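A per-interface assignment of this kind can be sketched as a small configuration step. The `InterfaceService` record, the interface names, and the rule that every service not marked as complex goes to the DCFB module are assumptions made for illustration; the description only gives the two examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InterfaceService:
    mifm: str               # hypothetical interface module name
    complex_service: bool   # True when the service needs complex conversion


def select_converter(service):
    # Assumed rule: complex services use the centralized CCFB module,
    # all other services use the distributed DCFB module.
    return "CCFB" if service.complex_service else "DCFB"


services = [
    InterfaceService("mifm-server-1", complex_service=False),
    InterfaceService("mifm-storage-1", complex_service=True),
]
converter_by_interface = {s.mifm: select_converter(s) for s in services}
```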
A ToR switch configured with a DCFB module 5 is used as an example. The DCFB module 5 is configured to be connected between an MFS module 4 and a group of an IP module 2 and an EP module 3. In this case, a packet processing procedure of the ToR switch is as follows.
Step 1: Receive a data stream through a SerDes.
Step 2: An MIFM module 1 implements processing such as decoding, error correction, framing, error check, and retransmission on the data stream, generates a first packet, and sends the first packet.
Step 301: The IP module 2 performs processing such as packet parsing, table lookup, forwarding, and editing on the received first packet, determines, based on a prestored routing table, a destination MIFM module 1 corresponding to the received packet, and sends the first packet and control information.
Step 302: The DCFB module 5 determines, based on a preconfiguration, that protocol conversion needs to be performed on the first packet, converts the first packet based on a first protocol into a second packet based on a second protocol, and sends the second packet to a multi-protocol switching module 4.
Step 4: The MFS module 4 receives the second packet and the control information, and switches the packet to a corresponding destination MIFM module 1 based on the control information.
Step 5: An egress processing module receives the second packet, performs operations such as data parsing, table lookup, encapsulation, editing, and distribution, and sends the second packet to the connected destination MIFM module 1.
In a possible implementation, the egress processing module 3 is connected to a plurality of MIFM modules 1, and the egress processing module 3 forwards the packet to the destination MIFM module 1 based on control information sent by the multi-protocol switching module 4.
Step 6: The destination MIFM module 1 receives the packet, completes processing such as segmentation and encoding, and then sends the recovered data stream to the outside through the SerDes.
Optionally, the DCFB module 5 may alternatively be configured between the MIFM module 1 and a group of an IP module 2 and an EP module 3.
Optionally, the multi-protocol conversion module may alternatively be a CCFB module 6, and the CCFB module 6 is connected to the MFS module 4. After receiving a packet, the MFS module 4 first sends all received packets to the CCFB module 6, or sends some packets to the CCFB module 6 based on a configuration. The CCFB module 6 receives the packet from the MFS module 4, and sends a converted packet back to the MFS module 4 after completing protocol conversion.
In an implementation, the IP module 2 and the EP module 3 may be configured to be in a single-protocol mode, and a single protocol supported by the IP module 2 and a single protocol supported by the EP module 3 include but are not limited to one of a standard Ethernet protocol, a standard CXL protocol, a standard UB protocol, or another bus protocol. A protocol mode supported by the IP module 2 and a protocol mode supported by the EP module 3 should be configured based on a location of the DCFB module 5, the MIFM module 1, and a type of an external resource connected to the MIFM module 1.
For example, the MIFM module 1 is connected to an SSD pool that supports the standard CXL protocol. When the DCFB module 5 is configured between the MIFM module 1 and the IP module 2 or the EP module 3, the IP module 2 or the EP module 3 may be configured to support the standard Ethernet protocol, and the DCFB module 5 converts a packet from a packet based on the standard Ethernet protocol to a packet based on the standard CXL protocol, to implement packet forwarding.
For example, the MIFM module 1 is connected to an SSD pool that supports the standard CXL protocol. When the DCFB module 5 is configured between the MFS module 4 and the IP module 2 or the EP module 3, the IP module 2 or the EP module 3 may be configured to support the standard CXL protocol, to prevent a packet forwarding failure caused by a mismatch between a protocol configured for the IP module 2 or the EP module 3 and a protocol for the connected MIFM module 1.
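The dependency between the DCFB location and the protocol mode of the IP module or the EP module in the two examples above can be expressed as a small configuration check. The enum `DcfbLocation` and the function `ip_ep_mode_is_valid` are hypothetical names used only for this sketch.

```python
from enum import Enum


class DcfbLocation(Enum):
    BETWEEN_MIFM_AND_IP_EP = "between MIFM and IP/EP"
    BETWEEN_IP_EP_AND_MFS = "between IP/EP and MFS"


def ip_ep_mode_is_valid(mifm_protocol, ip_ep_protocol, location):
    """Check a single-protocol IP/EP configuration against the DCFB location."""
    if location is DcfbLocation.BETWEEN_IP_EP_AND_MFS:
        # No converter sits between the MIFM and the IP/EP module, so the
        # IP/EP module must work in the protocol of the connected MIFM.
        return ip_ep_protocol == mifm_protocol
    # The DCFB sits between the MIFM and the IP/EP module and converts
    # packets, so the IP/EP module may work in a different protocol.
    return True


# SSD pool attached over CXL, as in the two examples above.
front_ok = ip_ep_mode_is_valid("cxl", "ethernet", DcfbLocation.BETWEEN_MIFM_AND_IP_EP)
back_ok = ip_ep_mode_is_valid("cxl", "cxl", DcfbLocation.BETWEEN_IP_EP_AND_MFS)
back_mismatch = ip_ep_mode_is_valid("cxl", "ethernet", DcfbLocation.BETWEEN_IP_EP_AND_MFS)
```

The third call models the forwarding failure mentioned above: an Ethernet-mode IP or EP module attached directly to a CXL-mode MIFM module is rejected.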
According to the foregoing processing procedure, the ToR switch 600 completes processing of a data stream from a server or a resource pool in the cabinet, and forwards the data stream to another server or another resource pool in the cabinet.
FIG. 7 provides an architecture of another ToR switch 700. In the switch architecture, not only the MIFM module 1, the IP module 2, the EP module 3, the MFS module 41, and the multi-protocol conversion module 5/6 exist, but also another MFS module 42 exists. In the architecture, the multi-protocol conversion module is disposed between the two MFS modules 41 and 42, so that protocol conversion is implemented on a packet between two times of switching performed by the two MFS modules, and the ToR switch 700 can support switching of a multi-protocol packet. In this architecture, the modules (except the multi-protocol conversion module) can support only one type of protocol switching. This reduces requirements on components of the modules and reduces costs.
In an implementation, the multi-protocol conversion module may be at least one DCFB module 5, and the DCFB module 5 is disposed between the two MFS modules 41 and 42 in parallel. The DCFB module 5 receives a packet from one of the MFS modules 41 and 42 on either side, and sends the packet to the other of the MFS modules 41 and 42 on the other side after completing protocol conversion. A quantity of disposed DCFB modules 5 may be determined based on a total service quantity or a total interface quantity.
In another implementation, the multi-protocol conversion module may be at least one CCFB module 6. When there is a plurality of CCFB modules 6, the CCFB modules 6 are disposed between the two MFS modules in parallel. The CCFB module 6 receives a packet from one of the MFS modules 41 and 42 on any side, and sends the packet to the other of the MFS modules 41 and 42 after completing protocol conversion. A quantity of disposed CCFB modules 6 may be determined based on a total service quantity or a total interface quantity.
A packet processing procedure of the ToR switch is as follows.
Step 1: Receive a data stream through a SerDes.
Step 2: A multi-function interface MIFM module 1 implements processing such as decoding, error correction, framing, error check, and retransmission on the data stream, generates a first packet, and sends the first packet to an ingress processing module.
Step 3: The IP module 2 performs processing such as packet parsing, table lookup, forwarding, and editing on the first packet, determines, based on a prestored routing table, a destination MIFM module 1 corresponding to the first packet, and sends the first packet and control information to a first MFS module (41 or 42).
In a possible implementation, the ingress processing module 2 is connected to a plurality of MIFM modules 1.
Step 4: After receiving the first packet and the control information, the first MFS module (41 or 42) determines a switching path, and sends the first packet and the control information to a multi-protocol conversion module corresponding to the switching path.
The first MFS module (41 or 42) works in a first protocol corresponding to the first packet.
Step 5: The multi-protocol conversion module converts the first packet based on the first protocol into a second packet based on a second protocol, and sends the second packet to a second MFS module (41 or 42).
Step 6: The second MFS module (41 or 42) switches the second packet to a corresponding destination MIFM module 1 based on the control information.
The second MFS module (41 or 42) works in the second protocol corresponding to the second packet.
Step 7: The egress processing module 3 receives the second packet from a multi-protocol switching module, performs operations such as data parsing, table lookup, encapsulation, editing, and distribution, and sends the second packet to the connected destination MIFM module.
Step 8: The destination MIFM module 1 receives the second packet, completes processing such as segmentation and encoding, and then sends the recovered data stream to the outside through the SerDes.
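Steps 4 to 6 above, in which two single-protocol MFS modules switch a packet with a protocol conversion in between, can be sketched as follows. The class `SingleProtocolMfs` and the function names are hypothetical, and `convert` is a placeholder that only relabels the packet rather than performing a real conversion.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    protocol: str
    payload: bytes


class SingleProtocolMfs:
    """Hypothetical MFS module that switches packets of one protocol only."""

    def __init__(self, protocol):
        self.protocol = protocol

    def switch(self, packet):
        if packet.protocol != self.protocol:
            raise ValueError(f"MFS works in {self.protocol}, got {packet.protocol}")
        return packet


def convert(packet, target_protocol):
    # Placeholder for the DCFB/CCFB conversion between the two MFS modules.
    return replace(packet, protocol=target_protocol)


def two_stage_switch(packet, first_mfs, second_mfs):
    packet = first_mfs.switch(packet)               # Step 4: first switching
    packet = convert(packet, second_mfs.protocol)   # Step 5: conversion
    return second_mfs.switch(packet)                # Step 6: second switching


first_mfs = SingleProtocolMfs("cxl")
second_mfs = SingleProtocolMfs("ethernet")
second_packet = two_stage_switch(Packet("cxl", b"data"), first_mfs, second_mfs)
```

Because each MFS module rejects packets of any other protocol, only the conversion module needs to understand more than one protocol, which matches the reduced component requirements described for this architecture.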
FIG. 8 provides an architecture of another ToR switch 800. In the switch architecture, functions of an IP module 2 and an EP module 3 are combined into a multi-protocol switching module 5/6. In an example, in addition to implementing inter-protocol conversion, the multi-protocol switching module 5/6 needs to implement processing, including but not limited to processing such as data aggregation, parsing, table lookup, route selection, editing, and forwarding, on an uplink packet. In addition, the multi-protocol switching module needs to implement processing, including but not limited to processing such as data parsing, table lookup, encapsulation, editing, and distribution, on a downlink packet. In the architecture, the MFS module 4 supports switching of only one protocol. In an example, this reduces requirements on components of the MFS module, and therefore reduces costs of the switch.
Optionally, all functions of the IP module 2 and the EP module 3 are integrated into at least one multi-protocol switching module 5/6, that is, each MIFM module 1 is connected to a corresponding MFS module 4. Alternatively, some functions of the IP module 2 and the EP module 3 are integrated into at least one multi-protocol switching module 5/6, so that some MIFM modules 1 are connected to corresponding MFS modules 4, and remaining MIFM modules 1 are respectively connected to corresponding IP modules 2 and EP modules 3. For example, the some MIFM modules may be MIFM modules connected to all servers in a cabinet.
In an implementation, the at least one multi-protocol conversion module may be a DCFB module 5, and the DCFB module 5 is disposed between two MFS modules 4 in parallel. The DCFB module 5 receives a packet from an MFS module 4 on either side, and sends the packet to an MFS module 4 on the other side after completing protocol conversion. A quantity of disposed DCFB modules 5 may be determined based on a total service quantity or a total interface quantity.
In another implementation, the multi-protocol conversion module may alternatively be at least one CCFB module 6. When there is a plurality of CCFB modules 6, the CCFB modules 6 are disposed between the two MFS modules 4 in parallel. The CCFB module 6 receives a packet from an MFS module 4 on any side, and sends the packet back to the MFS module 4 after completing protocol conversion. A quantity of disposed CCFB modules 6 may be determined based on a total service quantity or a total interface quantity.
For example, for a ToR switch 800 in which a DCFB module 5 is configured for only an MIFM module 1 connected to a server, a packet processing procedure is as follows.
Step 1: Receive a data stream from the server through a SerDes, where the data stream is a data stream based on a first protocol, and the first protocol is one of a plurality of protocols supported by the multi-function interface module and the multi-protocol switching module 5/6 in the ToR switch 800.
Step 2: The multi-function interface MIFM module 1 implements processing such as decoding, error correction, framing, error check, and retransmission on the data stream, generates a first packet, and sends the first packet to the DCFB module 5.
Step 3: The DCFB module 5 performs processing such as packet parsing, table lookup, forwarding, and editing on the first packet, and when determining that protocol conversion is required, converts the first packet based on a first protocol into a second packet based on a second protocol, and sends the second packet and control information to an MFS module 4.
Step 4: The MFS module 4 receives the second packet and the control information, and switches the second packet to a corresponding destination MIFM module 1.
Step 5: The DCFB module 5 corresponding to the destination MIFM module 1 receives the second packet from the multi-protocol switching module 5/6, performs operations such as data parsing, table lookup, encapsulation, editing, and distribution, and sends the second packet to the connected destination MIFM module 1.
Step 6: The destination MIFM module 1 receives the second packet, completes processing such as segmentation and encoding, and then sends the recovered data stream to the outside through the SerDes.
In some embodiments, the switch may be disposed in a chassis or a cabinet of a disaggregated data center network, to internally provide a bus interface like a CXL/UB interface and interconnect all servers (for example, CPUs) and various pooled resources (accelerators, memories, storage, network interface cards, and the like) for mutual communication, and externally provide a standard Ethernet interface to connect to a spine switch, to establish a large-scale data center network.
December 31, 2025
May 7, 2026