US-12572500-B2

Scalable network-on-chip for high-bandwidth memory

Published March 10, 2026

Assignee not available in USPTO data we have

Inventors not available in USPTO data we have

Technical Abstract

Described herein are memory controllers for integrated circuits that implement network-on-chip (NoC) to provide access to memory to couple processing cores of the integrated circuit to a memory device. The NoC may be dedicated to service the memory controller and may include one or more routers to facilitate management of the access to the memory controller.

Patent Claims

Legal claims defining the scope of protection, as filed with the USPTO.

. An electronic device, comprising:

. The electronic device of, wherein the fabric NOC is coupled to the memory controller NOC and configurable to receive the data from the memory controller NOC to route data and route the data to one or more processing cores of a plurality of processing cores of the integrated circuit device, wherein a first router of the memory controller NOC is configurable to route data to a second router of the fabric NOC.

. The electronic device of, wherein a first router of the memory controller NOC comprises a first port configurable to exchange the data and a second port configurable to exchange the data with the HBM device.

. The electronic device of, wherein the integrated circuit device comprises a bridge configurable to couple the integrated circuit device to the HBM device.

. The electronic device of, wherein the bridge comprises a plurality of data links configurable to couple a memory interface of the memory controller to a memory channel of the HBM device.

. The electronic device of, wherein the memory interface is configurable to operate in accordance with an Advanced Extensible Interface 4 (AXI4) protocol, an AXI3 protocol, an AXI-Lite protocol, an AXI Coherency Extensions (ACE) protocol, an Avalon Interface protocol, or a combination thereof.

. The electronic device of, wherein the first data processing core comprises digital signal processing (DSP) circuitry, a reduced instruction set computer (RISC) processor core, an advanced RISC machine (ARM) processor core, or a combination thereof.

. The electronic device of, wherein the memory controller NOC comprises a first router comprising a set of ports and crossbar circuitry configurable to link ports of the set of ports.

. The electronic device of, wherein the first router is configurable to route data to a second router of the fabric NOC.

. The electronic device of, wherein the second data processing core is configurable to provide data packets in a NoC protocol via the direct interconnect.

. An integrated circuit device, comprising:

. The integrated circuit device of, wherein the first router comprises a set of ports and crossbar circuitry configurable to link ports of the set of ports.

. The integrated circuit device of, wherein the memory controller comprises hardened circuitry.

. The integrated circuit device of, wherein the plurality of processing cores comprises a digital processor core, a reduced instruction set computer (RISC) processor core, an advanced RISC machine (ARM) processor core, or a combination thereof.

. An electronic device comprising:

. The electronic device of, wherein the plurality of data processing cores comprises a digital processing core, a reduced instruction set computer (RISC) processor core, an advanced RISC machine (ARM) processor core, or a combination thereof.

. The electronic device of, wherein the memory controller comprises crossbar circuitry, and wherein the memory controller is configurable to block the crossbar circuitry from transmitting data in a bypass mode.

. The electronic device of, comprising a bridge coupled to the memory controller, wherein the bridge is configurable to convert the data packet between a NOC protocol and a memory interface protocol.

. The electronic device of, wherein the memory interface protocol comprises an Advanced Extensible Interface 4 (AXI4) protocol, an AXI3 protocol, an AXI-Lite protocol, an AXI Coherency Extension (ACE) protocol, an Avalon Interface protocol, or a combination thereof.

. The electronic device of, wherein the memory controller comprises a first clock crossing circuitry configurable to couple the first router of the first NOC to a first data processing core of the plurality of data processing cores, wherein the first data processing core operates in a first clocking domain and the first router operates in a memory controller clock domain.

Detailed Description

Complete technical specification and implementation details from the patent document.

This application is a continuation of U.S. patent application Ser. No. 18/089,237, entitled “Scalable Network-on-Chip for High-Bandwidth Memory,” filed on Dec. 27, 2022, which is a continuation of U.S. patent application Ser. No. 16/235,608, entitled “Scalable Network-on-Chip for High-Bandwidth Memory,” filed on Dec. 28, 2018, which claims priority from and the benefit of U.S. Provisional Application Ser. No. 62/722,741, entitled “An Efficient And Scalable Network-On-Chip Topology For High-Bandwidth Memory, And Applications,” filed Aug. 24, 2018, each of which are hereby incorporated by reference in its entirety for all purposes.

This disclosure relates to digital circuitry and, more specifically, to data routing circuitry in digital electronic devices.

This section is intended to introduce the reader to various aspects of art that may be related to various aspects of the present disclosure, which are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present disclosure. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.

Programmable logic devices are a class of integrated circuits that can be programmed to perform a wide variety of operations. A programmable logic device may include programmable logic elements that can be configured to perform custom operations or to implement one or more data processing circuits. The data processing circuits programmed in the programmable logic devices may exchange data with one another and with off-circuit devices via interfaces. To that end, the programmable logic devices may include routing resources (e.g., dedicated interconnects) to connect different data processing circuits to external interfaces (e.g., memory controllers, transceivers). As an example, certain devices may be configured in a System-in-Package (SiP) form, in which a programmable device, such as a field programmable gate array (FPGA), is coupled to a memory, such as a high bandwidth memory (HBM), using a high bandwidth interface. The FPGA may implement multiple data processing circuits that may access the HBM via the routing resources. As the amount of data, the speed of processing, and the number of functional blocks in a device increase, the routing resources may become insufficient to provide the requested access and, on some occasions, may become a bottleneck that reduces the operating capacity of the electronic device.

One or more specific embodiments will be described below. In an effort to provide a concise description of these embodiments, not all features of an actual implementation are described in the specification. It may be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another. Moreover, it may be appreciated that such a development effort might be complex and time consuming, but would nevertheless be a routine undertaking of design, fabrication, and manufacture for those of ordinary skill having the benefit of this disclosure.

When introducing elements of various embodiments of the present disclosure, the articles “a,” “an,” and “the” are intended to mean that there are one or more of the elements. The terms “comprising,” “including,” and “having” are intended to be inclusive and mean that there may be additional elements other than the listed elements. Additionally, it should be understood that references to “one embodiment” or “an embodiment” of the present disclosure are not intended to be interpreted as excluding the existence of additional embodiments that also incorporate the recited features. Furthermore, the phrase A “based on” B is intended to mean that A is at least partially based on B. Moreover, unless expressly stated otherwise, the term “or” is intended to be inclusive (e.g., logical OR) and not exclusive (e.g., logical XOR). In other words, the phrase A “or” B is intended to mean A, B, or both A and B.

The highly flexible nature of programmable logic devices makes them an excellent fit for accelerating many computing tasks. Programmable logic devices are increasingly being used as accelerators for machine learning, video processing, voice recognition, image recognition, and many other highly specialized tasks, particularly those that would be too slow or inefficient in software running on a processor. As the size and the complexity of programmable logic devices increase, there is an increase in the number of functional blocks (e.g., accelerators, processors, co-processors, digital signal processors) implemented within the programmable logic device and in the amount of data they process. As a result of the increased amount of data exchanged between the cores and/or between a core and external devices, a substantial amount of the interconnect resources of the programmable device may be consumed. Moreover, in heterogeneous systems (e.g., systems with multiple processing units or cores with different operating frequencies and/or bandwidths), cores that require access to the memory may receive a pre-allocated amount of memory, which may be fixed. During operation, some cores may require more memory space than what was pre-allocated to them, while other cores may underutilize the memory space due to lower workloads. Managing such allocations may further complicate the tasks performed by the memory controllers.

In order to prevent bottlenecks in the access to external devices by cores of the programmable devices, advanced data routing topologies may be used. The present disclosure describes the use of router-based topologies, such as Network-on-Chip (NoC) topologies, to facilitate the connection with external interfaces, such as memory interfaces. The programmable logic device may have a NoC that connects multiple data processing cores of the programmable device to the memory interface. Moreover, the external interfaces (e.g., memory interfaces) may include a dedicated NoC connected to the FPGA NoC, to allow access to the interface using data packets. The dedicated NoC may also allow flexible routing for the data packets to decrease or prevent data congestion from simultaneous access to the interface by multiple data processing cores of the programmable device. The interface controllers described herein may be configurable to allow direct communication between cores in the programmable logic device and the interface, by employing bridges and/or configurable bypass modes to allow direct access to the memory controller. The NoC of the memory interface may also include virtual channels to allow prioritization of certain data packets through the interface to provide Quality-of-Service (QoS) functionality and grouping of multiple channels to allow wide interface connection between a data processing core and the interface. The systems described herein may be used, for example, in System-in-Package (SiP) devices in which processors and memory devices may be coupled with a field programmable gate array (FPGA) device in a single package, coupled by high bandwidth interfaces (e.g., 2.5D interfaces, interconnect bridges, microbump interfaces).

By way of introduction, a block diagram illustrates a system that may employ a programmable logic device that can be configured to implement one or more data processing cores, in accordance with embodiments presented herein. Using the system, a designer may implement logic circuitry to implement the data processing cores on an integrated circuit, such as a reconfigurable programmable logic device, such as a field programmable gate array (FPGA). The designer may implement a circuit design to be programmed onto the programmable logic device using design software, such as a version of Intel® Quartus® by Intel Corporation of Santa Clara, California. The design software may use a compiler to generate a low-level circuit design defined by a bitstream, sometimes known as a program object file and/or configuration program, which programs the programmable logic device. Thus, the compiler may provide machine-readable instructions representative of the circuit design to the programmable logic device. For example, the programmable logic device may receive one or more configuration programs (bitstreams) that describe the hardware implementations that should be stored in the programmable logic device.

A configuration program (e.g., bitstream) may be programmed into the programmable logic device. The configuration program may, in some cases, represent one or more accelerator functions to perform for machine learning, video processing, voice recognition, image recognition, or other highly specialized tasks. The configuration program may also include data transfer and/or routing instructions to couple the one or more data processing cores to each other and/or to external interfaces, such as processors, memory (e.g., high bandwidth memory (HBM), volatile memory such as random-access memory (RAM) devices, hard disks, solid-state disk devices), or serial interfaces (Universal Serial Bus (USB), Peripheral Component Interconnect Express (PCIe)).

The programmable logic device may be, or may be a component of, a data processing system. The data processing system may include one or more host processors, memory and/or storage circuitry, and a network interface. The data processing system may include more or fewer components (e.g., electronic display, user interface structures, application specific integrated circuits (ASICs)), which may be coupled to one another via a bus. The host processor may include one or more suitable processors, such as an Intel® Xeon® processor or a reduced-instruction processor (e.g., a reduced instruction set computer (RISC), an Advanced RISC Machine (ARM) processor) that may manage a data processing request for the data processing system (e.g., to perform machine learning, video processing, voice recognition, image recognition, data compression, database search ranking, bioinformatics, network security pattern identification, spatial navigation, or the like). The memory and/or storage circuitry may include random access memory (RAM), read-only memory (ROM), one or more hard drives, flash memory, or the like. The memory and/or storage circuitry may be considered external memory to the programmable logic device and may hold data to be processed by the data processing system. In some cases, the memory and/or storage circuitry may also store configuration programs (bitstreams) for programming the programmable logic device. The network interface may allow the data processing system to communicate with other electronic devices. The devices in the data processing system may include several different packages or may be contained within a single package on a single package substrate (e.g., System-in-Package (SiP)).

In one example, the data processing system may be part of a data center that processes a variety of different requests. For instance, the data processing system may receive a data processing request via the network interface to perform machine learning, video processing, voice recognition, image recognition, data compression, database search ranking, bioinformatics, network security pattern identification, spatial navigation, or some other specialized task. The host processor may cause the programmable logic fabric of the programmable logic device to be programmed with a particular accelerator related to the requested task. For instance, the host processor may instruct that configuration data (bitstream) stored on the storage circuitry or cached in a memory of the programmable logic device be programmed into the programmable logic fabric of the programmable logic device. The configuration data (bitstream) may represent multiple data processing circuits that implement accelerator functions relevant to the requested task. The processing cores in the programmable logic device may then retrieve data from an interface (e.g., memory interface, network interface) and/or from the processor to perform the requested task. The presence of the dedicated NoC in the interfaces, as described herein, may allow quick performance of the required tasks. Indeed, in one example, an accelerator core may assist with a voice recognition task in less than a few milliseconds (e.g., on the order of microseconds) by rapidly exchanging and processing large amounts of data with a high bandwidth memory (HBM) device (e.g., storage circuitry) coupled to the programmable logic device.

In some systems, the programmable logic device may be connected to memory devices and/or processor devices via high bandwidth interfaces. A schematic diagram illustrates a System-in-Package (SiP) that may include a programmable logic device. The programmable logic device may be connected to processors and to storage circuitry, which may be a high bandwidth memory (HBM). The connection between the programmable logic device and the memory may take place via a high-bandwidth bridge. The high-bandwidth bridge may be a 2.5D bridge, a 3D bridge, a microbump bridge, an interconnect bridge, or any other multi-channel interconnect. The programmable logic device may be connected to the processors through bridges. In some embodiments, these bridges may be high bandwidth bridges similar to the high-bandwidth bridge, such as in systems that benefit from high data rate transfers between the processors and the programmable logic device. In some embodiments, these bridges may be regular interfaces (e.g., serial interfaces, 1D interconnects).

A diagram illustrates a data processing system which may include a SiP, such as the SiP described above. The SiP may include a programmable logic device connected to an HBM via a high-bandwidth bridge. The SiP may also be connected to external processors via bridges. The connection through the bridges may include transceivers to allow serial connection. As discussed above, the programmable logic device may implement one or more data processing cores A-K. Specifically, the diagram illustrates digital signal processing (DSP) cores A, E, J, and G, accelerator cores B, I, and K, and image processing cores C, D, F, and H. The illustration is merely descriptive, and other numbers and/or types of cores may be employed.

In order to exchange data, the data processing cores A-K may be directly connected using a direct interconnect of the programmable logic device. As discussed above, the routing through the direct interconnects may be programmed in the configuration of the programmable logic device (e.g., the bitstream). The data processing cores A-K may also exchange data using the Network-on-Chip (NoC) of the programmable logic device. To that end, the cores may exchange data packets through NoC interconnects with a NoC router of the NoC. The data packets sent via the NoC may include destination address information for appropriate routing and/or priority information to provide quality of service (QoS) in the data transmission. The NoC routers of the NoC may inspect the destination address information and/or the header and route the data packets to the appropriate router or processor core. The processors may also access the NoC through NoC interconnects coupled to the bridges. The NoC may also be coupled to the HBM via a memory controller, as illustrated. The presence of the NoC may allow flexible exchange of data between the data processing cores A-K, the HBM, and the processors, through an efficient use of routing resources in the programmable logic device.
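The header-driven routing described above can be sketched in a few lines of Python. This is a minimal illustration and not the patented implementation: the `Packet` and `Router` names, the table-based lookup, the port names, and the convention that a lower number means higher priority are all assumptions introduced here.

```python
from dataclasses import dataclass, field
import heapq


@dataclass(order=True)
class Packet:
    # Hypothetical header layout: ordering uses (priority, seq) only.
    priority: int                          # assumed: lower value = served first
    seq: int                               # arrival order, breaks priority ties
    dest: str = field(compare=False)       # destination address in the header
    payload: bytes = field(compare=False)


class Router:
    """Toy NoC router: inspects the header and picks an output port."""

    def __init__(self, name, routes):
        self.name = name
        self.routes = routes               # assumed table: dest -> output port
        self._queue = []
        self._seq = 0

    def accept(self, dest, payload, priority=1):
        heapq.heappush(self._queue, Packet(priority, self._seq, dest, payload))
        self._seq += 1

    def forward_one(self):
        """Pop the highest-priority packet and return (output_port, packet)."""
        pkt = heapq.heappop(self._queue)
        return self.routes[pkt.dest], pkt


router = Router("r0", {"hbm": "south", "core_b": "east"})
router.accept("core_b", b"bulk", priority=2)
router.accept("hbm", b"urgent", priority=0)
first_port, first_pkt = router.forward_one()
second_port, second_pkt = router.forward_one()
```

In this sketch the packet addressed to `hbm` leaves first even though it arrived second, which is the kind of prioritization the QoS information in the header is meant to enable.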

The memory controller may include a dedicated memory controller NoC. The memory controller NoC may be connected to the NoC via router-to-router NoC links. The NoC links may allow transmission of data packets between the NoC routers and the memory controller routers of the memory controller NoC. The memory controller NoC may also be directly accessed by the data processing cores A-K via direct memory controller interconnects, as illustrated. In some embodiments, the data processing cores A-K may provide data packets in the NoC protocol via the direct memory controller interconnects. In some embodiments, the data processing cores A-K may employ a protocol compatible with the memory controller. In such embodiments, bridge circuitry may be used to translate between the NoC protocol and the memory protocol, as detailed below.

The high-bandwidth bridge may include multiple physical data links. The routers of the memory controller NoC may access the data links via the memory channel circuitry of the memory controller. In some embodiments, the memory channel circuitry may include hardened circuitry. The memory channel circuitry may include multiple memory channel interfaces, which manage the access to the data links. Each memory channel interface may connect with a memory channel A-H of the HBM. Bridge circuitry may be used to convert the data packets from the memory controller router to the memory protocol employed by the memory channel interface (e.g., a memory interface protocol).

The flow chart illustrates a method that may be used by data processing cores (e.g., data processing cores A-K) to access the memory device (e.g., the HBM) using the programmable logic device NoC and the memory controller NoC. In a first process block, the data directed to the memory device is sent by the data processing core to a router of the programmable logic device NoC. The data may be packaged in a NoC protocol and may include a header having an address and/or priority information. In a second process block, the programmable logic device NoC transports the data through its routers to a router connected to the memory controller NoC using the header information. In a third process block, the data is transferred from the programmable logic device NoC to the memory controller NoC. This transfer may take place over a router-to-router link, such as the NoC links described above.

In a fourth process block, the data is sent from the memory controller NoC to the hardened memory controllers and, subsequently, to the memory via one of the channels. In this process, the data packet in the NoC format may be converted to a format employed by the memory controller that may be compatible with the memory device. The flow chart is illustrative of methods to interact with memory using a memory controller with a dedicated NoC. Methods to retrieve data from the memory to a data processing core and methods to exchange data between memory and other devices attached to the programmable logic device (e.g., processors) can be obtained by adapting the flow chart.
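As a rough illustration of the write path walked through above, the following Python sketch strings the four steps together. All names (`NocPacket`, `MemoryWrite`, `write_via_noc`), the packet fields, and the dictionary standing in for the memory device are hypothetical; the source does not specify packet layouts or the conversion logic.

```python
from dataclasses import dataclass


@dataclass
class NocPacket:
    # Hypothetical NoC packet: header fields plus data.
    dest_channel: int
    address: int
    priority: int
    data: bytes


@dataclass
class MemoryWrite:
    # Hypothetical request in a memory-interface format.
    channel: int
    address: int
    data: bytes


def core_send(channel, address, data, priority=0):
    """Step 1: the core packages the data with a header."""
    return NocPacket(channel, address, priority, data)


def fabric_transport(packet, path):
    """Step 2: the fabric NoC carries the packet across its routers."""
    return [f"fabric:{router}" for router in path]


def to_memory_format(packet):
    """Step 4: convert the NoC packet to the memory-controller format."""
    return MemoryWrite(packet.dest_channel, packet.address, packet.data)


def write_via_noc(hbm, channel, address, data, fabric_path, mc_router):
    packet = core_send(channel, address, data)
    trace = fabric_transport(packet, fabric_path)
    trace.append(f"mc:{mc_router}")        # Step 3: router-to-router link
    request = to_memory_format(packet)
    hbm.setdefault(request.channel, {})[request.address] = request.data
    trace.append(f"channel:{request.channel}")
    return trace


hbm = {}  # stand-in for the memory device: channel -> {address: data}
trace = write_via_noc(hbm, 3, 0x100, b"\x01", ["r0", "r1"], "m1")
```

A read path could be sketched the same way by reversing the conversion and returning the stored bytes in a response packet.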

The diagram illustrates the flexibility of data exchanges that may take place between the programmable logic device and the HBM using the memory controller NoC. The memory controller NoC may be accessed through the NoC, using the NoC protocol, or directly by data processing cores A-P, using an interconnect protocol that is compatible with the memory interface (e.g., a memory interface protocol). In the diagram, the memory controller NoC includes 8 routers. Each memory controller router may connect to memory channel interfaces through a connection. Specifically, each router is connected to two memory channel interfaces, and each channel interface is connected to a port of a memory channel A-H via data links. The routers may also connect to each other through connections that form the memory controller NoC. The connections may allow an alternative routing that may mitigate congestion in the programmable logic device NoC.

Each memory controller router may also be connected to NoC routers of the programmable logic device NoC. In the diagram, each router is connected to a single NoC router. This connection may be used to transport data packets from the programmable logic device to the HBM via the programmable logic device NoC, as discussed above. The routers may also be connected directly to data processing cores A-P through dedicated interconnects, as illustrated. In the diagram, each router is coupled to two processing cores via two dedicated interconnects. The data processing cores A-P may be configured to access the router using a memory interface protocol and, as detailed below, bridge circuitry may be used to allow the router to process data packets from the NoC router and memory access requests from data processing cores A-P.

More generally, the memory controller NoC may, effectively, operate as a crossbar between the programmable fabric of the programmable logic device and the high bandwidth memory. In the illustrated example, the memory controller NoC may operate as a 16×16 crossbar that may allow any of the data processing cores A-P to access any of the memory channels through any of the inputs of the NoC routers, independent of the location of the data processing core. It should be understood that other crossbar dimensions for the memory controller NoC may be obtained (e.g., 8×8, 32×32, 64×64) by adjusting the number of routers and the number of memory channels in the memory channel circuitry, to support other versions of memory (e.g., HBM3 that may have pseudo channels).
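The crossbar-like behavior can be checked with a small model. The sketch below assumes the 8 routers are chained in a line, with each router serving two cores and two channel interfaces; the chain shape, the integer indices, and all function names are assumptions made for illustration, since the text only states that the routers connect to each other.

```python
from collections import deque

NUM_ROUTERS = 8
PER_ROUTER = 2  # two cores and two channel interfaces per router


def build_links(n=NUM_ROUTERS):
    """Assumed topology: router r is linked to r-1 and r+1 (a linear chain)."""
    links = {r: set() for r in range(n)}
    for r in range(n - 1):
        links[r].add(r + 1)
        links[r + 1].add(r)
    return links


def router_of(index):
    """Router that serves core (or channel interface) number `index`."""
    return index // PER_ROUTER


def hops(links, src, dst):
    """Breadth-first search: router-to-router hops, or None if unreachable."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in links[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def reachable_pairs(links):
    """Count (core, channel interface) pairs connected through the NoC."""
    total = NUM_ROUTERS * PER_ROUTER
    return sum(
        1
        for core in range(total)
        for chan in range(total)
        if hops(links, router_of(core), router_of(chan)) is not None
    )


links = build_links()
pairs = reachable_pairs(links)
worst_case = hops(links, router_of(0), router_of(15))
```

Under these assumptions every one of the 16 cores reaches every one of the 16 channel interfaces, at the cost of up to seven router-to-router hops for the most distant pair; a ring or other topology would change the hop count but not the reachability.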

The diagram illustrates the memory controller router. As discussed above, the memory controller router may receive data packets from the programmable logic device NoC, from a neighboring memory controller router of the memory controller NoC, or from a direct access by a data processing core. Moreover, the memory controller router may interact with a memory channel interface. As the NoC router may employ data packets in a NoC protocol that may be different from the protocol of the memory interface, bridge circuitry may be used to translate between the protocols. To that end, the memory controller router may be connected to two memory-side bridges and two device-side bridges. The memory-side bridges may be used to connect the memory controller router to the memory channel interfaces, and the device-side bridges may be used to provide direct access to the memory controller router by data processing cores in the fabric of the programmable logic device. The illustrated bridges may be compliant with an interconnect protocol that is compatible with the memory interface (e.g., a memory interface protocol), such as an Advanced Extensible Interface 4 (AXI4) protocol. It should be noted that the bridges may comply with other protocols, including Advanced Microcontroller Bus Architecture (AMBA) protocols, which may include AXI3 or other AXI protocols, lite versions such as AXI-Lite protocols, coherence extensions such as AXI Coherency Extensions (ACE) or ACE-Lite protocols, and Avalon Interface protocols. This operation is detailed further below.

A memory controller router may have multiple ports. The illustrated memory controller router may have 8 ports. The ports may be connected to each other through a crossbar. Ports may, generally, receive and/or transmit data packets in the NoC protocol format. For example, two ports may be used to connect to neighboring routers of the memory controller NoC, and one port may be used to connect to a NoC router of the programmable logic device NoC. Two ports may be used to provide direct data access by data processing cores through the device-side bridges. Two ports may be used to exchange data with the HBM via the memory channel interfaces and the memory-side bridges, as illustrated. The bridges may provide data packets in the NoC protocol to allow the crossbar to manage data routing seamlessly, as all inputs are “packetized.” As a result, the memory controller router may use the crossbar to manage the access to the memory channel interfaces from data processing cores that access the memory either directly or via the NoC, to provide high throughput access and prevent deadlocks, as detailed further below.

When providing direct access to a data processing core, the bridges may operate as master-slave pairs that coordinate operations. For example, one bridge of each pair may be slave to the other. This coordination may allow transparent transport of data in a memory interface protocol through the router. Moreover, the memory controller router may have two bypass routes, A and B, each of which may directly connect one port to another. The bypass routes A and B may be used in situations in which the data processing cores benefit from direct access to the memory controller and/or the HBM. This may be used, for example, to provide deterministic latency between the data processing core and the HBM, and/or to provide a high-bandwidth connection between the data processing core and the HBM by grouping multiple memory channels.
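A bypass route can be pictured as a fixed port-to-port mapping that takes the place of crossbar routing while the bypass is active. The Python sketch below is an assumption-laden illustration: the class name, the port labels (`dev0`, `mem0`, and so on), and the choice to reject non-bypass traffic while in bypass mode are hypothetical and not taken from the description.

```python
class MemoryControllerRouter:
    """Toy router with two fixed bypass routes (hypothetical port names)."""

    def __init__(self, bypass_pairs):
        # Assumed mapping: device-side port -> memory-side port.
        self.bypass_pairs = dict(bypass_pairs)
        self.bypass_enabled = False

    def output_port(self, in_port, routed_port):
        """Return the output port for traffic arriving on `in_port`.

        `routed_port` is what the crossbar would have chosen. In bypass
        mode it is ignored and the fixed pairing is used instead.
        """
        if self.bypass_enabled:
            if in_port not in self.bypass_pairs:
                raise ValueError("crossbar is blocked in bypass mode")
            return self.bypass_pairs[in_port]
        return routed_port


router = MemoryControllerRouter({"dev0": "mem0", "dev1": "mem1"})
normal = router.output_port("dev0", "mem1")    # crossbar decision is honored
router.bypass_enabled = True
bypassed = router.output_port("dev0", "mem1")  # fixed pairing wins
```

Because the path no longer depends on arbitration against other traffic, a fixed pairing of this kind is one way to obtain the deterministic latency mentioned above.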

A diagram illustrates a device-side bridge, such as the device-side bridges described above. The device-side bridge may receive data from the programmable fabric in a memory interface protocol, such as the AXI4 protocol. The device-side bridge may include clock crossing circuitry, which may adjust the data rate frequency to the clock domain of the memory controller NoC. The bridge may also include protocol specific circuitry (e.g., an AXI4 converter). In the example, the protocol specific circuitry may include a read address block, a data read block, a write address block, a data write block, and a write response block. Data buffers, such as the data read block and the data write block, may include width converters to provide data-rate matching and prevent blocking in the device-side bridge. The output of the protocol specific circuitry may be converted to a packet format compatible with the memory controller router by a virtual bridge channel. The use of the virtual bridge channel with multiple FIFOs may mitigate head-of-line (HOL) blocking.
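To see why multiple FIFOs help with head-of-line blocking, compare one shared queue against one queue per channel. The sketch below is a simplified model: the channel labels `W` and `AR` (loosely standing for write data and read address), the `ready` set, and both function names are illustrative assumptions rather than details from the description.

```python
from collections import deque


def drain_single_fifo(items, ready):
    """Shared FIFO: stop at the first item whose channel is not ready."""
    queue = deque(items)
    out = []
    while queue and queue[0][0] in ready:
        out.append(queue.popleft())
    return out


def drain_per_channel(items, ready):
    """One FIFO per channel: a stalled channel delays only its own items."""
    fifos = {}
    for channel, value in items:
        fifos.setdefault(channel, deque()).append((channel, value))
    out = []
    for channel, queue in fifos.items():
        if channel in ready:
            out.extend(queue)
    return out


# A stalled write-data beat sits at the head, followed by two read addresses.
pending = [("W", 0), ("AR", 1), ("AR", 2)]
ready = {"AR"}  # only the read-address path can make progress right now

shared = drain_single_fifo(pending, ready)
split = drain_per_channel(pending, ready)
```

With the shared queue nothing moves because the stalled item is at the head, while the per-channel arrangement lets both read addresses through.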

A logical diagram shows the dataflow through the ports of the memory controller router when employing the crossbar. The data may come in through any of the ports and be routed by the crossbar to any of the ports. To that end, the crossbar may be an 8×8 crossbar. Each port may include clock crossing circuitry. The clock crossing circuitry may facilitate the conversion of the rate of the data to the clock domain of the memory controller router. For example, the data may be received from a NoC router of the NoC or from a neighboring memory controller router, which may operate with a different data rate from the memory controller router. The clock crossing circuitry may, thus, allow the routers (e.g., NoC routers, memory controller routers) to run at different frequencies and to connect to each other seamlessly.

Data from each port may also be managed by virtual channel circuitry, which may include dedicated FIFO buffers to help increase throughput and mitigate the occurrences of deadlock. A virtual channel allocator may be used to manage the virtual channel circuitries by inspecting each incoming data packet and/or data packet header and assigning it to the appropriate virtual channel. In order to manage the crossbar, a switch allocator and/or a routing computation block may be used. The switch allocator may arbitrate the input-to-output routing requests through the crossbar to assign routing resources. The routing computation block may inspect the data packet headers and identify the physical output port that is appropriate for the data packet. As such, the routing computation block may generate requests for routing for the switch allocator and provide an optimized routing of data packets through the memory controller router.
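The three stages named above (routing computation, virtual channel allocation, switch allocation) can be sketched as follows. The round-robin arbitration policy, the priority-to-channel mapping, the dictionary header format, and the integer port numbering are all assumptions chosen for illustration; the description does not specify them.

```python
NUM_PORTS = 8  # matches the 8-port router described above


def route_compute(header, table):
    """Routing computation: map the header's destination to an output port."""
    return table[header["dest"]]


def allocate_vc(header, num_vcs):
    """Virtual channel allocation (assumed policy: one VC per priority level)."""
    return min(header["priority"], num_vcs - 1)


def switch_allocate(requests, last_grant):
    """Grant each output port to one requesting input, round-robin.

    requests:   dict of input port -> requested output port
    last_grant: dict of output port -> input granted last time (updated)
    """
    by_output = {}
    for inp, out in requests.items():
        by_output.setdefault(out, []).append(inp)
    grants = {}
    for out, inputs in by_output.items():
        start = last_grant.get(out, -1)
        # Pick the first requester after the previous winner, wrapping around.
        winner = min(inputs, key=lambda i: (i - start - 1) % NUM_PORTS)
        grants[out] = winner
        last_grant[out] = winner
    return grants


table = {"chan0": 6, "chan1": 7}
out_port = route_compute({"dest": "chan0", "priority": 1}, table)
vc = allocate_vc({"dest": "chan0", "priority": 5}, num_vcs=4)

history = {}
requests = {0: 6, 1: 6, 2: 7}   # inputs 0 and 1 both want output 6
first = switch_allocate(requests, history)
second = switch_allocate(requests, history)
```

In the two consecutive arbitration rounds, output 6 goes to input 0 and then to input 1, so neither contender is starved under this assumed policy.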

A diagram illustrates how the memory controller NoC may be configured to provide direct access to memory (e.g., HBM) by data processing cores of the programmable logic device, such as data processing cores A and B. In this diagram, the data packets may be sent directly from the data processing cores A and B to the router of the memory controller NoC in a format compatible with the memory controller. The data is initially sent to a rate-matching FIFO that may decouple the operating frequency of the data processing cores A and B from the operating data frequency of the memory controller. The rate-matching FIFOs may be configured independently and, as a result, data processing cores A and B may operate with different data frequencies and/or data frequency rates. Master half-rate adaptors and slave adaptors may be used in coordination to adjust the data rate of the memory controller (e.g., HBM data rate) to a half-data rate clock that may be appropriate for operation in the bridges and/or the memory controller router.

The data from the memory controller router may be translated in the memory-side bridges to a memory interface protocol and provided to the slave adaptors. From the slave adaptors, the data may be sent to the memory channel interfaces. Memory channel interfaces may include write data buffers and read data buffers, which may manage the data flow between the memory controller NoC and the data link. A memory control gasket may be used to assist the control of the data flow. The memory control gasket may generate and/or receive HBM-compliant commands and data to perform read and write operations over the data link.

The diagrams illustrate two usage models that employ the dedicated memory controller NoC. The first diagram illustrates a system in which multiple data processing cores may access the same port of the HBM transparently using memory controller routers. Such an application may be useful in platforms in which kernels or compute units may access a shared constant memory, such as in OpenCL platforms. Kernel programs and coefficients may be stored in a common memory channel, and the presence of the memory controller NoC may allow multiple kernels in multiple different data processing cores to access the common channel (e.g., memory channel A).

As illustrated, each of the data processing cores A-P may send data directly to a corresponding neighboring memory controller router. As discussed above, the data may be converted from a memory interface protocol to a NoC-compatible protocol when sent to the neighboring router. The data packets may have a destination address associated with, for example, the router that is adjacent to the memory channel controller and coupled to the port. Each neighboring router may then transmit the data via the memory controller NoC to that router. As the router receives the data packets from the neighboring routers, the memory requests may be prioritized based on the header information, and requests for memory access may be issued to the memory channel controller. As a result, all the data processing cores A-P may access the port of the memory channel A.
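The prioritization of memory requests at the router coupled to the shared port may be sketched as follows. This is a hypothetical software model: the `priority` header field, the convention that a lower number means higher priority, and the tie-break by arrival order are assumptions made for illustration and are not specified in the disclosure.

```python
import heapq
from itertools import count

class SharedPortArbiter:
    """Orders memory requests arriving at the router adjacent to a shared port."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # tie-breaker: earlier arrivals are issued first

    def receive(self, packet):
        # Assumed header field: packet["priority"], lower value = more urgent.
        heapq.heappush(self._heap, (packet["priority"], next(self._arrival), packet))

    def issue(self):
        """Return the next request to send to the memory channel controller."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

arbiter = SharedPortArbiter()
arbiter.receive({"core": "A", "priority": 1, "addr": 0x100})
arbiter.receive({"core": "B", "priority": 0, "addr": 0x200})
arbiter.receive({"core": "C", "priority": 1, "addr": 0x300})

issue_order = [arbiter.issue()["core"] for _ in range(3)]  # ["B", "A", "C"]
```

The request from core B is issued first because of its header priority, and the requests from cores A and C, which share a priority level, are issued in arrival order.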

The second diagram illustrates a system in which the data processing cores may be configured to access the HBM through a wide memory interface. In the example, the data processing cores A, B, and C may be allocated to groups of memory channels A, B, and C, respectively. Data processing core A may access 4 data links having, each, 64 I/O connections, forming an interface with width of 256 I/O lines. Data processing core B may access 8 data links having, each, 64 I/O connections, forming an interface with width of 512 I/O lines. Data processing core C may access 8 data links having, each, 64 I/O connections, forming an interface with width of 256 I/O lines. In some embodiments, the priority information in the header of packets may be used to provide synchronization between all the packets coming from the same core. This may be useful in situations where a router is providing access via the wide interface from data processing cores A, B, or C, as well as to other data processing circuitry (e.g., another data processing core) or to a processor via the NoC. In such situations, the virtual channels in the memory controller router may be used to time the requests from the wide interface and/or the NoC in a manner that is transparent for the data processing circuitry and/or the processors.

To further facilitate the binding of the wide interfaces, the memory controller routers may be configured in the bypass mode, as discussed above, to provide deterministic latency. A method for enabling a bypass mode may proceed as follows. In a first process block, the bypass mode may be implemented in the memory controller router, as discussed above. The bypass mode may bind the input and the output ports of the router. In some embodiments, enabling the bypass mode may block the crossbar and/or cause the buffering in the other virtual channels (e.g., virtual channel circuitry) to hold the data during the bypass mode transmission. For example, the crossbar may assign higher priority to data transfers during the bypass mode. In a subsequent process block, the data processing core may interact with the memory (e.g., HBM) through direct addressing, and with a deterministic latency, as discussed above. At the end of the data exchange, the router may exit the bypass mode and resume regular routing.
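The bypass-mode method above may be sketched as a small state machine. This is an illustrative model only: the class, its method names, and the choice to hold (rather than drop or reprioritize) traffic from other ports during bypass are assumptions, corresponding to the embodiment in which other virtual channels buffer their data during the bypass transmission.

```python
class BypassRouterModel:
    """Hypothetical model of a memory controller router with a bypass mode."""

    def __init__(self):
        self.bound = None  # (input_port, output_port) while bypass is active
        self.held = []     # packets buffered on other ports during bypass
        self.log = []      # record of how each packet was forwarded

    def enter_bypass(self, in_port, out_port):
        # Bind one input port directly to one output port.
        self.bound = (in_port, out_port)

    def send(self, in_port, packet):
        if self.bound is None:
            self.log.append(("crossbar", in_port, packet))
        elif in_port == self.bound[0]:
            # Bound traffic skips arbitration, giving a fixed forwarding path.
            self.log.append(("bypass", self.bound[1], packet))
        else:
            self.held.append((in_port, packet))

    def exit_bypass(self):
        # Resume regular routing and release the traffic that was held.
        self.bound = None
        held, self.held = self.held, []
        for in_port, packet in held:
            self.send(in_port, packet)

router = BypassRouterModel()
router.enter_bypass(in_port=0, out_port=7)
router.send(0, "wide0")
router.send(3, "other")  # held until the bypass transfer completes
router.send(0, "wide1")
router.exit_bypass()
```

After `exit_bypass()`, the log shows both wide-interface packets forwarded over the bound path first, followed by the held packet routed through the crossbar.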

The methods and devices of this disclosure may be incorporated into any suitable circuit. For example, the methods and devices may be incorporated into numerous types of devices such as microprocessors or other integrated circuits. Exemplary integrated circuits include programmable array logic (PAL), programmable logic arrays (PLAs), field programmable logic arrays (FPLAs), electrically programmable logic devices (EPLDs), electrically erasable programmable logic devices (EEPLDs), logic cell arrays (LCAs), field programmable gate arrays (FPGAs), application specific standard products (ASSPs), application specific integrated circuits (ASICs), and microprocessors, just to name a few.

Moreover, while the method operations have been described in a specific order, it should be understood that other operations may be performed in between described operations, described operations may be adjusted so that they occur at slightly different times or described operations may be distributed in a system which allows the occurrence of the processing operations at various intervals associated with the processing, as long as the processing of overlying operations is performed as desired.

While the embodiments set forth in the present disclosure may be susceptible to various modifications and alternative forms, specific embodiments have been shown by way of example in the drawings and have been described in detail herein. However, it may be understood that the disclosure is not intended to be limited to the particular forms disclosed. The disclosure is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the disclosure as defined by the following appended claims. In addition, the techniques presented and claimed herein are referenced and applied to material objects and concrete examples of a practical nature that demonstrably improve the present technical field and, as such, are not abstract, intangible or purely theoretical. Further, if any claims appended to the end of this specification contain one or more elements designated as “means for [perform]ing [a function] . . . ” or “step for [perform]ing [a function] . . . ,” it is intended that such elements are to be interpreted under 35 U.S.C. 112(f). For any claims containing elements designated in any other manner, however, it is intended that such elements are not to be interpreted under 35 U.S.C. 112(f).

Patent Metadata

Filing Date

Unknown

Publication Date

March 10, 2026

Inventors

Unknown
