US-10580109

Data distribution fabric in scalable GPUs

Published: March 3, 2020
Assignee: not available in USPTO data we have
Inventors: not available in USPTO data we have
Technical Abstract

One embodiment provides for a processor comprising a three-dimensional (3D) integrated circuit stack including multiple graphics processor cores and interconnect logic to interconnect the graphics processor cores of the 3D integrated circuit stack to enable data distribution between the graphics processor cores over a virtual channel including multiple programmatically pre-assigned traffic classifications.

Patent Claims
20 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A processor comprising: a three-dimensional (3D) integrated circuit stack including multiple graphics processor cores; and interconnect logic to interconnect the graphics processor cores of the 3D integrated circuit stack to enable data distribution between the graphics processor cores over a virtual channel including multiple programmatically pre-assigned traffic classifications.

2. The processor of claim 1, wherein the virtual channel is transmitted over at least one physical data channel, the interconnect logic includes multiple data channels, and each of the multiple data channels is a separately clock gated bus.

3. The processor of claim 2, wherein each bus is to use early indications to signal incoming activity.

4. The processor of claim 1, wherein the interconnect logic is to couple the graphics processor cores to a shared resource and enable coherent access to the shared resource for execution threads of the graphics processor cores, wherein to enable coherent access to the shared resource the interconnect logic is to route traffic that originates from a single execution thread of the graphics processor cores within a single traffic classification.

5. The processor of claim 4, wherein the shared resource is a shared memory or a shared cache.

6. The processor of claim 5, wherein the interconnect logic is to enable the data distribution over multiple virtual channels, the multiple virtual channels including the virtual channel and one or more additional channels.

7. The processor of claim 6, wherein the multiple virtual channels are to be arbitrated based on a programmable priority system, at least one virtual channel is to be assigned multiple traffic classifications, and each of the multiple traffic classifications has a programmable priority.

8. The processor of claim 7, wherein the programmable priority is relative to traffic classifications within a same virtual channel of the multiple virtual channels.

9. The processor of claim 1, wherein the interconnect logic operates at a higher frequency than the graphics processor cores.

10. The processor of claim 1, additionally including a set of interconnect nodes to couple the graphics processor cores with the interconnect logic, wherein the interconnect logic includes multiple data channels and the set of interconnect nodes is configured to switch data between the multiple data channels when transiting one of the graphics processor cores.
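Illustrative sketch (not part of the patent text): claims 6 through 8 describe two levels of programmable priority, one between virtual channels and one between the traffic classifications assigned to the same virtual channel. The Python model below is one possible reading of that arrangement, not the patented implementation. The class `VirtualChannel`, the function `arbitrate`, the traffic class names (`display`, `render`, `compute`), the strict-priority policy, and the "lower number wins" convention are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class VirtualChannel:
    """Hypothetical model of one virtual channel and its traffic classes."""
    name: str
    priority: int  # programmable; lower number wins (assumed convention)
    # Traffic class -> programmable priority, meaningful only inside this channel.
    class_priority: dict = field(default_factory=dict)
    # Traffic class -> FIFO of pending messages.
    queues: dict = field(default_factory=dict)

    def enqueue(self, traffic_class, message):
        # Classes are pre-assigned to a channel; reject anything unassigned.
        if traffic_class not in self.class_priority:
            raise KeyError(f"{traffic_class!r} is not assigned to {self.name}")
        self.queues.setdefault(traffic_class, deque()).append(message)

    def pending_classes(self):
        return [tc for tc, q in self.queues.items() if q]


def arbitrate(channels):
    """Return (channel name, traffic class, message) for the next grant, or None."""
    ready = [vc for vc in channels if vc.pending_classes()]
    if not ready:
        return None
    # Level 1: pick the ready virtual channel with the best channel priority.
    vc = min(ready, key=lambda c: c.priority)
    # Level 2: pick a class using priorities relative to that channel only.
    tc = min(vc.pending_classes(), key=lambda t: vc.class_priority[t])
    return vc.name, tc, vc.queues[tc].popleft()


vc0 = VirtualChannel("VC0", priority=0, class_priority={"display": 0, "render": 1})
vc1 = VirtualChannel("VC1", priority=1, class_priority={"compute": 0})
vc1.enqueue("compute", "c0")
vc0.enqueue("render", "r0")
vc0.enqueue("display", "d0")

grants = []
while (grant := arbitrate([vc0, vc1])) is not None:
    grants.append(grant)
print(grants)
```

Note that `compute` has class priority 0 yet is granted last, because its priority is only compared against other classes in `VC1`. Strict priority is the simplest policy to show and can starve low-priority traffic; a real arbiter would likely add some fairness mechanism, which the claims do not specify.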

11. A graphics processor device comprising: a system interface bus; a processor including a three-dimensional (3D) circuit stack including a plurality of graphics processor cores coupled via interconnect logic having at least one clock gated physical data channel and a set of virtual channels including one or more virtual channels, the one or more virtual channels having multiple programmatically pre-assigned traffic classifications; and memory coupled to the interconnect logic and at least one graphics processor core of the plurality of graphics processor cores, the memory to store data for the at least one graphics processor core before transmission via the interconnect logic.

12. The graphics processor device as in claim 11, wherein the plurality of graphics processor cores couple with a shared resource on the processor via the interconnect logic, wherein the shared resource on the processor is a shared memory resource.

13. The graphics processor device as in claim 12, wherein the shared memory resource includes a shared cache memory.

14. The graphics processor device of claim 11, wherein the set of virtual channels includes multiple virtual channels, the multiple virtual channels in the set of virtual channels are to be arbitrated based on a programmable priority system, and each of the multiple programmable traffic classifications is prioritized relative to other traffic classes assigned to a same virtual channel of the multiple virtual channels.

15. The graphics processor device of claim 11, wherein the graphics processor device is a graphics processor card.

16. A method comprising: determining a channel access status on a multiple node shared bus for a message from a source node to a target node, wherein at least one node of the multiple node shared bus couples with a graphics processor core of an integrated circuit and at least one node of the multiple node shared bus couples with a shared resource on the integrated circuit; transmitting a message from the source node to the target node over a first data channel, wherein the message is associated with a first traffic classification having a first priority; receiving the message at a first data bus connector coupled with the graphics processor core; and based on at least the source node and the target node, switching the message from the first data channel to a second data channel.

17. The method of claim 16, additionally including determining that the message is associated with the first traffic classification and switching the message from the first data channel to the second data channel based at least on the first traffic classification.

18. The method of claim 16, wherein determining the channel access status comprises: determining, using a channel access protocol, if a third data channel is available to transmit the message from the source node to the target node; and after determining that transmission over the third data channel is blocked, transmitting the message over the first data channel.

19. The method of claim 18, wherein the first, second, and third data channel are virtual data channels.

20. The method of claim 19, wherein the channel access protocol is a time division multiple access protocol or a carrier sense multiple access protocol.
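Illustrative sketch (not part of the patent text): the method of claims 16 and 18 can be read as three steps: check whether a preferred channel is available, fall back to another channel if it is blocked, and switch channels at a bus connector based on the source and target nodes. The Python sketch below walks through that flow under stated assumptions. The channel names (`ch1`, `ch2`, `ch3`), the slot rule in `tdma_channel_available`, and the switching rule in `transit` are hypothetical; the claims name time division multiple access as one option but do not define a slot scheme or a switching rule.

```python
from dataclasses import dataclass


@dataclass
class Message:
    source: int
    target: int
    traffic_class: str  # carried with the message; unused by this sketch
    priority: int       # priority of the traffic class; unused by this sketch
    channel: str = ""


def tdma_channel_available(node, cycle, num_nodes):
    """Assumed TDMA rule: a node may use the channel only in its own slot."""
    return cycle % num_nodes == node


def choose_channel(msg, cycle, num_nodes):
    """Claim 18 flow: try the third channel; if blocked, use the first."""
    if tdma_channel_available(msg.source, cycle, num_nodes):
        return "ch3"
    return "ch1"


def transit(msg, node):
    """Claim 16 final step: a bus connector may switch the message's channel.

    Hypothetical rule: a message on ch1 moves to ch2 when the connector's
    node lies strictly between the source and target nodes.
    """
    if msg.channel == "ch1" and msg.source < node < msg.target:
        msg.channel = "ch2"
    return msg


NUM_NODES = 4

# Cycle 5 is node 1's slot, so node 0 is blocked on ch3 and falls back to ch1.
blocked = Message(source=0, target=3, traffic_class="render", priority=1)
blocked.channel = choose_channel(blocked, cycle=5, num_nodes=NUM_NODES)
first_channel = blocked.channel
transit(blocked, node=2)
print(first_channel, "->", blocked.channel)

# Cycle 4 is node 0's slot, so ch3 is available and no switch occurs.
granted = Message(source=0, target=3, traffic_class="render", priority=1)
granted.channel = choose_channel(granted, cycle=4, num_nodes=NUM_NODES)
transit(granted, node=2)
print(granted.channel)
```

A carrier sense scheme, the other protocol named in claim 20, would replace the slot check with a test of whether the channel is currently idle; the fallback and switching steps would stay the same.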


Patent Metadata

Filing Date: May 21, 2019
Publication Date: March 3, 2020


Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “Data distribution fabric in scalable GPUs” (US-10580109). https://patentable.app/patents/US-10580109
