Legal claims defining the scope of protection, as filed with the USPTO.
1. A system comprising:
   a processor having one or more cores; and
   a programmable fabric device, wherein the processor is stacked in a three-dimensional orientation above the programmable fabric device, and wherein the programmable fabric device comprises:
      a programmable fabric comprising a plurality of partitions configured to perform fine-grained acceleration operations; and
      one or more interfaces configured to provide connections between the programmable fabric and the processor,
   wherein the programmable fabric device is operable to:
      receive one or more sets of data from a processor pipeline via the one or more interfaces;
      configure a first portion of the programmable fabric comprising the plurality of partitions coupled to one or more execution units of the one or more cores of the processor to perform the fine-grained acceleration operations, wherein the fine-grained acceleration operations comprise extending an instruction-set architecture of the processor to initiate a custom opcode space to interface with the programmable fabric;
      receive one or more additional sets of data from the processor pipeline; and
      configure a second portion of the programmable fabric comprising one or more system memory portions reserved for the programmable fabric to interface with the processor to perform coarse-grained acceleration operations.
2. The system of claim 1, wherein the fine-grained acceleration operations comprise performing operations that read and write data to and from a register file, an L1 cache, an L2 cache, or any other per-core cache of the processor.
3. The system of claim 2, wherein the one or more interfaces comprise one or more ports for sections of the register file, the L1 cache, the L2 cache, or any other per-core cache of the processor.
4. The system of claim 1, wherein the one or more interfaces comprise a three-dimensional integrated circuit face-to-face die stacking packaging-based interface.
5. The system of claim 1, wherein a number of the plurality of partitions corresponds to at least the number of the one or more cores of the processor.
6. The system of claim 1, wherein the fine-grained acceleration and the coarse-grained acceleration operations are performed concurrently by configuring the first portion of the programmable fabric for the fine-grained acceleration operations and configuring the second portion of the programmable fabric for the coarse-grained acceleration operations.
7. The system of claim 1, wherein the coarse-grained acceleration operations comprise performing operations using one or more compute express link (CXL) devices that utilize a shared memory component with the processor.
8. The system of claim 1, wherein the one or more interfaces comprise one or more input/outputs (I/Os) of the programmable fabric, one or more external general-purpose input/outputs (GPIOs), or both.
9. The system of claim 1, wherein a workload architecture of the processor leverages the custom opcode space to define a set of custom instructions to perform the fine-grained acceleration operations.
10. The system of claim 9, wherein the set of custom instructions is leveraged by a compiler, curated libraries, or both.
11. The system of claim 1, wherein the programmable fabric device comprises a field-programmable gate array (FPGA).
12. A method of data transfer between a processor stacked in a three-dimensional orientation above a programmable fabric device comprising:
   receiving, via the processor, one or more sets of data from a processor pipeline via one or more interfaces;
   configuring, via the processor, a first portion of a programmable fabric of the programmable fabric device comprising a plurality of partitions coupled to one or more execution units of one or more cores of the processor to perform fine-grained acceleration operations, wherein the fine-grained acceleration operations comprise extending an instruction-set architecture of the processor to initiate a custom opcode space to interface with the programmable fabric;
   receiving, via the processor, one or more additional sets of data from the processor pipeline; and
   configuring, via the processor, a second portion of the programmable fabric comprising one or more system memory portions reserved for the programmable fabric to interface with the processor to perform coarse-grained acceleration operations on the one or more additional sets of data.
13. The method of claim 12, comprising performing, via the processor, the fine-grained acceleration and the coarse-grained acceleration operations concurrently by configuring the first portion of the programmable fabric for the fine-grained acceleration operations and configuring the second portion of the programmable fabric for the coarse-grained acceleration operations concurrently.
14. The method of claim 12, comprising performing, via the processor, the coarse-grained acceleration operations with one or more external general-purpose input/outputs (GPIOs) operable to provide an interface for the programmable fabric with the processor to perform the coarse-grained acceleration operations.
15. The method of claim 12, wherein the coarse-grained acceleration operations comprise performing operations using one or more compute express link (CXL) devices that utilize a shared memory component with the processor.
16. The method of claim 12, wherein performing the fine-grained acceleration operations comprises performing operations that read and write data to and from a register file, an L1 cache, an L2 cache, or any other per-core cache of the processor.
17. A system comprising:
   one or more compute chiplets;
   a programmable fabric base die, wherein the one or more compute chiplets are stacked in a three-dimensional orientation above the programmable fabric base die, and wherein the programmable fabric base die comprises one or more interfaces configured to provide connections between the programmable fabric base die and the one or more compute chiplets,
   wherein the programmable fabric base die is operable to:
      enable data transfer between the one or more compute chiplets that are three-dimensionally stacked above the programmable fabric base die; and
      receive, via the one or more compute chiplets, one or more sets of data via the one or more interfaces;
      configure, via the one or more compute chiplets, a first portion of the programmable fabric base die comprising a plurality of partitions coupled to one or more portions of the one or more compute chiplets to perform fine-grained acceleration operations, wherein the fine-grained acceleration operations comprise extending an instruction-set architecture of the one or more compute chiplets to initiate a custom opcode space to interface with the programmable fabric base die;
      receive, via the one or more compute chiplets, one or more additional sets of data via the one or more interfaces; and
      configure, via the one or more compute chiplets, a second portion of the programmable fabric base die comprising one or more system memory portions reserved for the programmable fabric base die to interface with the one or more compute chiplets to perform coarse-grained acceleration operations on the one or more additional sets of data.
18. The system of claim 17, wherein the one or more compute chiplets comprise one or more Central Processing Unit (CPU) chiplets, one or more Graphical Processing Unit (GPU) chiplets, one or more Dual accelerator (DL) chiplets, or any combination thereof.
19. The system of claim 17, wherein the one or more compute chiplets provision a memory, one or more input/output (I/O) resources, or both dynamically based on an expected workload of the one or more compute chiplets.
20. The system of claim 17, wherein the programmable fabric base die comprises a field-programmable gate array (FPGA).
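As a reading aid, the two configuration paths recited in claims 1, 12, and 17 can be modeled in software. The sketch below is only an illustration under stated assumptions: the claims describe hardware and give no opcode encoding, API, or data layout, so `FabricModel`, `CUSTOM_OPCODE_BASE`, the handler signature, and the region name `dma_buf` are all hypothetical. It shows a fine-grained path, where a per-core partition answers instructions from a custom opcode space, next to a coarse-grained path, where a memory region reserved for the fabric receives bulk data.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical start of the custom opcode space; the claims give no encoding.
CUSTOM_OPCODE_BASE = 0xF000

Handler = Callable[[int, int], int]


@dataclass
class FabricModel:
    """Toy software model of a partitioned fabric (illustrative only)."""

    num_cores: int
    # Fine-grained portion: core id -> {custom opcode -> handler}.
    partitions: Dict[int, Dict[int, Handler]] = field(default_factory=dict)
    # Coarse-grained portion: named memory regions reserved for the fabric.
    reserved_memory: Dict[str, bytearray] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # One partition per core, in the spirit of claim 5.
        for core_id in range(self.num_cores):
            self.partitions.setdefault(core_id, {})

    def configure_fine_grained(self, core_id: int, opcode: int, handler: Handler) -> None:
        """Bind a custom opcode to a handler in the partition coupled to one core."""
        if opcode < CUSTOM_OPCODE_BASE:
            raise ValueError("opcode lies outside the custom opcode space")
        self.partitions[core_id][opcode] = handler

    def configure_coarse_grained(self, region: str, size_bytes: int) -> None:
        """Reserve a memory region for bulk offload to the fabric."""
        self.reserved_memory[region] = bytearray(size_bytes)

    def execute_custom(self, core_id: int, opcode: int, a: int, b: int) -> int:
        """Fine-grained path: run one custom instruction on register-sized operands."""
        return self.partitions[core_id][opcode](a, b)

    def offload(self, region: str, data: bytes) -> int:
        """Coarse-grained path: copy data into the reserved region, then reduce it.

        The byte sum is a stand-in for whatever bulk operation the fabric performs.
        """
        buf = self.reserved_memory[region]
        if len(data) > len(buf):
            raise ValueError("data does not fit in the reserved region")
        buf[: len(data)] = data
        return sum(buf[: len(data)])


# Both portions can be configured on the same fabric, as in claims 6 and 13.
fabric = FabricModel(num_cores=2)
fabric.configure_fine_grained(0, CUSTOM_OPCODE_BASE + 1, lambda a, b: a * b + 1)
fabric.configure_coarse_grained("dma_buf", 64)
print(fabric.execute_custom(0, CUSTOM_OPCODE_BASE + 1, 3, 4))
print(fabric.offload("dma_buf", bytes([1, 2, 3])))
```

Keeping the opcode table per core mirrors the claimed coupling between partitions and execution units, while the reserved region stands in for the system memory portions of the coarse-grained portion; a real device would expose neither as Python objects.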
May 6, 2025