An AI accelerator apparatus using in-memory compute chiplet devices. The apparatus includes one or more chiplets, each of which includes a plurality of tiles. Each tile includes a plurality of slices, a central processing unit (CPU), and a hardware dispatch device. Each slice can include a digital in-memory compute (DIMC) device configured to perform high throughput computations. In particular, the DIMC device can be configured to accelerate the computations of attention functions for transformer-based models (a.k.a. transformers) applied to machine learning applications. A single instruction, multiple data (SIMD) device can be configured to further process the DIMC output and compute softmax functions for the attention functions. The chiplet can also include die-to-die (D2D) interconnects, a peripheral component interconnect express (PCIe) bus, a dynamic random access memory (DRAM) interface, and a global CPU interface to facilitate communication between the chiplets, memory, and a server or host system.
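The split between a matrix-multiply stage and a softmax stage in an attention function can be illustrated with a minimal software sketch. This is not the patented hardware or its data path. It assumes the standard scaled dot-product attention formulation, and the function names `matmul`, `softmax_rows`, and `attention` are hypothetical, chosen only to mirror the roles the abstract assigns to the DIMC and SIMD devices.

```python
import math

def matmul(a, b):
    """Plain matrix multiply; stands in for the high-throughput DIMC stage (assumption)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax_rows(scores):
    """Row-wise softmax; stands in for the SIMD post-processing stage (assumption)."""
    out = []
    for row in scores:
        m = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]        # transpose K
    scores = matmul(q, k_t)                      # matrix-multiply stage
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    weights = softmax_rows(scaled)               # softmax stage
    return matmul(weights, v)                    # second matrix-multiply stage
```

In this sketch the two `matmul` calls carry most of the arithmetic, while the softmax is an elementwise and row-reduction step applied to their output, which is the division of labor the abstract describes.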
Legal claims defining the scope of protection, as filed with the USPTO.
8. The apparatus of claim 1 wherein each of the chiplets comprises a plurality of tiles arranged symmetrically to each other, each of the tiles comprising a portion of the plurality of slices.
16. The device of claim 11 wherein each of the chiplets comprises a plurality of tiles arranged symmetrically to each other, each of the tiles comprising a portion of the plurality of slices.
17. The device of claim 11 wherein the DIMC device is configured to support one or more block floating point data types using a shared exponent or to support a block structured sparsity.
18. The device of claim 11 further comprising a network on chip (NoC) device configured for a multicast process and coupled to each of the plurality of slices.
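Claim 17 refers to block floating point data types using a shared exponent. The general idea can be sketched as follows, under stated assumptions: the claims do not specify a mantissa width, block size, or rounding rule, so the 8-bit signed mantissa, round-to-nearest behavior, and the names `bfp_quantize` and `bfp_dequantize` below are hypothetical choices made only for illustration.

```python
import math

def bfp_quantize(block, mantissa_bits=8):
    """Quantize a block of floats to signed integer mantissas plus one shared exponent.

    Illustrative only: mantissa width and rounding are assumptions, not taken from the claims.
    """
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return [0] * len(block), 0
    # frexp gives max_abs = m * 2**e with 0.5 <= m < 1; e becomes the shared exponent
    _, shared_exp = math.frexp(max_abs)
    # Scale so the largest magnitude fits within the signed mantissa range
    scale = 2.0 ** (mantissa_bits - 1 - shared_exp)
    limit = 2 ** (mantissa_bits - 1) - 1
    mantissas = [max(-limit, min(limit, round(x * scale))) for x in block]
    return mantissas, shared_exp

def bfp_dequantize(mantissas, shared_exp, mantissa_bits=8):
    """Reconstruct approximate floats from mantissas and the shared exponent."""
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    return [m * scale for m in mantissas]
```

For example, `bfp_quantize([0.5, -0.25, 0.125, 1.0])` stores four small integers and a single exponent for the whole block, rather than one exponent per value, which is what makes the format compact for dense multiply-accumulate workloads.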
October 17, 2022
January 30, 2024