11922535

Compute Optimization Mechanism for Deep Neural Networks

Published: March 5, 2024
Assignee: not available in the USPTO data we have
Technical Abstract

Patent Claims
11 claims

Legal claims defining the scope of protection, as filed with the USPTO.

3. The graphics processing unit as in claim 2, wherein the set of FPUs includes first FPUs to perform the FP32 operations and second FPUs to perform the FP16 operations.
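As a rough illustration of why a design might keep separate FP32 and FP16 units, the sketch below models the two precisions in software. It is not the patented hardware: `round_to_fp32`, `round_to_fp16`, and `fpu_add` are hypothetical names introduced here, and the string tag used to pick a unit is an assumption made for the example. Only Python's standard `struct` module is used, whose `e` and `f` formats encode IEEE 754 half and single precision.

```python
import struct


def round_to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def round_to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fpu_add(a: float, b: float, precision: str) -> float:
    """Hypothetical dispatch: send an add to the 'first' (FP32) or
    'second' (FP16) unit and return the result at that precision."""
    if precision == "fp32":
        rnd = round_to_fp32
    elif precision == "fp16":
        rnd = round_to_fp16
    else:
        raise ValueError(f"unsupported precision: {precision}")
    # Operands are rounded on entry, and the sum is rounded on exit.
    return rnd(rnd(a) + rnd(b))


print(fpu_add(2048.0, 1.0, "fp32"))  # 2049.0
print(fpu_add(2048.0, 1.0, "fp16"))  # 2048.0 (2049 is not representable in FP16)
```

The same add gives different answers at the two precisions, which is the trade-off a mixed-precision design exposes: the narrower format saves storage and bandwidth at the cost of resolution.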

4. The graphics processing unit as in claim 1, wherein the first set of operands include one or more 64-bit operands.

5. The graphics processing unit as in claim 1, wherein the first set of processing cores of the first type is configured to perform an in-place matrix to vector transformation for a first type of operand stored in the register file, the in-place matrix to vector transformation includes a set of operations having a source and destination, the source and destination within the register file, wherein the source includes a register address start limit, stride, number of elements, and element size.
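The source descriptor named in this claim (start, stride, number of elements, element size) can be pictured with a small software model. This is a sketch under stated assumptions, not the claimed implementation: the register file is modeled as a flat `bytearray`, "register address start limit" is read here as a starting byte address, and `SourceDescriptor` and `matrix_to_vector` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceDescriptor:
    start: int         # assumed: byte address of the first source element
    stride: int        # bytes between consecutive source elements
    num_elements: int  # how many elements to gather
    element_size: int  # size of each element in bytes


def matrix_to_vector(register_file: bytearray, src: SourceDescriptor,
                     dst_start: int) -> None:
    """Gather strided elements into a contiguous vector.

    Source and destination both live in the same register file, so no
    external memory is touched.
    """
    if src.num_elements == 0:
        return
    last_src = src.start + (src.num_elements - 1) * src.stride + src.element_size
    last_dst = dst_start + src.num_elements * src.element_size
    if min(src.start, dst_start) < 0 or max(last_src, last_dst) > len(register_file):
        raise IndexError("descriptor reaches outside the register file")

    # Read every element before writing so overlapping ranges stay correct.
    elements = []
    for i in range(src.num_elements):
        offset = src.start + i * src.stride
        elements.append(bytes(register_file[offset:offset + src.element_size]))

    for i, element in enumerate(elements):
        offset = dst_start + i * src.element_size
        register_file[offset:offset + src.element_size] = element


# A 4x4 matrix of one-byte elements (values 0..15, row-major) in bytes 0..15,
# followed by 16 spare bytes for the destination vector.
rf = bytearray(range(16)) + bytearray(16)
column_1 = SourceDescriptor(start=1, stride=4, num_elements=4, element_size=1)
matrix_to_vector(rf, column_1, dst_start=16)
print(list(rf[16:20]))  # [1, 5, 9, 13]
```

With a stride equal to the row width, the descriptor walks down one matrix column; a stride equal to the element size would instead copy a row.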

6. The graphics processing unit as in claim 1, wherein the one or more multiprocessors have a single instruction multiple thread (SIMT) architecture.

9. The method as in claim 8, further comprising performing the FP32 operations at first FPUs of the set of FPUs and performing the FP16 operations at second FPUs of the set of FPUs.

10. The method as in claim 7, wherein the first set of operands include one or more 64-bit operands.

11. The method as in claim 7, further comprising performing an in-place matrix to vector transformation for a first type of operand stored in the register file via the first set of processing cores of the first type, wherein the in-place matrix to vector transformation includes a set of operations having a source and destination, the source and destination are within the register file, and the source includes a register address start limit, stride, number of elements, and element size.

12. The method as in claim 7, wherein the GPU includes one or more multiprocessors comprising the first set of processing cores of the first type, the second set of processing cores of the second type, and an instruction cache to store a first instruction associated with the first set of operands and a second instruction associated with the second set of operands, wherein the one or more multiprocessors have a single instruction multiple thread (SIMT) architecture.

15. The graphics processing system as in claim 14, wherein the set of FPUs includes first FPUs to perform the FP32 operations and second FPUs to perform the FP16 operations.

16. The graphics processing system as in claim 13, wherein the first set of operands include one or more 64-bit operands.

17. The graphics processing system as in claim 16, wherein the first set of processing cores of the first type is configured to perform an in-place matrix to vector transformation for a first type of operand stored in the register file, the in-place matrix to vector transformation includes a set of operations having a source and destination, the source and destination within the register file, and the source includes a register address start limit, stride, number of elements, and element size.

Patent Metadata

Filing Date

Unknown

Publication Date

March 5, 2024

Inventors

Prasoonkumar Surti
Narayan Srinivasa
Feng Chen
Joydeep Ray
Ben J. Ashbaugh
Nicolas C. Galoppo Von Borries
Eriko Nurvitadhi
Balaji Vembu
Tsung-Han Lin
Kamal Sinha
Rajkishore Barik
Sara S. Baghsorkhi
Justin E. Gottschlich
Altug Koker
Nadathur Rajagopalan Satish
Farshad Akhbari
Dukhwan Kim
Wenyin Fu
Travis T. Schluessler
Josh B. Mastronarde
Linda L. Hurd
John H. Feit
Jeffery S. Boles
Adam T. Lake
Karthik Vaidyanathan
Devan Burke
Subramaniam Maiyuran
Abhishek R. Appu

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “COMPUTE OPTIMIZATION MECHANISM FOR DEEP NEURAL NETWORKS” (11922535). https://patentable.app/patents/11922535
