US-10817338

Dynamic partitioning of execution resources

Published: October 27, 2020

Assignee: not available in USPTO data we have

Inventors: not available in USPTO data we have

Technical Abstract

Embodiments of the present invention set forth techniques for allocating execution resources to groups of threads within a graphics processing unit. A compute work distributor included in the graphics processing unit receives an indication from a process that a first group of threads is to be launched. The compute work distributor determines that a first subcontext associated with the process has at least one processor credit. In some embodiments, CTAs may be launched even when there are no processor credits, if one of the TPCs that was already acquired has sufficient space. The compute work distributor identifies a first processor included in a plurality of processors that has a processing load that is less than or equal to the processor loads associated with all other processors included in the plurality of processors. The compute work distributor launches the first group of threads to execute on the first processor.
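The launch decision described in the abstract can be sketched roughly as follows. This is an illustrative model only, not the patented implementation: the names `Processor`, `Subcontext`, `launch`, and `capacity`, the use of a simple counter as the "processing load", and the rule that a credit is spent when a subcontext first acquires a processor are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    pid: int
    load: int = 0  # hypothetical load metric: resident thread groups


@dataclass
class Subcontext:
    credits: int  # processor credits still available to this subcontext
    acquired: set = field(default_factory=set)  # ids of processors already acquired


def launch(sub, processors, capacity=4):
    """Pick a processor for one new thread group, or return None if it must wait.

    `capacity` is an assumed per-processor limit on resident thread groups.
    """
    if sub.credits > 0:
        # Choose the processor whose load is <= the load of every other processor.
        target = min(processors, key=lambda p: p.load)
        if target.load >= capacity:
            return None
        if target.pid not in sub.acquired:
            # Assumption: acquiring a new processor costs one credit.
            sub.credits -= 1
            sub.acquired.add(target.pid)
    else:
        # No credits left: fall back to an already-acquired processor with space.
        candidates = [
            p for p in processors if p.pid in sub.acquired and p.load < capacity
        ]
        if not candidates:
            return None
        target = min(candidates, key=lambda p: p.load)
    target.load += 1
    return target
```

With three processors at loads 2, 0, and 1 and a subcontext holding one credit, the first call selects the idle processor and spends the credit; a second call with no credits left reuses that same processor because it still has space.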

Patent Claims

21 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A computer-implemented method for assigning local memory to thread groups within a graphics processing unit, the method comprising: receiving an indication that a first thread group associated with a first subcontext has been assigned to execute on a first processor; identifying a first record in a local memory block assignment table corresponding to the first subcontext; identifying a first local memory block that is currently not assigned; storing a first value in the first record indicating that the first local memory block is assigned to the first subcontext and the first processor; receiving an indication that a second thread group associated with the first subcontext has been assigned to execute on a second processor; identifying a second local memory block that is not currently assigned; and storing a second value in the first record indicating that the second local memory block is assigned to the first subcontext and the second processor.

2. The computer-implemented method of claim 1, further comprising storing a first index to the first local memory block in the first record.

3. The computer-implemented method of claim 2, further comprising: storing a second index to the second local memory block in the first record, wherein the second index is greater than the first index.

4. The computer-implemented method of claim 3, wherein: the first processor and the second processor are included in a processing cluster; and the first local memory block is accessible by both the first processor and the second processor.

5. The computer-implemented method of claim 3, wherein the first processor and the second processor are included in a plurality of processors, and the local memory block assignment table is initialized by an operating system or a hypervisor prior to launching any group of threads to execute on any processor included in the plurality of processors.

6. The computer-implemented method of claim 2, further comprising: retrieving the first index to the first local memory block from the first record; and associating the first local memory block with the first index.

7. The computer-implemented method of claim 2, further comprising transmitting a message to the first processor that includes the first index to the first local memory block.

8. The computer-implemented method of claim 1, further comprising: determining that the first thread group has completed execution on the first processor; and storing a new value in the first record indicating that first local memory block is not assigned to the first subcontext.

9. The computer-implemented method of claim 1, further comprising: receiving an indication that a third thread group associated with a second subcontext has been assigned to execute on a third processor; identifying a second record in the local memory block assignment table corresponding to the second subcontext; and determining that a memory block has already been assigned to the second subcontext.

10. The computer-implemented method of claim 1, wherein with first thread group is launched to execute on the first processor prior to storing the value in the first record indicating that first local memory block is assigned to the first subcontext.

11. A parallel processing system, comprising: a scheduler that transmits a plurality of tasks to a computer work distributor; and a compute work distributor that: selects a task corresponding to a process from a task list associated with a first subcontext, identifies a first thread group associated with the first subcontext that has been assigned to execute on a first processor, determines that the first subcontext has at least one processor credit, identifies a first record in a local memory block assignment table corresponding to the first subcontext, identifies a first local memory block that is currently not assigned, stores a first value in the first record indicating that the first local memory lock is assigned to the first subcontext and the first processor; receives an indication that a second thread group associated with the first subcontext has been assigned to execute on a second processor, identifies a second local memory block that is not currently assigned, and stores a second value in the first record indicating that the second local memory block is assigned to the first subcontext and the second processor.

12. The parallel processing system of claim 11, wherein the compute work distributor further stores a first index to the first local memory block in the first record.

13. The parallel processing system of claim 12, wherein the compute work distributor further: stores a second index to the second local memory block in the first record, wherein the second index is greater than the first index.

14. The parallel processing system of claim 13, wherein: the first processor and the second processor are included in a processing cluster; and the first local memory block is accessible by both the first processor and the second processor.

15. The parallel processing system of claim 13, wherein the first processor and the second processor are included in a plurality of processors, and the local memory block assignment table is initialized by an operating system or a hypervisor prior to launching any group of threads to execute on any processor included in the plurality of processors.

16. The parallel processing system of claim 12, wherein the compute work distributor further: retrieves the first index to the first local memory block from the first record; and associates the first local memory block with the first index.

17. The parallel processing system of claim 11, wherein the first processor accesses the first local memory block via a virtual address space associated with the first subcontext.

18. The parallel processing system of claim 17, wherein the compute work distributor further: generates a launch packet for the first processor that includes a page directory base address associated with the virtual address space; and transmits the launch packet to the first processor.

19. The parallel processing system of claim 18, wherein the launch packet further includes local memory assignment information related to the first local memory block.

20. A computer-implemented method for assigning local memory to thread groups within a graphics processing unit, the method comprising: receiving an indication that a first thread group associated with a first subcontext has been assigned to execute on a first processor; identifying a first record in a local memory block assignment table corresponding to the first subcontext; determining whether local memory blocks included in a plurality local memory blocks are statically assigned; if the local memory blocks included in the plurality local memory blocks are not statically assigned, then identifying a first local memory block that is currently not assigned; and storing a first value in the first record indicating that the first local memory block is assigned to the first subcontext and the first processor; receiving an indication that a second thread group associated with the first subcontext has been assigned to execute on a second processor; identifying a second local memory block that is not currently assigned; and storing a second value in the first record indicating that the second local memory block is assigned to the first subcontext and the second processor.

21. The computer-implemented method of claim 20, further comprising: if the local memory blocks included in the plurality local memory blocks are statically assigned, then retrieving a first index associated with a third local memory block from the first record.
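The bookkeeping recited in the claims above (one record per subcontext, a block assigned per processor, release on completion, and a static mode) might be modeled along these lines. This is a minimal sketch under stated assumptions, not the claimed hardware design: the class name `LmemBlockTable`, its methods, the dictionary layout of a record, and the lowest-free-index search policy are all hypothetical.

```python
class LmemBlockTable:
    """Toy local memory block assignment table: one record per subcontext."""

    def __init__(self, num_blocks, static_assignments=None):
        self.num_blocks = num_blocks
        # Static mode (assumed): records are filled in up front, e.g. by an
        # operating system or hypervisor, and are only read afterwards.
        self.static = static_assignments is not None
        # Record layout (assumed): {subcontext: {processor_id: block_index}}
        self.records = {k: dict(v) for k, v in (static_assignments or {}).items()}

    def _assigned_blocks(self):
        return {b for rec in self.records.values() for b in rec.values()}

    def assign(self, subctx, proc):
        """Return the block index for (subctx, proc), assigning one if needed."""
        record = self.records.setdefault(subctx, {})
        if self.static:
            return record[proc]  # retrieve the pre-stored index
        if proc in record:
            return record[proc]  # a block has already been assigned
        used = self._assigned_blocks()
        for block in range(self.num_blocks):  # assumed policy: lowest free index
            if block not in used:
                record[proc] = block  # store the assignment in the record
                return block
        raise RuntimeError("no unassigned local memory block")

    def release(self, subctx, proc):
        """Mark the block as unassigned once the thread group has completed."""
        if not self.static:
            self.records.get(subctx, {}).pop(proc, None)
```

In dynamic mode, two thread groups of the same subcontext placed on different processors receive distinct blocks, and with the lowest-free-index policy assumed here the second index is greater than the first when the table starts empty. In static mode, `assign` only looks up the index stored at initialization.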

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G06F G06T

Patent Metadata

Filing Date

January 31, 2018

Publication Date

October 27, 2020
