Method and Apparatus for Load Balancing in a Ray Tracing Architecture

Published: February 18, 2020

Assignee: not available in the USPTO data

Patent Claims

18 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A graphics processing apparatus comprising: one or more execution units, one of which includes a thread reserved for offloading; and an intersection circuit comprising: an intersection unit circuit to test a plurality of rays against a plurality of primitives to identify a closest primitive that each ray intersects, wherein an intersection unit queue is to store work to be performed for testing the plurality of rays in an intersection queue; and an intersection unit offload circuit to monitor the intersection queue to determine a pressure level for testing the plurality of rays, and wherein the intersection circuit is to responsively offload some of the work in the intersection queue to intersection program code executed by the thread on the one or more execution units of the graphics processing apparatus, and wherein the thread is started prior to the work of the intersection circuit being offloaded.

2. The graphics processing apparatus as in claim 1 wherein the intersection circuit is to determine when the intersection queue reaches a specified threshold of work prior to offloading work to the intersection program code.

3. The graphics processing apparatus as in claim 1 wherein the intersection circuit implements a Plücker-based test to identify a closest primitive and the intersection program code running on the execution units uses a Möller-Trumbore test to identify a closest primitive.

4. The graphics processing apparatus as in claim 3 wherein the primitives comprise a plurality of triangles.

5. The graphics processing apparatus as in claim 1 further comprising: a traversal circuit to traverse each ray against a bounding volume hierarchy (BVH) or other data structure.

6. The graphics processing apparatus as in claim 5, wherein the traversal circuit is to: store work to be performed for traversing each ray in a traversal queue, monitor the traversal queue to determine a pressure level for traversing each ray in the traversal queue, and responsively offload some of the work in the traversal queue to traversal program code executed on the one or more of the execution units of the graphics processing apparatus.

7. The graphics processing apparatus as in claim 6 wherein at least some results generated by the traversal circuit and the traversal program code are to be stored in the intersection queue.

8. The graphics processing apparatus as in claim 6 wherein, regardless of the pressure levels on the traversal queue and/or the intersection queue, the traversal and/or the intersection circuits are to offload work to traversal program code or intersection program code, respectively, executed on the execution units if it is determined that the execution units are busy below a specified threshold.

9. The graphics processing apparatus as in claim 1 wherein the intersection program code comprises a plurality of Single Instruction Multiple Data (SIMD) instructions to identify the closest primitives that each ray intersects.

11. The graphics processing apparatus as in claim 1 further comprising: traversal unit program code to be executed by an execution unit (EU), the traversal unit program code to traverse each ray against a bounding volume hierarchy (BVH) or other data structure.
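The offload behavior recited in claims 1, 2 and 8 can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented circuit: the class name OffloadingQueue, the parameters pressure_threshold and eu_idle_threshold, the hw_batch size, and the choice to offload half of the remaining work are all hypothetical. The claims say only that "some of the work" is offloaded once a pressure level or threshold is reached, or when the execution units are busy below a threshold. The pre-started, reserved thread of claim 1 is not modeled.

```python
from collections import deque


class OffloadingQueue:
    """Toy model of an intersection queue that spills work to program code."""

    def __init__(self, pressure_threshold, eu_idle_threshold=0.0):
        self.pending = deque()
        # Hypothetical: queue depth at which the queue counts as under pressure.
        self.pressure_threshold = pressure_threshold
        # Hypothetical: EU utilization (0..1) below which work is offloaded
        # regardless of queue pressure (compare claim 8).
        self.eu_idle_threshold = eu_idle_threshold

    def push(self, item):
        self.pending.append(item)

    def should_offload(self, eu_utilization):
        if not self.pending:
            return False
        under_pressure = len(self.pending) >= self.pressure_threshold
        eus_idle = eu_utilization < self.eu_idle_threshold
        return under_pressure or eus_idle

    def step(self, eu_utilization, hw_batch=2):
        """Run one scheduling step; return (circuit_work, offloaded_work)."""
        # The fixed-function circuit takes up to hw_batch items from the front.
        take = min(hw_batch, len(self.pending))
        circuit = [self.pending.popleft() for _ in range(take)]
        offloaded = []
        if self.should_offload(eu_utilization):
            # Assumption: hand half of the remaining work (at least one item)
            # from the back of the queue to the program code.
            count = max(1, len(self.pending) // 2)
            offloaded = [self.pending.pop() for _ in range(count)]
        return circuit, offloaded


queue = OffloadingQueue(pressure_threshold=4, eu_idle_threshold=0.25)
for ray_id in range(10):
    queue.push(ray_id)
circuit_work, offloaded_work = queue.step(eu_utilization=0.9)
```

In this model the first step leaves eight items pending after the circuit takes two, which meets the threshold of four, so half of the remainder is handed to the program-code path. A real design would choose the offload amount and thresholds from hardware measurements.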

12. A graphics processing apparatus comprising: one or more execution units, one of which includes a thread reserved for offloading; and a traversal circuit comprising: a traversal unit circuit to traverse each of a plurality of rays against a bounding volume hierarchy (BVH) or other data structure, wherein a traversal unit queue is to store work to be performed for traversing each of the plurality of rays in a traversal queue; and a traversal unit offload circuit to monitor the traversal queue to determine a pressure level for traversing each of the plurality of rays, and wherein the traversal circuit is to responsively offload some of the work in the traversal queue to traversal program code executed by the thread on the one or more execution units of the graphics processing apparatus, and wherein the thread is started prior to the work of the traversal circuit being offloaded.

13. The graphics processing apparatus as in claim 12 wherein the traversal circuit is to determine when the traversal queue reaches a specified threshold of work prior to offloading work to the traversal program code.

14. The graphics processing apparatus as in claim 12 further comprising: an intersection circuit to test a plurality of rays against a plurality of primitives to identify a closest primitive that each ray intersects.

15. The graphics processing apparatus as in claim 12 further comprising: intersection unit program code to be executed on an execution unit (EU) to test a plurality of rays against a plurality of primitives to identify a closest primitive that each ray intersects.
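The traversal side (claims 12 and 13, with the hand-off to the intersection queue described in claim 7) can be sketched in the same spirit. The following is a minimal illustration, assuming an axis-aligned bounding box BVH and a standard slab test for the ray/box check; the claims do not specify either. The names Node, ray_hits_box, traverse and offload_threshold are hypothetical, and both the "circuit" and "program" paths run the same Python function here purely to show how work is split.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    lo: tuple                                        # AABB minimum corner
    hi: tuple                                        # AABB maximum corner
    children: list = field(default_factory=list)     # inner node: child nodes
    primitives: list = field(default_factory=list)   # leaf node: primitive ids


def ray_hits_box(origin, direction, lo, hi):
    """Slab test: does origin + t * direction (t >= 0) touch the box?"""
    t_near, t_far = 0.0, float("inf")
    for o, d, l, h in zip(origin, direction, lo, hi):
        if d == 0.0:
            if o < l or o > h:
                return False        # parallel to this slab and outside it
            continue
        t0, t1 = (l - o) / d, (h - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def traverse(rays, root, offload_threshold):
    """Return (intersection_queue, handled); handled counts work per path."""
    traversal_queue = deque((ray_id, root) for ray_id in range(len(rays)))
    intersection_queue = []          # (ray_id, primitive_id) candidates
    handled = {"circuit": 0, "program": 0}

    def visit(ray_id, node):
        origin, direction = rays[ray_id]
        if not ray_hits_box(origin, direction, node.lo, node.hi):
            return
        for prim in node.primitives:
            intersection_queue.append((ray_id, prim))
        for child in node.children:
            traversal_queue.append((ray_id, child))

    while traversal_queue:
        # Pressure check: at or above the threshold, one item is offloaded
        # from the back of the queue to the program-code path.
        if len(traversal_queue) >= offload_threshold:
            visit(*traversal_queue.pop())
            handled["program"] += 1
        if traversal_queue:
            visit(*traversal_queue.popleft())
            handled["circuit"] += 1
    return intersection_queue, handled


left = Node((0, 0, 0), (2, 1, 1), primitives=[0])
right = Node((2, 0, 0), (4, 1, 1), primitives=[1])
root = Node((0, 0, 0), (4, 1, 1), children=[left, right])
rays = [((1, 0.5, -1), (0, 0, 1)),
        ((3, 0.5, -1), (0, 0, 1)),
        ((10, 0.5, -1), (0, 0, 1))]
candidates, handled = traverse(rays, root, offload_threshold=2)
```

Whichever path processes a node, its leaf results land in the same intersection queue, which mirrors the arrangement in claim 7. With a very large offload_threshold the program path is never used and the circuit path handles everything.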

16. A method comprising: testing a plurality of rays against a plurality of primitives with an intersection circuit to identify a closest primitive that each ray intersects, wherein the intersection circuit is within a graphics processing apparatus; storing work to be performed for testing the plurality of rays in an intersection queue; monitoring the intersection queue to determine a pressure level for testing the plurality of rays; and responsively offloading some of the work in the intersection queue to intersection program code executed by a thread on one or more execution units of the graphics processing apparatus, wherein one of the execution units includes the thread reserved for offloading, and wherein the thread is started prior to the work of the intersection circuit being offloaded.

17. The method as in claim 16 further comprising: determining when the intersection queue reaches a specified threshold of work prior to offloading work to the intersection program code.

18. The method as in claim 16 wherein the intersection circuit implements a Plücker-based test to identify a closest primitive and the intersection program code running on the execution units uses a Möller-Trumbore test to identify a closest primitive.

19. The method as in claim 18 wherein the primitives comprise a plurality of triangles.
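Claims 3 and 18 name two different ray/triangle tests for the circuit and the program code. The sketch below shows textbook forms of both so that their results can be compared; it is not taken from the patent, and the function names, the eps tolerance, and the closest_hit helper are assumptions. The Plücker variant checks that the ray's line passes on the same side of all three triangle edges using the permuted inner product, then computes the hit distance from the triangle's plane. The Möller-Trumbore variant solves for the barycentric coordinates and the distance directly.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def moller_trumbore(origin, direction, v0, v1, v2, eps=1e-9):
    """Return the hit distance t along the ray, or None on a miss."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None                      # ray parallel to the triangle plane
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None


def plucker_test(origin, direction, v0, v1, v2, eps=1e-9):
    """Return the hit distance t along the ray, or None on a miss."""
    moment = cross(origin, direction)    # Plücker moment of the ray's line

    def side(a, b):
        # Permuted inner product of the ray line with the edge line a -> b.
        return dot(direction, cross(a, b)) + dot(sub(b, a), moment)

    s0, s1, s2 = side(v0, v1), side(v1, v2), side(v2, v0)
    all_non_negative = s0 >= 0 and s1 >= 0 and s2 >= 0
    all_non_positive = s0 <= 0 and s1 <= 0 and s2 <= 0
    if not (all_non_negative or all_non_positive):
        return None                      # line passes outside some edge
    normal = cross(sub(v1, v0), sub(v2, v0))
    denom = dot(normal, direction)
    if abs(denom) < eps:
        return None
    t = dot(normal, sub(v0, origin)) / denom
    return t if t > eps else None


def closest_hit(origin, direction, triangles, test):
    """Return (index, t) of the nearest triangle hit, or None."""
    best = None
    for index, (v0, v1, v2) in enumerate(triangles):
        t = test(origin, direction, v0, v1, v2)
        if t is not None and (best is None or t < best[1]):
            best = (index, t)
    return best


triangles = [((0, 0, 0), (1, 0, 0), (0, 1, 0)),
             ((0, 0, 2), (1, 0, 2), (0, 1, 2))]
ray_origin, ray_direction = (0.25, 0.25, -1.0), (0.0, 0.0, 1.0)
hit_program = closest_hit(ray_origin, ray_direction, triangles, moller_trumbore)
hit_circuit = closest_hit(ray_origin, ray_direction, triangles, plucker_test)
```

For this ray both tests pick the nearer triangle, which is the property a split between circuit and program code relies on: either path can be given a unit of work and report the same closest primitive. Results for rays that graze an edge can differ between the two methods because of floating-point rounding, which this sketch does not address.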

Patent Metadata

Filing Date

Unknown

Publication Date

February 18, 2020

Inventors

TOMAS G. AKENINE-MOLLER
