Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
1. A graphics processing apparatus comprising: one or more execution units, one of which includes a thread reserved for offloading; and an intersection circuit comprising: an intersection unit circuit to test a plurality of rays against a plurality of primitives to identify a closest primitive that each ray intersects, wherein an intersection unit queue is to store work to be performed for testing the plurality of rays in an intersection queue; and an intersection unit offload circuit to monitor the intersection queue to determine a pressure level for testing the plurality of rays, and wherein the intersection circuit is to responsively offload some of the work in the intersection queue to intersection program code executed by the thread on the one or more execution units of the graphics processing apparatus, and wherein the thread is started prior to the work of the intersection circuit being offloaded.
This claim covers a graphics processing apparatus that balances ray-primitive intersection testing between dedicated hardware and general-purpose execution units. The apparatus includes one or more execution units, one of which includes a thread reserved for offloading, and an intersection circuit. Within the intersection circuit, an intersection unit circuit tests multiple rays against multiple primitives to identify the closest primitive each ray intersects, and an intersection queue stores the pending test work. An intersection unit offload circuit monitors that queue to determine a pressure level. In response, the intersection circuit offloads some of the queued work to intersection program code executed by the reserved thread on the execution units. The thread is started before any work is offloaded, so it is already running when work arrives. Claim 1 does not say how the pressure level is measured or what level triggers offloading; dependent claim 2 adds a specified threshold of work.
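The offload behaviour described in claim 1 can be pictured with a small software model. This is only a sketch under stated assumptions, not the claimed circuit: the class name, the `pressure_threshold` parameter, and the choice to measure pressure as queue depth are all hypothetical, since claim 1 does not define how pressure is measured.

```python
from collections import deque


class IntersectionOffloadSketch:
    """Toy model: a hardware-side queue plus a pre-started software worker."""

    def __init__(self, pressure_threshold):
        # Hypothetical: queue depth above which pressure is treated as high.
        self.pressure_threshold = pressure_threshold
        self.hw_queue = deque()   # stands in for the intersection queue
        self.sw_inbox = deque()   # work handed to the reserved thread
        self.thread_started = False

    def start_reserved_thread(self):
        # Claim 1 requires the thread to be started before any offload.
        self.thread_started = True

    def submit(self, work_item):
        self.hw_queue.append(work_item)

    def pressure(self):
        # Assumption: pressure is simply the number of pending items.
        return len(self.hw_queue)

    def balance(self):
        """Move work above the threshold to the software path; return the count moved."""
        if not self.thread_started:
            raise RuntimeError("reserved thread must be started before offloading")
        moved = 0
        while self.pressure() > self.pressure_threshold:
            # Take the newest items so older work stays with the circuit.
            self.sw_inbox.append(self.hw_queue.pop())
            moved += 1
        return moved


sketch = IntersectionOffloadSketch(pressure_threshold=4)
sketch.start_reserved_thread()
for ray_id in range(10):
    sketch.submit(ray_id)
moved = sketch.balance()
```

Moving the newest items first is an arbitrary design choice for the sketch; the claim only says that some of the work in the queue is offloaded.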
2. The graphics processing apparatus as in claim 1 wherein the intersection circuit is to determine when the intersection queue reaches a specified threshold of work prior to offloading work to the intersection program code.
This dependent claim adds a threshold condition to the apparatus of claim 1. In claim 1, the intersection circuit tests rays against primitives, holds pending test work in an intersection queue, and offloads some of that work to intersection program code running on the execution units in response to queue pressure. Claim 2 specifies the trigger: the intersection circuit determines when the intersection queue reaches a specified threshold of work before it offloads work to the intersection program code. The claim does not fix a threshold value or say how the amount of work is counted.
3. The graphics processing apparatus as in claim 1 wherein the intersection circuit implements a Plucker-based test to identify a closest primitive and the intersection program code running on the execution units uses a Möller-Trumbore test to identify a closest primitive.
This dependent claim specifies which intersection test each path uses in the apparatus of claim 1. The intersection circuit (the hardware path) implements a Plücker-based test to identify the closest primitive, while the intersection program code running on the execution units (the software path) uses a Möller-Trumbore test to identify the closest primitive. Plücker coordinates represent a line in 3D space by a direction and a moment, which lets a ray be compared against the edges of a primitive. Möller-Trumbore is a widely used ray-triangle intersection algorithm. The claim does not describe a two-stage pipeline in which one test filters candidates for the other. Both tests produce the same kind of result, a closest primitive, and which one runs depends on whether the work stays in the circuit or is offloaded to the execution units.
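For readers unfamiliar with the Möller-Trumbore test named in claim 3, the following is a minimal software sketch of the standard algorithm with a closest-hit search on top. It is illustrative only: the function names, the tuple-based vectors, and the `eps` tolerance are assumptions, and nothing here is taken from the patent's own program code.

```python
def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def moller_trumbore(origin, direction, v0, v1, v2, eps=1e-9):
    """Return the ray parameter t of the hit, or None if the ray misses."""
    e1 = _sub(v1, v0)
    e2 = _sub(v2, v0)
    p = _cross(direction, e2)
    det = _dot(e1, p)
    if abs(det) < eps:
        return None                      # ray is parallel to the triangle
    inv_det = 1.0 / det
    tvec = _sub(origin, v0)
    u = _dot(tvec, p) * inv_det          # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = _cross(tvec, e1)
    v = _dot(direction, q) * inv_det     # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = _dot(e2, q) * inv_det
    return t if t > eps else None        # reject hits behind the origin


def closest_hit(origin, direction, triangles):
    """Return (triangle index, t) for the nearest hit, or None."""
    best = None
    for index, (v0, v1, v2) in enumerate(triangles):
        t = moller_trumbore(origin, direction, v0, v1, v2)
        if t is not None and (best is None or t < best[1]):
            best = (index, t)
    return best


triangles = [
    ((0, 0, 5), (1, 0, 5), (0, 1, 5)),   # farther triangle
    ((0, 0, 2), (1, 0, 2), (0, 1, 2)),   # nearer triangle
]
hit = closest_hit((0.25, 0.25, -1.0), (0.0, 0.0, 1.0), triangles)
miss = closest_hit((2.0, 2.0, -1.0), (0.0, 0.0, 1.0), triangles)
```

In this example the ray passes through both triangles, and the search keeps the one with the smaller `t`, which is what "closest primitive" means in the claim.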
4. The graphics processing apparatus as in claim 3 wherein the primitives comprise a plurality of triangles.
This dependent claim narrows claim 3 by specifying that the primitives tested by the intersection circuit and the intersection program code comprise a plurality of triangles. Everything else carries over from claims 1 and 3: the circuit applies a Plücker-based test, the offloaded program code applies a Möller-Trumbore test, and each identifies the closest triangle that a ray intersects. The claim concerns ray-triangle intersection testing only; it says nothing about rasterization, projection, or lighting.
5. The graphics processing apparatus as in claim 1 further comprising: a traversal circuit to traverse each ray against a bounding volume hierarchy (BVH) or other data structure.
This dependent claim adds a traversal circuit to the apparatus of claim 1. The traversal circuit traverses each ray against a bounding volume hierarchy (BVH) or other data structure. A BVH organizes scene geometry into nested bounding volumes, so a ray that misses a bounding volume can skip everything inside it, which reduces the number of ray-primitive intersection tests that must be run. The claim leaves the alternative open ("or other data structure") without naming a specific one, and it does not describe how the traversal circuit is built.
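As a rough illustration of what traversing a ray against a BVH involves, the sketch below walks a tiny hierarchy of axis-aligned boxes and collects the primitives whose boxes the ray enters. The `Node` layout, the slab-based box test, and the stack-based walk are assumptions chosen for brevity; the claim does not specify any of them.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec = Tuple[float, float, float]


@dataclass
class Node:
    lo: Vec                                   # minimum corner of the box
    hi: Vec                                   # maximum corner of the box
    children: List["Node"] = field(default_factory=list)
    prims: List[int] = field(default_factory=list)   # primitive ids (leaves)


def ray_hits_box(origin, direction, lo, hi):
    """Slab test: does the ray (t >= 0) pass through the axis-aligned box?"""
    t_near, t_far = 0.0, float("inf")
    for o, d, l, h in zip(origin, direction, lo, hi):
        if d == 0.0:
            if o < l or o > h:
                return False                  # parallel to and outside the slab
            continue
        t0, t1 = (l - o) / d, (h - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def traverse(root, origin, direction):
    """Return ids of primitives in every leaf whose box the ray enters."""
    candidates = []
    stack = [root]
    while stack:
        node = stack.pop()
        if not ray_hits_box(origin, direction, node.lo, node.hi):
            continue                          # skip the whole subtree
        candidates.extend(node.prims)
        stack.extend(node.children)
    return candidates


left = Node((0, 0, 0), (1, 1, 1), prims=[0, 1])
right = Node((5, 0, 0), (6, 1, 1), prims=[2])
root = Node((0, 0, 0), (6, 1, 1), children=[left, right])

through_left = traverse(root, (0.5, 0.5, -1.0), (0.0, 0.0, 1.0))
through_right = traverse(root, (5.5, 0.5, -1.0), (0.0, 0.0, 1.0))
between = traverse(root, (3.0, 0.5, -1.0), (0.0, 0.0, 1.0))
```

The candidate primitives returned here are the inputs that an intersection stage would then test to find the closest hit.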
6. The graphics processing apparatus as in claim 5 , wherein the traversal circuit is to: store work to be performed for traversing each ray in a traversal queue, monitor the traversal queue to determine a pressure level for traversing the each ray in the traversal queue, and responsively offload some of the work in the traversal queue to traversal program code executed on the one or more of the execution units of the graphics processing apparatus.
This dependent claim gives the traversal circuit of claim 5 the same offload behaviour that claim 1 gives the intersection circuit. The traversal circuit stores the work to be performed for traversing each ray in a traversal queue, monitors that queue to determine a pressure level, and in response offloads some of the queued work to traversal program code executed on one or more of the execution units. Traversal work can therefore move from the dedicated circuit to software when the traversal queue is under pressure. The claim does not state how pressure is measured or what level triggers offloading.
7. The graphics processing apparatus as in claim 6 wherein at least some results generated by the traversal circuit and the traversal program code are to be stored in the intersection queue.
This dependent claim links the traversal stage of claim 6 to the intersection stage of claim 1. At least some of the results generated by the traversal circuit and by the traversal program code are to be stored in the intersection queue. The traversal program code runs on the execution units, not on the traversal circuit; it is the software path that receives offloaded traversal work. Whichever path produces a traversal result, that result can become pending work for intersection testing. The claim does not specify the format or contents of those results.
8. The graphics processing apparatus as in claim 6 wherein, regardless of the pressure levels on the traversal queue and/or the intersection queue, the traversal and/or the intersection circuits are to offload work to traversal program code or intersection program code, respectively, executed on the execution units if it is determined that the execution units are busy below a specified threshold.
This dependent claim adds a second offload trigger to the apparatus of claim 6, one that depends on how busy the execution units are and not on queue pressure. Regardless of the pressure levels on the traversal queue and/or the intersection queue, the traversal and/or intersection circuits offload work to traversal program code or intersection program code, respectively, executed on the execution units if it is determined that the execution units are busy below a specified threshold. In other words, when the execution units are underused, work can be shifted to them even if the hardware queues are not under pressure. The claim does not define how busyness is measured or what the threshold is.
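The two triggers, queue pressure and execution-unit idleness, can be combined into a single decision function. The sketch below is one hypothetical way to express that logic; the function name, the parameter names, and the representation of busyness as a fraction between 0 and 1 are assumptions made for illustration.

```python
def should_offload(queue_depth, queue_threshold,
                   eu_busy_fraction, eu_busy_threshold):
    """Decide whether a circuit should hand work to program code on the EUs.

    queue_depth / queue_threshold: hypothetical measure of queue pressure.
    eu_busy_fraction / eu_busy_threshold: hypothetical measure of EU load.
    """
    # Claim 8 path: EUs are busy below the threshold, so offload
    # regardless of how full the hardware queue is.
    if eu_busy_fraction < eu_busy_threshold:
        return True
    # Pressure path: offload once the queue reaches its threshold.
    return queue_depth >= queue_threshold


idle_eus = should_offload(1, 8, eu_busy_fraction=0.2, eu_busy_threshold=0.5)
full_queue = should_offload(8, 8, eu_busy_fraction=0.9, eu_busy_threshold=0.5)
neither = should_offload(3, 8, eu_busy_fraction=0.9, eu_busy_threshold=0.5)
```

Checking execution-unit load first mirrors the "regardless of the pressure levels" wording, although the order does not change the result of this particular function.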
9. The graphics processing apparatus as in claim 1 wherein the intersection program code comprises a plurality of Single Instruction Multiple Data (SIMD) instructions to identify the closest primitives that each ray intersects.
This dependent claim specifies the form of the intersection program code in claim 1: it comprises a plurality of Single Instruction Multiple Data (SIMD) instructions to identify the closest primitives that each ray intersects. A SIMD instruction applies the same operation to several data elements at once, so the offloaded software path can process multiple intersection tests in parallel on the execution units. The claim does not state the SIMD width or how rays and primitives are assigned to lanes.
11. The graphics processing apparatus as in claim 1 further comprising: traversal unit program code to be executed by an execution unit (EU), the traversal unit program code to traverse each ray against a bounding volume hierarchy (BVH) or other data structure.
This dependent claim adds traversal unit program code to the apparatus of claim 1. The traversal unit program code is to be executed by an execution unit (EU) and traverses each ray against a bounding volume hierarchy (BVH) or other data structure. Where claim 5 adds a dedicated traversal circuit, this claim places traversal in software running on an execution unit, while intersection testing is still handled by the intersection circuit of claim 1 with its queue-based offloading. The claim does not recite any ray generation, ray distribution, or shading components.
12. A graphics processing apparatus comprising: one or more execution units, one of which includes a thread reserved for offloading; and a traversal circuit comprising: a traversal unit circuit to traverse each of a plurality of rays against a bounding volume hierarchy (BVH) or other data structure, wherein a traversal unit queue is to store work to be performed for traversing each of the plurality of rays in a traversal queue; and a traversal unit offload circuit to monitor the traversal queue to determine a pressure level for traversing each of the plurality of rays, and wherein the traversal circuit is to responsively offload some of the work in the traversal queue to traversal program code executed by the thread on the one or more execution units of the graphics processing apparatus, and wherein the thread is started prior to the work of the traversal circuit being offloaded.
This independent claim mirrors claim 1 but applies the offload mechanism to ray traversal instead of intersection testing. The apparatus includes one or more execution units, one of which includes a thread reserved for offloading, and a traversal circuit. Within the traversal circuit, a traversal unit circuit traverses each of a plurality of rays against a bounding volume hierarchy (BVH) or other data structure, and a traversal queue stores the pending traversal work. A traversal unit offload circuit monitors that queue to determine a pressure level. In response, the traversal circuit offloads some of the queued work to traversal program code executed by the reserved thread on the execution units. The reserved thread is started before any traversal work is offloaded, so it is already running when work arrives. The claim does not say how pressure is measured or what level triggers offloading.
13. The graphics processing apparatus as in claim 12 wherein the traversal circuit is to determine when the traversal queue reaches a specified threshold of work prior to offloading work to the traversal program code.
This dependent claim adds a threshold condition to the apparatus of claim 12, in the same way that claim 2 does for claim 1. The traversal circuit determines when the traversal queue reaches a specified threshold of work before it offloads work to the traversal program code. The claim does not fix a threshold value, say how the amount of work is counted, or describe any prioritization of tasks.
14. The graphics processing apparatus as in claim 12 further comprising: an intersection circuit to test a plurality of rays against a plurality of primitives to identify a closest primitive that each ray intersects.
This dependent claim adds an intersection circuit to the traversal-focused apparatus of claim 12. The intersection circuit tests a plurality of rays against a plurality of primitives to identify the closest primitive that each ray intersects. Here the intersection testing is done in dedicated hardware, alongside the traversal circuit and its queue-based offloading. Unlike claim 1, this claim does not recite an intersection queue or offloading of intersection work.
15. The graphics processing apparatus as in claim 12 further comprising: intersection unit program code to be executed on an executed unit (EU) to test a plurality of rays against a plurality of primitives to identify a closest primitive that each ray intersects.
This dependent claim is the software counterpart of claim 14. It adds intersection unit program code to the apparatus of claim 12, to be executed on an execution unit (the claim text reads "executed unit (EU)"). The program code tests a plurality of rays against a plurality of primitives to identify the closest primitive that each ray intersects. Intersection testing is therefore performed in software on an execution unit, while traversal is handled by the traversal circuit of claim 12 with its queue-based offloading. The claim does not recite ray generation, primitive storage, or precision control.
16. A method comprising: testing a plurality of rays against a plurality of primitives with an intersection circuit to identify a closest primitive that each ray intersects, wherein the intersection circuit is within a graphics processing apparatus; storing work to be performed for testing the plurality of rays in an intersection queue; monitoring the intersection queue to determine a pressure level for testing the plurality of rays; and responsively offloading some of the work in the intersection queue to intersection program code executed by a thread on one or more execution units of the graphics processing apparatus, wherein one of the execution units includes the thread reserved for offloading, and wherein the thread is started prior to the work of the intersection circuit being offloaded.
This independent claim restates the apparatus of claim 1 as a method. The method tests a plurality of rays against a plurality of primitives with an intersection circuit, located within a graphics processing apparatus, to identify the closest primitive each ray intersects. Work to be performed for those tests is stored in an intersection queue, and the queue is monitored to determine a pressure level. In response, some of the queued work is offloaded to intersection program code executed by a thread on one or more execution units of the graphics processing apparatus. One of the execution units includes the thread reserved for offloading, and the thread is started before any intersection work is offloaded. As in claim 1, the method does not say how pressure is measured; dependent claim 17 adds a specified threshold of work.
17. The method as in claim 16 further comprising: determining when the intersection queue reaches a specified threshold of work prior to offloading work to the intersection program code.
This dependent claim adds a threshold step to the method of claim 16, in the same way that claim 2 does for claim 1. The method further comprises determining when the intersection queue reaches a specified threshold of work before offloading work to the intersection program code. The intersection queue here is the queue of pending ray-primitive tests held for the intersection circuit, and the intersection program code is the software that runs on the reserved thread of the execution units. The claim does not fix a threshold value or say how the amount of work is counted.
18. The method as in claim 16 wherein the intersection circuit implements a Plucker-based test to identify a closest primitive and the intersection program code running on the execution units uses a Möller-Trumbore test to identify a closest primitive.
This dependent claim specifies which intersection test each path uses in the method of claim 16, matching claim 3 for the apparatus. The intersection circuit implements a Plücker-based test to identify the closest primitive, and the intersection program code running on the execution units uses a Möller-Trumbore test to identify the closest primitive. The claim does not describe a two-stage process in which the Plücker-based test selects candidates that the Möller-Trumbore test then refines. The two tests are applied on different paths: work that stays in the circuit is tested with the Plücker-based method, and work that is offloaded is tested in software with Möller-Trumbore. Both produce the same kind of result, the closest primitive a ray intersects.
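To make the Plücker-based test named in claim 18 more concrete, here is a software sketch of one common formulation for a ray and a triangle: the ray and each triangle edge are written as Plücker lines (direction and moment), and the ray passes through the triangle when the three permuted inner products share a sign. This is an assumption-laden illustration, not the claimed circuit; the function names and the way the hit distance is recovered from barycentric weights are choices made for the example.

```python
def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _add(a, b):
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def plucker(p, q):
    """Plücker coordinates (direction, moment) of the line from p toward q."""
    return _sub(q, p), _cross(p, q)


def side(line_a, line_b):
    """Permuted inner product; its sign gives the relative orientation."""
    return _dot(line_a[0], line_b[1]) + _dot(line_b[0], line_a[1])


def plucker_ray_triangle(origin, direction, v0, v1, v2):
    """Return the ray parameter t of the hit, or None if the ray misses."""
    ray = plucker(origin, _add(origin, direction))
    s0 = side(ray, plucker(v0, v1))
    s1 = side(ray, plucker(v1, v2))
    s2 = side(ray, plucker(v2, v0))
    all_non_negative = s0 >= 0 and s1 >= 0 and s2 >= 0
    all_non_positive = s0 <= 0 and s1 <= 0 and s2 <= 0
    total = s0 + s1 + s2
    if not (all_non_negative or all_non_positive) or total == 0:
        return None                      # line misses or lies in the plane
    # Each edge's value weights the vertex opposite that edge.
    w0, w1, w2 = s1 / total, s2 / total, s0 / total
    point = tuple(w0 * a + w1 * b + w2 * c for a, b, c in zip(v0, v1, v2))
    t = _dot(_sub(point, origin), direction) / _dot(direction, direction)
    return t if t > 0 else None          # reject hits behind the origin


tri = ((0, 0, 2), (1, 0, 2), (0, 1, 2))
t_hit = plucker_ray_triangle((0.25, 0.25, -1.0), (0.0, 0.0, 1.0), *tri)
t_miss = plucker_ray_triangle((2.0, 2.0, -1.0), (0.0, 0.0, 1.0), *tri)
t_behind = plucker_ray_triangle((0.25, 0.25, 5.0), (0.0, 0.0, 1.0), *tri)
```

The sign test alone treats the ray as an infinite line, which is why the sketch computes `t` afterwards and discards intersections that lie behind the ray origin.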
19. The method as in claim 18 wherein the primitives comprise a plurality of triangles.
This dependent claim narrows the method of claim 18 by specifying that the primitives comprise a plurality of triangles. Everything else carries over from claims 16 and 18: the intersection circuit applies a Plücker-based test, the offloaded intersection program code applies a Möller-Trumbore test, and each identifies the closest triangle that a ray intersects. The claim concerns ray-triangle intersection testing only; it does not cover mesh decomposition, mesh simplification, or rasterization.
February 18, 2020