Rendering systems that can use combinations of rasterization rendering processes and ray tracing rendering processes are disclosed. In some implementations, these systems perform a rasterization pass to identify visible surfaces of pixels in an image. Some implementations may begin shading processes, in which rays are emitted, for visible surfaces before the geometry is entirely processed. Rays can be culled at various points during processing, based on determining whether the surface from which the ray was emitted is still visible. Rendering systems may implement rendering effects as disclosed.
Legal claims defining the scope of protection, as filed with the USPTO.
. A machine-implemented method for rendering a 3D scene, the method comprising, for a frame of the 3D scene:
. The machine-implemented method of, further comprising:
. The machine-implemented method of, wherein the status data for the ray is updated to indicate that the ray is not current in response to the processing geometry data determining that a new element of geometry is visible at the sample position associated with the ray.
. The machine-implemented method of, wherein the ray is associated with an element of geometry, and wherein the status data for the ray is updated to indicate that the ray is not current in response to the processing geometry data determining that the element of geometry associated with the ray does not contribute to the rendering of the 3D scene.
. The machine-implemented method of, wherein said terminating the processing of the ray comprises setting a flag to indicate that the ray can be culled.
. The machine-implemented method of, wherein said ray tracing operations comprise one or more of traversing an acceleration structure, testing the ray for intersection with one or more primitives or shading of the ray for identified intersections.
. The machine-implemented method of, wherein the ray is associated with a hash of a combination of a sample ID and an ID of an element of geometry.
. The machine-implemented method of, wherein said element of geometry is associated with shader code.
. The machine-implemented method of, wherein said processing geometry data comprises using rasterization operations.
. The machine-implemented method of, wherein said status data comprises visibility data.
. The machine-implemented method of, wherein said status data comprises position data.
. The machine-implemented method of, wherein said position data comprises vertex data.
. The machine-implemented method of, wherein said updating the status data for the ray comprises providing an identifier for the ray.
. The machine-implemented method of, further comprising storing definition data for the ray associated with the identifier for the ray.
. The machine-implemented method of, wherein said updating the status data for the ray is based on the results of the rasterization operations.
. The machine-implemented method of, wherein said terminating the processing of the ray comprises terminating one or more shadow rays.
. The machine-implemented method of, wherein said terminating the processing of the ray further comprises using vertex data to terminate the processing of the ray.
. The machine-implemented method of, further comprising providing one or more decision criteria or flags for determining whether or not to terminate the processing of the ray.
. An apparatus for rendering a 3D scene, the apparatus comprising:
. The apparatus of, wherein the circuitry is further configured to:
. The apparatus of, wherein the circuitry is further configured to update the status data for the ray to indicate that the ray is not current in response to the processing geometry data determining that a new element of geometry is visible at the sample position associated with the ray.
. The apparatus of, wherein the ray is associated with an element of geometry, and wherein the circuitry is further configured to update the status data for the ray to indicate that the ray is not current in response to the processing geometry data determining that the element of geometry associated with the ray does not contribute to the rendering of the 3D scene.
. The apparatus of, wherein the circuitry is further configured to set a flag to indicate that the ray can be culled to thereby terminate the processing of the ray.
. The apparatus ofcomprising:
. A non-transitory computer readable storage medium having stored thereon computer readable code in a hardware description language that, when processed, enables fabrication of an apparatus for rendering a 3D scene, wherein the apparatus comprises circuitry configured to perform operations comprising:
Complete technical specification and implementation details from the patent document.
This application claims priority from U.S. Provisional Pat. App. No. 61/919,701, filed on Dec. 20, 2013, which is incorporated by reference for all purposes herein.
In one aspect, the disclosure generally relates to 3-D rendering systems, system architectures, and methods, and in a more particular aspect, the disclosure relates to systems, architectures, and methods for asynchronous and concurrent hybridized rendering, such as hybridized ray tracing and rasterization-based rendering.
Graphics Processing Units (GPUs) provide highly parallelized rasterization-based rendering hardware. A traditional GPU used a fixed pipeline only for rendering polygons with texture maps, and gradually evolved toward a more flexible pipeline that allows programmable vertex and fragment stages. Even though modern GPUs support more programmability of geometry and pixel processing, a variety of functions within a GPU are implemented in fixed function hardware. Modern GPUs range in complexity, with high performance GPUs having transistor budgets on the order of 4-6 billion transistors. GPUs are often used in real time rendering tasks, and optimizations for many GPU applications involve determining shortcuts to achieve a desired throughput of frames per second while maintaining a desired level of subjective video quality. For example, in a video game, realistic modeling of light behavior is rarely an objective; rather, achieving a desired look or rendering effect is often a principal objective.
Traditionally, ray tracing has been used for high quality, non-real time graphics rendering tasks, such as production of animated movies, or producing 2-D images that more faithfully model the behavior of light in different materials. In ray tracing, control of rendering and pipeline flexibility to achieve a desired result were often more critical than maintaining a desired frame rate. Also, some of the processing tasks needed for ray tracing are not necessarily implementable on hardware that is well-suited for rasterization.
One aspect relates to a machine-implemented method of graphics processing. The method comprises beginning to rasterize a stream of geometry for a frame of pixels. A value of each pixel is defined based on one or more samples for that pixel. The rasterization comprises determining a currently-visible element of geometry at each sample for each pixel in the frame of pixels. The currently-visible element of geometry at each sample may be updated as the rasterization of the stream of geometry proceeds. Responsive to determining the currently-visible element of geometry for a particular sample, a shader is run for that currently-visible element of geometry. Running the shader comprises emitting a ray to be traced within a 3-D scene in which elements of the geometry are located. The ray is associated with the particular sample. Prior to completion of the processing of the ray, a determination is made whether the currently-visible element of geometry for the sample associated with the ray is the same element of geometry that was visible at that sample when the ray was emitted. If so, processing of the ray continues. Otherwise, processing of the ray is terminated. Systems that perform an implementation of such a process may also be provided. Such systems can operate according to an immediate mode rendering approach or a deferred mode rendering approach. Deferred mode rendering approaches can implement one or more passes to determine final object visibility, where each pass involves processing only a portion of the total geometry. Various other implementations and aspects are disclosed, and this summary is not limiting as to processes or apparatuses that implement any aspect of the disclosure.
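The currency test at the heart of this method can be sketched in a few lines. The following is a minimal Python sketch, not the disclosed implementation. It makes several assumptions: geometry elements have integer IDs, a dictionary stands in for per-sample visibility state, the closer element wins the depth test, and a stand-in shader emits exactly one ray per newly visible surface. The names `HybridRenderer`, `rasterize`, and `process_rays` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Ray:
    ray_id: int
    sample_id: int
    geometry_id: int  # element visible at the sample when the ray was emitted


class HybridRenderer:
    """Rasterizes elements sample by sample and terminates rays whose surface
    is no longer visible (a sketch; all names are hypothetical)."""

    def __init__(self):
        self.visible = {}  # sample_id -> currently visible geometry ID
        self.depth = {}    # sample_id -> depth of that element
        self.rays = []     # rays still in flight
        self._next_id = 0

    def rasterize(self, sample_id, geometry_id, depth):
        """Depth-test one element at one sample; shade it if it becomes visible."""
        if depth < self.depth.get(sample_id, math.inf):
            self.depth[sample_id] = depth
            self.visible[sample_id] = geometry_id
            self._shade(sample_id, geometry_id)

    def _shade(self, sample_id, geometry_id):
        # Stand-in for a shader that emits one ray for the newly visible surface.
        self.rays.append(Ray(self._next_id, sample_id, geometry_id))
        self._next_id += 1

    def is_current(self, ray):
        # Current only if the element that emitted the ray is still visible.
        return self.visible.get(ray.sample_id) == ray.geometry_id

    def process_rays(self):
        """Keep rays whose surface is still visible; terminate the rest.

        Returns the number of rays terminated.
        """
        kept = [r for r in self.rays if self.is_current(r)]
        terminated = len(self.rays) - len(kept)
        self.rays = kept
        return terminated
```

For example, if element 7 is rasterized at sample 0 with depth 5.0 and element 9 later covers the same sample at depth 2.0, `process_rays()` terminates the ray emitted for element 7 and keeps the one emitted for element 9.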
Implementations can function using immediate mode geometry submission, can tile geometry and handle hidden surface removal tile-by-tile, can defer shading, and can perform hidden surface removal on tiles and defer rendering. Implementations can also perform partial renders of an entire geometry submission and perform ray culling between such partial renders, for example.
The following description is presented to enable a person of ordinary skill in the art to make and use various aspects of the inventions. Descriptions of specific techniques, implementations and applications are provided only as examples. Various modifications to the examples described herein may be apparent to those skilled in the art, and the general principles defined herein may be applied to other examples and applications without departing from the scope of the invention. Although systems and methods of rasterization have developed and been implemented largely separately from systems and methods of ray tracing, there have been some efforts to combine the two. For example, it has been proposed to use a first pass of rasterization to determine object visibility, and then a subsequent ray tracing pass to determine whether those visible surfaces are occluded by other geometry from lights that emit light into the 3-D scene.
In one aspect of the disclosure, ray tracing tasks proceed concurrently with rasterization tasks. Techniques to avoid performing ray tracing tasks that can be determined not to contribute to a final rendering product are disclosed.
The accompanying figure depicts a block diagram of an example system in which hybrid ray tracing and rasterization aspects disclosed herein can be implemented. In particular, this system is generally consistent with an immediate mode rasterization approach. The system includes a geometry unit that can include a tessellator. The geometry unit produces a stream of geometry. The geometry stream is input to a primitive setup process; for example, triangles may be one type of primitive that is processed. The primitive setup process may be implemented in a programmable processor module, fixed function circuitry, or a combination thereof. Geometry from the stream also is stored in a geometry database. In some implementations, the geometry stream is modified by a geometry modifier, which can modify the geometry prior to storage in the geometry database. For example, the geometry modifier may reduce the number of primitives that represent a given shape. The geometry unit also may store geometry directly in the geometry database.
The geometry unit may also output the geometry stream for a 3-D scene to an acceleration structure builder. Geometry output to the acceleration structure builder may be simplified, or otherwise have a reduced triangle count, compared with a set of source geometry, with tessellated geometry, or with geometry modified according to a modification procedure. Geometry provided to the acceleration structure builder also may be provided from the geometry modifier.
Primitive setup operates to define a relationship of a given primitive to a perspective from which a 2-D image will be rendered. For example, line equations for a triangular primitive may be set up. A parameter generation module is configured to generate per-sample parameters for the primitive. In some implementations, samples correspond to pixels of a 2-D image to be rendered; in other situations, pixels may be multisampled, samples may be randomly or pseudorandomly selected, and so on. The per-sample parameters may include parameters such as depth of the primitive for the sample, and interpolated values for parameters associated with vertices forming the primitive.
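Line equations for a triangle are commonly realized as edge functions. The sketch below shows how a parameter generation step could test sample coverage and interpolate per-vertex depth with barycentric weights. It assumes counter-clockwise vertices in a y-up screen space, and it ignores fill (tie-breaking) rules and perspective correction of non-depth attributes. The function names are illustrative.

```python
def edge(ax, ay, bx, by, px, py):
    """Edge function: positive when p lies to the left of the directed edge a->b."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def sample_params(tri, depths, px, py):
    """Return the interpolated depth at sample (px, py), or None if uncovered.

    tri: three (x, y) screen-space vertices in counter-clockwise order.
    depths: the depth at each vertex.
    """
    (x0, y0), (x1, y1), (x2, y2) = tri
    area = edge(x0, y0, x1, y1, x2, y2)  # twice the signed triangle area
    if area <= 0:
        return None  # degenerate or clockwise triangle: not handled here
    # Barycentric weights: each is the edge function of the opposite edge.
    w0 = edge(x1, y1, x2, y2, px, py) / area
    w1 = edge(x2, y2, x0, y0, px, py) / area
    w2 = edge(x0, y0, x1, y1, px, py) / area
    if min(w0, w1, w2) < 0:
        return None  # sample lies outside the triangle
    return w0 * depths[0] + w1 * depths[1] + w2 * depths[2]
```

The same weights could interpolate other vertex parameters, such as normals, at each covered sample.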
A visible surface determination module uses at least some of the per-sample parameters, together with data compiled for each sample, to determine for which samples, if any, the primitive being processed provides a visible surface. Primitive setup may involve setting a primitive up to be scanned for overlap with a set of samples of a frame of pixels (where each pixel may be formed from data taken from one or more samples). This example system primarily relates to an implementation of the disclosure in which triangles from the stream are immediately scan converted, rasterized and shaded (rather than shading being deferred). Various portions of the system may therefore be parallelized, such as by providing separate instances of these processes for each primitive in flight.
The visible surface determination module interfaces with a Z buffer, which stores one or more depths of surfaces. The depth information stored in the Z buffer may depend on how geometry is submitted. For example, if geometry is submitted in a known order, and opaque geometry is segregated from translucent geometry, then a depth of a closest opaque surface can be maintained. Suppose instead that translucency of geometry cannot be determined before shader execution and that geometry is not guaranteed to be sorted. In that case, geometry may not be able to be culled using a visible surface determination module and Z buffer that operate on per-sample depth calculations, such as calculations that interpolate vertex parameters for a given primitive. In immediate mode rendering, it is commonly required to have geometry sorted before submission for rasterization, and also to segregate opaque from translucent geometry, so the above consideration may be addressed by convention.
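Under the sorting convention just described, a per-sample depth test might look like the following minimal sketch. It tracks a single depth per sample. The rule that opaque surfaces which pass the test write their depth, while translucent surfaces are tested but never write, is an assumption made for illustration. `ZBuffer` is a hypothetical name.

```python
import math


class ZBuffer:
    """Closest-opaque depth per sample (a sketch; assumes opaque geometry is
    segregated from translucent geometry)."""

    def __init__(self, num_samples):
        self.depth = [math.inf] * num_samples

    def test(self, sample, z, opaque):
        """Return True if a surface at depth z may be visible at `sample`.

        Passing opaque surfaces write their depth. Translucent surfaces are
        tested against the closest opaque depth but do not write, so they
        never hide surfaces behind them.
        """
        if z >= self.depth[sample]:
            return False
        if opaque:
            self.depth[sample] = z
        return True
```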
Data concerning visible surfaces, as determined by the visible surface determination module, can be provided to shading process(es). A separate shading process may be invoked per pixel, per surface, or per sample, for example. Shader code executed for each shading process may be associated with the source geometry. A ray processing control also may receive data concerning visible surfaces, as determined by the visible surface determination module. The visible surface determination module can implement immediate mode rendering, in which each primitive is processed as it becomes available, in contrast to deferred mode rendering; tiling of geometry can be implemented in both immediate and deferred rendering modes. A memory manager can manage and allocate portions of a memory. Some implementations of the disclosure also may implement deferred rendering, such as a tile-based deferred rendering approach. In some implementations or operating modes, samples may be taken one per pixel. In other implementations or operating modes, multiple samples may be taken for each pixel. Samples may correspond to particular areas within a 2-D grid of pixels being rendered; however, in some situations, samples also may correspond to a surface in a 3-D scene, or to an arbitrary point. As such, the usage of “sample” in this disclosure is generic to these options, and does not imply any requirement that rays be associated only with screen pixels or parts of such.
The ray processing control maintains in-process ray data. The ray processing control also controls a ray traversal module and a ray shading module. These modules may be implemented as one or more processes running on programmable hardware, on fixed function hardware, or on a combination thereof. A geometry database and an acceleration structure database may provide inputs to the ray traversal and ray shading modules. A sample buffer provides storage for outputs of the shading process(es) and the ray shading module. The sample buffer may provide separate locations to store multiple values resulting from each ray shading process and from at least one rasterization process that affects a given pixel. The sample buffer may be implemented for one or more tiles or subsets of an entire rendering. The sample buffer may be used to update, or otherwise be later synchronized with, a full buffer, for example.
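One way to keep rasterization and ray shading outputs in separate locations, as such a sample buffer may do, is sketched below. The per-sample RGB tuples and the additive combination in `resolve` are assumptions made for illustration; an implementation could store and blend contributions differently. The class and method names are hypothetical.

```python
from collections import defaultdict


class SampleBuffer:
    """Separate per-sample slots for rasterization and ray shading outputs.

    `resolve` sums the contributions; additive blending is an assumption
    made for illustration.
    """

    def __init__(self):
        self.raster = {}              # sample -> RGB from rasterization shading
        self.ray = defaultdict(list)  # sample -> RGB results from ray shading

    def write_raster(self, sample, rgb):
        self.raster[sample] = rgb

    def write_ray(self, sample, rgb):
        self.ray[sample].append(rgb)

    def resolve(self, sample):
        """Combine all stored contributions for one sample."""
        total = list(self.raster.get(sample, (0.0, 0.0, 0.0)))
        for rgb in self.ray.get(sample, []):
            total = [t + c for t, c in zip(total, rgb)]
        return tuple(total)
```

Keeping ray results in their own list means a result later found to be invalid can be discarded without disturbing the rasterized value.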
The figure also depicts that the ray traversal module and the ray shading module can access the Z buffer. Each of the ray traversal module and the ray shading module may represent a plurality of independently executing threads executing on programmable processor core(s), fixed function logic, and/or combinations of these. Accessing the Z buffer may involve accessing a specific location in the Z buffer for a sample associated with a ray that is taken up for processing (e.g., traversal through an acceleration structure or a portion thereof, or shading). Each of the ray traversal module and the ray shading module may access the Z buffer in order to determine whether a given ray should be further processed. For example, more computation and data access are avoided if a ray can be culled close to the start of traversing an acceleration structure rather than closer to completing such traversal. Ray shaders also may vary in computation cost and in the amount of data to be accessed. Such costs can be hinted by shader code and used by a ray shader setup process to indicate how aggressively rays for a given ray shader should be culled.
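The trade-off between the cost of a currency check and the work it may save can be expressed as a simple expected-value test. In this hypothetical sketch, `remaining_cost` combines an estimate of the remaining traversal work with a shader cost hint. `p_stale` is an assumed estimate of the probability that the ray's surface has been occluded. The disclosure does not specify either quantity.

```python
from dataclasses import dataclass


@dataclass
class RayWork:
    remaining_cost: float  # estimated traversal work left plus shader cost hint
    p_stale: float         # estimated probability the ray is no longer current


def worth_checking(work, check_cost):
    """Check ray currency only if the expected saving exceeds the check cost."""
    return work.p_stale * work.remaining_cost > check_cost
```

Under this policy, a ray near the start of traversal, or one bound for an expensive shader, is checked more readily than a ray that is nearly finished.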
Whether such a check is conducted can depend on its cost, that is, on the amount of computation or data access that may be saved versus the cost incurred to perform the check. In some cases, data in the Z buffer may include ranges of depths for a set of pixels, or a hierarchy of ranges for differently sized groupings of pixels. These depth ranges may serve in one or more preliminary stages of ray culling, so that fewer accesses to a larger memory are required. In an example, such depth ranges can be cached in a small private cache that is accessed by the ray processing control, or by the ray traversal and ray shading modules. Where tiling is used, depth ranges may be maintained for a tile or a group of tiles, for example. Ranges can be updated as new information becomes available, such as from the output of per-sample parameter generation.
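A per-tile depth range can act as the preliminary culling stage described above. The sketch keeps the farthest current depth for each tile. A ray whose surface lies beyond that bound must be occluded, so it can be culled without reading the per-sample buffer. This reasoning assumes an opaque-only Z buffer, in which a sample's depth can only decrease. It also assumes samples are numbered consecutively and grouped into fixed-size tiles. A bound that is allowed to go stale (too far) would remain conservative. The class name and `fine_reads` counter are illustrative.

```python
import math


class TiledDepthRanges:
    """Per-sample depths plus a coarse farthest-depth bound per tile.

    Assumes samples are numbered 0..n-1, grouped into consecutive tiles of
    `tile_size`, and that a sample's depth only ever decreases.
    """

    def __init__(self, num_samples, tile_size):
        self.tile_size = tile_size
        self.depth = [math.inf] * num_samples
        num_tiles = (num_samples + tile_size - 1) // tile_size
        self.tile_max = [math.inf] * num_tiles  # farthest current depth per tile
        self.fine_reads = 0  # counts accesses to the full-resolution buffer

    def update(self, sample, z):
        if z < self.depth[sample]:
            self.depth[sample] = z
            t = sample // self.tile_size
            start = t * self.tile_size
            self.tile_max[t] = max(self.depth[start:start + self.tile_size])

    def ray_occluded(self, sample, emit_depth):
        """Return True if the surface a ray was emitted from is now hidden."""
        # Coarse stage: if the ray's surface is farther than every sample in
        # the tile, it must be occluded; no per-sample access is needed.
        if emit_depth > self.tile_max[sample // self.tile_size]:
            return True
        # Fine stage: compare against this sample's current depth.
        self.fine_reads += 1
        return self.depth[sample] < emit_depth
```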
Contribution verification logic receives outputs of ray shading and also can access the Z buffer. The contribution verification logic determines whether geometry has completed processing and whether a given ray shading result is valid and thus can be written to a sample buffer. The contribution verification logic may perform this check by determining whether a depth or other identifying information for a combination of ray and sample (see below) indicates that such ray may still contribute to a non-occluded surface. In some implementations, processing of geometry may need to complete entirely, such that a final visible surface for a given sample is determined, before it can be determined whether any result from ray shading may be stored. In some implementations, sample data for completed ray shaders may be stored, and then committed after it is determined that the result is valid. Such logic also may determine that a given ray shading result is invalid, even though a valid result for a given sample is not yet known. Such logic may operate to provide a correct rendering solution, in that the logic should be designed to prevent a result of shading a ray for a now-obscured surface from being used to produce a rendering result. By contrast, ray traversal and ray shading operate to opportunistically avoid computation or data access, but do not necessarily need to identify each opportunity to cull a ray or other computation process relating thereto. In some implementations, the logic would be arranged so that false culling is avoided, at the expense of potentially performing unnecessary shading.
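The two roles described for contribution verification logic can be illustrated together. The first is early rejection of results already known to be invalid. The second is a final commit once visibility is settled. This minimal sketch assumes each result is tagged with its sample and the ID of the element that was visible when its ray was emitted. It also assumes an opaque depth test, so a replaced element cannot become visible again. Samples without visibility data are kept, to avoid false culling. The names are illustrative.

```python
class ContributionVerifier:
    """Buffers ray shading results until they are known to be valid."""

    def __init__(self):
        self.pending = []        # (sample, geometry_id, color) awaiting validation
        self.sample_buffer = {}  # sample -> list of committed colors

    def submit(self, sample, geometry_id, color):
        self.pending.append((sample, geometry_id, color))

    def discard_stale(self, visible):
        """Drop results already known to be invalid, before geometry completes.

        visible: sample -> geometry ID currently visible there. Samples with
        no visibility data yet are kept, to avoid false culling.
        """
        self.pending = [
            (s, g, c) for (s, g, c) in self.pending
            if s not in visible or visible[s] == g
        ]

    def commit(self, visible):
        """Call once all geometry is processed and `visible` is final."""
        for s, g, c in self.pending:
            if visible.get(s) == g:
                self.sample_buffer.setdefault(s, []).append(c)
        self.pending = []
```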
It should be understood that the system/apparatus can be configured to operate in a variety of ways. For example, the ray processing control can be configured to generate ray culling signals directed to ray traversal and ray shading, and can generate these signals based on data received from the visible surface determination module, or by accessing relevant data from the Z buffer. Another figure depicts an example Z buffer, in which each pixel of a plurality of pixels is associated with one or more samples. Each sample may have an associated depth to a surface found to be visible for that sample; such depth may be initialized and updated as visible surfaces are found. The Z buffer also may include an indication of whether it is acceptable to drop rays that are associated with farther surfaces. This aspect can be influenced by the depth compare mode in effect. Some implementations of the disclosure will identify a single closest visible surface for a set of geometry, regardless of whether surfaces are opaque or translucent. Then, effects associated with translucency of geometry will be handled through ray tracing. Thus, in such implementations, whether a given surface is opaque may be disregarded at the time of geometry sorting. However, other implementations may involve rasterization shaders contributing color for opaque surfaces that are visible through translucent surfaces. Then, even though the opaque surface is farther, it would not be appropriate to drop rays emitted for shading that surface.
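The per-sample "may drop farther rays" indication can be sketched as a flag stored beside the depth. This is a minimal illustration under stated assumptions. Each sample tracks a single closest surface. The hypothetical parameter `raster_translucency` distinguishes the two modes described above: translucency handled entirely by ray tracing, versus rasterization shaders blending color from farther opaque surfaces. `ZEntry`, `update_entry`, and `should_drop` are illustrative names.

```python
import math
from dataclasses import dataclass


@dataclass
class ZEntry:
    """One sample's Z buffer entry (a sketch; field names are hypothetical)."""
    depth: float = math.inf
    geometry_id: int = -1
    # Whether rays tied to surfaces farther than `depth` may be dropped.
    may_drop_farther: bool = True


def update_entry(entry, depth, geometry_id, opaque, raster_translucency):
    """Record a closer surface and set the drop flag.

    raster_translucency: True if rasterization shaders blend color from
    farther opaque surfaces seen through translucent ones; False if
    translucency is handled entirely by ray tracing.
    """
    if depth < entry.depth:
        entry.depth = depth
        entry.geometry_id = geometry_id
        # A translucent closer surface blocks dropping only when farther
        # surfaces still contribute color through rasterization.
        entry.may_drop_farther = opaque or not raster_translucency


def should_drop(entry, ray_surface_depth):
    """Drop a ray only if its surface is farther and dropping is permitted."""
    return entry.may_drop_farther and ray_surface_depth > entry.depth
```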
As another example, a signal can be provided, such as from the visible surface determination module, which indicates when an updated depth is available for a given sample. Then, the ray processing control can identify which rays, if any, identified by the in-process ray data, can be culled based on that update.
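Responding to such a depth-update signal is cheap if the in-process ray data is indexed by sample, so that an update touches only that sample's rays. The sketch below assumes each ray records the depth of the surface from which it was emitted. It also assumes a strictly closer new depth means that surface has been occluded. `InProcessRays` and its methods are hypothetical names.

```python
from collections import defaultdict


class InProcessRays:
    """In-process ray data indexed by sample (illustrative)."""

    def __init__(self):
        self.by_sample = defaultdict(dict)  # sample -> {ray_id: emit_depth}
        self.culled = set()

    def add(self, ray_id, sample, emit_depth):
        self.by_sample[sample][ray_id] = emit_depth

    def on_depth_update(self, sample, new_depth):
        """Handle an 'updated depth available' signal for one sample.

        Returns the IDs of rays culled because their surface is now farther
        than the newly visible surface.
        """
        rays = self.by_sample.get(sample, {})
        stale = [rid for rid, d in rays.items() if d > new_depth]
        for rid in stale:
            del rays[rid]
            self.culled.add(rid)
        return stale
```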
Another figure depicts an example system that implements a deferred rendering implementation of aspects of the disclosure. A geometry modification unit communicates with a memory manager. The memory manager has access to a free block list, which defines a set of locations in a memory available to receive tiling data. Tiling data comprises data indicating which primitives overlap which portions of a 2-D screen space. This operation considers overlap of primitives transformed from a world space coordinate system into a screen space and depth representation. A macrotiling engine may be provided to perform an initial coarse mapping of primitives to a set of coarse-grid screen space subdivisions (for the sake of convenience, “macrotiles”). At this stage, only 2-D overlap may be determined, and visibility determinations are postponed. Object data also may be provided to a tiling engine. The tiling engine performs a mapping of primitives to a finer scale subdivision of the 2-D pixel grid (called “tiles” for convenience). For example, each macrotile may have 4, 8, 12, 16 or 32 tiles, and each tile may be a square or a rectangle of a pre-determined number of pixels. Some implementations may provide that tiles are always a fixed size. Macrotiles may be fixed-size or variable-sized. A macrotiling engine may not be included in all implementations, in that not all implementations may divide a 2-D set of pixels into macrotiles (i.e., some kind of hierarchical grouping of tiles).
The tiling engine may generate control data that is stored in a set of display lists. Each display list contains data from which the set of primitives that must be considered can be determined. These are the primitives needed to identify a visible surface for each sample (e.g., pixel) within the tile corresponding to that display list. In some cases, data can be shared among display lists. For example, each macrotile may have a display list, and hierarchical arrangements of data may be provided to indicate which primitives within the macrotile need to be considered for a given tile, and likewise for each pixel within a given tile. A further figure depicts an example of a tile-based deferred rendering system that may be implemented according to any of these examples.
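A simple way for a tiling engine to build per-tile display lists is to bin each primitive into every tile its screen-space bounding box overlaps. The sketch below uses that conservative bounding-box test, which is an assumption made for illustration; an implementation may test exact triangle/tile overlap instead. It uses a single level of fixed-size tiles, with no macrotiles. The function name and parameters are hypothetical.

```python
import math


def bin_primitives(triangles, width, height, tile_w, tile_h):
    """Build per-tile display lists by bounding-box binning (a sketch).

    triangles: list of three (x, y) screen-space vertices each.
    Returns {(tile_x, tile_y): [primitive indices]}.
    """
    tiles_x = (width + tile_w - 1) // tile_w
    tiles_y = (height + tile_h - 1) // tile_h
    display_lists = {}
    for prim, tri in enumerate(triangles):
        xs = [v[0] for v in tri]
        ys = [v[1] for v in tri]
        # Clamp the primitive's bounding box to the screen, in tile units.
        x0 = max(0, math.floor(min(xs)) // tile_w)
        x1 = min(tiles_x - 1, math.floor(max(xs)) // tile_w)
        y0 = max(0, math.floor(min(ys)) // tile_h)
        y1 = min(tiles_y - 1, math.floor(max(ys)) // tile_h)
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                display_lists.setdefault((tx, ty), []).append(prim)
    return display_lists
```

Because binning only records 2-D overlap, a primitive can appear in a tile's list even though it is hidden everywhere in that tile. Visibility is resolved later, per sample.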
A visible surface parameter generator operates to identify visible surfaces for samples and to produce parameters for those visible surfaces at each sample. In one example, the generator begins operation after a stream of geometry has been completely processed, such that all primitives have been mapped to appropriate display list(s). Some implementations may support multiple partial render passes, in which some of the geometry is processed in each pass. The generator may process these display lists on a macrotile-by-macrotile basis. When the memory blocks used for a display list being processed are no longer needed, they can be signaled as freed, for reuse by the memory manager. Where multiple partial rendering passes are implemented, secondary rays may be emitted during shading of a given surface that is a candidate visible surface. These rays would need to be traced within a complete scene database, such as the geometry database or the modified geometry described above.
The generator may comprise circuitry or a functional element for determining a visible surface for each sample; this element may access the display lists. The generator also may include circuitry or a functional element for determining attributes of the surface at each sample (attributes can include depth only, or depth and other values). In one example, these elements may be implemented so that per-sample attributes, including depth, are calculated, and then that depth may be used in a comparison with a current depth or depths being tracked for that sample. Interpolation is a term that most directly connotes planar primitives, such that a depth for a primitive at a particular sample can be derived from the depths of the vertices defining the planar primitive. However, the term interpolation as used here covers other approaches to deriving per-sample depth for other geometry definition approaches. Examples of other parameters that may be interpolated include normals and material properties. These parameters can be stored in memory or made to propagate through a pipeline.
During processing of opaque primitives, a primitive with a depth closer to a viewpoint becomes a new current candidate for the visible surface at that sample. Some implementations may segregate opaque from translucent primitives, such that translucent primitives are provided together in a separate pass from opaque primitives. In some cases, when a new surface is identified as being the current closest (and potentially visible) surface at a sample, the parameter values for the prior closest surface may be overwritten, or may be retained. They may be retained for a period of time, such as in a cache. Possible usages of this information are addressed below. These are examples of implementation details and examples of how systems according to the disclosure may behave.
In addition to storing 3-D object data, the free block list, display lists, and parameter data, such as depth, normals and material properties, the memory also may store modified geometry, ray definitions and an acceleration structure. These data may be used for processing rays. Rays to be processed may be set up by a ray setup module and traversed by a ray traversal module. Ray traversal here includes the operations that a particular implementation performs for rays being processed, for example to identify an intersection (such as the closest intersection), to determine that there is no intersection, or to return a default value absent an intersection with geometry.
By contrast with the immediate mode example system described above, this deferred system defers shading of surfaces until a final visible surface for a given sample has been identified. Then, a rasterization shader can execute a set of shading instructions for that visible surface. The rasterization shader may use a texture sampling and filtering unit in order to obtain texture data for texturing each sample. The rasterization shader may output rays that are to be traversed and potentially shaded. These rays can be set up by the ray setup module and traversed using a ray traversal unit. The ray traversal unit may operate on a subset of the rays that are currently defined and need to be processed in order to complete the rendering of the frame. When an intersection that requires shading has been identified for a ray, a ray shading module may execute a module of instructions to accomplish such shading. Implementations may implement the rasterization shader and the ray shader within the same set of programmable hardware units, using space and/or time division multiplexed computation, multi-threading, and other computation techniques. The term module, in the context of shading code, identifies a portion of shader code that is to be used, but does not imply a specific software organization.
Within the context of these example systems, a further figure depicts an example flow of actions that can be undertaken by a system according to these examples. Within a rasterization process, scan conversion is performed for an element of geometry (e.g., a primitive, such as a triangle). Here, scan conversion refers to identification of samples covered by a particular primitive and derivation of per-sample parameters (at least depth) for that particular primitive.
Samples where this element of geometry is currently visible are identified, and shading of the visible surface(s) for these samples is initiated. Shading of visible surfaces may also involve defining color values that will be contributed to a sample buffer, which can be combined with other prior values, or stored separately and used in a subsequent blending operation, for example. These operations can be performed by instructions executing on a programmable unit.
Such shading may result in ray(s) being defined for tracing. In another implementation, each pixel overlapped by the element of geometry may be shaded, even though not all overlapped pixels may have the element of geometry as a visible surface. These actions may be performed on a stream of geometry; an example of a stream of geometry is a sequence of geometry elements as defined by sets of definition data provided over a period of time. Elements of geometry also may be considered as groupings of individual primitives, where “primitive” is used to define an elemental representation, such as a point, line, or planar shape, such as a triangle. As such, using an example of triangular primitives does not exclude other ways to represent geometry. Formats for defining geometry elements may vary among implementations, and any suitable approach may be used here (e.g., triangle strips, fans, higher order surfaces, and so on). After identification is performed for a given element of geometry, rays defined during shading are provided for ray intake processing.
In one implementation, ray intake processing includes providing an identifier for the ray (a rayID), and storing definition data for the ray in association with the rayID, so that the rayID can be used to identify definition data for the ray. In addition, status data is associated with the ray. Such status data can take a variety of forms, depending on implementation. In an implementation of the disclosure, such status data provides a way to determine whether a given ray is associated with an element of geometry that no longer contributes to a rendering. Various examples of such status data, and of how that status data is used to make such a determination, are explained below.
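Ray intake along these lines might be sketched as follows. Each ray receives a rayID, its definition data is stored under that ID, and its status data is a tag. One of the claims above recites associating a ray with a hash of a combination of a sample ID and a geometry element ID, so this sketch uses Python's built-in `hash` on that pair as the tag. The choice of hash, and the names `RayDef` and `RayIntake`, are assumptions made for illustration.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class RayDef:
    origin: tuple
    direction: tuple


class RayIntake:
    """Assigns rayIDs, stores definition data, and records status data."""

    def __init__(self):
        self._ids = itertools.count()
        self.definitions = {}  # ray_id -> RayDef
        self.status = {}       # ray_id -> (sample_id, tag)

    @staticmethod
    def tag(sample_id, geometry_id):
        # Hash of the (sample ID, geometry ID) combination.
        return hash((sample_id, geometry_id))

    def intake(self, ray_def, sample_id, geometry_id):
        ray_id = next(self._ids)
        self.definitions[ray_id] = ray_def
        self.status[ray_id] = (sample_id, self.tag(sample_id, geometry_id))
        return ray_id

    def is_current(self, ray_id, visible):
        """visible: sample_id -> geometry ID currently visible there."""
        sample_id, tag = self.status[ray_id]
        return sample_id in visible and self.tag(sample_id, visible[sample_id]) == tag
```

A hash collision would make a stale ray look current. That only costs extra work during opportunistic culling, provided the final contribution check compares exact IDs.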
Another figure depicts an alternative approach to defining rays to be traced. When a surface for a given sample is to begin a ray tracing operation, a shader can be called with input parameters for that surface. For example, coordinates for a hit point can be supplied to a shader module associated with a surface that was found to be visible for the sample being processed. The shader then emits one or more rays to be processed. In such an example, rays are not defined by shader code associated directly with the surface being shaded as a result of rasterization. Rather, rasterization causes appropriate inputs to be provided to a ray shader module that emits rays using those inputs. Those rays can then be traced.
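As an illustration of a ray shader module driven by rasterization-supplied inputs, the hypothetical function below takes a hit point and surface normal. It emits one shadow ray per point light. The point-light model, the small origin offset along the normal (`eps`, used to avoid self-intersection), and the return format are all assumptions of this sketch.

```python
import math


def emit_shadow_rays(hit_point, normal, lights, eps=1e-4):
    """Hypothetical ray shader: emit one shadow ray per point light.

    hit_point, normal: surface inputs supplied as a result of rasterization.
    Returns a list of (origin, unit_direction, max_distance) tuples.
    """
    # Offset the origin slightly along the normal to avoid self-intersection.
    origin = tuple(p + eps * n for p, n in zip(hit_point, normal))
    rays = []
    for light in lights:
        d = [l - o for l, o in zip(light, origin)]
        dist = math.sqrt(sum(c * c for c in d))
        if dist == 0:
            continue
        direction = tuple(c / dist for c in d)
        # Skip lights behind the surface; they cannot illuminate it.
        if sum(a * b for a, b in zip(direction, normal)) <= 0:
            continue
        rays.append((origin, direction, dist))
    return rays
```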
Rays defined are then provided for processing. In one example, rays are selected for processing, and then status data for these selected rays is accessed. A determination is made whether the selected rays are still current. Any ray that is no longer current is culled (or flagged for culling). For any ray that is still current, processing can proceed for that ray. Such processing may include further traversal through an acceleration structure, testing the ray for intersection with a primitive, or shading of the ray for an identified intersection, for example. Such processing also may result in new rays being defined that require processing. Definition data for such rays can be returned to ray intake processing. A variety of approaches can be implemented for selecting rays for (further) processing. For example, rays can be selected according to a Single Instruction Multiple Data (SIMD) computation model. Such selection may involve tracking program counters for a pool of traversal or primitive intersection testing routines and selecting a subset of these routines that require the same processing for one or more cycles. For example, packets of rays that are to be tested for intersection with the same acceleration structure element or primitive can be formed and scheduled for execution. A variety of other computation models may be employed in different implementations. Therefore, the example sequence (selecting rays for processing, or for further processing, and then determining whether those selected rays are still current) is only an example. Such selection and determination of currency may be implemented differently in different implementations. For example, both of these actions may be performed concurrently, and the intersection of the subsets of rays remaining after each action can be taken.
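One scheduling step that combines the currency check with SIMD-style packet formation might look like the sketch below. Each pending ray is assumed to carry an identifier for the acceleration-structure element or primitive it must be tested against next. Rays sharing that target are grouped into fixed-size packets. The function name, `packet_size`, and the `is_current` callback are illustrative assumptions.

```python
from collections import defaultdict


def schedule(rays, is_current, packet_size=4):
    """One scheduling step (a sketch).

    rays: list of (ray_id, next_node) pairs awaiting processing, where
    next_node identifies the acceleration-structure element or primitive
    the ray must be tested against next.
    is_current: callable ray_id -> bool (the status-data check).
    Returns (packets, culled_ids); every ray in a packet shares one
    next_node, so the packet can be executed in SIMD fashion.
    """
    culled = []
    groups = defaultdict(list)
    for ray_id, node in rays:
        if is_current(ray_id):
            groups[node].append(ray_id)
        else:
            culled.append(ray_id)
    packets = []
    for node, ids in groups.items():
        for i in range(0, len(ids), packet_size):
            packets.append((node, ids[i:i + packet_size]))
    return packets, culled
```

Here the currency check runs before packet formation, so culled rays never occupy SIMD lanes. As the text notes, the two steps could also run concurrently.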
depicts a rendering pipeline in which visible surfaces for samples within a tile are determined for one or more sets of geometry before shading of those visible surfaces begins. A triangle (an example of a primitive, which in turn is an example of a surface) is submitted to a rasterization unit. This submission can occur a number of times for a given set of geometry. Within the rasterization unit, a variety of processes occur to produce a dataset for a tile of samples (e.g., pixels or parts of pixels). Such a dataset would include information about the visible surface at each sample; this information can include depth, an identifier for the surface, and interpolated parameters that were associated with the vertices defining the surface. In some implementations, an entire set of geometry may be submitted and processed in different subsets. In one implementation, each partial render remains within the rasterization unit until a final visible surface for each sample in the tile is found, which may involve multiple partial renders. In another example, described below, ray processing may begin after each partial render, such that later-submitted geometry may occlude a surface that was visible for a sample in an earlier partial render, thereby making rays emitted for shading that now-occluded surface not current. A variety of approaches can be implemented for geometry submission and tiling, and the depicted pipeline is simply a representation of such implementations. After data for a tile is produced, that data can be submitted to a ray frame shader, which can emit a set of rays for the samples within that tile. These rays can be traversed within a ray traversal portion and then shaded in a ray shading portion of the depicted pipeline. The ray traversal and ray shading portions can be implemented in a variety of ways. In some implementations, portions of ray traversal may be implemented in fixed-function or limited-configurability circuits, while ray shading may be implemented in programmable processor(s).
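A small model of the tile dataset, and of rays becoming non-current across partial renders, is sketched below. It assumes each sample stores only a depth and a surface identifier (interpolated parameters are omitted) and that rays are plain `(sample, surface)` pairs; `TileBuffer`, `ray_frame_shader` and `is_current` are illustrative names, not part of the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class TileBuffer:
    """Hypothetical per-tile visible-surface buffer: for each sample, the
    depth and identifier of the closest surface seen so far."""
    num_samples: int
    depth: list[float] = field(init=False)
    surface_id: list = field(init=False)

    def __post_init__(self) -> None:
        self.depth = [float("inf")] * self.num_samples
        self.surface_id = [None] * self.num_samples

    def submit(self, surface_id: int, coverage: dict[int, float]) -> None:
        """Rasterize one surface: coverage maps sample index -> depth.
        A 'less' depth test keeps the closest surface per sample."""
        for sample, z in coverage.items():
            if z < self.depth[sample]:
                self.depth[sample] = z
                self.surface_id[sample] = surface_id


def ray_frame_shader(tile: TileBuffer) -> list[tuple[int, int]]:
    """Emit one ray record (sample index, emitting surface id) per covered sample."""
    return [(s, sid) for s, sid in enumerate(tile.surface_id) if sid is not None]


def is_current(tile: TileBuffer, ray: tuple[int, int]) -> bool:
    """A ray is current while its emitting surface is still visible at its sample."""
    sample, sid = ray
    return tile.surface_id[sample] == sid
```

In the test, the ray for sample 1 goes stale when surface 11 arrives at a closer depth in a second submission, which is exactly the case where emitting rays after each partial render can waste work.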
depicts a processing flow in which a series of partial renders with subsets of geometry are performed. Partial renders may be performed when a given scene has an amount of geometry that is too large to be tiled within the memory available for such tiling. Memory availability can include considerations such as confining processing to on-chip storage, even while more memory is available off-chip. Also, although this flow depicts multiple partial renders, a person of ordinary skill would understand from the disclosure how a single render, in which all geometry is processed in one tiling operation, would be performed.
The flow begins with an initial 2-D (macro)tiling. This initial tiling involves producing current lists for each tile or for each macrotile. In this example, a sequence of partial renders occurs, as would happen for scenes of relatively high complexity and/or for implementations that have relatively restricted amounts of memory for storing the (macro)tile lists created for the geometry processed thus far. In other words, (macro)tiling performs an initial sorting of some portion of geometry into macrotiles or tiles. Some implementations may sort such geometry into a hierarchy. This step is therefore illustrative of a variety of implementations of initial (macro)tile binning of geometry, resulting in data (called a display list here) that is used for determining a visible surface. Such display lists can be formatted in a variety of ways in order to reduce the amount of space required to store them. The resulting data identifies one division between partial renders; where all geometry is processed in a single pass, it represents the result of binning all scene geometry. When a partial render begins, one or more of the current (macro)tile lists are processed in order to free memory for processing more geometry.
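Binning under a memory budget can be sketched as below, assuming axis-aligned pixel bounding boxes, square tiles, and a count of display-list entries (`max_entries`) as a proxy for available memory. For simplicity this sketch flushes every list at each partial render, whereas the text allows processing only some of them; all names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Iterable

BBox = tuple[int, int, int, int]  # inclusive pixel bounds x0, y0, x1, y1
Tile = tuple[int, int]


def tiles_overlapped(bbox: BBox, tile_size: int) -> list[Tile]:
    """All tiles touched by a bounding box, row by row."""
    x0, y0, x1, y1 = bbox
    return [(tx, ty)
            for ty in range(y0 // tile_size, y1 // tile_size + 1)
            for tx in range(x0 // tile_size, x1 // tile_size + 1)]


def bin_geometry(primitives: Iterable[tuple[int, BBox]], tile_size: int,
                 max_entries: int,
                 partial_render: Callable[[dict[Tile, list[int]]], None]) -> None:
    """Bin primitives into per-tile display lists by bounding box. When adding
    a primitive would exceed max_entries, the current lists are handed to
    partial_render and freed before binning continues."""
    lists: dict[Tile, list[int]] = defaultdict(list)
    entries = 0
    for prim_id, bbox in primitives:
        tiles = tiles_overlapped(bbox, tile_size)
        if entries and entries + len(tiles) > max_entries:
            partial_render(dict(lists))  # partial render frees list memory
            lists = defaultdict(list)
            entries = 0
        for tile in tiles:
            lists[tile].append(prim_id)
        entries += len(tiles)
    if entries:
        partial_render(dict(lists))  # render whatever geometry remains
```

With a budget large enough for the whole scene, `partial_render` is called once, which corresponds to the single-render case mentioned above.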
Visible surface determination then begins for the display list(s) selected for rendering, resulting in culling of non-visible surfaces, calculation of per-sample parameters, maintenance of per-sample parameters for the currently visible surface at each pixel, and release of the memory used for those display list(s). This sequence of actions may repeat a number of times. Although some implementations may delay pixel shading until geometry has been fully processed, so that a final visible surface for each sample can be determined, in this example pixel shading begins on candidate closest surfaces before the respective final visible surface for each sample is determined. Pixel shading produces rays that need to be processed. In some examples, separate shaders may be called for ray emission.
Additional rays may be emitted during such processing. As depicted, pixel shading and ray processing overlap in time. Culling of non-current rays and ray shading also overlap with these ongoing actions. Culling is explained in more detail below. Final blending may commence for some pixels after all rays for those pixels have completed processing. Thus, this flow shows how pixel shading, ray emission, and subsequent ray processing can be implemented for a series of partial renders. This approach may in some cases result in excess computation, since some rays may be partially processed before the surfaces from which they were emitted become occluded. However, the overall latency of producing a frame can be reduced where resources are available to conduct such ray processing concurrently with rasterization processes.
shows an example in which depth can be used to cull rays that are not current. In this example, scan conversion is performed for a set of geometry. Depth ranges for tiles are optionally produced from the depths computed during scan conversion. Shading of currently visible surfaces can be initiated, which produces rays to be traced. These rays are associated with a sample ID and with the depth of the currently visible surface for the identified sample. Rays are selected for (further) processing. The sample ID corresponding to each selected ray is identified, and the depth range for the tile(s) containing those sample IDs is accessed. If the depth of a ray is outside the current depth range of the tile, then that ray can be culled (or flagged for culling).
If a ray passes the range check, then the depth associated with the ray can be compared with the current depth of the sample associated with the ray, and if that comparison fails, the ray can be culled. Otherwise, processing can proceed for the ray. These depth comparisons can also incorporate a depth compare mode that is set up within the rendering pipeline for the subset of geometry being processed.
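The two-stage test can be sketched as follows, assuming a single tile, a "less" compare mode in which smaller depths are closer, and hypothetical function names. The coarse check uses one `(min, max)` pair per tile, and only rays that pass it are compared with their sample's current depth. The fine comparison is inclusive because, while a ray's surface remains visible, that surface is what sets the sample's depth.

```python
import operator

# Currency test per depth compare mode. With a "less" pipeline depth test, a
# ray's surface is still visible only if no closer depth has replaced it, so
# ray_depth <= current sample depth must hold; "greater" mirrors this
# (e.g. for a reversed-Z setup).
CURRENCY_TEST = {"less": operator.le, "greater": operator.ge}


def tile_depth_range(sample_depths: list[float]) -> tuple[float, float]:
    """Depth range of the currently visible surfaces in one tile."""
    return min(sample_depths), max(sample_depths)


def ray_is_current(ray_depth: float, sample: int, sample_depths: list[float],
                   tile_range: tuple[float, float], mode: str = "less") -> bool:
    lo, hi = tile_range
    # Coarse check: one range per tile, shared by every ray in that tile.
    if not lo <= ray_depth <= hi:
        return False
    # Fine check against the sample's current depth under the compare mode.
    return CURRENCY_TEST[mode](ray_depth, sample_depths[sample])
```

A ray rejected by the range check never needs the per-sample lookup, which is where the saving in memory accesses comes from.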
This approach to culling may be appropriate, for example, where a group of rays can be identified that all may contribute to samples within the same tile (or macrotile). A single depth range for that tile can then be compared with the depth of each ray. Such a technique may be most appropriate where depth varies smoothly among samples, or where only a few surfaces are visible within a tile. Depth ranges can be produced at different levels of granularity. The amount of computation needed to produce and maintain such depth ranges is traded off against the amount of ray processing that can be avoided by culling rays and against the number of memory accesses required to look up depth ranges. Some implementations are tile-based deferred renderers and may produce depth ranges as a byproduct of hidden surface removal operations; these depth ranges can be made available for use in ray culling. This disclosure presents examples of tracking rays in batches for culling opportunities based on depth comparisons. Such a depth-oriented technique may be combined with other conditions and techniques disclosed herein, as explained below.
shows another example process that can be implemented. In this process, rays are produced for tracing, such as according to the approaches described above. Each ray is assigned a per-sample generation ID, which is a sequential identifier specific to each sample. When a new set of primary rays that may contribute to a particular sample is emitted, the generation ID for that sample is incremented. The generation ID is not incremented for secondary rays emitted as a result of processing a given primary ray, since those secondary rays contribute through the ray path of their primary ray. Also, primary rays that were all emitted as a result of shading the same surface generally would be given the same generation ID (for simplicity). An interface between an executing shader and a ray intake processing function can be designed to pass data used to determine whether the generation ID should be incremented. One benefit of using a generation ID is that the number of bits required to maintain such an identifier would be expected to be smaller than the number needed to associate each ray with a per-primitive identifier that uniquely identifies a surface. Rays associated with a prior-generation ID can then be identified and culled (or flagged for culling). As with the prior examples, ray shading can itself generate rays to be traced; as explained, these also would be associated with a sample ID and with a generation ID derived from their parent ray. Rather than repeating this data, these secondary rays also can simply refer back to the identifiers of their respective parent ray. Usage of ray generation identifiers can be combined with the depth range checks disclosed above, by way of example.
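A minimal sketch of per-sample generation IDs follows, assuming one counter per sample held in a dictionary and that each call to `emit_primary` corresponds to shading one newly visible surface. `GenerationTable` and `TracedRay` are hypothetical names; the same dictionary serves as a sample-to-current-generation lookup for the currency check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TracedRay:
    ray_id: int
    sample_id: int
    generation: int


class GenerationTable:
    """Hypothetical per-sample generation counters (sample_id -> current generation)."""

    def __init__(self) -> None:
        self._current: dict[int, int] = {}
        self._next_ray_id = 0

    def _new_ray_id(self) -> int:
        rid = self._next_ray_id
        self._next_ray_id += 1
        return rid

    def emit_primary(self, sample_id: int, count: int) -> list[TracedRay]:
        """Shading a newly visible surface: increment the sample's generation
        once, and give every primary ray from that surface the new value."""
        gen = self._current.get(sample_id, -1) + 1
        self._current[sample_id] = gen
        return [TracedRay(self._new_ray_id(), sample_id, gen) for _ in range(count)]

    def emit_secondary(self, parent: TracedRay) -> TracedRay:
        """Secondary rays inherit the parent's sample and generation unchanged."""
        return TracedRay(self._new_ray_id(), parent.sample_id, parent.generation)

    def is_current(self, ray: TracedRay) -> bool:
        """A ray carrying an older generation than its sample's is not current."""
        return self._current.get(ray.sample_id) == ray.generation
```

Only a small integer per sample and per ray is stored, which reflects the bit-count advantage over carrying a full primitive identifier on every ray.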
depicts associating ray identifiers with ray definition data and with a sample identifier to which the ray may contribute. Such a sample identifier can be, for example, a location in a frame buffer where the ray may contribute, or an index of a pixel or of a fragment subdivision of a pixel. Each ray identifier also may be associated with the index value that the identified sample had when the original ray associated with the identified ray was emitted. Here, identified rays may be child rays of a given parent ray. That parent ray in turn would have been emitted from a surface that was originally relevant to the identified sample within the rendering being produced. Thus, an incrementing index value can be provided for each parent ray emitted for a given sample ID, and each child ray emitted from that parent ray can inherit this index. When a ray is emitted from a surface determined to be visible at a certain sample, the current index associated with that sample can be incremented. The index associated with a given ray can be compared with the current index of the sample to which that ray may contribute; if the indexes do not match, the ray can be considered not current.
In a multisampling situation, where multiple rays are emitted from the same surface, all such rays can share the same index value. Another approach to such indexing is to record the distance to the visible surface from which a given parent ray was emitted, with each child ray propagating or referencing that distance value from its parent ray. When the visible surface distance for a given sample changes, all rays associated with that sample that have a greater distance can be culled.
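The distance-based variant can be sketched as a filter applied when a sample's visible-surface distance changes. This assumes each ray carries the emitting-surface distance copied from its parent (`DistRay` and `on_visible_distance_change` are hypothetical names), and it ignores translucency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DistRay:
    ray_id: int
    sample_id: int
    surface_distance: float  # distance of the emitting surface, inherited from the parent ray


def on_visible_distance_change(rays: list[DistRay], sample_id: int,
                               new_distance: float):
    """Split rays into (kept, culled): a ray is culled if it belongs to
    sample_id and was emitted from a surface farther than the new visible one."""
    kept, culled = [], []
    for ray in rays:
        if ray.sample_id == sample_id and ray.surface_distance > new_distance:
            culled.append(ray)
        else:
            kept.append(ray)
    return kept, culled
```

Rays for other samples, and rays from surfaces at or in front of the new distance, pass through unchanged.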
These index values also can be encoded with, or otherwise associated with, data indicating whether the surface from which a parent ray was emitted is translucent or opaque, and culling decisions can use this translucency information. In particular, if a closer surface is translucent, then rays associated with a more distant surface may be maintained, as they may still contribute to the sample. This contrasts with a situation where only the closest surface is shaded for ray tracing. In that situation, a translucent surface is treated as opaque, transparency effects (refraction and transmission) are handled by emitting appropriate rays from that closest surface, and the more distant opaque surface is not shaded.
In some implementations, geometry may be ordered such that all opaque geometry is submitted for rasterization before translucent geometry. In such implementations, there may be an identifiable transition between when a rasterization element is rasterizing opaque geometry and when it is rasterizing translucent geometry; in some implementations, a flag may be associated with each geometry element that is transparent. If only the closest surface will be shaded (ray emission), then these flags may be ignored, and all geometry treated as though it were opaque for the purpose of hidden surface removal. If multiple surfaces are to be shaded (not just the closest surface), closer geometry for a given sample may no longer be useful in culling rays that remain to be completed, because the rays for a prior surface may need to be maintained anyway. As such, implementations of the disclosed culling can stop checking rays after all opaque primitives have been processed, which can be indicated by data within the geometry stream being rasterized. Where opaque and non-opaque geometry must be submitted separately (generally with opaque geometry first), this switch-off can be implemented. In other implementations, the separation of opaque and non-opaque geometry may be left up to the programmer, application or artist; in such circumstances, a flag or other condition can be set to indicate whether that convention has been observed in a particular case.
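The opaque/translucent rules above could be combined into one decision function such as the hypothetical `should_cull` below. It assumes smaller depths are closer, a per-sample record of the nearest visible depth and whether that surface is translucent, and a boolean signalling that all opaque geometry has been rasterized. It is a sketch of the stated conditions, not a complete culling policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleState:
    visible_depth: float          # depth of the closest surface seen so far
    closest_is_translucent: bool  # translucency flag of that surface


def should_cull(ray_surface_depth: float, state: SampleState,
                opaque_phase_done: bool, shade_closest_only: bool) -> bool:
    """Decide whether a ray emitted from a surface at ray_surface_depth can be culled."""
    if shade_closest_only:
        # Translucency flags are ignored: everything is treated as opaque, so
        # any ray from a surface behind the current closest one is stale.
        return ray_surface_depth > state.visible_depth
    if opaque_phase_done:
        # Multiple surfaces are shaded and only translucent geometry remains:
        # stop checking, since rays from farther surfaces are kept anyway.
        return False
    if ray_surface_depth <= state.visible_depth:
        return False  # the emitting surface is still the closest one
    # A closer surface exists: keep the ray if that surface is translucent.
    return not state.closest_is_translucent
```

The `opaque_phase_done` input would come from the marker in the geometry stream, and should only be trusted where the opaque-first convention is known to hold.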
Implementations of the disclosure also may provide a decision criterion or flag that determines whether or not to perform culling in a particular circumstance. In particular, scenes with denser geometry typically have more occluded surfaces, so more benefit would be realized from implementations of the disclosure; where geometry is less dense, less benefit would be realized. However, the amount of computation performed in implementing the disclosure (e.g., the number of comparisons) also increases with geometry density, so the computational cost of implementing the disclosure would generally be lower for a scene with less dense geometry. As such, a person of ordinary skill may determine whether or not to give an application or programmer the capability to turn ray culling according to the disclosure on or off for specific renderings.
depicts an example of data associating ray IDs with ray definition data, a sample ID, and a ray/surface index indicating the surface that was visible when the identified ray was emitted. This provides an example of data that associates ray identifiers with a particular sample and an index. Another example associates rays with tile and sample identifiers (which can be a hierarchical identifier, for example), the depth of the visible surface for the sample when the ray was emitted, and identifiers of the respective primitives that provided each visible surface. Instead of providing these identifiers directly, an identifier of a parent ray may be provided, which indicates where such data can be obtained for each ray. In a further example, ray identifiers can be associated with hashes of a combination of a primitive ID and a sample ID. Such a hash can be designed to make a collision between two different combinations of inputs highly unlikely. Such hashing may consume more processor resources, but it would reduce the amount of memory required compared with explicitly representing primitive and sample IDs for rays, which also may reduce total memory bandwidth.
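A sketch of the hashed association follows, assuming sample and primitive IDs each fit in 32 bits. The mixing step is the SplitMix64 finalizer, swapped in here as one well-known 64-bit mixer; the disclosure does not specify a hash function. Keeping 32 bits of the mix stores half as many bits as the two explicit IDs, at the price of a small collision probability, under which a stale ray would be judged current.

```python
MASK64 = (1 << 64) - 1


def mix64(z: int) -> int:
    """SplitMix64 finalizer: a bijective mix of a 64-bit integer."""
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return z ^ (z >> 31)


def surface_hash(sample_id: int, prim_id: int, bits: int = 32) -> int:
    """Hash the (sample, primitive) pair into `bits` bits (keeps the high bits)."""
    if not (0 <= sample_id < 1 << 32 and 0 <= prim_id < 1 << 32):
        raise ValueError("IDs must fit in 32 bits")
    return mix64((sample_id << 32) | prim_id) >> (64 - bits)


def hash_is_current(ray_hash: int, sample_id: int, visible_prim: int,
                    bits: int = 32) -> bool:
    """A ray is current if its stored hash matches the hash of the primitive
    now visible at its sample. A collision makes a stale ray look current."""
    return ray_hash == surface_hash(sample_id, visible_prim, bits)
```

With `bits=64` the mapping is a bijection and cannot collide; smaller widths trade memory for collision risk.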
The above examples were of relational data that could be queried according to ray identifiers (of course, in appropriately designed systems, other values shown in these tables could serve as keys upon which searches can be made). Other implementations may provide different organizations for the data used to identify non-current rays.
shows an example where sample identifiers are associated with a current ray generation identifier. When any ray is to be processed (or, optionally, further processed), the sample identifier associated with that ray can be used to query such a table, and the returned value can be compared with the value associated with the ray.
October 2, 2025