10810784

Techniques for Preloading Textures in Rendering Graphics

Published October 20, 2020
Assignee: not available in the USPTO data

Patent Claims
23 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 1

Original Legal Text

1. A method of displaying a scene, comprising: while performing texture mapping of the scene using a first block of a stored texture, generating a prefetch request, by a processor, to retrieve all or part of a second block of the stored texture from a first level memory of a memory hierarchy; retrieving said all or part of the second block from the first level memory in response to the prefetch request; storing the retrieved all or part of the second block in an area of a second level of memory of the memory hierarchy, wherein said area of the second level of memory is accessible by the processor and another processor, and wherein the second level of memory is an intermediate level of memory between the first level of memory and a third level of memory in the memory hierarchy; performing texture mapping of the scene using the retrieved all or part of the second block; and rendering the scene to a display device.

Plain English Translation

This invention relates to optimizing texture mapping in graphics rendering by prefetching texture data to reduce latency and improve performance. The method addresses the delays caused by fetching texture blocks from memory during real-time rendering, which can lead to stuttering or reduced frame rates. The solution uses a multi-level memory hierarchy in which texture data is prefetched from a first level memory (the backing store; in the system of claim 15 this role is played by off-chip memory) into a second level memory (e.g., a shared L2 cache) before it is needed. The second level sits between the first level and a third level (e.g., a processor's private L1 cache). While texture mapping a scene using a first texture block, a processor generates a prefetch request to retrieve all or part of a second texture block from the first level memory. The retrieved data is stored in an area of the second level memory that is accessible both by that processor and by another processor. Later texture mapping of the scene uses the prefetched data from the second level memory instead of waiting on the first level memory, and the scene is then rendered to a display device.
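The claim 1 flow can be illustrated with a toy software model. This is a minimal sketch, not the patented hardware: it assumes the three levels are plain Python dictionaries (`first_level` for the backing memory, `shared_second_level` for the shared L2, and a per-processor `third_level` for each L1), and every class and method name is hypothetical. Real hardware would issue the prefetch asynchronously rather than inline.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryHierarchy:
    # First level memory (the backing store): block id -> list of texel values.
    first_level: dict
    # Second level memory shared by every processor (e.g., an L2 cache).
    shared_second_level: dict = field(default_factory=dict)


@dataclass
class Processor:
    name: str
    memory: MemoryHierarchy
    # Third level memory private to this processor (e.g., its L1 cache).
    third_level: dict = field(default_factory=dict)

    def fetch(self, block_id):
        """Demand fetch: look in L1, then the shared L2, then first level memory."""
        if block_id not in self.third_level:
            l2 = self.memory.shared_second_level
            if block_id not in l2:
                l2[block_id] = self.memory.first_level[block_id]
            self.third_level[block_id] = l2[block_id]
        return self.third_level[block_id]

    def prefetch(self, block_id):
        """Prefetch: copy a block from first level memory into the shared L2 only."""
        if block_id in self.memory.first_level:
            self.memory.shared_second_level[block_id] = self.memory.first_level[block_id]

    def texture_map(self, block_id, next_block_id):
        """Texture-map with one block while prefetching the next one."""
        texels = self.fetch(block_id)
        self.prefetch(next_block_id)
        # Stand-in for real filtering and shading: just sum the texel values.
        return sum(texels)
```

Because the prefetch writes only to the shared level, a second `Processor` built on the same `MemoryHierarchy` can later fetch the prefetched block from L2 without touching first level memory.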

Claim 2

Original Legal Text

2. The method according to claim 1 , wherein the texture mapping of the scene using the first block is performed by the processor and the texture mapping of the scene using the second block is performed by said another processor.

Plain English Translation

This dependent claim narrows claim 1 by dividing the work between the two processors that share the second level memory. The processor that issues the prefetch request performs texture mapping with the first block of the stored texture, while the other processor performs texture mapping with the prefetched second block. Because the second block was placed in the shared area of the second level memory, the other processor can read it from there rather than fetching it from the first level memory itself. In other words, one processor's prefetch hides memory latency for a different processor, which can help in parallel designs where work that samples adjacent texture blocks is scheduled on different processors.

Claim 3

Original Legal Text

3. The method according to claim 1 , wherein an address for the all or part of the second block is dynamically calculated by the processor based on a block size value determined from header information of the texture, and wherein said block size value is different from another block size value indicated for another texture.

Plain English Translation

This dependent claim addresses textures that are divided into blocks of different sizes. Instead of assuming a fixed block size, the processor dynamically calculates the address of all or part of the second block from a block size value read from the texture's header information. Because the block size value is specified per texture, one texture can use a different block size than another, and the same address calculation works for both. This lets the prefetch logic handle several textures with non-uniform block sizes without hard-coding a single block geometry.
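A per-texture address calculation of this kind can be sketched as follows. The claim does not define the header format, so this sketch assumes a hypothetical little-endian encoding (a 64-bit base address followed by a 32-bit block size in bytes) and assumes blocks are laid out back to back starting at the base address; `TextureHeader` and `block_address` are illustrative names only.

```python
import struct
from dataclasses import dataclass

# Hypothetical header encoding: little-endian u64 base address, u32 block size in bytes.
_HEADER_FORMAT = "<QI"


@dataclass(frozen=True)
class TextureHeader:
    base_address: int      # byte address of block 0 of this texture
    block_size_bytes: int  # per-texture block size value

    @classmethod
    def from_bytes(cls, raw: bytes) -> "TextureHeader":
        """Read the block size value (and base address) from raw header bytes."""
        base, block_size = struct.unpack_from(_HEADER_FORMAT, raw)
        return cls(base, block_size)


def block_address(header: TextureHeader, block_index: int, offset: int = 0) -> int:
    """Address of all (offset 0) or part (offset > 0) of a block of this texture."""
    if not 0 <= offset < header.block_size_bytes:
        raise ValueError("offset must fall inside the block")
    return header.base_address + block_index * header.block_size_bytes + offset
```

Two textures with different block sizes, for example 512 bytes and 2048 bytes, go through the same `block_address` call; only the header value differs.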

Claim 4

Original Legal Text

4. The method according to claim 3 , wherein the address for the all or part of the second block is determined based further on an address of the first block, and wherein the prefetch request includes the determined address of the all or part of second block and/or the block size value determined from the header information of the texture.

Plain English Translation

This dependent claim builds on claim 3 by specifying how the second block's address is derived and what the prefetch request carries. The address of all or part of the second block is computed from the block size value in the texture's header information (not in the first block's own data) together with the address of the first block, the block currently being used for texture mapping. The resulting prefetch request includes the computed address of the second block, the block size value, or both. Deriving the next address from the current one exploits spatial locality in texture access: blocks adjacent to the one being sampled are likely to be needed soon, so fetching them ahead of time can reduce stalls in the rendering pipeline.
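The address derivation of claim 4 reduces to simple arithmetic once a layout is assumed. This minimal sketch assumes the second block lies `stride_blocks` whole blocks after the first in linear order (the claim does not say which neighbor is chosen), and it always carries the address while making the block size optional, which is one of the combinations the claim's "and/or" allows. `PrefetchRequest` and `make_prefetch_request` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PrefetchRequest:
    address: int                      # computed address of the second block
    block_size: Optional[int] = None  # block size value from the texture header, if included


def make_prefetch_request(first_block_address: int, block_size: int,
                          stride_blocks: int = 1,
                          include_size: bool = True) -> PrefetchRequest:
    """Derive the second block's address from the first block's address and block size."""
    # Assumption: blocks are packed back to back, so a neighbor is a whole
    # number of block sizes away (negative strides point backwards).
    second_block_address = first_block_address + stride_blocks * block_size
    return PrefetchRequest(
        address=second_block_address,
        block_size=block_size if include_size else None,
    )
```

For a first block at `0x4000` with 1024-byte blocks, the default request targets `0x4400`.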

Claim 5

Original Legal Text

5. The method according to claim 4 , wherein the first block and the second block each includes an integer number of texels stored in the first level of memory in a block linear layout in a same texture map.

Plain English Translation

This dependent claim specifies the data layout. The first and second blocks each contain an integer number of texels (texture elements), belong to the same texture map, and are stored in the first level memory in a block linear layout. In a block linear layout the texture is tiled into fixed-size blocks, and the texels of each block typically occupy a contiguous range of memory, so a whole block can be fetched as one unit and the address of a neighboring block follows directly from the block size. Because every block holds a whole number of texels, no texel straddles two blocks, which keeps the block-to-block address arithmetic of claims 3 and 4 simple.
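The following sketch shows how a texel coordinate maps to a byte offset in a simplified block linear layout. It is an illustrative assumption, not any vendor's actual format: blocks of `block_w` by `block_h` texels are stored in row-major block order, and texels inside a block are stored row-major. The function name is hypothetical.

```python
def block_linear_offset(x: int, y: int, width: int,
                        block_w: int, block_h: int, bytes_per_texel: int) -> int:
    """Byte offset of texel (x, y) in a simplified block linear layout."""
    blocks_per_row = -(-width // block_w)       # ceiling division
    block_x, in_x = divmod(x, block_w)          # which block column, and column inside it
    block_y, in_y = divmod(y, block_h)          # which block row, and row inside it
    block_index = block_y * blocks_per_row + block_x
    texels_per_block = block_w * block_h        # an integer number of texels per block
    texel_in_block = in_y * block_w + in_x
    return (block_index * texels_per_block + texel_in_block) * bytes_per_texel
```

With 4x4 blocks of 4-byte texels, all 16 texels of block 0 land in bytes 0 to 63, so fetching one block means reading one contiguous 64-byte range.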

Claim 6

Original Legal Text

6. The method according to claim 1 , wherein the method further includes counting a number of outstanding prefetch requests, and if the number of outstanding prefetch requests is less than a threshold, transmitting the prefetch request to the first level memory, otherwise dropping the prefetch request without transmitting it to the first level memory.

Plain English Translation

This dependent claim adds throttling of prefetch requests. The system counts the number of outstanding prefetch requests, meaning requests that have been issued but whose data has not yet returned. If that count is below a threshold, the new prefetch request is transmitted to the first level memory (the backing store). Otherwise, that is, when the count has reached or exceeded the threshold, the request is dropped without being sent. Dropping a prefetch is safe because a prefetch is only a performance hint: if the block turns out to be needed, it is still retrieved by an ordinary demand fetch. Capping outstanding prefetches keeps speculative traffic from crowding out demand fetches for memory bandwidth.
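The count-and-compare rule of claim 6 can be sketched as a small counter. This is a minimal model under stated assumptions: `send` is a hypothetical callback standing in for the path to the first level memory, and `complete()` is assumed to be called whenever prefetched data returns. The class name is illustrative.

```python
from typing import Any, Callable


class PrefetchThrottle:
    """Forward prefetches only while fewer than `threshold` are outstanding."""

    def __init__(self, threshold: int, send: Callable[[Any], None]):
        self.threshold = threshold
        self.send = send          # hypothetical path to the first level memory
        self.outstanding = 0      # issued but not yet returned
        self.dropped = 0

    def submit(self, request: Any) -> bool:
        """Transmit the request if under the threshold; otherwise drop it."""
        if self.outstanding < self.threshold:
            self.outstanding += 1
            self.send(request)
            return True
        self.dropped += 1         # dropped without being transmitted
        return False

    def complete(self) -> None:
        """Record that one outstanding prefetch has returned its data."""
        if self.outstanding == 0:
            raise RuntimeError("no outstanding prefetch to complete")
        self.outstanding -= 1
```

With a threshold of 2, a third request issued before either of the first two completes is dropped, and a slot reopens once one of them returns.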

Claim 7

Original Legal Text

7. The method according to claim 6 , wherein the counting, the transmitting and the dropping being performed in circuitry associated with the second level memory.

Plain English Translation

This dependent claim specifies where the throttling of claim 6 takes place: the counting of outstanding prefetch requests, the transmitting of requests while the count is below the threshold, and the dropping of requests otherwise are all performed by circuitry associated with the second level memory. Since the second level memory is shared by multiple processors, placing the counter there lets it see prefetches from all of those processors, so the threshold can limit the combined prefetch traffic sent toward the first level memory rather than each processor's traffic separately.

Claim 8

Original Legal Text

8. The method according to claim 1 , wherein the method further includes determining whether the requested all or part of the second block is present in the second level in the memory hierarchy, and, if determined to be present, dropping the prefetch request without transmitting the prefetch request to the first memory.

Plain English Translation

This dependent claim adds deduplication of prefetch requests. Before a prefetch request for all or part of the second block is passed on, the system checks whether the requested data is already present in the second level of the memory hierarchy (e.g., the shared L2 cache). If it is, the prefetch request is dropped without being transmitted to the first level memory (e.g., off-chip memory). This avoids redundant transfers, for example when another processor that shares the second level memory has already fetched or prefetched the same block, and so saves both memory bandwidth and the energy of an unnecessary access.
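The presence check of claim 8 can be sketched as a filter in front of the first level memory. Because the claim covers "all or part" of a block, this sketch assumes the second level tracks residency per block in hypothetical sub-block sectors, and drops a request only when every requested sector is already resident. The sector granularity, the `l2_resident` map, and the `send` callback are all assumptions for illustration.

```python
from typing import Callable, Dict, FrozenSet, Set


def filter_prefetch(block_id: int,
                    sectors: FrozenSet[int],
                    l2_resident: Dict[int, Set[int]],
                    send: Callable[[int, FrozenSet[int]], None]) -> bool:
    """Return True if the prefetch was forwarded, False if it was dropped."""
    resident = l2_resident.get(block_id, set())
    if sectors <= resident:
        # Everything requested is already in the second level: drop the request
        # without transmitting it to the first level memory.
        return False
    send(block_id, sectors)
    return True
```

A request for sectors that are already cached is dropped; a request that needs even one missing sector is forwarded unchanged.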

Claim 9

Original Legal Text

9. The method according to claim 8 , wherein the dropping is performed in circuitry associated with the second level memory.

Plain English Translation

This dependent claim specifies that the dropping step of claim 8, discarding a prefetch request whose data is already present in the second level memory, is performed by circuitry associated with the second level memory. That circuitry has direct access to the second level memory's own record of which blocks it holds, so the presence check does not require a round trip back to the requesting processor. Nothing is moved or evicted by this step; only the redundant prefetch request is discarded.

Claim 10

Original Legal Text

10. The method according to claim 8 , wherein the dropping being performed in circuitry associated with the processor.

Plain English Translation

This dependent claim is an alternative to claim 9: the dropping step of claim 8 is performed by circuitry associated with the processor that generated the prefetch request, rather than by circuitry at the second level memory. Dropping a redundant request at its source means it never travels to the second level memory at all, which saves interconnect traffic, at the cost of the processor needing some way to know whether the requested block is already present in the second level.

Claim 11

Original Legal Text

11. The method according to claim 1 , wherein the processor and said another processor access respectively different areas in the third level in respective L1 caches.

Plain English Translation

This dependent claim concerns the third level of the memory hierarchy. The processor and the other processor each have their own L1 cache, and these L1 caches make up the third level, so the two processors access different areas of the third level. In other words, each processor's L1 cache is private, while the second level memory into which the second block was prefetched is shared. This matches the arrangement in claim 15, where the second level cache sits between the off-chip memory and the first level (L1) caches.

Claim 12

Original Legal Text

12. The method according to claim 1 , further comprising detecting that the texture mapping is part of a fullscreen draw, and performing said generating the prefetch request in response to the detecting.

Plain English Translation

This dependent claim limits when prefetching happens. The system detects that the texture mapping is part of a fullscreen draw, that is, a draw covering the entire screen, and generates the prefetch request in response to that detection. In a fullscreen draw every pixel is shaded, so the texture is likely to be read extensively; restricting prefetching to such draws targets cases where prefetched blocks are likely to be used and avoids spending memory bandwidth and power on texture data that may never be sampled. The claim does not specify how a fullscreen draw is detected; one possibility is comparing the draw's screen-space extent with the dimensions of the render target or viewport.
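Since the claim leaves the detection method open, the following is only one hypothetical heuristic: treat a draw as fullscreen when its screen-space bounding rectangle covers the whole viewport, and gate prefetch generation on that test. `Rect`, `is_fullscreen`, `maybe_prefetch`, and the `issue_prefetch` callback are illustrative names, not part of the patent.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Rect:
    # Half-open pixel rectangle [x0, x1) x [y0, y1).
    x0: int
    y0: int
    x1: int
    y1: int


def is_fullscreen(draw_bounds: Rect, viewport: Rect) -> bool:
    """Hypothetical heuristic: the draw's bounds cover the entire viewport."""
    return (draw_bounds.x0 <= viewport.x0 and draw_bounds.y0 <= viewport.y0
            and draw_bounds.x1 >= viewport.x1 and draw_bounds.y1 >= viewport.y1)


def maybe_prefetch(draw_bounds: Rect, viewport: Rect, next_block: Any,
                   issue_prefetch: Callable[[Any], None]) -> bool:
    """Generate a prefetch request only in response to detecting a fullscreen draw."""
    if is_fullscreen(draw_bounds, viewport):
        issue_prefetch(next_block)
        return True
    return False
```

A draw whose bounds extend past the viewport edges, as a fullscreen triangle typically does, still counts as fullscreen under this test.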

Claim 13

Original Legal Text

13. The method according to claim 1 , wherein said storing comprises storing the retrieved all or part of the second block is stored in an area of the second level of memory without being stored in the third level of memory.

Plain English Translation

This dependent claim specifies where the prefetched data is placed. The retrieved all or part of the second block is stored in the second level memory (e.g., the shared L2 cache) but is not also stored in the third level memory (e.g., a processor's private L1 cache). Filling only the shared level keeps speculative data out of the small private L1 caches, where it could evict data a processor is actively using, and leaves the data in the one place that every processor sharing the second level can reach.

Claim 14

Original Legal Text

14. The method according to claim 13 , wherein the retrieved all or part of the second block that is stored in the second level of memory is, in response to a subsequent fetch request from the processor or the another processor, subsequently stored in the third level of memory.

Plain English Translation

This dependent claim builds on claim 13. The prefetched data, held in the second level memory but not in the third level, is copied into the third level memory only when a later fetch request from the processor or from the other processor asks for it. The prefetch therefore stages the data one level away from the processors, and the L1 cache of whichever processor actually needs the block is filled on demand from the second level, rather than from the slower first level memory.

Claim 15

Original Legal Text

15. A parallel processing system for displaying a scene, comprising: a plurality of processors; a cache hierarchy including at least a first level cache memory and a second level cache memory; a display interface; and a memory interface configured to provide the plurality of processors access to an off-chip memory, wherein the plurality of processors and control circuitry associated with the cache hierarchy are configured to: while performing texture mapping of the scene using a first block of a stored texture, generating a prefetch request, by a first processor from the plurality of processors, to retrieve all or part of a second block of the stored texture from a memory hierarchy which includes the cache hierarchy; retrieving the requested all or part of the second block from the off-chip memory over the memory interface in response to the prefetch request; storing the retrieved all or part of the second block in an area of the second level cache memory, wherein said area of the second level cache memory is accessible by the first processor and a second processor from the plurality of processors, and wherein the second level cache memory is an intermediate level of memory between the off-chip memory and the first level cache memory; performing texture mapping of the scene using the retrieved all or part of the second block; and rendering the scene to a display device over the display interface.

Plain English Translation

This invention relates to parallel processing systems for displaying scenes, particularly focusing on optimizing texture mapping performance. The system addresses the problem of latency in accessing texture data from off-chip memory during rendering, which can degrade performance in graphics processing. The system includes multiple processors, a cache hierarchy with at least first and second-level caches, a display interface, and a memory interface for accessing off-chip memory. During texture mapping, a first processor generates a prefetch request to retrieve a second block of texture data while using a first block. The requested texture data is fetched from off-chip memory and stored in a shared area of the second-level cache, accessible by both the first and a second processor. This allows the second processor to use the prefetched texture data for subsequent texture mapping operations. The system then renders the scene to a display device. By prefetching texture data into a shared cache level, the system reduces memory access latency and improves parallel processing efficiency in graphics rendering.

Claim 16

Original Legal Text

16. The parallel processing system according to claim 15 , wherein the texture mapping of the scene using the first block is performed by the first processor and the texture mapping of the scene using the second block is performed by said second processor.

Plain English Translation

This dependent claim applies the division of work from claim 2 to the system of claim 15. The first processor, which generates the prefetch request, performs texture mapping with the first block of the stored texture, and the second processor performs texture mapping with the second block. Because the second block was prefetched into the area of the second level cache shared by both processors, the second processor can read it from the second level cache instead of waiting on off-chip memory, so one processor's prefetch hides memory latency for the other.

Claim 17

Original Legal Text

17. The parallel processing system according to claim 15 , wherein an address for said all or part of the second block is dynamically calculated by the first processor based on a block size value determined from header information of the texture, and wherein said block size value is different from another block size value indicated for another texture.

Plain English Translation

This dependent claim applies the addressing scheme of claim 3 to the system of claim 15. The first processor dynamically calculates the address of all or part of the second block from a block size value read from the texture's header information. Because the block size value is specified per texture, it can differ from the block size value of another texture, and the same address calculation serves textures with different block sizes and formats without relying on a fixed, hard-coded block geometry.

Claim 18

Original Legal Text

18. The parallel processing system according to claim 17 , wherein the address for said all or part of the second block is determined based further on an address of the first block, and wherein the prefetch request includes the determined address of the second block and/or the block size value determined from the header information of the texture.

Plain English Translation

This dependent claim applies claim 4 to the system of claim 17. The address of all or part of the second block is computed from the address of the first block together with the block size value taken from the texture's header information. The prefetch request includes the computed address of the second block, the block size value, or both. Computing the next block's address from the current one lets the system fetch a neighboring block before it is explicitly requested, which can reduce texture access latency when sampling moves from one block into an adjacent one.

Claim 19

Original Legal Text

19. The parallel processing system according to claim 18 , wherein the first block and the second block each includes an integer number of texels stored in the off-chip memory in a block linear layout in a same texture map.

Plain English Translation

This dependent claim applies the data layout of claim 5 to the system of claim 18. The first and second blocks each contain an integer number of texels, belong to the same texture map, and are stored in the off-chip memory in a block linear layout. Because the texels of a block typically occupy a contiguous range of memory in such a layout, a block can be fetched as a unit, and the address of a neighboring block follows from the first block's address and the block size value used in claims 17 and 18.

Claim 20

Original Legal Text

20. The parallel processing system according to claim 15, further comprising request throttle circuitry connected to the second level cache memory, wherein the throttle circuitry is configured to count a number of outstanding prefetch requests, and if the number of outstanding prefetch requests is less than a threshold transmitting the prefetch request to the off-chip memory via the memory interface, otherwise dropping the prefetch request without transmitting it to the off-chip memory.

Plain English Translation

A parallel processing system includes a second-level cache memory and a memory interface for accessing off-chip memory. The system addresses inefficiencies in data prefetching, where excessive prefetch requests can overwhelm memory bandwidth and degrade performance. To mitigate this, the system incorporates request throttle circuitry connected to the second-level cache. This circuitry counts the number of outstanding prefetch requests, that is, requests that have been issued but not yet completed. If the count is less than a threshold, the prefetch request is transmitted to the off-chip memory via the memory interface; if the count has reached or exceeded the threshold, the prefetch request is dropped without being transmitted. Dropping is acceptable because a prefetch is only a performance hint: if the block is later needed, it can still be fetched on demand. This throttling keeps prefetch traffic from competing excessively with demand requests for memory bandwidth. The claim does not specify how the threshold is chosen; it could be fixed or tuned to system requirements or workload characteristics.
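The throttling rule can be modeled in software as a counter with a strict "less than threshold" test. This is a sketch of the described behavior, not the circuit itself; the `PrefetchThrottle` class, its method names, and the `send` callback standing in for the memory interface are hypothetical.

```python
from typing import Callable, List


class PrefetchThrottle:
    """Counts outstanding prefetches and drops new ones at a threshold.

    A software model of the behavior described for claim 20; the class and
    method names are hypothetical.
    """

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.outstanding = 0  # issued to off-chip memory but not yet returned
        self.dropped = 0

    def try_issue(self, address: int, send: Callable[[int], None]) -> bool:
        if self.outstanding < self.threshold:
            self.outstanding += 1
            send(address)      # forward to off-chip memory via the memory interface
            return True
        self.dropped += 1      # at or above threshold: drop, never transmit
        return False

    def complete(self) -> None:
        """Call when a prefetched block has returned from off-chip memory."""
        if self.outstanding == 0:
            raise RuntimeError("no outstanding prefetch to complete")
        self.outstanding -= 1


sent: List[int] = []
throttle = PrefetchThrottle(threshold=2)
print([throttle.try_issue(a, sent.append) for a in (0x100, 0x200, 0x300)])  # [True, True, False]
throttle.complete()                                  # one prefetch returns
print(throttle.try_issue(0x400, sent.append), sent)  # True [256, 512, 1024]
```

The third request is dropped rather than queued, so the request at 0x300 is never sent; once a prefetch completes, capacity frees up for the next one.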

Claim 21

Original Legal Text

21. The parallel processing system according to claim 15, further comprising request deduplication circuitry, wherein the deduplication circuitry is configured to determine whether the second block is present in the second level cache memory, and, if determined to be present, dropping the prefetch request without transmitting the prefetch request to the off-chip memory.

Plain English Translation

A parallel processing system includes a multi-level cache hierarchy with at least a first-level cache and a second-level cache, where the second-level cache is shared among multiple processing elements. The system prefetches texture blocks from off-chip memory into the second-level cache to reduce latency. To avoid wasted work, it includes request deduplication circuitry that checks whether the block targeted by a prefetch request (the second block) is already present in the second-level cache. If the block is found there, the prefetch request is dropped without being transmitted to off-chip memory. This avoids redundant memory accesses for data that is already cached, saving off-chip bandwidth and the associated power. The mechanism is useful in graphics and other high-throughput workloads where many processing elements may request the same texture data.
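A minimal model of the deduplication check is shown below. The toy `SecondLevelCache` class, the `dedup_prefetch` function, and the `send_offchip` callback are hypothetical stand-ins; real hardware would perform a tag lookup rather than a dictionary lookup.

```python
from typing import Callable, Dict, List


class SecondLevelCache:
    """Toy shared L2: maps block address to block data (hypothetical model)."""

    def __init__(self) -> None:
        self.lines: Dict[int, bytes] = {}

    def contains(self, address: int) -> bool:
        return address in self.lines

    def fill(self, address: int, data: bytes) -> None:
        self.lines[address] = data


def dedup_prefetch(address: int, l2: SecondLevelCache,
                   send_offchip: Callable[[int], None]) -> bool:
    """Drop the prefetch if the block is already in L2; otherwise send it."""
    if l2.contains(address):
        return False          # already cached: no off-chip request
    send_offchip(address)
    return True


l2 = SecondLevelCache()
l2.fill(0x100, b"\x00" * 64)
offchip: List[int] = []
print(dedup_prefetch(0x100, l2, offchip.append))  # False (hit, dropped)
print(dedup_prefetch(0x200, l2, offchip.append))  # True  (miss, sent)
print(offchip)                                    # [512]
```

Like the claim, this check looks only at blocks already resident in the second-level cache; a second prefetch for a block whose first prefetch is still in flight would pass it.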

Claim 22

Original Legal Text

22. The parallel processing system according to claim 21, wherein the request deduplication circuitry is connected to the first processor.

Plain English Translation

This dependent claim narrows claim 21 by specifying where the request deduplication circuitry is located: it is connected to the first processor. As in claim 21, the deduplication circuitry determines whether the second texture block named in a prefetch request is already present in the second-level cache and, if so, drops the prefetch request without sending it to off-chip memory. Connecting the circuitry to the first processor can place the check close to where prefetch requests are generated, so a redundant request is filtered before it consumes memory-interface bandwidth. By eliminating prefetches for data that is already cached, the system reduces off-chip traffic and power consumption in parallel processing environments where many processors may request the same texture data.

Claim 23

Original Legal Text

23. A system-on-chip (SoC) comprising: at least one central processing unit (CPU); and at least one parallel processing unit (PPU) connected to the CPU, wherein each PPU comprises: a plurality of multiprocessors; a plurality of special function units; a cache hierarchy including at least a first level cache memory and a second level cache memory; and a memory interface configured to provide access to an off-chip memory, wherein the plurality of special function units and control circuitry associated with the cache hierarchy are configured to, in response to an instruction received from one of the multiprocessors, perform: while texture mapping of a scene using a first block of a stored texture, generating a prefetch request, by a first special function unit from the plurality of special function units, to retrieve all or part of a second block of the stored texture from a memory hierarchy which includes the cache hierarchy; retrieving the requested all or part of the second block from the off-chip memory over the memory interface in response to the prefetch request; storing the retrieved all or part of the second block in an area of the second level cache memory, wherein said area of the second level cache memory is accessible by the first special function unit and a second special function unit from the plurality of special function units, and wherein the second level cache memory is an intermediate level of memory between the off-chip memory and the first level cache memory; texture mapping of the scene using the retrieved all or part of the second block; and rendering the scene to a display device over a display interface.

Plain English Translation

A system-on-chip (SoC) integrates at least one central processing unit (CPU) and at least one parallel processing unit (PPU) connected to the CPU. Each PPU includes multiple multiprocessors, special function units, a cache hierarchy with at least first and second level cache memories, and a memory interface for accessing off-chip memory. The special function units and cache control circuitry work together, in response to an instruction from one of the multiprocessors, to optimize texture mapping in graphics processing. During texture mapping of a scene using a first texture block, a first special function unit generates a prefetch request to retrieve all or part of a second texture block from the memory hierarchy, which includes the cache levels. The requested data is fetched from off-chip memory via the memory interface and stored in a designated area of the second level cache, accessible by both the first and a second special function unit. The second level cache serves as an intermediate memory layer between off-chip memory and the first level cache. The system then uses the prefetched texture data to continue texture mapping and renders the scene to a display device over a display interface. This reduces memory access latency: the second block is already in the shared second level cache when texture mapping needs it, and either special function unit can use it without another off-chip access.
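The data flow described for claim 23 can be sketched as a small sequential simulation. It is a hypothetical model, not the patented hardware: `MemoryHierarchy`, `SpecialFunctionUnit`, and their methods are invented names; each unit has a private first-level cache, the second-level cache is shared; and the averaging in `texture_map` is only a stand-in for real texture filtering. Real hardware overlaps the prefetch with ongoing work, whereas this model simply issues it first.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

Block = List[float]


@dataclass
class MemoryHierarchy:
    """Off-chip memory plus a second-level cache shared by all units."""
    offchip: Dict[str, Block]
    l2: Dict[str, Block] = field(default_factory=dict)
    offchip_reads: int = 0

    def read_offchip(self, block_id: str) -> Block:
        self.offchip_reads += 1
        return self.offchip[block_id]

    def prefetch_to_l2(self, block_id: str) -> None:
        # Prefetch: copy the block from off-chip memory into the shared L2.
        if block_id not in self.l2:
            self.l2[block_id] = self.read_offchip(block_id)


@dataclass
class SpecialFunctionUnit:
    """A texture unit with a private first-level cache (hypothetical model)."""
    name: str
    mem: MemoryHierarchy
    l1: Dict[str, Block] = field(default_factory=dict)

    def fetch(self, block_id: str) -> Block:
        if block_id not in self.l1:
            if block_id not in self.mem.l2:      # L2 miss: go to off-chip memory
                self.mem.l2[block_id] = self.mem.read_offchip(block_id)
            self.l1[block_id] = self.mem.l2[block_id]
        return self.l1[block_id]

    def texture_map(self, block_id: str, next_block: Optional[str] = None) -> float:
        if next_block is not None:
            self.mem.prefetch_to_l2(next_block)  # issued while mapping block_id
        texels = self.fetch(block_id)
        return sum(texels) / len(texels)         # stand-in for texture filtering


mem = MemoryHierarchy(offchip={"A": [1.0, 2.0, 3.0, 4.0], "B": [5.0, 6.0, 7.0, 8.0]})
sfu0 = SpecialFunctionUnit("sfu0", mem)
sfu1 = SpecialFunctionUnit("sfu1", mem)
print(sfu0.texture_map("A", next_block="B"))  # 2.5, and B is now in L2
print(sfu1.texture_map("B"))                  # 6.5, served from the shared L2
print(mem.offchip_reads)                      # 2: A once, B once (by the prefetch)
```

Because block B was prefetched into the shared second-level cache by the first unit, the second unit's access to B costs no additional off-chip read.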

Patent Metadata

Filing Date

Unknown

Publication Date

October 20, 2020

Inventors

Pranava Ajith RAI
Amit JAIN


Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, FAQs, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “Techniques for Preloading Textures in Rendering Graphics” (10810784). https://patentable.app/patents/10810784

© 2026 Nomic Interactive Technology LLC.