10817012

System, Apparatus And Method For Providing A Local Clock Signal For A Memory Array

Published: October 27, 2020
Assignee: not available in USPTO data we have
Patent Claims
17 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 1

Original Legal Text

1. A processor comprising: at least one processor core; and at least one graphics processor, wherein the at least one graphics processor comprises a register file having a plurality of entries, wherein at least a portion of the at least one graphics processor is to operate at a first operating frequency and the register file is to operate at a second operating frequency greater than the first operating frequency, to enable the at least one graphics processor to issue a plurality of write requests to the register file in a single clock cycle at the first operating frequency and receive a plurality of data elements of a plurality of read requests from the register file in the single clock cycle at the first operating frequency.

Plain English Translation

This claim covers a processor that contains at least one processor core and at least one graphics processor. The graphics processor has a register file with multiple entries. At least part of the graphics processor runs at a first operating frequency, while the register file runs at a second, higher operating frequency. Because the register file completes more than one of its own cycles during each cycle of the slower clock, the graphics processor can issue multiple write requests to the register file, and receive the data for multiple read requests, within a single clock cycle at the first operating frequency. For example, if the register file ran at twice the first frequency, two accesses could be serviced per slower cycle. The claim does not fix the frequency ratio, and it does not recite power savings or particular workloads such as gaming or rendering.
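The timing relationship in claim 1 can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the patented circuit: the class name `DoublePumpedRegisterFile`, the integer `ratio` between the two frequencies, and the choice to apply a write before a read within the same fast cycle are all hypothetical.

```python
class DoublePumpedRegisterFile:
    """Behavioral model: storage clocked `ratio` times per core-clock cycle.

    Assumptions (not from the claim): `ratio` is an integer, and each fast
    cycle services at most one write followed by at most one read.
    """

    def __init__(self, entries: int, ratio: int = 2):
        self.regs = [0] * entries
        self.ratio = ratio  # fast (register file) cycles per core cycle

    def core_cycle(self, writes=(), reads=()):
        """Service the requests issued during ONE cycle of the slower clock.

        writes: sequence of (index, value) pairs
        reads:  sequence of register indices
        Returns the read data in request order.
        """
        if len(writes) > self.ratio or len(reads) > self.ratio:
            raise ValueError("more requests than fast cycles in one core cycle")
        results = []
        for fast in range(self.ratio):  # one iteration per fast clock cycle
            if fast < len(writes):
                index, value = writes[fast]
                self.regs[index] = value
            if fast < len(reads):
                results.append(self.regs[reads[fast]])
        return results


rf = DoublePumpedRegisterFile(entries=8, ratio=2)
rf.core_cycle(writes=[(0, 11), (1, 22)])   # two writes in one core cycle
print(rf.core_cycle(reads=[0, 1]))         # two reads in one core cycle: [11, 22]
```

With `ratio=2` the model accepts two writes and two reads per core cycle, which corresponds to the "plurality" of requests in the claim; a third request in the same cycle is rejected.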

Claim 2

Original Legal Text

2. The processor of claim 1, wherein the processor further comprises a clock generator circuit to receive a first clock signal at the first operating frequency and provide a second clock signal to the register file at the second operating frequency, wherein the clock generator circuit is to locally route the second clock signal to the register file.

Plain English Translation

This claim adds a clock generator circuit to the processor of claim 1. The circuit receives a first clock signal at the first operating frequency and provides a second clock signal to the register file at the second operating frequency, which claim 1 requires to be greater than the first. The clock generator circuit routes the second clock signal locally to the register file, which suggests that the faster clock is produced and delivered near the register file rather than distributed across the whole chip. The claim does not require the clock generator to adjust frequencies dynamically, and it does not recite low-power or mobile applications; it only requires receiving the first clock, providing the second clock, and routing the second clock locally.

Claim 3

Original Legal Text

3. The processor of claim 2, wherein the clock generator circuit comprises: a first logic circuit having a first input to receive the first clock signal at the first operating frequency and a second input to receive a phase-delayed first clock signal at the first operating frequency, the first logic circuit to output a first interim clock signal; a second logic circuit having a first input to receive the first clock signal at the first operating frequency and a second input to receive the phase-delayed first clock signal at the first operating frequency, the second logic circuit to output a second interim clock signal; and a third logic circuit having a first input to receive the first interim clock signal and a second input to receive the second interim clock signal, the third logic circuit to output the second clock signal at the second operating frequency.

Plain English Translation

This claim details the clock generator circuit of claim 2, which turns the first clock signal into a second clock signal at the higher second operating frequency. The circuit uses three logic circuits. The first and second logic circuits each take the same two inputs: the first clock signal and a phase-delayed copy of it, both at the first operating frequency. Each produces its own interim clock signal. The third logic circuit takes the two interim clock signals as inputs and outputs the second clock signal at the second operating frequency. The first two circuits therefore work in parallel and feed the third, rather than forming a three-stage cascade. Combining a clock with a phase-shifted copy of itself is what yields the extra edges needed for a higher frequency. The claim does not specify the gate types, the amount of phase delay, or the frequency ratio, and it does not recite phase-noise or power characteristics.
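One way to see how three logic circuits fed by a clock and its phase-delayed copy can raise the frequency is a sampled-waveform sketch. The gate choices here are assumptions, since the claim does not name them: the first circuit is modeled as `clk AND NOT delayed`, the second as `NOT clk AND delayed`, and the third as an OR, which together behave as an XOR. The quarter-period delay and the helper names are likewise hypothetical.

```python
def square_wave(period, samples, delay=0):
    """Sampled 50% duty-cycle clock, shifted right by `delay` samples."""
    return [1 if ((t - delay) % period) < period // 2 else 0 for t in range(samples)]


def doubled_clock(clk, delayed):
    """Three logic circuits (assumed gate types, see lead-in)."""
    first = [c & (1 - d) for c, d in zip(clk, delayed)]    # first interim clock
    second = [(1 - c) & d for c, d in zip(clk, delayed)]   # second interim clock
    return [a | b for a, b in zip(first, second)]          # second clock signal


def rising_edges(signal):
    """Count 0 -> 1 transitions, treating the sample window as periodic."""
    return sum(1 for t in range(len(signal)) if signal[t - 1] == 0 and signal[t] == 1)


PERIOD = 8                                   # samples per first-clock period
clk = square_wave(PERIOD, 4 * PERIOD)        # four periods of the first clock
delayed = square_wave(PERIOD, 4 * PERIOD, delay=PERIOD // 4)  # quarter-period delay
fast = doubled_clock(clk, delayed)

print(rising_edges(clk), rising_edges(fast))  # 4 8
```

Over the same window the output has twice as many rising edges as the input, so under these assumptions the second clock runs at twice the first operating frequency.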

Claim 4

Original Legal Text

4. The processor of claim 1, wherein the at least one graphics processor comprises: a first clock domain to issue first port requests and second port requests to the register file at the first operating frequency; and a first clock crossing domain including: a selection circuit to receive the first port requests and the second port requests and output a selected request at the first operating frequency; and a first latch circuit to receive the selected request at the first operating frequency and output the selected request to the register file at the second operating frequency.

Plain English Translation

This claim describes how requests travel from the slower part of the graphics processor to the faster register file. A first clock domain issues first port requests and second port requests to the register file at the first operating frequency. A first clock crossing domain sits between that domain and the register file and contains two elements. A selection circuit receives both the first port requests and the second port requests and outputs a selected request at the first operating frequency. A first latch circuit receives the selected request at the first operating frequency and outputs it to the register file at the second, higher operating frequency. In effect, requests from two ports are funneled through one path and handed to the register file at its faster rate, which is consistent with claim 1's requirement that multiple requests be handled within one cycle of the slower clock. The claim does not recite power savings or the avoidance of clock domain conversions.
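The request path in claim 4 can be sketched as a two-to-one multiplexer followed by a latch. This is a simplified model under assumptions the claim does not state: the register file clock is taken to be exactly twice the first operating frequency, and the selection circuit is assumed to pick the first port in the first half of each slow cycle and the second port in the second half. The function name is hypothetical.

```python
def cross_to_fast_domain(port0_requests, port1_requests):
    """Interleave per-core-cycle requests from two ports into one fast stream.

    Each list holds one request per cycle of the first (slower) clock.
    The returned list holds one request per cycle of the second (faster) clock.
    """
    fast_stream = []
    for req0, req1 in zip(port0_requests, port1_requests):  # one slow cycle
        for fast_phase in (0, 1):                           # two fast cycles
            # Selection circuit: choose which port's request goes forward.
            selected = req0 if fast_phase == 0 else req1
            # First latch circuit: hold the selected request for the register file.
            latched = selected
            fast_stream.append(latched)
    return fast_stream


print(cross_to_fast_domain(["A0", "A1"], ["B0", "B1"]))
# ['A0', 'B0', 'A1', 'B1']
```

Two slow cycles of paired requests become four fast-cycle requests, so the register file sees one request per fast cycle while each port still issues once per slow cycle.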

Claim 5

Original Legal Text

5. The processor of claim 4, wherein the at least one graphics processor further comprises: a second clock domain including the register file; and a second clock crossing domain including: a second latch circuit to receive read data from the register file at the second operating frequency and output at least a first portion of the read data at the first operating frequency in response to a first port clock signal at the first operating frequency; and a third latch circuit to receive the read data from the register file at the second operating frequency and output at least a second portion of the read data at the first operating frequency in response to a second port clock signal at the first operating frequency, the second port clock signal having a different phase than the first port clock signal.

Plain English Translation

This claim adds the return path for read data. The register file sits in a second clock domain. A second clock crossing domain contains two latch circuits, and both receive read data from the register file at the second operating frequency. The second latch circuit outputs at least a first portion of the read data at the first operating frequency in response to a first port clock signal. The third latch circuit outputs at least a second portion of the read data at the first operating frequency in response to a second port clock signal. Both port clock signals run at the first operating frequency but differ in phase, so the two latches act at different moments within the slower cycle. The claim does not describe the clock domains as asynchronous, and it does not recite multiple functional units running at different speeds; it only requires two frequencies, with the register file's being the higher one.
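The read-return path in claim 5 is roughly the mirror image of the request path: a fast stream of read data is split back into two slower streams. The sketch below assumes a two-to-one frequency ratio and assumes that the two port clock signals are half a slow cycle apart, so that the latches capture alternating fast-cycle beats. Neither assumption comes from the claim, and the function name is hypothetical.

```python
def return_to_core_domain(fast_read_data):
    """Split a fast-clock read stream into two core-clock port streams.

    fast_read_data: one item per cycle of the second (faster) clock.
    Returns (port0_data, port1_data), one item each per slow cycle.
    """
    port0_data, port1_data = [], []
    for fast_cycle, data in enumerate(fast_read_data):
        if fast_cycle % 2 == 0:
            # Second latch circuit: clocked by the first port clock signal.
            port0_data.append(data)
        else:
            # Third latch circuit: clocked by the phase-shifted second port clock.
            port1_data.append(data)
    return port0_data, port1_data


print(return_to_core_domain(["A0", "B0", "A1", "B1"]))
# (['A0', 'A1'], ['B0', 'B1'])
```

Each port receives one data element per slow cycle even though the register file produced the data at the faster rate.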

Claim 6

Original Legal Text

6. The processor of claim 1, wherein the at least one graphics processor comprises a streaming multiprocessor comprising: a plurality of arithmetic logic units coupled to the register file; a plurality of texture units coupled to the register file; and a shared memory coupled to the plurality of arithmetic logic units and the plurality of texture units.

Plain English Translation

A graphics processing system includes a streaming multiprocessor designed to enhance parallel processing capabilities for graphics and compute workloads. The streaming multiprocessor contains multiple arithmetic logic units and texture units, both connected to a shared register file, enabling efficient data access and manipulation. Additionally, a shared memory is coupled to both the arithmetic logic units and texture units, facilitating fast data exchange between these components. This architecture improves performance by allowing simultaneous execution of arithmetic operations and texture sampling, reducing bottlenecks in graphics rendering and general-purpose computing tasks. The shared memory further optimizes resource utilization by providing a common storage space for intermediate results and texture data, minimizing redundant data transfers. This design is particularly useful in modern graphics processing units (GPUs) where high-throughput parallel processing is essential for real-time rendering, machine learning, and other computationally intensive applications. The integration of arithmetic and texture units with shared memory enhances flexibility and efficiency, making the system suitable for a wide range of high-performance computing tasks.
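The coupling described in claim 6 can be pictured as an object graph in which every arithmetic logic unit and texture unit holds a reference to the same register file and the same shared memory. This is only a structural sketch; the class names, the 16-entry register file, and the `build_streaming_multiprocessor` helper are hypothetical and say nothing about how the hardware is built.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RegisterFile:
    entries: List[int] = field(default_factory=lambda: [0] * 16)  # size is arbitrary


@dataclass
class SharedMemory:
    words: Dict[int, int] = field(default_factory=dict)


@dataclass
class ArithmeticLogicUnit:
    register_file: RegisterFile   # "coupled to the register file"
    shared_memory: SharedMemory   # "coupled to" the shared memory


@dataclass
class TextureUnit:
    register_file: RegisterFile
    shared_memory: SharedMemory


@dataclass
class StreamingMultiprocessor:
    register_file: RegisterFile
    shared_memory: SharedMemory
    alus: List[ArithmeticLogicUnit]
    texture_units: List[TextureUnit]


def build_streaming_multiprocessor(num_alus: int, num_texture_units: int) -> StreamingMultiprocessor:
    """Wire all units to one register file and one shared memory."""
    rf, shared = RegisterFile(), SharedMemory()
    alus = [ArithmeticLogicUnit(rf, shared) for _ in range(num_alus)]
    tex = [TextureUnit(rf, shared) for _ in range(num_texture_units)]
    return StreamingMultiprocessor(rf, shared, alus, tex)


sm = build_streaming_multiprocessor(num_alus=4, num_texture_units=2)
print(len(sm.alus), len(sm.texture_units))  # 4 2
```

The claim requires a plurality of each unit type, so any count of two or more would fit; the counts above are illustrative.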

Claim 7

Original Legal Text

7. The processor of claim 6, further comprising scheduler logic to schedule groups of instructions.

Plain English Translation

This claim adds scheduler logic to the processor of claim 6. The scheduler logic schedules groups of instructions. The claim says nothing more about how scheduling is done: it does not recite dependency analysis, predictive algorithms, or machine learning, and it does not define what makes up a group of instructions.

Claim 8

Original Legal Text

8. The processor of claim 6, wherein the plurality of arithmetic logic units are to perform operations on integer data types.

Plain English Translation

This claim narrows claim 6 by specifying that the arithmetic logic units in the streaming multiprocessor are to perform operations on integer data types. The claim does not list particular operations, and it does not state that the units are limited to integer data or that floating-point work is handled elsewhere.

Claim 9

Original Legal Text

9. The processor of claim 6, further comprising at least one memory unit.

Plain English Translation

This claim adds at least one memory unit to the processor of claim 6. The claim does not specify the type of memory unit or how it connects to the streaming multiprocessor. Claim 10 narrows it to a memory unit that comprises a load and store unit.

Claim 10

Original Legal Text

10. The processor of claim 9, wherein the memory unit comprises a load and store unit.

Plain English Translation

This claim narrows claim 9 by specifying that the memory unit comprises a load and store unit. In general processor design, a load and store unit handles the instructions that move data between registers and memory. The claim does not recite a cache, a separate data processing unit, or any particular optimization of the load and store operations.

Claim 11

Original Legal Text

11. The processor of claim 1, wherein the at least one processor core and the at least one graphics processor are integrated in a single package.

Plain English Translation

This claim narrows claim 1 by requiring that the processor core and the graphics processor be integrated in a single package. A single package can hold one die or several, so the claim is not limited to a system-on-chip in which both sit on the same die. The claim does not recite latency, power, thermal, or size benefits, although integration of this kind is commonly pursued for those reasons.

Claim 12

Original Legal Text

12. A graphics processor comprising: a streaming multiprocessor comprising: a register file comprising a register file having a plurality of entries, wherein at least a portion of the graphics processor is to operate at a first operating frequency and the register file is to operate at a second operating frequency greater than the first operating frequency, to enable the graphics processor to issue a plurality of write requests to the register file in a single clock cycle at the first operating frequency and receive a plurality of data elements of a plurality of read requests from the register file in the single clock cycle at the first operating frequency; a plurality of arithmetic logic units coupled to the register file; a plurality of texture units coupled to the register file; and a shared memory coupled to the plurality of arithmetic logic units and the plurality of texture units.

Plain English Translation

A graphics processor includes a streaming multiprocessor with a register file, arithmetic logic units, texture units, and shared memory. The register file operates at a higher frequency than the rest of the graphics processor, allowing it to handle multiple write and read requests in a single clock cycle at the processor's lower operating frequency. This design enables efficient data processing by supporting concurrent access to the register file, improving throughput for graphics computations. The arithmetic logic units and texture units are coupled to the register file, facilitating fast data exchange for operations like arithmetic calculations and texture sampling. The shared memory provides additional storage accessible by both the arithmetic logic units and texture units, further enhancing performance. The higher-frequency register file ensures that the graphics processor can sustain high data throughput without bottlenecks, particularly in tasks requiring frequent register access, such as shader computations or texture mapping. This architecture optimizes performance by decoupling the register file's operating frequency from the rest of the processor, allowing the system to scale efficiently with varying workload demands.

Claim 13

Original Legal Text

13. The graphics processor of claim 12, wherein the plurality of arithmetic logic units are to perform operations on integer data types.

Plain English Translation

This claim narrows claim 12 by specifying that the arithmetic logic units of the streaming multiprocessor are to perform operations on integer data types. It mirrors claim 8, applied to the standalone graphics processor of claim 12. The claim does not recite specific integer operations, bit widths, or signedness, and it does not state that the units handle only integer data.

Claim 14

Original Legal Text

14. The graphics processor of claim 12, further comprising at least one special function unit.

Plain English Translation

This claim adds at least one special function unit to the graphics processor of claim 12. In GPU designs, a special function unit typically evaluates operations such as sine, cosine, reciprocal, and square root in dedicated hardware. The claim does not say which functions the unit performs or how it connects to the rest of the graphics processor.

Claim 15

Original Legal Text

15. The graphics processor of claim 12, wherein the graphics processor is to couple to a host processor via a high speed interconnect.

Plain English Translation

This claim narrows claim 12 by specifying that the graphics processor is to couple to a host processor via a high speed interconnect. The claim does not name the interconnect or set a speed, and it does not recite execution units, a memory management unit, or a cache hierarchy. It only requires that the graphics processor be able to connect to a host processor over such a link.

Claim 16

Original Legal Text

16. The graphics processor of claim 15, further comprising a single package comprising the host processor and the graphics processor.

Plain English Translation

This claim builds on claim 15 by adding a single package that comprises both the host processor and the graphics processor. The two remain coupled via the high speed interconnect of claim 15, so the claim does not eliminate the link between them; it places both ends of that link in one package. The claim does not recite a memory controller, a cache hierarchy, or memory shared between the host processor and the graphics processor.

Claim 17

Original Legal Text

17. The graphics processor of claim 12, further comprising a clock generator circuit to receive a first clock signal at the first operating frequency and provide a second clock signal to the register file at the second operating frequency, wherein the clock generator circuit is to locally route the second clock signal to the register file.

Plain English Translation

This claim adds a clock generator circuit to the graphics processor of claim 12, mirroring claim 2. The circuit receives a first clock signal at the first operating frequency and provides a second clock signal to the register file at the second operating frequency, which claim 12 requires to be greater than the first. The clock generator circuit routes the second clock signal locally to the register file. The claim does not recite adjusting the register file's frequency dynamically based on workload, and it does not recite power savings.

Patent Metadata

Filing Date

Unknown

Publication Date

October 27, 2020

Inventors

Iqbal R. Rajwani
Altug Koker
Bhushan M. Borole
Kamal Sinha
Abhishek R. Appu
Anupama A. Thaploo
Sunil Nekkanti
Wenyin Fu

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, FAQs, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “System, Apparatus And Method For Providing A Local Clock Signal For A Memory Array” (10817012). https://patentable.app/patents/10817012

© 2026 Nomic Interactive Technology LLC. Machine-readable context available at /api/llm-context/10817012. See llms.txt for full attribution policy.