US-11461257

Digital signal processing circuit and corresponding method of operation

Published: October 4, 2022

Assignee: not available in USPTO data we have

Inventors: not available in USPTO data we have

Technical Abstract

An embodiment circuit comprises a plurality of processing units, a plurality of data memory banks configured to store data, and a plurality of coefficient memory banks configured to store twiddle factors for fast Fourier transform processing. The processing units are configured to fetch, at each of the FFT computation stages, input data from the data memory banks with a burst read memory transaction, fetch, at each of the FFT computation cycles, different twiddle factors in a respective set of the twiddle factors from different coefficient memory banks of the coefficient memory banks, process the input data and the set of twiddle factors to generate output data, and store, at each of the FFT computation stages, the output data into the data memory banks with a burst write memory transaction.
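For readers unfamiliar with the terms "FFT computation stage" and "twiddle factor", the following is a minimal software sketch of a textbook iterative radix-2 decimation-in-time FFT. It is not the patented circuit: it runs on a single thread, uses one flat array instead of memory banks, and the function name `fft_radix2` is chosen here for illustration only. It only shows that an N-point transform proceeds in log2(N) stages and draws on N/2 distinct twiddle factors.

```python
import cmath


def fft_radix2(x):
    """Iterative radix-2 decimation-in-time FFT of a power-of-two-length sequence."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    stages = n.bit_length() - 1

    # Bit-reversal permutation puts the inputs in the order the butterflies expect.
    data = [0j] * n
    for idx, value in enumerate(x):
        rev = int(format(idx, f"0{stages}b")[::-1], 2) if stages else 0
        data[rev] = complex(value)

    # The N/2 distinct twiddle factors W_N^k = exp(-2*pi*i*k/N), k = 0 .. N/2 - 1.
    twiddles = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]

    for stage in range(stages):
        half = 1 << stage        # distance between the two inputs of a butterfly
        step = n // (2 * half)   # stride through the twiddle table at this stage
        for start in range(0, n, 2 * half):
            for k in range(half):
                a = data[start + k]
                b = data[start + k + half] * twiddles[k * step]
                data[start + k] = a + b
                data[start + k + half] = a - b
    return data
```

Each pass of the outer `stage` loop corresponds to one FFT computation stage in the sense used by the abstract; in the claimed circuit the inner work is spread across several processing units and memory banks rather than executed sequentially.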

Patent Claims

10 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 5

Original Legal Text

5. The circuit of claim 4, wherein the multiplexer circuits arranged in the ordered sequence are selectively couplable in groups of daisy-chained multiplexer circuits, the groups having a cardinality that is a function of the current FFT computation stage.

Plain English Translation

This invention relates to digital signal processing, specifically to a circuit architecture for fast Fourier transform (FFT) computations. The problem addressed is the efficient implementation of FFT algorithms in hardware, particularly in reducing latency and improving throughput by optimizing the data flow through multiplexer circuits. The circuit includes a plurality of multiplexer circuits arranged in an ordered sequence, where each multiplexer circuit is configured to selectively route data between input and output ports based on control signals. The multiplexer circuits are further arranged to form groups of daisy-chained multiplexer circuits, where the size of each group (cardinality) dynamically adjusts according to the current stage of the FFT computation. This grouping allows for parallel processing of data at different stages, reducing the overall computation time. The circuit also includes control logic to manage the routing of data through the multiplexer circuits, ensuring that data is processed in the correct sequence for the FFT algorithm. The dynamic grouping of multiplexer circuits optimizes the data flow, minimizing bottlenecks and improving efficiency. The invention is particularly useful in high-performance computing applications where low-latency FFT processing is required.

Claim 6

Original Legal Text

6. The circuit of claim 5, wherein the cardinality of the groups of daisy-chained multiplexer circuits is equal to 2^stage, wherein stage is a progressive number indicative of the current FFT computation stage, a first FFT computation stage being identified by numeral zero, the cardinality of the groups of daisy-chained multiplexer circuits being limited to the number P of processing units.

Plain English Translation

This invention relates to digital signal processing, specifically to the architecture of fast Fourier transform (FFT) computation circuits. The problem addressed is routing data efficiently through multiplexer circuits as the FFT moves from one computation stage to the next. The multiplexer circuits are coupled in groups of daisy-chained multiplexers, and the cardinality of each group (the number of multiplexers chained together) equals 2^stage, where stage is a progressive number that starts at zero for the first FFT computation stage. The group size therefore doubles from one stage to the next (1, 2, 4, ...) until it reaches the number P of processing units, which is its upper limit. Within a group, the daisy-chained configuration lets data propagate sequentially from one multiplexer to the next. Tying the group size to the stage number lets the multiplexer network follow the data-access pattern of each FFT stage.
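The sizing rule in claim 6 (group cardinality 2^stage, limited to P) can be sketched in a few lines. The function names below are hypothetical, and the number of multiplexers passed to `partition_into_groups` is an assumption made for illustration, since claim 4 (which defines the multiplexer sequence) is not reproduced on this page.

```python
def group_cardinality(stage, num_units):
    """Size of each daisy-chained group: 2**stage, capped at P processing units."""
    if stage < 0 or num_units < 1:
        raise ValueError("stage must be >= 0 and num_units must be >= 1")
    return min(1 << stage, num_units)


def partition_into_groups(num_muxes, stage, num_units):
    """Split an ordered sequence of multiplexer indices into consecutive groups."""
    size = group_cardinality(stage, num_units)
    return [list(range(start, min(start + size, num_muxes)))
            for start in range(0, num_muxes, size)]


# With P = 4, the group size doubles per stage and then saturates at P.
print([group_cardinality(stage, 4) for stage in range(5)])  # [1, 2, 4, 4, 4]
print(partition_into_groups(4, 1, 4))                       # [[0, 1], [2, 3]]
```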

Claim 7

Original Legal Text

7. The circuit of claim 4, wherein the respective power-of-2 counter circuit is configured to update a respective counter register value at each FFT computation cycle, wherein updating the respective counter register value comprises adding to a previously stored counter register value an offset value computed as a function of the current FFT computation cycle.

Plain English translation pending...

Claim 8

Original Legal Text

8. The circuit of claim 1, wherein a burst length of the burst read memory transactions and the burst write memory transactions is equal to N/2P, and a burst stride of the burst read memory transactions and the burst write memory transactions is computed at each FFT computation stage as a function of the number P of processing units.

Plain English Translation

This invention relates to memory transaction optimization in digital signal processing systems, particularly for fast Fourier transform (FFT) computations. The problem addressed is inefficient memory access during FFT processing, where poorly chosen burst lengths and strides become a performance bottleneck. The circuit includes multiple processing units configured to perform FFT computations on input data, and the reads and writes for these computations are organized as burst transactions, each with a defined length and stride. The burst length is N/(2P), where N is the total number of data points in the FFT computation and P is the number of processing units. The burst stride is recomputed at each FFT computation stage as a function of P, so that the access pattern matches both the data dependencies of that stage and the parallelism of the processing units. Choosing the burst length and stride this way is intended to improve memory bandwidth utilization and reduce latency, which matters most for large datasets and real-time applications.
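As a rough illustration of the burst parameters in claim 8, the sketch below computes the burst length N/(2P) and lists the addresses touched by a generic strided burst. The claim states only that the stride is a function of P computed at each stage and does not give the formula on this page, so the stride is left as a plain parameter here. The function names are hypothetical.

```python
def burst_length(n_points, num_units):
    """Burst length N/(2P) for an N-point FFT on P processing units."""
    if n_points < 1 or num_units < 1 or n_points % (2 * num_units):
        raise ValueError("N must be a positive multiple of 2P")
    return n_points // (2 * num_units)


def burst_addresses(base, length, stride):
    """Addresses touched by one strided burst that starts at `base`."""
    return [base + beat * stride for beat in range(length)]


print(burst_length(1024, 4))     # 128
print(burst_addresses(0, 4, 2))  # [0, 2, 4, 6]
```

For N = 1024 and P = 4, each burst moves 128 data words. The stride value 2 in the second call is arbitrary and is not taken from the patent.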

Claim 9

Original Legal Text

9. The circuit of claim 1, wherein each of the coefficient memory banks comprises a number N/2P of rows, and wherein a number N/2 of the twiddle factors are stored without repetition in the plurality of coefficient memory banks according to a low-order interleaving scheme or a standard interleaving scheme.

Plain English translation pending...

Claim 10

Original Legal Text

10. The circuit of claim 9, wherein a row having index j of a respective coefficient memory bank having index i has stored therein a twiddle factor having index i+jP.

Plain English Translation

A digital signal processing system includes a plurality of coefficient memory banks for storing twiddle factors used in fast Fourier transform (FFT) computations. Each memory bank is associated with a specific index i, and each row within a memory bank is associated with a row index j. The system is designed to optimize memory access patterns during FFT computations by storing twiddle factors in a structured manner. Specifically, a twiddle factor with index i+jP is stored in the row with index j of the coefficient memory bank with index i. This arrangement allows for efficient retrieval of twiddle factors during FFT operations, reducing memory access latency and improving computational efficiency. The system may include multiple coefficient memory banks, each storing a subset of the total twiddle factors required for the FFT computation. The structured storage method ensures that twiddle factors are accessed in a predictable and optimized sequence, enhancing performance in real-time signal processing applications. The system may be implemented in hardware, such as an application-specific integrated circuit (ASIC) or field-programmable gate array (FPGA), to support high-speed FFT computations in communication systems, radar, or other digital signal processing applications.
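The storage rule of claim 10 (row j of bank i holds twiddle index i + jP) can be written out directly. The sketch below assumes P coefficient banks of N/(2P) rows each, which is what claim 9 together with the i + jP rule implies for N/2 factors stored without repetition; the function names are hypothetical and the tables hold twiddle indices rather than complex values.

```python
def build_coefficient_banks(n_points, num_units):
    """Return banks[i][j] = twiddle index stored in row j of bank i (i + j*P)."""
    if n_points < 1 or num_units < 1 or n_points % (2 * num_units):
        raise ValueError("N must be a positive multiple of 2P")
    rows = n_points // (2 * num_units)
    return [[i + j * num_units for j in range(rows)] for i in range(num_units)]


def locate_twiddle(k, num_units):
    """Inverse mapping: (bank i, row j) holding twiddle index k."""
    return k % num_units, k // num_units


banks = build_coefficient_banks(16, 2)
print(banks)                 # [[0, 2, 4, 6], [1, 3, 5, 7]]
print(locate_twiddle(5, 2))  # (1, 2)
```

With N = 16 and P = 2, the eight twiddle indices 0 to 7 each appear exactly once, and consecutive indices alternate between the two banks.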

Claim 11

Original Legal Text

11. The circuit of claim 1, wherein the plurality of data memory banks comprises a number of data memory banks equal to twice the number P of processing units.

Plain English translation pending...

Claim 16

Original Legal Text

16. The method of claim 15, wherein the cardinality of the groups of daisy-chained multiplexer circuits is equal to 2^stage, where stage is a progressive number indicative of the current FFT computation stage, a first FFT computation stage being identified by numeral zero, the cardinality of the groups of daisy-chained multiplexer circuits being limited to a number P of processing units.

Plain English translation pending...

Claim 20

Original Legal Text

20. The method of claim 19, wherein a row having index j of a respective coefficient memory bank having index i has stored therein a twiddle factor having index i+jP.

Plain English Translation

The invention relates to digital signal processing, specifically to efficient storage and retrieval of twiddle factors in fast Fourier transform (FFT) computations. The problem addressed is optimizing memory access patterns to reduce latency and improve computational efficiency in FFT algorithms, particularly in hardware implementations. The method involves organizing twiddle factors in a memory system with multiple banks. Each bank is indexed by an integer i, and each row within a bank is indexed by an integer j. The key innovation is the specific addressing scheme used to store twiddle factors, where a twiddle factor with index i+jP is stored in row j of bank i. This arrangement enables parallel access to multiple twiddle factors during FFT computations, reducing memory access bottlenecks. The method further includes selecting a bank index i and a row index j based on the FFT computation stage and the input data being processed. The twiddle factor is then retrieved from the memory location determined by the addressing scheme. This approach minimizes memory conflicts and maximizes throughput by allowing simultaneous access to different memory banks. The addressing scheme is particularly useful in pipelined FFT architectures, where multiple stages of the FFT computation are processed concurrently. By distributing twiddle factors across multiple memory banks and using the described indexing method, the system achieves efficient memory utilization and reduces the overall computation time. The method is applicable to both radix-2 and mixed-radix FFT implementations.
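To see why the i + jP rule permits simultaneous fetches, note that twiddle index k lives in bank k mod P. The sketch below, with hypothetical function names and an assumed count of P banks, checks whether a set of twiddle indices requested in the same cycle lands in distinct banks. Which index sets the processing units actually request in each cycle is not stated in the claims shown here, so the two sets used are examples only.

```python
def banks_hit(indices, num_banks):
    """Bank selected by each twiddle index under the i + j*P layout."""
    return [k % num_banks for k in indices]


def is_conflict_free(indices, num_banks):
    """True when every index in the set maps to a different bank."""
    hit = banks_hit(indices, num_banks)
    return len(set(hit)) == len(hit)


print(is_conflict_free([4, 5, 6, 7], 4))   # True: banks 0, 1, 2, 3
print(is_conflict_free([0, 4, 8, 12], 4))  # False: all four indices sit in bank 0
```

Any P indices with distinct remainders modulo P can be read in one cycle, whereas indices spaced exactly P apart would all contend for the same bank.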

Claim 21

Original Legal Text

21. The method of claim 12, the plurality of data memory banks comprising a number of data memory banks equal to twice a number P of processing units.

Plain English translation pending...

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G06F

Patent Metadata

Filing Date

June 4, 2021

Publication Date

October 4, 2022
