An embodiment circuit comprises a plurality of processing units, a plurality of data memory banks configured to store data, and a plurality of coefficient memory banks configured to store twiddle factors for fast Fourier transform processing. The processing units are configured to fetch, at each of the FFT computation stages, input data from the data memory banks with a burst read memory transaction, fetch, at each of the FFT computation cycles, different twiddle factors in a respective set of the twiddle factors from different coefficient memory banks of the coefficient memory banks, process the input data and the set of twiddle factors to generate output data, and store, at each of the FFT computation stages, the output data into the data memory banks with a burst write memory transaction.
Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
5. The circuit of claim 4, wherein the multiplexer circuits arranged in the ordered sequence are selectively couplable in groups of daisy-chained multiplexer circuits, the groups having a cardinality that is a function of the current FFT computation stage.
This invention relates to digital signal processing, specifically to a circuit architecture for fast Fourier transform (FFT) computations. The problem addressed is the efficient implementation of FFT algorithms in hardware, particularly in reducing latency and improving throughput by optimizing the data flow through multiplexer circuits. The circuit includes a plurality of multiplexer circuits arranged in an ordered sequence, where each multiplexer circuit is configured to selectively route data between input and output ports based on control signals. The multiplexer circuits are further arranged to form groups of daisy-chained multiplexer circuits, where the size of each group (cardinality) dynamically adjusts according to the current stage of the FFT computation. This grouping allows for parallel processing of data at different stages, reducing the overall computation time. The circuit also includes control logic to manage the routing of data through the multiplexer circuits, ensuring that data is processed in the correct sequence for the FFT algorithm. The dynamic grouping of multiplexer circuits optimizes the data flow, minimizing bottlenecks and improving efficiency. The invention is particularly useful in high-performance computing applications where low-latency FFT processing is required.
6. The circuit of claim 5, wherein the cardinality of the groups of daisy-chained multiplexer circuits is equal to 2^stage, wherein stage is a progressive number indicative of the current FFT computation stage, a first FFT computation stage being identified by numeral zero, the cardinality of the groups of daisy-chained multiplexer circuits being limited to the number P of processing units.
This invention relates to digital signal processing, specifically to the architecture of Fast Fourier Transform (FFT) computation circuits. The problem addressed is optimizing the hardware implementation of FFT algorithms by efficiently managing data routing through multiplexer circuits during different computation stages. The circuit includes groups of daisy-chained multiplexer circuits, where the size of each group (its cardinality) is determined by the current FFT computation stage: it equals 2^stage, where stage is a progressive number starting from zero for the first stage. The cardinality is capped at the number of available processing units (P), so a group never spans more multiplexer circuits than there are processing units. Each multiplexer group routes data between processing units during FFT computations, with the daisy-chained configuration allowing sequential data propagation. The stage-based cardinality adjustment lets the multiplexer network adapt to the routing requirements of each FFT stage, reducing unnecessary data routing overhead and improving processing efficiency. This approach is particularly useful in high-performance FFT implementations where hardware resource allocation must be carefully managed to balance speed and power consumption.
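The group-size rule of claim 6 can be sketched in a few lines. This is only an illustration: the function name `group_cardinality` is hypothetical, and reading "limited to the number P" as a cap applied with `min` is an assumption, not something the claim spells out.

```python
def group_cardinality(stage: int, num_units: int) -> int:
    """Size of each daisy-chained multiplexer group at a given FFT stage.

    Assumption: "limited to P" is read as min(2**stage, P).
    """
    if stage < 0 or num_units < 1:
        raise ValueError("stage must be >= 0 and num_units must be >= 1")
    return min(2 ** stage, num_units)


P = 8  # hypothetical number of processing units
sizes = [group_cardinality(stage, P) for stage in range(6)]
print(sizes)  # [1, 2, 4, 8, 8, 8]
```

With P = 8, the group size doubles for stages 0 through 3 and then stays at 8 for every later stage.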
7. The circuit of claim 4, wherein the respective power-of-2 counter circuit is configured to update a respective counter register value at each FFT computation cycle, wherein updating the respective counter register value comprises adding to a previously stored counter register value an offset value computed as a function of the current FFT computation cycle.
8. The circuit of claim 1, wherein a burst length of the burst read memory transactions and the burst write memory transactions is equal to N/2P, and a burst stride of the burst read memory transactions and the burst write memory transactions is computed at each FFT computation stage as a function of the number P of processing units.
This invention relates to memory transaction optimization in digital signal processing systems, particularly for fast Fourier transform (FFT) computations. The problem addressed is inefficient memory access patterns during FFT processing, which can lead to performance bottlenecks due to suboptimal burst lengths and strides in memory transactions. The circuit includes multiple processing units configured to perform FFT computations on input data. The memory transactions for reading and writing data during these computations are organized as burst transactions, where each burst has a defined length and stride. The burst length is set to N/(2P), where N is the total number of data points in the FFT computation and P is the number of processing units. The burst stride is recomputed at each FFT computation stage as a function of the number of processing units, so that memory accesses match the parallelism of the system. By fixing the burst length and adjusting the stride in this manner, the circuit improves memory bandwidth utilization and reduces latency in FFT computations. Recomputing the burst stride at each computation stage keeps memory accesses aligned with the data dependencies of the FFT algorithm. This approach is particularly beneficial in systems where FFT computations are performed on large datasets or in real-time applications where low latency is critical.
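As a worked example of the burst length in claim 8, the sketch below reads N/2P as N/(2P), which is an assumption about the claim's notation. The function name `burst_length` and the sample values are hypothetical. The burst stride is not computed here, because the claim only states that it is a function of P without giving the formula.

```python
def burst_length(n_points: int, num_units: int) -> int:
    """Burst length N/(2P) for an N-point FFT on P processing units.

    Assumption: N is an exact multiple of 2P, so the result is an integer.
    """
    if n_points % (2 * num_units) != 0:
        raise ValueError("n_points must be a multiple of 2 * num_units")
    return n_points // (2 * num_units)


N, P = 1024, 8  # hypothetical FFT size and number of processing units
length = burst_length(N, P)
num_data_banks = 2 * P  # claim 11: twice the number of processing units
print(length)                   # 64
print(num_data_banks * length)  # 1024
```

The second printed value equals N, which is consistent with the N/(2P) reading: 2P data memory banks times a burst of N/(2P) words covers all N data points once.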
9. The circuit of claim 1, wherein each of the coefficient memory banks comprises a number N/2P of rows, and wherein a number N/2 of the twiddle factors are stored without repetition in the plurality of coefficient memory banks according to a low-order interleaving scheme or a standard interleaving scheme.
10. The circuit of claim 9, wherein a row having index j of a respective coefficient memory bank having index i has stored therein a twiddle factor having index i+jP.
A digital signal processing system includes a plurality of coefficient memory banks for storing twiddle factors used in fast Fourier transform (FFT) computations. Each memory bank is associated with a specific index i, and each row within a memory bank is associated with a row index j. The system is designed to optimize memory access patterns during FFT computations by storing twiddle factors in a structured manner. Specifically, a twiddle factor with index i+jP is stored in the row with index j of the coefficient memory bank with index i. This arrangement allows for efficient retrieval of twiddle factors during FFT operations, reducing memory access latency and improving computational efficiency. The system may include multiple coefficient memory banks, each storing a subset of the total twiddle factors required for the FFT computation. The structured storage method ensures that twiddle factors are accessed in a predictable and optimized sequence, enhancing performance in real-time signal processing applications. The system may be implemented in hardware, such as an application-specific integrated circuit (ASIC) or field-programmable gate array (FPGA), to support high-speed FFT computations in communication systems, radar, or other digital signal processing applications.
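The storage rule of claim 10 can be made concrete with a small table. The sketch below is an illustration only: the function name `build_coefficient_banks` is hypothetical, and it assumes there are P coefficient memory banks, which is implied by claim 9 (N/2 twiddle factors, N/(2P) rows per bank) but not stated outright.

```python
def build_coefficient_banks(n_points: int, num_banks: int) -> list:
    """Return banks[i][j] = twiddle factor index stored in row j of bank i.

    Follows the rule index = i + j * P, with N/(2P) rows per bank.
    Assumption: the number of coefficient banks equals P.
    """
    rows = n_points // (2 * num_banks)
    return [[i + j * num_banks for j in range(rows)] for i in range(num_banks)]


banks = build_coefficient_banks(n_points=16, num_banks=4)
for i, bank in enumerate(banks):
    print(f"bank {i}: {bank}")
# bank 0: [0, 4]
# bank 1: [1, 5]
# bank 2: [2, 6]
# bank 3: [3, 7]
```

For N = 16 and P = 4, the eight twiddle factor indices 0 to 7 each appear exactly once, matching the "stored without repetition" wording of claim 9.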
11. The circuit of claim 1, wherein the plurality of data memory banks comprises a number of data memory banks equal to twice the number P of processing units.
16. The method of claim 15, wherein the cardinality of the groups of daisy-chained multiplexer circuits is equal to 2^stage, where stage is a progressive number indicative of the current FFT computation stage, a first FFT computation stage being identified by numeral zero, the cardinality of the groups of daisy-chained multiplexer circuits being limited to a number P of processing units.
20. The method of claim 19, wherein a row having index j of a respective coefficient memory bank having index i has stored therein a twiddle factor having index i+jP.
The invention relates to digital signal processing, specifically to efficient storage and retrieval of twiddle factors in fast Fourier transform (FFT) computations. The problem addressed is optimizing memory access patterns to reduce latency and improve computational efficiency in FFT algorithms, particularly in hardware implementations. The method involves organizing twiddle factors in a memory system with multiple banks. Each bank is indexed by an integer i, and each row within a bank is indexed by an integer j. The key innovation is the specific addressing scheme used to store twiddle factors, where a twiddle factor with index i+jP is stored in row j of bank i. This arrangement enables parallel access to multiple twiddle factors during FFT computations, reducing memory access bottlenecks. The method further includes selecting a bank index i and a row index j based on the FFT computation stage and the input data being processed. The twiddle factor is then retrieved from the memory location determined by the addressing scheme. This approach minimizes memory conflicts and maximizes throughput by allowing simultaneous access to different memory banks. The addressing scheme is particularly useful in pipelined FFT architectures, where multiple stages of the FFT computation are processed concurrently. By distributing twiddle factors across multiple memory banks and using the described indexing method, the system achieves efficient memory utilization and reduces the overall computation time. The method is applicable to both radix-2 and mixed-radix FFT implementations.
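Inverting the addressing scheme shows where a given twiddle factor lives and when several factors can be fetched in the same cycle. The sketch below is an illustration under assumptions: the helper names `locate` and `banks_hit` are hypothetical, and the access patterns tested are examples chosen here, not patterns taken from the patent.

```python
def locate(k: int, num_banks: int) -> tuple:
    """Map twiddle factor index k to (bank i, row j) such that k = i + j * P."""
    row, bank = divmod(k, num_banks)
    return bank, row


def banks_hit(indices, num_banks: int) -> list:
    """List the bank touched by each requested twiddle factor index."""
    return [locate(k, num_banks)[0] for k in indices]


P = 4  # hypothetical number of coefficient memory banks
consecutive = banks_hit(range(4, 8), P)  # indices 4, 5, 6, 7
strided = banks_hit(range(0, 8, 2), P)   # indices 0, 2, 4, 6
print(consecutive)  # [0, 1, 2, 3]
print(strided)      # [0, 2, 0, 2]
```

P consecutive indices always land in P different banks, so they can be read in parallel. A stride-2 pattern reuses banks 0 and 2, so the index pattern requested in each cycle determines whether the layout is conflict-free.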
21. The method of claim 12, the plurality of data memory banks comprising a number of data memory banks equal to twice a number P of processing units.
June 4, 2021
October 4, 2022