US-11520582

Carry chain for SIMD operations

Published: December 6, 2022
Assignee: not available in the USPTO data we have
Inventors: not available in the USPTO data we have
Technical Abstract

Examples of a carry chain for performing an operation on operands each including elements of a selectable size is provided. Advantageously, the carry chain adapts to elements of different sizes. The carry chain determines a mask based on a selected size of an element. The carry chain selects, based on the mask, whether to carry a partial result of an operation performed on corresponding first portions of a first operand and a second operand into a next operation. The next operation is performed on corresponding second portions of the first operand and the second operand, and, based on the selection, the partial result of the operation. The carry chain stores, in a memory, a result formed from outputs of the operation and the next operation.
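
To make the abstract concrete, here is a minimal Python sketch of a size-adaptive carry chain. It is an illustration under stated assumptions, not the patented circuit: the 8-bit slice width, the 32-bit operand width, the function names `carry_mask` and `simd_add`, and the convention that a mask bit of 1 lets a carry pass are all choices made for this example.

```python
SLICE_BITS = 8   # assumed slice width
NUM_SLICES = 4   # assumed: 32-bit operands split into four slices


def carry_mask(element_bits):
    """Return one bit per slice: 1 if that slice's carry-out may pass on.

    Assumed convention: the last slice of every element blocks its carry-out,
    so carries never cross an element boundary.
    """
    slices_per_element = element_bits // SLICE_BITS
    return [0 if (i + 1) % slices_per_element == 0 else 1
            for i in range(NUM_SLICES)]


def simd_add(a, b, element_bits):
    """Add packed operands a and b slice by slice, gating carries with the mask."""
    mask = carry_mask(element_bits)
    slice_max = (1 << SLICE_BITS) - 1
    result, carry = 0, 0
    for i in range(NUM_SLICES):
        shift = i * SLICE_BITS
        a_part = (a >> shift) & slice_max
        b_part = (b >> shift) & slice_max
        total = a_part + b_part + carry
        result |= (total & slice_max) << shift
        # The mask decides whether the partial result's carry reaches the next slice.
        carry = (total >> SLICE_BITS) & mask[i]
    return result
```

With `element_bits=8` every byte is added independently, while `element_bits=16` lets the carry cross from the low byte to the high byte of each 16-bit element.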

Patent Claims
16 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 2

Original Legal Text

2. The method of claim 1, further comprising determining a carry-indicator based on the first portion of the first operand and the first portion of the second operand, wherein the carry-indicator indicates one of: i) the carry-out is generated in the first slice, and ii) the carry-out is propagated to the first slice as a carry-in from a slice that precedes the first slice.

Plain English Translation

This invention relates to digital arithmetic processing, specifically to methods for handling carry propagation in parallel arithmetic operations. The problem addressed is the inefficiency in carry propagation during multi-bit arithmetic operations, such as addition or subtraction, where delays in carry propagation can bottleneck performance. The method involves processing operands in slices, where each slice handles a portion of the operands. A carry-indicator is determined based on the first portions of the first and second operands. This carry-indicator signals whether a carry-out is generated within the first slice or if it is propagated into the first slice as a carry-in from a preceding slice. This allows for optimized carry handling, reducing delays and improving computational efficiency. The method includes generating a carry-out from the first slice based on the first portions of the operands and the carry-indicator. The carry-out is then propagated to a subsequent slice for further processing. The carry-indicator ensures that the carry-out is correctly managed, whether it originates within the first slice or is received from an earlier slice. This approach enhances the speed and accuracy of arithmetic operations in digital circuits.
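
The two cases named in claim 2 correspond to the generate and propagate signals familiar from carry-lookahead adders. The sketch below shows one way to compute them for a slice; the function names and the 8-bit slice width are assumptions for illustration, and the patent may encode the indicator differently.

```python
def carry_indicator(a_part, b_part, slice_bits=8):
    """Return (generate, propagate) for one slice, computed without the carry-in.

    generate:  the slice produces a carry-out regardless of its carry-in.
    propagate: the slice produces a carry-out only if it receives a carry-in.
    """
    limit = (1 << slice_bits) - 1
    total = a_part + b_part
    generate = total > limit
    propagate = total == limit
    return generate, propagate


def carry_out(generate, propagate, carry_in):
    """Resolve the slice's carry-out once the carry-in is known."""
    return int(generate or (propagate and carry_in))
```

Because both signals depend only on the slice's own operand portions, they can be computed before the preceding slice has finished.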

Claim 3

Original Legal Text

3. The method of claim 2, further comprising storing the first partial result and the carry-indicator in a register of the pipelined carry chain.

Plain English Translation

This claim adds a storage step to the method of claim 2. The first partial result produced for the first slice and the carry-indicator determined for that slice are stored in a register of the pipelined carry chain. The carry-indicator stored here is the one defined in claim 2: it indicates either that the carry-out is generated in the first slice or that the carry-out is propagated from a carry-in supplied by the preceding slice. Registering both values presumably lets a later pipeline stage complete the slice's result once the incoming carry is known, without recomputing the slice's sum. The claim does not state how many pipeline stages there are or when the register is read.
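
A rough software model of the register in claim 3 is sketched below. The two-stage split, the `StageRegister` fields, and the function names are hypothetical; the claim only says that the first partial result and the carry-indicator are stored in a register of the pipelined carry chain.

```python
from dataclasses import dataclass


@dataclass
class StageRegister:
    """Assumed register contents: the partial result plus the carry-indicator."""
    partial: int      # slice sum computed without a carry-in
    generate: bool    # carry-out is generated inside the slice
    propagate: bool   # carry-out depends on the incoming carry


def stage_one(a_part, b_part, slice_bits=8):
    """First stage: compute and register the slice-local values."""
    limit = (1 << slice_bits) - 1
    total = a_part + b_part
    return StageRegister(total & limit, total > limit, total == limit)


def stage_two(reg, carry_in, slice_bits=8):
    """Second stage: finish the slice from the registered values and the carry-in."""
    limit = (1 << slice_bits) - 1
    result = (reg.partial + carry_in) & limit
    carry_out = int(reg.generate or (reg.propagate and carry_in))
    return result, carry_out
```

In this model the second stage never looks at the operands again; everything it needs is in the register.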

Claim 4

Original Legal Text

4. The method of claim 1, further comprising determining a second partial result based on the first partial result and a carry-in from a slice that precedes the first slice.

Plain English Translation

This claim adds a step to the method of claim 1, which processes operands in slices along a pipelined carry chain. The first slice produces a first partial result from its portions of the two operands. A second partial result is then determined from that first partial result and the carry-in arriving from the slice that precedes the first slice. In other words, the slice's local result is adjusted once the incoming carry is known, so that carries crossing slice boundaries are reflected in the output. The claim does not specify how the adjustment is implemented, and it does not name a particular arithmetic operation.

Claim 5

Original Legal Text

5. The method of claim 1, further comprising selecting a second partial result from a plurality of partial results using a carry-in from a slice that precedes the first slice.

Plain English Translation

This claim adds a selection step to the method of claim 1. Several candidate partial results are made available, and the carry-in from the slice that precedes the first slice is used to pick the second partial result from among them. This resembles a carry-select arrangement, in which the candidates correspond to the possible carry-in values and the actual carry-in acts as the select signal. The claim does not state how many candidates there are or how they are produced. Claims 15 to 17 describe a system variant in which a multiplexer performs the selection and the candidates are determined from the first partial result without using the carry-in.
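
Selecting a partial result with the carry-in, as claim 5 recites, can be sketched as a carry-select step. The example assumes exactly two candidates (one per possible carry-in value), an 8-bit slice, and hypothetical function names; the claim itself does not fix any of these.

```python
def candidate_results(a_part, b_part, slice_bits=8):
    """Precompute the slice result for carry-in 0 and for carry-in 1."""
    limit = (1 << slice_bits) - 1
    first_partial = (a_part + b_part) & limit
    return (first_partial, (first_partial + 1) & limit)


def select_result(candidates, carry_in):
    """Use the carry-in from the preceding slice as the select signal."""
    return candidates[carry_in]
```

For example, `select_result(candidate_results(0x10, 0x20), 1)` picks the candidate that already includes the incoming carry, so no addition is needed after the carry arrives.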

Claim 7

Original Legal Text

7. The method of claim 1, wherein the first portion of the first operand comprises bits comprising the least significant bit of the first operand, and wherein the first portion of the second operand comprises bits comprising the least significant bit of the second operand.

Plain English Translation

This claim narrows claim 1 by fixing where the first portions sit within the operands. The first portion of the first operand includes that operand's least significant bit, and the first portion of the second operand includes the least significant bit of the second operand. The first slice therefore handles the low-order end of both operands. The claim says nothing further about how those bits are processed; it only identifies which bits the first portions contain.

Claim 8

Original Legal Text

8. The method of claim 1, wherein the selector mask comprises: a first bit value corresponding to the first slice, and a second bit value corresponding to the second slice.

Plain English Translation

This claim describes the selector mask used in the method of claim 1. The mask comprises a first bit value corresponding to the first slice and a second bit value corresponding to the second slice, that is, one mask bit per slice. According to the abstract, the mask is determined from the selected element size and is used to select whether a partial result of one operation is carried into the next operation. The claim itself does not specify what each bit value means. A natural reading is that each bit controls carry handling at its slice, so that carries pass between slices belonging to the same element and are blocked at element boundaries.
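
One possible way to derive a per-slice selector mask from the element size is sketched below. The bit convention (1 means the slice's carry-out may pass to the next slice), the 8-bit slice width, the 64-bit operand width, and the name `selector_mask` are assumptions for this example and are not taken from the claims.

```python
def selector_mask(element_bits, slice_bits=8, operand_bits=64):
    """Build an integer mask with one bit per slice (bit i belongs to slice i)."""
    num_slices = operand_bits // slice_bits
    slices_per_element = element_bits // slice_bits
    mask = 0
    for i in range(num_slices):
        # The last slice of each element keeps its bit at 0 to block the carry.
        if (i + 1) % slices_per_element != 0:
            mask |= 1 << i
    return mask
```

Under these assumptions, 8-bit elements give a mask of all zeros (no carry ever crosses a slice), and a single 64-bit element gives ones everywhere except the top slice.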

Claim 10

Original Legal Text

10. The method of claim 1, wherein said selecting is performed by a multiplexer of the pipelined carry chain, and wherein the multiplexer is coupled between the first adder and the second adder.

Plain English Translation

This claim specifies the hardware that performs the selecting step of claim 1. The selection is made by a multiplexer that is part of the pipelined carry chain and is coupled between the first adder and the second adder. Read together with the abstract, the multiplexer decides, based on the selector mask, whether the partial result carried out of the first adder's operation is passed into the second adder's operation. In effect, the same chain can act as one wide adder when the carry is passed and as separate narrower adders when it is not, which is how the chain adapts to elements of different sizes.
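
The arrangement in claim 10 (a multiplexer coupled between two adders) can be modeled for two slices as follows. This is a sketch under assumptions: the alternative multiplexer input is taken to be a fixed a priori carry-in, the slices are 8 bits wide, and the names `mux` and `two_slice_add` are hypothetical.

```python
def mux(select, when_zero, when_one):
    """Two-input multiplexer."""
    return when_one if select else when_zero


def two_slice_add(a, b, mask_bit, a_priori_carry_in=0, slice_bits=8):
    """Add two 2-slice operands with a multiplexer between the adders."""
    limit = (1 << slice_bits) - 1
    # First adder: low slice.
    low = (a & limit) + (b & limit) + a_priori_carry_in
    low_sum, low_carry = low & limit, low >> slice_bits
    # Multiplexer: pass the first adder's carry-out, or restart with the
    # a priori carry-in when the mask marks an element boundary.
    carry_in = mux(mask_bit, a_priori_carry_in, low_carry)
    # Second adder: high slice.
    high = ((a >> slice_bits) & limit) + ((b >> slice_bits) & limit) + carry_in
    return ((high & limit) << slice_bits) | low_sum
```

With `mask_bit=1` the two slices behave as one 16-bit adder; with `mask_bit=0` they behave as two independent 8-bit adders.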

Claim 12

Original Legal Text

12. The system of claim 11, wherein the first adder also determines a carry-indicator based on the first portion of the first operand and the first portion of the second operand, wherein the carry-indicator indicates one of: i) the carry-out is generated in the first slice, and ii) the carry-out is propagated to the first slice as a carry-in from a slice that precedes the first slice.

Plain English Translation

This invention relates to a digital arithmetic system designed to improve the efficiency and accuracy of carry propagation in multi-slice adder circuits. The system addresses the challenge of managing carry signals in large-scale arithmetic operations, where delays and inaccuracies can occur due to the sequential nature of carry propagation across multiple slices. The system includes a first adder configured to process a first portion of a first operand and a first portion of a second operand. The first adder generates a sum output and a carry-out signal based on these portions. Additionally, the first adder determines a carry-indicator that distinguishes between two scenarios: either the carry-out is generated within the first slice itself, or the carry-out is propagated to the first slice as a carry-in from a preceding slice. This carry-indicator enhances the system's ability to track and manage carry signals, reducing latency and improving computational accuracy. The system may also include additional adders and logic circuits to handle subsequent operand portions, ensuring seamless carry propagation across multiple slices. The invention is particularly useful in high-performance computing applications where efficient arithmetic operations are critical.

Claim 13

Original Legal Text

13. The system of claim 12, wherein the first partial result and the carry-indicator are stored in a register of the pipelined carry chain.

Plain English Translation

The system relates to a pipelined carry chain used in digital arithmetic circuits, particularly for high-speed addition operations. The problem addressed is the efficient handling of carry propagation delays in multi-bit arithmetic operations, which can limit performance in digital processors and other computing systems. Traditional carry chains suffer from long propagation delays, which slow down operations in critical paths. The system includes a pipelined carry chain that processes partial results and carry-indicators in stages, reducing overall latency. The carry chain is designed to store intermediate results, including a first partial result and a carry-indicator, in a register within the pipelined structure. This allows the system to break down the addition process into smaller, manageable steps, enabling higher throughput and improved performance. The register storage ensures that intermediate values are preserved between pipeline stages, preventing data loss and ensuring correct computation. The pipelined approach minimizes the impact of carry propagation delays by processing multiple bits in parallel, while the register storage maintains data integrity across pipeline stages. This design is particularly useful in high-performance computing applications where fast arithmetic operations are critical.

Claim 14

Original Legal Text

14. The system of claim 12, wherein the determination logic also determines a second partial result based on the first partial result and a carry-in from the slice that precedes the first slice.

Plain English Translation

This claim adds a function to the determination logic of the system of claim 12. In addition to its other functions, the determination logic determines a second partial result based on the first partial result and a carry-in from the slice that precedes the first slice. The first partial result reflects only the first slice's own operand portions; the second partial result also accounts for the carry arriving from the preceding slice. This is the system counterpart of method claim 4. The claim does not say whether the determination logic is combinational or sequential, and it does not name a particular arithmetic operation.

Claim 15

Original Legal Text

15. The system of claim 12, wherein the determination logic also selects a second partial result from a plurality of predetermined partial results using a carry-in from the slice that precedes the first slice.

Plain English Translation

This claim adds a selection function to the determination logic of the system of claim 12. The determination logic selects a second partial result from a plurality of predetermined partial results, using the carry-in from the slice that precedes the first slice as the selector. Because the candidates are determined in advance, the slice presumably does not have to wait for the carry-in before starting its own computation; once the carry-in arrives, only the selection remains. This is the system counterpart of method claim 5. The claim does not state how many predetermined partial results there are.

Claim 16

Original Legal Text

16. The system of claim 15, wherein the determination logic comprises a multiplexer that uses the carry-in from the slice that precedes the first slice to select the second partial result from the plurality of partial results.

Plain English Translation

This claim specifies how the selection in claim 15 is made. The determination logic comprises a multiplexer that uses the carry-in from the slice that precedes the first slice to select the second partial result from the plurality of partial results. Put differently, the partial results are the multiplexer's data inputs, the carry-in is its select input, and the second partial result is its output.

Claim 17

Original Legal Text

17. The system of claim 15, wherein the plurality of partial results is determined using the first partial result without using the carry-in and is stored in memory coupled to the pipelined carry chain.

Plain English Translation

The invention relates to a pipelined carry chain system for arithmetic operations, particularly in digital signal processing or high-performance computing where fast carry propagation is critical. The problem addressed is the inefficiency in traditional carry chain designs, which often require full carry propagation through multiple stages, leading to latency and power consumption. The system includes a pipelined carry chain that processes arithmetic operations in stages, where each stage generates partial results. The key improvement involves determining a plurality of partial results using a first partial result without relying on a carry-in signal. This reduces dependency on carry propagation, improving speed and efficiency. The partial results are stored in memory coupled to the pipelined carry chain, allowing for faster access and further processing. The system may also include a carry propagation unit that generates a carry-out signal for subsequent stages, ensuring correct arithmetic operations. The pipelined carry chain may be configured to perform addition, subtraction, or other arithmetic operations, with the partial results being used to compute a final result. The memory coupled to the carry chain stores intermediate data, enabling efficient pipelining and reducing latency. This design is particularly useful in high-speed computing applications where minimizing carry propagation delays is essential.

Claim 18

Original Legal Text

18. The system of claim 11, wherein the first portion of the first operand comprises bits comprising the least significant bit of the first operand, and wherein the first portion of the second operand comprises bits comprising the least significant bit of the second operand.

Plain English Translation

This claim is the system counterpart of method claim 7. It narrows the system of claim 11 by fixing where the first portions sit within the operands: the first portion of the first operand includes that operand's least significant bit, and the first portion of the second operand includes the least significant bit of the second operand. The first slice therefore handles the low-order end of both operands. The claim does not add any special processing for those bits.

Claim 19

Original Legal Text

19. The system of claim 11, wherein the selector mask comprises: a first bit value corresponding to the first slice, and a second bit value corresponding to the second slice.

Plain English Translation

This claim is the system counterpart of method claim 8. It describes the selector mask of the system of claim 11: the mask comprises a first bit value corresponding to the first slice and a second bit value corresponding to the second slice. According to the abstract, the mask is determined from the selected element size and governs whether a partial result is carried from one operation into the next. The claim does not specify what each bit value means or how the mask is stored.

Claim 20

Original Legal Text

20. The system of claim 11, wherein the a priori carry-in is set to a first value when the operation is an absolute value operation on positive-signed operands, and wherein the a priori carry-in is set to a second value different from the first value when the operation is an absolute value operation on negative-signed operands.

Plain English Translation

This claim concerns the a priori carry-in of the system of claim 11 when the operation is an absolute value operation. The a priori carry-in is set to a first value when the operands are positive-signed and to a second, different value when the operands are negative-signed. The claim does not give the values. In two's-complement arithmetic, negating a value means inverting its bits and adding one, and the added one can be supplied as a carry-in. One plausible reading is therefore a carry-in of 0 for positive operands, which pass through unchanged, and a carry-in of 1 for negative operands, which are inverted and then incremented. The claim does not describe how the sign is detected.
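
The sign-dependent carry-in of claim 20 can be illustrated with the standard two's-complement identity that negation equals bitwise inversion plus one. The concrete values (0 for positive, 1 for negative), the 8-bit width, and the function name `abs_slice` are assumptions for this sketch; the claim only requires that the two values differ.

```python
def abs_slice(value, bits=8):
    """Absolute value of a two's-complement number held in `bits` bits."""
    limit = (1 << bits) - 1
    negative = (value >> (bits - 1)) & 1
    # Assumed a priori carry-in: 0 for a positive operand, 1 for a negative one.
    a_priori_carry_in = negative
    # Negative operands are inverted; positive operands pass through unchanged.
    inverted = value ^ (limit if negative else 0)
    return (inverted + a_priori_carry_in) & limit
```

Note that the most negative value (0x80 for 8 bits) maps to itself, which is the usual two's-complement overflow case.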

Patent Metadata

Filing Date

October 13, 2020

Publication Date

December 6, 2022

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, FAQs, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “Carry chain for SIMD operations” (US-11520582). https://patentable.app/patents/US-11520582

© 2026 Nomic Interactive Technology LLC. Machine-readable context available at /api/llm-context/US-11520582. See llms.txt for full attribution policy.