US-9594557

Floating point execution unit for calculating packed sum of absolute differences

Published: March 14, 2017

Assignee: not available in USPTO data we have

Inventors: not available in USPTO data we have

Technical Abstract

A method provides support for packed sum of absolute difference operations in a floating point execution unit, e.g., a scalar or vector floating point execution unit. Existing adders in a floating point execution unit may be utilized along with minimal additional logic in the floating point execution unit to support efficient execution of a fixed point packed sum of absolute differences instruction within the floating point execution unit, often eliminating the need for a separate vector fixed point execution unit in a processor architecture, and thereby leading to less logic and circuit area, lower power consumption and lower cost.

Patent Claims

18 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 1

Original Legal Text

1. A method of performing a packed sum of absolute differences operation, the method comprising: in a processing unit, receiving an instruction associated with a packed sum of absolute differences operation using first and second operands; and performing the packed sum of differences operation using the first and second operands in a floating point execution unit coupled to the processing unit, wherein the floating point execution unit includes exponential logic configured to perform an exponent calculation associated with a floating point operation and fractional logic configured to perform a significand calculation associated with the floating point operation, and wherein performing the packed sum of differences operation includes performing at least one absolute difference calculation for the packed sum of absolute differences operation using the exponential logic and performing at least one absolute difference calculation for the packed sum of absolute differences operation using the fractional logic.

Plain English Translation

A processing unit performs a packed sum of absolute differences operation. It receives an instruction for this operation, along with two input operands. A floating-point execution unit, connected to the processing unit, carries out the packed sum of absolute differences. This execution unit normally handles floating-point calculations, using exponential and fractional logic. To perform the packed sum of absolute differences, the floating-point execution unit uses its existing exponential logic to calculate at least one absolute difference, and its existing fractional logic to calculate at least one other absolute difference required for the final result.
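
As a point of reference for the operation itself (not the claimed hardware), a packed sum of absolute differences can be sketched in software as follows. The function name `packed_sad`, the default of four fields, and the 8-bit field width are illustrative assumptions; claim 1 does not fix the operand layout.

```python
def packed_sad(a: int, b: int, fields: int = 4, width: int = 8) -> int:
    """Sum of absolute differences over packed unsigned fields of two words.

    Assumption: each operand packs `fields` unsigned values of `width` bits,
    with field 0 in the least significant bits.
    """
    mask = (1 << width) - 1
    total = 0
    for i in range(fields):
        x = (a >> (i * width)) & mask  # i-th field of the first operand
        y = (b >> (i * width)) & mask  # i-th field of the second operand
        total += abs(x - y)
    return total
```

For example, `packed_sad(0x10203040, 0x40302010)` adds the per-byte differences 48, 16, 16 and 48.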

Claim 2

Original Legal Text

2. The method of claim 1 , wherein the floating point execution unit performs the packed sum of absolute differences operation in a single pass.

Plain English Translation

In the method of claim 1, the floating-point execution unit completes the packed sum of absolute differences operation in a single pass through its logic. All of the absolute differences and their summation are computed without iterating or making multiple passes through the execution unit, which reduces latency.

Claim 3

Original Legal Text

3. The method of claim 1 , wherein the processing unit does not include a vector fixed point execution unit.

Plain English Translation

The method from the first description is performed on a processing unit that does not include a separate vector fixed-point execution unit. The packed sum of absolute differences is performed within the floating-point execution unit, removing the need for dedicated fixed-point hardware. This reduces the overall complexity and cost of the processor.

Claim 4

Original Legal Text

4. The method of claim 1 , wherein the floating point execution unit is a vector floating point execution unit including a plurality of processing lanes, wherein each of the first and second operands includes a plurality of operand words, and wherein each of the processing lanes performs at least one packed sum of absolute differences operation between corresponding operand words in the first and second operands.

Plain English Translation

In the method from the first description, the floating-point execution unit is a vector unit with multiple processing lanes. The two input operands each contain multiple "operand words". Each processing lane in the vector unit independently calculates at least one packed sum of absolute differences between corresponding operand words from the two input operands, enabling parallel processing of the data.

Claim 5

Original Legal Text

5. The method of claim 4 , wherein each operand word includes 32 bits of pixel data in an 8R8B8G8A format, wherein each operand includes four words, wherein the vector floating point execution unit includes four processing lanes, and wherein the vector floating point execution unit is configured to perform four packed sum of absolute differences operations in parallel.

Plain English Translation

Building on the method with a vector floating-point unit with multiple processing lanes, each operand word contains 32 bits of pixel data in the 8R8B8G8A format (8 bits each for red, blue, green, and alpha). Each operand is made up of four such words. The vector floating-point execution unit has four processing lanes, allowing it to perform four packed sum of absolute differences calculations in parallel.
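
The four-lane arrangement can be modeled in software roughly as below. This is a sketch under stated assumptions: `lane_sad` and `vector_packed_sad` are hypothetical names, each operand is represented as a list of four 32-bit words, and the position of each color channel within a word is not modeled because the sum does not depend on it.

```python
def lane_sad(word_a: int, word_b: int) -> int:
    """One lane: sum of absolute differences over the four 8-bit channels."""
    total = 0
    for shift in (0, 8, 16, 24):
        total += abs(((word_a >> shift) & 0xFF) - ((word_b >> shift) & 0xFF))
    return total


def vector_packed_sad(op_a: list[int], op_b: list[int]) -> list[int]:
    """Four lanes, each handling the corresponding word of both operands."""
    if len(op_a) != 4 or len(op_b) != 4:
        raise ValueError("each operand is assumed to hold four 32-bit words")
    # In hardware the lanes would run in parallel; here they run in a loop.
    return [lane_sad(a, b) for a, b in zip(op_a, op_b)]
```

Each element of the returned list corresponds to one processing lane, so one call yields four sums.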

Claim 6

Original Legal Text

6. The method of claim 4 , wherein each operand word includes four 8-bit pixels, wherein each operand includes four words, wherein the vector floating point execution unit includes four processing lanes, and wherein the vector floating point execution unit is configured to perform sixteen packed sum of absolute differences operations in parallel.

Plain English Translation

Expanding on the method utilizing a vector floating-point unit, each operand word consists of four 8-bit pixels, and each operand contains four words. Given that the vector floating-point execution unit includes four processing lanes, it can perform sixteen packed sum of absolute differences operations concurrently.

Claim 7

Original Legal Text

7. The method of claim 1 , wherein the floating point execution unit includes at least one multiplexer that repurposes the exponential logic to perform at least one absolute difference calculation for the packed sum of absolute differences operation.

Plain English Translation

In the method from the first description, the floating-point execution unit includes at least one multiplexer. The multiplexer repurposes the existing exponential logic within the floating-point execution unit to perform at least one absolute difference calculation needed for the packed sum of absolute differences. This reuse of existing hardware avoids the need for dedicated absolute difference calculation units.

Claim 8

Original Legal Text

8. The method of claim 7 , wherein the exponential logic includes at least one adder configured to perform an addition operation with an exponent of a floating point operand when performing the floating point operation, and wherein the at least one multiplexer includes a first multiplexer having an output coupled to an input of the adder, the first multiplexer including first and second inputs, the first input configured to receive the exponent from the floating point operand and the second input configured to receive at least a portion of the first operand associated with the instruction, wherein the multiplexer is configured to pass the exponent from the floating point operand to the adder when performing the floating point operation, and pass the portion of the first operand to the adder when performing the packed sum of absolute differences operation.

Plain English Translation

Expanding on the floating point unit that repurposes exponential logic with multiplexers, the exponential logic contains an adder used for floating-point exponent calculations. A first multiplexer, placed before the adder, has two inputs: one receives the exponent from a floating-point operand, and the other receives at least a portion of the first operand associated with the packed sum of absolute differences instruction. During floating-point operations, the multiplexer passes the exponent to the adder. During the packed sum of absolute differences, the multiplexer passes the portion of the first operand to the adder, thereby re-using the adder for absolute difference calculations.
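
The input selection described above behaves like a two-input multiplexer in front of the adder. The following is a minimal behavioral sketch, not the patented circuit: the names `mux2`, `adder_input` and `select_sad`, the 8-bit portion width, and the `portion_index` parameter are all assumptions made for illustration.

```python
def mux2(select_sad: bool, fp_input: int, sad_input: int) -> int:
    """Two-input multiplexer: pass the SAD input when select_sad is set."""
    return sad_input if select_sad else fp_input


def adder_input(select_sad: bool, fp_exponent: int, first_operand: int,
                portion_index: int = 0, width: int = 8) -> int:
    """Value presented to the exponent adder for the current instruction.

    Assumption: the 'portion' of the first operand is one `width`-bit field.
    """
    mask = (1 << width) - 1
    portion = (first_operand >> (portion_index * width)) & mask
    return mux2(select_sad, fp_exponent, portion)
```

With `select_sad` false the adder sees the floating point exponent, and with it true the adder sees the selected operand portion.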

Claim 9

Original Legal Text

9. The method of claim 7 , wherein the exponential logic includes a multiply exponent adder, wherein the at least one multiplexer includes first and second multiplexers coupled to the multiply exponent adder, the first multiplexer configured to select between an exponent of a first floating point operand and a first portion of the first operand associated with the instruction, and the second multiplexer configured to select between an exponent of a second floating point operand and a first portion of the second operand associated with the instruction.

Plain English Translation

Building upon the method that repurposes exponential logic using multiplexers, the exponential logic contains a multiply exponent adder. The multiplexer system consists of a first and second multiplexer, both coupled to the multiply exponent adder. The first multiplexer selects between the exponent of a first floating-point operand and a first portion of the first operand associated with the instruction. The second multiplexer selects between the exponent of a second floating-point operand and a first portion of the second operand associated with the instruction, allowing the adder to calculate an absolute difference.

Claim 10

Original Legal Text

10. The method of claim 9 , wherein the exponential logic further includes an operand exponent unbiasing adder, wherein the at least one multiplexer further includes third and fourth multiplexers coupled to the operand exponent unbiasing adder, the third multiplexer configured to select between an exponent of a third floating point operand and a second portion of one of the first and second operands associated with the instruction, and the fourth multiplexer configured to select between an output of the multiply exponent adder and a second portion of an other of the first and second operands associated with the instruction.

Plain English Translation

Further detailing the exponential logic repurposing approach, the logic also includes an operand exponent unbiasing adder. A third and fourth multiplexer are coupled to this adder. The third multiplexer chooses between the exponent of a third floating-point operand and a second portion of one of the first or second operands. The fourth multiplexer chooses between the output of the multiply exponent adder and a second portion of the other of the first or second operands.

Claim 11

Original Legal Text

11. The method of claim 10 , wherein the exponential logic further includes a result exponent rebiasing adder, wherein the at least one multiplexer further includes fifth and sixth multiplexers coupled to the result exponent rebiasing adder, the fifth multiplexer configured to select between a bias and a third portion of one of the first and second operands associated with the instruction, and the sixth multiplexer configured to select between an output of the operand exponent unbiasing adder and a third portion of an other of the first and second operands associated with the instruction.

Plain English Translation

Building on the description of the exponential logic repurposing with multiplexers, a result exponent rebiasing adder is included. Fifth and sixth multiplexers are coupled to this adder. The fifth multiplexer selects between a bias value and a third portion of one of the first and second operands. The sixth multiplexer selects between the output of the operand exponent unbiasing adder and a third portion of the other of the first and second operands, allowing the adder to perform an absolute difference calculation.

Claim 12

Original Legal Text

12. The method of claim 11 , wherein the exponential logic further includes a fourth adder configured to receive fourth portions of the first and second operands associated with the instruction, and wherein each of the multiply exponent adder, operand exponent unbiasing adder, result exponent rebiasing adder, and fourth adder is configured to calculate an absolute difference between the respective first, second, third and fourth portions of the first and second operands associated with the instruction.

Plain English Translation

Expanding on the exponential logic, the method adds a fourth adder that receives fourth portions of the first and second operands. The multiply exponent adder, the operand exponent unbiasing adder, the result exponent rebiasing adder, and the fourth adder each compute the absolute difference between their respective first, second, third, and fourth portions of the first and second operands associated with the instruction.
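
One common way for an adder to produce an absolute difference is two's complement subtraction followed by a conditional negation. The claim does not say how the adders here do it, so the sketch below is an assumption, as are the 8-bit portion width and the names `abs_diff_via_adder` and `four_portion_abs_diffs`.

```python
WIDTH = 8                      # assumed portion width in bits
MASK = (1 << WIDTH) - 1


def abs_diff_via_adder(a: int, b: int) -> int:
    """|a - b| for unsigned WIDTH-bit inputs using only addition."""
    raw = a + (~b & MASK) + 1  # a - b as a + (one's complement of b) + 1
    carry_out = raw >> WIDTH   # 1 when a >= b, 0 when a < b
    diff = raw & MASK
    if carry_out:
        return diff            # result is already non-negative
    return (-diff) & MASK      # negative result: negate it


def four_portion_abs_diffs(a_word: int, b_word: int) -> list[int]:
    """One absolute difference per adder, one adder per 8-bit portion."""
    return [abs_diff_via_adder((a_word >> s) & MASK, (b_word >> s) & MASK)
            for s in (0, 8, 16, 24)]
```

Each list element stands in for the output of one of the four adders named in the claim.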

Claim 13

Original Legal Text

13. The method of claim 12 , wherein the floating point execution unit is further configured to sum the absolute differences calculated by the multiply exponent adder, operand exponent unbiasing adder, result exponent rebiasing adder, and fourth adder.

Plain English Translation

Expanding on the method, the floating-point execution unit is configured to sum the individual absolute differences that are calculated by the multiply exponent adder, operand exponent unbiasing adder, result exponent rebiasing adder, and the fourth adder. This summation step combines the results of the absolute difference calculations to produce a final result for the packed sum of absolute differences operation.

Claim 14

Original Legal Text

14. The method of claim 13 , wherein the exponential logic further includes a compressor configured to receive at least three absolute differences calculated by at least a subset of the multiply exponent adder, operand exponent unbiasing adder, result exponent rebiasing adder, and fourth adder.

Plain English Translation

The method further enhances the exponential logic by including a compressor. This compressor is designed to receive at least three of the absolute differences computed by a subset of the adders: the multiply exponent adder, the operand exponent unbiasing adder, the result exponent rebiasing adder, and the fourth adder. The compressor reduces the number of partial sums that must be added together, improving efficiency.

Claim 15

Original Legal Text

15. The method of claim 14 , wherein the fractional logic includes at least one adder, and wherein the floating point execution unit includes at least one multiplexer coupled to an input of the adder and configured to repurpose the adder in the fractional logic to sum an output of the compressor with at least one absolute difference calculated by the exponential logic.

Plain English Translation

Building on the method with the compressor in the exponential logic, the fractional logic includes at least one adder. The floating-point execution unit incorporates at least one multiplexer that directs inputs to this adder. The multiplexer reuses the adder in the fractional logic to sum the compressed output from the exponential logic's compressor with at least one absolute difference, calculated directly by the exponential logic, producing a final packed sum of absolute differences result.
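
The reduction path can be illustrated with a 3:2 carry-save compressor followed by an ordinary two-input add. The claims only say that a compressor receives at least three absolute differences and that a fractional-logic adder sums its output with at least one more; the 3:2 structure, the second compression stage, and the names `compress_3_2` and `sum_abs_diffs` are assumptions of this sketch.

```python
def compress_3_2(x: int, y: int, z: int) -> tuple[int, int]:
    """Carry-save 3:2 compressor: returns (s, c) with s + c == x + y + z."""
    s = x ^ y ^ z                            # bitwise sum without carries
    c = ((x & y) | (x & z) | (y & z)) << 1   # carries, shifted into place
    return s, c


def sum_abs_diffs(d0: int, d1: int, d2: int, d3: int) -> int:
    """Reduce four absolute differences to a single sum."""
    s, c = compress_3_2(d0, d1, d2)    # compressor takes three differences
    s2, c2 = compress_3_2(s, c, d3)    # assumed second stage folds in the fourth
    return s2 + c2                     # final two-input add (fractional adder)
```

A compressor defers carry propagation, so only the last addition needs a full carry-propagating adder.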

Claim 16

Original Legal Text

16. The method of claim 1 , wherein the instruction is a packed sum of absolute differences instruction defined in an instruction set for the processing unit.

Plain English Translation

Within the initial method for performing a packed sum of absolute differences, the instruction issued to the processing unit is a specifically defined "packed sum of absolute differences instruction." This instruction is part of the instruction set architecture (ISA) of the processing unit, allowing developers to directly invoke this operation using a dedicated instruction.

Claim 17

Original Legal Text

17. The method of claim 1 , wherein the floating point execution unit includes at least one multiplexer configured to repurpose the fractional logic to perform at least a portion of the packed sum of absolute differences operation.

Plain English Translation

In the initial method description, the floating-point execution unit utilizes at least one multiplexer to repurpose the existing fractional logic. This allows the fractional logic to perform at least a portion of the packed sum of absolute differences operation, reducing the need for dedicated hardware and increasing the utilization of existing resources within the floating-point unit.

Claim 18

Original Legal Text

18. The method of claim 1 , wherein the first and second operands are respectively stored in first and second registers in a register file, and wherein the instruction identifies the first and second registers.

Plain English Translation

As defined in the first method description, the first and second operands used in the packed sum of absolute differences operation are stored in first and second registers, respectively. These registers reside within a register file. The instruction for the operation explicitly identifies the first and second registers containing the operands, enabling the processor to fetch the necessary data for computation.

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G06F

Patent Metadata

Filing Date

March 18, 2016

Publication Date

March 14, 2017
