A method and apparatus are disclosed for providing motion estimation (ME) for large-size blocks of image data during image processing using small-size block processing logic. An embodiment method includes obtaining a large-size block for ME processing and dividing the large-size block into a plurality of small-size blocks. The large-size block comprises an integer multiple of the small-size blocks. The small-size blocks are then processed in parallel using a small-size block ME processing algorithm. An embodiment apparatus includes a processor configured to implement the method for large-size block ME processing using small-size block ME processing logic, and a shared memory register for storing at different times the 16×16 blocks.
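The division and recombination steps described above can be illustrated with a minimal sketch. It assumes the large-size block is a square list of rows of byte values and that the small-size blocks are taken in row-major order; the function names `split_block` and `merge_blocks` are hypothetical and do not appear in the patent, and the ME search itself is not modelled.

```python
def split_block(block, small):
    """Split a square block (list of rows) into small x small sub-blocks.

    Sub-blocks are returned in row-major order (an assumption of this sketch).
    """
    size = len(block)
    if size % small != 0 or any(len(row) != size for row in block):
        raise ValueError(
            "large block must be square and an integer multiple of the small block"
        )
    subs = []
    for top in range(0, size, small):
        for left in range(0, size, small):
            subs.append([row[left:left + small] for row in block[top:top + small]])
    return subs


def merge_blocks(subs, size):
    """Reassemble row-major sub-blocks into one size x size block."""
    small = len(subs[0])
    per_row = size // small
    out = [[0] * size for _ in range(size)]
    for idx, sub in enumerate(subs):
        top = (idx // per_row) * small
        left = (idx % per_row) * small
        for r, row in enumerate(sub):
            out[top + r][left:left + small] = row
    return out


# Synthetic 64x64 block of byte values, used only for illustration.
large = [[(r * 64 + c) % 256 for c in range(64)] for r in range(64)]
small_blocks = split_block(large, 16)
restored = merge_blocks(small_blocks, 64)
```

Here a 64×64 block yields sixteen 16×16 blocks, and merging them returns the same image data as the original large-size block.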
Legal claims defining the scope of protection, as filed with the USPTO.
1. A method for motion estimation (ME) for a large-size block of image data, the method comprising: obtaining a large-size block for ME processing; dividing the large-size block into a plurality of small-size blocks, wherein the small-size blocks comprise M×M blocks of data bytes, wherein M is an integer; processing each of the small-size blocks in parallel using a small-size block ME processing algorithm using M clock cycles for M line motion searches; and processing a total number of M of the M×M blocks using M×M clock cycles, wherein the large-size block comprises an integer multiple of the small-size blocks.
2. The method of claim 1, wherein the small-size blocks are 16×16 blocks of data bytes.
3. The method of claim 1, further comprising combining the processed small-size blocks into a processed large-size block corresponding to the large-size block.
4. The method of claim 1, wherein the small-size blocks combined comprise a same image data of the large-size block.
5. The method of claim 1, wherein the small-size blocks are processed using a single shared register that stores each one of the small-size blocks at a time.
6. The method of claim 1, wherein processing the small-size blocks in parallel comprises processing the small-size blocks at about a same time using time division multiplexing.
7. The method of claim 1, wherein the large-size block is a 64×64 block, and wherein the 64×64 block is divided into 16 of the small-size blocks.
8. The method of claim 7, wherein the small-size block ME processing algorithm is a current standard 16×16 block ME processing algorithm.
9. An apparatus for implementing motion estimation (ME) for a large-size block of image data, the apparatus comprising: a processor configured to: obtain a 64×64 block of bytes of image data for ME processing; divide the 64×64 block into a plurality of 16×16 blocks of data bytes; and process the 16×16 blocks in parallel using a ME processing algorithm for 16×16 blocks, wherein the processor is configured to process each of the 16×16 blocks using 16 clock cycles for 16 line motion searches and process a total number of 16 of the 16×16 blocks using 256 clock cycles.
10. The apparatus of claim 9, wherein the processor is configured to process each of the 16×16 blocks using 64 clock cycles for 64 line motion searches and processes a total number of 16 of the 16×16 blocks using 1024 clock cycles.
11. The apparatus of claim 9, wherein the processor is configured to use a maximum number of clock cycles for ME processing that includes a plurality of first clock cycles for line motion searches for the 16×16 blocks and a plurality of second clock cycles for actual motion search calculation.
12. The apparatus of claim 9, wherein the processor is based on a 1080P60 HD format and is configured to use a maximum number of 6,400 clock cycles for ME processing.
13. The apparatus of claim 9 further comprising a shared memory register for storing the 16×16 blocks at different times, wherein the shared memory register is configured to store the 16×16 blocks using time division multiplexing.
14. The apparatus of claim 13, wherein the memory register is a 16×16 8-bit register that stores a total of 2048 bits.
15. A network component for video coding, the network component comprising: a processor configured to: obtain a large-size block of bytes of image data for motion estimation (ME); divide the large-size block into a plurality of small-size blocks of bytes that comprise a same data, wherein the small-size blocks comprise M×M blocks of data bytes, wherein M is an integer; process each of the small-size blocks for ME individually and in parallel using a small-size block ME processing algorithm using M clock cycles for M line motion searches; process a total number of M of the M×M blocks using M×M clock cycles; and a single shared register for storing at different times the small-size blocks.
16. The network component of claim 15, wherein the processor is configured to process the small-size blocks individually using the small-size block ME processing algorithm to reduce a number of clock cycles of the processor by 75% in comparison to processing the large-size block using a large-size block ME processing algorithm.
17. The network component of claim 16, wherein the processor is configured to reduce the number of clock cycles to improve performance of ME and actual motion search calculation.
18. The network component of claim 16, wherein the processor is configured to reduce the number of clock cycles to simplify logic and cost of the processor.
19. The network component of claim 15, wherein a size of the shared register for storing at different times the small-size blocks is reduced in comparison to a second register for storing the large-size block.
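The cycle and register figures recited in the claims can be checked with simple arithmetic. The sketch below is only an illustration under stated assumptions: the helper names `line_search_cycles` and `register_bits` are hypothetical, each sample is taken to be one 8-bit byte, and the 64×64 register size is computed here for comparison rather than quoted from the claims.

```python
def line_search_cycles(cycles_per_block, num_blocks):
    """Total line-motion-search cycles when each block needs cycles_per_block cycles."""
    return cycles_per_block * num_blocks


def register_bits(side, bits_per_sample=8):
    """Storage in bits for a side x side block of samples (8-bit samples assumed)."""
    return side * side * bits_per_sample


# Claim 9: 16 cycles per 16x16 block, 16 blocks in a 64x64 block.
cycles_claim_9 = line_search_cycles(16, 16)

# Claim 10: 64 cycles per 16x16 block, 16 blocks.
cycles_claim_10 = line_search_cycles(64, 16)

# Claim 14: a 16x16 register of 8-bit entries.
shared_register = register_bits(16)

# For comparison (not stated in the claims): a register holding a whole 64x64 block.
large_register = register_bits(64)
```

With these inputs the totals come to 256 and 1024 clock cycles and 2048 bits, matching claims 9, 10, and 14; under the same 8-bit assumption a register holding the whole 64×64 block would need 32768 bits, sixteen times the size of the shared 16×16 register.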
October 2, 2012
February 23, 2016