US-8880829

Method and apparatus for efficient, low-latency, streaming memory copies

Published: November 4, 2014
Assignee: not available in USPTO data we have
Inventors: not available in USPTO data we have
Technical Abstract

Systems, methods, and apparatus with improved techniques for copying data from a source memory location to a destination memory location are disclosed. An exemplary method includes receiving a source address that indicates the source memory location, a destination address that indicates the destination memory location, and a size indicator that indicates the size of the data. When the size is less than a threshold size, a particular pointer in a jump table, selected based upon the size, is accessed; that pointer points to particular load and store instructions. The jump table includes a plurality of pointers, each pointing to a corresponding one of a plurality of load and store instructions. The particular load and store instructions are then executed by a processor of the computing device to copy the data from the source memory location to the destination memory location. Several other aspects that may be used in connection with these steps to further improve copy efficiency are also disclosed.
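The size-based dispatch described in the abstract can be modeled in a few lines. The Python sketch below is illustrative only and is not the patented implementation: the threshold values are placeholders (the patent publishes none), and each jump-table entry is a routine specialized for one copy size, built from a precomputed sequence of 8-, 4-, 2- and 1-byte moves that stands in for straight-line load/store instructions. Claims 4 to 7 add a larger-copy path that splits at a second threshold into "pump priming" and a "big block" routine; because their internals are not described, both are modeled by a hypothetical chunked loop, and the ARM source preload and register pushes are omitted. All names (`SMALL_THRESHOLD`, `BIG_BLOCK_THRESHOLD`, `make_fixed_copy`, `JUMP_TABLE`, `copy_blocks`, `memcopy`) are hypothetical.

```python
from typing import Callable, List, Tuple

# Placeholder values: the patent names a "threshold size" and a "second
# threshold" but does not publish numbers for either.
SMALL_THRESHOLD = 32        # sizes below this dispatch through the jump table
BIG_BLOCK_THRESHOLD = 1024  # sizes above this take the big-block path

CopyFn = Callable[[bytearray, int, bytes, int], None]


def make_fixed_copy(n: int) -> CopyFn:
    """Return a routine that copies exactly n bytes.

    The 8/4/2/1-byte chunk plan is computed once, so the returned routine
    performs a fixed sequence of moves with no size test, much like an
    unrolled run of load/store instructions.
    """
    plan: List[Tuple[int, int]] = []
    offset = 0
    for width in (8, 4, 2, 1):
        while n - offset >= width:
            plan.append((offset, width))
            offset += width

    def copy_n(dst: bytearray, d: int, src: bytes, s: int) -> None:
        for off, width in plan:
            dst[d + off:d + off + width] = src[s + off:s + off + width]

    return copy_n


# Jump table indexed directly by copy size: JUMP_TABLE[n] copies n bytes.
JUMP_TABLE: List[CopyFn] = [make_fixed_copy(n) for n in range(SMALL_THRESHOLD)]


def copy_blocks(dst: bytearray, d: int, src: bytes, s: int, size: int, block: int) -> None:
    # Hypothetical stand-in for the large-copy loops; real code would also
    # issue source preloads some distance ahead of the loads.
    for off in range(0, size, block):
        n = min(block, size - off)
        dst[d + off:d + off + n] = src[s + off:s + off + n]


def memcopy(dst: bytearray, d: int, src: bytes, s: int, size: int) -> str:
    """Copy size bytes from src[s:] to dst[d:] and report the path taken."""
    if min(size, s, d) < 0 or s + size > len(src) or d + size > len(dst):
        raise ValueError("copy range out of bounds")
    if size < SMALL_THRESHOLD:
        JUMP_TABLE[size](dst, d, src, s)  # one indexed dispatch; the routine has no size test
        return "jump_table"
    if size <= BIG_BLOCK_THRESHOLD:
        copy_blocks(dst, d, src, s, size, block=16)  # modeled "pump priming" path
        return "pump_priming"
    copy_blocks(dst, d, src, s, size, block=64)  # modeled "big block" path
    return "big_block"
```

For example, a 13-byte request takes the `jump_table` path and runs `JUMP_TABLE[13]`, whose precomputed plan is one 8-byte, one 4-byte and one 1-byte move. The point of the table is that the per-size decision is made once, by indexing, instead of by a chain of size comparisons at copy time.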

Patent Claims
33 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method for copying data from a source memory location to a destination memory location on a computing device, the method comprising: receiving a source address indicating the source memory location; receiving a destination address indicating the destination memory location; receiving a size indicator indicating a copy size of the data; accessing, when the copy size is less than a threshold size, a particular pointer in a jump table that points to particular load and store instructions based upon the copy size of the data, the jump table including a plurality of pointers, each of the plurality of pointers pointing to a corresponding one of a plurality of load and store instructions; and executing the particular load-store instructions on the computing device to copy the data from the source memory location to the destination memory location.

2. The method of claim 1, including: utilizing single-lane Neon instructions for copy sizes less than a second threshold.

3. The method of claim 1 including utilizing a mix of ARM and Neon instructions.

4. The method of claim 1 including, when the copy size is greater than a threshold size: preloading the source address; and pushing a destination pointer register and a reserved register onto a stack memory.

5. The method of claim 4 including performing initial pump priming based upon the source pointer if the copy size of the data does not exceed a second threshold.

6. The method of claim 5, including selecting a preload distance based upon at least one of hardware characteristics of the computing device and experimental results.

7. The method of claim 4, including performing a big block handling routine if the copy size of the data exceeds a second threshold.

8. The method of claim 1, wherein each of the plurality of the load and store instructions corresponding to a data copy size consume a variable number of bytes and each of the plurality of pointers to the load and store instructions consume a fixed number of bytes such that the particular pointer for the copy size of the data can be accessed by adding a current program counter with a product of the copy size of the data times a number of bytes consumed by the particular pointer.

9. A method for copying data from a source memory location to a destination memory location on a computing device, the method comprising: receiving a source address indicating the source memory location; receiving a destination address indicating the destination memory location; receiving a size indicator indicating a copy size of the data; calculating, when the copy size is less than a threshold size, a pointer to a particular set of load and store instructions in one particular entry of a plurality of function table entries, each of the function table entries is a same fixed size, and the calculating of the pointer is based upon the copy size of the data and the fixed size of the function table entries; executing the particular set of load-store instructions on the computing device to copy the data from the source memory location to the destination memory location; and jumping, if the particular set of load and store instructions does not fit within the fixed size of the particular entry, to a remainder of the load and store instructions to complete the copy of the data from the source memory location to the destination memory location.

10. A computing apparatus comprising: at least one processor; memory to store data that is processed by the processor; a plurality of load/store instruction sets, each of the plurality of load/store instruction sets, when executed, transfer a particular number of bytes; a jump table including pointers to each of the plurality of load/store instruction sets; and a memory copy component that receives a source address, a destination address, and an indicator of a copy size of data to be copied and utilizes the jump table to initiate execution of a particular load/store instruction set based upon the copy size of the data to be copied to copy the data from the source address in the memory to the destination address in the memory.

11. The computing apparatus of claim 10, wherein the memory copy component utilizes single-lane Neon instructions for copy sizes less than a second threshold.

12. The computing apparatus of claim 10, wherein the memory copy component utilizes a mix of ARM and Neon instructions.

13. The computing apparatus of claim 10, wherein the memory copy component preloads, when the copy size is greater than a threshold size, the source address, and pushes a destination pointer register and a reserved register onto a stack memory.

14. The computing device of claim 13, wherein the memory copy component performs initial pump priming based upon the source pointer if the copy size of the data does not exceed a second threshold.

15. The computing device of claim 14, wherein the memory copy component selects a preload distance based upon at least one of hardware characteristics of the computing device and experimental results.

16. The computing device of claim 13, wherein the memory copy component performs a big block handling routine if the copy size of the data exceeds a second threshold.

17. The computing device of claim 10, wherein each of the plurality of the load/store instruction sets consume a variable number of bytes and each of the pointers consume a fixed number of bytes such that each of the pointers is accessed by adding a current program counter with a product of the copy size of the data times a number of bytes consumed by each corresponding pointer.

18. A computing apparatus comprising: means for receiving a source address indicating the source memory location; means for receiving a destination address indicating the destination memory location; means for receiving a size indicator indicating a copy size of the data; means for accessing, when the copy size is less than a threshold size, a particular pointer in a jump table that points to particular load and store instructions based upon the copy size of the data, the jump table including a plurality of pointers, each of the plurality of pointers pointing to a corresponding one of a plurality of load and store instructions; and means for executing the particular load-store instructions on the computing device to copy the data from the source memory location to the destination memory location.

19. The computing apparatus of claim 18, including: means for utilizing single-lane Neon instructions for copy sizes less than a second threshold.

20. The computing apparatus of claim 18 including means for utilizing a mix of ARM and Neon instructions.

21. The computing apparatus of claim 18 including, when the copy size is greater than a threshold size: means for preloading the source address; and means for pushing a destination pointer register and a reserved register onto a stack memory.

22. The computing apparatus of claim 21 including means for performing initial pump priming based upon the source pointer if the copy size of the data does not exceed a second threshold.

23. The computing apparatus of claim 22, including means for selecting a preload distance based upon at least one of hardware characteristics of the computing device and experimental results.

24. The computing apparatus of claim 21, including means for performing a big block handling routine if the copy size of the data exceeds a second threshold.

25. The computing apparatus of claim 18, wherein each of the plurality of the load and store instructions corresponding to a data copy size consume a variable number of bytes and each of the plurality of pointers to the load and store instructions consume a fixed number of bytes such that the particular pointer for the copy size of the data can be accessed by adding a current program counter with a product of the copy size of the data times a number of bytes consumed by the particular pointer.

26. A non-transitory, tangible computer readable storage medium, encoded with processor readable instructions to perform a method for copying data from a source memory location to a destination memory location on a computing device, the method comprising: receiving a source address indicating the source memory location; receiving a destination address indicating the destination memory location; receiving a size indicator indicating a copy size of the data; accessing, when the copy size is less than a threshold size, a particular pointer in a jump table that points to particular load and store instructions based upon the copy size of the data, the jump table including a plurality of pointers, each of the plurality of pointers pointing to a corresponding one of a plurality of load and store instructions; and executing the particular load-store instructions on the computing device to copy the data from the source memory location to the destination memory location.

27. The non-transitory, tangible computer readable storage medium of claim 26, the method including: utilizing single-lane Neon instructions for copy sizes less than a second threshold.

28. The non-transitory, tangible computer readable storage medium of claim 26, the method including utilizing a mix of ARM and Neon instructions.

29. The non-transitory, tangible computer readable storage medium of claim 26, the method including, when the copy size is greater than a threshold size: preloading the source address; and pushing a destination pointer register and a reserved register onto a stack memory.

30. The non-transitory, tangible computer readable storage medium of claim 29, the method including performing initial pump priming based upon the source pointer if the copy size of the data does not exceed a second threshold.

31. The non-transitory, tangible computer readable storage medium of claim 30, the method including selecting a preload distance based upon at least one of hardware characteristics of the computing device and experimental results.

32. The non-transitory, tangible computer readable storage medium of claim 29, the method including performing a big block handling routine if the copy size of the data exceeds a second threshold.

33. The non-transitory, tangible computer readable storage medium of claim 26, wherein each of the plurality of the load and store instructions corresponding to a data copy size consume a variable number of bytes and each of the plurality of pointers to the load and store instructions consume a fixed number of bytes such that the particular pointer for the copy size of the data can be accessed by adding a current program counter with a product of the copy size of the data times a number of bytes consumed by the particular pointer.
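Claims 8, 17, 25 and 33 locate the entry for a given copy size arithmetically (current program counter plus copy size times the fixed pointer width), and claim 9 applies the same idea to the code itself: every function-table entry has one fixed size, the entry address is computed from the copy size, and a body too long for its entry jumps to a remainder stored elsewhere. The Python sketch that follows is a toy simulation of claim 9's layout under stated assumptions, not ARM code: the 4-byte instruction width, 16-byte entry size, base addresses, the `("ldst", offset, width)` instruction encoding and the helper names (`ldst_sequence`, `entry_address`, `build_table`, `run`) are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout (the patent gives no numbers): 4-byte instructions and
# 16-byte function-table entries, so each entry holds 4 instruction slots.
INSN_BYTES = 4
ENTRY_BYTES = 16
SLOTS = ENTRY_BYTES // INSN_BYTES
TABLE_BASE = 0x1000     # address of the entry for copy size 0
OVERFLOW_BASE = 0x2000  # where bodies too long for one entry continue

# Toy instructions: ("ldst", offset, width), ("b", target), ("ret",)
Insn = Tuple


def ldst_sequence(n: int) -> List[Insn]:
    """Straight-line 8/4/2/1-byte moves for an n-byte copy, then a return."""
    seq: List[Insn] = []
    off = 0
    for width in (8, 4, 2, 1):
        while n - off >= width:
            seq.append(("ldst", off, width))
            off += width
    return seq + [("ret",)]


def entry_address(size: int) -> int:
    # Fixed-size entries: address = base + size * entry size, with no lookup.
    # A fixed-width pointer table is indexed the same way: PC + size * pointer bytes.
    return TABLE_BASE + size * ENTRY_BYTES


def build_table(max_size: int) -> Dict[int, Insn]:
    """Lay out one fixed-size entry per copy size in a sparse code map."""
    assert entry_address(max_size) <= OVERFLOW_BASE, "table overlaps overflow area"
    code: Dict[int, Insn] = {}
    spill = OVERFLOW_BASE
    for n in range(max_size):
        seq = ldst_sequence(n)
        if len(seq) <= SLOTS:
            head, tail = seq, []
        else:
            # Keep SLOTS - 1 instructions inline; the last slot branches to the rest.
            head, tail = seq[:SLOTS - 1] + [("b", spill)], seq[SLOTS - 1:]
        for i, insn in enumerate(head):
            code[entry_address(n) + i * INSN_BYTES] = insn
        for i, insn in enumerate(tail):
            code[spill + i * INSN_BYTES] = insn
        spill += len(tail) * INSN_BYTES
    return code


def run(code: Dict[int, Insn], size: int, dst: bytearray, d: int, src: bytes, s: int) -> None:
    """Jump to the computed entry and execute until ("ret",)."""
    pc = entry_address(size)
    while True:
        op = code[pc]
        if op[0] == "ret":
            return
        if op[0] == "b":
            pc = op[1]
            continue
        _, off, width = op
        dst[d + off:d + off + width] = src[s + off:s + off + width]
        pc += INSN_BYTES
```

With these placeholder sizes, copy sizes 0 to 14 fit inline (at most three moves plus a return), while size 15 (8 + 4 + 2 + 1) is the first entry whose last slot branches into the overflow area. The trade-off the fixed entry size captures is that short bodies waste no indirection, and only the rare long body pays for one extra branch.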

Patent Metadata

Filing Date: November 19, 2012

Publication Date: November 4, 2014

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “Method and apparatus for efficient, low-latency, streaming memory copies” (US-8880829). https://patentable.app/patents/US-8880829

© 2026 Patentable. All rights reserved.