10545867

Device and Method for Enhancing Item Access Bandwidth and Atomic Operation

Published: January 28, 2020
Assignee: not available in USPTO data
Technical Abstract

Patent Claims
19 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 1

Original Legal Text

1. A device for improving an item access bandwidth and atomic operation, comprising: a memory storing processor-executable instructions; and a processor arranged to execute the stored processor-executable instructions to perform steps of: after a lookup request is received from a service side, determining whether an address pointed to by the lookup request is identical to an item address stored in a cache; if they are identical, and a valid identifier vld is currently valid, directly returning item data stored in the cache to the service side without initiating a request for looking up an off-chip memory, so as to reduce accessing the off-chip memory; and if they are not identical, initiating a request for looking up the off-chip memory, and process, according to a preset rule, item data returned by the off-chip memory in such a way that an atomic operation existed in item updating can realize a seamless and faultless lookup in an item lookup process, wherein the preset rule is used for determining whether the address pointed to by the lookup request is identical to the item address stored in the cache, comprising any one of the following ways: way 1: if a vld corresponding to a low first-threshold-M bit address is completely valid, and a high second-threshold-N bit address is identical to the item address stored in the cache, returning data in the cache to the service side, and not updating the data in the cache, where if the addresses are not identical, not updating the data in the cache, and sending the data returned by the off-chip memory to the service side; way 2: if the vld corresponding to the low first-threshold-M bit address is partially valid, not updating the data in the cache, and sending the data returned by the off-chip memory to the service side; and way 3: if the vld corresponding to the low first-threshold-M bit address is invalid, updating the data in the cache, and sending the data returned by the off-chip memory to the service side, and wherein both M and N are natural numbers, and a sum of M and N is a bit width requested by the service side.

Plain English Translation

The invention relates to a device for improving item access bandwidth and atomic operations in memory systems. The device addresses the problem of inefficient memory access and potential data inconsistencies during item lookups, particularly when dealing with off-chip memory accesses. The device includes a memory storing processor-executable instructions and a processor that executes these instructions to perform specific steps. When a lookup request is received from a service side, the device checks whether the address in the request matches an item address stored in a cache. If they match and a valid identifier (vld) is valid, the device returns the cached item data directly to the service side, avoiding an off-chip memory access. If the addresses do not match, the device initiates an off-chip memory lookup and processes the returned data according to a preset rule, so that lookups remain seamless and faultless even while an item update (an atomic operation) is in progress. The preset rule offers three ways of handling the vld and the address comparison. In the first way, if the vld corresponding to the low M-bit address is completely valid and the high N-bit address matches the cached item address, the cached data is returned and the cache is not updated; if the addresses do not match, the cache is not updated and the data returned by the off-chip memory is sent to the service side. In the second way, if the vld corresponding to the low M-bit address is only partially valid, the cache is not updated and the off-chip data is sent to the service side. In the third way, if the vld corresponding to the low M-bit address is invalid, the cache is updated and the off-chip data is sent to the service side. M and N are natural numbers whose sum equals the bit width requested by the service side. This approach reduces off-chip memory accesses and keeps lookups consistent while items are being updated.
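The three ways of the preset rule amount to a small decision table applied when off-chip data comes back. The Python sketch below is a hypothetical illustration, not the patented implementation: it assumes the vld bits covering the addressed entry are passed in as a list (one bit for a single-burst item, several for a multiple-burst item), and the names `Action` and `classify_return` are invented for this example.

```python
from enum import Enum
from typing import Sequence


class Action(Enum):
    """Hypothetical outcomes of the preset rule for one off-chip return."""
    RETURN_CACHE = "return cached data; cache unchanged"
    RETURN_OFFCHIP = "return off-chip data; cache unchanged"
    RETURN_OFFCHIP_AND_UPDATE = "return off-chip data; update cache"


def classify_return(vld_bits: Sequence[int], cached_tag: int, request_tag: int) -> Action:
    """Apply ways 1-3 of the preset rule (sketch; vld_bits must be non-empty).

    vld_bits: valid identifiers for the entry selected by the low M address bits.
    cached_tag / request_tag: the high N address bits stored in the cache and
    carried by the request.
    """
    if all(vld_bits):
        # Way 1: completely valid, so the decision depends on the high N bits.
        if cached_tag == request_tag:
            return Action.RETURN_CACHE
        return Action.RETURN_OFFCHIP
    if any(vld_bits):
        # Way 2: partially valid, i.e. a multi-burst update is still in progress.
        return Action.RETURN_OFFCHIP
    # Way 3: invalid, so the off-chip data may be written into the cache.
    return Action.RETURN_OFFCHIP_AND_UPDATE
```

For example, `classify_return([1, 0, 1, 1], 0x1A, 0x1A)` yields `Action.RETURN_OFFCHIP`: a partially valid entry is not served from the cache even when the tags agree.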

Claim 2

Original Legal Text

2. The device according to claim 1 , wherein the processor is arranged to execute the stored processor-executable instructions to further perform steps of: configuring a service item, and for the case of single-burst item update, giving an instruction of writing a single-burst item; and after mediation by a first mediation module, writing a high second-threshold-N bit address of the service item or the item data into the cache by taking low first-threshold-M bit address as an address, setting a vld register corresponding to the address to 1 through a control module, and giving an instruction of updating the off-chip memory to complete the item update.

Plain English Translation

This invention relates to a data processing device with a processor and a cache in front of off-chip memory, specifically addressing how single-burst item updates are written. The processor configures a service item and, for a single-burst item update, issues an instruction to write a single-burst item. After the write request passes through a first mediation module, the high N-bit address of the service item (or the item data) is written into the cache, using the low M-bit address as the cache address. A control module then sets the vld register corresponding to this address to 1, marking the entry as valid. Finally, the processor issues an instruction to update the off-chip memory, which completes the item update. Once the update completes, the cache and the off-chip memory hold the same item data.
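A minimal Python sketch of the single-burst update order, assuming the cache is modelled as dictionaries indexed by the low M address bits and the off-chip memory as a dictionary keyed by full address. The class `SingleBurstCache` and its fields are hypothetical, and the first mediation module is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class SingleBurstCache:
    """Hypothetical model of the cache state touched by a single-burst update."""
    m: int                                        # number of low address bits used as index
    tags: dict = field(default_factory=dict)      # index -> high N-bit address
    data: dict = field(default_factory=dict)      # index -> item data
    vld: dict = field(default_factory=dict)       # index -> valid identifier
    offchip: dict = field(default_factory=dict)   # full address -> item data

    def write_single_burst(self, addr: int, item: bytes) -> None:
        index = addr & ((1 << self.m) - 1)   # low M bits select the cache entry
        tag = addr >> self.m                 # high N bits are stored alongside the data
        self.tags[index] = tag
        self.data[index] = item
        self.vld[index] = 1                  # control module marks the entry valid
        self.offchip[addr] = item            # off-chip write completes the update
```

The ordering is the point: the cache entry is written and its vld set to 1 before the off-chip write is issued.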

Claim 3

Original Legal Text

3. The device according to claim 2 , wherein the processor is arranged to execute the stored processor-executable instructions to further perform steps of: for the case of single-burst item, determining whether the vld corresponding to the low first-threshold-M bit address of the lookup request is valid, and if it is valid, initiating a lookup of the cache by using a low first-threshold-M bit address of the lookup request, and obtaining a lookup result; and parsing the lookup result, comparing a found address with the high second-threshold-N bit address of the lookup request, and if they are identical, directly returning the result from cache lookup to the service side through a distribution module, not initiating the request for looking up the off-chip memory, and reading and discarding data in a lookup information storage module.

Plain English Translation

This invention relates to a cache lookup device designed to reduce off-chip memory accesses, particularly for single-burst items. The system includes a processor, a cache, an off-chip memory, a distribution module, and a lookup information storage module. The processor determines whether the valid identifier (vld) corresponding to the low M-bit address of a lookup request is valid. If it is, the processor looks up the cache using that low M-bit address and obtains a result. The lookup result is parsed, and the address found in the cache is compared with the high N-bit address of the lookup request. If they match, the cache lookup result is returned directly to the service side via the distribution module, no off-chip memory request is issued, and the corresponding data held in the lookup information storage module is read and discarded. Serving hits from the cache in this way lowers latency and saves off-chip memory bandwidth.
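The hit path can be sketched in Python as follows. This is an illustration under assumptions, not the patented circuit: cache state is held in dictionaries indexed by the low M bits, the lookup information storage module is modelled as a hypothetical `deque` that receives an entry for every request, and `lookup_single_burst` is an invented name.

```python
from collections import deque


def lookup_single_burst(addr, m, tags, data, vld, lookup_info, offchip_queue):
    """Single-burst lookup: serve a hit from the cache, otherwise go off-chip.

    tags/data/vld are dicts indexed by the low M address bits; lookup_info is a
    deque standing in for the lookup information storage module, and
    offchip_queue collects the off-chip lookup requests that get issued.
    """
    lookup_info.append(addr)          # record the request (assumed always recorded)
    index = addr & ((1 << m) - 1)     # low M bits select the cache entry
    if vld.get(index) == 1 and tags.get(index) == addr >> m:
        lookup_info.pop()             # hit: read and discard this request's entry
        return data[index]            # returned via the distribution module
    offchip_queue.append(addr)        # miss: keep the entry until off-chip data returns
    return None
```

On a hit nothing is left behind in `lookup_info`; on a miss the entry stays queued for the off-chip return path.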

Claim 4

Original Legal Text

4. The device according to claim 3 , wherein the processor is arranged to execute the stored processor-executable instructions to further perform steps of: when the lookup request matches none of the addresses in the cache, initiating the request for looking up the off-chip memory, and after the item data is returned, taking the item address and multiple-burst information out from the lookup information storage module; for the case of single-burst item, determining through the control module whether a vld corresponding to a low first-threshold-M bit address of an address is valid, and if it is valid, reading the cache after mediation of a second mediation module, comparing high second-threshold-N bits of an acquired address with high second-threshold-N bits of the item taken-out address; and if they match, replacing data of a corresponding address with the item data returned from the off-chip memory and writing data back into the cache, and returning data to the service side through the distribution module.

Plain English Translation

This invention relates to a memory access scheme that keeps a cache consistent with data fetched from off-chip memory. The device includes a processor, a cache memory, and a lookup information storage module. When a lookup request does not match any address in the cache, the processor initiates a request to off-chip memory. When the item data returns, the processor takes the item address and multiple-burst information out of the lookup information storage module. For single-burst items, the control module checks whether the vld corresponding to the low M bits of that address is valid. If it is, the cache is read through a second mediation module, and the high N bits of the address stored there are compared with the high N bits of the item address. If they match, the data at that cache address is replaced with the item data from off-chip memory and written back into the cache, and the data is returned to the service side through a distribution module. This refreshes the cached copy with the item data just returned from off-chip memory.
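A Python sketch of the single-burst return path, under the same kind of assumptions: dictionaries indexed by the low M bits stand in for the cache, a hypothetical `deque` of pending addresses stands in for the lookup information storage module (returns are assumed to arrive in request order), and the multiple-burst information is omitted because only the single-burst case is shown. The claim does not say what happens when the tags differ; here the off-chip data is simply returned and the cache left alone, which is an assumption.

```python
from collections import deque


def on_offchip_return_single(item, m, tags, data, vld, lookup_info):
    """Handle one off-chip return for a single-burst item (hypothetical sketch)."""
    addr = lookup_info.popleft()        # take the item address out of lookup info storage
    index = addr & ((1 << m) - 1)       # low M bits select the cache entry
    if vld.get(index) == 1 and tags.get(index) == addr >> m:
        data[index] = item              # tags match: replace and write back cached data
    return item                         # data sent to the service side
```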

Claim 5

Original Legal Text

5. The device according to claim 1 , wherein the processor is arranged to execute the stored processor-executable instructions to further perform steps of: configuring a service item, and for the case of multiple-burst item update, giving an instruction of writing a multiple-burst item; after mediation by a first mediation module, writing a high second-threshold-N bit address of the multiple-burst item or the item data into the cache by taking a value obtained by left shifting a low first-threshold-M bit address for 2^S bits as a first particular address, setting a vld corresponding to the first particular address to 0 through a control module, and not giving an instruction of updating the off-chip memory; for a second burst, writing the high second-threshold-N bit address of the multiple-burst item or the item data into the cache by taking a value obtained by left shifting the low first-threshold-M bit address for 2^S bits plus 1 as a second particular address, setting a vld corresponding to the second particular address to 0 through the control module, not giving the instruction of updating the off-chip memory, at the same time, set a vld of a first burst to 1, and give an instruction of updating a vld item; and by analogy, when an address of a penultimate burst returned by the off-chip memory matches an address, obtained by left shifting a low first-threshold-M bit address for 2^S bit, +S−2, setting an vld corresponding to a last burst to 1, and giving an instruction of updating the off-chip memory to complete the item update.

Plain English Translation

This invention relates to managing multiple-burst item updates in a cache so that an update spanning several bursts appears atomic to lookups. The system includes a processor, a cache memory, and a control module. The processor configures a service item and issues an instruction to write a multiple-burst item. For the first burst, after the first mediation module, the high N-bit address of the item (or the item data) is written into the cache at a first particular address, obtained by left-shifting the low M-bit address by 2^S bits; the control module sets the vld of that address to 0, and no off-chip update is issued. The second burst is written at the next address (the first particular address plus 1) and its vld is set to 0, again without updating the off-chip memory, while the vld of the first burst is set to 1 and an instruction to update the vld item is issued. The remaining bursts follow the same pattern. When the address of the penultimate burst matches the left-shifted address plus S−2 (as recited in the claim), the vld of the last burst is set to 1 and an instruction to update the off-chip memory is issued, completing the item update. Because at least one vld in the group stays at 0 until the last burst is written, and the off-chip memory is updated only once at the end, lookups do not see a half-written item.
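The vld sequence is easier to follow when traced. The Python sketch below assumes a 2^S-burst item occupies 2^S contiguous cache slots starting at the low M-bit address shifted left by S bits (the shift used in claim 6; claim 5 words the shift differently), and it records the group's vld bits after each step. `write_multi_burst` is a hypothetical name; the mediation module and the "update a vld item" instruction are not modelled.

```python
def write_multi_burst(addr, bursts, m, s, tags, data, vld, offchip):
    """Write a 2**s-burst item and trace the group's vld bits (hypothetical sketch)."""
    if len(bursts) != 1 << s:
        raise ValueError("expected exactly 2**s bursts")
    base = (addr & ((1 << m) - 1)) << s      # assumption: low M bits shifted left by S
    tag = addr >> m                          # high N bits stored with every burst
    group = range(base, base + (1 << s))
    trace = []
    for k, burst in enumerate(bursts):
        slot = base + k
        tags[slot], data[slot] = tag, burst  # write this burst into the cache
        vld[slot] = 0                        # ...but keep it invalid for now
        if k > 0:
            vld[slot - 1] = 1                # the previous burst becomes valid
        trace.append([vld.get(i, 0) for i in group])
    vld[base + (1 << s) - 1] = 1             # last burst valid: group complete
    offchip[addr] = list(bursts)             # single off-chip write ends the update
    trace.append([vld.get(i, 0) for i in group])
    return trace
```

Starting from a group whose four vld bits are all 1, the trace is `[0, 1, 1, 1]`, `[1, 0, 1, 1]`, `[1, 1, 0, 1]`, `[1, 1, 1, 0]` and finally `[1, 1, 1, 1]`: every intermediate state is partially valid, so a lookup that lands mid-update is answered from off-chip memory.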

Claim 6

Original Legal Text

6. The device according to claim 5 , wherein the processor is arranged to execute the stored processor-executable instructions to further perform steps of: for the case of multiple-burst item, and when there are 2^S multiple-burst items, determining through the control module whether vlds corresponding to 2^S contiguous addresses after a low first-threshold-M bit address of the lookup request is left shifted for S bits are valid, and if all of them are valid, continuously initiating 2^S requests for looking up the cache after the low first-threshold-M bit address of the lookup request is left shifted for S bits, and obtaining a lookup result; and parsing the lookup result, compare a found address with a high second-threshold-N bit address of the lookup request, and if they are identical, directly returning spliced results from cache lookup to the service side through a distribution module, not initiating the request for looking up the off-chip memory, and reading and discarding data in a lookup information storage module, wherein S is a natural number.

Plain English Translation

This invention relates to cache lookups for multiple-burst items, which occupy several contiguous cache entries. When a multiple-burst item consists of 2^S bursts, the control module checks the vlds of the 2^S contiguous addresses that start at the low M-bit address of the lookup request left-shifted by S bits. If all of them are valid, the system issues 2^S consecutive cache lookup requests for these addresses and obtains the results. The results are parsed, and the address found in the cache is compared with the high N-bit address of the request. If they match, the results are spliced together and returned directly to the service side through a distribution module, without accessing off-chip memory, and the corresponding data in the lookup information storage module is read and discarded. S is a natural number that sets both the shift amount and the number of bursts (2^S). This reduces latency and off-chip traffic whenever a complete multiple-burst item is already cached.
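A Python sketch of the multiple-burst hit check, assuming dictionary-based cache state, that every one of the 2^S entries stores the same high N-bit tag, and that splicing means concatenating the burst payloads as bytes. `lookup_multi_burst` is a hypothetical name; returning `None` stands for "issue an off-chip request instead", and the lookup information storage module is not modelled.

```python
def lookup_multi_burst(addr, m, s, tags, data, vld):
    """Return the spliced item on a hit, or None to request an off-chip lookup."""
    base = (addr & ((1 << m) - 1)) << s      # low M bits shifted left by S
    slots = range(base, base + (1 << s))     # 2**S contiguous cache addresses
    if not all(vld.get(i) == 1 for i in slots):
        return None                          # some vld not valid: go off-chip
    tag = addr >> m                          # high N bits of the request
    if all(tags.get(i) == tag for i in slots):
        # 2**S back-to-back cache reads, spliced into one item
        return b"".join(data[i] for i in slots)
    return None
```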

Claim 7

Original Legal Text

7. The device according to claim 6 , wherein the processor is arranged to execute the stored processor-executable instructions to further perform steps of: when the lookup request matches none of the addresses in the cache, initiating the request for looking up the off-chip memory, and after the item data is returned, taking the item address and multiple-burst information out from the lookup information storage module; for the case of multiple-burst item, firstly determining through the control module whether the vlds corresponding to 2^S contiguous addresses after a low first-threshold-M bit address of an address is left shifted for S bits are valid, and if all of them are valid, reading the cache after mediation of a second mediation module, and comparing high second-threshold-N bits of an acquired address with high second-threshold-N bits of the taken-out item address; and if they match, replacing data of a corresponding address with the item data returned from the off-chip memory and writing data back into the cache, and returning data to the service side through the distribution module.

Plain English Translation

This invention relates to handling off-chip memory returns for multiple-burst items so that the cache stays consistent with off-chip data. The device includes a processor, a cache memory, and a lookup information storage module. When a lookup request fails to match any address in the cache, the processor initiates a request to off-chip memory. When the item data returns, the processor takes the item address and multiple-burst information out of the lookup information storage module. For multiple-burst items, the control module first checks whether the vlds of the 2^S contiguous addresses obtained by left-shifting the low M bits of the address by S bits are all valid. If they are, the cache is read through a second mediation module, and the high N bits of the stored address are compared with the high N bits of the item address taken out of the storage module. If they match, the corresponding cache data is replaced with the data from off-chip memory and written back to the cache, and the data is returned to the service side via a distribution module. This refreshes every burst of the cached item with the data just returned from off-chip memory.
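Extending the same idea to the multiple-burst return path gives the Python sketch below. It assumes the lookup information storage module is a hypothetical `deque` of `(address, multi_burst)` pairs in request order, that the off-chip data arrives as a list of 2^S bursts, and that the cache is refreshed only when all vlds are valid and every stored tag matches; `on_offchip_return_multi` is an invented name.

```python
from collections import deque


def on_offchip_return_multi(bursts, m, s, tags, data, vld, lookup_info):
    """Handle one off-chip return for a 2**s-burst item (hypothetical sketch)."""
    addr, _multi_burst = lookup_info.popleft()   # address and multiple-burst info
    base = (addr & ((1 << m) - 1)) << s          # low M bits shifted left by S
    slots = range(base, base + (1 << s))         # 2**S contiguous cache entries
    tag = addr >> m                              # high N bits of the item address
    if all(vld.get(i) == 1 and tags.get(i) == tag for i in slots):
        for i, chunk in zip(slots, bursts):
            data[i] = chunk                      # replace and write back each burst
    return b"".join(bursts)                      # data sent to the service side
```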

Claim 8

Original Legal Text

8. The device according to claim 7 , wherein the processor is arranged to execute the stored processor-executable instructions to further perform steps of: when the lookup request is received, according to a multiple-burst identifier carried in the lookup request, determining through the control module whether vld corresponding to 2^S contiguous requests after a low first-threshold-M bit address of a service request is left shifted for S bits is valid, and if it is valid, reading data of a corresponding cache, and determining whether a high second-threshold-N bit address of the service request matches the address in an cache; and if they match, directly returning data to the service side, and if they do not match, initiating the request for looking up the off-chip memory.

Plain English Translation

This invention relates to a data processing device whose cache lookup distinguishes single-burst and multiple-burst requests. The device includes a processor, a control module, and a cache memory. When a lookup request is received, the processor reads the multiple-burst identifier carried in the request. The control module then checks whether the vlds of the 2^S contiguous entries obtained by left-shifting the low M-bit address of the service request by S bits are valid. If they are, the corresponding cache data is read and the high N-bit address of the service request is compared with the address stored in the cache. If they match, the data is returned directly to the service side without an off-chip memory access; if they do not match, a lookup in the off-chip memory is initiated. This avoids off-chip accesses whenever the requested multiple-burst item is already cached.
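The role of the multiple-burst identifier can be sketched as choosing which vld bits to check. The Python below is an assumption-laden illustration: the hypothetical helpers `vld_slots` and `route_lookup` treat a single-burst request as one entry at the low M-bit index and a multiple-burst request as 2^S entries starting at that index shifted left by S; whether both kinds of item share one cache array is not specified in the claims.

```python
def vld_slots(addr, m, s, multi_burst):
    """Cache entries whose vld bits must be checked for this request (sketch)."""
    index = addr & ((1 << m) - 1)                # low M bits of the request
    if multi_burst:
        base = index << s                        # 2**S contiguous entries
        return list(range(base, base + (1 << s)))
    return [index]


def route_lookup(addr, multi_burst, m, s, tags, vld):
    """Return 'cache' on a hit, otherwise 'off-chip' (hypothetical labels)."""
    slots = vld_slots(addr, m, s, multi_burst)
    tag = addr >> m                              # high N bits of the request
    if all(vld.get(i) == 1 and tags.get(i) == tag for i in slots):
        return "cache"
    return "off-chip"
```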

Claim 9

Original Legal Text

9. The device according to claim 8 , wherein the processor is arranged to execute the stored processor-executable instructions to further perform steps of: after the item data is returned, reading the lookup information storage module to acquire a lookup request address and a multiple-burst identifier; determining through the control module whether the vld corresponding to 2^S contiguous requests after the low first-threshold-M bit address of the service request is left shifted for S bits is valid, and if it is valid, reading the data of the corresponding cache, and determining whether the high second-threshold-N bit address of the service request matches a service address returned to the cache; if they match, returning the item data in the cache to the service side through the distribution module, not update the item data in the cache, and if they do not match, directly returning the item data in the off-chip memory to the service side through the distribution module, and updating the item data in the cache; and if a vld corresponding to multiple bursts is partially valid, which indicates that the item update is not completed, returning the item data in the off-chip memory to the service side through the distribution module, and not updating the item data in the cache.

Plain English Translation

This invention relates to a data processing device that decides, when off-chip data returns, whether to serve the cached item or the off-chip item and whether to update the cache. After the item data is returned, the processor reads the lookup information storage module to obtain the lookup request address and its multiple-burst identifier. The control module checks whether the vlds of the 2^S contiguous entries obtained by left-shifting the low M-bit address of the service request by S bits are valid. If they are, the device reads the corresponding cache data and checks whether the high N-bit address of the service request matches the address stored in the cache. If they match, the cached item data is returned to the service side through the distribution module and the cache is not updated. If they do not match, the item data from off-chip memory is returned to the service side and the cache is updated with it. If the vlds for the multiple bursts are only partially valid, meaning an item update has not yet completed, the off-chip item data is returned without updating the cache. This keeps lookups correct while a multiple-burst update is still in progress.
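A Python sketch of this return-path decision follows. On a tag mismatch claim 9 updates the cache, whereas way 1 of claim 1 leaves it unchanged; the sketch follows claim 9. It also assumes dictionary-based cache state and off-chip data delivered as one chunk per entry, and, because claim 9 does not cover the all-invalid case, it borrows way 3 of claim 1 there. `on_return_claim9` is a hypothetical name.

```python
def on_return_claim9(addr, multi_burst, item, m, s, tags, data, vld):
    """Decide what to return and whether to update the cache (claim 9 sketch).

    item is the list of chunks returned from off-chip memory, one per entry.
    Returns (data sent to the service side, whether the cache was updated).
    """
    index = addr & ((1 << m) - 1)
    if multi_burst:
        slots = list(range(index << s, (index << s) + (1 << s)))
    else:
        slots = [index]
    bits = [vld.get(i, 0) for i in slots]
    tag = addr >> m
    if all(bits):
        if all(tags.get(i) == tag for i in slots):
            return [data[i] for i in slots], False   # match: serve cached item as-is
        for i, chunk in zip(slots, item):            # mismatch: claim 9 refreshes cache
            tags[i], data[i] = tag, chunk
        return list(item), True
    if any(bits):
        return list(item), False     # partially valid: item update not completed
    # All invalid: not covered by claim 9; this sketch follows way 3 of claim 1
    # (update the cache) and leaves the vld bits unchanged, which is an assumption.
    for i, chunk in zip(slots, item):
        tags[i], data[i] = tag, chunk
    return list(item), True
```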

Claim 10

Original Legal Text

10. A method for improving an item access bandwidth and atomic operation, the method comprising: after a lookup request is received from a service side, determining whether an address pointed to by the lookup request is identical to an item address stored in a cache; if they are identical, and a valid identifier vld is currently valid, directly returning item data stored in the cache to the service side without initiating a request for looking up an off-chip memory, so as to reduce accessing the off-chip memory; and if they are not identical, initiating a request for looking up the off-chip memory, and processing, according to a preset rule, item data returned by the off-chip memory in such a way that an atomic operation existed in item updating can realize a seamless and faultless lookup in an item lookup process, wherein the preset rule is used for determining whether the address pointed to by the lookup request is identical to the item address stored in the cache, comprising any one of the following ways: way 1: if a vld corresponding to a low first-threshold-M bit address is completely valid, and a high second-threshold-N bit address is identical to the item address stored in the cache, returning data in the cache to the service side, and not updating the data in the cache; if the addresses are not identical, not updating the data in the cache, and sending the data returned by the off-chip memory to the service side; way 2: if the vld corresponding to the low first-threshold-M bit address is partially valid, not updating the data in the cache, and sending the data returned by the off-chip memory to the service side; and way 3: if the vld corresponding to the low first-threshold-M bit address is invalid, updating the data in the cache, and sending the data returned by the off-chip memory to the service side, and wherein both M and N are natural numbers, and a sum of M and N is a bit width requested by the service side.

Plain English Translation

This invention relates to a method for improving item access bandwidth and atomic operations in memory systems, addressing inefficiency and potential inconsistency when data is accessed from off-chip memory. The method uses a cache to avoid unnecessary off-chip memory accesses and keeps lookups seamless while items are being updated. When a lookup request is received, the method checks whether the requested address matches the item address stored in the cache. If they match and the valid identifier (vld) is valid, the cached data is returned directly to the service side without accessing off-chip memory, which saves off-chip bandwidth. If the addresses do not match, an off-chip memory lookup is initiated and the returned data is processed according to a preset rule. The rule decides whether the cache is used, bypassed, or updated based on the vld and the address comparison: if the vld for the low M-bit address is completely valid and the high N-bit address matches the cached address, the cached data is returned and the cache is left unchanged; if the addresses differ, the off-chip data is sent to the service side and the cache is left unchanged; if the vld is partially valid, the off-chip data is sent and the cache is left unchanged; and if the vld is invalid, the cache is updated and the off-chip data is sent to the service side. M and N are natural numbers whose sum equals the bit width requested by the service side. This approach provides efficient data access while maintaining data consistency.
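The requirement that M + N equal the requested bit width means every request address splits cleanly into a low M-bit index and a high N-bit tag. A minimal Python sketch of that split, using the hypothetical function name `split_request_address`:

```python
from typing import Tuple


def split_request_address(addr: int, m: int, n: int) -> Tuple[int, int]:
    """Split an (m + n)-bit request address into (low M-bit index, high N-bit tag)."""
    if not 0 <= addr < (1 << (m + n)):
        raise ValueError("address is wider than the requested bit width M + N")
    index = addr & ((1 << m) - 1)   # low M bits: which cache entry to consult
    tag = addr >> m                 # high N bits: compared with the stored address
    return index, tag
```

For a 20-bit request with M = 12 and N = 8, `split_request_address(0xABCDE, 12, 8)` returns `(0xCDE, 0xAB)`, and `(0xAB << 12) | 0xCDE` reassembles the original address.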

Claim 11

Original Legal Text

11. The method according to claim 10 , further comprising: configuring, by a central processing unit, a service item, and for the case of single-burst item update, giving an instruction of writing a single-burst item; and after mediation by a first mediation module, writing a high second-threshold-N bit address of the service item or the item data into the cache by taking the low first-threshold-M bit address as an address, setting a vld register corresponding to the address to 1 through a control module, and giving an instruction of updating the off-chip memory to complete the item update.

Plain English Translation

This invention relates to a method for updating service items, specifically the coordination of cache and off-chip memory during a single-burst update. A central processing unit (CPU) configures the service item and, for a single-burst update, issues an instruction to write a single-burst item. After the request passes through a first mediation module, the high N-bit address of the service item (or its data) is written into the cache using the low M-bit address as the cache address. A control module sets the vld register corresponding to this address to 1, marking the entry as valid. Finally, an instruction is issued to update the off-chip memory, completing the item update. Because the cache entry is written and marked valid before the off-chip update is issued, the cache and the off-chip memory end up holding the same item data.

Claim 12

Original Legal Text

12. The method according to claim 11 , further comprising: for the case of single-burst item, determining, by a comparison module, whether the vld corresponding to the low first-threshold-M bit address of the lookup request is valid, if it is valid, initiating a lookup of the cache by using a low first-threshold-M bit address of the lookup request, and obtaining a lookup result; parsing the lookup result, and comparing a found address with the high second-threshold-N bit address of the lookup request; and if they are identical, directly returning the result from cache lookup to the service side through a distribution module, not initiating the request for looking up the off-chip memory, and reading and discarding data in a lookup information storage module.

Plain English Translation

This invention relates to a method for reducing off-chip memory accesses for single-burst lookups, which would otherwise have to fetch data from slower off-chip memory instead of the faster on-chip cache. The lookup address is divided into a low M-bit part and a high N-bit part. A comparison module first checks whether the vld corresponding to the low M-bit address is valid. If it is, the cache is looked up using that address and a result is obtained. The result is parsed, and the address found is compared with the high N-bit address of the request. If they match, the cached result is returned to the service side via a distribution module, no off-chip memory request is issued, and the corresponding data in the lookup information storage module is read and discarded. This reduces latency and saves off-chip memory bandwidth.

Claim 13

Original Legal Text

13. The method according to claim 12 , further comprising: when the lookup request matches none of the addresses in the cache, initiating, by the comparison module, the request for looking up the off-chip memory, and after the item data is returned, taking the item address and multiple-burst information out from the lookup information storage module; for the case of single-burst item, determining through the control module whether a vld corresponding to a low first-threshold-M bit address of an address is valid; if it is valid, reading the cache after mediation of a second mediation module, and comparing high second-threshold-N bits of an acquired address with high second-threshold-N bits of the taken-out item address; and if they match, replacing data of a corresponding address with the item data returned from the off-chip memory and writing data back into the cache, and returning data to the service side through the distribution module.

Plain English Translation

This invention relates to a method for optimizing memory access in a computing system, specifically the handling of cache misses that require an off-chip memory access. The method involves a comparison module that processes lookup requests for data stored in a cache. When a requested address does not match any entry in the cache, the comparison module initiates a request to off-chip memory. Once the requested data is returned, the system takes the item address and the associated multiple-burst information out of a lookup information storage module. For single-burst items, a control module checks whether the vld (valid) flag corresponding to the low first-threshold-M bits of the address is valid. If it is, the system reads the cache through a second mediation module and compares the high second-threshold-N bits of the acquired address with the high second-threshold-N bits of the taken-out item address. If they match, the system replaces the corresponding cache data with the item data returned from off-chip memory, writes it back to the cache, and returns the data to the service side via a distribution module. This keeps valid cache entries refreshed with the latest off-chip data, so later lookups of the same item can be served from the cache instead of off-chip memory.
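A matching sketch of the single-burst write-back in claim 13, again in Python with hypothetical names (`Cache`, `on_offchip_return`) and assumed widths M = 4 and N = 12. The second mediation module is not modeled. Branches that claim 13 does not describe (invalid vld, tag mismatch) simply forward the off-chip data in this sketch.

```python
from collections import deque

M, N = 4, 12  # assumed field widths

class Cache:
    """Hypothetical cache model: tag, data and vld arrays indexed by the low M bits."""
    def __init__(self):
        size = 1 << M
        self.tag = [0] * size
        self.data = [None] * size
        self.vld = [False] * size

def on_offchip_return(cache, info, offchip_data):
    """Handle item data returned from off-chip memory after a single-burst miss."""
    addr, bursts = info.popleft()   # take out the item address and burst information
    low = addr & ((1 << M) - 1)
    high = (addr >> M) & ((1 << N) - 1)
    if bursts == 1 and cache.vld[low] and cache.tag[low] == high:
        cache.data[low] = offchip_data  # replace the entry and write it back
    # Cases outside claim 13 are not specified; this sketch only forwards the data.
    return offchip_data             # to the service side via the distribution module

cache, info = Cache(), deque([(0x123, 1)])
cache.tag[3], cache.data[3], cache.vld[3] = 0x12, "old", True
print(on_offchip_return(cache, info, "new"), cache.data[3])  # prints: new new
```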

Claim 14

Original Legal Text

14. The method according to claim 10 , further comprising: configuring, by a central processing unit, a service item, and for the case of multiple-burst item update, giving an instruction of writing a multiple-burst item; after mediation of by a first mediation module, writing, by a first burst, a high second-threshold-N bit address of the multiple-burst item or the item data into the cache by taking a value obtained by left shifting a low first-threshold-M bit address for 2^S bits as a first particular address, setting the vld corresponding to the first particular address to 0 through a control module, and not giving the instruction of updating the off-chip memory; for a second burst, writing the high second-threshold-N bit address of the multi-burst item or the item data in the cache by taking a value obtained by left shifting the low first-threshold-M bit address for 2^S bits plus 1 as a second particular address, setting a vld corresponding to the second particular address to 0 through the control module, and not giving the instruction of updating the off-chip memory; at the same time, setting a vld of the first burst to 1, and giving an instruction of updating a vld item; and by analogy, when an address of a penultimate burst returned by the off-chip memory matches an address, obtained by left shifting a low first-threshold-M bit address for 2^S bit, +S−2, setting an vld corresponding to a last burst to 1, and giving the instruction of updating the off-chip memory to complete the item update.

Plain English Translation

This invention relates to a method for efficiently updating multiple-burst items in a cache memory system, particularly the challenge of keeping an item update atomic when the item spans several bursts. A central processing unit configures the service item and, for a multiple-burst update, issues an instruction to write the item; after mediation by a first mediation module, the item is written to the cache one burst at a time. For each burst, the high second-threshold-N bit address or the item data is written to the cache at an address obtained by left-shifting the low first-threshold-M bit address (by 2^S bits, as the claim states) and adding the burst's offset: 0 for the first burst, 1 for the second, and so on. The control module sets the vld of the burst just written to 0 and, starting with the second burst, sets the vld of the previous burst to 1 and issues an instruction to update the vld item; no off-chip memory update is issued for these bursts. When the address of the penultimate burst matches, the vld of the last burst is set to 1 and the instruction to update the off-chip memory is given, completing the item update. Because the item's vlds remain only partially valid until the last burst is committed, and the off-chip memory is updated once at the end, concurrent lookups never see a half-written item, which minimizes off-chip memory updates and keeps multi-burst updates consistent.
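The burst-by-burst vld handshake can be simulated to show why the update looks atomic to lookups. This Python sketch assumes S = 2 (four bursts), M = 4, and a base slot equal to the low M-bit address shifted left by S bits, following claims 15 to 18; claim 14's own wording ("left shifting ... for 2^S bits") is ambiguous on this point. `Cache`, `update_multi_burst`, and the dictionary standing in for off-chip memory are hypothetical.

```python
M, S = 4, 2  # assumed: M low index bits, 2**S bursts per item

class Cache:
    """Hypothetical cache with one entry per burst slot."""
    def __init__(self):
        size = 1 << (M + S)
        self.tag = [0] * size
        self.data = [None] * size
        self.vld = [True] * size      # assume the old item is fully valid before the update

def update_multi_burst(cache, offchip, low, high, bursts):
    """Write one multiple-burst item burst by burst; return vld snapshots after each write."""
    n = 1 << S
    assert len(bursts) == n
    base = low << S                   # assumption: shift by S, as in claims 15-18
    trace = []
    for i, d in enumerate(bursts):
        a = base + i
        cache.tag[a], cache.data[a] = high, d
        cache.vld[a] = (i == n - 1)   # a newly written burst is invalid, except the last one
        if i > 0:
            cache.vld[a - 1] = True   # commit the previous burst
        trace.append(tuple(cache.vld[base:base + n]))
    offchip[(low, high)] = list(bursts)  # off-chip memory is updated once, at the end
    return trace

cache, offchip = Cache(), {}
for snap in update_multi_burst(cache, offchip, 3, 0x12, ["b0", "b1", "b2", "b3"]):
    print(snap)
# prints:
# (False, True, True, True)
# (True, False, True, True)
# (True, True, False, True)
# (True, True, True, True)
```

Every snapshot before the last is only partially valid, so a concurrent lookup falls back to off-chip memory, which still holds the old item until the single off-chip update at the end (compare claim 18).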

Claim 15

Original Legal Text

15. The method according to claim 14 , further comprising: for the case of multiple-burst item, and when there are 2^S multiple-burst items, determining through the control module, by a comparison module, whether vlds corresponding to 2^S contiguous addresses after a low first-threshold-M bit address of the lookup request is left shifted for S bits are valid; if all of them are valid, continuously initiating 2^S requests for looking up the cache after left shifting the low first-threshold-M bit address of the lookup request for S bits, and obtaining a lookup result; parsing the lookup result, and comparing a found address with a high second-threshold-N bit address of the lookup request; and if they are identical, directly returning spliced results from cache lookup to the service side, not initiating the request for looking up the off-chip memory, and reading and discarding the data in a lookup information storage module, wherein S is a natural number.

Plain English Translation

This invention relates to a method for optimizing cache lookup operations in a computing system, particularly for handling multiple-burst memory access patterns. The problem addressed is the inefficiency of traditional cache lookup mechanisms for contiguous memory accesses, which often require multiple separate lookups and may access off-chip memory unnecessarily, increasing latency and power consumption. For the multiple-burst case with 2^S bursts, the comparison module, through the control module, checks whether the vlds of the 2^S contiguous addresses obtained by left-shifting the low first-threshold-M bits of the lookup request address by S bits are all valid. If they are, the system issues 2^S consecutive cache lookup requests starting at the shifted address. The results are parsed, and the found address is compared with the high second-threshold-N bits of the original request. If they match, the system returns the spliced cache results directly to the service side without looking up off-chip memory, and reads and discards the corresponding data in the lookup information storage module. This reduces latency and energy consumption by minimizing off-chip memory accesses for contiguous memory patterns. S is a natural number that sets both the number of bursts (2^S) and the shift amount.
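The multiple-burst hit check can be sketched in Python as follows. It assumes S = 1 (two bursts per item), M = 4, N = 12, byte strings for burst data, and a tag comparison on every one of the 2^S entries (the claim only says that "a found address" is compared). `Cache` and `lookup_multi` are hypothetical names.

```python
from collections import deque

M, N, S = 4, 12, 1  # assumed widths; 2**S bursts per item

class Cache:
    """Hypothetical cache with one entry per burst slot."""
    def __init__(self):
        size = 1 << (M + S)
        self.tag = [0] * size
        self.data = [b""] * size
        self.vld = [False] * size

def lookup_multi(cache, info, addr):
    """Return spliced item data on a hit, or None to request off-chip memory."""
    info.append((addr, 1 << S))        # record the address and burst count
    low = addr & ((1 << M) - 1)
    high = (addr >> M) & ((1 << N) - 1)
    base = low << S
    slots = range(base, base + (1 << S))  # 2**S contiguous entries
    if all(cache.vld[i] for i in slots) and all(cache.tag[i] == high for i in slots):
        info.pop()                     # hit: read and discard this request's record
        return b"".join(cache.data[i] for i in slots)  # splice bursts in order
    return None                        # miss or partially valid: go off-chip

cache, info = Cache(), deque()
for i, chunk in zip((6, 7), (b"ab", b"cd")):
    cache.tag[i], cache.data[i], cache.vld[i] = 0x12, chunk, True
print(lookup_multi(cache, info, 0x123))  # prints: b'abcd'
cache.vld[7] = False                     # partially valid, e.g. an update in progress
print(lookup_multi(cache, info, 0x123))  # prints: None
```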

Claim 16

Original Legal Text

16. The method according to claim 15 , further comprising: when the lookup request matches none of the addresses in the cache, initiating, by the comparison module, the request for looking up the off-chip memory, and after the item data is returned, taking the item address and multiple-burst information out from the lookup information storage module; for the case of multiple-burst item, first determining through the control module whether the vlds corresponding to 2^S contiguous addresses after a low first-threshold-M bit address of an address is left shifted for S bits are valid; if all of them are valid, reading the cache after mediation of a second mediation module, and comparing high second-threshold-N bits of an acquired address with high second-threshold-N bits of the taken-out address; and if they match, replacing data of a corresponding address with the item data returned from the off-chip memory and writing data back into the cache, and returning data to the service side through a distribution module.

Plain English Translation

This invention relates to a method for optimizing memory access in a computing system, particularly for handling cache misses and multiple-burst data transfers. The problem addressed is inefficient data retrieval when a requested address is not found in the cache, leading to delays and increased power consumption. The comparison module checks whether a lookup request matches any address in the cache. If no match is found, it initiates a request to off-chip memory. Once the requested data is returned, the system takes the item address and multiple-burst information out of a lookup information storage module. For multiple-burst items, a control module determines whether the vld flags for the 2^S contiguous addresses obtained by left-shifting the low first-threshold-M bit address by S bits are all valid. If they are, the system reads the cache through a second mediation module and compares the high second-threshold-N bits of the acquired address with the high second-threshold-N bits of the taken-out address. If they match, the system replaces the corresponding cache data with the data returned from off-chip memory, writes it back to the cache, and returns the data to the service side via a distribution module. This approach reduces latency and improves efficiency in memory access operations.

Claim 17

Original Legal Text

17. The method according to claim 16 , further comprising: when the lookup request is received, determining through the control module, by the comparison module, whether vld corresponding to 2^S contiguous requests after a low first-threshold-M bit address of a service request is left shifted for S bits is valid according to a multiple-burst identifier carried in the lookup request; if it is valid, reading data of a corresponding cache, and determining whether a high second-threshold-N bit address of the service request matches an address in the cache; and if they match, directly returning data to the service side; and if they do not match, initiating the request for looking up the off-chip memory.

Plain English Translation

This invention relates to a method for optimizing data access in a memory system, particularly for handling lookup requests in a cache memory. The problem addressed is the inefficiency of traditional cache systems when processing multiple-burst requests, where repeated lookups and memory accesses slow down performance. When a lookup request is received, the comparison module uses the multiple-burst identifier carried in the request to determine, through the control module, whether the vlds of the 2^S contiguous entries obtained by left-shifting the low first-threshold-M bit address of the request by S bits are valid. If they are, the system reads the corresponding cache data and checks whether the high second-threshold-N bit address of the request matches an address in the cache. If there is a match, the data is returned directly to the service side, bypassing the slower off-chip memory. If there is no match, the system initiates a lookup in the off-chip memory. This reduces latency by avoiding off-chip memory accesses when the data is already available in the cache.

Claim 18

Original Legal Text

18. The method according to claim 17 , further comprising: after the item data is returned, reading, by the comparison module, the lookup information storage module to acquire a lookup request address and a multiple-burst identifier; determining through the control module whether vld corresponding to 2^S contiguous requests after the low first-threshold-M bit address of the service request is left shifted for S bits is valid; if all of them are valid, reading the data of the corresponding cache; determining whether the high second-threshold-N bit address of the service request matches a service address returned to the cache; if they match, returning the item data in the cache to the service side through the distribution module, and not updating the item data in the cache; if they do not match, directly returning the item data in the off-chip memory to the service side through the distribution module, and updating the item data in the cache; and if a vld corresponding to multiple-bursts is partially valid, which indicates that the item update is not completed, returning the item data in the off-chip memory to the service side through the distribution module, and not updating the item data in the cache.

Plain English Translation

This invention relates to a method for optimizing data retrieval in a memory system, particularly the challenge of accessing and updating cached data efficiently while minimizing latency and preserving data consistency. After item data is returned from off-chip memory, the comparison module reads the lookup information storage module to acquire the lookup request address and its multiple-burst identifier. The control module then checks whether the vlds of the 2^S contiguous entries obtained by left-shifting the low first-threshold-M bits of the request address by S bits are valid. If all of them are valid, the system reads the corresponding cache data and compares the high second-threshold-N bits of the request address with the address held in the cache. If they match, the cached data is returned through the distribution module and the cache is not updated. If they do not match, the data returned by off-chip memory is sent directly to the service side and the cache is updated with it. If the vlds are only partially valid, indicating that an item update is still in progress, the off-chip data is returned and the cache is not updated. This ensures efficient data access while keeping lookups consistent during concurrent updates.

Claim 19

Original Legal Text

19. A non-transitory computer storage medium having stored therein computer executable instructions arranged to perform a method for improving an item access bandwidth and atomic operation, the method comprising: after a lookup request is received from a service side, determining whether an address pointed to by the lookup request is identical to an item address stored in a cache; if they are identical, and a valid identifier vld is currently valid, directly returning item data stored in the cache to the service side without initiating a request for looking up an off-chip memory, so as to reduce accessing the off-chip memory; if they are not identical, initiating a request for looking up the off-chip memory, and processing, according to a preset rule, item data returned by the off-chip memory in such a way that an atomic operation existed in item updating can realize a seamless and faultless lookup in an item lookup process, wherein the preset rule is used for determining whether the address pointed to by the lookup request is identical to the item address stored in the cache, comprising any one of the following ways: way 1: if a vld corresponding to a low first-threshold-M bit address is completely valid, and a high second-threshold-N bit address is identical to the item address stored in the cache, returning data in the cache to the service side, and not updating the data in the cache; if the addresses are not identical, not updating the data in the cache, and sending the data returned by the off-chip memory to the service side; way 2: if the vld corresponding to the low first-threshold-M bit address is partially valid, not updating the data in the cache, and sending the data returned by the off-chip memory to the service side; and way 3: if the vld corresponding to the low first-threshold-M bit address is invalid, updating the data in the cache, and sending the data returned by the off-chip memory to the service side, and wherein both M and N are natural numbers, and a sum of M and N is a bit width requested by the service side.

Plain English Translation

This invention relates to a method for improving item access bandwidth and atomic operations in a computing system, particularly by reducing access to off-chip memory through cache lookups. The problem addressed is the inefficiency of frequently accessing off-chip memory for item lookups, which slows down system performance. A non-transitory computer storage medium stores executable instructions that perform the lookup process. When a lookup request is received, the system checks whether the address in the request matches an item address stored in the cache. If they match and the valid identifier (vld) is valid, the system returns the cached item data directly to the service side without accessing off-chip memory, reducing memory access latency. If the addresses do not match, the system initiates a request to the off-chip memory and processes the returned data according to a preset rule, so that the atomic operation used during item updates still allows seamless and faultless lookups. The preset rule decides, from the vld state and the address comparison, which data to return and whether to update the cache. Way 1: if the vld for the low first-threshold-M bit address is completely valid and the high second-threshold-N bit address matches the cached item address, the cached data is returned and the cache is not updated; if the vld is completely valid but the high addresses do not match, the off-chip memory data is returned and the cache is still not updated. Way 2: if the vld is partially valid, the off-chip memory data is returned and the cache is not updated. Way 3: if the vld is invalid, the off-chip memory data is returned and the cache is updated. M and N are natural numbers, and their sum equals the bit width requested by the service side. This approach optimizes cache usage and reduces unnecessary off-chip memory accesses.
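The three ways of the preset rule reduce to a small decision function. In this Python sketch, "completely valid" is taken to mean that every vld covering the item is set, and "partially valid" that some but not all are; this interpretation and the names `Vld`, `classify`, and `preset_rule` are assumptions. The function returns the data sent to the service side together with whether the cache is updated.

```python
from enum import Enum

class Vld(Enum):
    COMPLETE = "completely valid"
    PARTIAL = "partially valid"
    INVALID = "invalid"

def classify(flags):
    """Classify the vld bits covering one item (assumed interpretation)."""
    if all(flags):
        return Vld.COMPLETE
    if any(flags):
        return Vld.PARTIAL
    return Vld.INVALID

def preset_rule(flags, cached_high, request_high, cache_data, offchip_data):
    """Return (data for the service side, whether to update the cache)."""
    state = classify(flags)
    if state is Vld.COMPLETE:
        if cached_high == request_high:
            return cache_data, False      # way 1, high addresses identical
        return offchip_data, False        # way 1, high addresses differ
    if state is Vld.PARTIAL:
        return offchip_data, False        # way 2: item update still in progress
    return offchip_data, True             # way 3: fill the cache from off-chip data

print(preset_rule([True, False], 0x12, 0x12, "cache", "offchip"))  # prints: ('offchip', False)
```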

Patent Metadata

Filing Date

Unknown

Publication Date

January 28, 2020

Inventors

Chuang Bao
Zhenlin Yan
Chunhui Zhang
Kang An


Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, FAQs, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “DEVICE AND METHOD FOR ENHANCING ITEM ACCESS BANDWIDTH AND ATOMIC OPERATION” (10545867). https://patentable.app/patents/10545867

© 2026 Nomic Interactive Technology LLC. Machine-readable context available at /api/llm-context/10545867. See llms.txt for full attribution policy.
