US-12585589-B2

Cache updating from multiple sources

Published: March 24, 2026
Assignee: not available in USPTO data we have
Inventors: not available in USPTO data we have
Technical Abstract

A peripheral device includes a processor, a memory interface, a host interface and a cache controller. The processor executes software code. The cache memory caches a portion of the software code. The memory interface communicates with a NVM storing a replica of the software code. The host interface communicates with hosts storing additional replicas of the software code. The cache controller is to determine whether each host is allocated for code fetching, to receive a request from the processor for a segment of the software code, when available in the cache memory to fetch the segment from the cache memory, when unavailable in the cache memory and at least one host is allocated, to fetch the segment from the hosts that are allocated, when unavailable in the cache memory and no host is allocated, to fetch the segment from the NVM, and to serve the fetched segment to the processor.

Patent Claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A peripheral device, comprising:

2. The peripheral device according to, wherein the one or more hosts comprise multiple hosts, and wherein the cache controller is to select a host from among the multiple hosts using an arbitration scheme, and to fetch the segment from the selected host.

3. The peripheral device according to, wherein the cache controller is to determine, for each host, whether the host is to be allocated or not for code fetching, based on a specified allocation criterion, and, when the segment is unavailable in the cache memory, to fetch the segment from one of the hosts that are allocated.

4. The peripheral device according to, wherein the cache controller is to monitor an operability status of a given host, which is not allocated for code fetching, and in response to detecting that the given host becomes operable, to allocate the given host for code fetching.

5. The peripheral device according to, wherein the cache controller is to monitor an operability status of a given host, which is allocated for code fetching, and in response to detecting that the given host becomes inoperable, to deallocate the given host so as not to be used for code fetching.

6. The peripheral device according to, wherein the cache controller is to have access to a validated hash value precalculated by applying a hash function to the segment, and, upon fetching the segment, to verify the segment by applying the hash function to the fetched segment to produce a hash result, and to compare between the hash result and the validated hash value.

7. The peripheral device according to, wherein the cache controller is to receive a group of hash values and a corresponding cryptographic signature calculated over the group of hash values using a private key, to validate the group of hash values based on the received group of hash values and the cryptographic signature using a public key corresponding to the private key, and to lock the validated hash values as unmodifiable.

8. The peripheral device according to, wherein the cache controller is to hold an authentication scheme specifying a hierarchy of multiple authentication layers, and to use a hash value in a given authentication layer to validate the hash values in another authentication layer lower in the hierarchy.

9. The peripheral device according to, wherein the segment has an accumulated size of multiple cache lines of the cache memory.

10. The peripheral device according to, wherein a first latency incurred in fetching the segment from the NVM is longer than a second latency incurred in fetching the segment from the one or more hosts.

11. The peripheral device according to, wherein the cache controller is to store the replica in a given host by reading segments of the replica from the NVM, authenticating the read segments, and sending the authenticated segments for storage in the given host.

12. The peripheral device according to, wherein the cache controller is to store the replica in the given host after the peripheral device is reset, or after detecting that the given host has become operable.

13. A method for updating cached code, comprising:

14. The method according to, wherein the one or more hosts comprise multiple hosts, and comprising selecting a host from among the multiple hosts using an arbitration scheme, and fetching the segment from the selected host.

15. The method according to, wherein the cache controller having access to a validated hash value precalculated by applying a hash function to the segment, and comprising, upon fetching the segment, verifying the segment by applying the hash function to the fetched segment to produce a hash result, and comparing between the hash result and the validated hash value.

16. The method according to, and comprising, receiving a group of hash values and a corresponding cryptographic signature calculated over the group of hash values using a private key, validating the group of hash values based on the received group of hash values and the cryptographic signature using a public key corresponding to the private key, and locking the validated hash values as unmodifiable.

17. The method according to, and comprising, holding an authentication scheme specifying a hierarchy of multiple authentication layers, and using a hash value in a given authentication layer to validate the hash values in another authentication layer lower in the hierarchy.

18. The method according to, wherein the segment has an accumulated size of multiple cache lines of the cache memory.

19. The method according to, wherein a first latency incurred in fetching the segment from the NVM is longer than a second latency incurred in fetching the segment from the one or more hosts.

20. The method according to, and comprising storing the replica in a given host by reading segments of the replica from the NVM, authenticating the read segments, and sending the authenticated segments for storage in the given host.

21. A computing system, comprising:

22. The computing system according to, wherein the hosts and the peripheral device are comprised in a data center.

23. The computing system according to, wherein the peripheral device comprises a Network Interface Controller (NIC).

24. The computing system according to, wherein the peripheral device comprises a Graphic Processing Unit (GPU).

Detailed Description

Complete technical specification and implementation details from the patent document.

This application is a continuation of U.S. patent application Ser. No. 18/456,536, filed Aug. 28, 2023, whose disclosure is incorporated herein by reference.

Embodiments described herein relate generally to updating cached software code, and particularly but not exclusively to methods and systems for secure cache updating from multiple sources.

Various secured devices are required to cryptographically authenticate and check the integrity of software code fetched into a cache memory of the device.

Methods for authenticating software code are known in the art. For example, U.S. Pat. No. 10,824,501 describes an apparatus having a firmware memory storing firmware, a cache memory loading at least part of the firmware for execution by a processor, and a firmware checking engine having a defined syndrome storage location and performing the following iteratively on cache line entries associated with the firmware stored in the cache memory: choose a cache line entry; verify that an address mapped in the cache line entry maps to an address in the firmware memory, and when the cache line entry is locked and the address mapped in the cache line entry maps to an address in the firmware memory, compare a content of the cache line entry to a content of a corresponding address in the firmware stored in the firmware memory, and produce an integrity result indicating whether integrity of the apparatus has been compromised.

An embodiment that is described herein provides a peripheral device that includes a processor, a memory interface, a host interface and a cache controller. The processor is to execute software code. The cache memory is to cache a portion of the software code. The memory interface is to communicate with a non-volatile memory (NVM) that stores a replica of the software code. The host interface is to communicate over a peripheral bus with one or more hosts that store additional respective replicas of the software code. The cache controller is to determine for each of the one or more hosts, whether the host is to be allocated or not for code fetching, based on a specified allocation criterion, to receive a request from the processor for a segment of the software code, when the segment is available in the cache memory, to fetch the segment from the cache memory, when the segment is unavailable in the cache memory and at least one of the hosts is allocated, to fetch the segment from the at least one of the hosts that are allocated via the host interface, when the segment is unavailable in the cache memory and none of the one or more hosts is allocated, to fetch the segment from the NVM via the memory interface, and to serve the fetched segment to the processor.

In some embodiments, the one or more hosts include multiple hosts, and the cache controller is to identify that two or more of the hosts are allocated for code fetching, to select a host from among the two or more of the hosts using an arbitration scheme, and to fetch the segment from the selected host. In other embodiments, the allocation criterion is based on one or more criteria belonging to a list including at least (i) operability statuses of the hosts, and (ii) workload levels of the hosts. In yet other embodiments, the cache controller is to monitor an operability status of a given host, which is not allocated for code fetching, and in response to detecting that the given host becomes operable, to allocate the given host for code fetching.

In an embodiment, the cache controller is to monitor an operability status of a given host, which is allocated for code fetching, and in response to detecting that the given host becomes inoperable, to deallocate the given host so as not to be used for code fetching. In another embodiment, the cache controller is to have access to a validated hash value precalculated by applying a hash function to the segment, and, upon fetching the segment, to verify the segment by applying the hash function to the fetched segment to produce a hash result, and to compare between the hash result and the validated hash value. In yet another embodiment, the cache controller is to receive a group of hash values and a corresponding cryptographic signature calculated over the group of hash values using a private key, to validate the group of hash values based on the received group of hash values and the cryptographic signature using a public key corresponding to the private key, and to lock the validated hash values as unmodifiable.

In some embodiments, the cache controller is to hold an authentication scheme specifying a hierarchy of multiple authentication layers, and to use a hash value in a given authentication layer to validate the hash values in another authentication layer lower in the hierarchy. In other embodiments, the segment has an accumulated size of multiple cache lines of the cache memory. In yet other embodiments, a first latency incurred in fetching the segment from the NVM is longer than a second latency incurred in fetching the segment from the one or more hosts.

In an embodiment, the cache controller is to store the replica in a given host by reading segments of the replica from the NVM, authenticating the read segments, and sending the authenticated segments for storage in the given host. In another embodiment, the cache controller is to store the replica in the given host after the peripheral device is reset, or after detecting that the given host has become operable.

There is additionally provided, in accordance with an embodiment that is described herein, a method for updating cached code, including, in a peripheral device that includes a processor executing software code, a cache memory caching a portion of the software code, and a cache controller communicating with a non-volatile memory (NVM) that stores a replica of the software code, and further communicating over a peripheral bus with one or more hosts that store additional respective replicas of the software code, determining, by the cache controller, for each of the one or more hosts, whether the host is to be allocated or not for code fetching, based on a specified allocation criterion. A request for a segment of the software code is received from the processor. When the segment is available in the cache memory, the segment is fetched from the cache memory. When the segment is unavailable in the cache memory and at least one of the hosts is allocated, the segment is fetched from the at least one of the hosts that are allocated. When the segment is unavailable in the cache memory and none of the one or more hosts is allocated, the segment is fetched from the NVM. The fetched segment is served to the processor.

These and other embodiments will be more fully understood from the following detailed description of the embodiments thereof, taken together with the drawings in which:

Embodiments that are described herein provide improved methods and systems for efficiently and securely updating cached code from multiple sources.

In various types of electronic devices, a processor runs software code from a local memory of the device. One problem with this scheme is that in modern applications the code size is typically large and may even exceed the storage space available within the device. To accommodate storage space shortage, the whole code may be stored in a Nonvolatile Memory (NVM) external to the device and parts of the code could be fetched into a local cache memory, on demand.

Cache memory is typically handled in chunks referred to as “cache lines”. Code (or data) may be fetched into the cache memory in blocks having a size of one or more cache lines. For a given application, the fetched block size is typically fixed. A disadvantage of fetching code from an external NVM is that NVMs typically have long access latencies, resulting in a limited code-update rate and degraded runtime performance.

As will be described below, in some disclosed embodiments, replicas of the software code are stored in one or more hosts, and in an external NVM as a fallback. The replicas may contain the full software code or part of the full software code. Since fetching a code segment from a host is typically much faster than fetching the same code segment from the NVM, the cache controller may first attempt fetching a requested code segment from one of the hosts that are operable. If all the hosts are inoperable, the cache controller fetches the requested code segment from the NVM.

In some applications, the electronic device is required to be secured. In accordance with one security aspect, the code fetched for caching needs to be authenticated as originating from the external NVM and as having unmodified content. Accordingly, a code segment fetched from an external memory (e.g., host or NVM) needs to be authenticated before being authorized for execution. In the present context it is assumed that an authentication scheme validates both the authenticity and integrity of the fetched software code.

In principle, a code segment fetched from an external source could be authenticated using a cryptographic authentication scheme. Asymmetric authentication schemes, e.g., based on the Rivest-Shamir-Adleman (RSA) scheme, are highly secured cryptographically, but are typically very slow and would incur unacceptable latencies in code updating (e.g., on the order of several milliseconds per operation). Symmetric authentication schemes such as, for example, the Advanced Encryption Standard (AES) Galois/Counter Mode (GCM), are typically considered insufficiently strong cryptographically, at least in modern applications. Other security solutions such as using an external DRAM (instead of the slow NVM) or using Physically Unclonable Functions (PUFs) are typically costly or otherwise unsuitable.

Consider a peripheral device comprising a processor, a cache memory, a memory interface, a host interface, and a cache controller. The processor executes software code. The cache memory caches a portion of the software code. The memory interface communicates with a non-volatile memory (NVM) that stores a replica of the software code. The host interface communicates over a peripheral bus with one or more hosts that store additional respective replicas of the software code. The cache controller is to: determine for each of the one or more hosts, whether the host is to be allocated or not for code fetching, based on a specified allocation criterion, to receive a request from the processor for a segment of the software code, to fetch the code segment as follows: (i) when the segment is available in the cache memory, fetch the segment from the cache memory, (ii) when the segment is unavailable in the cache memory and at least one of the hosts is allocated, fetch the segment from the at least one of the hosts that are allocated, via the host interface, and (iii) when the segment is unavailable in the cache memory and none of the one or more hosts are allocated, fetch the segment from the NVM via the memory interface, and, to serve the fetched segment to the processor.
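The three-way fetch order described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `Host`, `CacheController` and `fetch_segment`, the dictionary-backed stores, and the choice of the first allocated host are all hypothetical, and authentication of the fetched segment is omitted here.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical host holding a replica of the code, keyed by segment id."""
    name: str
    replica: dict
    allocated: bool = True


@dataclass
class CacheController:
    """Illustrative model of the fetch order: cache, then allocated host, then NVM."""
    nvm: dict                      # stands in for the external NVM replica
    hosts: list                    # hosts reachable over the peripheral bus
    cache: dict = field(default_factory=dict)

    def fetch_segment(self, seg_id):
        """Return (segment, source) for the requested segment id."""
        # (i) Segment available in the cache memory: serve it from there.
        if seg_id in self.cache:
            return self.cache[seg_id], "cache"
        allocated = [h for h in self.hosts if h.allocated]
        if allocated:
            # (ii) Cache miss and at least one host allocated: fetch from a host.
            # Picking the first allocated host is a placeholder for arbitration.
            segment, source = allocated[0].replica[seg_id], allocated[0].name
        else:
            # (iii) Cache miss and no host allocated: fall back to the NVM.
            segment, source = self.nvm[seg_id], "nvm"
        self.cache[seg_id] = segment   # update the cache with the fetched segment
        return segment, source
```

In this sketch the NVM is reached only when the pool of allocated hosts is empty, which mirrors case (iii).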

The cache controller may use any suitable allocation criterion. In an example embodiment, the allocation criterion is based on one or more criteria belonging to a list comprising at least (i) operability statuses of the hosts, and (ii) workload levels of the hosts.
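One possible allocation criterion combining operability and workload might look like the sketch below. The `HostStatus` record, the normalized workload value, and the 0.8 threshold are assumptions made for illustration; the source does not specify how workload is measured or compared.

```python
from dataclasses import dataclass


@dataclass
class HostStatus:
    """Hypothetical snapshot of a host as seen by the cache controller."""
    operable: bool
    workload: float   # assumed normalized load in [0.0, 1.0]


def should_allocate(status, max_workload=0.8):
    """Allocate a host only if it is operable and not overloaded.

    The 0.8 threshold is an arbitrary value chosen for illustration.
    """
    return status.operable and status.workload <= max_workload


def allocated_pool(statuses):
    """Return the names of the hosts that satisfy the allocation criterion."""
    return [name for name, s in statuses.items() if should_allocate(s)]
```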

In some configurations, the peripheral device is coupled via the host interface to multiple hosts. In such configurations, when the cache controller identifies that two or more of the hosts are allocated for code fetching, the cache controller selects a host from among the two or more of the hosts using an arbitration scheme, and fetches the segment from the selected host.
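The source does not name a particular arbitration scheme. As one example, a round-robin arbiter could spread fetches across the allocated hosts; the class name `RoundRobinArbiter` and its interface are hypothetical.

```python
class RoundRobinArbiter:
    """Rotate through allocated hosts so fetches are spread evenly."""

    def __init__(self):
        self._next = 0

    def select(self, allocated_hosts):
        """Pick the next host in rotation; return None if the pool is empty."""
        if not allocated_hosts:
            return None
        host = allocated_hosts[self._next % len(allocated_hosts)]
        self._next += 1
        return host
```

Other schemes, such as weighting by host workload, would fit the same `select` interface.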

The operability status of each of the hosts may change over time, which may affect host allocation for code fetching. For example, the cache controller monitors the operability status of a given host, which is not allocated for code fetching, and in response to detecting that the given host becomes operable, allocates the given host for code fetching. As another example, the cache controller monitors the operability status of a given host, which is allocated for code fetching, and in response to detecting that the given host becomes inoperable, deallocates the given host so as not to be used for code fetching.
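The allocate and deallocate transitions described above can be modeled as a small event handler. This is a sketch only; the `HostPool` class and the `on_status_change` notification are assumed names, and how the status change is actually detected is left out.

```python
class HostPool:
    """Tracks which hosts are currently allocated for code fetching."""

    def __init__(self):
        self.allocated = set()

    def on_status_change(self, host, operable):
        """Handle a (hypothetical) operability notification for one host."""
        if operable:
            # A non-allocated host became operable: allocate it.
            self.allocated.add(host)
        else:
            # An allocated host became inoperable: stop fetching from it.
            self.allocated.discard(host)
```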

In some embodiments, the cache controller has access to a validated hash value precalculated by applying a hash function to the segment. Upon fetching the segment, the cache controller verifies the segment by applying the hash function to the fetched segment to produce a hash result and compares between the hash result and the validated hash value.
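A minimal sketch of this verification step is shown below. SHA-256 is used as the hash function purely as an assumption; the source does not fix a specific function, and `verify_segment` is a hypothetical name.

```python
import hashlib
import hmac


def verify_segment(segment: bytes, validated_hash: bytes) -> bool:
    """Hash the fetched segment and compare it with the validated reference."""
    hash_result = hashlib.sha256(segment).digest()
    # compare_digest performs a constant-time comparison.
    return hmac.compare_digest(hash_result, validated_hash)
```

A segment that fails this check would not be served to the processor.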

In some embodiments, the cache controller holds a group of locked hash values that were validated as follows: the cache controller receives (e.g., from the NVM) a group of hash values and a corresponding cryptographic signature calculated over the group of hash values using a private key. The cache controller validates the group of hash values based on the received group of hash values and the cryptographic signature using a public key corresponding to the private key, and locks the validated hash values as unmodifiable.
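The validate-then-lock flow can be illustrated with a toy example. Everything here is an assumption made so that the sketch runs with the standard library alone: the key is a deliberately tiny textbook RSA key without padding (not secure, and not the scheme used by any real device), the function names are hypothetical, and "locking" is modeled by returning an immutable tuple, whereas a device would typically rely on hardware write protection.

```python
import hashlib

# Toy RSA key built from two Mersenne primes. It is far too small to be
# secure and exists only so the example runs with the standard library.
P, Q, E = 2**61 - 1, 2**89 - 1, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (Python 3.8+)


def _digest(hashes):
    """Reduce the concatenated hash group to an integer below N."""
    return int.from_bytes(hashlib.sha256(b"".join(hashes)).digest(), "big") % N


def sign_group(hashes):
    """Signer side: sign the hash group with the private key (N, D)."""
    return pow(_digest(hashes), D, N)


def validate_and_lock(hashes, signature):
    """Verify with the public key (N, E); return an immutable tuple or None."""
    if pow(signature, E, N) != _digest(hashes):
        return None
    return tuple(hashes)   # a tuple cannot be modified after creation
```

The slow asymmetric operation runs once per group of hash values, so the per-segment check at runtime only needs a hash computation and a comparison.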

In an embodiment, the cache controller holds an authentication scheme specifying a hierarchy of multiple authentication layers, and uses a hash value in a given authentication layer to validate the hash values in another authentication layer lower in the hierarchy.
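A two-layer version of such a hierarchy can be sketched as follows, assuming SHA-256 and a fixed group size. The layer layout and the helper names `build_layers` and `validate_lower_layer` are hypothetical; the source does not describe how layers are organized.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Hash helper; SHA-256 is an assumption for this sketch."""
    return hashlib.sha256(data).digest()


def build_layers(segments, group_size):
    """Return (top_hashes, leaf_groups): one top-layer hash per group of leaf hashes."""
    leaves = [h(s) for s in segments]
    groups = [leaves[i:i + group_size] for i in range(0, len(leaves), group_size)]
    top = [h(b"".join(g)) for g in groups]
    return top, groups


def validate_lower_layer(top_hash, leaf_group):
    """Use a trusted upper-layer hash to validate a group of lower-layer hashes."""
    return h(b"".join(leaf_group)) == top_hash
```

Only the top-layer hashes need to be held as validated and locked; lower-layer hashes can be fetched on demand and checked against them before being used to verify segments.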

In some embodiments, to reduce the storage space of the validated and locked hash values, the segment size is set to an accumulated size of multiple cache lines of the cache memory.
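The storage saving can be estimated with simple arithmetic. The figures used in the usage note (16 MiB of code, 64-byte cache lines, 32-byte hash values) are illustrative assumptions, not values taken from the source.

```python
def hash_table_bytes(code_bytes, cache_line_bytes, lines_per_segment, hash_bytes=32):
    """Bytes needed to hold one hash value per segment of the code."""
    segment_bytes = cache_line_bytes * lines_per_segment
    num_segments = -(-code_bytes // segment_bytes)   # ceiling division
    return num_segments * hash_bytes
```

Under these assumptions, one hash per cache line would need 8 MiB of hash storage, while one hash per 64-line segment (4 KiB) would need 128 KiB.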

In the disclosed techniques, replicas of software code are stored in an NVM and in one or more hosts. Since the latency incurred in fetching a code segment from the NVM is longer than the latency incurred in fetching the segment from the one or more hosts, when a cache read miss event occurs, the cache controller preferably fetches the requested segment from a host rather than from the NVM. The cache controller monitors the operability status of each host and allocates or deallocates hosts for code fetching based at least on the operability statuses of the hosts. During runtime, fetched code segments are authenticated based on reference hash values that were validated using an asymmetric signature or using a higher authentication layer.

In some embodiments, the cache controller stores the replica in a given host by reading segments of the replica from the NVM, authenticating the read segments, and sending the authenticated segments for storage in the given host. The cache controller may store the replica in the given host (or in multiple hosts) after the peripheral device is reset. Alternatively or additionally, the cache controller may store the replica in a given host after detecting that the given host has become operable. This scheme may be used for updating the replica in the given host before allocating it for code fetching.
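The read, authenticate, and send sequence can be sketched as below. The function name `provision_host`, the dictionary standing in for host memory, the use of SHA-256, and the decision to discard a partial replica on failure are all assumptions made for this example.

```python
import hashlib


def provision_host(nvm_segments, locked_hashes, host_memory):
    """Copy authenticated segments from the NVM replica into a host's memory.

    Returns True only if every segment authenticated and was stored.
    """
    for seg_id, segment in enumerate(nvm_segments):
        if hashlib.sha256(segment).digest() != locked_hashes[seg_id]:
            host_memory.clear()     # do not leave a partial replica behind
            return False
        host_memory[seg_id] = segment
    return True
```

In this sketch a host would be allocated for code fetching only after `provision_host` returns True.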

Using the disclosed embodiments, code segments are fetched with low latency and authenticated efficiently using strong cryptographic hash-based methods, therefore improving runtime performance of the software code and security level of the updated code. Moreover, the disclosed embodiments are applicable in a multi-host platform, so that arbitrating among multiple hosts allows for high availability of the hosts, and high resiliency when a host becomes inoperable or overloaded.

The accompanying block diagram schematically illustrates a computer system supporting secure updating of cached code, in accordance with an embodiment that is described herein.

The computer system comprises hosts served by a common peripheral device. In this example, the computer system comprises multiple hosts denoted HOST_1 ... HOST_N. In an alternative configuration, the computer system may comprise a single host. The peripheral device comprises a processor, which communicates via a host interface and over a peripheral bus with the hosts. In some embodiments, the functionality of the processor may be implemented using two or more separate processors. The peripheral bus may comprise, for example, the Peripheral Component Interconnect Express (PCIe) bus, or any other suitable type of peripheral bus. Each of the hosts typically runs one or more application programs (not shown), possibly in coordination with other hosts and/or the processor, for performing complex tasks.

The computer system may be used in various applications such as, for example, in a data center or a High-Performance Computing (HPC) system, in Artificial Intelligence (AI) computing, or in a computer-based appliance. The peripheral device may comprise any suitable electronic device such as, for example, a Network Interface Controller (NIC), a Graphic Processing Unit (GPU), and the like.

In serving the hosts and for other purposes, the processor runs software code. In the present context, the term “software code” refers to any software program (or programs) suitable to be run by the processor, e.g., a software program and/or firmware.

The peripheral device comprises a cache controller that manages access to a local cache memory. In describing the computer system, it is assumed that the size of the underlying software code exceeds the storage space available in the cache memory, and thus the cache memory stores only part of the software code at any given time.

To run a certain code segment, the processor requests that code segment from the cache controller. The cache controller fetches the code segment as described herein and serves the fetched code segment to the processor. When the requested code segment is available in the cache memory, the cache controller fetches the code segment from the cache memory. Otherwise, the requested code segment is currently unavailable in the cache memory, and the cache controller fetches the code segment from a memory residing externally to the peripheral device, as will be described below.

In the computer system, replicas of the full software code (or part thereof) are stored (i) in a memory in each of the hosts (or at least in some of the hosts), and (ii) in an external Nonvolatile Memory (NVM). The host memory may comprise, for example, a Dynamic Random Access Memory (DRAM). The NVM may comprise, for example, a Flash memory and may be accessible to the cache controller via a memory interface comprising, for example, a Serial Peripheral Interface (SPI) bus or an Inter-Integrated Circuit (I2C) bus. Alternatively, other suitable types of NVMs and memory interfaces can also be used.

At any given time, each of the hosts may be operable or inoperable. In the present context, a host is considered operable when the cache controller can retrieve from that host any code segment of the code replica stored in that host. A host may become “inoperable” for various reasons such as, for example, when the host enters a low power state, during host reset or boot, or when the host is too busy to handle code reads for the cache controller. In some embodiments, the cache controller monitors respective operability statuses of the hosts, and selects one of the operable hosts from which to fetch a requested code segment.

The cache controller can fetch a requested code segment from a host or from the NVM. Since communication with the NVM is typically much slower than with a host, in some embodiments the cache controller first attempts fetching the requested code segment from one of the operable hosts, if any. When all the hosts are inoperable, the cache controller fetches the requested code segment from the NVM. In some embodiments, the peripheral device comprises an on-chip memory (not shown), e.g., for temporarily storing data fetched from the NVM or hosts.

In some embodiments, the cache controller allocates hosts for the purpose of code fetching, e.g., based on respective operability statuses of the hosts or based on respective workload levels of the hosts. The cache controller then selects a host from which to fetch a code segment only from among the hosts currently allocated.

In some embodiments, the peripheral device is a secured device, which is allowed to run only code that has been authenticated successfully. To this end, the cache controller comprises a cryptographic engine supporting various cryptographic functions and schemes. For example, the cryptographic engine supports an asymmetric signature scheme and a hash-based authentication scheme. In some embodiments, the cryptographic engine comprises symmetric encryption/decryption schemes, e.g., for decrypting fetched encrypted code. When verification of the signature using the public key succeeds, the cache controller stores validated hash values locally and locks the validated hash values, meaning that these hash values are unmodifiable and are therefore considered securely validated within the peripheral device.

During runtime, the cache controller authenticates fetched code segments based on corresponding locked hash values, as will be described in detail below.

In the present context, authentication of fetched code segments is based on a cryptographic hash function that maps input data of arbitrary length into respective hash values of fixed length. The space of input values is typically larger than the space of resulting hash values. The hash function may comprise, for example, a suitable Message Authentication Code (MAC) function. In an embodiment, the hash function may also take a secret key as input, e.g., an HMAC function. A relevant key-based hash function is specified, for example, in the National Institute of Standards and Technology (NIST) standard FIPS 198-1.
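As an illustration of a key-based hash with fixed-length output, the sketch below computes an HMAC using Python's standard `hmac` module. The choice of SHA-256 as the underlying hash and the helper name `keyed_hash` are assumptions; the source does not specify either.

```python
import hashlib
import hmac


def keyed_hash(key: bytes, segment: bytes) -> bytes:
    """Compute HMAC-SHA-256 over a segment; the output length is fixed at 32 bytes."""
    return hmac.new(key, segment, hashlib.sha256).digest()
```

Regardless of the segment length, the result is 32 bytes, and changing the key changes the result for the same segment.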

The computer system and peripheral device configurations described above are given by way of example, and other suitable computer system and peripheral device configurations can also be used.

Some elements of the peripheral device, e.g., elements of the cache controller such as the cryptographic engine, may be implemented in hardware, e.g., in one or more Application-Specific Integrated Circuits (ASICs) or Field-Programmable Gate Arrays (FPGAs). Additionally or alternatively, some elements of the peripheral device can be implemented using software, or using a combination of hardware and software elements.

Elements that are not necessary for understanding the principles of the present application, such as various interfaces, addressing circuits, timing and sequencing circuits and debugging circuits, have been omitted from the figure for clarity.

In some embodiments, some of the functions of the cache controller may be carried out by a general-purpose processor, which is programmed in software to carry out the functions described herein. The software may be downloaded to the processor in electronic form, over a network, for example, or it may, alternatively or additionally, be provided and/or stored on non-transitory tangible media, such as magnetic, optical, or electronic memory.

The NVM may comprise any suitable type of nonvolatile memory such as, for example, a Flash memory. Alternatively, other suitable types of NVMs such as a One Time Programmable (OTP) memory, e.g., a Read Only Memory (ROM), can also be used.

The memory of a host may comprise any suitable type of memory such as, for example, a Dynamic Random Access Memory (DRAM). Alternatively, other memory types, e.g., a Static Random Access Memory (SRAM), can also be used.

The accompanying flow chart schematically illustrates a method for updating cached code from multiple sources, in accordance with an embodiment that is described herein.

The method will be described as executed by the cache controller of the peripheral device. In describing the method, it is assumed that replicas of the software code have been stored in the memories of the hosts and in the NVM.

The method begins, at a host allocation step, with the cache controller determining for each host whether it is to be allocated or not for code fetching. The cache controller thus manages a pool of allocated hosts. At any given time, the pool may contain one or more hosts that are allocated for code fetching, or may be empty. The cache controller may allocate hosts for code fetching using any suitable criterion such as, for example, based on respective operability statuses of the hosts. In accordance with this criterion, a host is allocated for code fetching when the host is operable, and not allocated for code fetching when the host is inoperable. The cache controller may monitor the operability status of a host, e.g., by receiving from the host a suitable message, by querying the host, by detecting that the host is unreachable for communication, or using any other suitable method.

Patent Metadata

Filing Date: Unknown
Publication Date: March 24, 2026
Inventors: Unknown

