US-20260140888-A1

Cache Eviction in a Multi-Tenant Architecture

Published: May 21, 2026

Assignee: not available in USPTO data we have

Inventors: Hui LI

Technical Abstract

Systems and methods include storage of a plurality of objects in a cache, determination, for each of the plurality of objects, of a value based on a quality of service level associated with a tenant to which the object belongs and on at least one time at which the object was accessed from the cache, determination of one of the plurality of objects to delete from the cache based on the determined values, and deletion of the determined one of the plurality of objects from the cache.

Patent Claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A system comprising: a cache storing a plurality of objects; a memory storing executable program code; and one or more processing units to execute the executable program code to cause the system to: determine, for each of the plurality of objects, a value based on a quality of service level associated with a tenant to which the object belongs and on an access to the object in the cache; determine one of the plurality of objects to evict from the cache based on the determined values; and evict the determined one of the plurality of objects from the cache.

2. The system of claim 1, the one or more processing units to execute the executable program code to cause the system to: after the determined one of the plurality of objects is evicted from the cache, determine a second one of the plurality of objects to evict from the cache based on the determined values; and evict the determined second one of the plurality of objects from the cache.

3. The system of claim 2, the one or more processing units to execute the executable program code to cause the system to: determine, for each of the plurality of objects stored in the cache, a second value based on a quality of service level associated with the tenant to which the object belongs and on an access to the object in the cache; determine one of the plurality of objects to evict from the cache based on the determined second values; and evict the determined one of the second plurality of objects from the cache.

4. The system of claim 3, wherein the value for each of the plurality of objects is determined based on a quality of service level associated with a tenant to which the object belongs and on a time since a last access of the object.

5. The system of claim 4, wherein the value for each of the plurality of objects is determined based on a quality of service level associated with a tenant to which the object belongs, on a time since a last access of the object, and on a frequency of accesses of the object.

6. The system of claim 1, the one or more processing units to execute the executable program code to cause the system to: determine, for each of a second plurality of objects stored in the cache, a second value based on a quality of service level associated with a tenant to which the object belongs and on an access to the object in the cache; determine one of the second plurality of objects to evict from the cache based on the determined values; and evict the determined one of the second plurality of objects from the cache.

7. The system of claim 1, wherein the value for each of the plurality of objects is determined based on a quality of service level associated with a tenant to which the object belongs and on accesses to the object in the cache, and wherein the accesses are performed by a plurality of application instances.

8. The system of claim 1, wherein the value for each of the plurality of objects is determined based on a quality of service level associated with a tenant to which the object belongs, on a time since a last access of the object, and on a frequency of accesses of the object.

9. A method comprising: storing a plurality of objects in a cache; determining, for each of the plurality of objects, a value based on a quality of service level associated with a tenant to which the object belongs and on at least one time at which the object was accessed from the cache; determining one of the plurality of objects to delete from the cache based on the determined values; and deleting the determined one of the plurality of objects from the cache.

10. The method of claim 9, further comprising: after deleting the determined one of the plurality of objects from the cache, determining a second one of the plurality of objects to evict from the cache based on the determined values; and deleting the determined second one of the plurality of objects from the cache.

11. The method of claim 10, further comprising: determining, for each of the plurality of objects stored in the processor memory, a second value based on a quality of service level associated with the tenant to which the object belongs and on at least one time at which the object was accessed from the cache; determining one of the plurality of objects to delete from the cache based on the determined second values; and deleting the determined one of the second plurality of objects from the cache.

12. The method of claim 11, wherein the value for each of the plurality of objects is determined based on a quality of service level associated with a tenant to which the object belongs and on a time since a last access of the object.

13. The method of claim 12, wherein the value for each of the plurality of objects is determined based on a quality of service level associated with a tenant to which the object belongs, on a time since a last access of the object, and on a frequency of accesses of the object.

14. The method of claim 9, further comprising: determining, for each of a second plurality of objects stored in the cache, a second value based on a quality of service level associated with a tenant to which the object belongs and on at least one time at which the object was accessed from the cache; determining one of the second plurality of objects to delete from the cache based on the determined values; and deleting the determined one of the second plurality of objects from the cache.

15. The method of claim 9, wherein the value for each of the plurality of objects is determined based on a quality of service level associated with a tenant to which the object belongs and on a plurality of times at which the object was accessed, and wherein the plurality of accesses are performed by a plurality of application instances.

16. The method of claim 9, wherein the value for each of the plurality of objects is determined based on a quality of service level associated with a tenant to which the object belongs, on a time since a last access of the object, and on a frequency of accesses of the object.

18. The one or more computer-readable media of claim 17, further comprising: determining, for each of the plurality of objects stored in the processor memory, a second value based on a quality of service level associated with the tenant to which the object belongs and on at least one time at which the object was accessed from the cache; determining one of the plurality of objects to delete from the cache based on the determined second values; and deleting the determined one of the second plurality of objects from the cache.

19. The one or more computer-readable media of claim 17, wherein the value for each of the plurality of objects is determined based on a quality of service level associated with a tenant to which the object belongs, on a time since a last access of the object, and on a frequency of accesses of the object.

20. The one or more computer-readable media of claim 17, wherein the value for each of the plurality of objects is determined based on a quality of service level associated with a tenant to which the object belongs and on a plurality of times at which the object was accessed, and wherein the plurality of accesses are performed by a plurality of application instances.

Detailed Description

Complete technical specification and implementation details from the patent document.

Multi-tenancy is a software architecture pattern which facilitates the sharing of computing resources (e.g., processor cycles, memory) among disparate groups of users. For example, a single multi-tenant application may serve requests received from several independent tenants (e.g., customers) each consisting of multiple end users. Such an application may use a much smaller computing resource footprint than would be required to provision one application per tenant.

Multi-tenant applications use various hardware and/or software-driven schemes to allow the sharing of computing resources between tenants while maintaining tenant-specific data isolation. Multi-tenant applications can be cloud-based (e.g., a Software-as-a-Service (SaaS) application) in order to take advantage of the resource elasticity, redundancy, economies of scale and other benefits provided by cloud platforms.

Different tenants sign different SLAs (Service Level Agreements) with application providers. The different SLAs may guarantee different levels of QoS (Quality of service) for the different tenants. For example, a tenant which has been guaranteed a higher level of QoS should receive better service from the application than a tenant which has been guaranteed a lower level of QoS. Service quality may be measured in terms of throughput, availability, response time lag, etc.

Since all tenants of a multi-tenant application share computing resources, it can be difficult to provide the tenants with different levels of QoS. Systems are desired to efficiently provide different QoS levels to the different tenants of a multi-tenant application.

The following description is provided to enable any person in the art to make and use the described embodiments. Various modifications, however, will remain readily-apparent to those in the art.

The present inventor has identified the shared cache as an important performance bottleneck in a multi-tenant system. Increasing the cache size to remove this bottleneck is typically not viable in view of the high cost of the memory used for caches.

Generally, an application requiring an object attempts to fetch the object from a cache. If the object is stored in the cache, it is returned to the program therefrom. If the cache does not store the object, an error (i.e., a cache miss) is returned and the application fetches the object from another storage (e.g., a database, a file system, or a remote source). The application then attempts to store the object in the cache. If the cache has insufficient available capacity, objects are evicted from the cache until enough capacity is available to store the object. The objects to be evicted from the cache may be determined based on known eviction algorithms such as but not limited to Least-Recently Used, Least-Frequently Used, and First-In-First-Out.
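The fetch-then-populate flow described above can be sketched as follows. This is only a minimal illustration: it assumes a Least-Recently Used policy, a capacity counted in objects rather than bytes, and a hypothetical `load_from_storage` callable standing in for the database, file system, or remote source. None of these names come from the patent text.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class LRUCache:
    """Toy cache-aside wrapper with Least-Recently Used eviction."""

    def __init__(self, capacity: int, load_from_storage: Callable[[Hashable], object]):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity            # assumed: capacity counted in objects
        self.load = load_from_storage       # hypothetical slower backing store
        self.items: OrderedDict = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> object:
        if key in self.items:               # cache hit: serve from the cache
            self.hits += 1
            self.items.move_to_end(key)     # mark as most recently used
            return self.items[key]
        self.misses += 1                    # cache miss: go to other storage
        value = self.load(key)
        while len(self.items) >= self.capacity:
            self.items.popitem(last=False)  # evict the least recently used object
        self.items[key] = value
        return value
```

The `hits` and `misses` counters give the hit rate that the later paragraphs use as the measure of per-tenant service quality.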

Embodiments implement a cache eviction algorithm which results in the provision of different QoS levels to different tenants. A cache eviction algorithm according to some embodiments improves performance for higher-level QoS tenants by providing those tenants with a cache hit rate (i.e., a percentage of objects requested from the cache which are actually stored in cache) that is higher than a cache hit rate of tenants having a lower QoS level. Read/write operations on objects stored in a cache are much faster than read/write operations on objects stored on disk and other storage.

A cache eviction algorithm according to some embodiments identifies objects to evict from a cache based on respective costs calculated for each object in the cache. The cost of an object may depend in part on a QoS level associated with a tenant to which the object belongs. The cost may also depend, in some embodiments, on a time since the object was last accessed, an access frequency of the object, a time required to load the object into the cache from other storage and/or other factors.

FIG. 1 illustrates a system according to some embodiments. The illustrated components of FIG. 1 may be implemented using any suitable combinations of computing hardware and/or software that are or become known. The components of system 100 may be on-premise, cloud-based (e.g., in which computing resources are virtualized and allocated elastically), distributed (e.g., with distributed storage and/or compute nodes) and/or deployed in any other suitable manner. Each of servers 110 and 120 may comprise one or more servers, virtual machines, clusters of a container orchestration system and any other combination that is or becomes known. All or a part of each system may utilize Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS) and/or Software-as-a-Service (SaaS) offerings owned and managed by one or more different entities as is known in the art.

Application server 110 may provide an operating system, services, I/O, storage, libraries, frameworks, etc. to applications executing therein. Application 112 may comprise program code executable by a processing unit of application server 110 to provide functions based on coded logic and data. Application 112 may provide any computing functions that are or become known.

Application server 110 also includes cache manager 114 and cache 116. Cache manager 114 may comprise program code which may be executed to, among other functions, determine objects to evict from cache 116. Cache 116 may be implemented by any one or more memory devices from which a processing unit of server 110 may quickly access data, for example due to the inherent speed of the memory device, the speed of a connection between the processing unit and the memory device, and/or the distance between the processing unit and the memory device. Examples of cache 116 include but are not limited to high-speed static random-access memory. Cache 116 may include L1, L2 and/or L3 processor memory.

Cache manager 114 may determine objects to evict from cache 116 based on statistics 115. Cache manager 114 may determine and store statistics 115 based on the storage of objects in cache 116 during operation of application 112. Statistics 115 may identify objects currently stored in cache 116 and may specify, for each stored object, a time at which the object was last accessed, a frequency with which the object has been recently accessed, a time required to load the object into the cache from other storage, and a weighted cost of the object. The weighted cost of an object depends in part on a QoS level associated with a tenant to which the object belongs. Cache manager 114 may determine the weighted costs periodically and use the weighted costs to determine objects to evict from cache 116 as will be described below.

Application 112 accesses database 122 of database server 120 during operation. Database 122 stores metadata 124 which describes the structure and interrelationships (i.e., the schema) of data 126. Data 126 may comprise tenant data as well as other data used by application 112. Data 126 may comprise tabular data stored in a columnar or row-based format, object data or any other type of data that is or becomes known. An object as described herein may comprise any logical group of data, including object data and data of one or more table rows. Metadata 124 and data 126 may be stored by application server 110 in some embodiments.

Database 122 may be multi-tenant aware, serving requests based on the tenant associated with the request. In a case that database 122 is not multi-tenant aware, one schema of a single instance may be used for all tenants, where the data of each tenant is partitioned via a discriminating column. Multi-tenant application 112 is therefore responsible for tracking and managing the data in a tenant-aware manner, for example by using the values of the discriminating column to identify the data belonging to specific tenants.
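Partitioning by a discriminating column can be illustrated with a small in-memory SQLite table. The table name `orders`, the column name `tenant_id`, and the sample rows are hypothetical and chosen only for this sketch; the application, not the database, enforces the tenant filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Single shared schema; tenant_id is the (hypothetical) discriminating column.
conn.execute("CREATE TABLE orders (tenant_id TEXT NOT NULL, order_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("T123", 1, 10.0), ("T56", 2, 20.0), ("T123", 3, 30.0)],
)


def orders_for_tenant(tenant_id: str) -> list[tuple[int, float]]:
    """Return only the rows whose discriminating column matches the tenant."""
    rows = conn.execute(
        "SELECT order_id, total FROM orders WHERE tenant_id = ? ORDER BY order_id",
        (tenant_id,),
    )
    return rows.fetchall()
```

Every query issued on behalf of a tenant would carry the same filter, which is what makes the application responsible for isolation in this arrangement.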

Database 122 may be implemented using one or more storage systems, each of which may be standalone or distributed, on-premise or cloud-based. Database 122 may comprise any type of database, data warehouse, object store, or other storage system that is or becomes known.

Users 132, 134, 136 may operate respective user devices (not shown) to interact with application 112. Such user devices may execute Web browsers which request user interfaces from application 112 and present the user interfaces to users 132, 134, 136. The user devices may access application 112 via a gateway (not shown) which routes requests to application 112 and may also provide authentication, authorization, and load balancing.

Users 132, 134, 136 may be associated with different respective tenants. Each tenant may be a party to a distinct subscription/agreement/contract with a provider of application 112. Each tenant may therefore be associated with a different QoS level.

FIG. 2 is a tabular representation of tenant-specific QoS information 200 according to some embodiments. Each of tenants T123, T56 and T221 is associated with a different level (or grade) of QoS. The levels may be determined based on an agreement between the tenants and the application provider as is known in the art. Embodiments are not limited to any particular number or gradations of levels. For purposes of the present example, it will be assumed that user 132 belongs to tenant T123, user 134 belongs to tenant T56, and user 136 belongs to tenant T221. Information 200 may be stored in cache 116, data 126 and/or in any other suitable memory.

Application 112 serves incoming requests based on the tenants underlying the requests. In particular, application 112 serves a request received from a user of a particular tenant based on the tenant's data stored in database 122 (and/or cache 116) and in view of the QoS level associated with the tenant. If an object is requested, application 112 determines whether the object is in cache 116 and, if not, retrieves the object from database 122 and stores the object in the cache. The object remains in the cache until it is evicted.

Cache manager 114 maintains and updates statistics 115 during operation of application 112. FIG. 3 is a tabular representation of cache statistics 115 according to some embodiments. Each object of cache statistics 115 may be currently stored in cache 116 and may belong to any tenant served by application 112. For each object in cache 116, statistics 115 stores a time at which the object was last accessed (LastAccess), a frequency with which the object has been accessed within a recent time window (AccessFreq), a time required to load the object into the cache from its persistent storage (LoadTime), and a weighted cost of the object (WeightedCost). Each of these values is monitored and/or calculated by cache manager 114.
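One way a cache manager might keep the LastAccess, AccessFreq and LoadTime fields is sketched below. The class name `ObjectStats`, its method names, and the use of a deque of timestamps to implement the recent time window are assumptions made for illustration; the WeightedCost field is left out because it is derived from the other fields.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ObjectStats:
    """Per-object record mirroring the LastAccess, AccessFreq and LoadTime fields."""
    load_time: float                  # seconds to load from persistent storage
    last_access: float = 0.0          # timestamp of the most recent access
    accesses: deque = field(default_factory=deque)  # timestamps inside the window

    def record_access(self, now: float, window: float) -> None:
        self.last_access = now
        self.accesses.append(now)
        self._trim(now, window)

    def access_freq(self, now: float, window: float) -> int:
        self._trim(now, window)
        return len(self.accesses)

    def _trim(self, now: float, window: float) -> None:
        # Drop accesses that have fallen out of the recent time window.
        while self.accesses and self.accesses[0] <= now - window:
            self.accesses.popleft()
```

Timestamps are passed in explicitly rather than read from a clock, which keeps the sketch deterministic and easy to test.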

The weighted costs of each object in cache 116 may be periodically determined based on the other fields of statistics 115 and on the QoS level associated with the tenant to which the object belongs. According to some embodiments, the weighted cost cost_w is determined as:

where qos is the QoS level associated with the tenant to which the object belongs. MAX_QOS is a positive constant and is the maximum QoS level provided by the system, and qos may comprise an integer in the range [1, MAX_QOS]. Δtime is the time difference between the current timestamp and the timestamp at the last access of the object (i.e., LastAccess). LoadTime represents the time required to load the object from its persistent storage (e.g., database 122). LoadTime can be determined from historical operation data. freq is the number of times the object has been fetched in the current time window, where the current time window is an adjustable parameter. MAX_IDL_TIME and γ are also adjustable parameters, in which MAX_IDL_TIME is a maximum time for which an object should remain in the cache without being fetched and γ is a QoS importance factor with γ≥0. The bigger γ is, the more important the QoS level is to the weighted cost.

e^(γ×(qos−MAX_QOS)) is the weighting factor for QoS in the above formula for cost_w. When qos=MAX_QOS, the weighting factor has its maximum value of 1. cost_w is directly related to the values of freq, loadTime and qos, and inversely related to Δtime.
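The expression for cost_w itself is not reproduced in this text. One form that agrees with the properties stated here (the QoS weighting factor, direct dependence on freq and loadTime, inverse dependence on Δtime, and MAX_IDL_TIME as an upper bound on idle time) and that also reproduces the worked example values given later in this description is sketched below. The linear idle-time term and the scale constant K are assumptions: K = 10 matches the example values when loadTime is expressed in seconds, and any constant K leaves the ordering of objects by cost unchanged.

```latex
\mathrm{cost}_w \;=\; K \cdot e^{\gamma\,(qos - \mathrm{MAX\_QOS})} \cdot freq \cdot loadTime \cdot \left(1 - \frac{\Delta time}{\mathrm{MAX\_IDL\_TIME}}\right)
```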

In one example, statistics 115 associate two objects with the same values of freq, Δtime, and loadTime. One of the objects belongs to tenant1 with QoS level qos_1 and the other object belongs to tenant2 with QoS level qos_2, and Δqos=qos_1−qos_2>0. The ratio between the weighted cost of tenant1's object and the weighted cost of tenant2's object is ratio_1,2 = e^(γ×Δqos).

As γ increases, ratio_1,2 increases. Accordingly, QoS levels become more important in determining cost_w as γ increases. If γ=0, the QoS level is not relevant to the determination of cost_w.
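The dependence of the ratio on γ can be checked directly. Assuming the QoS weighting factor multiplies the remaining terms of the weighted cost, the terms shared by the two objects (freq, Δtime, loadTime) cancel and only the weighting factors remain:

```latex
\mathrm{ratio}_{1,2}
  = \frac{e^{\gamma\,(qos_1 - \mathrm{MAX\_QOS})}}{e^{\gamma\,(qos_2 - \mathrm{MAX\_QOS})}}
  = e^{\gamma\,(qos_1 - qos_2)}
  = e^{\gamma\,\Delta qos}
```

With Δqos > 0 this expression grows with γ and equals 1 when γ = 0, which is consistent with both statements above.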

Embodiments are not limited to the above calculation of cost_w. For example, a calculation which does not account for LoadTime may be:

The influence of freq may also be ignored in some embodiments as follows:

FIG. 4 is a flow diagram of process 400 to provide tenant-based cache eviction according to some embodiments. Process 400 and the other processes described herein may be performed using any suitable combination of hardware and software. Program code embodying these processes may be stored by any non-transitory tangible medium, including a fixed disk, a volatile or non-volatile random-access memory, a DVD-ROM, a Flash drive, a magnetic tape, and solid-state Random-Access Memory (RAM) or Read Only Memory (ROM) storage, and may be executed by any number of processing units, including but not limited to processors, processor cores, and processor threads. Such processors, processor cores, and processor threads may be implemented by a virtual machine provisioned in a cloud-based architecture. Embodiments are not limited to the examples described below.

Initially, at S410, a request associated with an object is received by an application. In one example, a user operates a client device (e.g., a desktop computer) to execute a Web browser application. The user may select or otherwise input a Uniform Resource Locator (URL) associated with a cloud-based application, causing the Web browser to send a request to a cloud gateway corresponding to the URL. The request may include a security token and the cloud gateway may perform authentication and authorization using the token. For example, the gateway may authenticate the user as belonging to a particular tenant.

The received request is associated with an object. For example, the request may comprise a request to read an object or update an object. The request may be associated with more than one object but process 400 will be described with respect to one object for clarity.

In response to the request, the object is fetched from the cache at S420. Fetching the object from the cache may comprise sending a request for the object to a cache manager. If the object is currently stored in the cache, a cache hit returning the object is detected at S430 and the object is returned to the application at S480. If the object is not currently stored in the cache, a cache miss is detected at S430.

In response to the cache miss, the object is fetched from other storage at S440. The other storage may comprise a persistent database storage, in-memory storage which is slower to access than the cache, a remote storage system, or the like. Next, at S450, it is determined (e.g., by the cache manager) whether the cache includes enough available space to store the fetched object. If so, the object is stored in the cache at S470 and is returned to the application at S480.

Flow proceeds from S450 to S460 if the cache does not include enough available space to store the fetched object. At S460, an object which is stored in the cache and which is associated with the smallest weighted cost of all objects currently stored in the cache is evicted from the cache. S460 therefore includes determination of the object which is associated with the smallest weighted cost of all objects currently stored in the cache, and eviction of the object from the cache.

The cache manager may perform the determination of S460 by referring to statistics such as statistics 115. The determination of S460 may comprise identifying an object in statistics 115 which is associated with the smallest weighted cost. It is noted that the cache manager need not consider to which tenants the cached objects belong or the respective QoS levels of the tenants, since the weighted cost of each cached object factors in such information.

Eviction at S460 may comprise deletion, deallocation and/or any process which frees the cache space used by the identified object for use in storing another object. Flow then returns to S450 to again determine whether the cache includes enough available space to store the object fetched at S440. Flow therefore cycles between S450 and S460, identifying and evicting objects from the cache until it is determined at S450 that the cache includes enough available space to store the fetched object. At this point, the object is stored in the cache at S470 and then returned to the application at S480.
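The loop through S420 to S480 can be sketched as follows. The names `Entry` and `CostCache`, the abstract size units, and the rule that an object larger than the whole cache bypasses it are assumptions made for illustration. The sketch also assumes that each entry's weighted cost is supplied by the loader and kept current elsewhere, since the text describes those costs as being recomputed periodically by the cache manager.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Entry:
    value: object
    size: int               # assumed: space the object occupies in the cache
    weighted_cost: float    # assumed to be kept current by the cache manager


class CostCache:
    """Toy cache that evicts the entry with the smallest weighted cost."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries: Dict[str, Entry] = {}

    def used(self) -> int:
        return sum(e.size for e in self.entries.values())

    def fetch(self, key: str, load: Callable[[str], Entry]) -> object:
        entry = self.entries.get(key)                    # S420: fetch from cache
        if entry is not None:                            # S430: cache hit
            return entry.value                           # S480: return to application
        entry = load(key)                                # S440: fetch from other storage
        if entry.size > self.capacity:
            return entry.value                           # assumed: oversized objects bypass the cache
        while self.used() + entry.size > self.capacity:  # S450: enough space?
            victim = min(self.entries, key=lambda k: self.entries[k].weighted_cost)
            del self.entries[victim]                     # S460: evict smallest weighted cost
        self.entries[key] = entry                        # S470: store in cache
        return entry.value                               # S480: return to application
```

Note that the eviction step only compares weighted costs; tenant identity and QoS level never appear in the loop, matching the observation that the cost already factors them in.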

The following are examples of weighted cost determinations according to some embodiments. The determinations assume two tenants, Tenant1 associated with QoS=4 and Tenant2 associated with QoS=3. Moreover, MAX_QOS=5, MAX_IDL_TIME=300 s, and γ=0.5.

In the first example, cached object obj1 belongs to Tenant1 and is associated with metrics Δtime1=51 s, freq1=49, and loadTime1=1 s. Using the formula above, cost_w1=247. Cached object obj2 belongs to Tenant2, is stored in the cache after obj1 is stored in the cache, and is associated with metrics Δtime2=50 s, freq2=50, and loadTime2=1 s, resulting in cost_w2=153. Accordingly, between the two objects, obj2 would be selected for eviction at S460 since cost_w2<cost_w1.

Using FIFO, LRU, or LFU algorithms, obj1 may be selected for eviction over obj2. However, the above example illustrates a case in which the difference in tenant QoS levels overrides the difference in the usage metrics of the cached objects. As a result, the object belonging to the tenant associated with the greater QoS level is maintained in the cache, possibly providing improved performance to that tenant.

In another example, cached object obj3 belongs to Tenant1 and is associated with metrics Δtime3=100 s, freq3=50, and loadTime3=1 s. Using the formula above, cost_w3=202. Cached object obj4 belongs to Tenant2, is stored in the cache after obj1 is stored in the cache, and is associated with metrics Δtime4=50 s, freq4=100, and loadTime4=1 s, resulting in cost_w4=307. Due to these weighted costs, obj3 would be selected over obj4 for eviction at S460. LRU and LFU algorithms may also select obj3 for eviction, since the difference in tenant QoS is not sufficient to override the difference in the usage metrics of the cached objects.
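The four example costs can be reproduced numerically under an assumed form of the weighted cost. The function below is a sketch, not the formula from the patent, which is not reproduced in this text: it multiplies the stated QoS weighting factor by freq, loadTime and a linear idle-time term, with a scale constant `K = 10` chosen only because it matches the example values.

```python
import math

MAX_QOS = 5
MAX_IDL_TIME = 300.0   # seconds
GAMMA = 0.5
K = 10.0               # assumed scale constant; not given in the text


def weighted_cost(qos: int, dtime: float, freq: int, load_time: float) -> float:
    """Assumed form: K * e^(gamma*(qos-MAX_QOS)) * freq * loadTime * (1 - dtime/MAX_IDL_TIME)."""
    qos_weight = math.exp(GAMMA * (qos - MAX_QOS))
    return K * qos_weight * freq * load_time * (1.0 - dtime / MAX_IDL_TIME)


objects = {
    "obj1": weighted_cost(qos=4, dtime=51, freq=49, load_time=1),    # Tenant1
    "obj2": weighted_cost(qos=3, dtime=50, freq=50, load_time=1),    # Tenant2
    "obj3": weighted_cost(qos=4, dtime=100, freq=50, load_time=1),   # Tenant1
    "obj4": weighted_cost(qos=3, dtime=50, freq=100, load_time=1),   # Tenant2
}
for name, cost in objects.items():
    print(name, round(cost))   # rounds to 247, 153, 202 and 307 respectively
```

In the first pair the QoS weighting outweighs obj2's slightly better usage metrics, while in the second pair obj4's doubled access frequency outweighs its lower QoS level.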

FIG. 5 illustrates an orchestration platform-based multi-tenant system providing tenant-based cache eviction according to some embodiments.

As mentioned above, gateway 510 may receive requests from users 522, 524 and 526 of disparate tenants. The requests may be associated with a multi-tenant application. Gateway 510 directs the requests to cluster 530 of server nodes 532, 534 and 536. As is known in the art, server nodes 532, 534 and 536 implement a container orchestration platform (e.g., Kubernetes) for execution of instances of the multi-tenant application. The application instances may access instances of a database application executing in cluster 540 of server nodes 542, 544 and 546. Each server node may comprise a virtual machine allocated by a cloud provider providing self-service and immediate provisioning, autoscaling, security, compliance and identity management features.

The application instances executing on server nodes 532, 534 and 536 access the database application instances executing on server nodes 542, 544 and 546 in order to create, read, update and delete objects belonging to the tenants of users 522, 524 and 526. These functions include reading and writing objects from and to shared cache 552 of cache system 550. Cache 552 may be an in-memory key-value distributed data store such as a Redis cache but embodiments are not limited thereto.

Cache manager 554 of cache system 550 maintains and updates statistics 555 as described above. In the FIG. 5 implementation, statistics 555 are affected by cache accesses of each application instance executing in cluster 530. For example, the LastAccess and AccessFreq values for a given object reflect all accesses of the object by each application instance. Cache manager 554 may determine a weighted cost of each object in cache 552 based on statistics 555 and tenant QoS levels as described above, in order to determine objects to evict from cache 552.

FIG. 6 illustrates a cloud-based database deployment according to some embodiments. The illustrated components may comprise cloud-based compute resources residing in one or more public clouds providing self-service and immediate provisioning, autoscaling, security, compliance and identity management features.

Execution environments 610-630 may comprise servers or virtual machines of a Kubernetes cluster. Execution environments 610-630 may support containerized applications which provide one or more services to users. Execution environment 610 may execute a multi-tenant application, execution environment 620 may execute a database, and execution environment 630 may execute a cache system as described herein.

The foregoing diagrams represent logical architectures for describing processes according to some embodiments, and actual implementations may include more or different components arranged in other manners. Other topologies may be used in conjunction with other embodiments. Moreover, each component or device described herein may be implemented by any number of devices in communication via any number of other public and/or private networks. Two or more of such computing devices may be located remote from one another and may communicate with one another via any known manner of network(s) and/or a dedicated connection. Each component or device may comprise any number of hardware and/or software elements suitable to provide the functions described herein as well as any other functions. For example, any computing device used in an implementation of a system according to some embodiments may include a processor to execute program code such that the computing device operates as described herein.

Embodiments described herein are solely for the purpose of illustration. Those in the art will recognize other embodiments may be practiced with modifications and alterations to that described above.

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G06F G06F12/121

Patent Metadata

Filing Date

November 15, 2024

Publication Date

May 21, 2026

Inventors

Hui LI
