In one aspect, a computerized method for scalable, correct, and high-performance asynchronous lockless sharing of a computer resource comprises: determining there is a contention for a shared computer resource by a plurality of competing processes, wherein the plurality of competing processes are competing to access a same portion of the shared resource; adding the plurality of competing processes to a priority queue; retrieving a process at a front of the queue of the plurality of competing processes; accessing a work area of the process at the front of the queue; sharing the work area with other processes of the plurality of competing processes in the priority queue; sanitizing the work area to obtain a plurality of code bundles; placing the code bundles into a patchpointer; and processing the patchpointer until the patchpointer is empty.
Legal claims defining the scope of protection, as filed with the USPTO.
1. A computerized method for scalable, correct, and high-performance asynchronous lockless sharing of a computer resource comprising:
2. The computerized method of, wherein a plurality of other data structures is added to the priority queue.
3. The computerized method of, wherein the plurality of other data structures added to the priority queue comprises at least one of an adaptive radix tree, a chaining hash map, a balanced or unbalanced binary search tree (BST), or a linked list.
4. The computerized method of, wherein the patchpointer comprises a data structure used for a patchpointing operation.
5. The computerized method of, wherein the patchpointing operation comprises a method to manage one or more concurrent accesses to the shared computer resource.
6. The computerized method of, wherein the competing processes simultaneously select processing work uniformly at random.
7. The computerized method of, wherein the competing processes attempt to insert processing work simultaneously at uniformly available random locations in the patchpointer.
8. The computerized method of, wherein each priority queue is updated continuously, and new processes are added at any time.
9. The computerized method of, wherein the code bundle comprises a block of code without any function arguments.
10. The computerized method of, wherein a code bundle is nested inside another code bundle.
11. The computerized method of, wherein a function of the plurality of competing processes is transformed into a code bundle by wrapping the function in a set of code that reads arguments from memory.
12. The computerized method of, wherein the process with the highest priority is retrieved and the work area of the process with the highest priority is accessed.
13. The computerized method of, wherein it is determined that the work area of the process with the highest priority is not yet initialized.
14. The computerized method of, wherein the work area of the process with the highest priority is initialized by a corresponding interpretive processor of the process with the highest priority.
15. The computerized method of, wherein the work area of the process with the highest priority is made shareable among all the processes in the priority queue.
16. The computerized method of, wherein the code bundle resides inside the work area.
17. The computerized method of, wherein the code bundles are split or merged according to preferences.
18. The computerized method of,
19. The computerized method of,
20. The computerized method of,
Complete technical specification and implementation details from the patent document.
“This application claims priority under Article 4A of the Paris Convention for the Protection of Industrial Property to Indian Patent Application No. 202241063919, filed on Nov. 9, 2022 and titled METHOD AND APPARATUS FOR MANAGING CONCURRENT ACCESS TO A SHARED RESOURCE USING PATCHPOINTING.”
In a shared multiprocessor/multi-core processor setting, there is a need for optimizing transaction processing on shared resources. Contention on shared resources in concurrent environments is the primary inhibitor of scalability, performance and predictability (in terms of throughput and latency) of a concurrent/parallel application.
Many technologies have been proposed to enhance the performance of transaction processing in concurrent settings. On the hardware front, multiple microprocessors provide large workload capacity, while also providing multithreading functionality to act on shared computer resources. Various multiprocessors exist that provide cache coherency guarantees at the cache line level among processor cores. On the software front, multithreaded operating systems with logically partitioned address space have been developed. These permit computer programs to run in parallel in multiple threads to enable concurrent tasks.
While parallelism enhances system performance and transaction processing in shared resources, it adds the complexity of task synchronization. Concurrent processes are generally unaware of the activities and state of other processes and thus may interfere with their operations. This may result in data corruption, system crashes, and other such indeterminate outputs. One example of a shared resource is a buffer queue in a network adapter in a computer: processes compete for slots that reside in the buffer. The slots hold frames of data (network data packets). Concurrent access of processes to the slots to transact on frame data (in the slots) is typically managed through locking, interrupts, or in a lockless fashion.
Another example of a shared resource is the computer heap memory: processes compete with each other for heap memory. The memory allocator (provided by the operating system or custom designed allocators) needs to manage concurrency either by locking or lockless mechanisms.
Locking (or interrupts) is a way to introduce serializability among multiple concurrent processes. It lets a lock-holding (or interrupting) process restrict other processes' access to the shared resource. Such locking and interrupt mechanisms prove counterproductive from a concurrency/parallelism standpoint. To alleviate the drawbacks of locking and interrupt mechanisms, lockless methods have been developed. One major advantage of lockless algorithms over locking algorithms is that they offer protection against unbounded lock time. In lockless mechanisms, synchronization among processes is achieved by atomic transitions between consistent states of the shared resource.
A lockless algorithm is generally optimized for a specific data structure (e.g., FIFO queues, ring buffers, sets, etc.). Such mechanisms work fine as long as the complete transaction processing engine centers around that single data structure. Even then, their correctness is very difficult to verify, and they are not easily extensible or applicable to other data structures. Various lockless algorithms supporting multiple updating/writing processes suggested in the prior art are very complicated in design, generally viewed as impractical (and rarely implemented in practice), and scale very badly. These and other key limitations are generally well recognized in the art.
Various data structures have been implemented offering different progress guarantees for different operations (e.g., lock-free circular buffers for multiple producers and consumers, lock-free queues and stacks, etc.).
When various heterogeneous system components/data sources (I/O buffers, memory heaps, external database arrays, collections of objects, etc.) need to interact with one another, state-of-the-art lockless methods degrade performance because of poor interoperability among lockless algorithms, large context-switching overheads, the need for numerous atomic operations and cache line transfers, the high cost of exchanging state between CPU cores, and excessive data copying (resulting in huge latencies and high consumption of computer resources and energy).
In locking algorithms, it is generally sufficient to handle the locking dependencies and critical sections of processes alone. For lockless algorithms, due to absence of critical sections, the number of execution traces of the concurrent application, involving the interactions of shared variables can be very large. This problem is generally referred to as state space explosion.
Also, in a distributed setting, where many multi-processor computers are connected by networks (LAN, WAN, or Internet), this problem of optimizing shared resource transaction processing is amplified by network overhead (TCP/IP), and performance degrades rapidly.
As such, in a multiprocessor/multithreaded (and/or distributed) setting, the overhead and poor interoperability of an intermix of locking, interrupt, and/or lockless mechanisms for managing concurrent access to heterogeneous shared data sources (like arrays, linked lists, sets, ring buffers, B-Trees, etc.) cause many state-of-the-art database systems to suffer from data losses, stale reads, read skews, lock conflicts, etc., compromising both transactional guarantees and performance.
A substantial need exists for an improved coordination among concurrent processes/tasks/threads that is applicable to any type of shared data source (in a multithreaded computer and/or network of multithreaded computers) and offers high performance gains and transactional guarantees. Also, no solution (either lockless or locking) exists that allows concurrent writes to a shared data source without needing high amounts of data copying (or caching) presently. The present invention achieves this goal.
In one aspect, a computerized method for scalable, correct, and high-performance asynchronous lockless sharing of a computer resource comprises: determining there is a contention for a shared computer resource by a plurality of competing processes, wherein the plurality of competing processes are competing to access a same portion of the shared resource; adding the plurality of competing processes to a priority queue; retrieving a process at a front of the queue of the plurality of competing processes; accessing a work area of the process at the front of the queue; sharing the work area with other processes of the plurality of competing processes in the priority queue; sanitizing the work area to obtain a plurality of code bundles; placing the code bundles into a patchpointer; and processing the patchpointer until the patchpointer is empty.
The Figures described above are a representative set and are not exhaustive with respect to embodying the invention.
Disclosed are a system, method, and article of manufacture for managing concurrent access to a shared resource using patchpointing. The following description is presented to enable a person of ordinary skill in the art to make and use the various embodiments. Descriptions of specific devices, techniques, and applications are provided only as examples. Various modifications to the examples described herein will be readily apparent to those of ordinary skill in the art, and the general principles defined herein may be applied to other examples and applications without departing from the spirit and scope of the various embodiments.
Reference throughout this specification to “one embodiment,” “an embodiment,” “one example,” or similar language means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, appearances of the phrases “in one embodiment,” “in an embodiment,” and similar language throughout this specification may, but do not necessarily, all refer to the same embodiment.
Furthermore, the described features, structures, or characteristics of the invention may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided, such as examples of programming, software modules, user selections, network transactions, database queries, database structures, hardware modules, hardware circuits, hardware chips, etc., to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art can recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, materials, and so forth. In other instances, well-known structures, materials, or operations are not shown or described in detail to avoid obscuring aspects of the invention.
The schematic flow chart diagrams included herein are generally set forth as logical flow chart diagrams. As such, the depicted order and labeled steps are indicative of one embodiment of the presented method. Other steps and methods may be conceived that are equivalent in function, logic, or effect to one or more steps, or portions thereof, of the illustrated method. Additionally, the format and symbols employed are provided to explain the logical steps of the method and are understood not to limit the scope of the method. Although various arrow types and line types may be employed in the flow chart diagrams, they are understood not to limit the scope of the corresponding method. Indeed, some arrows or other connectors may be used to indicate only the logical flow of the method. For instance, an arrow may indicate a waiting or monitoring period of unspecified duration between enumerated steps of the depicted method. Additionally, the order in which a particular method occurs may or may not strictly adhere to the order of the corresponding steps shown.
The following terminology is used in example embodiments:
ABA problem is a multithreading problem that occurs during synchronization, when a location is read twice, has the same value for both reads, and “value is the same” is used to indicate “nothing has changed”. However, another thread can execute between the two reads and change the value, do other work, then change the value back, thus fooling the first thread into thinking “nothing has changed” even though the second thread did work that violates that assumption.
Arrays, B-Trees, linked-lists, queues, stacks, etc. are some examples of data structures.
Compare-And-Swap (CAS) is an atomic operation that takes three arguments (R, expected_value, new_value), where R is the register on which it is applied, expected_value is the expected value of the register, and new_value is the new value to be written. The operation compares expected_value with the current value of R, and atomically updates R to new_value if the expected value matches the current value. In this case, we say that the CAS succeeds. Otherwise, the value of R is not updated, and the CAS fails.
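By way of illustration only, the CAS semantics and the ABA problem defined above can be sketched in a few lines of Python. The class name Register and its methods are hypothetical and do not appear in the specification. Python exposes no user-level CAS instruction, so this sketch emulates the atomicity of the operation with an internal lock; that lock is an artifact of the emulation and is not part of a lockless design.

```python
import threading


class Register:
    """A single memory cell with an emulated compare-and-swap."""

    def __init__(self, value):
        self._value = value
        # Emulation only: this lock stands in for the atomicity that a
        # hardware CAS instruction provides.
        self._guard = threading.Lock()

    def load(self):
        return self._value

    def compare_and_swap(self, expected_value, new_value):
        """Write new_value only if the cell still holds expected_value."""
        with self._guard:
            if self._value == expected_value:
                self._value = new_value
                return True    # the CAS succeeds
            return False       # the CAS fails; the cell is unchanged


r = Register(5)
ok_first = r.compare_and_swap(5, 6)    # expected value matches: succeeds
ok_second = r.compare_and_swap(5, 7)   # cell now holds 6: fails

# ABA: a reader sees "A"; another actor writes "B" and then "A" again.
# The reader's CAS still succeeds although the cell was modified twice.
cell = Register("A")
seen = cell.load()
cell.compare_and_swap("A", "B")
cell.compare_and_swap("B", "A")
aba_went_unnoticed = cell.compare_and_swap(seen, "C")
```

In this sketch the final CAS succeeds because only the value is compared, which is exactly the false "nothing has changed" conclusion that the ABA definition describes.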
Contention is a product of the effective arrival rates of requests to a shared resource that directly (adversely) affects the responsiveness of a system.
Data structure is implemented using code for processes to run for each of its operations.
Generative model is a statistical model of the joint probability distribution P(X,Y) on given observable variable X and target variable Y.
Machine learning can include the construction and study of systems that can learn from data. Example machine learning techniques that can be used herein include, inter alia: decision tree learning, association rule learning, artificial neural networks, inductive logic programming, support vector machines, clustering, Bayesian networks, reinforcement learning, representation learning, similarity, and metric learning, and/or sparse dictionary learning.
Patchpointing is a method to manage concurrent accesses to a shared computer resource. A patchpointer is a data structure used for patchpointing operations.
Program/task/thread/process running a lockless algorithm does not use locks (e.g. mutexes, semaphores, etc.).
Steps taken by a process implementing an execution sequence are called operations.
Set of operations is called a data structure. Each operation has an expected behavior that is defined using a sequential specification. That is, it specifies the behavior of an operation in an environment where no other process/thread/task is interleaved with the steps of the executing process.
Transaction refers to one or more execution sequences (e.g. machine executable instructions).
These definitions are provided by way of example and not of limitation. They can be integrated into various example embodiments discussed infra.
In accordance with the invention, work areas of processes can be encapsulated and permitted to reside in data storage (in the host processor or a remote location). Encapsulated work areas can be accessed by a plurality of live work areas, whether directly or through other encapsulated work areas. The corresponding figure shows a simplified view of an exemplary work area of a process, along with its data structures and the interactions among them. A process has its language/runtime system associated with it, which couples with its live work area. The live work area can access named entities from other encapsulated work areas that are addressable by name. The figure also denotes the contents inside an encapsulated work area, which may contain other encapsulated work areas. Encapsulated work areas may also hold a language/runtime processor inside them. The control stack maintains the execution/call stack of the program source code. The program may also optionally be stored in binary format (e.g., as an object module).
Each work area has a symbol or name table, as shown diagrammatically in the corresponding figure. An exemplary name table may hold a plurality of address pointers as entries pointing to named entities within a work area. Such internally pointed entities are called internal entities. Entities residing outside the live work area are called external entities.
The name or symbol table is populated with entries for named entities. Each entry has fields such as a name identifier, a creation date or timestamp, the entity value (if applicable), a bit (or flag value) indicating whether the entity name is resolved, the address of the entity, the class type of the entity (function, variable, constant, etc.), a reference count for garbage collection purposes, and other miscellaneous fields (for example, if the entity is of array type, additional fields such as rank, shape, total number of elements, and a flag indicating whether it is sorted may be present).
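As a minimal sketch of one such entry, the fields listed above could be grouped as follows. The class name NameTableEntry, the field names, and the use of a dictionary keyed by name are illustrative assumptions, not the layout used by any particular embodiment.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class NameTableEntry:
    name: str                      # name identifier
    entity_class: str              # "function", "variable", "constant", ...
    created_at: float = field(default_factory=time.time)   # timestamp
    value: Any = None              # entity value, if applicable
    resolved: bool = False         # whether the name has been resolved
    address: Optional[int] = None  # address of the entity
    ref_count: int = 0             # reference count for garbage collection
    extras: dict = field(default_factory=dict)   # e.g. rank, shape for arrays


# Hypothetical name table: a mapping from names to entries.
name_table = {}
entry = NameTableEntry(
    name="prices",
    entity_class="variable",
    value=[3, 1, 2],
    resolved=True,
    ref_count=1,
    extras={"rank": 1, "shape": (3,), "count": 3, "sorted": False},
)
name_table[entry.name] = entry
```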
Private copies of named entities from each accessed work area can be modified and retained in the live work area. A database of encapsulated work areas may be loaded preparatory to process execution or may be loaded upon a determined need. Subject to operating system's allocation controls, once an encapsulated work area is loaded, it is internalized into the live work area.
While executing a program, the language/runtime system stores in the corresponding work area a copy of all modified entities and changes in the symbol or name table, like marking external named entities as “resolved” or “not resolved”, etc. The corresponding figure illustrates a simplified flowchart of such name resolution and modification of the work area by the language/runtime system. Upon starting, the work area is initialized and the next program statement is executed. If there is no statement to execute, the name resolution procedure halts. Otherwise, if the program statement contains a name and the name is internal, the statement accesses the entity. If the name is external, it is resolved by accessing an encapsulated work area, searching for the name, and then accessing the entity. If the program statement does not contain any reference to a name, it is executed normally. This procedure continues until all names are resolved and the live work area is populated with entities.
A further figure shows a block diagram of the problem context relating to the present invention. Scalable, correct, and high-performance asynchronous lockless sharing of a computer resource among a set of processes is a fundamental problem in distributed computing. It becomes all the more difficult where processes move at arbitrary speeds and are crash prone. Network communication may also contribute to increased latencies and distributiveness. Patchpointing is a proposed lockless concurrent shared resource management mechanism that achieves high scalability, handles dynamic loads, offers strong performance guarantees, is simple to implement, and does not require complicated correctness proofs for its operations.
Access to the shared computer resource is managed by the patchpointing mechanism. The shared resource can be distributed across various systems (external data storage, nodes, computers, across the internet, etc.), and the figure denotes the communication between the patchpointing mechanism and the shared resource. Various processes may interact asynchronously with the patchpointing mechanism, directly or via the internet, through an API interface.
A flow-chart diagram presents the overall operation of patchpointing. Under contention, where many processes compete to access the same portion of the shared resource, all competing processes are added to a priority queue. Other data structures, such as adaptive radix trees, chaining hash maps, balanced or unbalanced binary search trees (BSTs), or linked lists, may also be used in place of priority queues.
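The name resolution procedure described above can be sketched as follows, under simplifying assumptions: work areas are modeled as plain dictionaries of named entities, and the names WorkArea and access are hypothetical. An internal name is used directly; an external name is looked up in the encapsulated work areas, a private copy is internalized into the live work area, and the name is marked as resolved or not resolved.

```python
import copy


class WorkArea:
    def __init__(self, entities=None):
        self.entities = dict(entities or {})   # name -> entity
        self.resolved = {}                     # name -> True / False


def access(name, live, encapsulated_areas):
    """Return the entity for name, internalizing it if it is external."""
    if name in live.entities:                  # internal entity
        live.resolved[name] = True
        return live.entities[name]
    for area in encapsulated_areas:            # external entity: search
        if name in area.entities:
            # Keep a private copy in the live work area.
            live.entities[name] = copy.deepcopy(area.entities[name])
            live.resolved[name] = True
            return live.entities[name]
    live.resolved[name] = False                # mark as "not resolved"
    raise NameError(name)


live = WorkArea({"x": 1})
library = WorkArea({"y": [10, 20]})
a = access("x", live, [library])   # internal name
b = access("y", live, [library])   # external name, now internalized
b.append(30)                       # modifies the private copy only
```

Because the external entity is copied, the modification made through the live work area leaves the encapsulated work area unchanged in this sketch.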
The priority order may be chosen according to various preferences. For example, the contending processes can be prioritized based on their timestamp (access request priority, order of arrival, etc.), compute resources (processes with larger compute resources like GPU/CPU/TPU and larger memory may take higher priority compared to others with lower compute resources, or vice-versa, etc.), or latency times (processes with high network latencies may be prioritized first, or vice-versa, etc.). The priority order may also be made dynamic and programmed into the ordering schedule. The chosen data structure (priority queue, linked list, balanced or unbalanced BST, adaptive radix tree) is populated accordingly. In the next step, the process at the front of the queue is retrieved.
The mechanism of patchpointing is based on a data structure called a patchpointer. The structure of the patchpointer and the operations on it are illustrated in the corresponding figures, respectively. The underlying intuition is that, to reduce contention drastically, competing processes must be turned into collaborating processes, and collaborating processes should simultaneously pick processing work uniformly at random and attempt to insert processing work simultaneously at uniformly available random locations in the patchpointer. The processing work referred to here pertains to the execution of code bundles. The aforementioned priority queue can be updated continuously, and new processes can be added at any time.
In the patchpointing mechanism, for each cycle of operation, the following steps are performed until the priority queue is empty. The process P with the highest priority is retrieved and its work area is accessed. If its work area is not yet initialized, it is initialized by its corresponding interpretive processor. The work area is made shareable among all the processes in the priority queue. This sharing of the work area can utilize various techniques such as the POSIX shmem API, XPMEM, SMARTMAP, PVAS, MPI interfacing, etc. The work area, and specifically the program code inside the work area, is analyzed and various code bundles are extracted from it. A block of code without any function arguments is called a code bundle, and code bundles may be nested inside one another. Any function (or subroutine or procedure) can be transformed into a code bundle (or a collection of code bundles) by wrapping it in code that reads arguments from memory. For example, a lambda expression (or a procedure or subroutine with no arguments) in Java or C++ or Lisp or APL or Fortran is a code bundle.
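A minimal sketch of such a configurable priority order is given below, using a binary heap with a pluggable key function. The names Contender, ContentionQueue, by_arrival, and by_high_latency_first are hypothetical, and the two orderings shown (order of arrival, highest latency first) are only two of the preferences mentioned above.

```python
import heapq
import itertools
from dataclasses import dataclass


@dataclass
class Contender:
    name: str
    arrival: float       # timestamp of the access request
    latency_ms: float    # observed network latency


def by_arrival(p):
    return p.arrival                 # earliest request first


def by_high_latency_first(p):
    return -p.latency_ms             # largest latency first


class ContentionQueue:
    def __init__(self, key):
        self._key = key
        self._heap = []
        self._tie = itertools.count()   # keeps equal keys in insertion order

    def add(self, proc):
        heapq.heappush(self._heap, (self._key(proc), next(self._tie), proc))

    def pop_front(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


procs = [
    Contender("p1", arrival=3.0, latency_ms=50.0),
    Contender("p2", arrival=1.0, latency_ms=5.0),
    Contender("p3", arrival=2.0, latency_ms=80.0),
]

q = ContentionQueue(by_arrival)
for p in procs:
    q.add(p)
arrival_order = [q.pop_front().name for _ in range(len(q))]

q = ContentionQueue(by_high_latency_first)
for p in procs:
    q.add(p)
latency_order = [q.pop_front().name for _ in range(len(q))]
```

Swapping the key function changes the retrieval order without touching the queue itself, which is one simple way to make the priority order programmable.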
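The transformation of a function into a code bundle can be sketched as follows. The dictionary named memory stands in for a shared work area, and the helper make_bundle is a hypothetical name introduced for illustration; the sketch only shows the idea of a zero-argument block of code that reads its arguments from memory.

```python
# Hypothetical stand-in for memory inside a shared work area.
memory = {"a": 6, "b": 7, "result": None}


def multiply(x, y):
    return x * y


def make_bundle(fn, arg_slots, out_slot, mem):
    """Wrap fn so it takes no arguments and reads them from mem instead."""
    def bundle():
        args = [mem[slot] for slot in arg_slots]
        mem[out_slot] = fn(*args)
    return bundle


mul_bundle = make_bundle(multiply, ["a", "b"], "result", memory)


def outer_bundle():
    # Bundles may be nested: this bundle runs another one, then continues.
    mul_bundle()
    memory["result"] += 1


outer_bundle()   # leaves 6 * 7 + 1 in memory["result"]
```

Because a bundle takes no arguments, any collaborating process that can reach the shared memory can run it without knowing the original function signature.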
The code bundles can also be split (or merged) according to preferences. For example, the code bundles can be split according to call-by-value (CBV) or call-by-name (CBN) continuation-passing style (CPS). Various programs exist which translate a given source code to CBV CPS or CBN CPS. Code bundles can also be merged together to better suit tensor computations (SIMD, SIMT, etc.). Code bundles can also be nested inside one another. Code bundles may or may not be disjoint from one another. However, the highest performance and scalability are achieved with disjoint and short code bundles.
This analysis of program code and extraction of code bundles may be a multithreaded (parallel) computation. Code bundles may be further split or merged according to access patterns on the underlying shared computer resource. Extracted code bundles can be simultaneously placed into the patchpointer and read simultaneously by collaborating processes. This step can be a multithreaded operation.
The code bundles may reside inside the work area of process P. The collaborating processes may have access to their private memory. The patchpointer data structure contains a fairly large array of pointers, preferably of size 2^n (i.e., an integer power of 2) pointers. The array values may be optionally initialized to NULL.
The processing step works through the patchpointer until all code bundles of process P have been executed and there is no contention from process P for the shared resource. Once process P is popped out and completed, the next process in the priority order is processed in the same way.
The patchpointer can be simultaneously accessed by many processes or threads. Process P (which can be multithreaded) or any other helper processes (which can be multithreaded) can add pointers to code bundles by first acquiring a random per-thread index into the patchpointer.
A pointer to a code bundle is added by a CAS on the array entry at index, with NULL as the expected value. If the CAS is successful, index is incremented by 1, and another entry is added. If the CAS fails, index is incremented by 1 and the CAS is tried again. If the index value equals the maximum index value of the patchpointer, then index is set to 0 (or any other chosen or random value less than the maximum index value).
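The insertion procedure above can be sketched as follows. The class Patchpointer and its methods are hypothetical names. Several points are assumptions of this sketch rather than statements of the specification: Python has no user-level CAS, so the CAS is emulated with an internal lock; insertion gives up after one full lap over a full array so that the loop terminates; and the drain method, which runs and clears stored bundles sequentially, is only a single-threaded stand-in for processing by collaborating processes.

```python
import random
import threading


class Patchpointer:
    def __init__(self, size=8):
        assert size > 0 and size & (size - 1) == 0, "size must be a power of 2"
        self.slots = [None] * size       # None plays the role of NULL
        self._guard = threading.Lock()   # emulates CAS atomicity only

    def _cas(self, index, expected, new):
        with self._guard:
            if self.slots[index] is expected:
                self.slots[index] = new
                return True
            return False

    def insert_all(self, bundles, rng=random):
        """Insert bundles starting at a random index; return how many fit."""
        size = len(self.slots)
        index = rng.randrange(size)      # random per-thread start index
        placed = 0
        for bundle in bundles:
            for _ in range(size):        # assumption: stop after one full lap
                ok = self._cas(index, None, bundle)
                index = (index + 1) % size   # advance and wrap either way
                if ok:
                    placed += 1
                    break
            else:
                break                    # every slot is occupied
        return placed

    def drain(self):
        """Run and clear every stored bundle (single-threaded sketch)."""
        for i, bundle in enumerate(self.slots):
            if bundle is not None and self._cas(i, bundle, None):
                bundle()


log = []
pp = Patchpointer(size=8)
bundles = [lambda i=i: log.append(i) for i in range(5)]
placed = pp.insert_all(bundles)
pp.drain()
```

With a power-of-2 array size, the wraparound `(index + 1) % size` could equally be written as a bit mask, `(index + 1) & (size - 1)`.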
October 14, 2025