US-12602213-B2

Scalable approximate counting

Published: April 14, 2026

Assignee: not available in USPTO data we have

Inventors: not available in USPTO data we have

Technical Abstract

Techniques for implementing an intentionally approximate counting scheme are disclosed. A code counter is accessed. The counter updates a count that is approximately representative of a number of times the code is called. The counter operates in a first mode where the counter updates the count by a value of 1 and where a probability by which the counter updates the count is set to a value of 1. The counter switches from operating in the first mode to operating in a second mode. The counter now updates the count by a value of N, where N is an integer larger than 1. The probability by which the counter updates the count is set to a value of 1/N such that, despite multiple calls being made to the code, the count is updated in accordance with the 1/N probability. The counter continues to update the count.

Patent Claims


. A computer system that implements an intentionally approximate counting scheme, said computer system comprising:

. The computer system of, wherein the code portion is called or executed by a thread.

. The computer system of, wherein the count is approximately representative of the number of times the code portion is called or executed when the count is within a determined tolerance level of the number of times the code portion is called or executed.

. The computer system of, wherein the determined tolerance level is between about 1-2%.

. The computer system of, wherein the counter is one of a plurality of counters that are included in the code.

. The computer system of, wherein a number of counters included in the plurality of counters exceeds 50.

. The computer system of, wherein the code is subsequently optimized.

. The computer system of, wherein the counter is a contended resource in which multiple calling entities are attempting to use.

. The method of, wherein the code portion is called or executed by a central processing unit.

. The method of, wherein the counter, when operating in the first mode, is a deterministic counter.

. The method of, wherein the counter, when operating in the second mode, is a probabilistic counter.

. The method of, wherein the first threshold is set to a count value of over 8,000 counts.

. The method of, wherein the first threshold is set to a count value that is over 8,000 counts and less than 10,000 counts.

. The method of, wherein the first threshold is set to a count value that is over 5,000 counts.

. The method of, wherein the first threshold is set to a count value that is less than 15,000 counts.

. A method for implementing an intentionally approximate counting scheme, said method comprising:

. The method of, wherein the counter is one of a plurality of counters that are included in the code, and wherein a number of counters included in the plurality of counters exceeds 100.

. A computer system comprising:

. The computer system of, wherein the counter is a contended resource.

. The computer system of, wherein the counter, when operating in the first mode, is a deterministic counter.

Detailed Description


This application claims the benefit of and priority to U.S. Provisional Patent Application Ser. No. 63/455,704 filed on Mar. 30, 2023 and entitled “SCALABLE APPROXIMATE COUNTING,” and which application is expressly incorporated herein by reference in its entirety.

There are many tools available to develop a software application. For instance, an integrated development environment (IDE) is one tool that helps developers program code. There are also various different graphical user interfaces that are available to assist in the process.

In addition to simply developing features for an application, developers are often tasked with making an application run more efficiently on a computer system. For example, it is often desirable to improve the performance of an application through various optimization techniques.

The subject matter claimed herein is not limited to embodiments that solve any disadvantages or that operate only in environments such as those described above. Rather, this background is only provided to illustrate one exemplary technology area where some embodiments described herein may be practiced.

In some aspects, the techniques described herein relate to a computer system that implements an intentionally approximate counting scheme, said computer system including: a processor system; and a storage system that stores instructions that are executable by the processor system to cause the computer system to: access a counter that is included in a portion of code, wherein the counter is structured to update a count that is approximately representative of a number of times the code portion is called or executed; cause the counter to operate in a first mode, wherein the first mode is a mode in which the counter updates the count by a value of 1, and wherein, while the counter is operating in the first mode, a probability by which the counter updates the count is set to a value of 1 such that the counter updates the count each time the code portion is called or executed; determine the count has reached a first threshold while the counter is operating in the first mode; cause the counter to switch from operating in the first mode to operating in a second mode, wherein the second mode is a mode in which the counter updates the count by a value of N, where N is an integer larger than 1, and wherein, while the counter is operating in the second mode, the probability by which the counter updates the count is set to a value of 1/N such that, despite multiple calls being made to the code portion, the count is updated in accordance with the 1/N probability; and cause the counter, which is now in the second mode, to continue to update the count, wherein the count is updated in accordance with the 1/N probability, and wherein the counter updates the count by the value of N.

In some aspects, the techniques described herein relate to a method for implementing an intentionally approximate counting scheme, said method including: accessing a counter that is included in a portion of code, wherein the counter is structured to update a count that is approximately representative of a number of times the code portion is called or executed; causing the counter to operate in a first mode, wherein the first mode is a mode in which the counter updates the count by a value of 1, and wherein, while the counter is operating in the first mode, a probability by which the counter updates the count is set to a value of 1 such that the counter updates the count each time the code portion is called or executed; determining the count has reached a first threshold while the counter is operating in the first mode; causing the counter to switch from operating in the first mode to operating in a second mode, wherein the second mode is a mode in which the counter updates the count by a value of N, where N is an integer larger than 1, and wherein, while the counter is operating in the second mode, the probability by which the counter updates the count is set to a value of 1/N such that, despite multiple calls being made to the code portion, the count is updated in accordance with the 1/N probability; and causing the counter, which is now in the second mode, to continue to update the count, wherein the count is updated in accordance with the 1/N probability, and wherein the counter updates the count by the value of N.

In some aspects, the techniques described herein relate to a method for implementing an intentionally approximate counting scheme, said method including: accessing a counter that is included in a portion of code, wherein the counter is structured to update a count that is approximately representative of a number of times the code portion is called or executed; causing the counter to operate in a first mode, wherein the first mode is a mode in which the counter updates the count by a value of 1, and wherein, while the counter is operating in the first mode, a probability by which the counter updates the count is set to a value of 1 such that the counter updates the count each time the code portion is called or executed; determining the count has reached a first threshold while the counter is operating in the first mode; causing the counter to switch from operating in the first mode to operating in a second mode, wherein the second mode is a mode in which the counter updates the count by a value of N, where N is an integer larger than 1, and wherein, while the counter is operating in the second mode, the probability by which the counter updates the count is set to a value of 1/N such that, despite multiple calls being made to the code portion, the count is updated in accordance with the 1/N probability; causing the counter, which is now in the second mode, to continue to update the count, wherein the count is updated in accordance with the 1/N probability, and wherein the counter updates the count by the value of N; and writing, each time the count is updated, a value of the count to a storage system.

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.

Additional features and advantages will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the teachings herein. Features and advantages of the invention may be realized and obtained by means of the instruments and combinations particularly pointed out in the appended claims. Features of the present invention will become more fully apparent from the following description and appended claims, or may be learned by the practice of the invention as set forth hereinafter.

When developing code, developers often have to provide solutions for various behaviors that might possibly occur during runtime even though most of the time those behaviors do not occur. Thus, a significant amount of the code is actually directed to covering scenarios that might not happen on a frequent basis. When optimizing code, developers usually want to focus their efforts on the portions of code that are called or used most frequently. Less used code snippets or portions may not need to be optimized simply due to the fact that they are not called on a frequent basis.

Dynamic profile-guided optimization (PGO) is a technique for collecting additional information about how a program is executing. That additional information can then be used by a developer or runtime system to help optimize the program. Dynamic PGO typically works by instrumenting early versions of methods (e.g., Tier0 codegen) to produce a profile data set that the just in time (JIT) component can use to optimize subsequent versions of those methods (e.g., Tier1 codegen).

When optimizing code (e.g., by using Dynamic PGO), developers and runtime systems will typically focus on the code that will be executed most often during runtime. One way of determining which code or code segments are executed most frequently is through the use of a so-called “counter” or “counter based instrumentation.” With Dynamic PGO enabled, the JIT is able to add code to each Tier0 method to count how often the parts of the methods execute. Doing so allows the optimization process to then focus on the parts of the method that seem to be the most important for performance.

From the outset, Dynamic PGO has used fairly simplistic methods of counting. For example, for each distinct counter, the code the JIT adds to the program will simply increment a shared memory location. This aspect of counting is referred to herein as the “counter implementation,” and this particular way of counting is referred to as the “racing” implementation. The JIT tries to be a bit more sophisticated with counter placement, relying on an approach to try to reduce the total number of counters needed to a minimum and to place them in optimal locations.

Recently, two surprising observations about racing were discovered; the first being a measurement observation. Specifically, various tests were performed to measure the runtime costs of counting in heavily multithreaded applications by simply forcing these to run only the Tier0 instrumented code. It was observed that compared to un-instrumented Tier0 code, instrumented code was slower by a factor of 2, 3, or sometimes even more than 3. Some further experimentation revealed that cache contention (e.g., both true and false sharing) was a major contributor to the high instrumentation overhead.

The second discovery related to an accuracy measurement. Specifically, various tests were performed to look at how accurate the counts were in the Tier1 compilation. The results of the tests revealed widespread inaccuracies. The major contributor here was lost counter updates because of the unsynchronized access across many threads of execution. This was doubly bad news. For instance, not only was the system paying a significant amount at runtime for the racing counter implementation, but it was also performing poorly with regard to the actual count.

One potential fix for the lost counter updates is to stop racing and start synchronizing the updates. Various platforms can provide atomic counter updates in the form of “InterlockedIncrement” and similar, and the JIT can emit the proper machine code forms (e.g., lock inc [mem]) that lie at the heart of these in place of the unsynchronized (e.g., inc [mem]) racing variant. This new version is referred to as the “interlocked” implementation scheme for a counter. The JIT can be updated to emit this variant and to make various measurements.

Stated differently, the “interlocked” implementation essentially requires synchronization for the counter updates. If one processor updates the counter, then all other updates to the counter are delayed until the one processor finishes its use of the counter. Interlocked largely fixes the accuracy problem that was previously observed, but it creates even more runtime overhead. So while it could serve as a component of a solution, it also has drawbacks. What is needed, therefore, is an improved technique for performing counting.
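As a rough illustration of why the racing implementation loses updates, the following Python sketch replays one possible interleaving of two unsynchronized increments by hand and then contrasts it with a lock-protected increment. The names `racing_interleaving`, `LockedCounter`, and `run_locked` are hypothetical, and Python's `threading.Lock` is only an analogy for an interlocked machine instruction; this is not the code a JIT would emit.

```python
import threading


def racing_interleaving(start: int = 0) -> int:
    """Replay one bad interleaving of two unsynchronized increments.

    Each "thread" performs load, add, and store as separate steps, which is
    what a plain read-modify-write amounts to when two cores race.
    """
    memory = start
    a = memory      # thread A loads the current value
    b = memory      # thread B loads the same (soon stale) value
    memory = a + 1  # thread A stores its result
    memory = b + 1  # thread B overwrites A's update
    return memory   # only one of the two increments survived


class LockedCounter:
    """Counter whose updates are serialized (stand-in for "interlocked")."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.value = 0

    def increment(self) -> None:
        with self._lock:  # other updaters wait here: accurate but contended
            self.value += 1


def run_locked(threads: int, iterations: int) -> int:
    """Increment one LockedCounter from several threads and return the total."""
    counter = LockedCounter()

    def worker() -> None:
        for _ in range(iterations):
            counter.increment()

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter.value
```

Two racing increments leave the value at 1 rather than 2, whereas the serialized counter always reaches `threads * iterations`, at the cost of every updater waiting its turn.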

The disclosed embodiments bring about numerous benefits, advantages, and practical applications to the field of code optimization. Code optimization relies on count information that indicates how often a program or program portion is being called or executed.

In particular, the embodiments provide an improved counter that reduces the amount of computational overhead that has historically been involved in counting. This counter is structured to provide a high level of accuracy while also reducing the number of file system or memory write events that occur, thereby further reducing the overhead. The disclosed counter operates in one of two modes. Initially, the counter operates in a first or deterministic mode in which the counter updates the count each time a corresponding portion of code is executed. Later, the counter switches to a second or probabilistic mode in which the counter updates the count by a value of N, where N is an integer larger than 1. The probability by which the counter updates the count is set to a value of 1/N. Therefore, it is not necessarily the case that the counter is triggered to update the count each time the code portion is executed. In doing so, the embodiments are able to reduce the number of times the count is updated and reduce the number of times the count is written to the file system or memory.
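The two modes described above can be sketched in a few lines of Python. This is a minimal single-threaded sketch under stated assumptions: the class name `TwoModeCounter`, the `writes` field (used here to tally how often the stored count is updated), and the use of Python's `random.Random` as the random source are all illustrative choices, not details taken from the disclosure.

```python
import random


class TwoModeCounter:
    """Counts exactly up to `threshold`, then adds N with probability 1/N."""

    def __init__(self, threshold: int, n: int, rng: random.Random) -> None:
        if n <= 1:
            raise ValueError("n must be an integer larger than 1")
        self.threshold = threshold
        self.n = n
        self.rng = rng
        self.count = 0
        self.writes = 0  # how many times the stored count was updated

    def record_call(self) -> None:
        """Invoke once per call or execution of the instrumented code."""
        if self.count < self.threshold:
            # First (deterministic) mode: probability 1, increment 1.
            self._write(1)
        elif self.rng.randrange(self.n) == 0:
            # Second (probabilistic) mode: probability 1/N, increment N.
            self._write(self.n)

    def _write(self, amount: int) -> None:
        self.count += amount
        self.writes += 1
```

In the second mode most calls fall through without touching `count`, which is where the reduction in writes comes from.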

By performing the disclosed processes, the embodiments also facilitate the generation of improved and optimized code. That is, the embodiments are able to detect how a program operates, and then, using that information, the embodiments can facilitate or direct the generation of optimized code. For instance, using the disclosed counter, the embodiments are able to determine which code portions or snippets are being called or executed most frequently. These code portions can then be identified based on their respective counts. Special emphasis can then be directed to these frequently executed code portions so that they can be optimized. In this manner, the embodiments beneficially provide an in-process feedback loop in which frequently executed code portions can be identified and then subjected to optimization. Accordingly, these and many other benefits will now be described in more detail throughout the remaining portions of this disclosure.

Having just described some of the various benefits of the disclosed embodiments, attention will now be directed to an example architecture that implements an approximately correct counter having relatively low overhead while also having scalability properties. As will be described in more detail later, the counter's accuracy can be dynamically tuned by trading off some of the scalability benefits. The counting process uses minimal runtime state, such as one random number generator (RNG) per physical thread, plus one storage location per counter. The value of the counter is readily available without any post-processing.

The architecture is shown as including a service. As used herein, the term “service” refers to an automated program that is tasked with performing different actions based on input. In some cases, the service can be a deterministic service that operates fully given a set of inputs and without a randomization factor. In other cases, the service can be or can include a machine learning (ML) or artificial intelligence engine.

As used herein, reference to any type of machine learning or artificial intelligence may include any type of machine learning algorithm or device, convolutional neural network(s), multilayer neural network(s), recursive neural network(s), deep neural network(s), decision tree model(s) (e.g., decision trees, random forests, and gradient boosted trees), linear regression model(s), logistic regression model(s), support vector machine(s) (“SVM”), artificial intelligence device(s), or any other type of intelligent computing system. Any amount of training data may be used (and perhaps later refined) to train the machine learning algorithm to dynamically perform the disclosed operations.

In some implementations, the service is a cloud service operating in a cloud environment. In some implementations, the service is a local service operating on a local device. In some implementations, the service is a hybrid service that includes a cloud component that communicates with a local component. The machine learning engine can optionally be used to tune the threshold values, the probability values, the increment values, and the accuracy values.

In any event, the service is tasked with maintaining and managing a counter. Although reference is made to “a” counter, it should be noted that any number of counters can be used. For instance, the service is able to access a file comprising code. Any number of counters can be injected into the code. One, ten, fifty, hundreds, or even thousands of counters can be injected into the code.

As an example, consider a program flow diagram. This program flow diagram is an abstracted representation of an application's code. Notice that the diagram includes a start, six intermediate processes, and an end. These processes may represent methods or basic blocks within methods or other program structures of interest. The embodiments are able to inject a counter into one, some, or all of these processes to determine how often they are being called.

The diagram now shows a count associated with each process. The count represents the number of times that process was called or executed by a calling entity (e.g., a CPU, thread, application, etc.). For instance, the six processes were called 20, 12, 152, 2, 25, and 1 times, respectively. Of course, these values are all for illustrative purposes and should not be viewed as being exemplary. If the code corresponding to this diagram is to be optimized, one can readily discern that the process called 152 times is the section of code that should likely be optimized first because it is called far more times than any of the other processes.
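Choosing what to optimize first from such counts amounts to a sort. The short Python sketch below uses the six illustrative counts from the text; the labels `P1` through `P6` are hypothetical placeholders, since the original reference numerals did not survive in this copy, and `rank_by_count` is an illustrative helper name.

```python
# Illustrative counts from the text; the process labels are placeholders.
counts = {"P1": 20, "P2": 12, "P3": 152, "P4": 2, "P5": 25, "P6": 1}


def rank_by_count(counts: dict[str, int]) -> list[tuple[str, int]]:
    """Return (process, count) pairs ordered from most to least executed."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


ranking = rank_by_count(counts)
hottest_process, hottest_count = ranking[0]
```

Here `hottest_process` is the entry with the count of 152, matching the observation that it should likely be optimized first.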

Returning to the architecture, the counters are designed to count the number of times a unit of code is executed. As generally described earlier, having information indicating the number of times a code snippet executes is useful when developers or runtime systems are tasked with optimizing code. The service is able to manage each counter, which generates a corresponding count. This count can be written to a storage system, such as a file system, memory, or any data storage mechanism.

Other entities (e.g., CPUs, threads, applications, optimization programs, etc.) can then access the storage system to observe the count. That information can then be used to facilitate various optimization operations. In this manner, the disclosed embodiments provide a feedback loop that allows the code to subsequently be optimized. This feedback loop may be entirely within the scope of one process, may involve multiple processes, may include saving the counter values from one or several executions of these processes into a file or other storage medium for use in optimizing future executions of said processes, and may include provisions for developer review and approval of the counter data.

It should be noted that each counter can be considered a type of contended resource that multiple calling entities are attempting to use. For instance, if multiple CPUs or threads are calling the same routine, those multiple entities will all attempt to access the same counter, resulting in contention occurring between those different entities with respect to the counter. In an effort to combat or alleviate this contention, the disclosed embodiments provide an improved type of counting mechanism.

In particular, the disclosed counter includes a deterministic counter component and a probabilistic counter component. When the deterministic counter component of the counter is active, the counter can be said to be operating in a first mode. When the probabilistic counter component of the counter is active, the counter can be said to be operating in a second mode.

As will be described in much more detail later, the counter initially operates in a mode (i.e. the first mode) where the count is incremented or updated each time the corresponding portion of code is called or executed (i.e. in a deterministic manner). Once a first threshold for that count is reached, however, the embodiments switch the counter from using the deterministic counter component (i.e. the first mode) to using the probabilistic counter component (i.e. the second mode). Subsequently, that probabilistic counter component can be further modified based on the count reaching one or more subsequent threshold(s). Further details on the probabilistic attributes of the counter will be provided shortly.

A simplified example follows. The figure shows a graph of the counter incrementation or update process used by the service and, in particular, the counter. Each rectangle represents an instance in which an entity (e.g., CPU threads) makes a call to the counter.

Initially, the counter operates in a deterministic mode using its deterministic counter. While in this mode, each call to a code portion corresponding to the counter results in the count being incremented by a value of 1. Thus, the counter, when operating in the first mode, can be considered a deterministic counter. On the other hand, when the counter operates in the second mode, the counter can be considered a probabilistic counter.

For instance, the graph shows a number of points labeled “A”, “B”, “C”, and “D” on the counter update request axis (i.e. the x-axis). This axis corresponds to the number of times an entity is calling for the counter to be updated. The graph also includes an axis labeled “count” (i.e. the y-axis); this axis corresponds to the actual value of the count.

At point “A,” a first entity calls the code portion, resulting in a +1 increment to the count, as shown by the single step up in the graph at point “A.” Between points “A” and “D,” multiple entities called the code portion, and the counter was updated each time that code portion was called by an entity, as shown by the stepwise increments at points “1”, “2”, “3”, and then up to point “6” on the count axis. Up to this point, the counter has been operating in the first mode. It should be noted that the figure shows a scenario in which the thresholds for the counter are not changing by powers of 2; it is provided for simplified illustrative purposes. Most embodiments, though not necessarily all, update the thresholds using powers of two. In a few embodiments, the thresholds can be updated using other techniques.

Point “D6” is where a first threshold is reached by the count. At this threshold, the counter switches from using its deterministic counter component to using its probabilistic counter component. In other words, the counter switches from operating in the first mode to operating in the second mode. Different count values can be used as the first threshold. As some examples, the first threshold can be set to a count value of over 8,000 counts. In some cases, the first threshold is set to a count value that is over 8,000 counts and less than about 10,000 counts. In some cases, the first threshold is set to a count value that is over 5,000 counts. In some cases, the first threshold is set to a count value that is less than about 15,000 counts.

Prior to point “D6,” the counter had 100% accuracy, but the overhead was increasing as the number of calls to the code portion increased, resulting in the counter performing many file system writes. With the mode transition at point “D6,” the accuracy of the counter is now reduced, but the amount of overhead is also now reduced. Also, the probability of actually triggering an event where the count is increased is reduced. In this example, the counter is now incremented by a value of two, and the probability that the counter will actually be triggered to increment the count is reduced by a factor of two. In other words, the probability of the counter actually incrementing the count is now ½. The counter continues to count.

As shown by some of the counts in the graph, the approximate count can, in some instances, exceed the actual count. For example, at point “E8” the count was incremented by two instead of a non-increment. As a result, the rectangle at point E rises above the actual count line. The next operation might not increment the count.

At point “G14,” the count reaches a new threshold. The counter stays in the second, probabilistic mode, but the counter now shifts its increment processes, its accuracy, its triggering probability, and its overhead. In this example scenario, the counter will now increment by a value of four each time the counter is triggered, but the probability that the counter will actually be triggered is now ¼. Again, the accuracy is reduced, but the overhead is also reduced.
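The schedule in this simplified example (thresholds at counts of 6 and 14, with the increment doubling and the probability halving at each one) can be written as a small lookup. The function name `mode_for_count` and the convention that a threshold is "reached" once the count is greater than or equal to it are assumptions of this sketch.

```python
from bisect import bisect_right
from fractions import Fraction


def mode_for_count(count: int, thresholds: list[int]) -> tuple[int, Fraction]:
    """Return (increment, trigger probability) for the current count.

    `thresholds` must be sorted in ascending order. Each threshold the
    count has reached doubles the increment and halves the probability.
    """
    reached = bisect_right(thresholds, count)  # thresholds <= count
    increment = 2 ** reached
    return increment, Fraction(1, increment)


example_thresholds = [6, 14]  # values from the simplified graph
before_first = mode_for_count(5, example_thresholds)   # first mode
after_first = mode_for_count(6, example_thresholds)    # increment 2, p = 1/2
after_second = mode_for_count(14, example_thresholds)  # increment 4, p = 1/4
```

The product of the increment and the probability stays at 1 in every stage, which is why the count remains an estimate of the number of calls.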

Counts gathered by Dynamic PGO have a wide dynamic range, and even when the accuracy is being checked, the points of interest are usually in relatively self-consistent counts. For example, if a simple if/then/else construct is provided, it is expected that the count for the if block will equal the sum of the counts for the then and else portions. But there are some variances that are to be considered. As some examples, the method might throw an exception, or the thread might be asynchronously stopped.

As a result, it is typically the case that a reduced level or buffer level of accuracy is acceptable. For instance, most testing scenarios are satisfied if the profile flows are accurate at each conservation point to within around 1%, with diminishing returns on results more accurate than this. Given that it is already the case that the testing platforms tolerate some inaccuracy, the embodiments leverage that to produce an “intentionally approximate counting” scheme (i.e. the scheme just described) that is nearly as accurate as the interlocked implementation but with less overhead. The intentionally approximate counting scheme (aka “scaling” scheme) is significantly more accurate than the racing scheme.
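The self-consistency check at a conservation point, such as the if/then/else construct mentioned earlier, can be sketched as a relative-discrepancy test. The function name `flow_is_consistent` and the default tolerance of 0.01 (mirroring the roughly 1% figure above) are assumptions made for illustration.

```python
def flow_is_consistent(
    if_count: int, then_count: int, else_count: int, tolerance: float = 0.01
) -> bool:
    """Check that the branch counts sum to the entry count within tolerance.

    The discrepancy is measured relative to the count of the `if` block.
    """
    if if_count == 0:
        # Nothing entered the construct, so nothing should have left it.
        return then_count == 0 and else_count == 0
    discrepancy = abs(if_count - (then_count + else_count)) / if_count
    return discrepancy <= tolerance
```

For example, counts of 10,000 for the if block against 6,040 and 3,950 for the branches differ by 0.1% and pass, while 6,000 and 3,000 differ by 10% and fail.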

The amount by which the accuracy is reduced is maintained to be within a threshold target. That is, the variance or delta between a true count and the disclosed count is kept to be within a margin of about 1-2%. The graph generally (and simply) shows a representation of what the actual count would look like if the counter were triggered for each call to the corresponding code portion. That representation is shown in the form of the line labeled actual count.

The graph also shows a step-wise count representation (e.g., the steps defined by the tops of the combination of rectangles). Notice, this stepwise count representation is tracking the actual count within a level of error (which is often set to be about 1-2%). At some instances, the actual count and the stepwise count representation match exactly while in other instances there is a delta. On average, the embodiments are designed so that the difference in accuracy between the actual count and the stepwise count representation is within about 2% of each other. Further details on these aspects will be provided momentarily.

As mentioned, the embodiments operate in multiple different modes. Until a first count threshold is reached by the counter, the embodiments operate in the first mode and increment the count once every time an entity calls the code portion in which the counter resides. After the first count threshold is reached, the embodiments transition the counter from operating in that first mode to operating in the second, probabilistic mode. In this second mode, the embodiments vary the amount by which the counter increments the count. The graph discussed above provided a general illustration, but an additional example will be helpful.

Suppose the probabilistic mode is likened initially to a two-sided coin. Every time the coin lands on its head, the counter will increment the count by a value of two. Every time the coin lands on its tail, the counter will do nothing with regard to incrementing the count. The probability of the coin landing on its head is ½. Consequently, the probability that the counter will increment the count is ½. The counter will operate in this manner until a second count threshold is reached.
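The coin analogy can be checked with a short calculation. Let X denote the amount added to the count on a single call while the counter adds N with probability 1/N (the symbol X is notation introduced here, not taken from the disclosure; the coin is the case N = 2):

```latex
\begin{align*}
\mathbb{E}[X] &= N \cdot \tfrac{1}{N} + 0 \cdot \left(1 - \tfrac{1}{N}\right) = 1, \\
\operatorname{Var}[X] &= \mathbb{E}[X^2] - \mathbb{E}[X]^2
                       = N^2 \cdot \tfrac{1}{N} - 1 = N - 1.
\end{align*}
```

On average each call therefore still contributes exactly 1 to the count, so the estimate is unbiased, while the per-call variance grows with N. For the coin, the expected contribution is 2 · ½ = 1 and the variance is 1.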

Once the second threshold is reached, then the probabilistic mode can be likened to a four-sided die. Each time the die lands on a “1”, the counter will increment the count by a value of four. Each time the die lands on a value that is not “1,” the counter will do nothing with regard to incrementing the count. The probability of the four-sided die landing on the “1” is ¼. Consequently, the probability that the counter will increment the count is ¼.

The counter will continue to operate in this manner until a third count threshold is reached. At that point, the probabilistic mode can be likened to an eight-sided die. Each time the die lands on a “1,” the counter will increment the count by a value of 8. Each time the die lands on a value that is not “1,” the counter will do nothing. The probability of the eight-sided die landing on the “1” is ⅛. Consequently, the probability that the counter will increment the count is ⅛.

This process can repeat itself any number of times. In this manner, the embodiments are able to repeatedly scale up the value by which the count is incremented while also scaling down the rate by which the count is actually incremented.
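The repeated coin, four-sided die, eight-sided die progression can be simulated to see how the estimate and the number of writes behave. The sketch below is single-threaded and rests on assumptions not stated in the text: each new threshold is twice the previous one, the first threshold defaults to 8,192 (a power of two inside the "over 8,000 and less than 10,000" range mentioned earlier), and the names `ScalingCounter` and `simulate` are hypothetical.

```python
import random


class ScalingCounter:
    """Approximate counter whose increment N doubles as the count grows.

    Assumption for this sketch: the next threshold is always twice the
    previous one, starting from `first_threshold`.
    """

    def __init__(self, first_threshold: int, rng: random.Random) -> None:
        self.next_threshold = first_threshold
        self.rng = rng
        self.n = 1       # N == 1 corresponds to the deterministic first mode
        self.count = 0
        self.writes = 0  # number of times the count was actually updated

    def record_call(self) -> None:
        # Update with probability 1/N. randrange(1) is always 0, so the
        # first mode updates on every call.
        if self.rng.randrange(self.n) == 0:
            self.count += self.n
            self.writes += 1
            # The count may step past a threshold rather than land on it.
            while self.count >= self.next_threshold:
                self.n *= 2
                self.next_threshold *= 2


def simulate(calls: int, first_threshold: int = 8192, seed: int = 1):
    """Return (count, writes, relative_error) after `calls` calls."""
    counter = ScalingCounter(first_threshold, random.Random(seed))
    for _ in range(calls):
        counter.record_call()
    relative_error = abs(counter.count - calls) / calls
    return counter.count, counter.writes, relative_error
```

Calling `simulate(300_000)` returns the estimated count, the number of updates performed, and the relative error against the true number of calls. Up to the first threshold the count is exact; beyond it, the number of updates grows far more slowly than the number of calls.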

As mentioned before, the counter can be viewed as being a contended resource. It may be the case that a large number of processors are trying to trigger this counter, and all of those processors are competing with one another for access to the counter. The disclosed embodiments beneficially operate to reduce this contention. That is, as the count value progressively gets higher, in order to manage contention, the embodiments update the count less and less often.

Patent Metadata

Filing Date

Unknown

Publication Date

April 14, 2026

Inventors

Unknown
