Method and Apparatus for Achieving Fair Cache Sharing on Multi-Threaded Chip Multiprocessors

Published: November 29, 2011

Assignee: not available in the USPTO data we have

Patent Claims

20 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method for achieving fair cache memory sharing in a processor having a plurality of cores, each of which has a thread running thereon, and a cache memory that is shared by the threads, the method comprising: allotting a respective CPU time quantum to each of the threads, wherein the CPU time quantum allotted to each thread is the share of CPU time allotted for the thread to run on the processor; computing for a given one of the threads a fair CPU latency value as a product of a value representing the number of thread cycles per instruction (CPI) which the given thread would experience in the case that the cache memory was equally shared by the threads and the CPU time quantum that was allotted to the given thread; computing an actual CPU latency value as a product of the actual number of thread cycles per instruction which the given thread is currently experiencing and the CPU time quantum that was allotted to the given thread; and performing an adjustment to the CPU time quantum allotted to the given thread, wherein the adjustment causes the actual CPU latency value of the given thread to become equal to the computed fair CPU latency value.

2. The method of claim 1, wherein said performing an adjustment is done by an operating system scheduler.

3. The method of claim 1, wherein said computing a fair CPU latency value comprises: during a reconnaissance phase, gathering information regarding the given thread via conventional hardware counters; and using an analytical model to estimate a fair cache miss rate that the given thread would experience if the cache memory was equally shared.

4. The method of claim 3, wherein said computing a fair CPU latency value further comprises: during a calibration phase subsequent to the reconnaissance phase, using runtime statistics and the estimated fair cache miss rate value to determine a fair CPI value as the CPI value that the thread would experience if the cache memory was equally shared.

5. The method of claim 4, wherein said computing an actual CPU latency value comprises: during the calibration phase, measuring the actual CPI value of the given thread and comparing the determined fair CPI value to the actual CPI value for the given thread.

6. The method of claim 3, wherein said estimating a fair cache miss rate further comprises, during the reconnaissance phase, measuring cache memory miss rates for the given thread as the given thread operates with other concurrently running threads, and deriving a linear equation to estimate the fair cache miss rate as the cache miss rate that the given thread would experience if the cache memory was equally shared.

7. The method of claim 4, further comprising repeating the calibration phase periodically during runtime.

8. The method of claim 7, further comprising repeating the reconnaissance phase periodically during runtime, wherein the reconnaissance phase is repeated with a period that is longer than the period at which the calibration phase is repeated.

9. The method of claim 2, further comprising: after adjusting the CPU time quantum of the given thread during a given time slot, selecting another thread that is running in the given time slot; and the operating system scheduler adjusting a CPU time quantum of the other thread to compensate for the adjustment made to the CPU time quantum of the given thread.

10. Apparatus for achieving fair cache memory sharing in a processor having a plurality of cores, each of which has a thread running thereon, and a cache memory that is shared by the threads, the apparatus comprising: a mechanism that allots a respective CPU time quantum to each of the threads, wherein the CPU time quantum allotted to each thread is the share of CPU time allotted for the thread to run on the processor; a mechanism that computes for a given one of the threads a fair CPU latency value as a product of a value representing the number of thread cycles per instruction (CPI) which the given thread would experience in the case that the cache memory was equally shared by the threads and the CPU time quantum that was allotted to the given thread; a mechanism that computes an actual CPU latency value as a product of the actual number of thread cycles per instruction which the given thread is currently experiencing and the CPU time quantum that was allotted to the given thread; and a mechanism that performs an adjustment to the CPU time quantum allotted to the given thread, wherein the adjustment causes the actual CPU latency value of the given thread to become equal to the computed fair CPU latency value.

11. The apparatus of claim 10, wherein said performing an adjustment is done by an operating system scheduler.

12. The apparatus of claim 10, wherein the mechanism that computes the fair CPU latency value comprises: a mechanism that, during a reconnaissance phase, gathers information regarding the given thread via conventional hardware counters; and a mechanism that uses an analytical model to estimate a fair cache miss rate that the given thread would experience if the cache memory was equally shared.

13. The apparatus of claim 12, wherein the mechanism that computes the fair CPU latency value further comprises: a mechanism that, during a calibration phase subsequent to the reconnaissance phase, uses runtime statistics and the estimated fair cache miss rate value to determine a fair CPI value as the CPI value that the thread would experience if the cache memory was equally shared.

14. The apparatus of claim 13, wherein the mechanism that computes an actual CPU latency value comprises: a mechanism that, during the calibration phase, measures the actual CPI value of the given thread and compares the determined fair CPI value to the actual CPI value for the given thread.

15. The apparatus of claim 12, wherein the mechanism that estimates the fair cache miss rate further comprises a mechanism that, during the reconnaissance phase, measures cache memory miss rates for the given thread as the given thread operates with other concurrently running threads, and derives a linear equation to estimate the fair cache miss rate that the given thread would experience if the cache memory was equally shared.

16. The apparatus of claim 13, further comprising a mechanism that causes the calibration phase to be repeated periodically during runtime.

17. The apparatus of claim 16, further comprising a mechanism that causes the reconnaissance phase to be repeated periodically during runtime, wherein the reconnaissance phase is repeated with a period that is longer than the period at which the calibration phase is repeated.

18. The apparatus of claim 10, further comprising: a mechanism that, after the mechanism for adjusting the CPU time quantum of the given thread adjusts the CPU time quantum of the given thread during a given time slot, selects another thread that is running in the given time slot; and a mechanism that adjusts a CPU time quantum of the other thread to compensate for the adjustment made to the CPU time quantum of the given thread.

19. Apparatus for achieving fair cache memory sharing in a processor having a plurality of cores, each of which has a thread running thereon, and a cache memory that is shared by the threads, the apparatus comprising: means for allotting a respective CPU time quantum to each of the threads, wherein the CPU time quantum allotted to each thread is the share of CPU time allotted for the thread to run on the processor; means for computing for a given one of the threads a fair CPU latency value as a product of a value representing the number of thread cycles per instruction (CPI) which the given thread would experience in the case that the cache memory was equally shared by the threads and the CPU time quantum that was allotted to the given thread; means for computing an actual CPU latency value as a product of the actual number of thread cycles per instruction which the given thread is currently experiencing and the CPU time quantum that was allotted to the given thread; and means for performing an adjustment to the CPU time quantum allotted to the given thread, wherein the adjustment causes the actual CPU latency value of the given thread to become equal to the computed fair CPU latency value.

20. The apparatus of claim 19, wherein the means for performing an adjustment comprises an operating system scheduler that adjusts the CPU time quantum allotted to the given thread.
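
Illustrative sketch (not part of the patent text). The arithmetic recited in claims 1 and 9 can be worked through in a few lines of Python. The sketch computes the fair and actual CPU latency values as the products the claims describe, then derives a new quantum under one literal reading of the claim language. The claims do not state the adjustment formula, so the rule used here is an assumption: it holds the measured CPI fixed and solves for the quantum at which the actual latency equals the previously computed fair latency. The names `ThreadStats`, `adjusted_quantum`, and `rebalance`, and the example numbers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ThreadStats:
    name: str
    quantum: float     # allotted CPU time quantum (arbitrary units, hypothetical)
    fair_cpi: float    # estimated CPI if the cache were equally shared
    actual_cpi: float  # CPI the thread is currently experiencing


def fair_latency(t: ThreadStats) -> float:
    # Claim 1: fair CPU latency = fair CPI * allotted quantum.
    return t.fair_cpi * t.quantum


def actual_latency(t: ThreadStats) -> float:
    # Claim 1: actual CPU latency = actual CPI * allotted quantum.
    return t.actual_cpi * t.quantum


def adjusted_quantum(t: ThreadStats) -> float:
    # ASSUMPTION (not stated in the claims): the measured CPI is unchanged by
    # the adjustment, so solve actual_cpi * q_new == fair_cpi * quantum.
    return fair_latency(t) / t.actual_cpi


def rebalance(given: ThreadStats, other: ThreadStats) -> float:
    # Claim 9: after adjusting the given thread, adjust another thread's
    # quantum by the opposite amount so the total allotted time is unchanged.
    new_q = adjusted_quantum(given)
    delta = new_q - given.quantum
    given.quantum = new_q
    other.quantum -= delta
    return delta


if __name__ == "__main__":
    a = ThreadStats("A", quantum=10.0, fair_cpi=2.0, actual_cpi=2.5)
    b = ThreadStats("B", quantum=10.0, fair_cpi=1.5, actual_cpi=1.2)
    target = fair_latency(a)
    delta = rebalance(a, b)
    print(target, actual_latency(a), delta, a.quantum + b.quantum)
```

In a real scheduler the fair CPI would come from the reconnaissance and calibration phases of claims 3 to 5 and the actual CPI from hardware counters; here both are supplied as plain numbers. Other readings of the adjustment step are possible, and this sketch should not be taken as the patented implementation.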

Patent Metadata

Filing Date

Unknown

Publication Date

November 29, 2011

Inventors

Alexandra Fedorova
