Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
1. Apparatus for performing data processing in a single program multiple data fashion on a target data set, the apparatus comprising: at least one graphics processor configured to execute multiple threads to perform the data processing, wherein the target data set comprises an image frame comprising a plurality of pixels, the threads corresponding to respective pixels within the image frame, the at least one graphics processor is capable of processing a maximum number of threads in parallel, and when a number of threads to be processed is greater than the maximum number, the at least one graphics processor is configured to process said number of threads in a plurality of iterations for processing successive groups of threads; and a digital data storage configured to store information defining a plurality of pixel-processing thread schedule configurations, each pixel-processing thread schedule configuration defining an order in which the pixels of the image frame are mapped to the successive groups of threads to be processed in said plurality of iterations by the at least one processor, wherein a first pixel-processing thread schedule configuration and a second pixel-processing thread schedule configuration of the plurality of pixel-processing thread schedule configurations each comprise a respectively different order in which the pixels of the image frame are to be mapped to said successive groups of threads, wherein the at least one graphics processor is further configured to, in response to a pixel-processing thread schedule selection signal, (a) select a pixel-processing thread schedule configuration of the plurality of pixel-processing thread schedule configurations, and (b) execute the multiple threads corresponding to the pixels of the image frame in a selected time-based ordering of the multiple threads as defined by the selected pixel-processing thread schedule configuration, and wherein the at least one graphics processor is further configured to gather performance data relating to the performed data processing in relation to each pixel-processing thread schedule of the plurality of pixel-processing thread schedule configurations and to generate the pixel-processing thread schedule selection signal in dependence on the gathered performance data.
A data processing system includes a graphics processor designed to execute multiple threads for processing pixels of an image frame. When the total number of threads (pixels) exceeds the processor's parallel capacity, it processes them in successive groups across multiple iterations. The system has digital storage containing various pixel-processing thread schedule configurations, each defining a unique order for mapping image pixels to these successive groups. The graphics processor selects one of these configurations based on a selection signal and executes the threads (pixels) in the time-based order defined by the selected configuration. It also gathers performance data for each available schedule configuration and generates the selection signal based on this collected data.
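The grouping and ordering described for claim 1 can be illustrated with a minimal Python sketch. It assumes a schedule configuration is simply a function returning an ordering of pixel indices, and that each iteration takes the next max_parallel entries of that ordering. The names SCHEDULES, run_frame, row_major and reverse_order are hypothetical and do not come from the patent; the inner loop only stands in for work a GPU would run in parallel.

```python
from typing import Callable, Dict, List

# Hypothetical: a schedule configuration maps a pixel count to an ordering
# of pixel indices (the order in which pixels are mapped to threads).
Schedule = Callable[[int], List[int]]


def row_major(num_pixels: int) -> List[int]:
    return list(range(num_pixels))


def reverse_order(num_pixels: int) -> List[int]:
    return list(range(num_pixels - 1, -1, -1))


# Stand-in for the "digital data storage" holding the configurations.
SCHEDULES: Dict[str, Schedule] = {"row_major": row_major, "reverse": reverse_order}


def run_frame(num_pixels: int, max_parallel: int, selection: str,
              kernel: Callable[[int], None]) -> List[List[int]]:
    """Execute one frame, max_parallel threads per iteration, in the selected order."""
    order = SCHEDULES[selection](num_pixels)
    groups = [order[i:i + max_parallel] for i in range(0, num_pixels, max_parallel)]
    for group in groups:      # one iteration per group of threads
        for pixel in group:   # a GPU would run these threads in parallel
            kernel(pixel)
    return groups


print(run_frame(10, 4, "reverse", lambda p: None))
# [[9, 8, 7, 6], [5, 4, 3, 2], [1, 0]]
```

Changing only the selection string changes which pixels share an iteration, which is the property the schedule configurations exploit.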
2. The apparatus as claimed in claim 1, wherein the at least one graphics processor is further configured to control a performance analysis process, the performance analysis process comprising: selecting a minority subset of the multiple threads which are to be executed; executing a first subset of the minority subset of the multiple threads in a first selected order defined by a first pixel-processing thread schedule configuration; changing the thread schedule selection signal to cause a next subset of the minority subset of the multiple threads to be executed in a next selected order defined by a next pixel-processing thread schedule configuration; and repeating the changing step until all threads of the minority subset of threads have been executed, wherein the thread selection signal is set to cause the at least one graphics processor to execute a majority remaining subset of the multiple threads which are to be executed to perform the data processing on the target data set in an order defined by a selected thread schedule tested in the performance analysis process.
A data processing system including a graphics processor designed to execute multiple threads for processing pixels of an image frame, which processes threads in successive groups across multiple iterations if the thread count exceeds parallel capacity. The system has digital storage for various pixel-processing thread schedule configurations, each defining a unique order for mapping pixels to these groups. The graphics processor selects one configuration based on a signal, executes threads according to it, and generates the signal from gathered performance data. This system further controls a performance analysis process: it selects a small 'minority subset' of threads, executes a first part of this subset using an initial schedule, then repeatedly changes the schedule selection signal to execute subsequent parts of the minority subset with different schedules. Once all threads in the minority subset are executed, the system uses the best-performing schedule from this analysis to process the remaining 'majority subset' of threads for the main data processing.
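A minimal Python sketch of the performance analysis process in claim 2 follows. It assumes each candidate schedule is tried on one workgroup of the minority subset (as in claim 3), that the minority subset may begin at an offset (as in claim 10), and that a caller-supplied cost function stands in for the gathered performance data, with lower cost being better. analyse_and_run, cost, workgroup_size and offset are hypothetical names.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical: a schedule reorders a given list of thread ids.
Order = Callable[[Sequence[int]], List[int]]


def analyse_and_run(schedules: Dict[str, Order], threads: Sequence[int],
                    cost: Callable[[List[int]], float],
                    workgroup_size: int, offset: int = 0) -> Tuple[str, List[int]]:
    """Trial each schedule on one workgroup, then order the remaining threads with the best one."""
    minority_size = workgroup_size * len(schedules)
    if offset + minority_size > len(threads):
        raise ValueError("not enough threads for the analysis phase")
    minority = list(threads[offset:offset + minority_size])

    costs: Dict[str, float] = {}
    for i, (name, order) in enumerate(schedules.items()):
        # "Changing the selection signal": the next workgroup uses the next schedule.
        workgroup = minority[i * workgroup_size:(i + 1) * workgroup_size]
        costs[name] = cost(order(workgroup))

    best = min(costs, key=costs.get)  # lowest measured cost wins
    # The majority subset is everything outside the minority window.
    majority = list(threads[:offset]) + list(threads[offset + minority_size:])
    return best, schedules[best](majority)
```

Because the trial workgroups are real work rather than throwaway runs, the analysis costs little beyond the slower schedules' extra time on a small fraction of the frame.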
3. The apparatus as claimed in claim 2, wherein the first subset and next subset each correspond to a workgroup of threads, wherein a workgroup of threads is a selected subset of the multiple threads defined by a programmer of the apparatus or set by default.
A data processing system including a graphics processor designed to execute multiple threads for processing pixels of an image frame, which processes threads in successive groups across multiple iterations if the thread count exceeds parallel capacity. The system has digital storage for various pixel-processing thread schedule configurations, each defining a unique order for mapping pixels to these groups. The graphics processor selects one configuration based on a signal, executes threads according to it, and generates the signal from gathered performance data. This system further controls a performance analysis process: it selects a small 'minority subset' of threads, executes an initial part of this subset using an initial schedule, then repeatedly changes the schedule selection signal to execute subsequent parts of the minority subset with different schedules. Once all threads in the minority subset are executed, the system uses the best-performing schedule from this analysis to process the remaining 'majority subset' of threads. In this process, the initial and subsequent parts of the minority subset are specifically defined as 'workgroups' of threads, which can be set by a programmer or by default.
4. The apparatus as claimed in claim 2, wherein at least some of the plurality of pixel-processing thread schedule configurations are further configured in dependence on a control parameter and the performance analysis process further comprises gathering performance data relating to the data processing performed for a plurality of values of the control parameter.
A data processing system including a graphics processor designed to execute multiple threads for processing pixels of an image frame, which processes threads in successive groups across multiple iterations if the thread count exceeds parallel capacity. The system has digital storage for various pixel-processing thread schedule configurations, each defining a unique order for mapping pixels to these groups. The graphics processor selects one configuration based on a signal, executes threads according to it, and generates the signal from gathered performance data. This system further controls a performance analysis process: it selects a small 'minority subset' of threads, executes an initial part of this subset using an initial schedule, then repeatedly changes the schedule selection signal to execute subsequent parts of the minority subset with different schedules. Once all threads in the minority subset are executed, the system uses the best-performing schedule from this analysis to process the remaining 'majority subset' of threads. Additionally, some of the pixel-processing thread schedule configurations are defined based on a modifiable 'control parameter,' and the performance analysis process gathers performance data for multiple different values of this control parameter to optimize the schedule selection.
5. The apparatus as claimed in claim 4, wherein the plurality of values of the control parameter corresponds to a geometric progression of the control parameter.
A data processing system including a graphics processor designed to execute multiple threads for processing pixels of an image frame, which processes threads in successive groups across multiple iterations if the thread count exceeds parallel capacity. The system has digital storage for various pixel-processing thread schedule configurations, each defining a unique order for mapping pixels to these groups. The graphics processor selects one configuration based on a signal, executes threads according to it, and generates the signal from gathered performance data. This system further controls a performance analysis process: it selects a small 'minority subset' of threads, executes an initial part of this subset using an initial schedule, then repeatedly changes the schedule selection signal to execute subsequent parts of the minority subset with different schedules. Once all threads in the minority subset are executed, the system uses the best-performing schedule from this analysis to process the remaining 'majority subset' of threads. Some thread schedule configurations are defined based on a 'control parameter,' and the performance analysis gathers data for multiple settings of this parameter. Specifically, the multiple values of this control parameter are chosen to follow a geometric progression.
6. The apparatus as claimed in claim 5, wherein the geometric progression is powers of two of the control parameter.
A data processing system including a graphics processor designed to execute multiple threads for processing pixels of an image frame, which processes threads in successive groups across multiple iterations if the thread count exceeds parallel capacity. The system has digital storage for various pixel-processing thread schedule configurations, each defining a unique order for mapping pixels to these groups. The graphics processor selects one configuration based on a signal, executes threads according to it, and generates the signal from gathered performance data. This system further controls a performance analysis process: it selects a small 'minority subset' of threads, executes an initial part of this subset using an initial schedule, then repeatedly changes the schedule selection signal to execute subsequent parts of the minority subset with different schedules. Once all threads in the minority subset are executed, the system uses the best-performing schedule from this analysis to process the remaining 'majority subset' of threads. Some thread schedule configurations are defined based on a 'control parameter,' and the performance analysis gathers data for multiple settings of this parameter, which follow a geometric progression. Specifically, these control parameter values are powers of two.
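The parameter sweep in claims 4 to 6 can be sketched as follows, assuming the control parameter is a positive integer and that a caller-supplied measure function returns a cost for one parameter value (lower is better). powers_of_two and sweep are hypothetical helpers; the powers-of-two list is one concrete geometric progression with ratio 2.

```python
from typing import Callable, Dict, List, Tuple


def powers_of_two(max_value: int) -> List[int]:
    """Candidate values 1, 2, 4, ... up to max_value (a geometric progression, ratio 2)."""
    values, v = [], 1
    while v <= max_value:
        values.append(v)
        v *= 2
    return values


def sweep(values: List[int], measure: Callable[[int], float]) -> Tuple[int, Dict[int, float]]:
    """Gather one measurement per parameter value and return the best value."""
    results = {v: measure(v) for v in values}
    return min(results, key=results.get), results


print(powers_of_two(64))  # [1, 2, 4, 8, 16, 32, 64]
```

Sampling powers of two keeps the number of trials logarithmic in the parameter range: covering values up to 1024 takes 11 trials rather than 1024.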
7. The apparatus as claimed in claim 4, wherein the control parameter is a stride value, the stride value determining a number of threads which are skipped to find a next thread in the selected order, the next thread in the selected order being determined subject to a modulo of a total number of the multiple threads.
A data processing system including a graphics processor designed to execute multiple threads for processing pixels of an image frame, which processes threads in successive groups across multiple iterations if the thread count exceeds parallel capacity. The system has digital storage for various pixel-processing thread schedule configurations, each defining a unique order for mapping pixels to these groups. The graphics processor selects one configuration based on a signal, executes threads according to it, and generates the signal from gathered performance data. This system further controls a performance analysis process: it selects a small 'minority subset' of threads, executes an initial part of this subset using an initial schedule, then repeatedly changes the schedule selection signal to execute subsequent parts of the minority subset with different schedules. Once all threads in the minority subset are executed, the system uses the best-performing schedule from this analysis to process the remaining 'majority subset' of threads. Some thread schedule configurations are defined based on a 'control parameter,' and the performance analysis gathers data for multiple settings of this parameter. Specifically, this control parameter is a 'stride value,' which dictates how many threads are skipped to find the next thread in the selected processing order, subject to a modulo of the total number of threads.
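One way to turn a stride value into a complete thread order is sketched below. It steps by the stride modulo the thread count, as claim 7 describes. What happens when the walk returns to its starting thread (which occurs whenever the stride and the thread count share a common factor) is an assumption of this sketch: it restarts one thread further on, so every thread is still visited exactly once. stride_order is a hypothetical name.

```python
from math import gcd
from typing import List


def stride_order(num_threads: int, stride: int) -> List[int]:
    """Visit threads start, start+stride, ... (mod num_threads), covering each exactly once."""
    if num_threads <= 0 or stride <= 0:
        raise ValueError("num_threads and stride must be positive")
    g = gcd(stride, num_threads)      # number of disjoint cycles the stride produces
    order: List[int] = []
    for start in range(g):            # assumption: begin the next cycle one thread later
        for k in range(num_threads // g):
            order.append((start + k * stride) % num_threads)
    return order


print(stride_order(8, 3))  # [0, 3, 6, 1, 4, 7, 2, 5]
print(stride_order(8, 2))  # [0, 2, 4, 6, 1, 3, 5, 7]
```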
8. The apparatus as claimed in claim 4, wherein the control parameter is at least one tiling dimension value, the tiling dimension value determining a dimension of tiles within an at least two-dimensional coordinate space of the threads, and wherein the selected order causes the at least one graphics processor to execute the multiple threads on a tile-by-tile basis.
A data processing system including a graphics processor designed to execute multiple threads for processing pixels of an image frame, which processes threads in successive groups across multiple iterations if the thread count exceeds parallel capacity. The system has digital storage for various pixel-processing thread schedule configurations, each defining a unique order for mapping pixels to these groups. The graphics processor selects one configuration based on a signal, executes threads according to it, and generates the signal from gathered performance data. This system further controls a performance analysis process: it selects a small 'minority subset' of threads, executes an initial part of this subset using an initial schedule, then repeatedly changes the schedule selection signal to execute subsequent parts of the minority subset with different schedules. Once all threads in the minority subset are executed, the system uses the best-performing schedule from this analysis to process the remaining 'majority subset' of threads. Some thread schedule configurations are defined based on a 'control parameter,' and the performance analysis gathers data for multiple settings of this parameter. Specifically, this control parameter is a 'tiling dimension value,' which defines the size of 'tiles' in a two-dimensional or higher coordinate space of the threads, causing the graphics processor to execute threads on a tile-by-tile basis.
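A tile-by-tile ordering for claim 8 can be sketched for a two-dimensional frame as follows. It assumes pixels are numbered row-major (index = y * width + x), that tiles are visited row by row, and that pixels inside each tile are also visited row by row; partial tiles at the right and bottom edges are clipped. tile_order is a hypothetical helper.

```python
from typing import List


def tile_order(width: int, height: int, tile_w: int, tile_h: int) -> List[int]:
    """Return pixel indices grouped tile by tile over a width x height frame."""
    order: List[int] = []
    for ty in range(0, height, tile_h):          # tile rows
        for tx in range(0, width, tile_w):       # tiles within a tile row
            for y in range(ty, min(ty + tile_h, height)):
                for x in range(tx, min(tx + tile_w, width)):
                    order.append(y * width + x)  # row-major pixel index
    return order


print(tile_order(4, 4, 2, 2))
# [0, 1, 4, 5, 2, 3, 6, 7, 8, 9, 12, 13, 10, 11, 14, 15]
```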
9. The apparatus as claimed in claim 2, wherein the at least one graphics processor is configured to repeat the performance analysis process at predetermined intervals.
A data processing system including a graphics processor designed to execute multiple threads for processing pixels of an image frame, which processes threads in successive groups across multiple iterations if the thread count exceeds parallel capacity. The system has digital storage for various pixel-processing thread schedule configurations, each defining a unique order for mapping pixels to these groups. The graphics processor selects one configuration based on a signal, executes threads according to it, and generates the signal from gathered performance data. This system further controls a performance analysis process: it selects a small 'minority subset' of threads, executes an initial part of this subset using an initial schedule, then repeatedly changes the schedule selection signal to execute subsequent parts of the minority subset with different schedules. Once all threads in the minority subset are executed, the system uses the best-performing schedule from this analysis to process the remaining 'majority subset' of threads. Additionally, the graphics processor is configured to repeat this entire performance analysis process at predetermined intervals.
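Repeating the analysis at predetermined intervals might look like the sketch below, which assumes the interval is counted in frames; the claim does not state what unit the interval uses. schedules_per_frame and analyse are hypothetical, with analyse standing in for the full performance analysis process.

```python
from typing import Callable, List


def schedules_per_frame(num_frames: int, interval: int,
                        analyse: Callable[[int], str]) -> List[str]:
    """Re-run the analysis every `interval` frames and reuse its choice in between."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    chosen: List[str] = []
    current = ""
    for frame in range(num_frames):
        if frame % interval == 0:
            current = analyse(frame)  # refresh the selected schedule
        chosen.append(current)
    return chosen


print(schedules_per_frame(5, 2, lambda f: f"analysis@{f}"))
# ['analysis@0', 'analysis@0', 'analysis@2', 'analysis@2', 'analysis@4']
```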
10. The apparatus as claimed in claim 2, wherein the at least one graphics processor is configured to select the minority subset of the threads to start at a predetermined offset from a beginning of all the multiple threads.
A data processing system including a graphics processor designed to execute multiple threads for processing pixels of an image frame, which processes threads in successive groups across multiple iterations if the thread count exceeds parallel capacity. The system has digital storage for various pixel-processing thread schedule configurations, each defining a unique order for mapping pixels to these groups. The graphics processor selects one configuration based on a signal, executes threads according to it, and generates the signal from gathered performance data. This system further controls a performance analysis process: it selects a small 'minority subset' of threads, executes an initial part of this subset using an initial schedule, then repeatedly changes the schedule selection signal to execute subsequent parts of the minority subset with different schedules. Once all threads in the minority subset are executed, the system uses the best-performing schedule from this analysis to process the remaining 'majority subset' of threads. The graphics processor is configured to select this minority subset of threads to start at a predetermined offset from the beginning of all available threads.
11. The apparatus as claimed in claim 1, wherein the at least one graphics processor is configured to measure a performance versus time taken metric for the data processing as the performance data.
A data processing system includes a graphics processor designed to execute multiple threads for processing pixels of an image frame. When the total number of threads (pixels) exceeds the processor's parallel capacity, it processes them in successive groups across multiple iterations. The system has digital storage containing various pixel-processing thread schedule configurations, each defining a unique order for mapping image pixels to these successive groups. The graphics processor selects one of these configurations based on a selection signal and executes the threads (pixels) in the time-based order defined by the selected configuration. It also gathers performance data, specifically measuring a 'performance versus time taken' metric for the data processing, for each available schedule configuration and generates the selection signal based on this collected data.
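For claim 11, a time-taken metric could be gathered by wall-clock timing each schedule's run, as in the Python sketch below. time.perf_counter is a standard-library call; time_schedules and the run callables are hypothetical, and on a GPU the timings might instead come from device-side timestamps. Keeping the fastest of a few repeats is a choice made here to reduce noise.

```python
import time
from typing import Callable, Dict


def time_schedules(runs: Dict[str, Callable[[], None]], repeats: int = 3) -> Dict[str, float]:
    """Return the best (minimum) elapsed seconds observed for each schedule's run."""
    timings: Dict[str, float] = {}
    for name, run in runs.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run()  # execute the work under this schedule
            best = min(best, time.perf_counter() - start)
        timings[name] = best
    return timings
```

The schedule with the smallest entry, min(timings, key=timings.get), would then drive the selection signal.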
12. The apparatus as claimed in claim 1, wherein the at least one graphics processor is further configured to measure an energy use metric for the data processing as the performance data.
A data processing system includes a graphics processor designed to execute multiple threads for processing pixels of an image frame. When the total number of threads (pixels) exceeds the processor's parallel capacity, it processes them in successive groups across multiple iterations. The system has digital storage containing various pixel-processing thread schedule configurations, each defining a unique order for mapping image pixels to these successive groups. The graphics processor selects one of these configurations based on a selection signal and executes the threads (pixels) in the time-based order defined by the selected configuration. It also gathers performance data, specifically measuring an 'energy use' metric for the data processing, for each available schedule configuration and generates the selection signal based on this collected data.
13. The apparatus as claimed in claim 1, wherein the at least one graphics processor comprises at least one event counter configured to count occurrences of a predetermined event during the data processing as the performance data.
A data processing system includes a graphics processor designed to execute multiple threads for processing pixels of an image frame. When the total number of threads (pixels) exceeds the processor's parallel capacity, it processes them in successive groups across multiple iterations. The system has digital storage containing various pixel-processing thread schedule configurations, each defining a unique order for mapping image pixels to these successive groups. The graphics processor selects one of these configurations based on a selection signal and executes the threads (pixels) in the time-based order defined by the selected configuration. It also gathers performance data for each available schedule configuration and generates the selection signal based on this collected data. The graphics processor includes at least one event counter configured to count occurrences of a predetermined event during the data processing, and these counts are used as part of the performance data.
14. The apparatus as claimed in claim 13, wherein the predetermined event is a cache miss in a cache which forms part of the apparatus.
A data processing system includes a graphics processor designed to execute multiple threads for processing pixels of an image frame. When the total number of threads (pixels) exceeds the processor's parallel capacity, it processes them in successive groups across multiple iterations. The system has digital storage containing various pixel-processing thread schedule configurations, each defining a unique order for mapping image pixels to these successive groups. The graphics processor selects one of these configurations based on a selection signal and executes the threads (pixels) in the time-based order defined by the selected configuration. It also gathers performance data for each available schedule configuration and generates the selection signal based on this collected data. The graphics processor includes at least one event counter configured to count occurrences of a predetermined event during the data processing, with these counts used as performance data. Specifically, this predetermined event is a cache miss within a cache memory that is part of the system.
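To see why a cache-miss counter is a useful signal for choosing a schedule, the sketch below simulates a small least-recently-used cache in Python and counts misses for two orderings of the same 64x64 frame. The cache geometry (16 pixels per line, 8 lines) and the helper count_cache_misses are assumptions for illustration, not parameters taken from the patent.

```python
from collections import OrderedDict
from typing import Iterable


def count_cache_misses(addresses: Iterable[int], line_size: int, num_lines: int) -> int:
    """Count misses in a fully associative LRU cache holding num_lines lines."""
    cache: "OrderedDict[int, bool]" = OrderedDict()
    misses = 0
    for addr in addresses:
        line = addr // line_size
        if line in cache:
            cache.move_to_end(line)        # hit: mark as most recently used
        else:
            misses += 1                    # what an event counter would increment
            cache[line] = True
            if len(cache) > num_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


W = H = 64
row_major = [y * W + x for y in range(H) for x in range(W)]
col_major = [y * W + x for x in range(W) for y in range(H)]
print(count_cache_misses(row_major, 16, 8))  # 256
print(count_cache_misses(col_major, 16, 8))  # 4096
```

The row-major order misses once per cache line (4096 / 16 = 256), while the column-major order touches a different line on every access and misses all 4096 times, which is the kind of gap the event counter is meant to expose.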
15. The apparatus as claimed in claim 14, wherein the at least one graphics processor comprises multiple processor cores and the cache is shared by the multiple processor cores.
A data processing system includes a graphics processor designed to execute multiple threads for processing pixels of an image frame. When the total number of threads (pixels) exceeds the processor's parallel capacity, it processes them in successive groups across multiple iterations. The system has digital storage containing various pixel-processing thread schedule configurations, each defining a unique order for mapping image pixels to these successive groups. The graphics processor selects one of these configurations based on a selection signal and executes the threads (pixels) in the time-based order defined by the selected configuration. It also gathers performance data for each available schedule configuration and generates the selection signal based on this collected data. The graphics processor includes at least one event counter configured to count occurrences of a cache miss in a cache memory that is part of the system. In this system, the graphics processor itself comprises multiple processor cores, and the cache where cache misses are counted is shared by these multiple processor cores.
16. The apparatus as claimed in claim 1, further configured to receive the thread schedule selection signal from an external source.
A data processing system includes a graphics processor designed to execute multiple threads for processing pixels of an image frame. When the total number of threads (pixels) exceeds the processor's parallel capacity, it processes them in successive groups across multiple iterations. The system has digital storage containing various pixel-processing thread schedule configurations, each defining a unique order for mapping image pixels to these successive groups. The graphics processor selects one of these configurations based on a selection signal and executes the threads (pixels) in the time-based order defined by the selected configuration. It also gathers performance data for each available schedule configuration and generates the selection signal based on this collected data. Additionally, the system is configured to receive this thread schedule selection signal from an external source, rather than solely generating it internally.
17. The apparatus as claimed in claim 16, wherein the apparatus is configured to receive a set of instructions defining the data processing to be performed in the single program multiple data fashion on the target data set, wherein the thread schedule selection signal is generated by the apparatus in dependence on a thread schedule selection value definition associated with the set of instructions.
A data processing system includes a graphics processor designed to execute multiple threads for processing pixels of an image frame. When the total number of threads (pixels) exceeds the processor's parallel capacity, it processes them in successive groups across multiple iterations. The system has digital storage containing various pixel-processing thread schedule configurations, each defining a unique order for mapping image pixels to these successive groups. The graphics processor selects one of these configurations based on a selection signal and executes the threads (pixels) in the time-based order defined by the selected configuration. It also gathers performance data for each available schedule configuration and generates the selection signal based on this collected data. The system is configured to receive the thread schedule selection signal from an external source. Furthermore, the system receives a set of instructions defining the data processing to be performed, and the apparatus itself generates the thread schedule selection signal from a thread schedule selection value definition associated with those instructions.
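Claim 17 has the apparatus derive the selection signal from a value associated with the submitted instructions. A minimal sketch, assuming that value travels as an optional field alongside the kernel source, with a fallback to a default schedule when the value is absent or names no stored configuration; KernelSubmission, schedule_value and selection_signal are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class KernelSubmission:
    source: str                           # the SPMD program to run on the frame
    schedule_value: Optional[str] = None  # hypothetical per-kernel selection value


def selection_signal(sub: KernelSubmission, available: Dict[str, object], default: str) -> str:
    """Derive the schedule selection from the value bundled with the instructions."""
    if sub.schedule_value in available:
        return sub.schedule_value
    return default  # no usable value supplied with the instructions
```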
18. A method of performing data processing in a single program multiple data fashion on a target data set, the method comprising the steps of: executing multiple threads to perform the data processing using at least one graphics processor, wherein the target data set comprises an image frame comprising a plurality of pixels, the threads corresponding to respective pixels within the image frame, the at least one graphics processor is capable of processing a maximum number of threads in parallel, and when a number of threads to be processed is greater than the maximum number, the at least one graphics processor is configured to process said number of threads in a plurality of iterations for processing successive groups of threads; and storing information defining a plurality of pixel-processing thread schedule configurations, each pixel-processing thread schedule configuration defining an order in which the pixels of the image frame are mapped to the successive groups of threads to be processed in said plurality of iterations in the executing step, wherein a first pixel-processing thread schedule configuration and a second pixel-processing thread schedule configuration of the plurality of pixel-processing thread schedule configurations each comprise a respectively different order in which the pixels of the image frame are to be mapped to said successive groups of threads; in response to a thread schedule selection signal, (a) selecting a pixel-processing thread schedule configuration of the plurality of pixel-processing thread schedule configurations, and (b) controlling the execution of the multiple threads corresponding to the pixels of the image frame in a selected time-based ordering of the multiple threads as defined by the selected pixel-processing thread schedule configuration; and gathering performance data relating to the data processing in relation to each pixel-processing thread schedule configuration of the plurality of pixel-processing thread schedule configurations performed by executing and selecting the thread schedule in dependence on the gathered performance data.
A method for processing data in a single program multiple data fashion on an image frame, where each pixel corresponds to a thread. The method involves executing multiple threads using a graphics processor, which processes threads in successive groups across multiple iterations when the thread count exceeds parallel capacity. Information defining a plurality of pixel-processing thread schedule configurations is stored, with each configuration specifying a unique order for mapping image pixels to these successive groups. In response to a thread schedule selection signal, the method includes selecting one of these configurations and controlling the execution of the threads according to its defined time-based ordering. Performance data is gathered for the data processing performed in relation to each pixel-processing thread schedule configuration, and the selection signal (or the chosen schedule) is then determined based on this gathered performance data.
19. Apparatus for performing data processing in a single program multiple data fashion on a target data set comprising: means for executing multiple threads to perform the data processing, wherein the target data set comprises an image frame comprising a plurality of pixels, the threads corresponding to respective pixels within the image frame, the means for executing is capable of processing a maximum number of threads in parallel, and when a number of threads to be processed is greater than the maximum number, the means for executing is configured to process said number of threads in a plurality of iterations for processing successive groups of threads; and means for storing information defining a plurality of pixel-processing thread schedule configurations, each pixel-processing thread schedule configuration defining an order in which the pixels of the image frame are mapped to the successive groups of threads to be processed in said plurality of iterations by the means for executing multiple threads, wherein a first pixel-processing thread schedule configuration and a second pixel-processing thread schedule configuration of the plurality of pixel-processing thread schedule configurations each comprise a respectively different order in which the pixels of the image frame are to be mapped to said successive groups of threads; means for, in response to a pixel-processing thread schedule selection signal, (a) selecting a pixel-processing thread schedule configuration of the plurality of pixel-processing thread schedule configurations, and (b) controlling the execution of the multiple threads corresponding to the pixels of the image frame in a selected time-based ordering of the multiple threads as defined by the selected pixel-processing thread schedule configuration; and means for gathering performance data relating to the data processing performed by the means for executing multiple threads in relation to each pixel-processing thread schedule configuration of the plurality of pixel-processing thread schedule configurations and for generating the pixel-processing thread schedule selection signal in dependence on the gathered performance data.
A data processing apparatus for single program multiple data processing on an image frame, where each pixel corresponds to a thread. The apparatus comprises: means for executing multiple threads, capable of processing threads in successive groups across multiple iterations when the thread count exceeds its parallel capacity. It includes means for storing information defining various pixel-processing thread schedule configurations, each uniquely ordering how image pixels are mapped to these successive groups. The apparatus also features means for selecting a pixel-processing thread schedule configuration and controlling thread execution according to its defined time-based order, in response to a selection signal. Finally, it includes means for gathering performance data related to the data processing for each schedule configuration, and for generating the pixel-processing thread schedule selection signal based on this gathered performance data.
August 4, 2020