A relationship table stores a plurality of producer-consumer relationships defining associations between producers and consumers, wherein a source operand of a consumer is generated in dependence on producer data resulting from the producer. A candidate producer is selected and, based on subsequent candidate consumers, a candidate producer-consumer relationship is established and stored in the relationship table. A dependency marker is set in association with the producer data and a set dependency marker is propagated so as to be associated with result data values generated in data processing operations in dependence on the producer data. Candidate consumers are selected when they have at least one source operand that has a set dependency marker.
Legal claims defining the scope of protection, as filed with the USPTO.
. A data processing apparatus comprising:
. The data processing apparatus of,
. The data processing apparatus of,
. The data processing apparatus of,
. The data processing apparatus of,
. The data processing apparatus of,
. The data processing apparatus of,
. The data processing apparatus of, wherein the training circuitry is configured to select as the candidate producer instruction a predetermined type of instruction and is configured to select the observed instruction for which at least one source operand has the dependency marker that is set from a set of observed instructions that follows the candidate producer instruction.
. The data processing apparatus of,
. The data processing apparatus of, further comprising:
. The data processing apparatus of, wherein a confidence value is associated with each producer-consumer relationship stored in the relationship table,
. The data processing apparatus of,
. The data processing apparatus of, further comprising a store buffer configured to hold queued data values generated in the data processing operations before the queued data values are passed to a memory system,
. The data processing apparatus of,
. The data processing apparatus of,
. A system comprising:
. A chip-containing product comprising the system ofassembled on a further board with at least one other product component.
. A method of data processing comprising:
. The data processing apparatus of, wherein the processing circuitry comprises a 6×128 bit vector datapath.
Complete technical specification and implementation details from the patent document.
The present disclosure relates to data processing.
A data processing apparatus may perform data processing operations by executing instructions. Amongst the many instructions that the data processing apparatus executes there may be a producer-consumer relationship between a producer instruction and a consumer instruction, whereby a consumer instruction source operand is generated in dependence on producer data resulting from the producer instruction.
In one example embodiment described herein there is a data processing apparatus comprising:
In one example embodiment described herein there is a system comprising:
In one example embodiment described herein there is a chip-containing product comprising the system of the above-mentioned example embodiment assembled on a further board with at least one other product component.
In one example embodiment described herein there is a method of data processing comprising:
Before discussing the embodiments with reference to the accompanying figures, the following description of embodiments is provided.
In accordance with one example configuration there is provided a data processing apparatus comprising:
The data processing apparatus is configured to perform data processing operations by executing instructions. Amongst the many instructions that the data processing apparatus executes there may be a producer-consumer relationship between a producer instruction and a consumer instruction, whereby a consumer instruction source operand is generated in dependence on producer data resulting from the producer instruction. It may be useful to identify this relationship for a number of reasons, often related to improving the data processing efficiency of the data processing apparatus. Where the consumer instruction source operand depends on producer data resulting from the producer instruction, it will be understood that execution of that consumer instruction cannot begin until that producer data is available.
In order to identify producer-consumer relationships, the data processing apparatus is provided with training circuitry that is configured to select a candidate producer instruction and to evaluate a candidate producer-consumer relationship between the candidate producer instruction and subsequent candidate consumer instructions in a plurality of observed instructions. However, it may not be feasible to assess every subsequent instruction in the plurality of observed instructions to determine if it is indeed a consumer instruction that has a producer-consumer relationship with the candidate producer instruction. Additionally, the selection of suitable candidate consumer instructions may be made more difficult by the fact that the producer-consumer relationship may not be direct, i.e. producer data resulting from the producer instruction may not itself be the consumer instruction source operand. The dependency may be indirect and indeed with varying levels of complexity, e.g. the consumer instruction source operand may be generated on the basis of other data values, which themselves have undergone some data processing steps, and those other data values may only depend in some indirect way on the producer data.
In order to address this, the data processing apparatus is provided with dependency tracking circuitry, which is responsive to the provision of the producer data to set a dependency marker associated with the producer data. This set dependency marker is then caused to be propagated by the processing circuitry of the data processing apparatus as it performs data processing operations, such that the set dependency marker is associated with result data values generated in the data processing operations in dependence on the producer data. Then, when determining which observed instructions to select as the subsequent candidate consumer instructions, this is done by selecting observed instructions for which at least one source operand has the dependency marker that is set. The dependency marker thus provides a technique for tracking the dependency from the producer data to a subsequent consumer instruction source operand.
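By way of illustration only, the marker-based selection described above can be modelled in software. The following is a minimal sketch, assuming a single one-bit dependency marker per register; the `Instr` record, the register names and the instruction addresses are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    pc: int                  # hypothetical instruction address
    dst: Optional[str]       # destination register name, if any
    srcs: Tuple[str, ...]    # source register names


def find_candidate_consumers(producer_pc: int, observed: List[Instr]) -> List[int]:
    """Return the addresses of observed instructions with a marked source operand."""
    marker = {}       # register name -> dependency marker (True means "set")
    candidates = []
    for ins in observed:
        if ins.pc == producer_pc:
            # Producer data is provided: set the marker on its destination.
            marker[ins.dst] = True
            continue
        # A result depends on the producer data if any source carries a set marker.
        dependent = any(marker.get(reg, False) for reg in ins.srcs)
        if dependent:
            candidates.append(ins.pc)
        if ins.dst is not None:
            # Propagate the marker, or clear it when independent data overwrites it.
            marker[ins.dst] = dependent
    return candidates


observed = [
    Instr(0x10, "r1", ("r0",)),        # candidate producer
    Instr(0x14, "r3", ("r1", "r2")),   # uses the producer data directly
    Instr(0x18, "r5", ("r4",)),        # unrelated to the producer data
    Instr(0x1C, "r6", ("r3",)),        # depends on the producer data indirectly
]
print([hex(pc) for pc in find_candidate_consumers(0x10, observed)])
```

In this sketch the unrelated instruction is filtered out, while the instruction that depends on the producer data only indirectly is still selected.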
The identified and tracked producer-consumer relationships can be used in various ways to seek to improve performance of the data processing apparatus, such as in a prefetcher, where a prefetch for a consumer load may be initiated sooner and thus the latency associated with performance of the prefetch for the consumer load may be reduced. In other examples, these techniques may support improvement of branch prediction techniques, e.g. identifying which loads tend to cause branch mispredicts. In such examples a producer load is feeding (directly or indirectly) a consumer branch. In yet other examples, these techniques may support improvement of register caching/partitioning, where it may be desirable to steer dependent instructions to specific clusters of pipelines.
The training circuitry may be configured in a variety of ways, but in some examples the training circuitry comprises a training table with multiple entries and an entry of the training table holds an indication of the candidate producer instruction, and the training circuitry is configured to select for each entry of the multiple entries a respective candidate producer instruction and to evaluate a respective candidate producer-consumer relationship between the respective candidate producer instruction and subsequent respective candidate consumer instructions in the plurality of observed instructions. Accordingly the training circuitry can then hold a candidate producer instruction in each of the multiple entries and determine a respective producer-consumer relationship between each candidate producer instruction and the subsequent candidate consumer instructions in the plurality of observed instructions.
Whilst there might still only be a single dependency marker which can be set in association with the producer data values (such that when the dependency marker is propagated it is not possible to distinguish which producer instruction it is originally associated with), this may nonetheless provide a sufficiently useful filter in the selection of subsequent candidate consumer instructions. However in some examples, the dependency marker is a respective dependency marker of multiple dependency markers in a dependency vector associated with the producer data, the dependency vector corresponding to the multiple entries of the training table. The multiple dependency markers thus allow that distinction to be made between the multiple candidate producer instructions.
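The correspondence between a dependency vector and the entries of the training table can be sketched as follows. This is an illustrative model only, assuming that bit i of the vector corresponds to training table entry i; the function names are hypothetical.

```python
from typing import List


def marker_for_entry(entry_index: int) -> int:
    """Dependency vector with only the bit for one training table entry set."""
    return 1 << entry_index


def entries_for_vector(vector: int, num_entries: int) -> List[int]:
    """Training table entries whose candidate producers the marked data depends on."""
    return [i for i in range(num_entries) if (vector >> i) & 1]


NUM_ENTRIES = 2                              # two training table entries (assumed)
from_entry0 = marker_for_entry(0)            # data produced by the entry 0 producer
from_entry1 = marker_for_entry(1)            # data produced by the entry 1 producer
combined = from_entry0 | from_entry1         # data depending on both producers

print(entries_for_vector(from_entry1, NUM_ENTRIES))
print(entries_for_vector(combined, NUM_ENTRIES))
```

With a single shared marker, the two cases printed above could not be told apart.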
The processing circuitry may be configured to propagate a set dependency marker in a variety of ways through the data processing operations, but in some examples the processing circuitry is configured, when executing an instruction as part of the executing instructions, to generate the dependency marker for a result value as an OR function of the dependency markers for source operands of the instruction. More particularly, when the dependency markers are more than single bit values, this OR function can be implemented as a bitwise OR function such that the independence of those multiple bits of a given dependency marker is maintained. In some examples the processing circuitry is configured, when executing an instruction as part of the executing instructions, to generate the dependency marker for a result value as an XOR function of the dependency markers for source operands of the instruction, this meaning that when both (or all if appropriate) input operands have their dependency markers set, the dependency chain is intentionally broken. Avoiding such non-linear dependency chains may be desirable.
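The difference between the two propagation functions can be seen in a short sketch. It assumes two source operands and two-bit dependency vectors held as integers; the function names are hypothetical.

```python
def propagate_or(a: int, b: int) -> int:
    """Result marker is set wherever either source marker is set."""
    return a | b


def propagate_xor(a: int, b: int) -> int:
    """Result marker is cleared wherever both source markers are set."""
    return a ^ b


# Each pair is (marker of first source operand, marker of second source operand).
cases = [(0b01, 0b00), (0b01, 0b01), (0b01, 0b10)]
for a, b in cases:
    print(f"{a:02b} {b:02b} -> OR {propagate_or(a, b):02b}, XOR {propagate_xor(a, b):02b}")
```

The two functions differ only in the second case, where both operands carry the same set bit: OR keeps the marker, whereas XOR clears it and so breaks the chain. Because both functions act bitwise, the bits of a multi-bit marker remain independent of one another.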
The candidate producer instruction may be selected in a variety of ways. In some examples the training circuitry is configured to select the candidate producer instruction from a plurality of observed instructions in a sampling period and is configured to select the observed instruction for which at least one source operand has the dependency marker that is set in the sampling period.
The general principle of the dependency marker is to track producer-consumer relationships that have been identified during the sampling period. As such, the dependency marker indicates a data dependency on the producer data of the source operands of the candidate consumer instructions observed in that sampling period. In examples in which it is considered to be important that the dependencies marked have strictly only been identified during that sampling period, the dependency marker (wherever it has propagated to) may be cleared before a new sampling period starts. However it is also recognised here that there will be a natural “decay” of dependencies with time elapsed (i.e. with instructions encountered) following a given candidate producer instruction. Consequently in some examples the further cost of explicitly clearing the dependency marker between sampling periods may be dispensed with. Nevertheless in some examples such clearing may be performed.
Accordingly, in some examples the training circuitry is configured, after the sampling period, to commence a new sampling period without causing the dependency marker to be cleared. Equally, in some examples the training circuitry is configured, after the sampling period, to cause the dependency marker to be cleared before commencing a new sampling period.
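The two alternatives can be contrasted in a small model. The sketch below assumes that a sampling period is a fixed count of observed instructions and that markers are held per register; the class and attribute names are hypothetical.

```python
class SamplingPeriods:
    """Counts observed instructions and optionally clears markers between periods."""

    def __init__(self, period_length: int, clear_between_periods: bool):
        self.period_length = period_length
        self.clear_between_periods = clear_between_periods
        self.markers = {}          # register name -> dependency marker
        self.count = 0             # instructions observed in the current period
        self.periods_started = 1

    def observe_instruction(self) -> None:
        self.count += 1
        if self.count == self.period_length:
            # The current sampling period ends and a new one commences.
            self.count = 0
            self.periods_started += 1
            if self.clear_between_periods:
                self.markers.clear()


keep = SamplingPeriods(period_length=4, clear_between_periods=False)
wipe = SamplingPeriods(period_length=4, clear_between_periods=True)
for tracker in (keep, wipe):
    tracker.markers["r1"] = True
    for _ in range(4):
        tracker.observe_instruction()
print(keep.markers, wipe.markers)
```

In the non-clearing variant a stale marker survives into the new period and is only removed when its register is overwritten with independent data, which corresponds to the natural decay noted above.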
In some examples there are multiple dependency markers that can be set, whereby each of the multiple dependency markers corresponds to a different sampling period. Accordingly, in some examples the dependency marker is a sequence dependency marker of multiple sequence dependency markers associated with the producer data, wherein the sequence dependency marker is associated with the sampling period and a subsequent sequence dependency marker of the multiple sequence dependency markers is associated with a subsequent sampling period.
In some examples, the training circuitry is configured to select as the candidate producer instruction a predetermined type of instruction and is configured to select the observed instruction for which at least one source operand has the dependency marker that is set from a set of observed instructions that follows the candidate producer instruction.
The producer instruction and the at least one consumer instruction may generally be any kind of instruction and the dependence of a source operand of the at least one consumer instruction on the producer data may take a correspondingly wide range of forms. However, in some examples the producer instruction is a producer load instruction and the at least one consumer instruction is at least one consumer load instruction, wherein a load address of the at least one consumer load instruction is generated in dependence on the producer data retrieved by the producer load instruction. The use of the present techniques in the context of load instructions may be of particular benefit, given the latency associated with retrieval of a data value from a memory system for a load.
In some examples, the data processing apparatus further comprises: a cache to store local copies of data items for use in the data processing operations; and prefetch circuitry to initiate a prefetch of data for storage in the cache and, when an observed data load matches the producer load of an identified producer-consumer relationship in the relationship table, to initiate the prefetch for the at least one consumer load in dependence on the identified producer-consumer relationship and the producer data from the observed data load to return respective consumer data for storage in the data cache. The identified producer-consumer relationship can thus enable a prefetch for the at least one consumer load to be initiated sooner and thus the latency associated with performance of the prefetch for the at least one consumer load may be reduced.
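A prefetch triggered by a matching producer load might be modelled as below. The disclosure does not specify how a relationship table entry encodes the consumer load address, so this sketch assumes, purely for illustration, that the producer data is a pointer and that each consumer load is recorded as a byte offset from that pointer; the table layout and all names are hypothetical.

```python
from typing import Dict, List

# Hypothetical table layout: producer load address (PC) -> byte offsets of the
# consumer loads relative to the pointer returned by the producer load.
RelationshipTable = Dict[int, List[int]]


def prefetch_addresses(table: RelationshipTable, load_pc: int, loaded_value: int) -> List[int]:
    """Addresses to prefetch when an observed load matches a stored producer load."""
    return [loaded_value + offset for offset in table.get(load_pc, [])]


table: RelationshipTable = {0x400: [0, 8]}

# The load at 0x400 returns the pointer 0x1000, so two consumer prefetches follow.
print([hex(a) for a in prefetch_addresses(table, 0x400, 0x1000)])
# A load with no entry in the table triggers nothing.
print(prefetch_addresses(table, 0x404, 0x1000))
```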
In some examples, a confidence value is associated with each producer-consumer relationship stored in the relationship table, wherein the training circuitry is configured to update the confidence value associated with each said producer-consumer relationship in iterations over multiple sampling periods, and wherein initiation of the prefetch for the at least one consumer load in dependence on the identified producer-consumer relationship requires the confidence value associated with the identified producer-consumer relationship to meet a threshold value. Accordingly, a more reliable prediction mechanism can be established.
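One possible form of confidence tracking is a saturating counter, sketched below. The counter width, the step size and the threshold are assumptions made for illustration; the disclosure only requires that the confidence value meet a threshold.

```python
class ConfidenceEntry:
    """Confidence value for one stored producer-consumer relationship."""

    def __init__(self, threshold: int = 2, maximum: int = 3):
        self.confidence = 0
        self.threshold = threshold     # assumed threshold value
        self.maximum = maximum         # assumed saturation limit

    def update(self, relationship_observed: bool) -> None:
        """Called once per sampling period in which the relationship is evaluated."""
        if relationship_observed:
            self.confidence = min(self.confidence + 1, self.maximum)
        else:
            self.confidence = max(self.confidence - 1, 0)

    def may_prefetch(self) -> bool:
        return self.confidence >= self.threshold


entry = ConfidenceEntry()
history = []
for seen in (True, True, False, True):
    entry.update(seen)
    history.append(entry.may_prefetch())
print(history)
```

In this sketch a relationship must be confirmed in more than one sampling period before a prefetch is initiated on its basis, and a period in which it is not confirmed lowers the confidence again.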
A data processing apparatus may be provided with a store buffer, for example as part of a load/store unit, and the inventors of the present techniques have further realised in the context of the present techniques that when store-to-load-forwarding is supported it can be beneficial for the propagation of the dependency marker to include propagation through store-to-load-forwarding. Accordingly in some examples the data processing apparatus further comprises a store buffer to hold queued data values generated in the data processing operations before the queued data values are passed to a memory system, wherein propagation of the dependency marker further comprises associating the dependency marker with a queued data value held in the store buffer and, when the queued data value becomes a loaded data value via store-to-load-forwarding, propagating the dependency marker to the loaded data value.
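Propagation through store-to-load-forwarding can be sketched with a simple store buffer model. The sketch assumes one queued value per address and a memory system that does not hold markers; the class and method names are hypothetical.

```python
from typing import Dict, Tuple


class StoreBuffer:
    """Holds queued (value, dependency marker) pairs before they reach memory."""

    def __init__(self) -> None:
        self.queued: Dict[int, Tuple[int, bool]] = {}   # address -> (value, marker)

    def store(self, address: int, value: int, marker: bool) -> None:
        # The marker is associated with the queued data value.
        self.queued[address] = (value, marker)

    def load(self, address: int, memory: Dict[int, int]) -> Tuple[int, bool]:
        if address in self.queued:
            # Store-to-load-forwarding: the marker travels with the value.
            return self.queued[address]
        # Value comes from memory, which holds no markers in this sketch.
        return memory.get(address, 0), False

    def drain(self, memory: Dict[int, int]) -> None:
        # Queued values pass to the memory system; their markers are dropped.
        for address, (value, _marker) in self.queued.items():
            memory[address] = value
        self.queued.clear()


memory: Dict[int, int] = {}
buffer = StoreBuffer()
buffer.store(0x80, 42, marker=True)
forwarded = buffer.load(0x80, memory)     # served from the store buffer
buffer.drain(memory)
from_memory = buffer.load(0x80, memory)   # served from memory after the drain
print(forwarded, from_memory)
```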
Accordingly, whilst in some examples the propagation of the dependency markers does not penetrate the memory system, this is an example of where the propagation need not completely be excluded from all loads, since here the store-to-load-forwarding enables the propagation into the load data. Moreover, other examples will allow this propagation of the dependency markers into the memory system (e.g. the dependency markers are stored in one or more levels of data cache in association with the respective data values). In other examples, a side structure is used to track addresses with dependency bits set. This may support a more area efficient implementation, with the additional benefit that the entire structure can be cleared after a training period to avoid stale dependency markers. Alternative mechanisms may be provided for tracking the dependency information, such as through bits associated with each instruction associated with one or more stages of an execution pipeline. These can be complemented by state bits (such as one state bit per logical register) to cover larger gaps between instructions in a dependency chain as they flow through a pipeline.
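The side-structure alternative can be illustrated as a small set of marked addresses. The capacity limit and the policy of dropping new marks when the structure is full are assumptions made for this sketch and are not taken from the disclosure.

```python
class MarkedAddresses:
    """Side structure recording which addresses hold data with a set marker."""

    def __init__(self, capacity: int):
        self.capacity = capacity       # assumed fixed number of tracked addresses
        self.addresses = set()

    def mark(self, address: int) -> bool:
        """Record a marked address; returns False if the structure is full."""
        if address in self.addresses:
            return True
        if len(self.addresses) >= self.capacity:
            return False               # assumed policy: drop the new mark
        self.addresses.add(address)
        return True

    def is_marked(self, address: int) -> bool:
        return address in self.addresses

    def clear(self) -> None:
        """Clear everything at once, e.g. after a training period."""
        self.addresses.clear()


side = MarkedAddresses(capacity=2)
side.mark(0x100)
side.mark(0x108)
overflowed = not side.mark(0x110)
print(side.is_marked(0x100), side.is_marked(0x110), overflowed)
side.clear()
print(side.is_marked(0x100))
```

Because the markers live in one small structure rather than alongside every cached data value, clearing them all is a single operation.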
The producer data on which a source operand of the at least one consumer instruction depends may take a variety of forms. However, in some examples the producer data comprises a pointer indicative of the load address of the at least one consumer load.
In some examples, the producer data comprises an array index. The use of an array index as the producer data on which a source operand of the at least one consumer instruction depends may be a useful technique in some programming contexts. However, this can also make more difficult the identification of consumer instructions for which a source operand depends on producer data resulting from a producer instruction, because of the indirection via the array into which the producer data indexes. In this context, the present techniques may be particularly beneficial in nevertheless allowing that dependency to be tracked.
The producer data and the source operands of the consumer instruction may be held in the data processing apparatus in a variety of ways. A register file configured to hold data values is a common arrangement. The registers of a register file may be used in various ways, but in some examples the data processing apparatus further comprises register storage associated with the processing circuitry, wherein registers of the register storage hold data values providing source operands of the instructions executed by the processing circuitry when performing the data processing operations, wherein the registers each have a predetermined data size, and wherein the data processing apparatus is configured to use the registers as packed registers, wherein multiple data values are held in one packed register, and wherein the dependency tracking circuitry is configured to cause the dependency marker associated with a packed register to be set when one of the multiple data values held in the packed register is a result data value generated in the data processing operations in dependence on the producer data.
In some examples, the dependency marker is a respective dependency marker of multiple dependency markers in a dependency vector associated with the producer data, the dependency vector corresponding to the multiple data values held in the packed register, wherein the dependency vector is associated with the packed register and the respective dependency markers of the multiple dependency markers correspond to the multiple data values held in the packed register.
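The two packed-register variants can be contrasted in a short sketch: a dependency vector with one marker per data value held in the register, and a single register-level marker that is set when any held value depends on the producer data. The four-value register and the function names are assumptions made for illustration.

```python
from typing import List


def write_lane(lane_markers: List[bool], lane: int, dependent: bool) -> List[bool]:
    """Return the dependency vector after one packed data value is written."""
    updated = list(lane_markers)
    updated[lane] = dependent
    return updated


def register_marker(lane_markers: List[bool]) -> bool:
    """Single marker for the packed register: set if any held value is dependent."""
    return any(lane_markers)


lanes = [False, False, False, False]                 # four packed values, none dependent
lanes = write_lane(lanes, lane=2, dependent=True)    # third value depends on producer data
print(lanes, register_marker(lanes))
```

The per-value vector identifies which of the packed values is dependent, at the cost of more marker bits per register than the single register-level marker.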
In accordance with one example configuration there is provided a system comprising: the apparatus of any of the examples discussed above, implemented in at least one packaged chip; at least one system component; and a board, wherein the at least one packaged chip and the at least one system component are assembled on the board.
In accordance with one example configuration there is provided a chip-containing product comprising the above-mentioned system assembled on a further board with at least one other product component.
In accordance with one example configuration there is provided a method of data processing comprising:
In accordance with one example configuration there is provided the apparatus of any of the above discussed examples, wherein the processing circuitry comprises a 6×128 bit vector datapath.
Particular embodiments will now be described with reference to the figures.
One figure illustrates a data processing apparatus in accordance with one embodiment. The apparatus comprises processing circuitry, which is arranged to perform data processing operations defined by a sequence of instructions. These instructions are stored in memory and may be temporarily cached in an instruction cache. In performing the data processing operations various producer-consumer relationships may be encountered, a producer-consumer relationship being an association between a producer instruction and at least one consumer instruction, wherein a source operand of the at least one consumer instruction is generated in dependence on producer data resulting from the producer instruction. Such a producer-consumer relationship may be relatively direct, such as when the producer data itself directly provides the source operand of the at least one consumer instruction. Other producer-consumer relationships may be considerably more indirect, such as when the producer data is involved in further data processing operations, possibly in multiple steps, which ultimately result in a data value which provides the source operand of the at least one consumer instruction. The apparatus further comprises training circuitry, which is arranged to evaluate candidate producer-consumer relationships. For this purpose the training circuitry selects a candidate producer instruction and subsequent candidate consumer instructions and evaluates each candidate producer-consumer relationship. An established producer-consumer relationship is caused to be stored as an entry in a relationship table. The apparatus also comprises dependency tracking circuitry, which is arranged to monitor the progress of the execution of the candidate producer instruction and, in response to the provision of the producer data, to set a dependency marker associated with the producer data. The producer data may for example be held in a register accessible to the processing circuitry.
The processing circuitry is configured to propagate the set dependency marker such that it is associated with result data values generated in the data processing operations in dependence on the producer data. Furthermore, in selecting the candidate consumer instructions the training circuitry is arranged to select as the candidate consumer instructions those instructions for which at least one source operand has a set dependency marker. The candidate producer instruction can for example be selected in a sampling period (see also the further discussion of sampling periods below), with the subsequent candidate consumer instructions also being selected within that sampling period. The trigger to select the candidate producer instruction can also be a predetermined type of instruction, with the subsequent candidate consumer instructions being selected within a set (e.g. a fixed number) of observed instructions that follows the candidate producer instruction.
Further figures schematically illustrate an example relationship table, an example training table, and example producer data in accordance with some examples. The relationship table is arranged to store producer-consumer relationships. These may take the form shown in the figure, where each entry of the relationship table comprises an indication of a producer instruction and one or more consumer instruction indications. In the example entries shown: one entry indicates a relationship between a producer instruction and a consumer instruction; another entry indicates a relationship between a producer instruction and two consumer instructions; and a further entry indicates a relationship between a producer instruction and three consumer instructions. The number of consumer instructions that may be associated with a given producer instruction is not limited (other than by the storage capacity per entry of the table provided). In the illustrated example the training table comprises two entries, each of which holds a candidate producer instruction and then, over the course of a sampling period (e.g. 1024 instructions, although the present techniques are not limited to any particular sampling period length), candidate consumer instructions are selected to form a possible producer-consumer relationship. Two items of example producer data are shown, wherein one item has only a single associated dependency marker and the other item has two associated dependency markers. In some implementations, the training table may comprise only one entry for a candidate producer instruction. In such a case the single dependency marker of the producer data can then correspond to that one entry. By contrast, in some implementations (such as that illustrated), the training table comprises two entries, each of which holds a candidate producer instruction. The two dependency markers of the producer data may then each correspond to a respective one of the two candidate producers in the two entries.
However, in some implementations the training table may comprise the two entries, and yet only the single dependency marker type of producer data is used. This approach may be opted for when the use of the single dependency marker brings sufficient benefit in identifying candidate producer-consumer relationships, even though it does not allow a distinction to be made between the respective candidate producer instructions in the two entries of the training table.
A further figure schematically illustrates the propagation of a dependency marker in accordance with some examples. The execution of a producer instruction (itself having one or more source operands) generates a producer data value, which is stored in a destination register together with its associated dependency marker. Since this is a candidate producer instruction, that dependency marker is set. A subsequent instruction (not necessarily the sequentially next instruction) causes the values held in two source operand registers, one of which is the destination register of the producer instruction, to be added together, with the result value being stored in a further destination register. The data value held in the other source operand register has an associated dependency marker, which is not set. The processing circuitry which carries out the addition operation also propagates the dependency markers, this being performed with the use of an OR function, which takes the dependency markers of the two source operands as its inputs. Thus in the illustrated example the output of the OR function is the “set” value (shown as “1” in the figure). This then provides the associated dependency marker that accompanies the data value held in the further destination register. When the content of that destination register then provides a source operand for a subsequent instruction, that instruction may then be selected as a candidate consumer instruction. It will be appreciated that for clarity of illustration the figure only shows an example of a single OR function, but the propagation of dependency markers may comprise many such steps. In addition, in an implementation with multiple dependency markers associated with each data value, the OR function can then be provided as a bitwise OR function such that the multiple dependency markers can be individually propagated. Alternatively an XOR function may be used, such that if both (or all if appropriate) input operands have their dependency markers set, the dependency chain is intentionally broken because it is non-linear.
A further figure schematically illustrates two sampling periods in accordance with some examples. Sequentially encountered instructions are monitored and a predefined period (e.g. 1024 instructions) forms a sampling period. Candidate producer instructions and subsequent candidate consumer instructions are selected within a given sampling period. Note that in the figure two sampling periods are shown, which (in this particular case) are overlapping. Two sampling periods may: fully overlap (i.e. correspond to an identical set of observed instructions); partially overlap; or not overlap at all. The sampling periods each correspond to one of the training table entries (see also the training table described above). The figure also shows a first item of example data and a second item of example data, each of which has two associated dependency markers, where the two dependency markers of each item correspond respectively to one of the sampling periods. Accordingly, without overlap between the sampling periods, a given data item will likely have at most one of the two dependency markers set, although when the sampling periods overlap a given item of data could have both dependency markers set (indicating that it depends on both of the relevant producers).
A further figure schematically illustrates a data processing apparatus comprising prefetch circuitry in accordance with some examples. The apparatus comprises processing circuitry, which is arranged to perform data processing operations defined by a sequence of instructions. These instructions are stored in memory and may be temporarily cached in an instruction cache. Data values that are the subject of the data processing operations are loaded from and stored to the memory. These data values may be temporarily cached in a data cache. As in the example apparatus discussed above, in performing the data processing operations various producer-consumer relationships may be encountered. One such type of producer-consumer relationship of relevance to this example arises when the producer instruction is a producer load instruction and the at least one consumer instruction is at least one consumer load instruction, whereby a load address of the at least one consumer load instruction is generated in dependence on the producer data retrieved by the producer load instruction. The apparatus further comprises prefetch circuitry, which is arranged to initiate prefetches for load instructions that are expected to be encountered in the upcoming sequence of instructions. Data values retrieved by such prefetch operations are stored in the data cache, so that when those expected load instructions are indeed encountered the corresponding data values are already available in the data cache, avoiding the latency associated with the performance of the loads. The prefetch circuitry may be configured to identify the prefetches to be performed in any known manner, such as through the use of a stride prefetcher that is arranged to identify regular patterns of loads (such as a sequence of loads from memory addresses with a constant (or at least easily calculable) offset between the memory addresses). Prefetched consumer loads may occur in a number of ways.
In some cases, one or more consumer prefetches may be triggered directly from an executed producer load. In other cases, another prefetcher (such as a stride prefetcher) may initiate the prefetch of the producer and subsequently the data from that producer prefetch is used to generate a consumer prefetch. In yet other cases there may be a recursive producer load, which is both a consumer load and a producer load.
The apparatusfurther comprises training circuitry, which is arranged to establish candidate producer-consumer relationships. For this purpose, the training circuitryselects a candidate producer instruction and subsequent candidate consumer instructions from observed instructions in a sampling period, indications of these forming a candidate producer-consumer relationship. The training circuitrycomprises dependency tracking circuitry, which is arranged to set a dependency marker associated with the producer data. When the producer datais required as part of the data processing operations performed by the processing circuitry, it is loaded into a register accessible to the processing circuitry (not shown in the figure). Furthermore, because the processing circuitryis configured to propagate the set dependency marker, any result data values generated in the data processing operations in dependence on the producer datawill also have the dependency marker set. In order to select candidate consumer instructions in the sampling period the training circuitryis arranged to select as the candidate consumer instructions those instructions for which at least one source operand have a set dependency marker. The candidate producer-consumer relationships established in the sampling period are caused to be stored as an entryin the pattern history table. Nevertheless, it is worth noting here that in such examples where load instructions are involved, a configuration may be adopted in which dependency markers are not propagated through load instructions, i.e. a set dependency bit is not propagated through a consumer load from its sources to its destinations. This is because, for a consumer load which is dependent on a producer load's data, it would not be expected that the consumer load's own data has a relationship to the producer load. 
Accordingly, by not propagating the dependency markers through load instructions, the filtering effectiveness of the dependency markers for other instructions can be better supported. One exception to this non-propagation of the dependency markers is the case where a load instruction produces an updated value for its base register.
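By way of illustration only, the propagation rules described above can be sketched in software as follows. This is a minimal model under stated assumptions, not the circuitry itself: the names `Value`, `propagate_alu` and `execute_load` are hypothetical, memory is modelled as a dictionary, and the marker of an arithmetic result is taken to be the logical OR of its source markers.

```python
from dataclasses import dataclass


@dataclass
class Value:
    """A register value with its dependency marker (hypothetical model)."""
    data: int
    dep: bool = False


def propagate_alu(a: Value, b: Value, op) -> Value:
    """Non-load operation: the result marker is the OR of the source markers."""
    return Value(op(a.data, b.data), a.dep or b.dep)


def execute_load(memory: dict, base: Value, offset: int = 0, writeback: bool = False):
    """Load from memory[base + offset].

    The loaded data never inherits the marker of the address operand.
    The exception is an updated base register, which keeps the marker.
    """
    addr = base.data + offset
    loaded = Value(memory.get(addr, 0), dep=False)
    new_base = Value(addr, base.dep) if writeback else base
    return loaded, new_base


# Producer data (marker set) is combined with an unmarked offset, then used as a load address.
memory = {0x108: 7}
producer = Value(0x100, dep=True)
address = propagate_alu(producer, Value(8), lambda x, y: x + y)
loaded, updated_base = execute_load(memory, address, writeback=True)
```

In this sketch `address.dep` is set, `loaded.dep` is clear, and `updated_base.dep` is set, matching the described behaviour for a load that updates its base register.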
The figures schematically illustrate an example pattern history table, an example training table, and example producer data in accordance with some examples. The pattern history table is arranged to store producer-consumer relationships. In this example these are relationships between producer loads and consumer loads, for example where the memory address that is accessed by a consumer load is given (directly or indirectly) by the data value loaded by a producer load. These may take the form shown, where each entry of the pattern history table comprises an indication of a producer load instruction and indications of one or more consumer load instructions. In the example entries shown: one entry indicates a relationship between a producer load instruction and a consumer load instruction; another entry indicates a relationship between a producer load instruction and two consumer load instructions; and a further entry indicates a relationship between a producer load instruction and three consumer load instructions. The number of consumer load instructions that may be associated with a given producer load instruction is not limited (other than by the storage capacity per entry of the table provided). In the example shown, the training table comprises two entries, each of which holds a candidate producer load instruction and then, over the course of a sampling period (e.g. 1024 instructions, although the present techniques are not limited to any particular sampling period length), candidate consumer load instructions are selected to form a possible producer-consumer relationship. Two items of example producer data are shown, wherein one has only a single associated dependency marker and the other has two associated dependency markers. In some implementations, the training table may comprise only one entry for a candidate producer instruction. In such a case the single dependency marker of the producer data can then correspond to that one entry.
By contrast, in some implementations (such as that illustrated), the training table comprises two entries, each of which holds a candidate producer instruction. Where the producer data carries two dependency markers, these may then each correspond to a respective one of the two candidate producers in the two entries. However, in some implementations the training table may comprise the two entries, and yet only the single dependency marker type of producer data is used. This approach may be opted for when the use of the single dependency marker brings sufficient benefit in identifying candidate producer-consumer relationships, even though it does not allow distinction to be made between the respective candidate producer load instructions in the two entries of the training table.
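The trade-off between one marker per training table entry and a single shared marker can be illustrated with a small software model. This is only a sketch under assumptions: the class `TrainingTable`, its methods, and the encoding of the markers as bits of an integer mask are hypothetical and are not taken from the described apparatus.

```python
class TrainingTable:
    """Hypothetical model of a training table holding candidate producers."""

    def __init__(self, num_entries=2, per_entry_markers=True):
        self.producers = [None] * num_entries   # candidate producer PC per entry
        self.per_entry_markers = per_entry_markers

    def allocate(self, slot, producer_pc):
        """Record a candidate producer and return the marker mask for its data."""
        self.producers[slot] = producer_pc
        # One marker bit per entry, or a single shared marker bit.
        return (1 << slot) if self.per_entry_markers else 1

    def producers_for(self, marker_mask):
        """Return the candidate producers that a marker mask may refer to."""
        if marker_mask == 0:
            return []
        if self.per_entry_markers:
            return [pc for slot, pc in enumerate(self.producers)
                    if pc is not None and marker_mask & (1 << slot)]
        # A shared marker cannot tell the entries apart.
        return [pc for pc in self.producers if pc is not None]


two_markers = TrainingTable(per_entry_markers=True)
m0 = two_markers.allocate(0, 0x400)
m1 = two_markers.allocate(1, 0x480)

one_marker = TrainingTable(per_entry_markers=False)
one_marker.allocate(0, 0x400)
shared = one_marker.allocate(1, 0x480)
```

With two markers, `two_markers.producers_for(m1)` identifies only the second candidate producer; with a shared marker, `one_marker.producers_for(shared)` returns both candidates, reflecting the loss of distinction noted above.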
The figure schematically illustrates the propagation of a dependency marker in accordance with some examples. This represents a variant of the example described earlier, and a detailed description of each step is dispensed with for brevity. In this example it is the execution of a producer load instruction, which specifies a memory address (candidate producer load address), that results in the retrieval of a producer data value, which is stored in a destination register with its associated dependency marker. Since this is a candidate producer load instruction, its associated dependency marker is set. As shown in the figure, via a similar set of steps to those of the earlier example, the execution of an ADD instruction causes the values held in two source operand registers, one of which holds the producer data value, to be added together, with the result value being stored in a further destination register. The data value held in the other source operand register has an associated dependency marker, which is not set. The OR function generates the associated dependency marker that accompanies the data value held in the destination register of the ADD instruction. That data value then provides the load address for the consumer load instruction.
The figure schematically illustrates two sampling periods in accordance with some examples, in a variant on the earlier example. The sequence of instructions that is monitored comprises load instructions, and a predefined period (e.g. 1024 instructions) forms a sampling period. Candidate producer load instructions and subsequent candidate consumer load instructions are selected within a given sampling period. As in the earlier example, the two sampling periods are overlapping, and sampling periods may: fully overlap; partially overlap; or not overlap at all. The sampling periods each correspond to one of the training table entries. The figure also shows example producer data that has two associated dependency markers, where these each correspond respectively to one of the sampling periods.
The figure schematically illustrates a data processing apparatus comprising a load/store unit in accordance with some examples. The processing circuitry performs data processing operations specified by a sequence of instructions. These data processing operations comprise the loading of data values from memory and the storing of data values to the memory. A cache is also shown as part of the memory system. When the processing circuitry executes a store instruction for a given data value (with an associated dependency marker), the resulting store operation is primarily handled by the load/store unit. Within the load/store unit there is provided a store buffer, in which to-be-stored data values are held in a queue before being passed to the memory system for storage. An example data value (with an associated dependency marker) held in the store buffer is shown. The load/store unit also comprises a load queue. When the processing circuitry executes a load instruction, an entry in the load queue indicates that there is a pending load (whilst the specified memory address is accessed and the data value stored there is retrieved). The load/store unit is arranged to support store-to-load-forwarding, whereby when the processing circuitry executes a load instruction, the load/store unit is arranged to check whether the memory address specified by the load instruction corresponds to the memory address of an entry in the store buffer. When this is the case, store-to-load-forwarding enables the data value requested by the load instruction to be provided promptly, directly from the store buffer (because this will be the most up-to-date version of that data value), rather than waiting for the data value to be written to memory by the store operation and then retrieved by the load operation. Hence the example data value (with an associated dependency marker) in the load queue can be provided in this manner from the store buffer.
Note the associated dependency marker is also forwarded, such that the propagation of the dependency marker is also maintained for operations which store then load a data value. Accordingly it is to be noted that although implementations may limit the propagation of the dependency markers to not penetrate the memory system, this is an example of where the propagation need not completely be excluded from all loads, since here the store-to-load-forwarding enables the propagation into the load data.
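The forwarding of the dependency marker alongside the data can be modelled, purely as an illustrative sketch, as follows. The class `LoadStoreUnit` and its methods are hypothetical names. The sketch assumes an implementation in which markers do not penetrate the memory system, so a marker survives only while the value is still queued in the store buffer.

```python
from collections import deque


class LoadStoreUnit:
    """Hypothetical model of a store buffer with store-to-load forwarding."""

    def __init__(self, memory):
        self.memory = memory          # address -> data; markers are not kept in memory
        self.store_buffer = deque()   # queued (address, data, marker), oldest first

    def store(self, addr, data, dep):
        """Queue a store together with the dependency marker of its data."""
        self.store_buffer.append((addr, data, dep))

    def drain_one(self):
        """Pass the oldest queued store to memory; its marker is dropped."""
        addr, data, _dep = self.store_buffer.popleft()
        self.memory[addr] = data

    def load(self, addr):
        """Return (data, marker), forwarding from the youngest matching store."""
        for s_addr, data, dep in reversed(self.store_buffer):
            if s_addr == addr:
                return data, dep      # data and marker are forwarded together
        return self.memory.get(addr, 0), False


lsu = LoadStoreUnit(memory={})
lsu.store(0x40, 123, dep=True)
forwarded = lsu.load(0x40)   # served from the store buffer
lsu.drain_one()
from_memory = lsu.load(0x40) # served from memory after the store has drained
```

Here `forwarded` carries the set marker, whereas `from_memory` returns the same data with the marker clear, which is the distinction the passage above draws.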
The figure schematically illustrates the selection of candidate consumer loads in accordance with some examples. Here the load instructions concerned make use of pointers to specify the memory addresses from which data values are to be retrieved. Furthermore, because of the nature of the data processing operations being performed, related producer load data and consumer load instructions are specified by pointers that share the upper bits of their respective values. Accordingly, the training circuitry makes use of this fact as a further filter by which to identify candidate consumer loads. For a selected item of candidate producer load data, the training circuitry is arranged to compare the upper set of bits of the candidate producer load data with the same portion of a subsequent load address observed in a sampling period. When these are found to match by comparison circuitry, the observed load address may be selected as a candidate consumer load address. Other examples may have a different kind of relationship between the candidate producer load data and the candidate consumer load(s), such as when the producer load retrieves an array index as the producer data, where this array index is then used to determine the candidate consumer load address (for example with reference to an array base address and using an array element size). Such indirection (via the array) would otherwise make the identification of candidate consumer loads for a given candidate producer load very challenging, yet the use of the dependency marker of the present techniques allows the dependency between the two to be identified.
The figure schematically illustrates the use of dependency markers in packed registers in accordance with some examples. A first packed register is shown wherein the two halves of the register each independently hold a data value. A dependency marker is shown which is associated with the register. Where this dependency marker is a single bit value, whilst the marker can be set as described in any of the examples described herein, it is not possible to distinguish whether the marker has been set in association with the first packed register half or the second packed register half. A second packed register is shown wherein the two halves of the register each independently hold a data value. For this register, two dependency markers are provided. In this example, where the dependency markers each comprise (at least) a single bit value, one dependency marker can be associated with the first packed register half and the other dependency marker can be associated with the second packed register half.
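The difference between the two packed register arrangements can be sketched as a small data structure. The class `PackedRegister` and its `marker_for` method are hypothetical and serve only to show what information each arrangement retains.

```python
from dataclasses import dataclass


@dataclass
class PackedRegister:
    """Hypothetical model: two packed halves with one or two dependency markers."""
    halves: tuple    # (first half value, second half value)
    markers: tuple   # one shared marker, or one marker per half

    def marker_for(self, half: int) -> bool:
        """Return the marker that applies to the given half (0 or 1)."""
        if len(self.markers) == 1:
            # A single marker covers the whole register, so the halves are indistinguishable.
            return self.markers[0]
        return self.markers[half]


shared = PackedRegister(halves=(0x11, 0x22), markers=(True,))
split = PackedRegister(halves=(0x11, 0x22), markers=(False, True))
```

With the shared marker both halves report a set marker, whereas with per-half markers only the second half of `split` does.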
The figure is a flow diagram schematically illustrating a sequence of steps that are taken in a method in accordance with some examples. The flow can be considered to begin with the selection of a candidate producer instruction in a new sampling period. The dependency marker for the resulting producer data from the candidate producer instruction is then set. Instructions are then observed during the sampling period, and for each instruction in the sampling period it is determined whether a source operand of that instruction has the dependency marker set. When it does not have the dependency marker set, the flow proceeds to a determination of whether there are further instructions in the sampling period. When this is not the case the sampling period is concluded and the flow returns to the start. Conversely, when the dependency marker is found to be set in a source operand of the instruction, that instruction is selected as a candidate consumer instruction. A producer-consumer relationship can then be added to the relationship table indicating a relationship between the candidate producer instruction and the candidate consumer instruction. The flow then rejoins the loop over the instructions in the sampling period.
The figure is a flow diagram schematically illustrating a sequence of steps that are taken in a method in accordance with some examples. Here the instructions are load instructions. Hence at the start of the flow, a candidate producer load instruction is selected in a new sampling period. The dependency marker for the retrieved producer data from the candidate producer load instruction is then set. Load instructions are then observed during the sampling period, and for each load instruction in the sampling period it is determined whether the load address of that instruction has the dependency marker set. When it does not have the dependency marker set, the flow proceeds to a determination of whether there are further loads in the sampling period. When this is not the case the sampling period is concluded and the flow returns to the start. Conversely, when the dependency marker is found to be set in the load address of the load instruction, that instruction is selected as a candidate consumer load instruction. A producer-consumer relationship can then be added to the pattern history table indicating a relationship between the candidate producer load instruction and the candidate consumer load instruction. The flow then rejoins the loop over the load instructions in the sampling period.
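The load-based training flow can be approximated in software by the following sketch. It is a simplified model under several assumptions: the trace format, the function name `train_sampling_period`, and the choice of the first observed load as the candidate producer are all hypothetical. Markers are tracked per register, combined by OR for non-load instructions, and not propagated into loaded data.

```python
def train_sampling_period(trace, period_len=1024):
    """Return (candidate producer PC, list of candidate consumer PCs).

    trace entries (hypothetical format):
      ("load", pc, dst_reg, base_reg)
      ("add",  pc, dst_reg, src_reg1, src_reg2)
    """
    markers = {}          # register name -> dependency marker
    producer_pc = None
    consumers = []
    for op, pc, dst, *srcs in trace[:period_len]:
        if op == "load":
            (base,) = srcs
            if producer_pc is None:
                # Assumption: the first load seen is chosen as the candidate producer.
                producer_pc = pc
                markers[dst] = True     # set the marker on the producer data
                continue
            if markers.get(base, False):
                consumers.append(pc)    # load address depends on the producer data
            markers[dst] = False        # markers are not propagated through loads
        else:
            # Non-load instruction: result marker is the OR of the source markers.
            markers[dst] = any(markers.get(s, False) for s in srcs)
    return producer_pc, consumers


trace = [
    ("load", 0x10, "r1", "r0"),        # candidate producer
    ("add",  0x14, "r3", "r1", "r2"),  # r3 inherits the marker from r1
    ("load", 0x18, "r4", "r3"),        # address marked: candidate consumer
    ("load", 0x1C, "r5", "r6"),        # unrelated load
    ("load", 0x20, "r7", "r4"),        # address comes from loaded data: not selected
]
result = train_sampling_period(trace)
```

For this trace the sketch selects the load at `0x10` as the candidate producer and only the load at `0x18` as a candidate consumer; the pair would then be a candidate entry for the pattern history table.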
The figure is a flow diagram schematically illustrating a sequence of steps that are taken in a method in accordance with some examples. This method describes the operation of prefetch circuitry that makes use of a pattern history table (in which producer-consumer relationships are stored). First it is determined whether an observed data load (whether initiated by a load instruction or by a prefetch operation) matches an indication of a producer load in the pattern history table. When this is not the case, the flow loops on itself at this step. When such a match is found, it is determined whether the corresponding producer-consumer relationship satisfies a confidence criterion. Producer-consumer relationships that are stored in the pattern history table may initially not meet this confidence criterion, but further observations of the same producer-consumer relationship in subsequent sampling periods can be used to increase the confidence in the relationship. When the confidence criterion is not met, the flow returns to the matching step. When the confidence criterion is met, the prefetch circuitry initiates a prefetch corresponding to the consumer load associated with the producer load, using the producer data to define the memory address to be accessed by the consumer load. The flow then returns to the matching step.
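One possible software model of this prefetch flow is sketched below. It is illustrative only: the class names, the integer confidence counter with a fixed threshold, and the representation of each consumer as a fixed offset added to the producer data are assumptions and are not specified in the text.

```python
from dataclasses import dataclass


@dataclass
class Relationship:
    """Hypothetical pattern history table entry for one producer load."""
    consumer_offsets: list   # assumption: consumer address = producer data + offset
    confidence: int = 0


class Prefetcher:
    def __init__(self, threshold=2):
        self.pht = {}                # producer load PC -> Relationship
        self.threshold = threshold   # hypothetical confidence criterion

    def record_training(self, producer_pc, consumer_offsets):
        """Store a new relationship, or raise confidence when it is seen again."""
        rel = self.pht.get(producer_pc)
        if rel is None:
            self.pht[producer_pc] = Relationship(list(consumer_offsets))
        else:
            rel.confidence += 1

    def on_load(self, pc, producer_data):
        """Return the consumer prefetch addresses for an observed load, if any."""
        rel = self.pht.get(pc)
        if rel is None or rel.confidence < self.threshold:
            return []                # no match, or confidence criterion not met
        return [producer_data + off for off in rel.consumer_offsets]


prefetcher = Prefetcher(threshold=2)
prefetcher.record_training(0x10, [0, 8])
early = prefetcher.on_load(0x10, 0x1000)   # relationship known but not yet trusted
prefetcher.record_training(0x10, [0, 8])
prefetcher.record_training(0x10, [0, 8])
late = prefetcher.on_load(0x10, 0x1000)    # confidence criterion now met
```

In this model `early` is empty and `late` contains the two consumer addresses derived from the producer data. The same `on_load` call could equally be driven by data returned from another prefetcher rather than by an executed load.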
Concepts described herein may be embodied in a system comprising at least one packaged chip. The apparatus and circuitry described earlier are implemented in the at least one packaged chip (either being implemented in one specific chip of the system, or distributed over more than one packaged chip). The at least one packaged chip is assembled on a board with at least one system component. A chip-containing product may comprise the system assembled on a further board with at least one other product component. The system or the chip-containing product may be assembled into a housing or onto a structural support (such as a frame or blade). The figure schematically illustrates a system comprising an implementation in a packaged chip and an implementation in a chip-containing product in accordance with some examples. Hence, one or more packaged chips, with the apparatus described above implemented on one chip or distributed over two or more of the chips, are manufactured by a semiconductor chip manufacturer. In some examples, the chip product made by the semiconductor chip manufacturer may be provided as a semiconductor package which comprises a protective casing (e.g. made of metal, plastic, glass or ceramic) containing the semiconductor devices implementing the apparatus and circuitry described above and connectors, such as lands, balls or pins, for connecting the semiconductor devices to an external environment. Where more than one chip is provided, these could be provided as separate integrated circuits (provided as separate packages), or could be packaged by the semiconductor provider into a multi-chip semiconductor package (e.g. using an interposer, or by using three-dimensional integration to provide a multi-layer chip product comprising two or more vertically stacked integrated circuit layers).
In some examples, a collection of chiplets (i.e. small modular chips with particular functionality) may itself be referred to as a chip. A chiplet may be packaged individually in a semiconductor package and/or together with other chiplets into a multi-chiplet semiconductor package (e.g. using an interposer, or by using three-dimensional integration to provide a multi-layer chiplet product comprising two or more vertically stacked integrated circuit layers).
The one or more packaged chips are assembled on a board together with at least one system component to provide a system. For example, the board may comprise a printed circuit board. The board substrate may be made of any of a variety of materials, e.g. plastic, glass, ceramic, or a flexible substrate material such as paper, plastic or textile material. The at least one system component comprises one or more external components which are not part of the one or more packaged chip(s). For example, the at least one system component could include any one or more of the following: another packaged chip (e.g. provided by a different manufacturer or produced on a different process node), an interface module, a resistor, a capacitor, an inductor, a transformer, a diode, a transistor and/or a sensor.
May 19, 2026