Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
1. A system comprising: a memory; and one or more processor cores, communicatively coupled to the memory, the one or more processor cores configured to: issue, to a dynamic random access memory with extensive internal parallelism (DRAM with EIP), a first group of two or more load requests to load data from a hash table comprising one or more hash buckets, wherein the hash table is constructed from hashed join-key values of a dimension table for a hash-join procedure, and wherein each load request in the first group corresponds to an entry in a fact table of the hash-join procedure and seeks a hash bucket matching a hashed join-key value for the corresponding entry in the fact table; issue, to the DRAM with EIP, a second group of two or more load requests to load data from the hash table; receive, from the DRAM with EIP, first response data that is responsive to the first group of load requests, wherein the first response data comprises one or more hash buckets from the hash table; and process the first response data while awaiting second response data that is responsive to the second group of load requests, wherein processing the first response data comprises: identifying matches between the join-key values corresponding to entries in the two or more load requests of the first group and the one or more hash buckets in the first response data; wherein the size of the second group of two or more load requests is selected such that a time for processing the first response data is based on the latency in receiving the second response data.
A system performs hash joins by sending grouped load requests to a DRAM with extensive internal parallelism (DRAM with EIP). It first issues a group of two or more load requests against a hash table built from the hashed join-key values of a dimension table. Each request corresponds to one entry in a fact table and seeks the hash bucket that matches that entry's hashed join-key value. The system then issues a second group of two or more load requests. While it waits for the response to the second group, it processes the first group's response, which contains hash buckets, by finding matches between the join-key values behind the first group's requests and the buckets returned. The size of the second group is selected so that the time spent processing the first response is based on the latency of receiving the second response, which lets the processing overlap the wait.
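The ordering of steps in claim 1 can be sketched in Python. This is only an illustration under stated assumptions: Python cannot issue real asynchronous DRAM loads, so the hypothetical `issue_loads` function is a synchronous stand-in for a group of load requests, and the names `build_hash_table`, `process_response`, `pipelined_probe`, `num_buckets`, and `group_size` are invented here and do not come from the patent. What the sketch shows is that the next group is issued before the current group's response is processed.

```python
from collections import defaultdict


def build_hash_table(dimension_rows, num_buckets):
    """Bucket (join_key, payload) dimension rows by hashed join key."""
    buckets = defaultdict(list)
    for key, payload in dimension_rows:
        buckets[hash(key) % num_buckets].append((key, payload))
    return buckets


def issue_loads(hash_table, group, num_buckets):
    """Hypothetical stand-in for one group of load requests.

    Returns one hash bucket per fact row in the group. On real hardware
    the requests would be in flight while the core does other work.
    """
    return [hash_table.get(hash(key) % num_buckets, []) for key, _ in group]


def process_response(group, response):
    """Match each fact row's join key against the bucket returned for it."""
    matches = []
    for (fact_key, fact_payload), bucket in zip(group, response):
        for dim_key, dim_payload in bucket:
            if dim_key == fact_key:
                matches.append((fact_key, fact_payload, dim_payload))
    return matches


def pipelined_probe(fact_rows, hash_table, num_buckets, group_size):
    """Probe in groups, issuing group i+1 before processing group i."""
    groups = [fact_rows[i:i + group_size]
              for i in range(0, len(fact_rows), group_size)]
    results = []
    if not groups:
        return results
    in_flight = issue_loads(hash_table, groups[0], num_buckets)
    for i, group in enumerate(groups):
        response = in_flight
        if i + 1 < len(groups):
            # Issue the next group first, so its latency overlaps the
            # processing of the current response.
            in_flight = issue_loads(hash_table, groups[i + 1], num_buckets)
        results.extend(process_response(group, response))
    return results
```

A call such as `pipelined_probe(fact_rows, build_hash_table(dimension_rows, 4), 4, 2)` returns one `(join_key, fact_payload, dimension_payload)` tuple per match. The claim requires each group to hold two or more requests; in this sketch the final group may be shorter when the fact table does not divide evenly.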
2. The system of claim 1, wherein issuing the first group of two or more load requests and issuing the second group of two or more load requests are performed on back-to-back processor cycles.
In the hash join system of claim 1, the first and second groups of load requests are issued on back-to-back (consecutive) processor cycles. Issuing the groups with no gap between them keeps both groups in flight at the same time, so the memory's internal parallelism can serve them concurrently.
3. The system of claim 1, wherein the one or more processor cores are further configured to: read two or more entries of the fact table; hash a join-key value of each entry of the fact table; and add the hashed join-key value of each entry of the fact table, along with associated data, to a work queue; wherein issuing the first group of two or more load requests comprises issuing load requests corresponding to two or more entries of the work queue.
The hash join system first reads entries from the fact table and calculates a hash value from the join-key of each entry. It adds these hashed join-key values, along with other relevant data, into a work queue. When the system sends out the first group of memory load requests, it creates these requests based on the entries currently stored within the work queue. This allows for efficient pipelining of the fact table entries.
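The work-queue step can be sketched as follows. The function names `fill_work_queue` and `next_group`, the `(bucket_index, join_key, payload)` entry layout, and the use of a modulo hash are all assumptions made for illustration; the claim only requires that the hashed join-key value and associated data be queued and that a group of load requests be drawn from two or more queue entries.

```python
from collections import deque


def fill_work_queue(fact_rows, num_buckets):
    """Hash each fact row's join key and queue it with its data."""
    queue = deque()
    for join_key, payload in fact_rows:
        bucket_index = hash(join_key) % num_buckets
        queue.append((bucket_index, join_key, payload))
    return queue


def next_group(queue, group_size):
    """Remove up to group_size entries from the front of the queue."""
    group = []
    while queue and len(group) < group_size:
        group.append(queue.popleft())
    return group
```

Each entry returned by `next_group` would then become one load request for the bucket at `bucket_index`.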
4. The system of claim 3, wherein the one or more processor cores are further configured to sort the work queue to dynamically reduce differential latencies for receiving response data that is responsive to two or more groups of load requests issued.
In the hash join system of claim 3, the work queue is sorted so that the response latencies of the issued groups of load requests differ less from one another. The claim does not say how the queue is ordered; ordering by memory locality or expected access pattern is one plausible approach. More uniform latencies make it easier to overlap each group's wait with the processing of the previous group.
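One possible ordering is sketched below. It is a guess at a heuristic, not the method the patent describes: it assumes that a bucket's memory bank is `bucket_index % num_banks` and that spreading each group's requests evenly across banks makes group latencies more alike. The function name `spread_across_banks`, the `num_banks` parameter, and the `(bucket_index, join_key, payload)` entry layout are hypothetical.

```python
from collections import defaultdict, deque
from itertools import zip_longest


def spread_across_banks(queue, num_banks):
    """Reorder queue entries round-robin by (assumed) memory bank.

    Entries are (bucket_index, join_key, payload) tuples. Consecutive
    entries in the result come from different banks where possible, so
    every group drawn from the front sees a similar mix of banks.
    """
    per_bank = defaultdict(list)
    for entry in queue:
        per_bank[entry[0] % num_banks].append(entry)

    skip = object()  # filler for banks that run out of entries early
    ordered = []
    bank_lists = [per_bank[bank] for bank in sorted(per_bank)]
    for one_round in zip_longest(*bank_lists, fillvalue=skip):
        ordered.extend(entry for entry in one_round if entry is not skip)
    return deque(ordered)
```

With two banks, a queue whose first two entries hit bank 0 and whose last two hit bank 1 is reordered so that the banks alternate.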
5. The system of claim 1, wherein the one or more processor cores are further configured to dynamically modify the size of the second group of two or more load requests.
The hash join system dynamically adjusts the size of the second group of memory load requests. This allows the system to adapt to changing memory access patterns and processing loads. By modifying the number of parallel requests, the system can optimize the balance between request latency and data processing time, improving the overall performance of the hash join operation.
6. The system of claim 1, wherein the one or more processor cores are further configured to select the size of the second group of two or more load requests, wherein selecting the size of the second group comprises: calculating an aggregate latency of a third group of two or more load requests issued by a single thread, wherein the aggregate latency is the time between issuing the third group of two or more load requests and receiving a response; identifying the dependence of the aggregate latency on the number of requests in the third group; and determining an optimum number of load requests in the second group based at least in part on the aggregate latency and the dependence of the aggregate latency on the number of requests in the third group.
To optimize the size of the second group of memory load requests, the hash join system calculates the "aggregate latency" for a third group of memory load requests issued by a single thread. Aggregate latency is the total time from issuing the requests to receiving the response. The system identifies how the aggregate latency changes based on the number of requests in the third group. Based on this relationship, the system determines the best number of load requests for the second group to optimize the balance between memory access time and data processing time.
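The selection step can be illustrated with a small model. Everything specific here is an assumption: that aggregate latency grows linearly with group size (latency = base + per_request × n), that processing a response costs a fixed time per request, and that the "optimum" is the group size at which processing time equals latency. The claim itself only requires that the size depend on the measured aggregate latency and on how that latency varies with the number of requests. The names `fit_latency` and `choose_group_size` are hypothetical.

```python
def fit_latency(samples):
    """Least-squares fit of latency = base + per_request * n.

    samples is a list of (group_size, measured_aggregate_latency) pairs
    taken from trial groups issued by a single thread.
    """
    count = len(samples)
    mean_n = sum(n for n, _ in samples) / count
    mean_t = sum(t for _, t in samples) / count
    variance = sum((n - mean_n) ** 2 for n, _ in samples)
    if variance == 0:
        raise ValueError("need samples with at least two distinct group sizes")
    covariance = sum((n - mean_n) * (t - mean_t) for n, t in samples)
    per_request = covariance / variance
    base = mean_t - per_request * mean_n
    return base, per_request


def choose_group_size(samples, process_time_per_request, max_size):
    """Pick n so that n * process_time is close to base + per_request * n."""
    base, per_request = fit_latency(samples)
    if process_time_per_request <= per_request:
        # Under this model processing never catches up with latency;
        # the largest allowed group hides the most waiting.
        return max_size
    balanced = base / (process_time_per_request - per_request)
    return max(2, min(max_size, round(balanced)))
```

The lower bound of 2 reflects the claim's requirement that a group contain two or more load requests; `max_size` is a hypothetical cap such as the number of outstanding requests the hardware supports.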
7. A computer program product for managing a hash-join procedure, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to perform a method comprising: issuing, to a dynamic random access memory with extensive internal parallelism (DRAM with EIP), a first group of two or more load requests to load data from a hash table comprising one or more hash buckets, wherein the hash table is constructed from hashed join-key values of a dimension table for a hash-join procedure, and wherein each load request in the first group corresponds to an entry in a fact table of the hash-join procedure and seeks a hash bucket matching a hashed join-key value for the corresponding entry in the fact table; issuing, to the DRAM with EIP, a second group of two or more load requests to load data from the hash table; receiving, from the DRAM with EIP, first response data that is responsive to the first group of load requests, wherein the first response data comprises one or more hash buckets from the hash table; and processing the first response data while awaiting second response data that is responsive to the second group of load requests, wherein processing the first response data comprises: identifying matches between the join-key values corresponding to entries in the two or more load requests of the first group and the one or more hash buckets in the first response data; wherein the size of the second group of two or more load requests is selected such that a time for processing the first response data is based on the latency in receiving the second response data.
A computer program manages hash joins by issuing grouped load requests to a DRAM with extensive internal parallelism (DRAM with EIP). It first issues a group of two or more load requests against a hash table built from the hashed join-key values of a dimension table. Each request corresponds to one entry in a fact table and seeks the hash bucket that matches that entry's hashed join-key value. The program then issues a second group of two or more load requests. While it waits for the response to the second group, it processes the first group's response, which contains hash buckets, by finding matches between the join-key values behind the first group's requests and the buckets returned. The size of the second group is selected so that the time spent processing the first response is based on the latency of receiving the second response, which lets the processing overlap the wait.
8. The computer program product of claim 7, wherein issuing the first group of two or more load requests and issuing the second group of two or more load requests are performed on back-to-back processor cycles.
In the computer program of claim 7, the first and second groups of load requests are issued on back-to-back (consecutive) processor cycles. Issuing the groups with no gap between them keeps both groups in flight at the same time, so the memory's internal parallelism can serve them concurrently.
9. The computer program product of claim 7, the method further comprising: reading two or more entries of the fact table; hashing a join-key value of each entry of the fact table; and adding the hashed join-key value of each entry of the fact table, along with associated data, to a work queue; wherein issuing the first group of two or more load requests comprises issuing load requests corresponding to two or more entries of the work queue.
The computer program that performs hash joins first reads entries from the fact table and calculates a hash value from the join-key of each entry. It adds these hashed join-key values, along with other relevant data, to a work queue. When the program sends out the first group of memory load requests, it creates these requests from two or more entries stored in the work queue. This allows for efficient pipelining of the fact table entries.
10. The computer program product of claim 9, the method further comprising sorting the work queue to dynamically reduce differential latencies for receiving response data that is responsive to two or more groups of load requests issued.
In the computer program of claim 9, the work queue is sorted so that the response latencies of the issued groups of load requests differ less from one another. The claim does not say how the queue is ordered; ordering by memory locality or expected access pattern is one plausible approach. More uniform latencies make it easier to overlap each group's wait with the processing of the previous group.
11. The computer program product of claim 7, the method further comprising dynamically modifying the size of the second group of two or more load requests.
The computer program that performs hash joins dynamically adjusts the size of the second group of memory load requests. This allows the program to adapt to changing memory access patterns and processing loads. By modifying the number of parallel requests, the program can optimize the balance between request latency and data processing time, improving the overall performance of the hash join operation.
12. The computer program product of claim 7, the method further comprising selecting the size of the second group of two or more load requests, wherein the selecting comprises: calculating an aggregate latency of a third group of two or more load requests issued by a single thread, wherein the aggregate latency is the time between issuing the third group of two or more load requests and receiving a response; identifying the dependence of the aggregate latency on the number of requests in the third group; and determining an optimum number of load requests in the second group based at least in part on the aggregate latency and the dependence of the aggregate latency on the number of requests in the third group.
To optimize the size of the second group of memory load requests, the computer program calculates the "aggregate latency" for a third group of memory load requests issued by a single thread. Aggregate latency is the total time from issuing the requests to receiving the response. The program identifies how the aggregate latency changes based on the number of requests in the third group. Based on this relationship, the program determines the best number of load requests for the second group to optimize the balance between memory access time and data processing time.
November 14, 2017