High-Performance Hash Joins Using Memory with Extensive Internal Parallelism

Published: November 14, 2017

Assignee: Not available in USPTO data

Inventors: Jeffrey H. Derby, Charles Johnson, Robert K. Montoye, Dheeraj Sreedhar, Steven P. VanderWiel

Patent Claims

12 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A system comprising:
a memory; and
one or more processor cores, communicatively coupled to the memory, the one or more processor cores configured to:
issue, to a dynamic random access memory with extensive internal parallelism (DRAM with EIP), a first group of two or more load requests to load data from a hash table comprising one or more hash buckets, wherein the hash table is constructed from hashed join-key values of a dimension table for a hash-join procedure, and wherein each load request in the first group corresponds to an entry in a fact table of the hash-join procedure and seeks a hash bucket matching a hashed join-key value for the corresponding entry in the fact table;
issue, to the DRAM with EIP, a second group of two or more load requests to load data from the hash table;
receive, from the DRAM with EIP, first response data that is responsive to the first group of load requests, wherein the first response data comprises one or more hash buckets from the hash table; and
process the first response data while awaiting second response data that is responsive to the second group of load requests, wherein processing the first response data comprises:
identifying matches between the join-key values corresponding to entries in the two or more load requests of the first group and the one or more hash buckets in the first response data;
wherein the size of the second group of two or more load requests is selected such that a time for processing the first response data is based on the latency in receiving the second response data.
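The flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not part of the claims and not the inventors' implementation. The function names (`build_hash_table`, `issue_group`, `process_group`, `probe_pipelined`) and the `num_buckets` and `group_size` parameters are hypothetical, and `issue_group` is a synchronous stand-in for loads that real hardware would service asynchronously.

```python
from collections import defaultdict

def build_hash_table(dimension_rows, num_buckets):
    """Build phase: place each dimension row in the bucket of its hashed join key."""
    table = defaultdict(list)
    for key, payload in dimension_rows:
        table[hash(key) % num_buckets].append((key, payload))
    return table

def issue_group(table, group, num_buckets):
    """Stand-in for issuing one group of bucket loads (assumption: a real
    system would issue these to memory and receive the buckets later)."""
    return [table.get(hash(key) % num_buckets, []) for key, _ in group]

def process_group(group, buckets):
    """Compare each fact entry's join key with the bucket loaded for it."""
    matches = []
    for (fact_key, fact_payload), bucket in zip(group, buckets):
        for dim_key, dim_payload in bucket:
            if dim_key == fact_key:
                matches.append((fact_key, fact_payload, dim_payload))
    return matches

def probe_pipelined(table, fact_rows, num_buckets, group_size):
    """Probe phase: issue group i+1 before processing the response to group i."""
    groups = [fact_rows[i:i + group_size]
              for i in range(0, len(fact_rows), group_size)]
    results = []
    if not groups:
        return results
    in_flight = issue_group(table, groups[0], num_buckets)
    for i, group in enumerate(groups):
        current = in_flight
        if i + 1 < len(groups):
            # The next group's loads are outstanding while this one is processed.
            in_flight = issue_group(table, groups[i + 1], num_buckets)
        results.extend(process_group(group, current))
    return results
```

Because the stand-in is synchronous, the sketch shows only the ordering of issue and process steps. The overlap of processing time with memory latency depends on hardware that this code does not model.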

2. The system of claim 1, wherein issuing the first group of two or more load requests and issuing the second group of two or more load requests are performed on back-to-back processor cycles.

3. The system of claim 1, wherein the one or more processor cores are further configured to:
read two or more entries of the fact table;
hash a join-key value of each entry of the fact table; and
add the hashed join-key value of each entry of the fact table, along with associated data, to a work queue;
wherein issuing the first group of two or more load requests comprises issuing load requests corresponding to two or more entries of the work queue.
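The work queue of claim 3 might be sketched as follows. This is an illustration under stated assumptions, not claim language: the names `fill_work_queue` and `next_group` are hypothetical, and each queue entry is assumed to be a tuple of bucket index, join key, and associated data.

```python
from collections import deque

def fill_work_queue(fact_rows, num_buckets):
    """Read fact entries, hash each join key, and enqueue the hashed value
    together with the key and its associated data."""
    queue = deque()
    for key, payload in fact_rows:
        queue.append((hash(key) % num_buckets, key, payload))
    return queue

def next_group(queue, group_size):
    """Dequeue up to group_size entries; each becomes one load request."""
    group = []
    while queue and len(group) < group_size:
        group.append(queue.popleft())
    return group
```

Separating hashing from issuing lets a group be formed from entries whose bucket addresses are already known.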

4. The system of claim 3, wherein the one or more processor cores are further configured to sort the work queue to dynamically reduce differential latencies for receiving response data that is responsive to two or more groups of load requests issued.
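Claim 4 does not state a sort criterion. One possible policy, shown here purely as an assumption, is to order the queue so that consecutive entries target different memory banks. The mapping `bucket_index % num_banks` and the name `interleave_by_bank` are hypothetical and do not come from the patent text.

```python
from collections import defaultdict, deque

def interleave_by_bank(entries, num_banks):
    """Reorder queue entries round-robin across banks (assumed policy).

    Each entry is a tuple whose first element is a bucket index; the bank
    is assumed to be bucket_index % num_banks."""
    by_bank = defaultdict(deque)
    for entry in entries:
        by_bank[entry[0] % num_banks].append(entry)
    lanes = [by_bank[bank] for bank in sorted(by_bank)]
    ordered = []
    while any(lanes):  # an empty deque is falsy
        for lane in lanes:
            if lane:
                ordered.append(lane.popleft())
    return ordered
```

Under this assumed policy, a group drawn from the front of the reordered queue spreads its requests over as many banks as are available, so no single bank serves a long run of requests from one group.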

5. The system of claim 1, wherein the one or more processor cores are further configured to dynamically modify the size of the second group of two or more load requests.

6. The system of claim 1, wherein the one or more processor cores are further configured to select the size of the second group of two or more load requests, wherein selecting the size of the second group comprises:
calculating an aggregate latency of a third group of two or more load requests issued by a single thread, wherein the aggregate latency is the time between issuing the third group of two or more load requests and receiving a response;
identifying the dependence of the aggregate latency on the number of requests in the third group; and
determining an optimum number of load requests in the second group based at least in part on the aggregate latency and the dependence of the aggregate latency on the number of requests in the third group.
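The selection steps in claim 6 could be sketched as below. The claim does not specify a latency model or an optimality criterion, so both are assumptions here: aggregate latency is modeled as linear in the group size, and the chosen size is the smallest one whose processing time covers the latency of the next group. The names `fit_linear`, `choose_group_size`, `process_time_per_entry`, and `max_size` are hypothetical.

```python
import math

def fit_linear(samples):
    """Least-squares fit of latency = base + per_request * n.

    samples is a list of (n, latency) pairs measured for probe groups of
    different sizes; at least two distinct sizes are required."""
    count = len(samples)
    mean_n = sum(n for n, _ in samples) / count
    mean_t = sum(t for _, t in samples) / count
    var = sum((n - mean_n) ** 2 for n, _ in samples)
    cov = sum((n - mean_n) * (t - mean_t) for n, t in samples)
    per_request = cov / var
    base = mean_t - per_request * mean_n
    return base, per_request

def choose_group_size(samples, process_time_per_entry, max_size):
    """Smallest n with n * process_time_per_entry >= base + per_request * n,
    clamped to the range [2, max_size] (assumed criterion)."""
    base, per_request = fit_linear(samples)
    if process_time_per_entry <= per_request:
        # Processing can never cover the latency; use the largest allowed group.
        return max_size
    n = math.ceil(base / (process_time_per_entry - per_request))
    return max(2, min(n, max_size))
```

For example, with hypothetical measurements `[(2, 104.0), (4, 108.0), (6, 112.0)]` the fit gives a base latency of 100 and 2 per request. At 12 time units of processing per entry, the sketch selects a group of 10, since 10 × 12 = 120 equals 100 + 2 × 10.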

7. A computer program product for managing a hash-join procedure, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to perform a method comprising:
issuing, to a dynamic random access memory with extensive internal parallelism (DRAM with EIP), a first group of two or more load requests to load data from a hash table comprising one or more hash buckets, wherein the hash table is constructed from hashed join-key values of a dimension table for a hash-join procedure, and wherein each load request in the first group corresponds to an entry in a fact table of the hash-join procedure and seeks a hash bucket matching a hashed join-key value for the corresponding entry in the fact table;
issuing, to the DRAM with EIP, a second group of two or more load requests to load data from the hash table;
receiving, from the DRAM with EIP, first response data that is responsive to the first group of load requests, wherein the first response data comprises one or more hash buckets from the hash table; and
processing the first response data while awaiting second response data that is responsive to the second group of load requests, wherein processing the first response data comprises:
identifying matches between the join-key values corresponding to entries in the two or more load requests of the first group and the one or more hash buckets in the first response data;
wherein the size of the second group of two or more load requests is selected such that a time for processing the first response data is based on the latency in receiving the second response data.

8. The computer program product of claim 7, wherein issuing the first group of two or more load requests and issuing the second group of two or more load requests are performed on back-to-back processor cycles.

9. The computer program product of claim 7, the method further comprising:
reading two or more entries of the fact table;
hashing a join-key value of each entry of the fact table; and
adding the hashed join-key value of each entry of the fact table, along with associated data, to a work queue;
wherein issuing the first group of two or more load requests comprises issuing load requests corresponding to two or more entries of the work queue.

10. The computer program product of claim 9, the method further comprising sorting the work queue to dynamically reduce differential latencies for receiving response data that is responsive to two or more groups of load requests issued.

11. The computer program product of claim 7, the method further comprising dynamically modifying the size of the second group of two or more load requests.

12. The computer program product of claim 7, the method further comprising selecting the size of the second group of two or more load requests, wherein the selecting comprises:
calculating an aggregate latency of a third group of two or more load requests issued by a single thread, wherein the aggregate latency is the time between issuing the third group of two or more load requests and receiving a response;
identifying the dependence of the aggregate latency on the number of requests in the third group; and
determining an optimum number of load requests in the second group based at least in part on the aggregate latency and the dependence of the aggregate latency on the number of requests in the third group.

Patent Metadata

Filing Date

Unknown

Publication Date

November 14, 2017

Inventors

Jeffrey H. Derby

Charles Johnson

Robert K. Montoye

Dheeraj Sreedhar

Steven P. VanderWiel
