US-8099733

Parallel data processing architecture

PublishedJanuary 17, 2012

Assigneenot available in USPTO data we have

Inventorsnot available in USPTO data we have

Technical Abstract

A tree-structured index to multidimensional data is created using naturally occurring patterns and clusters within the data which permit efficient search and retrieval strategies in a database of DNA profiles. A search engine utilizes hierarchical decomposition of the database by identifying clusters of similar DNA profiles and maps to parallel computer architecture, allowing scale up past to previously feasible limits. Key benefits of the new method are logarithmic scale up and parallelization. These benefits are achieved by identification and utilization of naturally occurring patterns and clusters within stored data. The patterns and clusters enable the stored data to be partitioned into subsets of roughly equal size. The method can be applied recursively, resulting in a database tree that is balanced, meaning that all paths or branches through the tree have roughly the same length. The method achieves high performance by exploiting the natural structure of the data in a manner that maintains balanced trees. Implementation of the method maps naturally to parallel computer architectures, allowing scale up to very large databases.

Patent Claims

19 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. In a parallel data processing architecture for search, storage and retrieval of data of a database responsive to queries for specific data of said database, said parallel data processing architecture comprising a) a plurality of host processors comprising at least one root host processor responsive to a client query for said specific data of said database and at least one other host processor; b) a communication system coupling said plurality of host processors, said plurality of host processors being capable of communicating with one another; and c) host processor memory, a method of balancing workload between said plurality of host processors comprising: each of said plurality of host processors maintaining load information of processor capacity and search queue length of said host processor, each of said plurality of host processors further maintaining information of a search queue of client queries at said host processor for said specific data of said database; each of said plurality of host processors broadcasting said load information of its processor capacity and search queue length to at least one other of said plurality of host processors; and each of said plurality of host processors bringing its search queue of client queries into balance with another of said plurality of host processors according to a time constant responsive to receipt of said broadcast capacity and load information, said balancing including exchanging unprocessed search requests with a recipient host processor responsive to a stochastic selection process to determine the recipient host processor of an exchanged search request between said root host processor and said recipient host processor thereby minimizing a time required to respond to client queries for retrieval of responsive data from said database, said bringing into balance comprising exchanging a block of search requests from host processors having long average waiting times to host processors having short waiting times and adjusting a size of a block of exchanged requests according to relative processing speeds of host processors and inter-processor communications protocol between the host processors.

2. The method of claim 1 , the root host processor using an initial search queue for said client query for said specific data of said database.

3. The method of claim 1 , the root host processor selecting a search queue of client queries for specific data of said database of another of said plurality of host processors as an initial search queue of client queries.

4. The method of claim 1 , said database comprising a plurality of DNA profiles, said client query for said specific data of said database being generated in response to one of target, match stringency, mismatch, equivalence, number of alleles and measurement error specifications, and said bringing its search queue of client queries into balance according to a time constant minimizing the time required for performing a client query for DNA profiles satisfying a set of selected database search criteria.

5. In a parallel data processing architecture for search, storage and retrieval of data of a database responsive to queries for specific data of said database, said parallel data processing architecture comprising a) a plurality of host processors comprising at least one root host processor responsive to a client query for said specific data of said database and at least one other host processor; b) a communication system coupling said plurality of host processors, said plurality of host processors being capable of communicating with one another; and c) host processor memory, a method of balancing workload between said plurality of host processors comprising: each of said plurality of host processors maintaining load information of processor capacity and search queue length of said host processor, each of said plurality of host processors further maintaining information of a search queue of client queries at said host processor for said specific data of said database; each of said plurality of host processors broadcasting said load information of its processor capacity and search queue length to at least one other of said plurality of host processors; and each of said plurality of host processors bringing its search queue of client queries into balance with another of said plurality of host processors according to a time constant responsive to receipt of said broadcast capacity and load information, said balancing including exchanging unprocessed search requests with a recipient host processor responsive to a stochastic selection process to determine the recipient host processor of an exchanged search request between said root host processor and said recipient host processor thereby minimizing a time required to respond to client queries for retrieval of responsive data from said database, said bringing a search queue of client queries into balance further comprising exchanging a block of search requests between at least two of said plurality of host processors, thereby minimizing a time required to respond to client queries by said exchanging of said block of search requests and adjusting a size of the block of exchanged search requests according to relative processing speeds of host processors and inter-processor communications protocol between the host processors.

6. The method of claim 5 , the root host processor using an initial search queue for said client query for said specific data of said database.

7. The method of claim 5 , the root host processor selecting a search queue of client queries for specific data of said database of another of said plurality of host processors as an initial search queue of client queries.

8. The method of claim 5 , said database comprising a plurality of DNA profiles, said client query for said specific data of said database being generated in response to one of target, match stringency, mismatch, equivalence, number of alleles and measurement error specifications, and said bringing its search queue of client queries into balance according to a time constant minimizing the time required for performing a client query for DNA profiles satisfying a set of selected database search criteria.

9. In a parallel data processing architecture for search, storage and retrieval of specific data of a database responsive to queries for said specific data, said parallel data processing architecture comprising a) a plurality of available host processors comprising at least one root host processor responsive to a client query for said specific data and at least one other host processor; b) a communication system coupling said plurality of available host processors, said plurality of available host processors being capable of communicating with one another; and c) host processor memory, a method of storing information of said plurality of available host processors for responding to client queries for specific data of said database comprising: at least two of said plurality of available host processors each having a search engine and maintaining information of said plurality of said available host processors and a processor capacity and load of said plurality of available host processors, said information including information of a search queue length of client queries; each of said plurality of available host processors broadcasting its capacity and load information to each other of said plurality of available host processors and bringing each search queue of client queries into balance with another of said plurality of available host processors according to a time constant in response to receipt of said broadcast capacity and load information, said bringing its search queue into balance comprising equalizing average waiting times for service and computation between said search engines, exchanging a block of search requests from host processors having long average waiting times to host processors having short waiting times and adjusting a size of the block of exchanged requests according to relative processing speeds of host processors and inter-processor communications protocol between the host processors; stochastically selecting among said plurality of available host processors to determine a recipient host processor for an exchanged search request from said root host processor for searching said database, said recipient host processor storing a database index for said database comprising nodes of said database index for said database and data accessible via said nodes; and each of said plurality of available host processors reconfiguring information on said plurality of available host processors responsive to the receipt of said broadcast information of said capacity and load information.

10. The method of claim 9 , wherein the information on said plurality of available host processors at each available host processor changes in response to failure of a first host processor, said changed information on said plurality of available host processors permitting a swapping of a second available host processor for said failed first host processor.

11. The method of claim 9 , wherein the information on said plurality of available host processors at each of said plurality of available host processors changes in response to an addition of a new host processor, said changed information on said plurality of available host processors permitting said addition of said new host processor, said new host processor being upgraded with more current technology than possessed by at least one of said plurality of available host processors.

12. The method of claim 9 , wherein the information on said plurality of available host processors at each available host processor changes in response to a failure of a first host processor and an addition of a second host processor, said changing information on said plurality of available host processors permitting a swapping of said failed first host processor for said second host processor.

13. The method of claim 12 , wherein said second host processor is upgraded with more current technology than possessed by said failed first host processor.

14. The method of claim 12 , wherein said second host processor is upgraded with more current technology than possessed by said any other of said plurality of available host processors.

15. The method of claim 9 , the root host processor using an initial search queue for said client query for said specific data of said database.

16. The method of claim 9 , the root host processor selecting a search queue of client queries for specific data of said database of another of said plurality of available host processors as an initial search queue of client queries.

17. The method of claim 9 , said database comprising a plurality of DNA profiles, said client query for said specific data of said database being generated in response to one of target, match stringency, mismatch, equivalence, number of alleles and measurement error specifications, and said bringing its search queue of client queries into balance according to a time constant minimizing the time required for performing a client query for DNA profiles satisfying a set of selected database search criteria.

18. In a parallel data processing architecture for search, storage and retrieval of data of a database responsive to queries for specific data of said database, said parallel data processing architecture comprising a) a plurality of host processors comprising at least one root host processor responsive to a client query for said specific data of said database and at least one other host processor; b) a communication system coupling said plurality of host processors, said plurality of host processors being capable of communicating with one another; and c) host processor memory, a method of balancing workload between said plurality of host processors comprising: each of said plurality of host processors maintaining load information of processor capacity and search queue length of said host processor, each of said plurality of host processors further maintaining information of a search queue of client queries at said host processor for said specific data of said database; each of said plurality of host processors broadcasting said load information of its processor capacity and search queue length to at least one other of said plurality of host processors; each of said plurality of host processors reconfiguring information on available host processors responsive to the receipt of said broadcast information of said capacity and load information; and each of said plurality of host processors bringing its search queue of client queries into balance with another of said plurality of host processors according to a time constant responsive to receipt of said broadcast capacity and load information, said balancing including exchanging unprocessed search requests with a recipient host processor responsive to a stochastic selection process to determine the recipient host processor of an exchanged search request between said root host processor and said recipient host processor, said bringing a search queue of client queries into balance further comprising exchanging a block of search requests between said plurality of host processor and adjusting a size of the block of exchanged requests according to relative processing speeds of host processors and inter-processor communications protocol between the host processors.

19. The method of claim 18 , wherein a time required to respond to client queries is minimized by said exchanging of said block of search requests between said plurality of host processors.

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention. Click any code to explore related patents in that topic.

G06F G16B

Patent Metadata

Filing Date

January 2, 2008

Publication Date

January 17, 2012

Want to explore more patents?

Browse 5M+ US patents with plain-English claim translations and AI-generated analysis.

Browse All Patents Try Prior Art Search