Distributed Query Execution and Aggregation

Published: January 14, 2025

Assignee: Not available in the USPTO data on hand

Inventors: Luke A. Higgins, Robert R. Bruno

Patent Claims

17 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A system for distributing a query and aggregating results of the distributed query, comprising: one or more data stores; a distribution server comprising one or more processors; a first one or more querying computing devices, each computing device being associated with at least one data store of the one or more data stores; a second one or more aggregating computing devices; and non-transitory memory comprising instructions that, when executed by the one or more processors of the distribution server, cause those processors to: receive a dynamically generated script representing a query of files in one or more data stores; transmit the dynamically generated script to each computing device of a first one or more querying computing devices; direct each computing device of the first one or more querying computing devices to perform the query on a subset of files in the one or more data stores; receive from each computing device a subset of results based on execution of the dynamically generated script on the subset of files, wherein each subset of results represents a set of records that satisfy a query that was used to dynamically generate the script or that contain statistical summary information on records based on a query that was used to dynamically generate the script; direct each computing device of a second one or more aggregating computing devices to aggregate the subsets of results by a key value, wherein a hash of the key value is used to perform partitioning the subsets of results by a key value to omit at least one timestamp from the first one or more querying computing devices; and transmit aggregated results into at least one bucket wherein the at least one bucket was generated based on a predetermined time schedule to facilitate time-based queries and wherein each of the at least one bucket includes at least one sub-bucket that receives files of a specified attribute and wherein the at least one bucket is made accessible to a user who wrote the query or another user.

2. The system of claim 1, wherein the subsets of files are assigned to the first one or more querying computing devices in a round robin fashion.
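
For illustration only, the round robin assignment recited in claim 2 could be sketched as follows. The function name, the file paths, and the worker identifiers are hypothetical; the claim does not specify how files or devices are named or ordered.

```python
from typing import Dict, List


def assign_round_robin(files: List[str], workers: List[str]) -> Dict[str, List[str]]:
    """Deal files to querying workers one at a time, cycling through the workers.

    Assumption: `files` and `workers` are plain identifiers; the claim does not
    say how either is represented.
    """
    if not workers:
        raise ValueError("at least one querying worker is required")
    assignment: Dict[str, List[str]] = {worker: [] for worker in workers}
    for index, path in enumerate(files):
        # File i goes to worker i mod N, so subset sizes differ by at most one.
        assignment[workers[index % len(workers)]].append(path)
    return assignment


if __name__ == "__main__":
    print(assign_round_robin(["a.log", "b.log", "c.log"], ["w1", "w2"]))
```

Round robin keeps the subsets roughly equal in file count, though not necessarily in bytes or query cost.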

3. The system of claim 1, wherein a value of a hash of the key value is used to determine which computing device of the second one or more aggregating computing devices is directed to aggregate results related to that key value.
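
One possible reading of claim 3 is that the hash of a key value, taken modulo the number of aggregating devices, selects the device. The sketch below assumes that reading; the choice of SHA-256 and the function name are illustrative assumptions, not details from the claims.

```python
import hashlib


def aggregator_for_key(key: str, num_aggregators: int) -> int:
    """Return the index of the aggregating device responsible for `key`.

    Assumption: a stable hash (SHA-256 here) reduced modulo the device count.
    Python's built-in hash() is avoided because string hashes are salted per
    process, so separate machines would disagree on the result.
    """
    if num_aggregators < 1:
        raise ValueError("at least one aggregating device is required")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce to a device index.
    return int.from_bytes(digest[:8], "big") % num_aggregators


if __name__ == "__main__":
    for key in ["10.0.0.1", "10.0.0.2", "10.0.0.3"]:
        print(key, "->", aggregator_for_key(key, 4))
```

Because every querying device computes the same index for the same key, all partial results for that key arrive at a single aggregator.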

4. The system of claim 1, wherein the statistical summary information is a count of numbers of files that satisfy the query, grouped by value of a provided key field.

5. The system of claim 1, wherein the statistical summary information is a sum of values, count of values, standard deviation of values, or other statistical property of values in a provided key field, grouped by value of another key field.
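
The grouped statistics in claims 4 and 5 can be computed in two stages if each querying device emits mergeable partial results. The following is a minimal sketch under that assumption: each partial is a hypothetical (count, sum, sum of squares) tuple per key, and the standard deviation is the population form. The claims do not prescribe this representation.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Partial = Tuple[int, float, float]  # (count, sum, sum of squares)


def summarize(records: Iterable[Tuple[str, float]]) -> Dict[str, Partial]:
    """Per-device step: fold (key, value) records into one partial per key."""
    acc: Dict[str, Partial] = defaultdict(lambda: (0, 0.0, 0.0))
    for key, value in records:
        n, s, sq = acc[key]
        acc[key] = (n + 1, s + value, sq + value * value)
    return dict(acc)


def merge(partials: Iterable[Dict[str, Partial]]) -> Dict[str, Partial]:
    """Aggregator step: add partials from all devices component-wise, by key."""
    merged: Dict[str, Partial] = defaultdict(lambda: (0, 0.0, 0.0))
    for partial in partials:
        for key, (n, s, sq) in partial.items():
            mn, ms, msq = merged[key]
            merged[key] = (mn + n, ms + s, msq + sq)
    return dict(merged)


def finalize(partial: Partial) -> Dict[str, float]:
    """Turn a merged partial into count, sum, and population standard deviation."""
    n, s, sq = partial
    mean = s / n
    # Clamp at zero to absorb small negative values from floating-point rounding.
    variance = max(sq / n - mean * mean, 0.0)
    return {"count": n, "sum": s, "stddev": math.sqrt(variance)}


if __name__ == "__main__":
    device_1 = summarize([("a", 1.0), ("b", 2.0)])
    device_2 = summarize([("a", 3.0)])
    print({k: finalize(p) for k, p in merge([device_1, device_2]).items()})
```

The sum-of-squares form is simple to merge but can lose precision on large, tightly clustered values; a pairwise-merged mean and variance would be a more stable alternative.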

6. The system of claim 1, wherein a k-means clustering is performed during aggregation of the subsets of results to generate a classifying field in output of the query.
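
Claim 6 does not say which values are clustered or how centroids are initialized. As a rough sketch, assuming a single numeric value per aggregated record and caller-supplied starting centroids (both assumptions), Lloyd's algorithm for k-means could produce the classifying field like this:

```python
from typing import List, Sequence, Tuple


def nearest(value: float, centers: Sequence[float]) -> int:
    """Index of the center closest to `value` (ties go to the lower index)."""
    return min(range(len(centers)), key=lambda c: abs(value - centers[c]))


def kmeans_1d(
    values: Sequence[float], centroids: Sequence[float], iterations: int = 10
) -> Tuple[List[float], List[int]]:
    """Run a fixed number of Lloyd iterations on one-dimensional data.

    Returns the final centers and a cluster label per input value; the labels
    play the role of the classifying field.
    """
    centers = list(centroids)
    for _ in range(iterations):
        labels = [nearest(v, centers) for v in values]
        for c in range(len(centers)):
            members = [v for v, label in zip(values, labels) if label == c]
            if members:  # leave an empty cluster's center where it is
                centers[c] = sum(members) / len(members)
    # Final assignment against the updated centers.
    return centers, [nearest(v, centers) for v in values]


if __name__ == "__main__":
    print(kmeans_1d([1.0, 2.0, 10.0, 11.0], [0.0, 5.0]))
```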

7. The system of claim 1, wherein the query comprises one or more enrichment fields specifying replacement of values from a key field in the query with a replacement value from an external data source.
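
The enrichment in claim 7 amounts to a lookup and replace on one field. In the sketch below the external data source is modeled as an in-memory dictionary, and the field and record names are hypothetical; a real deployment might consult a database or a service instead.

```python
from typing import Dict, List


def enrich(records: List[Dict[str, str]], field: str, lookup: Dict[str, str]) -> List[Dict[str, str]]:
    """Replace each record's `field` value with its entry in `lookup`.

    Values with no entry in the external source are left unchanged, and
    records without the field pass through untouched. Inputs are not mutated.
    """
    enriched = []
    for record in records:
        updated = dict(record)
        if field in updated:
            updated[field] = lookup.get(updated[field], updated[field])
        enriched.append(updated)
    return enriched


if __name__ == "__main__":
    rows = [{"src_ip": "10.0.0.1"}, {"src_ip": "10.0.0.9"}]
    print(enrich(rows, "src_ip", {"10.0.0.1": "build-server"}))
```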

8. The system of claim 1, wherein the aggregated results are stored in the storage made available for an end user to download and process in any order at the end user's convenience.

9. The system of claim 1, wherein the specified attribute is at least one of: a file type, a log type, or a file source.
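
The bucket and sub-bucket layout in claims 1 and 9 might be pictured as a time-based prefix followed by an attribute-based prefix. This sketch assumes a daily schedule in UTC and a slash-separated path; the claims only require a predetermined time schedule and do not fix a naming scheme.

```python
from datetime import datetime, timezone


def bucket_path(timestamp: datetime, attribute: str, filename: str) -> str:
    """Build a '<day bucket>/<attribute sub-bucket>/<file>' path.

    Assumptions: one bucket per UTC day, and `attribute` is a file type,
    log type, or file source as listed in claim 9. `timestamp` should be
    timezone-aware; naive datetimes are interpreted as local time by astimezone().
    """
    day = timestamp.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return f"{day}/{attribute}/{filename}"


if __name__ == "__main__":
    ts = datetime(2025, 1, 14, 23, 30, tzinfo=timezone.utc)
    print(bucket_path(ts, "firewall", "part-0.csv"))
```

Grouping output by day lets a time-bounded query read only the buckets that overlap its range.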

10. A computer-implemented method for distributing a query and aggregating results of the distributed query, comprising: receiving a dynamically generated script representing a query of files in one or more data stores; transmitting the dynamically generated script to each computing device of a first one or more querying computing devices; directing each computing device of the first one or more querying computing devices to perform the query on a subset of files in the one or more data stores; receiving from each computing device a subset of results based on execution of the dynamically generated script on the subset of files, wherein each subset of results represents a set of records that satisfy a query that was used to dynamically generate the script or that contain statistical summary information on records based on a query that was used to dynamically generate the script; directing each computing device of a second one or more aggregating computing devices to aggregate the subsets of results by a key value, wherein a hash of the key value is used to perform partitioning the subsets of results by a key value to omit at least one timestamp from the first one or more querying computing devices; and transmitting aggregated results into at least one bucket wherein the at least one bucket was generated based on a predetermined time schedule to facilitate time-based queries and wherein each of the at least one bucket includes at least one sub-bucket that receives files of a specified attribute and wherein the at least one bucket is made accessible to a user who wrote the query or another user.

11. The computer-implemented method of claim 10, wherein the subsets of files are assigned to the first one or more querying computing devices in a round robin fashion.

12. The computer-implemented method of claim 10, wherein a value of a hash of the key value is used to determine which computing device of the second one or more aggregating computing devices is directed to aggregate results related to that key value.

13. The computer-implemented method of claim 10, wherein the statistical summary information is a count of numbers of files that satisfy the query, grouped by value of a provided key field.

14. The computer-implemented method of claim 10, wherein the statistical summary information is a sum of values, count of values, standard deviation of values, or other statistical property of values in a provided key field, grouped by value of another key field.

15. The computer-implemented method of claim 10, wherein a k-means clustering is performed during aggregation of the subsets of results to generate a classifying field in output of the query.

16. The computer-implemented method of claim 10, wherein the query comprises one or more enrichment fields specifying replacement of values from a key field in the query with a replacement value from an external data source.

17. The computer-implemented method of claim 10, wherein the aggregated results are stored in the storage made available for an end user to download and process in any order at the end user's convenience.
