Various embodiments relate generally to data science and data analysis, computer software and systems, and wired and wireless network communications to provide an interface between repositories of disparate datasets and computing machine-based entities that seek access to the datasets, and, more specifically, to a computing and data storage platform that facilitates consolidation of one or more datasets, whereby a collaborative data layer and associated logic facilitate, for example, efficient access to, and implementation of, collaborative datasets. In some examples, a method may include receiving data representing a query of a consolidated dataset that may include datasets formatted as atomized datasets, analyzing the query to classify portions of the query to form classified query portions, partitioning the query into sub-queries as a function of a classification type for each of the classified query portions, and retrieving data representing a query result from distributed data repositories.
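The abstract's pipeline (classify portions of a query, partition them into sub-queries by classification type, then execute and merge) can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `QueryPortion` type, the `CATALOG` mapping from dataset to repository type, and the in-memory stores stand in for the triple stores and distributed repositories the patent describes. Sub-query results are simply concatenated rather than joined, as a real federated SPARQL engine would do.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical catalog: atomized dataset ID -> repository type that hosts it.
CATALOG = {
    "ds:sales": "internal-triple-store",
    "ds:regions": "internal-triple-store",
    "ds:census": "external-endpoint",
}


@dataclass(frozen=True)
class QueryPortion:
    """One portion of a query: a triple pattern scoped to a single dataset."""
    dataset_id: str
    pattern: tuple  # (subject, predicate, object); None is a wildcard


def classify(portion):
    """Classify a portion by the repository type that hosts its dataset."""
    return CATALOG[portion.dataset_id]


def partition(query):
    """Group classified portions into one sub-query per classification type."""
    sub_queries = defaultdict(list)
    for portion in query:
        sub_queries[classify(portion)].append(portion)
    return dict(sub_queries)


def matches(triple, pattern):
    """True if every non-wildcard position of the pattern equals the triple."""
    return all(p is None or p == t for t, p in zip(triple, pattern))


def execute(sub_query, store):
    """Evaluate a sub-query against one repository (dataset ID -> triples)."""
    return [
        (portion.dataset_id, triple)
        for portion in sub_query
        for triple in store.get(portion.dataset_id, [])
        if matches(triple, portion.pattern)
    ]


def federated_query(query, repositories):
    """Partition the query, run each sub-query on its repository, merge rows."""
    results = []
    for repo_type, sub_query in partition(query).items():
        results.extend(execute(sub_query, repositories[repo_type]))
    return results
```

A query that touches both `ds:sales` and `ds:census` is split into one sub-query per repository type, and each sub-query only sees the triples of the datasets it names.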
1. A method comprising: receiving, into a collaborative dataset consolidation system, data representing a query of a consolidated dataset comprising a plurality of datasets formatted as a plurality of atomized datasets; analyzing the query to classify portions of the query to form classified query portions; partitioning the query into a plurality of sub-queries as a function of a classification type associated with each of the classified query portions, at least one classified query portion being classified by a type of dataset; determining a type of triple store to load based on the type of dataset; executing the query and the sub-queries as a federated query to one or more of the plurality of datasets using one or more triple stores stored in one or more distributed data repositories configured to be stored and accessed by the collaborative dataset consolidation system, each of the one or more triple stores being further formatted to query using a format associated with at least one of the one or more of the plurality of datasets, at least one of the one or more triple stores being the type of triple store; and retrieving responsive data representing a query result returned in response to the query or sub-queries from at least one of the one or more distributed data repositories.
2. The method of claim 1 further comprising: rewriting the query as a plurality of sub-queries.
3. The method of claim 1 wherein receiving the data representing the query comprises: receiving data representing a federated query.
4. The method of claim 3 wherein receiving the data representing the federated query comprises: receiving requests to access different computer-based repositories.
5. The method of claim 1 further comprising: identifying a number of atomized datasets associated with the query.
6. The method of claim 5 further comprising: implementing per-dataset permissions to determine whether a user account is authorized to query each of the atomized datasets.
7. The method of claim 5 further comprising: identifying an identifier associated with the query; determining authorization data between the identifier and each of the atomized datasets in the number of atomized datasets; and granting authorization to apply the query to the number of atomized datasets based on authorization to each of the atomized datasets.
8. The method of claim 1 wherein partitioning the query into the plurality of sub-queries comprises: identifying that the classification type of a sub-query identifies a type of repository; and transmitting the sub-query to a repository identified as the type of repository.
9. The method of claim 8 wherein the type of repository is a function of an architecture influencing operational characteristics of the repository.
10. The method of claim 1 wherein partitioning the query into the plurality of sub-queries comprises: identifying that the classification type of a sub-query identifies an external repository that is external to the collaborative dataset consolidation system; and transmitting the sub-query to the external repository.
11. The method of claim 10 wherein identifying that the classification type identifies the external repository comprises: determining that the query includes a dataset disposed in the external repository and linked to other atomized datasets associated with the query.
12. The method of claim 1 wherein partitioning the query into the plurality of sub-queries comprises: identifying that the classification type of a sub-query identifies a type of query; identifying a repository to receive the sub-query having the type of query; and transmitting the sub-query to the identified repository.
13. The method of claim 12 wherein identifying that the classification type identifies the external repository comprises: determining that the query includes a dataset disposed in the external repository and linked to other atomized datasets associated with the query.
14. The method of claim 1 wherein retrieving the data representing the query result from distributed data repositories comprises: retrieving the data representing the query result from triple store data repositories.
15. The method of claim 1 wherein receiving the data representing the query comprises: receiving the query from a computing device associated with data representing a user account.
16. The method of claim 1 wherein receiving the data representing the query comprises: receiving the query from a computing device associated with an external computing system hosting an external dataset linked to one of the plurality of atomized datasets.
17. The method of claim 16 wherein receiving the query from the external computing system comprises: performing a query-level authorization process.
18. The method of claim 1 wherein retrieving the data representing the query result from distributed data repositories comprises: retrieving the data representing the query result from triple store data repositories.
19. A collaborative dataset consolidation system, comprising: a data store configured to receive, into the collaborative dataset consolidation system, data representing a query of a consolidated dataset comprising a plurality of datasets formatted as a plurality of atomized datasets; and a dataset query engine configured to analyze the query to classify portions of the query to form classified query portions, to partition the query into a plurality of sub-queries as a function of a classification type associated with each of the classified query portions, at least one classified query portion being classified by a type of dataset, to determine a type of triple store to load based on the type of dataset, to execute the query and the sub-queries as a federated query to one or more of the plurality of datasets using one or more triple stores stored in one or more distributed data repositories configured to be stored and accessed by the collaborative dataset consolidation system, each of the one or more triple stores being further formatted to query using a format associated with at least one of the one or more of the plurality of datasets, at least one of the one or more triple stores being the type of triple store, and to retrieve responsive data representing a query result returned in response to the query or sub-queries from at least one of the one or more distributed data repositories.
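Dependent claims 6 and 7 describe a per-dataset authorization check: identify the atomized datasets a query touches, determine authorization data between the requester's identifier and each of those datasets, and grant the query only if every dataset is authorized. The following is a minimal sketch, assuming a hypothetical in-memory `AuthorizationTable` of grants keyed by account identifier; the claims do not say how authorization data is stored or looked up.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorizationTable:
    """Hypothetical store: account identifier -> set of queryable dataset IDs."""
    grants: dict = field(default_factory=dict)

    def is_authorized(self, identifier, dataset_id):
        return dataset_id in self.grants.get(identifier, set())


def datasets_in_query(query_datasets):
    """Identify the distinct atomized datasets the query references."""
    return sorted(set(query_datasets))


def authorize_query(identifier, query_datasets, table):
    """Grant the query only if the identifier may access every dataset.

    Returns (granted, denied_datasets) so a caller can report which
    datasets blocked the query.
    """
    datasets = datasets_in_query(query_datasets)
    denied = [ds for ds in datasets if not table.is_authorized(identifier, ds)]
    return (not denied, denied)
```

Returning the denied datasets alongside the decision is a design choice for this sketch; the claims only require that authorization to apply the query be based on authorization to each dataset.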
June 19, 2016
June 18, 2019