Legal claims defining the scope of protection, as filed with the USPTO.
1. An automatic information integration flow optimizer apparatus comprising: an input/output port configured to connect the information integration flow optimizer to extract-transform-load (ETL) tools, and to receive a tool-specific input file; and a processor configured to execute computer-readable instructions, the computer-readable instructions comprising: a parser unit configured to parse the tool-specific input file into semantics, and to create a tool-agnostic input file containing rich semantics of at least one of datasets, implementations, schema, operators, database management systems, or ETL tools; a converter unit configured to transform the tool-agnostic input file into an input directed acyclic graph (DAG); and a quality objective (QoX) driven optimizer unit configured to apply one or more heuristic algorithms to the tool-agnostic input DAG to develop an optimum information integration flow design based on the rich semantics.
2. The apparatus of claim 1, wherein the QoX-driven optimizer unit is configured to provide an output DAG representing the optimum information integration flow design.
3. The apparatus of claim 2, wherein the converter unit and parser unit are configured to transform the output DAG to a tool-specific output file.
4. The apparatus of claim 1, wherein the optimum information integration flow design is optimized for multiple goals.
5. The apparatus of claim 1, wherein the QoX-driven optimizer unit is configured to designate portions of the optimum information integration flow design to be respectively performed in one or more of the ETL tools.
6. The apparatus of claim 1, wherein the QoX-driven optimizer unit is configured to examine functionality properties of an information integration flow operator provided by the tool-agnostic input file.
7. The apparatus of claim 1, wherein the QoX-driven optimizer unit is configured to choose among one or more specific execution instances related to a dataset to be processed by the physical information integration flow.
8. The apparatus of claim 1, wherein the QoX-driven optimizer unit is configured to partition a dataset to be processed by the physical information integration flow based on schema properties.
9. The apparatus of claim 1, wherein the QoX-driven optimizer unit is configured to optimize the physical information integration flow based on implementation properties presented in the tool-agnostic input file.
10. A method for automatically optimizing an information integration flow, the method comprising the steps of: receiving a tool-specific input file representing a physical information integration flow; parsing the tool-specific input file to identify semantics of the physical information integration flow; creating a tool-agnostic input file containing rich semantics of at least one of datasets, implementations, schema, operators, database management systems, or ETL tools; transforming the tool-agnostic input file into an input directed acyclic graph (DAG); providing the input DAG to a quality objective (QoX) driven optimizer unit; and applying one or more heuristic algorithms to the input DAG to develop an optimum information integration flow design based on the rich semantics.
11. The method of claim 10, further including the steps of: searching possible information integration design space guided by forecasting metrics of quality objectives identified for a proposed information integration flow design; and selecting the optimized information integration flow design based on evaluating the metrics.
12. The method of claim 11, further including the steps of: identifying dependencies and relationships among the metrics; and evaluating tradeoffs among the metrics based on the dependencies and relationships to determine the optimized information integration flow design, wherein the optimized information integration flow design is optimized for multiple goals.
13. The method of claim 10, further including the step of representing the optimized information integration flow design as an output DAG.
14. The method of claim 13, further including the steps of: converting and parsing the output DAG into a tool-specific output file; and providing portions of the tool-specific output file to ETL tools as designated in the optimum information integration flow design.
15. The method of claim 10, further including the step of examining functionality properties of an information integration operator provided by the tool-agnostic input file.
16. The method of claim 10, further including the step of choosing among one or more specific execution instances related to a dataset to be processed by the physical information integration flow.
17. The method of claim 10, further including the step of partitioning a dataset to be processed by the physical information integration flow based on schema properties.
18. The method of claim 10, further including the step of optimizing the physical information integration flow based on implementation properties presented in the tool-agnostic input file.
19. A non-transitory, tangible computer readable medium comprising: an executable computer program code configured to instruct a system to automatically optimize an information integration flow, the executable computer program code comprising the steps of: receiving a tool-specific input file representing a physical information integration flow; parsing the tool-specific input file to identify semantics of the physical information integration flow; creating a tool-agnostic input file containing rich semantics of at least one of datasets, implementations, schema, operators, database management systems, or ETL tools; transforming the tool-agnostic input file into an input directed acyclic graph (DAG); providing the input DAG to a quality objective (QoX) driven optimizer unit; and applying one or more heuristic algorithms to the input DAG to develop an optimum information integration flow design based on the rich semantics.
20. The non-transitory, tangible computer readable medium of claim 19, wherein the executable computer program code further comprises the steps of: choosing among one or more specific execution instances related to a dataset to be processed by the physical information integration flow; partitioning a dataset to be processed by the physical information integration flow based on schema properties; and optimizing the physical information integration flow based on implementation properties presented in the tool-agnostic input file.
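
The pipeline recited in claims 1, 10, and 19 (parse a tool-specific file, build a tool-agnostic representation, convert it to a DAG, then apply a heuristic) can be illustrated with a minimal sketch. Nothing below comes from the patent itself: the pipe-delimited input format, the `Operator` fields, the row-based cost model, and the "push filters ahead of transforms" heuristic are all assumptions chosen for illustration, and the swap is assumed to be semantically valid (a real optimizer would first check the operators' schema and functionality properties).

```python
from dataclasses import dataclass, field, replace
from graphlib import TopologicalSorter


@dataclass
class Operator:
    # Hypothetical tool-agnostic description of one flow operator.
    name: str
    kind: str                    # e.g. "extract", "transform", "filter", "load"
    selectivity: float = 1.0     # fraction of input rows passed downstream
    cost_per_row: float = 1.0    # assumed processing cost per input row
    inputs: list[str] = field(default_factory=list)


def parse_tool_specific(text: str) -> list[Operator]:
    """Parse an assumed format: name|kind|selectivity|cost_per_row|inputs."""
    ops = []
    for line in text.strip().splitlines():
        name, kind, sel, cost, inputs = [p.strip() for p in line.split("|")]
        ops.append(Operator(name, kind, float(sel), float(cost),
                            [i for i in inputs.split(",") if i]))
    return ops


def to_dag(ops: list[Operator]) -> dict[str, Operator]:
    """Index operators by name; TopologicalSorter raises CycleError on a cycle."""
    dag = {op.name: op for op in ops}
    list(TopologicalSorter({n: op.inputs for n, op in dag.items()}).static_order())
    return dag


def estimated_cost(dag: dict[str, Operator], source_rows: float) -> float:
    """Sum rows_in * cost_per_row over all operators, in dependency order."""
    rows_out, total = {}, 0.0
    order = TopologicalSorter({n: op.inputs for n, op in dag.items()}).static_order()
    for name in order:
        op = dag[name]
        rows_in = sum(rows_out[i] for i in op.inputs) if op.inputs else source_rows
        total += rows_in * op.cost_per_row
        rows_out[name] = rows_in * op.selectivity
    return total


def swap(dag: dict[str, Operator], up: str, down: str) -> dict[str, Operator]:
    """Return a copy of the DAG in which `down` runs before `up`."""
    new = {n: replace(op, inputs=list(op.inputs)) for n, op in dag.items()}
    new[down].inputs = list(dag[up].inputs)
    new[up].inputs = [down]
    for n, op in new.items():
        if n not in (up, down):
            op.inputs = [up if i == down else i for i in op.inputs]
    return new


def push_filters_early(dag: dict[str, Operator], source_rows: float) -> dict[str, Operator]:
    """Greedy heuristic: move a filter ahead of its upstream transform if cheaper."""
    best, improved = dag, True
    while improved:
        improved = False
        for f in list(best.values()):
            if f.kind != "filter" or len(f.inputs) != 1:
                continue
            t = best[f.inputs[0]]
            consumers = [o for o in best.values() if t.name in o.inputs]
            if t.kind != "transform" or len(t.inputs) != 1 or len(consumers) != 1:
                continue
            candidate = swap(best, t.name, f.name)
            if estimated_cost(candidate, source_rows) < estimated_cost(best, source_rows):
                best, improved = candidate, True
                break
    return best


# Usage: a four-step flow in the assumed input format.
FLOW = """
extract|extract|1.0|1.0|
transform|transform|1.0|5.0|extract
filter|filter|0.25|1.0|transform
load|load|1.0|2.0|filter
"""
dag = to_dag(parse_tool_specific(FLOW))
optimized = push_filters_early(dag, source_rows=1000)
```

In this sketch the single cost figure stands in for one quality objective. Optimizing for multiple goals, as in claims 4 and 12, would require scoring each candidate DAG on several metrics and weighing them against one another, which is not shown here.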
September 17, 2013