10528538

Leveraging SQL with User Defined Aggregation to Efficiently Merge Inverted Indexes Stored as Tables

Published January 7, 2020
Assignee not available in USPTO data we have
Technical Abstract

Patent Claims
22 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 1

Original Legal Text

1. A method comprising: a DBMS storing an index table and a staging table that stores changes to be made to said index table; said DBMS executing an execution plan for executing a query for updating said index table, said query specifying: to group rows from said staging table into groups according to a grouping key, each group of said groups comprising respective one or more rows; to apply an aggregate operator that is defined by a database dictionary of said DBMS to said groups; wherein executing said execution plan comprises executing said aggregate operator, wherein executing said aggregator operator includes: for each group of said groups, updating one or more rows of said index table based on changes recorded in the respective one or more rows comprising said each group.

Plain English Translation

This invention relates to database management systems (DBMS), specifically to updating an index table from a staging table that records pending changes to that index. The DBMS stores both tables. To apply the pending changes, it executes a query that groups the staging table's rows by a grouping key and applies an aggregate operator, defined in the database dictionary, to each group. Unlike an ordinary aggregate that only computes a value, this operator updates one or more rows of the index table as it runs, using the changes recorded in the rows of the group it is processing. Because every staged change for a given key is handled together, the index rows for that key can be updated in one batch rather than once per staged change.
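
The flow in claim 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the tables are plain in-memory structures, and the names `index_table`, `staging_table`, `MergeAggregate`, and `run_merge_query` are hypothetical. The aggregate has the usual step/finalize shape, and its side effect is the index update.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the two tables in claim 1.
# index_table: grouping key -> sorted list of document ids
# staging_table: (key, operation, doc_id) change records
index_table = {"apple": [1, 4], "pear": [2]}
staging_table = [
    ("apple", "add", 7),
    ("pear", "del", 2),
    ("apple", "del", 1),
    ("plum", "add", 3),
]


class MergeAggregate:
    """Aggregate whose finalize step applies a group's changes to the index."""

    def __init__(self, key):
        self.key = key
        self.changes = []

    def step(self, op, doc_id):
        # Called once per staging row in the group.
        self.changes.append((op, doc_id))

    def finalize(self):
        # Apply every staged change for this key to the index entry.
        postings = set(index_table.get(self.key, []))
        for op, doc_id in self.changes:
            if op == "add":
                postings.add(doc_id)
            else:
                postings.discard(doc_id)
        if postings:
            index_table[self.key] = sorted(postings)
        else:
            index_table.pop(self.key, None)
        return len(self.changes)


def run_merge_query(staging):
    """Emulates: SELECT key, merge_agg(op, doc_id) FROM staging GROUP BY key."""
    groups = defaultdict(list)
    for key, op, doc_id in staging:
        groups[key].append((op, doc_id))
    result = {}
    for key, rows in groups.items():
        agg = MergeAggregate(key)
        for op, doc_id in rows:
            agg.step(op, doc_id)
        result[key] = agg.finalize()
    return result


applied = run_merge_query(staging_table)
```

The query's return value (here, a count of changes per key) is incidental; the useful work is the update to `index_table` performed while the aggregate runs.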

Claim 2

Original Legal Text

2. The method of claim 1 , wherein said execution plan specifies multiple work granules, each work granule of said multiple work granules specifying to aggregate said groups of one or more rows; and wherein executing said execution plan includes multiple processes executing said multiple work granules in parallel.

Plain English Translation

Building on claim 1, the execution plan for the update query is divided into multiple work granules. Each work granule specifies aggregating the groups of staging-table rows, and multiple processes execute the granules in parallel. Splitting the merge this way lets the DBMS spread the work of applying staged changes to the index table across several processes instead of running it serially.

Claim 3

Original Legal Text

3. The method of claim 2 wherein work granules for each group are executed by the same process.

Plain English Translation

Building on claim 2, the work granules for any one group are executed by the same process. In other words, all staging-table rows that share a grouping key are aggregated by a single process rather than being split across several. The claim does not say how groups are assigned to processes; it only requires that a group's work stay within one process.
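
Claims 2 and 3 together describe parallel work granules in which each group stays with one process. A minimal sketch, assuming hash partitioning on the grouping key and using threads as stand-ins for DBMS processes (both are assumptions; the claims do not prescribe either), could look like this. The helper names are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def partition_for(key, n_workers):
    # Deterministic hash so that every row with the same key lands in the same granule.
    return zlib.crc32(key.encode("utf-8")) % n_workers


def build_granules(staging, n_workers):
    # One granule (list of rows) per worker; a key never spans two granules.
    granules = [[] for _ in range(n_workers)]
    for row in staging:
        granules[partition_for(row[0], n_workers)].append(row)
    return granules


def run_granule(rows):
    """Aggregate one granule: collect the doc ids of each key it contains."""
    merged = {}
    for key, doc_id in rows:
        merged.setdefault(key, []).append(doc_id)
    return merged


staging = [("apple", 7), ("pear", 2), ("apple", 9), ("plum", 3), ("pear", 5)]
granules = build_granules(staging, n_workers=3)

with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(run_granule, granules))

# No key appears in more than one partial result, so a plain union is safe.
merged = {}
for partial in partials:
    merged.update(partial)
```

Keeping a whole group inside one worker means no two workers ever need to touch the same key's index rows at the same time.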

Claim 4

Original Legal Text

4. The method of claim 1 , wherein executing said aggregate operator further includes committing one or more updates to said index table.

Plain English Translation

Building on claim 1, executing the aggregate operator also includes committing one or more of the updates it makes to the index table. The commit is therefore part of the aggregate operator's own execution rather than a separate step performed after the query finishes. The claim does not specify when or how often within the operator the commits occur.

Claim 5

Original Legal Text

5. The method of claim 1 , wherein executing said aggregate operator further includes deleting rows from said staging table.

Plain English Translation

Building on claim 1, executing the aggregate operator also includes deleting rows from the staging table. The claim does not state which rows are deleted; a natural reading is that staged changes are removed once they have been applied to the index table, so they are not applied again by a later run of the query.
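
Claims 4 and 5 add committing index updates and deleting staged rows. The sketch below uses Python's built-in `sqlite3` module and a user-defined aggregate registered with `create_aggregate`. It is a simplification: here the aggregate only merges each group's document ids, while the index write, the staging delete, and the commit happen in the surrounding transaction, whereas the claims make those steps part of executing the aggregate operator itself. Table and column names are hypothetical.

```python
import sqlite3


class MergePostings:
    """User-defined aggregate: turns one group's doc ids into a sorted posting string."""

    def __init__(self):
        self.doc_ids = []

    def step(self, doc_id):
        self.doc_ids.append(doc_id)

    def finalize(self):
        return ",".join(str(d) for d in sorted(self.doc_ids))


conn = sqlite3.connect(":memory:")
conn.create_aggregate("merge_postings", 1, MergePostings)
conn.execute("CREATE TABLE index_table (token TEXT PRIMARY KEY, postings TEXT)")
conn.execute("CREATE TABLE staging (token TEXT, doc_id INTEGER)")
conn.execute("INSERT INTO index_table VALUES ('apple', '1,4')")
conn.executemany(
    "INSERT INTO staging VALUES (?, ?)",
    [("apple", 7), ("pear", 2), ("apple", 9)],
)
conn.commit()

with conn:  # one transaction: commits on success, rolls back on error
    merged = conn.execute(
        "SELECT token, merge_postings(doc_id) FROM staging GROUP BY token"
    ).fetchall()
    for token, new_postings in merged:
        row = conn.execute(
            "SELECT postings FROM index_table WHERE token = ?", (token,)
        ).fetchone()
        if row is None:
            conn.execute("INSERT INTO index_table VALUES (?, ?)", (token, new_postings))
        else:
            # Append the merged postings to the existing index row.
            conn.execute(
                "UPDATE index_table SET postings = ? WHERE token = ?",
                (row[0] + "," + new_postings, token),
            )
    # Staged changes have been applied, so remove them before committing.
    conn.execute("DELETE FROM staging")

index_rows = dict(conn.execute("SELECT token, postings FROM index_table"))
staged_left = conn.execute("SELECT COUNT(*) FROM staging").fetchone()[0]
```

Doing the delete in the same transaction as the index update means a failure cannot leave changes both applied and still staged.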

Claim 6

Original Legal Text

6. The method of claim 1 , further comprising: the DBMS storing a second staging table; and in response to executing the query, storing changes to said index table in said second staging table.

Plain English Translation

Building on claim 1, the DBMS also stores a second staging table. In response to executing the update query, changes to the index table are stored in this second staging table. One reading is that changes arriving while the first staging table is being merged are recorded in the second one, so that new changes can still be captured during the merge; the claim itself only requires that the second table exists and receives changes when the query is executed.
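
One possible reading of claim 6 is a pair of staging tables that swap roles when a merge starts, so new changes keep landing somewhere while the older ones are merged. The sketch below illustrates that reading only; the claim does not confirm it, and `StagingRouter` and its methods are hypothetical names.

```python
class StagingRouter:
    """Routes incoming changes to whichever staging table is not being merged."""

    def __init__(self):
        self.tables = {"staging_a": [], "staging_b": []}
        self.active = "staging_a"

    def record_change(self, change):
        self.tables[self.active].append(change)

    def begin_merge(self):
        # Redirect new writes to the other table and return the one to merge.
        frozen = self.active
        self.active = "staging_b" if frozen == "staging_a" else "staging_a"
        return frozen

    def finish_merge(self, name):
        # The merged changes are no longer needed.
        self.tables[name].clear()


router = StagingRouter()
router.record_change(("apple", 7))
frozen = router.begin_merge()
router.record_change(("pear", 2))  # arrives while the merge is running
to_merge = list(router.tables[frozen])
router.finish_merge(frozen)
```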

Claim 7

Original Legal Text

7. The method of claim 1 wherein updating one or more rows of said index table comprises appending data to said one or more rows of said index table.

Plain English Translation

Building on claim 1, updating a row of the index table means appending data to that row. Rather than rewriting the row's existing contents, the aggregate operator adds the new data to the end of what the row already holds. The claim does not specify the format of the appended data or the physical structure of the index table.

Claim 8

Original Legal Text

8. The method of claim 7 further comprising: determining said one or more rows has insufficient space to append data; and creating a new row in said index table to store said data.

Plain English Translation

Building on claim 7, the method handles the case where an index-table row cannot hold the data to be appended. The DBMS determines that the row or rows have insufficient space and creates a new row in the index table to store the data. The claim does not define how space is measured or how the new row is linked to the existing ones.
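
The append-then-overflow behavior of claims 7 and 8 can be illustrated with a small sketch. It assumes each index row holds a bounded list of postings and that capacity is counted in entries; both are assumptions, since the claims do not define how space is measured. `ROW_CAPACITY` and `append_postings` are hypothetical names.

```python
ROW_CAPACITY = 4  # hypothetical maximum number of postings per index row


def append_postings(index_rows, key, new_postings, capacity=ROW_CAPACITY):
    """Append postings to the last row for `key`; start a new row when it is full."""
    rows = index_rows.setdefault(key, [])
    for posting in new_postings:
        if not rows or len(rows[-1]) >= capacity:
            rows.append([])  # insufficient space: create a new row
        rows[-1].append(posting)
    return index_rows


index_rows = {"apple": [[1, 4, 6]]}
append_postings(index_rows, "apple", [7, 9])
append_postings(index_rows, "plum", [3])
```

Existing rows are never rewritten; data is only added at the end of the last row or placed in a fresh row.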

Claim 9

Original Legal Text

9. The method of claim 1 wherein executing said aggregate operator comprises, for each group in said groups: iterating through each row of a respective one or more rows; determining one or more changes recorded in said each row; and updating a row of said index table based on the one or more changes recorded in said each row.

Plain English Translation

Building on claim 1, this claim spells out what the aggregate operator does for each group. It iterates through every staging-table row in the group, determines the one or more changes recorded in that row, and updates a row of the index table based on those changes. The operator therefore works row by row within a group, with each staged row driving an update to the index.

Claim 10

Original Legal Text

10. The method of claim 1 wherein the DBMS periodically automatically executes said query.

Plain English Translation

Building on claim 1, the DBMS periodically and automatically executes the update query. Staged changes are therefore merged into the index table on a recurring basis without a user having to start each run. The claim does not specify the interval or what triggers each execution.
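
Periodic automatic execution, as in claim 10, can be sketched as a loop that runs a merge step at a fixed interval. The interval, the run limit, and the names `run_periodically` and `merge_once` are all assumptions made for illustration; a real DBMS would use its own job scheduler. The `sleep` parameter is injectable so the example runs without actually waiting.

```python
import time


def run_periodically(task, interval_seconds, max_runs, sleep=time.sleep):
    """Run `task` max_runs times, pausing interval_seconds between runs."""
    runs = 0
    while runs < max_runs:
        task()
        runs += 1
        if runs < max_runs:
            sleep(interval_seconds)
    return runs


staging = [("apple", 7)]
index = {}


def merge_once():
    # Drain whatever is currently staged into the index.
    while staging:
        key, doc_id = staging.pop()
        index.setdefault(key, []).append(doc_id)


waits = []  # records requested pauses instead of sleeping
total_runs = run_periodically(merge_once, 60, 3, sleep=waits.append)
```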

Claim 11

Original Legal Text

11. The method of claim 1 wherein the index table is an inverted index and said grouping key corresponds to keywords mapped by the inverted index.

Plain English Translation

Building on claim 1, the index table is an inverted index, and the grouping key corresponds to the keywords that the inverted index maps. An inverted index maps each keyword to the documents or data entries in which it appears. Grouping the staging table by keyword means that all pending changes for one keyword fall into the same group, so the aggregate operator can update that keyword's index rows together.
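
To make the keyword-as-grouping-key idea in claim 11 concrete, the sketch below stages one row per distinct keyword of each document and then groups the staged rows by keyword. Whitespace tokenization and the function name `stage_document` are assumptions for illustration only.

```python
from itertools import groupby
from operator import itemgetter


def stage_document(doc_id, text):
    """Emit one staging row (keyword, doc_id) per distinct keyword in the document."""
    return [(word, doc_id) for word in sorted(set(text.lower().split()))]


staging = stage_document(1, "red apple") + stage_document(2, "green apple")

# groupby needs its input sorted by the grouping key (the keyword).
staging.sort(key=itemgetter(0))
groups = {
    keyword: [doc_id for _, doc_id in rows]
    for keyword, rows in groupby(staging, key=itemgetter(0))
}
```

Each entry of `groups` is the set of pending postings that one invocation of the aggregate would receive for that keyword.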

Claim 12

Original Legal Text

12. One or more non-transitory computer-readable media storing instructions, wherein the instructions include: instructions which, when executed by one or more hardware processors, cause a DBMS storing an index table and a staging table that stores changes to be made to said index table; instructions which, when executed by one or more hardware processors, cause said DBMS executing an execution plan for executing a query for updating said index table, said query specifying: to group rows from said staging table into groups according to a grouping key, each group of said groups comprising respective one or more rows; to apply an aggregate operator that is defined by a database dictionary of said DBMS to said groups; wherein executing said execution plan comprises executing said aggregate operator, wherein executing said aggregator operator includes: for each group of said groups, updating one or more rows of said index table based on changes recorded in the respective one or more rows comprising said each group.

Plain English Translation

This claim restates claim 1 as one or more non-transitory computer-readable media storing instructions. When executed by one or more hardware processors, the instructions cause a DBMS to store an index table and a staging table that holds changes to be made to the index table, and to execute an execution plan for a query that updates the index table. The query groups rows of the staging table by a grouping key and applies an aggregate operator, defined in the database dictionary, to the groups. For each group, executing the aggregate operator updates one or more rows of the index table based on the changes recorded in that group's staging rows.

Claim 13

Original Legal Text

13. The one or more non-transitory computer-readable media of claim 12 , wherein said execution plan specifies multiple work granules, each work granule of said multiple work granules specifying to aggregate said groups of one or more rows; and wherein executing said execution plan includes multiple processes executing said multiple work granules in parallel.

Plain English Translation

This is the media counterpart of claim 2, building on claim 12. The execution plan specifies multiple work granules, each of which specifies aggregating the groups of staging-table rows, and multiple processes execute those granules in parallel. The claim does not describe how the granules are generated or assigned to processes.

Claim 14

Original Legal Text

14. The one or more non-transitory computer-readable media of claim 13 wherein work granules for each group are executed by the same process.

Plain English Translation

This is the media counterpart of claim 3, building on claim 13. The work granules for each group are executed by the same process, so all staging-table rows that share a grouping key are aggregated by a single process. The claim does not address group sizing, load monitoring, or redistribution of work.

Claim 15

Original Legal Text

15. The one or more non-transitory computer-readable media of claim 12 , wherein executing said aggregate operator further includes committing one or more updates to said index table.

Plain English Translation

This is the media counterpart of claim 4, building on claim 12. Executing the aggregate operator also includes committing one or more of the updates it makes to the index table, so the commit is part of the operator's own execution. The claim does not cover validation of updates or handling of concurrent modifications.

Claim 16

Original Legal Text

16. The one or more non-transitory computer-readable media of claim 12 , wherein executing said aggregate operator further includes deleting rows from said staging table.

Plain English Translation

This is the media counterpart of claim 5, building on claim 12. Executing the aggregate operator also includes deleting rows from the staging table. The claim does not state which rows are deleted; a natural reading is that staged changes are removed once they have been applied to the index table.

Claim 17

Original Legal Text

17. The one or more non-transitory computer-readable media of claim 12 , the instructions further including: instructions which, when executed by one or more hardware processors, cause the DBMS storing a second staging table; and instructions which, when executed by one or more hardware processors, cause in response to executing the query, storing changes to said index table in said second staging table.

Plain English Translation

This is the media counterpart of claim 6, building on claim 12. The instructions further cause the DBMS to store a second staging table and, in response to executing the update query, to store changes to the index table in that second staging table. The claim does not state that the second table makes updates atomic or supports rollback; it only requires that the table exists and receives changes when the query is executed.

Claim 18

Original Legal Text

18. The one or more non-transitory computer-readable media of claim 12 wherein updating one or more rows of said index table comprises appending data to said one or more rows of said index table.

Plain English Translation

This is the media counterpart of claim 7, building on claim 12. Updating a row of the index table means appending data to that row, that is, adding new data to the end of what the row already holds rather than rewriting its existing contents. The claim does not mention data versioning, point-in-time queries, or later consolidation of appended data.

Claim 19

Original Legal Text

19. The one or more non-transitory computer-readable media of claim 18 further comprising: instructions which, when executed by one or more hardware processors, cause determining said one or more rows has insufficient space to append data; and instructions which, when executed by one or more hardware processors, cause creating a new row in said index table to store said data.

Plain English Translation

This is the media counterpart of claim 8, building on claim 18. The instructions cause the DBMS to determine that the index-table row or rows have insufficient space for the data to be appended, and then to create a new row in the index table to store that data. The claim does not define how space is measured or how the new row relates to the existing ones.

Claim 20

Original Legal Text

20. The one or more non-transitory computer-readable media of claim 12 wherein executing said aggregate operator comprises, for each group in said groups: iterating through each row of a respective one or more rows; determining one or more changes recorded in said each row; and updating a row of said index table based on the one or more changes recorded in said each row.

Plain English Translation

This is the media counterpart of claim 9, building on claim 12. For each group, executing the aggregate operator iterates through every staging-table row in the group, determines the one or more changes recorded in that row, and updates a row of the index table based on those changes. The aggregate here applies staged changes to the index; it is not a numeric aggregate such as a sum, average, or count.

Claim 21

Original Legal Text

21. The one or more non-transitory computer-readable media of claim 12 wherein the DBMS periodically automatically executes said query.

Plain English Translation

This claim covers the computer-readable media of claim 12 in which the DBMS periodically and automatically executes the query. The query in question is the one that updates the index table, so the pending changes are merged into the index on a recurring basis without a user or administrator having to issue the query manually. The claim does not specify the time interval or the scheduling mechanism; it requires only that the execution be both periodic and automatic. Running the merge on a schedule keeps the index table reasonably current with the recorded changes and limits how many changes accumulate between merges.
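Periodic automatic execution can be sketched as a simple loop. The claim does not say how the DBMS schedules the query, so everything here is an assumption: `run_periodically` and `merge_job` are hypothetical names, the interval is arbitrary, and the job is a stand-in for the update query that drains an in-memory staging list.

```python
import time


def run_periodically(job, interval_seconds, runs, sleep=time.sleep):
    """Call job() `runs` times, pausing interval_seconds between calls."""
    for i in range(runs):
        job()
        if i < runs - 1:
            sleep(interval_seconds)


index = {}
staging = [("sql", 1), ("sql", 2)]


def merge_job():
    """Drain the staging list into the index (stand-in for the update query)."""
    while staging:
        keyword, doc_id = staging.pop(0)
        index.setdefault(keyword, set()).add(doc_id)


run_periodically(merge_job, interval_seconds=0.01, runs=2)
```

The `sleep` parameter is injectable so that the loop can be exercised without real delays. A production DBMS would more likely rely on its own job scheduler than on a loop like this one.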

Claim 22

Original Legal Text

22. The one or more non-transitory computer-readable media of claim 12 wherein the index table is an inverted index and said grouping key corresponds to keywords mapped by the inverted index.

Plain English Translation

This claim covers the computer-readable media of claim 12 in which the index table is an inverted index and the grouping key corresponds to the keywords mapped by that inverted index. An inverted index maps keywords to their locations in a dataset, which allows fast keyword searches. Because the grouping key corresponds to those keywords, all recorded changes that concern the same keyword fall into the same group, and the aggregate operator can apply them together to the index rows for that keyword. This arrangement suits large-scale text search, where an inverted index stored as a table must absorb a steady stream of changes.
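Grouping by keyword with a user-defined aggregate can be sketched using Python's built-in `sqlite3` module, whose `create_aggregate` method registers a class with `step` and `finalize` methods. This is a rough analogue only. The table names, column names, and the `merge_postings` aggregate are hypothetical, the postings are stored as a comma-separated string for simplicity, and the aggregate returns the merged list for an `INSERT ... SELECT` to write, whereas in the claims the aggregate operator itself updates the index table.

```python
import sqlite3


class MergePostings:
    """User-defined aggregate: collects the document ids of one keyword group."""

    def __init__(self):
        self.doc_ids = set()

    def step(self, doc_id):
        # Called once per staging row in the current group.
        self.doc_ids.add(doc_id)

    def finalize(self):
        # Called once per group; returns the merged postings as text.
        return ",".join(str(d) for d in sorted(self.doc_ids))


conn = sqlite3.connect(":memory:")
conn.create_aggregate("merge_postings", 1, MergePostings)
conn.execute("CREATE TABLE staging (keyword TEXT, doc_id INTEGER)")
conn.execute("CREATE TABLE index_table (keyword TEXT PRIMARY KEY, postings TEXT)")
conn.executemany(
    "INSERT INTO staging VALUES (?, ?)",
    [("sql", 3), ("index", 7), ("sql", 1)],
)
# The grouping key is the keyword, so each group becomes one index row.
conn.execute(
    "INSERT INTO index_table (keyword, postings) "
    "SELECT keyword, merge_postings(doc_id) FROM staging GROUP BY keyword"
)
result = dict(conn.execute("SELECT keyword, postings FROM index_table"))
```

Here `result` maps each keyword to its merged postings. The sketch starts from an empty index table and does not merge with rows that already exist, which a full implementation would need to handle.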

Patent Metadata

Filing Date

Unknown

Publication Date

January 7, 2020

Inventors

Zhen Hua Liu
Aleksandra Czarlinska
Douglas James McMahon
Asha Makur

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, FAQs, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “LEVERAGING SQL WITH USER DEFINED AGGREGATION TO EFFICIENTLY MERGE INVERTED INDEXES STORED AS TABLES” (10528538). https://patentable.app/patents/10528538

