10713249

Managing Snapshots and Application State in Micro-Batch Based Event Processing Systems

Published: July 14, 2020
Assignee: not available in USPTO data we have
Technical Abstract

Patent Claims
17 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 1

Original Legal Text

1. A method for managing snapshots created from a Continuous Query Language (CQL) engine, comprising: receiving, by a computing device, a continuous query; applying, by the computing device, a Directed Acrylic Graph (DAG) transformation to the continuous query to generate a query plan for the continuous query, wherein the query plan is an ordered set of steps used to access data for processing of the continuous query; applying, by the computing device, a CQL transformation to the query plan to generate a transformed query plan; receiving, by a computing device, a micro-batch stream of input events related to an application; processing, by the computing device, the input events using the CQL engine to generate a set of output events related to the application, wherein the processing comprises: performing, by the CQL engine, incremental computation on each of the input events of the micro-batch stream for the continuous query based at least in part on the transformed query plan; and creating, by the CQL engine, output events for each of the input events of the micro-batch stream, wherein the set of output events comprise the output events for each of the input events of the micro-batch stream; generating, by the computing device and using a snapshot management algorithm implemented by the CQL engine, a snapshot of a current state of a system based at least in part on the set of output events related to the application; generating, by the computing device, a first directory structure to access snapshot information associated with the snapshot of the current state of the system; generating, by the computing device, a second directory structure to generate a list of snapshots associated with the current state of the system; and determining, by the computing device, based at least in part on the snapshot management algorithm, a process to get, add, or clean the list of snapshots associated with the current state of the system.

Plain English translation pending...
Claim 2

Original Legal Text

2. The method of claim 1 , wherein the micro-batch stream is a continuous stream of data discretize into sub-second micro-batches.

Plain English Translation

A method for processing data streams involves handling a continuous stream of data that is divided into sub-second micro-batches. The data stream is segmented into these micro-batches, each containing a small subset of the continuous data flow, with each micro-batch processed within a fraction of a second. This approach enables real-time or near-real-time data processing, allowing for rapid analysis, decision-making, or further transmission of the data. The method is particularly useful in applications requiring low-latency processing, such as financial transactions, sensor data monitoring, or real-time analytics. By discretizing the continuous data into micro-batches, the system can efficiently manage and process the data in smaller, more manageable chunks while maintaining the integrity and continuity of the original stream. This technique helps mitigate delays and ensures timely processing, which is critical for time-sensitive operations. The method may also include additional steps such as filtering, aggregation, or transformation of the micro-batches to prepare the data for downstream applications or storage. The use of sub-second micro-batches allows for fine-grained control over data processing, improving accuracy and responsiveness in dynamic environments.
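
As a minimal sketch of the discretization described above, the following groups timestamped events into fixed-width micro-batches. The function name `discretize`, the `(timestamp_ms, payload)` event shape, and the 500 ms width are assumptions for illustration; the claim only requires that the batches be sub-second.

```python
from collections import defaultdict


def discretize(events, batch_ms=500):
    """Group (timestamp_ms, payload) events into fixed-width micro-batches.

    Returns a list of (batch_start_ms, [payloads]) ordered by batch start.
    The event shape and the 500 ms default are illustrative assumptions.
    """
    batches = defaultdict(list)
    for ts_ms, payload in events:
        # Integer division assigns each event to the window that contains it.
        batch_start = (ts_ms // batch_ms) * batch_ms
        batches[batch_start].append(payload)
    return sorted(batches.items())


events = [(0, "a"), (120, "b"), (499, "c"), (500, "d"), (1730, "e")]
micro_batches = discretize(events, batch_ms=500)
```

Here the events at 0, 120, and 499 ms fall into the first batch, the event at 500 ms opens the second, and empty windows produce no batch at all.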

Claim 3

Original Legal Text

3. The method of claim 1 , further comprising storing, by the computing device, the set of output events related to the application in an output queue; and transmitting, by the computing device, the output events in the output queue when all of the input events have been processed.

Plain English Translation

This claim adds output buffering to the method of claim 1. The output events that the CQL engine generates for the application are stored in an output queue instead of being sent as each one is produced. The queued output events are transmitted only when all of the input events have been processed. Holding the output until processing is complete presumably prevents partial results from being sent downstream. The claim does not specify the order in which queued events are transmitted or how the queue is implemented.
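
The buffering behavior can be sketched as follows. The names `process_micro_batch`, `handle`, and `transmit` are hypothetical, and the first-in, first-out drain order is an assumption of this sketch, since the claim says only that the queued events are transmitted once all input events have been processed.

```python
from collections import deque


def process_micro_batch(input_events, handle, transmit):
    """Process every input event, buffering outputs until the batch is done.

    `handle` maps one input event to a list of output events and `transmit`
    sends one output event downstream; both are hypothetical callbacks.
    """
    output_queue = deque()
    for event in input_events:
        # Outputs are queued, not sent, while inputs are still being processed.
        output_queue.extend(handle(event))
    # All inputs have been processed: drain the queue in FIFO order.
    sent = 0
    while output_queue:
        transmit(output_queue.popleft())
        sent += 1
    return sent


received = []
count = process_micro_batch(
    [1, 2, 3],
    handle=lambda e: [e * 10] if e != 2 else [],
    transmit=received.append,
)
```

Nothing reaches `received` until the loop over the inputs has finished, which is the property the claim describes.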

Claim 4

Original Legal Text

4. The method of claim 3 , wherein the micro-batch stream comprises micro-batches of data or Resilient Distributed Datasets (RDDs).

Plain English Translation

A method for processing data streams involves handling micro-batches of data or Resilient Distributed Datasets (RDDs) in a distributed computing environment. The method addresses the challenge of efficiently processing large-scale data streams by breaking them into smaller, manageable units called micro-batches. These micro-batches or RDDs are processed in parallel across multiple nodes in a distributed system, ensuring fault tolerance and scalability. The use of micro-batches allows for incremental processing, reducing latency and improving throughput compared to batch processing. RDDs, which are immutable distributed collections of objects, enable efficient fault recovery by tracking lineage information, allowing lost partitions to be recomputed rather than reprocessed from scratch. This approach is particularly useful in big data applications where real-time or near-real-time processing is required, such as in stream analytics, machine learning, and large-scale data transformations. The method ensures that data is processed reliably and efficiently, even in the presence of node failures or network issues.

Claim 5

Original Legal Text

5. The method of claim 4 , wherein the processing each of the input events comprises performing a computation on each of the input based at least in part on the transformed query plan.

Plain English Translation

This claim narrows the method of claim 4, which in turn builds on claims 3 and 1. It specifies that processing each input event comprises performing a computation on that event based at least in part on the transformed query plan, that is, the plan produced by applying the DAG transformation and then the CQL transformation to the continuous query. The transformed plan determines which computation the CQL engine performs on each event in the micro-batch stream. The claim adds no requirement that the plan be adjusted at runtime in response to system load, resource availability, or execution feedback.
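
Claim 1 describes the per-event work as incremental computation that yields output events for each input event. A minimal sketch of that idea, assuming a running per-key sum as the query (the function `process_batch`, the `(key, value)` event shape, and the choice of aggregate are all illustrative), carries state from one micro-batch to the next instead of recomputing from scratch:

```python
def process_batch(state, batch):
    """Update running per-key totals and emit one output per input event.

    `state` is a dict that persists across micro-batches; the per-key sum
    stands in for whatever the transformed query plan would compute.
    """
    outputs = []
    for key, value in batch:
        state[key] = state.get(key, 0) + value  # incremental update
        outputs.append((key, state[key]))
    return outputs


state = {}
out1 = process_batch(state, [("a", 1), ("b", 5), ("a", 2)])
out2 = process_batch(state, [("a", 4)])
```

The second call sees the totals left behind by the first, so the output for key `a` continues from 3 rather than restarting at 0.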

Claim 6

Original Legal Text

6. The method of claim 5 , wherein the continuous query includes pattern matching.

Plain English Translation

This claim narrows the method of claim 5 by requiring that the continuous query include pattern matching. The query that is transformed into a query plan and executed by the CQL engine over the micro-batch stream therefore looks for a pattern in the input events. The claim does not specify the matching technique, the pattern syntax, or what happens when a match is found. Pattern matching over event streams is commonly used in applications such as fraud detection, network monitoring, and real-time analytics.
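
The claim leaves the matching technique open, so the following is only one possible illustration. It assumes a pattern defined as an ordered list of event types, matched as a subsequence (other events may occur in between) with non-overlapping matches, and it keeps its position between micro-batches so a match can span a batch boundary. The class name `SequenceMatcher` and the event names are hypothetical.

```python
class SequenceMatcher:
    """Detect an ordered sequence of event types across micro-batches.

    Matching is subsequence-style: unrelated events between pattern
    elements are ignored. This semantics is an assumption of the sketch.
    """

    def __init__(self, pattern):
        if not pattern:
            raise ValueError("pattern must contain at least one event type")
        self.pattern = list(pattern)
        self.position = 0  # number of pattern elements matched so far

    def feed(self, batch):
        """Consume one micro-batch; return the number of completed matches."""
        matches = 0
        for event_type in batch:
            if event_type == self.pattern[self.position]:
                self.position += 1
                if self.position == len(self.pattern):
                    matches += 1
                    self.position = 0  # start looking for the next match
        return matches


matcher = SequenceMatcher(["open", "error", "close"])
first = matcher.feed(["open", "read", "error"])
second = matcher.feed(["read", "close", "open"])
```

The match begins in the first batch and completes in the second, which is why the matcher has to retain `position` between calls.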

Claim 7

Original Legal Text

7. A system, comprising: a memory configured to store computer-executable instructions; and a processor configured to access the memory and execute the computer-executable instructions to: receive a continuous query; apply a Directed Acrylic Graph (DAG) transformation to the continuous query to generate a query plan for the continuous query, wherein the query plan is an ordered set of steps used to access data for processing of the continuous query; apply a Continuous Query Language (CQL) transformation to the query plan to generate a transformed query plan such that a CQL engine can execute the continuous query using the transformed query plan; receive a micro-batch stream of input events related to an application; process the input events using the CQL engine to generate a set of output events related to the application, wherein the processing comprises: performing, by the CQL engine, incremental computation on each of the input events of the micro-batch stream for the continuous query based at least in part on the transformed query plan; and creating, by the CQL engine, output events for each of the input events of the micro-batch stream, wherein the set of output events comprise the output events for each of the input events of the micro-batch stream; generate, using a snapshot management algorithm implemented by the CQL engine, a snapshot of a current state of a system based at least in part on the set of output events related to the application; generate a first directory structure to access snapshot information associated with the snapshot of the current state of the system; generate a second directory structure to generate a list of snapshots associated with the current state of the system; and determine based at least in part on the snapshot management algorithm, a process to get, add, or clean the list of snapshots associated with the current state of the system.

Plain English Translation

This system processes continuous queries in real-time data streams using a Directed Acyclic Graph (DAG) transformation and Continuous Query Language (CQL) engine. The system addresses the challenge of efficiently executing continuous queries on streaming data by converting a received continuous query into an optimized query plan. The DAG transformation generates an ordered set of steps to access and process data, which is then converted into a CQL-compatible query plan. A CQL engine executes this plan incrementally on micro-batch streams of input events, producing output events for each input event. The system also manages state snapshots, generating directories to store and retrieve snapshot information. A snapshot management algorithm determines operations like adding, retrieving, or cleaning snapshots, ensuring efficient state management. The system is designed for applications requiring real-time data processing, such as event-driven systems or streaming analytics, where maintaining query performance and state consistency is critical. The incremental computation and snapshot management optimize resource usage while ensuring accurate state tracking.
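
The get, add, and clean operations over two directory structures might look like the sketch below. The layout (a `snapshots` directory holding snapshot contents and an `index` directory holding one marker file per snapshot), the JSON encoding, the batch-id naming, and the keep-the-newest retention rule are all assumptions; the claim does not describe what the directories contain or how cleaning decides what to remove.

```python
import json
import tempfile
from pathlib import Path


class SnapshotStore:
    """Toy snapshot store; the two-directory layout is an assumption."""

    def __init__(self, root):
        self.data_dir = Path(root) / "snapshots"  # snapshot contents
        self.index_dir = Path(root) / "index"     # one marker per snapshot
        self.data_dir.mkdir(parents=True, exist_ok=True)
        self.index_dir.mkdir(parents=True, exist_ok=True)

    def add(self, batch_id, state):
        name = f"{batch_id:010d}"
        (self.data_dir / f"{name}.json").write_text(json.dumps(state))
        (self.index_dir / name).touch()  # marker written after the data

    def list_ids(self):
        return sorted(int(p.name) for p in self.index_dir.iterdir())

    def get(self, batch_id):
        path = self.data_dir / f"{batch_id:010d}.json"
        return json.loads(path.read_text())

    def clean(self, keep=2):
        """Delete all but the newest `keep` snapshots; return removed ids."""
        ids = self.list_ids()
        stale = ids[:-keep] if keep > 0 else ids
        for batch_id in stale:
            name = f"{batch_id:010d}"
            (self.index_dir / name).unlink()
            (self.data_dir / f"{name}.json").unlink()
        return stale


with tempfile.TemporaryDirectory() as root:
    store = SnapshotStore(root)
    for batch_id in (1, 2, 3):
        store.add(batch_id, {"count": batch_id * 100})
    removed = store.clean(keep=2)
    remaining = store.list_ids()
    latest = store.get(remaining[-1])
```

Keeping the listing separate from the contents lets the store enumerate snapshots without opening any of them.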

Claim 8

Original Legal Text

8. The system of claim 7 , wherein the micro-batch stream is a continuous stream of data discretize into sub-second micro-batches.

Plain English Translation

The system processes data in a continuous stream, where the data is divided into micro-batches that are smaller than one second in duration. This approach enables real-time or near-real-time data processing, which is particularly useful in applications requiring low-latency analysis, such as financial transactions, sensor monitoring, or real-time analytics. The micro-batch processing allows for efficient handling of high-velocity data streams while maintaining the ability to process data in small, manageable chunks. This method improves upon traditional batch processing, which may introduce delays due to larger batch sizes, and continuous stream processing, which may lack the structure needed for certain analytical tasks. The system ensures that data is processed in discrete, time-bound segments, allowing for better resource allocation and more precise timing in data analysis. The micro-batch stream is generated by segmenting the continuous data flow into these sub-second intervals, ensuring that each micro-batch contains a portion of the data stream that can be processed independently. This segmentation helps in maintaining data integrity and reducing processing overhead, as each micro-batch can be handled separately without requiring the entire stream to be processed at once. The system is designed to optimize both throughput and latency, making it suitable for environments where rapid data processing is critical.

Claim 9

Original Legal Text

9. The system of claim 7 , wherein the computer executable instructions are further executable to store the set of output events related to the application in an output queue; and transmit the output events in the output queue when all of the input events have been processed.

Plain English Translation

This invention relates to a system for managing event processing in a computing environment, particularly where input and output events are handled asynchronously. The system addresses the challenge of ensuring data consistency and synchronization between input and output events in applications that process multiple event streams. The system includes a processor and memory storing executable instructions that, when executed, perform specific functions. These instructions process a set of input events related to an application, generate a set of output events based on the processed input events, and store the output events in an output queue. The system ensures that all input events are fully processed before transmitting the output events from the queue, thereby maintaining synchronization and preventing partial or out-of-order data transmission. This approach is particularly useful in distributed systems or applications where event-driven architectures require strict consistency between input and output operations. The system may also include additional components, such as event processors or communication interfaces, to facilitate event handling and transmission. The invention improves reliability and predictability in event-driven applications by enforcing a strict processing order and ensuring that output events are only transmitted after all input events have been fully processed.

Claim 10

Original Legal Text

10. The system of claim 9 , wherein the micro-batch stream comprises micro-batches of data or Resilient Distributed Datasets (RDDs).

Plain English Translation

This claim narrows the system of claim 9 by specifying that the micro-batch stream comprises micro-batches of data or Resilient Distributed Datasets (RDDs). RDDs are immutable, partitioned collections of data that can be processed in parallel across a distributed computing environment. They are resilient to failures because a lost partition can be recomputed from lineage information. Micro-batching balances latency against throughput, enabling real-time or near-real-time processing while retaining scalability and fault tolerance, which is useful in stream processing frameworks that handle large volumes of data with low latency. The claim does not require adjusting batch sizes in response to workload.

Claim 11

Original Legal Text

11. The system of claim 10 , wherein the processing each of the input events comprises performing a computation on each of the input based at least in part on the transformed query plan.

Plain English Translation

This claim narrows the system of claim 10, which in turn builds on claims 9 and 7. It specifies that processing each input event comprises performing a computation on that event based at least in part on the transformed query plan, that is, the plan produced by applying the DAG transformation and then the CQL transformation to the continuous query. The transformed plan defines how the CQL engine handles each event in the micro-batch stream, for example through operations such as filtering, aggregation, or transformation. The claim does not specify which operations the plan contains or require that the plan change at runtime.

Claim 12

Original Legal Text

12. The system of claim 11 , wherein wherein the continuous query includes pattern matching.

Plain English Translation

This claim narrows the system of claim 11 by requiring that the continuous query include pattern matching. The query that the processor transforms into a query plan, and that the CQL engine executes over the micro-batch stream, therefore looks for a pattern in the input events, such as a specific sequence of events or a temporal pattern. The claim does not specify the matching algorithm, any preprocessing of the data, or any runtime adjustment of the matching criteria. Pattern matching over event streams is commonly used in areas such as fraud detection, network monitoring, and industrial IoT.

Claim 13

Original Legal Text

13. A computer-readable medium storing computer-executable code that, when executed by a processor, cause the processor to perform operations comprising: receiving a continuous query; applying a Directed Acrylic Graph (DAG) transformation to the continuous query to generate a query plan for the continuous query, wherein the query plan is an ordered set of steps used to access data for processing of the continuous query; applying a Continuous Query Language (CQL) transformation to the query plan to generate a transformed query plan such that a CQL engine can execute the continuous query using the transformed query plan; receiving a micro-batch stream of input events related to an application; processing the input events based at least in part on the transformed query plan using the CQL engine to generate a set of output events related to the application, wherein the processing comprises: performing, by the CQL engine, incremental computation on each of the input events of the micro-batch stream for the continuous query based at least in part on the transformed query plan; and creating, by the CQL engine, output events for each of the input events of the micro-batch stream, wherein the set of output events comprise the output events for each of the input events of the micro-batch stream; generating, using a snapshot management algorithm implemented by the CQL engine, a snapshot of a current state of a system based at least in part on the set of output events related to the application; generating a first directory structure to access snapshot information associated with the snapshot of the current state of the system; generating a second directory structure to generate a list of snapshots associated with the current state of the system; and determining based at least in part on the snapshot management algorithm, a process to get, add, or clean the list of snapshots associated with the current state of the system.

Plain English Translation

This invention relates to a system for processing continuous queries in a stream processing environment. The system addresses the challenge of efficiently executing continuous queries on micro-batch streams of input events, ensuring incremental computation and state management for real-time applications. The system receives a continuous query and applies a Directed Acyclic Graph (DAG) transformation to generate a query plan, which is an ordered set of steps for accessing and processing data. This query plan is then transformed using a Continuous Query Language (CQL) transformation to enable execution by a CQL engine. The system processes micro-batch streams of input events based on the transformed query plan, performing incremental computations on each event to generate output events. The CQL engine creates output events for each input event, compiling them into a set of output events related to the application. Additionally, the system generates snapshots of the current system state using a snapshot management algorithm, which relies on the output events. It creates two directory structures: one for accessing snapshot information and another for maintaining a list of snapshots associated with the system state. The snapshot management algorithm determines processes for retrieving, adding, or cleaning the list of snapshots, ensuring efficient state management and retrieval. This approach optimizes continuous query execution and state tracking in stream processing applications.
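
The claim defines the query plan as an ordered set of steps derived from a DAG. One way to picture that ordering, using a topological sort from Python's standard `graphlib` module (Python 3.9 or later), is sketched below. The operator names and the dependency graph are hypothetical, and the patent does not state that its DAG transformation works this way.

```python
from graphlib import TopologicalSorter

# Each operator maps to the set of operators it depends on.
# The operator names are hypothetical placeholders for query steps.
plan_dag = {
    "source": set(),
    "filter": {"source"},
    "window": {"filter"},
    "aggregate": {"window"},
    "sink": {"aggregate"},
}

# static_order() yields every node after all of its dependencies.
ordered_steps = list(TopologicalSorter(plan_dag).static_order())
```

Because the graph is acyclic, every operator can be placed after the operators that feed it; a cycle would make `static_order()` raise `graphlib.CycleError`.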

Claim 14

Original Legal Text

14. The computer-readable medium of claim 13 , wherein the micro-batch stream is a continuous stream of data discretize into sub-second micro-batches.

Plain English Translation

A system processes data streams in real-time by discretizing continuous data into sub-second micro-batches. The data stream is divided into small, time-based segments, each processed as a micro-batch to enable low-latency analysis. This approach allows for efficient handling of high-velocity data while maintaining the ability to process individual data points at fine temporal granularity. The micro-batches are generated at intervals shorter than one second, ensuring minimal delay between data ingestion and processing. This method is particularly useful in applications requiring real-time analytics, such as financial trading, sensor monitoring, or event-driven systems, where timely insights are critical. The system may include mechanisms to dynamically adjust batch sizes or processing intervals based on data characteristics or system load, optimizing performance and resource utilization. By breaking the continuous stream into micro-batches, the system balances the need for real-time responsiveness with the computational efficiency of batch processing. This technique improves scalability and reduces latency compared to traditional batch or pure stream processing methods.

Claim 15

Original Legal Text

15. The computer-readable medium of claim 13 , wherein operations further comprise storing the set of output events related to the application in an output queue; and transmitting the output events in the output queue when all of the input events have been processed.

Plain English Translation

This invention relates to event processing in computing systems, specifically addressing the challenge of managing input and output events in applications to ensure proper sequencing and synchronization. The system processes a set of input events associated with an application, where these events may include user interactions, system triggers, or other inputs. The system generates a set of output events based on the processed input events, where these output events may include responses, updates, or other outputs triggered by the input events. To maintain consistency and avoid race conditions, the system stores the output events in an output queue before transmitting them. The transmission of output events is delayed until all input events have been fully processed, ensuring that the application's state remains stable and that outputs are generated in a predictable order. This approach is particularly useful in distributed systems or applications where event-driven processing is critical, such as real-time analytics, financial transactions, or multi-user collaborative environments. By queuing output events and deferring their transmission, the system prevents partial or out-of-order updates, improving reliability and consistency in event-driven workflows.

Claim 16

Original Legal Text

16. The computer-readable medium of claim 15 , wherein the micro-batch stream comprises micro-batches of data or Resilient Distributed Datasets (RDDs).

Plain English Translation

This claim narrows the computer-readable medium of claim 15 by specifying that the micro-batch stream comprises micro-batches of data or Resilient Distributed Datasets (RDDs). Micro-batches are small, manageable units into which the data stream is divided. RDDs are the immutable, partitioned, fault-tolerant collections used by distributed computing frameworks such as Apache Spark; a lost partition can be recomputed from lineage information rather than restored from a redundant copy. The claim does not require monitoring arrival rates or adjusting the size or frequency of micro-batches in response to processing load.

Claim 17

Original Legal Text

17. The computer-readable medium of claim 16 , wherein the processing each of the input events comprises performing a computation on each of the input based at least in part on the transformed query plan.

Plain English Translation

This claim narrows the computer-readable medium of claim 16, which in turn builds on claims 15 and 13. It specifies that processing each input event comprises performing a computation on that event based at least in part on the transformed query plan, that is, the plan produced by applying the DAG transformation and then the CQL transformation to the continuous query. The input events are the events of the micro-batch stream related to an application, and the CQL engine executes the computation. The claim adds no requirement for a separate query optimizer, a monitoring component, or runtime adjustment of the plan.

Patent Metadata

Filing Date

Unknown

Publication Date

July 14, 2020

Inventors

Hoyong Park
Sandeep Bishnoi
Prabhu Thukkaram
Santosh Kumar
Pavan Advani
Kunal Mulay
Jeffrey Toillion


Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, FAQs, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “MANAGING SNAPSHOTS AND APPLICATION STATE IN MICRO-BATCH BASED EVENT PROCESSING SYSTEMS” (10713249). https://patentable.app/patents/10713249
