System and Method for Maintaining Data Consistency Across Replicas in a Cluster of Nodes

Published: May 26, 2020

Assignee: not available in USPTO data we have

Technical Abstract

Patent Claims

17 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 1

Original Legal Text

1. A method of maintaining data consistency in a cluster of nodes where each node stores data in the form of tables, the method comprising:
(a) dividing into data segments, by one node in the cluster of nodes, the data stored as tables by that one node, wherein the data segments are smaller in size than the tables;
(b) loading into memory from a globally available location in the cluster of nodes, by the one node, metadata about when the data segments were last analyzed for data consistency;
(c) prioritizing for data consistency analysis, by the one node, the data segments;
(d) selecting for data consistency analysis, by the one node, a highest priority data segment;
(e) dividing into pages, by the one node, the selected highest priority data segment, wherein the pages are smaller in size than the selected highest priority data segment;
(f) selecting for data consistency analysis, by the one node, a sequentially next one of the pages;
(g) creating a hash value, by the one node, of the selected, sequentially next one of the pages;
(h) obtaining, by the one node, a hash value of the selected, sequentially next one of the pages from each other node in the cluster of nodes containing a replica of the selected, sequentially next one of the pages;
(i) determining, by the one node, that the created hash value does not match the obtained hash value by comparing, by the one node, that the created hash value to the obtained hash value;
(j) obtaining, by the one node, the selected, sequentially next one of the pages and corresponding time stamp from each other node in the cluster of nodes containing the replica of the selected, sequentially next one of the pages;
(k) comparing, by the one node, a time stamp of the selected, sequentially next one of the pages with the obtained time stamp from each other node in the cluster of nodes containing the replica of the selected, sequentially next one of the pages;
(l) sending as an update, by the one node, the selected, sequentially next one of the pages to each other node in the cluster of nodes containing the replica of the selected, sequentially next one of the pages, when the comparison shows the selected, sequentially next one of the pages has the most current time stamp; and,
(m) updating, by the one node, the selected, sequentially next one of the pages of the one node with the obtained sequentially next one of the pages having a most current time stamp and sending as an update, by the one node, the obtained sequentially next one of the pages having the most current time stamp to each other node in the cluster of nodes containing the replica of the selected, sequentially next one of the pages, except for the node in the cluster of nodes containing the obtained sequentially next one of the pages having the most current time stamp, when the comparison shows the selected, sequentially next one of the pages does not have the most current time stamp.

Plain English Translation

In distributed database systems, replicas of the same data held on different nodes can drift apart, so the cluster needs a systematic way to detect and repair inconsistencies. This claim describes such a method for a cluster in which each node stores data in tables. One node divides its tables into data segments smaller than the tables, then loads metadata from a globally available location recording when each segment was last analyzed for consistency. The node prioritizes the segments, selects the highest-priority one, and divides it into smaller pages, which it analyzes one at a time in sequence. For each page, the node computes a hash value and obtains the hash value of the same page from every other node holding a replica of it. If the hash values do not match, the node obtains the page and its time stamp from each of those nodes and compares the time stamps to find the most current version. If the local page is the most current, the node sends it as an update to every other replica-holding node. Otherwise, the node overwrites its own copy with the most current version and sends that version to every other replica-holding node except the one that supplied it. Working in segments and pages keeps each unit of comparison small, and exchanging hash values first means that full pages are transferred only when a mismatch is found.
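The page-level part of claim 1, steps (g) through (m), can be sketched as follows. This is a minimal in-memory illustration, not the patented implementation: the `Page` class, the `reconcile_page` function, the use of SHA-256, integer time stamps, and the rule that the local node wins a time-stamp tie are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Page:
    data: bytes
    timestamp: int  # assumption: a larger value means a more recent write


def page_hash(page: Page) -> str:
    # SHA-256 is an assumption; the claim only requires "a hash value".
    return hashlib.sha256(page.data).hexdigest()


def reconcile_page(local_node: str, replicas: dict[str, Page]) -> list[str]:
    """Repair one page in place and return the nodes whose copy was overwritten.

    `replicas` maps a (hypothetical) node name to that node's copy of the page.
    """
    others = [node for node in replicas if node != local_node]

    # Steps (g)-(i): hash the local page and compare with every replica's hash.
    local_hash = page_hash(replicas[local_node])
    if all(page_hash(replicas[node]) == local_hash for node in others):
        return []  # all copies agree, nothing to repair

    # Steps (j)-(k): compare time stamps to find the most current copy.
    # Assumption: on a tie the local copy is treated as most current.
    source = local_node
    for node in others:
        if replicas[node].timestamp > replicas[source].timestamp:
            source = node
    winner = replicas[source]

    # Steps (l)-(m): every replica holder except the source gets the winner.
    # When the source is a remote node, this includes the local node itself.
    updated = []
    for node in replicas:
        if node != source:
            replicas[node] = Page(winner.data, winner.timestamp)
            updated.append(node)
    return updated


replicas = {"a": Page(b"old", 1), "b": Page(b"new", 2), "c": Page(b"old", 1)}
print(reconcile_page("a", replicas))  # nodes "a" and "c" receive b's copy
```

In a real cluster the hashes and pages would be fetched over the network; the dictionary here only stands in for those exchanges.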

Claim 2

Original Legal Text

2. The method of claim 1 further comprising: repeating steps (f) through (l) until it is determined, by the one node, that there are no more sequentially next one of the pages to be selected; and updating the metadata in the globally available location with the results of steps (l) and (m).

Plain English Translation

Building on claim 1, the node repeats steps (f) through (l) until it determines that no pages remain to be selected: it picks the next page in sequence, hashes it, compares that hash with the hashes from the replica-holding nodes, and reconciles any mismatch by time stamp. The node then updates the metadata in the globally available location with the results of steps (l) and (m). Because that metadata records when data segments were last analyzed for consistency (step (b) of claim 1), later prioritization can take the completed analysis into account.
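The loop described in claim 2 can be sketched as below. This is an illustration under stated assumptions: `analyze_segment`, the `reconcile` callback, the dictionary standing in for the globally available location, and the `last_analyzed` and `pages_repaired` field names are all hypothetical.

```python
import time
from typing import Callable, Sequence


def analyze_segment(
    segment_id: str,
    pages: Sequence[bytes],
    reconcile: Callable[[int, bytes], bool],
    metadata: dict[str, dict],
    now: Callable[[], float] = time.time,
) -> int:
    """Check every page of one segment in order, then record the outcome.

    `reconcile(index, page)` is a hypothetical callback that performs the
    hash comparison and repair for one page and returns True if a repair
    was needed. `metadata` stands in for the globally available location.
    """
    repaired = 0
    for index, page in enumerate(pages):  # sequentially next page
        if reconcile(index, page):
            repaired += 1
    # No pages remain: publish the results for this segment.
    metadata[segment_id] = {"last_analyzed": now(), "pages_repaired": repaired}
    return repaired


shared_metadata: dict[str, dict] = {}
count = analyze_segment(
    "segment-1",
    [b"p0", b"p1", b"p2"],
    reconcile=lambda index, page: index == 1,  # pretend only page 1 differed
    metadata=shared_metadata,
    now=lambda: 100.0,  # fixed clock so the example is reproducible
)
print(count, shared_metadata)
```

Passing the clock in as a parameter is a design choice made only to keep the example deterministic.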

Claim 3

Original Legal Text

3. The method of claim 1 further comprising repeating steps (c) through (m).

Plain English Translation

Building on claim 1, the node repeats steps (c) through (m): it prioritizes the data segments again, selects the highest-priority segment, divides it into pages, and checks and reconciles a page against its replicas. Repeating these steps lets the consistency analysis continue with further segments instead of ending after a single pass.

Claim 4

Original Legal Text

4. The method of claim 1 wherein the data segments are 200 MegaBytes (MB) in size.

Plain English Translation

In the method of claim 1, each data segment is 200 megabytes (MB) in size. The claim adds only this size limitation; segmentation, prioritization, and page-level comparison proceed as described in claim 1.

Claim 5

Original Legal Text

5. The method of claim 1 wherein prioritizing for data consistency analysis the data segments uses a Least Recently Used (LRU) schema.

Plain English Translation

In the method of claim 1, the data segments are prioritized for consistency analysis using a Least Recently Used (LRU) scheme. The claim does not spell out the details. Read together with step (b) of claim 1, which loads metadata about when each segment was last analyzed, a natural reading is that the segments analyzed longest ago are checked first.
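One way to read the LRU prioritization of claim 5 is to order segments by the time they were last analyzed, oldest first. The sketch below assumes that reading; `prioritize_lru`, the segment identifiers, and the choice to put never-analyzed segments at the front are hypothetical.

```python
def prioritize_lru(segment_ids: list[str], last_analyzed: dict[str, float]) -> list[str]:
    """Return segment ids ordered from least to most recently analyzed.

    Assumption: a segment with no entry in `last_analyzed` has never been
    analyzed and therefore goes first.
    """
    return sorted(segment_ids, key=lambda seg: last_analyzed.get(seg, float("-inf")))


order = prioritize_lru(["s1", "s2", "s3"], {"s1": 300.0, "s2": 100.0})
print(order)  # s3 has never been analyzed, s2 was analyzed before s1
```

The first element of the returned list corresponds to the highest priority data segment of step (d).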

Claim 6

Original Legal Text

6. The method of claim 1 wherein; prioritizing for data consistency analysis the data segments is performed by computing a priority score for each of the data segments; and, wherein the highest priority data segment is the data segment having a lowest priority score.

Plain English Translation

In the method of claim 1, the node prioritizes the data segments by computing a priority score for each one. The segment with the lowest priority score is treated as the highest-priority segment and is selected for consistency analysis first. The claim does not state how the score is computed.
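The selection rule in claim 6, where the lowest score means the highest priority, can be sketched as follows. The claim does not say how scores are computed, so the score values below are made up, and `select_highest_priority` and `analysis_order` are hypothetical helper names.

```python
def select_highest_priority(scores: dict[str, float]) -> str:
    """Return the segment with the lowest priority score.

    Ties are broken by dictionary insertion order, which is an assumption
    of this sketch rather than something the claim specifies.
    """
    return min(scores, key=scores.get)


def analysis_order(scores: dict[str, float]) -> list[str]:
    """Return all segments from highest priority (lowest score) to lowest."""
    return sorted(scores, key=scores.get)


scores = {"s1": 5.0, "s2": 1.5, "s3": 9.0}  # illustrative values only
print(select_highest_priority(scores))
print(analysis_order(scores))
```

If the score were simply the time a segment was last analyzed, this rule would coincide with the LRU ordering of claim 5, but that equivalence is an assumption, not something the claims state.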

Claim 7

Original Legal Text

7. The method of claim 1 wherein the pages are 10s to 100s of KiloBytes (KB) in size.

Plain English Translation

In the method of claim 1, each page is tens to hundreds of kilobytes (KB) in size. These pages are the subdivisions of a data segment created in step (e) of claim 1, and they are the units that are hashed, compared across replicas, and transferred when a mismatch is found.
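To make the sizes in claims 4 and 7 concrete, the arithmetic below splits a table into 200 MB segments and one segment into pages. It is a sketch under several assumptions: data is modeled as plain byte ranges, 1 MB is taken as 1024 × 1024 bytes, the 450 MB table size is invented, and the 64 KB page size is just one value inside the claimed range. `split_ranges` is a hypothetical helper.

```python
KB = 1024            # assumption: binary kilobytes
MB = 1024 * KB       # assumption: binary megabytes
SEGMENT_SIZE = 200 * MB   # segment size from claim 4
PAGE_SIZE = 64 * KB       # assumption: one value within "10s to 100s of KB"


def split_ranges(total_size: int, chunk_size: int) -> list[tuple[int, int]]:
    """Split [0, total_size) into half-open (start, end) byte ranges."""
    return [
        (start, min(start + chunk_size, total_size))
        for start in range(0, total_size, chunk_size)
    ]


# A hypothetical 450 MB table yields two full segments and one 50 MB remainder.
segments = split_ranges(450 * MB, SEGMENT_SIZE)
# One full 200 MB segment divides into 3200 pages of 64 KB each.
pages = split_ranges(SEGMENT_SIZE, PAGE_SIZE)
print(len(segments), len(pages))
```

With these numbers, a hash mismatch costs the transfer of a 64 KB page rather than a 200 MB segment, which illustrates why the pages are kept much smaller than the segments.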

Claim 8

Original Legal Text

8. The method of claim 1 wherein sending as an update, by the one node, the selected, sequentially next one of the pages to each other node in the cluster of nodes containing the replica of the selected, sequentially next one of the pages uses a standard write process.

Plain English Translation

In the method of claim 1, step (l) covers the case where the local page has the most current time stamp: the node sends that page as an update to every other node holding a replica of it. This claim adds that the update is sent using a standard write process, that is, through the same write path the cluster uses for ordinary writes.

Claim 9

Original Legal Text

9. The method of claim 1 wherein updating, by the one node, the selected, sequentially next one of the pages of the one node with the obtained sequentially next one of the pages having a most current time stamp and sending as an update, by the one node, the obtained sequentially next one of the pages having the most current time stamp to each other node in the cluster of nodes containing the replica of the selected, sequentially next one of the pages, except for the node in the cluster of nodes containing the obtained sequentially next one of the pages having the most current time stamp, uses a standard write process.

Plain English Translation

In the method of claim 1, step (m) covers the case where the local page does not have the most current time stamp. The node overwrites its own copy with the most current version it obtained and sends that version as an update to every other node holding a replica of the page, except the node that already holds the most current version. This claim adds that both the local update and the updates sent to the other nodes use a standard write process, that is, the cluster's ordinary write path.
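The recipient rule shared by steps (l) and (m) can be sketched as a small function. `update_recipients` and the node names are hypothetical, and the sketch only computes which remote nodes are sent the page; the write itself is left out.

```python
def update_recipients(replica_nodes: list[str], local_node: str, newest_node: str) -> list[str]:
    """Return the remote nodes that must be sent the most current page.

    Step (l): if the local node holds the most current copy, every other
    replica holder receives it.
    Step (m): otherwise the local node first updates its own copy, then
    sends the page to every other replica holder except `newest_node`,
    which already has it.
    """
    if newest_node == local_node:
        return [node for node in replica_nodes if node != local_node]
    return [node for node in replica_nodes if node not in (local_node, newest_node)]


nodes = ["a", "b", "c", "d"]
print(update_recipients(nodes, local_node="a", newest_node="a"))  # step (l)
print(update_recipients(nodes, local_node="a", newest_node="c"))  # step (m)
```

Skipping the node that supplied the most current version avoids sending it a page it already holds.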

Claim 10

Original Legal Text

10. The method of claim 1 wherein the method is performed by each node in the cluster of nodes.

Plain English Translation

In the method of claim 1, every node in the cluster performs the method. Each node therefore segments its own tables, prioritizes its segments, and checks and repairs its pages against the replicas held by other nodes, rather than a single node doing this work for the whole cluster.

Claim 11

Original Legal Text

11. A non-transitory computer-readable storage medium having embodied thereon a program, the program being executable by a processor to perform a method of maintaining data consistency in a cluster of nodes where each node stores data in the form of tables, the method comprising the steps of:
(a) dividing into data segments, by one node in the cluster of nodes, the data stored as tables by that one node, wherein the data segments are smaller in size than the tables;
(b) loading into memory from a globally available location in the cluster of nodes, by the one node, metadata about when the data segments were last analyzed for data consistency;
(c) prioritizing for data consistency analysis, by the one node, the data segments;
(d) selecting for data consistency analysis, by the one node, a highest priority data segment;
(e) dividing into pages, by the one node, the selected highest priority data segment, wherein the pages are smaller in size than the selected highest priority data segment;
(f) selecting for data consistency analysis, by the one node, a sequentially next one of the pages;
(g) creating a hash value, by the one node, of the selected, sequentially next one of the pages;
(h) obtaining, by the one node, a hash value of the selected, sequentially next one of the pages from each other node in the cluster of nodes containing a replica of the selected, sequentially next one of the pages;
(i) determining, by the one node, that the created hash value does not match the obtained hash value by comparing, by the one node, that the created hash value to the obtained hash value;
(j) obtaining, by the one node, the selected, sequentially next one of the pages and corresponding time stamp from each other node in the cluster of nodes containing the replica of the selected, sequentially next one of the pages;
(k) comparing, by the one node, a time stamp of the selected, sequentially next one of the pages with the obtained time stamp from each other node in the cluster of nodes containing the replica of the selected, sequentially next one of the pages;
(l) sending as an update, by the one node, the selected, sequentially next one of the pages to each other node in the cluster of nodes containing the replica of the selected, sequentially next one of the pages, when the comparison shows the selected, sequentially next one of the pages has the most current time stamp; and,
(m) updating, by the one node, the selected, sequentially next one of the pages of the one node with the obtained sequentially next one of the pages having a most current time stamp and sending as an update, by the one node, the obtained sequentially next one of the pages having the most current time stamp to each other node in the cluster of nodes containing the replica of the selected, sequentially next one of the pages, except for the node in the cluster of nodes containing the obtained sequentially next one of the pages having the most current time stamp, when the comparison shows the selected, sequentially next one of the pages does not have the most current time stamp.

Plain English Translation

This claim covers a non-transitory computer-readable storage medium holding a program that, when executed by a processor, performs the same method as claim 1 for a cluster in which each node stores data as tables. One node divides its stored tables into smaller data segments and loads, from a globally available location in the cluster, metadata about when those segments were last analyzed for consistency. It prioritizes the segments, selects the highest-priority one, and divides it into smaller pages, which it processes in sequence. For each page, the node creates a hash value and obtains the corresponding hash value from every other node holding a replica of the page. If the hash values do not match, the node obtains the page and its time stamp from each replica-holding node and compares the time stamps. If the local page has the most current time stamp, the node sends it as an update to the other replica-holding nodes. If not, the node updates its own copy with the most current version and sends that version to every other replica-holding node except the one that supplied it.

Claim 12

Original Legal Text

12. The non-transitory computer readable medium of claim 11, wherein the method further comprises: repeating steps (f) through (l) until it is determined, by the one node, that there are no more sequentially next one of the pages to be selected; and updating the metadata in the globally available location with the results of steps (l) and (m).

Plain English Translation

For the storage medium of claim 11, the stored method additionally repeats steps (f) through (l) until the node determines that no pages remain to be selected, so each page is hashed, compared with its replicas, and reconciled by time stamp in turn. The node then updates the metadata in the globally available location with the results of steps (l) and (m). This mirrors claim 2.

Claim 13

Original Legal Text

13. The non-transitory computer readable medium of claim 11, wherein the method further comprises the steps of repeating steps (c) through (m).

Plain English Translation

For the storage medium of claim 11, the stored method additionally repeats steps (c) through (m): the node prioritizes the data segments again, selects the highest-priority segment, divides it into pages, and checks and reconciles a page against its replicas. This mirrors claim 3.

Claim 14

Original Legal Text

14. The non-transitory computer readable medium of claim 11, wherein prioritizing for data consistency analysis the data segments uses a Least Recently Used (LRU) schema.

Plain English Translation

For the storage medium of claim 11, the stored method prioritizes the data segments for consistency analysis using a Least Recently Used (LRU) scheme. The claim gives no further detail. Read with the last-analyzed metadata of step (b), a natural reading is that the segments analyzed longest ago are checked first. This mirrors claim 5.

Claim 15

Original Legal Text

15. The non-transitory computer readable medium of claim 11, wherein: prioritizing for data consistency analysis the data segments is performed by computing a priority score for each of the data segments; and, wherein the highest priority data segment is the data segment having a lowest priority score.

Plain English Translation

For the storage medium of claim 11, the stored method prioritizes the data segments by computing a priority score for each one, and the segment with the lowest priority score is treated as the highest-priority segment. The claim does not state how the score is computed. This mirrors claim 6.

Claim 16

Original Legal Text

16. The non-transitory computer readable medium of claim 11, wherein sending as an update, by the one node, the selected, sequentially next one of the pages to each other node in the cluster of nodes containing the replica of the selected, sequentially next one of the pages uses a standard write process.

Plain English Translation

For the storage medium of claim 11, when the local page has the most current time stamp (step (l)), the node sends that page as an update to every other node holding a replica of it, and it does so using a standard write process. The claim does not designate a primary node; the sender is whichever node is performing the method. This mirrors claim 8.

Claim 17

Original Legal Text

17. The non-transitory computer readable medium of claim 11, wherein updating, by the one node, the selected, sequentially next one of the pages of the one node with the obtained sequentially next one of the pages having a most current time stamp and sending as an update, by the one node, the obtained sequentially next one of the pages having the most current time stamp to each other node in the cluster of nodes containing the replica of the selected, sequentially next one of the pages, except for the node in the cluster of nodes containing the obtained sequentially next one of the pages having the most current time stamp, uses a standard write process.

Plain English Translation

For the storage medium of claim 11, when the local page does not have the most current time stamp (step (m)), the node overwrites its own copy with the most current version it obtained and sends that version to every other node holding a replica of the page, except the node that supplied it. Both the local update and the updates sent to the other nodes use a standard write process. This mirrors claim 9.

Patent Metadata

Filing Date

Unknown

Publication Date

May 26, 2020

Inventors

Sylvain Jean Lebresne
