Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
1. A method for retrospective snapshot creation comprising: creating, by a processor, a first snapshot that captures logical state of a data store at a first key, wherein creation of the first snapshot is based on: determining a log offset corresponding to the first key; determining existence of a second snapshot that captures logical state of the data store; creating and recording a retrospective snapshot at a last valid log address offset prior to the first key upon a determination that the second snapshot exists prior to the first key based on: determining at least one of: whether log address offsets from a first log entry of a log to a log entry of the log at the first key are contiguous; and whether log address offsets from the second snapshot to the first key are contiguous; and applying a plurality of rules for snapshots and garbage collection zones to prevent garbage collection of the data store across snapshot boundaries, wherein a first rule of the plurality of rules includes limiting garbage collection scope to a garbage collection zone to maintain snapshot fidelity.
This invention relates to data storage systems, specifically methods for creating retrospective snapshots of a data store to ensure data consistency and integrity. The problem addressed is the need to accurately capture the logical state of a data store at a specific point in time (a "key") while maintaining data consistency and preventing unintended data loss during garbage collection operations. The method involves creating a first snapshot that captures the logical state of a data store at a specified key. To do this, the system determines a log offset corresponding to the first key and checks for the existence of a second snapshot that captures an earlier state of the data store. If such a second snapshot exists, the system creates a retrospective snapshot at the last valid log address offset before the first key. This involves verifying whether the log address offsets from the first log entry to the first key are contiguous or whether the offsets from the second snapshot to the first key are contiguous. The method also applies rules to manage garbage collection zones, ensuring that garbage collection does not interfere with snapshot boundaries. A key rule limits garbage collection to specific zones to maintain snapshot fidelity, preventing data corruption or loss during the process. This approach ensures that snapshots accurately reflect the data store's state at the specified key while preserving data integrity during storage operations.
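The steps in claim 1 can be sketched in code. The following is a minimal illustration under stated assumptions, not the patented implementation: `LogEntry`, `is_contiguous`, and `create_retrospective_snapshot` are hypothetical names, keys are assumed to be timestamp-like integers that never decrease, and the fallback to the first log entry when no earlier snapshot exists is one reading of the claim.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LogEntry:
    offset: int  # log address offset of this entry
    key: int     # key recorded with the entry (assumed here to be a timestamp)


def is_contiguous(offsets: List[int]) -> bool:
    """True when each offset is exactly one greater than the previous one."""
    return all(b == a + 1 for a, b in zip(offsets, offsets[1:]))


def create_retrospective_snapshot(
    log: List[LogEntry], snapshot_offsets: List[int], first_key: int
) -> Optional[int]:
    """Record and return the new snapshot offset, or None if creation fails.

    Assumes `log` is sorted by offset and that keys never decrease.
    """
    # Step 1: find the log offsets at or before the requested key.
    upto = [e.offset for e in log if e.key <= first_key]
    if not upto:
        return None
    target = upto[-1]  # last valid offset for the requested key

    # Step 2: look for an existing snapshot before that offset.
    earlier = [s for s in snapshot_offsets if s < target]

    # Step 3: check contiguity from the latest earlier snapshot if there is
    # one, otherwise from the first log entry (a fallback assumed here).
    if earlier:
        start = max(earlier)
        span = [start] + [o for o in upto if o > start]
    else:
        span = upto
    if not is_contiguous(span):
        return None

    snapshot_offsets.append(target)
    return target
```

With entries at offsets 0 to 3 and an existing snapshot at offset 1, a request for a key that falls between the entries at offsets 2 and 3 would record a snapshot at offset 2. A gap anywhere in the checked span makes the function return `None`.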
2. The method of claim 1, wherein: log addresses are not reused, monotonically increase and are contiguous; a second rule of the plurality of rules includes avoiding garbage collection of cross-snapshot tombstones for valid snapshots; the cross-snapshot tombstones are objects across snapshot boundaries; and the retrospective snapshot is created without prior existence of the retrospective snapshot.
This invention relates to data storage systems, specifically methods for managing snapshots in a storage environment to improve efficiency and reliability. The method addresses challenges in snapshot management, such as log address reuse, garbage collection, and cross-snapshot tombstone handling. The method ensures log addresses are never reused, monotonically increase, and remain contiguous, preventing data corruption and ensuring consistent snapshot integrity. A key rule in the method avoids garbage collection of cross-snapshot tombstones for valid snapshots, preserving data references that span multiple snapshots. Cross-snapshot tombstones are objects that exist across snapshot boundaries, and their retention ensures data consistency when snapshots are accessed retrospectively. Additionally, the method allows for the creation of retrospective snapshots without prior existence, meaning snapshots can be generated from historical data even if they were not initially captured. This feature enhances flexibility in data recovery and analysis. The method optimizes storage efficiency by minimizing unnecessary garbage collection while maintaining data integrity across snapshots. The approach is particularly useful in systems requiring high reliability and consistent access to historical data.
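A short sketch may make the two ideas in claim 2 concrete. It is illustrative only: `AppendOnlyLog` and `is_cross_snapshot` are hypothetical names, and the test for "across snapshot boundaries" (a tombstone written after a snapshot that deletes a record written at or before it) is an assumed interpretation, since the claim does not define the term further.

```python
from typing import Any, Iterable, List


class AppendOnlyLog:
    """Log whose addresses are never reused, only increase, and have no gaps."""

    def __init__(self) -> None:
        # The position in this list doubles as the log address.
        self._entries: List[Any] = []

    def append(self, payload: Any) -> int:
        """Store a payload and return its newly assigned log address."""
        address = len(self._entries)
        self._entries.append(payload)
        return address


def is_cross_snapshot(
    tombstone_addr: int, deleted_addr: int, snapshot_addrs: Iterable[int]
) -> bool:
    """True if a snapshot lies between the deleted record and its tombstone.

    Assumption: a snapshot at address s captures everything up to and
    including s, so it still sees the record if deleted_addr <= s and the
    tombstone was written later. Such a tombstone should be kept.
    """
    return any(deleted_addr <= s < tombstone_addr for s in snapshot_addrs)
```

Because entries are only ever appended, each new address is one greater than the last, which is the property the contiguity checks in claim 1 rely on.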
3. The method of claim 2, wherein: the second snapshot has a largest log address offset than other snapshots prior to the first key; a third rule of the plurality of rules includes garbage collecting intra-snapshot tombstones; and intra-snapshot tombstones are objects within a garbage collection zone.
A method for managing data snapshots in a storage system addresses the challenge of efficiently handling intra-snapshot tombstones, which are objects marked for deletion within a garbage collection zone. The method involves creating multiple snapshots of data, where each snapshot captures the state of the data at a specific point in time. The second snapshot is distinguished by having the largest log address offset among all snapshots created before a designated first key, ensuring proper ordering and consistency in the snapshot sequence. The method enforces a set of rules, including a third rule that mandates garbage collection of intra-snapshot tombstones. These tombstones are objects within a garbage collection zone that are no longer needed but remain in the snapshot due to their marked state. By applying this rule, the system ensures that unnecessary data is removed, optimizing storage efficiency and performance. The method also includes mechanisms to track and manage these tombstones, preventing them from occupying valuable storage space while maintaining data integrity across snapshots. This approach is particularly useful in systems where frequent snapshots are taken, and efficient garbage collection is critical to maintaining performance and reducing storage overhead.
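The selection of the second snapshot and the intra-snapshot rule could look like the sketch below. The function names are hypothetical, and the zone layout is an assumption: zone k is taken to cover the offsets after snapshot k-1 up to and including snapshot k, so a tombstone is treated as intra-snapshot when it and the record it deletes fall in the same zone.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple


def latest_snapshot_before(snapshot_offsets: List[int], target: int) -> Optional[int]:
    """Largest snapshot offset strictly below `target`, or None.

    `snapshot_offsets` must be sorted in ascending order.
    """
    i = bisect_left(snapshot_offsets, target)
    return snapshot_offsets[i - 1] if i > 0 else None


def gc_zone(offset: int, snapshot_offsets: List[int]) -> int:
    """Index of the garbage collection zone that contains `offset`."""
    return bisect_left(snapshot_offsets, offset)


def collectable_tombstones(
    tombstones: List[Tuple[int, int]], snapshot_offsets: List[int]
) -> List[Tuple[int, int]]:
    """Keep only (tombstone_offset, deleted_offset) pairs within one zone."""
    return [
        (t, d)
        for t, d in tombstones
        if gc_zone(t, snapshot_offsets) == gc_zone(d, snapshot_offsets)
    ]
```

For snapshots at offsets 3 and 7, a tombstone at offset 5 that deletes the record at offset 4 stays inside one zone and can be collected, while the same tombstone deleting the record at offset 2 spans the snapshot at 3 and is left alone.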
4. The method of claim 3, wherein the second snapshot is a retrospective snapshot for a log address corresponding to the first key.
This claim narrows claim 3 by specifying what kind of snapshot the second snapshot is: a retrospective snapshot for a log address corresponding to the first key. The second snapshot is the existing snapshot that the method of claim 1 checks for, and claim 3 identifies it as the snapshot with the largest log address offset among those prior to the first key. Under claim 4, that reference snapshot is itself retrospective, meaning it was recorded after the fact from the log rather than captured when the data was written. The claim adds no new steps; the contiguity checks and garbage collection rules of claims 1 through 3 still apply.
5. The method of claim 4, wherein creation of the retrospective snapshot fails upon at least one of: the log address offsets from the first log entry to the log entry at the first key are non-contiguous; and the log address offsets from the second snapshot to the first key are non-contiguous.
This invention relates to data management systems, specifically ensuring data consistency in log-based storage systems. The problem addressed is the risk of creating corrupted or incomplete retrospective snapshots due to non-contiguous log address offsets, which can occur when log entries are not sequentially stored or when gaps exist between log entries and snapshot references. The method involves verifying the integrity of log entries before creating a retrospective snapshot. It checks two conditions: first, whether the log address offsets from the first log entry to the log entry containing the first key are contiguous; and second, whether the log address offsets from a second snapshot to the first key are contiguous. If either condition fails, the snapshot creation is aborted to prevent data corruption. This ensures that only valid, contiguous log sequences are used to generate snapshots, maintaining data consistency. The method is part of a broader system for managing log-based storage, where logs are used to reconstruct data states at specific points in time. By enforcing contiguous log offsets, the system avoids relying on fragmented or incomplete log entries, which could lead to incorrect snapshot generation. This approach is particularly useful in distributed or high-availability systems where log integrity is critical for reliable data recovery.
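The failure condition can be expressed as a small guard that reports where the gap is. This is a hedged sketch: `SnapshotCreationError`, `first_gap`, and `require_contiguous` are hypothetical names, and raising an exception is only one possible way to signal that creation failed.

```python
from typing import List, Optional, Tuple


class SnapshotCreationError(Exception):
    """Raised when a retrospective snapshot cannot be created safely."""


def first_gap(offsets: List[int]) -> Optional[Tuple[int, int]]:
    """Return the first (previous, next) pair that is not consecutive."""
    for prev, nxt in zip(offsets, offsets[1:]):
        if nxt != prev + 1:
            return (prev, nxt)
    return None


def require_contiguous(offsets: List[int], label: str) -> None:
    """Abort snapshot creation if the given offset range has a gap."""
    gap = first_gap(offsets)
    if gap is not None:
        raise SnapshotCreationError(
            f"{label}: gap between offsets {gap[0]} and {gap[1]}"
        )
```

A caller would run `require_contiguous` on the range from the first log entry to the entry at the first key, or on the range from the second snapshot to the first key, and stop before recording anything if either call raises.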
6. A computer program product for retrospective snapshot creation, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to: create, by a processor, a first snapshot that captures logical state of a data store at a first time in a time range, wherein creation of the retrospective snapshot is based on: determine, by the processor, a log offset corresponding to the first key; determine, by the processor, existence of a second snapshot that captures logical state of the data store; create and record, by the processor, a retrospective snapshot at a last valid log address offset prior to the first key upon a determination that the second snapshot exists based on: determining, by the processor, at least one of: whether log address offsets from a first log entry of a log to a log entry of the log at the first key are contiguous; and whether log address offsets from the second snapshot to the first key are contiguous; and apply, by the processor, a plurality of rules for snapshots and garbage collection zones to prevent garbage collection of the data store across snapshot boundaries, wherein a first rule of the plurality of rules includes limiting garbage collection scope to a garbage collection zone to maintain snapshot fidelity.
This invention relates to a computer program for creating retrospective snapshots of a data store, addressing the challenge of accurately capturing the logical state of a data store at a specific point in time, even when a snapshot was not originally taken at that time. The program determines a log offset corresponding to a first key and checks for the existence of a prior snapshot. If a prior snapshot exists, the program creates a retrospective snapshot at the last valid log address offset before the first key, ensuring data consistency. This involves verifying whether log address offsets from the first log entry to the first key are contiguous or whether offsets from the prior snapshot to the first key are contiguous. The program also applies rules to manage garbage collection, preventing data loss across snapshot boundaries by limiting garbage collection to specific zones, thereby maintaining snapshot fidelity. The solution ensures that retrospective snapshots can be generated without disrupting ongoing operations or compromising data integrity.
7. The computer program product of claim 6, wherein: log addresses are not reused, monotonically increase and are contiguous; a second rule of the plurality of rules includes avoiding garbage collection of cross-snapshot tombstones for valid snapshots; the cross-snapshot tombstones are objects across snapshot boundaries; and the retrospective snapshot is created without prior existence of the retrospective snapshot.
This invention relates to a computer program product for managing data snapshots in a storage system, addressing challenges in efficient snapshot creation, garbage collection, and log address management. The system ensures log addresses are never reused, monotonically increase, and remain contiguous, preventing address conflicts and simplifying data retrieval. A key rule avoids garbage collection of cross-snapshot tombstones, which are objects spanning multiple snapshot boundaries, preserving data integrity for valid snapshots. The system also enables the creation of retrospective snapshots without requiring prior existence, allowing snapshots to be generated from historical data points that were not initially captured. This approach enhances flexibility in data recovery and analysis by permitting snapshots to be constructed from any point in time, even if not explicitly saved at that moment. The invention optimizes storage efficiency and reliability by combining strict log address management with selective garbage collection and retrospective snapshot capabilities.
8. The computer program product of claim 7, wherein: the second snapshot has a largest log address offset than other snapshots prior to the first key; a third rule of the plurality of rules includes garbage collecting intra-snapshot tombstones; and intra-snapshot tombstones are objects within a garbage collection zone.
This invention relates to data management in distributed storage systems, specifically addressing the challenge of efficiently handling snapshots and garbage collection to optimize storage performance and reduce overhead. The system involves a computer program product that manages multiple snapshots of data, where each snapshot represents a point-in-time copy of the data. A key aspect is the use of a second snapshot that has the largest log address offset among all snapshots created before a first key, ensuring proper ordering and consistency in data recovery. The system also enforces a third rule that mandates garbage collecting intra-snapshot tombstones, which are objects marked for deletion within a designated garbage collection zone. This rule helps reclaim storage space by removing unnecessary data while maintaining data integrity. The garbage collection process is designed to operate within the constraints of the snapshot hierarchy, ensuring that only valid data is retained and obsolete data is efficiently purged. The invention improves storage efficiency by reducing redundant data and minimizing the overhead associated with managing multiple snapshots in a distributed environment.
9. The computer program product of claim 8, wherein the second snapshot is a retrospective snapshot for a log address corresponding to the first key.
This claim narrows the computer program product of claim 8 in the same way claim 4 narrows the method. The second snapshot, which claim 8 identifies as the snapshot with the largest log address offset among those prior to the first key, is specified to be a retrospective snapshot for a log address corresponding to the first key. The existing snapshot that the program uses as its reference point is therefore one that was itself recorded after the fact from the log. The claim adds no further program instructions; the contiguity checks and garbage collection rules of claims 6 through 8 continue to apply.
10. The computer program product of claim 9, wherein creation of the retrospective snapshot fails upon at least one of: the log address offsets from the first log entry to the log entry at the first key are non-contiguous; and the log address offsets from the second snapshot to the first key are non-contiguous.
This invention relates to data management systems, specifically ensuring data consistency in log-based storage systems. The problem addressed is the risk of creating corrupted or incomplete retrospective snapshots when log entries are not properly sequenced or when gaps exist in the log address offsets. The system involves a computer program product that manages log entries and snapshots in a storage system. It tracks log address offsets to verify continuity between log entries and snapshots. Specifically, the program checks two conditions to determine if snapshot creation should fail: first, if the log address offsets from the first log entry to the log entry at the first key are non-contiguous, indicating missing or out-of-order entries; second, if the log address offsets from a second snapshot to the first key are non-contiguous, suggesting inconsistencies in the snapshot chain. If either condition is met, the program prevents the creation of the retrospective snapshot to avoid data corruption. This mechanism ensures that only valid, contiguous log sequences are used for snapshot creation, maintaining data integrity in the storage system. The invention is particularly useful in distributed or high-availability systems where log consistency is critical.
11. The computer program product of claim 10, further comprising program instructions executable by the processor to cause the processor to: record, by the processor, a key for each log entry in the log.
This claim adds one instruction to the computer program product of claim 10: the processor records a key for each log entry in the log. In the parent claims, a key identifies the point at which a snapshot is requested, and claim 6 determines a log offset corresponding to the first key. Recording a key with every entry gives the program the data it needs to map a requested key to a log offset. The claim does not say what form the key takes; claim 6 refers to "a first time in a time range", which suggests a timestamp-like value.
12. The computer program product of claim 11, further comprising program instructions executable by the processor to cause the processor to: locate, by the processor, a log address for a corresponding key using a secondary index.
A system and method for managing data storage in a computing environment, particularly for optimizing access to stored data using address mapping techniques. The invention addresses the inefficiency in traditional data storage systems where locating specific data requires extensive searching, leading to delays and increased computational overhead. The system includes a primary storage mechanism and a secondary mapping structure that stores log addresses corresponding to keys. The secondary mapping structure allows for rapid retrieval of data by associating each key with a log address, which points to the actual storage location of the data. This reduces the time required to access data by eliminating the need for exhaustive searches through the primary storage. The system further includes a processor that executes instructions to locate a log address for a corresponding key using the secondary mapping structure, ensuring efficient data retrieval. The invention is particularly useful in environments where fast data access is critical, such as databases, file systems, or memory management systems. The secondary mapping structure can be implemented using various data structures, such as hash tables or trees, to further enhance performance. The system may also include mechanisms for updating the secondary mapping structure dynamically as data is added, modified, or deleted, ensuring that the log addresses remain accurate and up-to-date. This approach improves overall system efficiency by minimizing search times and optimizing resource utilization.
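Claims 11 and 12 together describe recording a key with each log entry and looking up a log address by key. One way to picture this is the sketch below. It rests on assumptions the claims do not state: `KeyedLog` is a hypothetical class, keys are taken to be non-decreasing (timestamp-like), and the secondary index is modeled as a sorted list searched with binary search rather than any structure named in the patent.

```python
from bisect import bisect_right
from typing import Any, List, Optional


class KeyedLog:
    """Append-only log that records a key alongside every entry."""

    def __init__(self) -> None:
        self._payloads: List[Any] = []
        # Secondary index: _keys[i] is the key recorded for log address i.
        self._keys: List[int] = []

    def append(self, key: int, payload: Any) -> int:
        """Append an entry, record its key, and return its log address."""
        if self._keys and key < self._keys[-1]:
            raise ValueError("keys must be non-decreasing")
        self._payloads.append(payload)
        self._keys.append(key)
        return len(self._keys) - 1

    def locate(self, key: int) -> Optional[int]:
        """Log address of the last entry whose key is <= `key`, or None."""
        i = bisect_right(self._keys, key)
        return i - 1 if i > 0 else None
```

Because the keys are kept in address order, `locate` needs only a binary search over the recorded keys, so finding the log offset for a requested key does not require reading the entries themselves.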
13. An apparatus comprising: a memory storing instructions; and a processor executing the instructions to create a first snapshot that captures logical state of a data store at a first key, wherein creation of the first snapshot is based on the processor: determining a log offset corresponding to the first key; determining existence of a second snapshot that captures logical state of the data store; creating and recording a retrospective snapshot at a last valid log address offset prior to the first key upon a determination that the second snapshot exists based on the processor further: determining at least one of: whether log address offsets from a first log entry of a log to a log entry of the log at the first key are contiguous; and whether log address offsets from the second snapshot to the first key are contiguous; and apply a plurality of rules for snapshots and garbage collection zones to prevent garbage collection of the data store across snapshot boundaries, wherein a first rule of the plurality of rules includes limiting garbage collection scope to a garbage collection zone to maintain snapshot fidelity.
This invention relates to data storage systems, specifically methods for creating and managing snapshots of a data store to ensure data consistency and integrity during garbage collection operations. The problem addressed is maintaining accurate snapshots while preventing data corruption during garbage collection, particularly when snapshots are taken at specific log offsets. The apparatus includes a memory and a processor that executes instructions to create a snapshot capturing the logical state of a data store at a specified key. The processor determines the log offset corresponding to the key and checks for the existence of a prior snapshot. If a prior snapshot exists, the processor creates a retrospective snapshot at the last valid log address offset before the specified key. This involves verifying whether log address offsets from the first log entry to the specified key are contiguous or whether offsets from the prior snapshot to the specified key are contiguous. The processor then applies rules to manage snapshots and garbage collection zones, ensuring that garbage collection does not occur across snapshot boundaries. A key rule limits garbage collection to a specific zone to preserve snapshot fidelity, preventing data loss or corruption during the process. This approach ensures that snapshots remain consistent and reliable while allowing efficient storage management.
14. The apparatus of claim 13, wherein: log addresses are not reused, monotonically increase and are contiguous; a second rule of the plurality of rules includes avoiding garbage collection of cross-snapshot tombstones for valid snapshots; the cross-snapshot tombstones are objects across snapshot boundaries; and the retrospective snapshot is created without prior existence of the retrospective snapshot.
This invention relates to a data storage system that manages snapshots of data objects, addressing inefficiencies in log address management and garbage collection. The system ensures log addresses are never reused, monotonically increase, and remain contiguous, preventing fragmentation and improving storage efficiency. A key rule in the system avoids garbage collection of cross-snapshot tombstones, which are objects spanning multiple snapshot boundaries, ensuring data integrity for valid snapshots. The system also enables the creation of retrospective snapshots, which are generated without prior existence, allowing for on-demand recovery of historical data states. This approach optimizes storage performance by reducing overhead from garbage collection and ensuring consistent snapshot management. The contiguous log addressing and selective garbage collection rules enhance reliability and reduce the risk of data loss or corruption. The retrospective snapshot feature provides flexibility in data recovery, allowing users to access historical data states without pre-configured snapshots. The system is particularly useful in environments requiring high data integrity and efficient storage management.
15. The apparatus of claim 14, wherein: the second snapshot has a largest log address offset than other snapshots prior to the first key; a third rule of the plurality of rules includes garbage collecting intra-snapshot tombstones; and intra-snapshot tombstones are objects within a garbage collection zone.
This invention relates to data storage systems, specifically optimizing snapshot management and garbage collection in log-structured storage. The problem addressed is inefficient storage utilization due to uncollected intra-snapshot tombstones—objects marked for deletion but still occupying space within a snapshot's garbage collection zone. The apparatus includes a storage system that maintains multiple snapshots of data, each associated with a log address offset. A key feature is that the second snapshot has the largest log address offset among all snapshots created before a designated first key, ensuring proper ordering for recovery. The system enforces rules for garbage collection, including a third rule that specifically targets intra-snapshot tombstones—objects within a defined garbage collection zone of a snapshot. This rule ensures that deleted objects within a snapshot are reclaimed, freeing up storage space and improving efficiency. The apparatus further includes mechanisms to identify and process these tombstones during garbage collection cycles, preventing fragmentation and maintaining performance. By focusing on intra-snapshot tombstones, the system avoids unnecessary scanning of other data regions, reducing overhead. The solution is particularly useful in environments where frequent snapshots are taken, such as virtual machine backups or database systems, where efficient storage management is critical.
16. The apparatus of claim 15, wherein the second snapshot is a retrospective snapshot for a log address corresponding to the first key.
This claim narrows the apparatus of claim 15 by specifying the nature of the second snapshot. Claim 13 has the processor check whether a second snapshot of the data store exists, and claim 15 identifies that snapshot as the one with the largest log address offset among those prior to the first key. Claim 16 adds that this second snapshot is a retrospective snapshot for a log address corresponding to the first key, so the apparatus uses a snapshot that was itself recorded after the fact from the log as its reference point. The claim introduces no additional hardware or processing steps beyond those in claims 13 through 15.
17. The apparatus of claim 16, wherein creation of the retrospective snapshot fails upon at least one of: the log address offsets from the first log entry to the log entry at the first key are non-contiguous; and the log address offsets from the second snapshot to the first key are non-contiguous.
This invention relates to data storage systems, specifically ensuring the integrity of retrospective snapshots in a log-structured storage environment. The problem addressed is the risk of creating corrupted or incomplete snapshots when log entries are not sequentially stored, which can lead to data inconsistency and system failures. The apparatus includes a storage system that maintains a log of data entries, where each entry has an address offset indicating its position. The system creates snapshots by identifying key entries in the log and recording their address offsets. To ensure snapshot integrity, the apparatus verifies that the log address offsets between the first log entry and the first key entry are contiguous, meaning no gaps exist in the sequence. Similarly, it checks that the offsets between a second snapshot and the first key entry are also contiguous. If either verification fails, the snapshot creation is aborted to prevent data corruption. This mechanism prevents the use of incomplete or fragmented log data for snapshots, ensuring that only valid, contiguous log sequences are used. The invention is particularly useful in systems where data integrity is critical, such as databases or file systems that rely on log-based recovery.
18. The apparatus of claim 13, wherein the processor further executes instructions comprising recordation of a key for each log entry in the log.
This claim extends the apparatus of claim 13: the processor also records a key for each log entry in the log. Claim 13 creates a snapshot at a first key and determines the log offset corresponding to that key, so a key stored with each entry is what ties a requested key to a position in the log. The claim does not specify the key's format or how it is stored.
19. The apparatus of claim 13, wherein the processor further executes instructions comprising locating a log address for a corresponding key using a secondary index.
This claim extends the apparatus of claim 13: the processor locates the log address for a given key using a secondary index. Claim 13 needs the log offset corresponding to the first key before it can create a snapshot, and a secondary index that maps keys to log addresses would presumably provide that lookup without scanning the log. The claim does not specify the index's data structure, where it is stored, or how it is kept up to date.
February 4, 2020