Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
1. A computerized system of concurrently synchronizing a garbage collection thread and a writer threads in a dedupe file system in a data gathering state comprising: a processor configured to execute instructions; a memory containing instructions when executed on the processor, causes the processor to perform operations that: while the dedupe file system is in a data gathering state, a garbage collector thread concurrently working with writer threads, generate, with at least one processor, a garbage list of data chunks that are candidates for deletion by a garbage collector thread, wherein the garbage collector thread enumerates all backups on the data store, and wherein the garbage collector thread traverses a list of valid backups and removes any data chunks of the list of valid backups from an eraser database of the dedupe file system; and with the writer threads, referring to the garbage list of data chunks while ingesting data by: matching the data chunks with those present in the garbage list; filtering out the matched data chunks from garbage list of data chunks; setting, with the garbage collector thread, the dedupe file system to a data deletion state; and setting the writer threads to ingest data into the dedupe file system in synchronization with garbage collector thread.
The invention relates to a computerized system that synchronizes a garbage collector thread with writer threads in a deduplicating (dedupe) file system while the file system is in a data gathering state. Because data chunks in a dedupe file system are shared across multiple backups, garbage collection and data ingestion must be coordinated so that a chunk still needed by a backup or by an in-flight write is never deleted. The system includes a processor and memory containing instructions that, when executed, run a garbage collector thread concurrently with writer threads. The garbage collector thread generates a garbage list of data chunks that are candidates for deletion: it enumerates all backups on the data store, traverses the list of valid backups, and removes every data chunk referenced by a valid backup from the eraser database. Meanwhile, the writer threads consult the garbage list while ingesting data. Each incoming data chunk is matched against the list, and any matched chunk is filtered out of the garbage list so that it will not be deleted. The garbage collector thread then sets the file system to a data deletion state, and the writer threads continue to ingest data in synchronization with the garbage collector thread. Claim 1 recites only the gathering phase and this state transition; the deletion of chunks remaining on the garbage list is recited in claim 2. The approach lets unreferenced chunks be identified while ingestion continues, without marking for deletion any chunk that a concurrent write has just reused.
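The data gathering state described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `DedupeStore`, its methods, and the use of an in-memory set guarded by a single lock as the garbage list are all assumptions made for illustration.

```python
import threading


class DedupeStore:
    """Hypothetical in-memory model of the data gathering state."""

    def __init__(self, eraser_db, valid_backups):
        self._lock = threading.Lock()
        # Every chunk in the eraser database starts as a deletion candidate.
        self.garbage = set(eraser_db)
        # Maps backup name -> set of chunk ids that backup references.
        self.valid_backups = valid_backups
        self.state = "gathering"

    def collect(self):
        """Garbage collector: drop chunks referenced by valid backups."""
        for chunks in self.valid_backups.values():
            with self._lock:
                self.garbage.difference_update(chunks)

    def ingest(self, chunk_ids):
        """Writer: filter any matched chunk out of the garbage list."""
        with self._lock:
            self.garbage.difference_update(chunk_ids)

    def begin_deletion(self):
        """Garbage collector: switch to the data deletion state."""
        with self._lock:
            self.state = "deleting"
            return frozenset(self.garbage)


def demo():
    store = DedupeStore(
        eraser_db={"a", "b", "c", "d"},
        valid_backups={"backup1": {"a"}, "backup2": {"b"}},
    )
    gc = threading.Thread(target=store.collect)
    writer = threading.Thread(target=store.ingest, args=(["c"],))
    gc.start()
    writer.start()
    gc.join()
    writer.join()
    return store.begin_deletion()
```

In `demo()`, chunks `a` and `b` are kept because valid backups reference them, `c` is kept because a writer re-ingested it during gathering, and only `d` remains a deletion candidate. The single lock is a simplification; the claim does not specify how the threads share the list.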
2. A computerized system of synchronizing a garbage collection thread and a writer threads in a dedupe file system in Data deletion state comprising: a processor configured to execute instructions; a memory containing instructions when executed on the processor, causes the processor to perform operations that: provide one or more writer threads concurrently working with garbage collector thread, referring to the garbage list of data chunks while ingesting data by: matching the data chunks with those present in the garbage list; for the matched data chunk add one or more hard links to the data chunk file in a temporary location, and wherein the hard links lock the data chunk file from deletion, wherein the hard links are a directory entry that associates a name with a data chunk file on a file system of the computerized system; with a garbage collector thread: iterate through the garbage list; and obtain an exclusive access of each data chunk and delete any data chunk that is not marked by the one or more writer threads as having two hard links.
This invention relates to a computerized system for synchronizing a garbage collector thread and writer threads in a dedupe file system while the file system is in the data deletion state. The system addresses the challenge of deleting unreferenced data chunks without removing chunks that active writer threads still need. The system includes a processor and memory storing instructions that run one or more writer threads concurrently with the garbage collector thread. Writer threads ingest data by matching incoming data chunks against the garbage list of data chunks. When a match is found, the writer thread adds one or more hard links to the chunk's file in a temporary location. A hard link is a directory entry that associates a name with a file, so the same chunk file can be reached under more than one name, and the extra link locks the chunk file against deletion. The garbage collector thread iterates through the garbage list and obtains exclusive access to each chunk. It deletes any chunk that the writer threads have not marked as having two hard links, since the absence of the second link indicates that no writer has claimed the chunk. This mechanism preserves data integrity by preventing deletion of chunks still in use while reclaiming the space held by unreferenced chunks.
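The hard-link check can be sketched in Python with the standard `os.link`, `os.stat` and `os.unlink` calls. This is a sketch under stated assumptions, not the patent's implementation: the function names `writer_protect` and `collect`, the directory layout in `demo`, and the single `threading.Lock` standing in for "exclusive access" are all hypothetical.

```python
import os
import tempfile
import threading

# Assumption: one process-wide lock stands in for per-chunk exclusive access.
chunk_lock = threading.Lock()


def writer_protect(chunk_path, temp_dir):
    """Writer: add a hard link in a temporary location to lock the chunk."""
    link_path = os.path.join(temp_dir, os.path.basename(chunk_path))
    with chunk_lock:
        if not os.path.exists(link_path):
            os.link(chunk_path, link_path)  # link count becomes 2
    return link_path


def collect(garbage_list):
    """Garbage collector: delete chunks that no writer has linked."""
    deleted = []
    for chunk_path in garbage_list:
        with chunk_lock:
            # Fewer than two links means no writer has claimed the chunk.
            if os.stat(chunk_path).st_nlink < 2:
                os.unlink(chunk_path)
                deleted.append(chunk_path)
    return deleted


def demo():
    # Both directories share one root so hard links stay on one file system.
    root = tempfile.mkdtemp()
    store = os.path.join(root, "chunks")
    temp = os.path.join(root, "tmp")
    os.mkdir(store)
    os.mkdir(temp)
    paths = []
    for name in ("chunk1", "chunk2"):
        path = os.path.join(store, name)
        with open(path, "wb") as f:
            f.write(b"data")
        paths.append(path)
    writer_protect(paths[0], temp)  # a writer re-uses chunk1
    return paths, collect(paths)
```

In `demo()`, `chunk1` survives because the writer's link raised its link count to two, while `chunk2` is deleted. Hard links only work within one file system, which is why the sketch places the temporary location next to the chunk store.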
April 14, 2020