Preparing source data to be replicated in a continuous data replication environment. Certain systems and methods populate a file name database with entries having a unique file identifier descriptor (FID), short name and a FID of the parent directory of each directory or file on a source storage device. Such information is advantageously gathered during scanning of a live file system without requiring a snapshot of the source storage device. The database can be further used to generate absolute file names associated with data operations to be replayed on a destination storage device. Based on the obtained FIDs, certain embodiments can further combine write operations to be replayed on the destination storage device and/or avoid replicating temporary files to the destination system.
Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
1. A method for identifying data to be copied in a data replication system, the method comprising: obtaining with a scanning module executing on a computing device a first file identifier descriptor (FID) of a first directory on a live source file system, the first FID being one of a plurality of unique identifiers corresponding to a plurality of directories and files on the source file system; adding the first FID to a queue; storing a current journal sequence number from a file system filter driver identifying a first time; following said storing, accessing a current directory of the plurality of directories on the source file system that corresponds to a next FID stored in the queue; obtaining additional FIDs for each immediate child directory and immediate child file in the current directory; if no changes have been made to the current directory since the first time, populating a file name database with the additional FIDs of each immediate child directory and immediate child file in the current directory, adding the additional FIDs of each immediate child directory of the current directory to the queue, and removing the next FID from the queue; and if changes have been made to the first directory since the first time, repeating said storing, said accessing and said obtaining the additional FIDs.
A method identifies data to be copied in a data replication system. A scanning module obtains the unique file identifier descriptor (FID) of a first directory on a live source file system and adds it to a queue. A current journal sequence number from the file system filter driver is then stored, marking a first time. Next, the method accesses the directory corresponding to the next FID in the queue and obtains FIDs for its immediate child directories and files. If the directory has not changed since the stored time, the file name database is populated with the child FIDs, the child directory FIDs are added to the queue, and the processed FID is removed from the queue. If changes are detected, the storing, accessing, and obtaining steps are repeated.
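The scan loop of claim 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the callables `list_children`, `current_jsn`, and `changed_since` are hypothetical stand-ins for the file system and filter driver interfaces, and the file name database is modeled as a plain dictionary.

```python
from collections import deque


def scan(root_fid, list_children, current_jsn, changed_since):
    """Scan a live directory tree and return a file name table.

    All three callables are assumptions made for this sketch:
      list_children(fid)      -> iterable of (child_fid, short_name, is_dir)
      current_jsn()           -> the current journal sequence number
      changed_since(fid, jsn) -> True if directory `fid` changed after `jsn`
    """
    queue = deque([root_fid])
    names = {}  # child FID -> (short name, parent FID)
    while queue:
        fid = queue[0]                # next FID; it stays queued until scanned cleanly
        jsn = current_jsn()           # store the journal sequence number first
        children = list(list_children(fid))
        if changed_since(fid, jsn):
            continue                  # directory changed mid-scan: repeat for the same FID
        for child_fid, short_name, is_dir in children:
            names[child_fid] = (short_name, fid)
            if is_dir:
                queue.append(child_fid)   # only directories are queued for later scanning
        queue.popleft()               # remove the processed FID
    return names
```

Leaving the FID at the head of the queue until its scan completes without an intervening change is one simple way to express the "repeat said storing, said accessing and said obtaining" branch.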
2. The method of claim 1 , wherein said obtaining the first FID is performed without performing a snapshot on the source file system.
The method for data replication from the previous description obtains the file identifier (FID) of a directory on a live file system, where obtaining the FID is performed without taking a snapshot of the source file system. This avoids performance overhead and disruption of normal operations.
3. The method of claim 1 , wherein the first directory is the current directory.
In the method of claim 1, the first directory is the current directory. That is, the directory whose FID was first obtained and queued is the same directory that is accessed and scanned for child FIDs.
4. The method of claim 1 , additionally comprising repeating said storing, said accessing and said obtaining the additional FIDs for each FID stored in the queue.
The method for data replication from the first description repeats the steps of storing the journal sequence number, accessing a directory, and obtaining file identifiers (FIDs) for each FID stored in the queue. This ensures that all directories and files in the source file system are scanned and processed for replication.
5. The method of claim 1 , wherein said populating the file name database comprises for each immediate child directory and immediate child file in the current directory and storing in the file name database: the additional FID for the immediate child directory or immediate child file; a corresponding short name for the immediate child directory or immediate child file; and the next FID as a parent directory of the immediate child directory or immediate child file.
The method for data replication from the first description populates a file name database. For each child directory and file, it stores the child's FID, its short name, and the parent directory's FID (obtained from the queue) within the file name database. This creates a hierarchical representation of the file system.
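One way to picture the three fields stored per entry is a single table keyed by FID. The sketch below uses an in-memory SQLite database. The table name `names`, the column names, and the helper functions are assumptions for illustration only, since the claim does not prescribe a storage engine or schema.

```python
import sqlite3


def open_name_db():
    """Create an in-memory file name database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE names ("
        " fid INTEGER PRIMARY KEY,"      # FID of the child directory or file
        " short_name TEXT NOT NULL,"     # name relative to its parent directory
        " parent_fid INTEGER)"           # FID of the parent directory
    )
    return conn


def populate(conn, parent_fid, children):
    """Store (fid, short_name) pairs found directly under `parent_fid`."""
    conn.executemany(
        "INSERT OR REPLACE INTO names (fid, short_name, parent_fid)"
        " VALUES (?, ?, ?)",
        [(fid, name, parent_fid) for fid, name in children],
    )
    conn.commit()
```

Because each row records only a short name and a parent FID, renaming or moving a directory touches one row rather than every descendant path.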
6. The method of claim 1 , wherein said changes comprise namespace changes to the current directory.
The method for data replication from the first description detects namespace changes to the current directory. These changes can include adding, deleting, or renaming files or subdirectories, which trigger a re-scanning of the changed directory.
7. The method of claim 1 , wherein the first directory comprises a root directory of the live source file system.
The method for data replication from the first description uses the root directory of the live source file system as the starting point. This ensures the entire file system is scanned, and no files or directories are missed during the replication process.
8. The method of claim 1 , additionally comprising monitoring at least one data management operation directed to first data stored in the source file system.
The method for data replication from the first description monitors data management operations directed at data stored in the source file system. These operations can include file creation, deletion, modification, or renaming. This allows tracking of changes made to the files on the system for replication to the destination system.
9. The method of claim 8 , additionally comprising replaying the at least one data management operation on replication data stored on a destination file system.
The method for data replication from the previous description replays the data management operations on replication data stored on a destination file system. This ensures the destination system maintains an up-to-date copy of the data on the source system, reflecting all changes made.
10. The method of claim 9 , additionally comprising: constructing, from information populated in the file name database, an absolute file name that corresponds to the location of the first data on the source file system; and transmitting the absolute file name to the destination system to direct said replaying of the at least one data management operation.
The method for data replication from the previous description constructs an absolute file name corresponding to the data's location on the source file system using the populated file name database. This absolute file name is then transmitted to the destination system, guiding the replay of data management operations to the correct location on the destination system.
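Building an absolute file name from the database amounts to following parent FIDs upward and joining the short names in reverse. The sketch below assumes the database is a dictionary mapping each FID to a `(short_name, parent_fid)` pair and that the root directory's FID has no entry of its own. The function name, the separator, and the cycle check are illustrative choices, not details from the patent.

```python
def absolute_name(names, fid, sep="/"):
    """Return the absolute name for `fid` by walking parent links to the root.

    `names` maps FID -> (short_name, parent_fid); the root FID is assumed
    to be absent from the mapping, which ends the walk.
    """
    parts = []
    seen = set()
    while fid in names:
        if fid in seen:
            raise ValueError("cycle in parent links")
        seen.add(fid)
        short_name, parent_fid = names[fid]
        parts.append(short_name)
        fid = parent_fid              # step up to the parent directory
    return sep + sep.join(reversed(parts))


# Example: root has FID 1, "docs" has FID 2, "b.txt" has FID 4.
table = {2: ("docs", 1), 4: ("b.txt", 2)}
print(absolute_name(table, 4))
```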
11. A system for preparing data for replication from a source computing device in a network, the system comprising: a queue configured to store a plurality of file identifier descriptors (FIDs) each comprising a unique identifier that corresponds to one of a plurality of directories and files on a source file system; a scanning module executing on a computing device and configured to scan the source file system while in a live state and to populate the queue with the plurality of FIDs; a database comprising file name data that associates each of the plurality of FIDs with a short name and a parent FID, wherein the scanning module is further configured to populate the database with the file name data based on said scan of the source file system in the live state; and at least one database thread configured to receive a data entry identifying a data management operation associated with at least one of the plurality of directories and files on the source file system and to construct from the FID associated with the at least one directory or file an absolute file name for transmission to a destination system along with a copy of the data management operation for replying on the destination system.
A system replicates data from a source to a destination. A queue stores unique file identifiers (FIDs) for files and directories on the source file system. A scanning module scans the live source file system and populates the queue with these FIDs. A database associates each FID with a short name and parent FID, populated by the scanning module. A database thread receives data entries identifying data management operations. Using the associated FID, it constructs an absolute file name and transmits it with a copy of the operation to the destination system for replay.
12. The system of claim 11 , wherein the scanning module is further configured to: access a current directory of the plurality of directories on the source file system that corresponds to a next FID in the queue; and obtain additional FIDs for each immediate child directory and immediate child file in the current directory.
In the system of claim 11, the scanning module accesses the directory corresponding to the next file identifier (FID) in the queue. It then obtains additional FIDs for each immediate child directory and immediate child file within that directory. If the queue is processed in first-in, first-out order, this amounts to a breadth-first traversal of the source file system.
13. The system of claim 12 , wherein the scanning module is further configured to: populate the file name database with the additional FIDs of each immediate child directory and immediate child file in the current directory, and add the additional FIDs of each immediate child directory of the current directory to the queue.
In the system of claim 12, the scanning module populates the file name database with the additional FIDs of each immediate child directory and immediate child file in the current directory. It also adds the FIDs of each immediate child directory to the queue for further processing. Only directory FIDs are queued, since files have no children to scan.
14. The system of claim 11 , further comprising a filter driver situated between the source file system and at least one application configured to request the data management operation.
The system of claim 11 also includes a filter driver that sits between the source file system and the applications requesting data management operations. This placement allows the system to intercept and monitor those requests as they reach the file system.
15. The system of claim 14 , wherein the filter driver is further configured to assign journal sequence numbers to each journal entry associated with a requested change to the source file system.
In the system of claim 14, the filter driver assigns a journal sequence number to each journal entry associated with a requested change to the source file system. This gives the changes a temporal order that can be used to keep replication consistent.
16. The system of claim 15 , wherein the scanning module is further configured to receive a current journal sequence number from the filter driver prior to accessing the current directory.
In the system of claim 15, the scanning module receives a current journal sequence number from the filter driver before accessing the current directory. This number serves as a reference point in time against which later changes can be detected. It is not a snapshot of the file system.
17. The system of claim 16 , wherein the scanning module is configured to repeat said accessing and obtaining when changes are detected to the current directory following a time of the current journal sequence number but prior to said obtaining.
In the system of claim 16, the scanning module repeats the accessing and obtaining steps if the current directory changes after the time of the current journal sequence number but before the FIDs are obtained. Directories modified during the scan are therefore re-scanned, which keeps the collected data consistent.
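The interplay of claims 15 to 17 can be illustrated with a small in-memory journal. This is a sketch under stated assumptions: the `Journal` class, its method names, and the tuple layout of an entry are hypothetical and stand in for whatever the filter driver actually records.

```python
import itertools


class Journal:
    """Toy journal that numbers each recorded change (hypothetical API)."""

    def __init__(self):
        self._counter = itertools.count(1)
        self.entries = []   # (sequence number, directory FID, operation)
        self.last = 0       # most recently assigned sequence number

    def record(self, dir_fid, operation):
        """Assign the next journal sequence number to a change in `dir_fid`."""
        seq = next(self._counter)
        self.entries.append((seq, dir_fid, operation))
        self.last = seq
        return seq

    def changed_since(self, dir_fid, jsn):
        """True if any entry for `dir_fid` was recorded after `jsn`."""
        return any(seq > jsn and d == dir_fid for seq, d, _ in self.entries)
```

A scanner would read `last` before listing a directory and call `changed_since` afterwards, repeating the listing whenever the check returns true.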
May 27, 2011
July 16, 2013