US-20260037392-A1

Automated Failsafe Data Recovery

Published: February 5, 2026

Assignee: not available in USPTO data we have

Inventors: Yi Fang, Kedar Nitin Shah, Yantao Song

Technical Abstract

A system is disclosed for recovering historical table data in a database environment. The system includes at least one hardware processor and at least one memory. The memory stores instructions that, when executed, cause the system to receive a request to recover historical table data of a source table. The historical table data includes multiple partition files, and each partition file includes a deleted file designation. The system performs a recovery process on the partition files by determining a recoverable time range for the source table based on lifecycle information and restoring the partition files based on the recoverable time range. The system retrieves a schema associated with the historical table data and generates metadata corresponding to the schema. The metadata is associated with the recovered partition files to reconstruct the historical table data. This approach allows efficient and reliable recovery of deleted or lost table data.

Patent Claims

Legal claims defining the scope of protection, as filed with the USPTO.

A system comprising: at least one hardware processor; and at least one memory storing instructions that, when executed by the at least one hardware processor, cause the system to perform operations comprising: receiving a request to recover historical table data of a source table, the historical table data comprising a plurality of partition files, each partition file including a deleted file designation; performing a recovery process on the plurality of partition files to obtain recovered partition files, the recovery process comprising: determining a recoverable time range for the source table based on lifecycle information; and restoring the plurality of partition files based on the recoverable time range; retrieving a schema associated with the historical table data; generating metadata corresponding to the retrieved schema; and associating the metadata with the recovered partition files to reconstruct the historical table data.

The system of claim 1, wherein the recovery process further comprises: removing the deleted file designation from each of the plurality of partition files.

The system of claim 1, wherein the recoverable time range is determined based on a threshold period and a current timestamp associated with the request.

The system of claim 1, wherein the recovery process is performed during a failsafe period associated with the source table.

The system of claim 1, the operations further comprising: verifying authorization of the request based on at least one secure credential associated with the requester.

The system of claim 1, wherein the recovery process further comprises: generating a temporary table using the retrieved schema associated with the historical table data.

The system of claim 6, the operations further comprising: retrieving a plurality of active metadata files associated with the historical table data.

The system of claim 7, the operations further comprising: parsing the plurality of active metadata files to determine the plurality of partition files forming the historical table data.

The system of claim 1, the operations further comprising: generating a notification indicating completion of the recovery process and transmitting the notification to an account associated with the requester.

A method comprising: receiving, by at least one hardware processor, a request to recover historical table data of a source table, the historical table data comprising a plurality of partition files, each partition file including a deleted file designation; performing a recovery process on the plurality of partition files to obtain recovered partition files, the recovery process comprising: determining a recoverable time range for the source table based on lifecycle information; and restoring the plurality of partition files based on the recoverable time range; retrieving a schema associated with the historical table data; generating metadata corresponding to the retrieved schema; and associating the metadata with the recovered partition files to reconstruct the historical table data.

The method of claim 10, wherein the recovery process further comprises: removing the deleted file designation from each of the plurality of partition files.

The method of claim 11, wherein the recoverable time range is determined based on a threshold period and a current timestamp associated with the request.

The method of claim 12, wherein the recovery process is performed during a failsafe period associated with the source table.

The method of claim 10, further comprising: verifying authorization of the request based on at least one secure credential associated with the requester.

The method of claim 10, further comprising: generating a temporary table using the retrieved schema associated with the historical table data; retrieving a plurality of active metadata files associated with the historical table data; and parsing the plurality of active metadata files to determine the plurality of partition files forming the historical table data.

A computer-storage medium comprising instructions that, when executed by one or more processors of a machine, configure the machine to perform operations comprising: receiving a request to recover historical table data of a source table, the historical table data comprising a plurality of partition files, each partition file including a deleted file designation; performing a recovery process on the plurality of partition files to obtain recovered partition files, the recovery process comprising: determining a recoverable time range for the source table based on lifecycle information; and restoring the plurality of partition files based on the recoverable time range; retrieving a schema associated with the historical table data; generating metadata corresponding to the retrieved schema; and associating the metadata with the recovered partition files to reconstruct the historical table data.

The computer-storage medium of claim 16, wherein the recovery process further comprises: removing the deleted file designation from each of the plurality of partition files.

The computer-storage medium of claim 17, wherein the recoverable time range is determined based on a threshold period and a current timestamp associated with the request.

The computer-storage medium of claim 18, wherein the recovery process is performed during a failsafe period associated with the source table.

The computer-storage medium of claim 16, the operations further comprising: verifying authorization of the request based on at least one secure credential associated with the requester.

Detailed Description

Complete technical specification and implementation details from the patent document.

This application is a Continuation of U.S. Patent Application Serial No. 18/192,269, filed March 29, 2023, the contents of which are hereby incorporated by reference.

Embodiments of the disclosure relate generally to data recovery configurations and, more specifically, to techniques for automated failsafe data recovery in a database system.

Databases are widely used for data storage and access in computing applications. A goal of database storage is to provide large amounts of information in an organized manner so that it can be accessed, managed, updated, and shared. In a database, data may be organized into rows, columns, and tables. Databases are used by various entities and companies for storing information that may need to be accessed or analyzed. However, databases can be susceptible to extreme operational disasters that cause data loss, and recovering the lost data can be time-consuming and challenging.

Reference will now be made in detail to specific example embodiments for carrying out the inventive subject matter. Examples of these specific embodiments are illustrated in the accompanying drawings, and specific details are outlined in the following description to provide a thorough understanding of the subject matter. It will be understood that these examples are not intended to limit the scope of the claims to the illustrated embodiments. On the contrary, they are intended to cover such alternatives, modifications, and equivalents as may be included within the scope of the disclosure.

In the present disclosure, physical units of data that are stored in a data platform—and that make up the content of, e.g., database tables in customer accounts—are referred to as micro-partitions. In different implementations, a data platform may store metadata in micro-partitions as well. The term “micro-partitions” is distinguished in this disclosure from the term “files,” which, as used herein, refers to data units such as image files (e.g., Joint Photographic Experts Group (JPEG) files, Portable Network Graphics (PNG) files, etc.), video files (e.g., Moving Picture Experts Group (MPEG) files, MPEG-4 (MP4) files, Advanced Video Coding High Definition (AVCHD) files, etc.), Portable Document Format (PDF) files, documents that are formatted to be compatible with one or more word-processing applications, documents that are formatted to be compatible with one or more spreadsheet applications, and/or the like. If stored internal to the data platform, a given file is referred to herein as an “internal file” and may be stored in (or at, or on, etc.) what is referred to herein as an “internal storage location.” If stored external to the data platform, a given file is referred to herein as an “external file” and is referred to as being stored in (or at, or on, etc.) what is referred to herein as an “external storage location.” These terms are further discussed below.

Computer-readable files come in several varieties, including unstructured files, semi-structured files, and structured files. These terms may mean different things to different people. As used herein, examples of unstructured files include image files, video files, PDFs, audio files, and the like; examples of semi-structured files include JavaScript Object Notation (JSON) files, eXtensible Markup Language (XML) files, and the like; and examples of structured files include Variant Call Format (VCF) files, Keithley Data File (KDF) files, Hierarchical Data Format version 5 (HDF5) files, and the like. As known to those of skill in the relevant arts, VCF files are often used in the bioinformatics field for storing, e.g., gene-sequence variations, KDF files are often used in the semiconductor industry for storing, e.g., semiconductor-testing data, and HDF5 files are often used in industries such as the aeronautics industry, in that case for storing data such as aircraft-emissions data. Numerous other example unstructured-file types, semi-structured-file types, and structured-file types, as well as example uses thereof, could certainly be listed here as well and will be familiar to those of skill in the relevant arts. Different people of skill in the relevant arts may classify types of files differently among these categories and may use one or more different categories instead of or in addition to one or more of these.

Data platforms are widely used for data storage and data access in computing and communication contexts. Concerning architecture, a data platform could be an on-premises data platform, a network-based data platform (e.g., a cloud-based data platform), a combination of the two, and/or include another type of architecture. Concerning the type of data processing, a data platform could implement online analytical processing (OLAP), online transactional processing (OLTP), a combination of the two, and/or another type of data processing. Moreover, a data platform could be or include a relational database management system (RDBMS) and/or one or more other types of database management systems.

In a typical implementation, a data platform may include one or more databases that are respectively maintained in association with any number of customer accounts (e.g., accounts of one or more data providers), as well as one or more databases associated with a system account (e.g., an administrative account) of the data platform, one or more other databases used for administrative purposes, and/or one or more other databases that are maintained in association with one or more other organizations and/or for any other purposes. A data platform may also store metadata (e.g., account object metadata) in association with the data platform in general and in association with, for example, particular databases and/or particular customer accounts as well. Users and/or executing processes that are associated with a given customer account may, via one or more types of clients, be able to cause data to be ingested into the database, and may also be able to manipulate the data, add additional data, remove data, run queries against the data, generate views of the data, and so forth. As used herein, the terms “account object metadata” and “account object” are used interchangeably.

In an implementation of a data platform, a given database (e.g., a database maintained for a customer account) may reside as an object within, e.g., a customer account, which may also include one or more other objects (e.g., users, roles, grants, shares, warehouses, resource monitors, integrations, network policies, and/or the like). Furthermore, a given object such as a database may itself contain one or more objects such as schemas, tables, materialized views, and/or the like. A given table may be organized as a collection of records (e.g., rows) so that each includes a plurality of attributes (e.g., columns). In some implementations, database data is physically stored across multiple storage units, which may be referred to as files, blocks, partitions, micro-partitions, and/or by one or more other names. In many cases, a database on a data platform serves as a backend for one or more applications that are executing on one or more application servers.

The disclosed data recovery (DR) techniques can be configured and performed by a DR manager in a network-based database system to automate data recovery including recovery of historical table data.

Conventional techniques for recovery of historical table data in the event of system failures or user mistakes (or other data availability issues associated with extreme operational disasters) include a multi-step process that is resource-intensive, slow, and inefficient. For example, existing techniques include copying data from a source table’s files and using a custom script to create a new table. If the recovery includes a large amount of data, then data recovery can take a very long time because copying the data is time-consuming. Additionally, this conventional data recovery process has multiple manual steps that support engineers must run, rendering the process error-prone and difficult to sustain during a large-scale recovery. For example, if a source table is dropped and expired, recreating the table schema can be a painstaking manual process.

The disclosed DR techniques can be performed by a DR manager (e.g., as described herein) and can be used to configure the failsafe recovery process more efficiently. More specifically, the disclosed DR techniques associate recovered data files of a table (e.g., partition files of a table) with a newly recovered table (e.g., a table having the same schema as a deleted table), without the need to copy the data from the recovered data files. Since the disclosed DR techniques include fewer manual steps than conventional DR techniques, DR using the disclosed techniques is less error-prone and more efficient. Additionally, since the disclosed techniques are associated with reduced involvement of database support engineers, failsafe recovery can be automated to address any increased recovery demands in a network-based database system.

The various embodiments that are described herein are described with reference where appropriate to one or more of the various figures. An example computing environment including a DR manager configured to perform DR-related functions is discussed in connection with FIGS. 1-3. Example lifecycles of data are illustrated in FIG. 4 and FIG. 5. An example DR manager in communication with a metadata compactor manager and a file delete manager is illustrated in FIG. 6. Example DR-related functionalities performed using the disclosed techniques are discussed in connection with FIGS. 7-12. A more detailed discussion of example computing devices that may be used with the disclosed techniques is provided in connection with FIG. 13.

FIG. 1 illustrates an example computing environment 100 that includes a database system in the example form of a network-based database system 102, according to some example embodiments. To avoid obscuring the inventive subject matter with unnecessary detail, various functional components that are not germane to conveying an understanding of the inventive subject matter have been omitted from FIG. 1. However, a skilled artisan will readily recognize that various additional functional components may be included as part of the computing environment 100 to facilitate additional functionality that is not specifically described herein. In other embodiments, the computing environment may comprise another type of network-based database system or a cloud data platform. For example, in some aspects, the computing environment 100 may include a cloud computing platform 101 with the network-based database system 102, and a storage platform 104 (also referred to as a cloud storage platform). The cloud computing platform 101 provides computing resources and storage resources that may be acquired (purchased) or leased and configured to execute applications and store data.

The cloud computing platform 101 may host a cloud computing service 103 that facilitates storage of data on the cloud computing platform 101 (e.g., data management and access) and analysis functions (e.g., SQL queries, analysis), as well as other processing capabilities (e.g., configuring replication group objects as described herein). The cloud computing platform 101 may include a three-tier architecture: data storage (e.g., storage platforms 104 and 122), an execution platform 110 (e.g., providing query processing), and a compute service manager 108 providing cloud services.

It is often the case that organizations that are customers of a given data platform also maintain data storage (e.g., a data lake) that is external to the data platform (i.e., one or more external storage locations). For example, a company could be a customer of a particular data platform and also separately maintain storage of any number of files (be they unstructured files, semi-structured files, structured files, and/or files of one or more other types) on, as examples, one or more of their servers and/or on one or more cloud-storage platforms such as AMAZON WEB SERVICES™ (AWS™), MICROSOFT® AZURE®, GOOGLE CLOUD PLATFORM™, and/or the like. The customer’s servers and cloud-storage platforms are both examples of what a given customer could use as what is referred to herein as an external storage location. The cloud computing platform 101 could also use a cloud-storage platform as what is referred to herein as an internal storage location concerning the data platform.

From the perspective of the network-based database system 102 of the cloud computing platform 101, one or more files that are stored at one or more storage locations are referred to herein as being organized into one or more of what is referred to herein as either “internal stages” or “external stages.” Internal stages are stages that correspond to data storage at one or more internal storage locations, whereas external stages are stages that correspond to data storage at one or more external storage locations. In this regard, external files can be stored in external stages at one or more external storage locations, and internal files can be stored in internal stages at one or more internal storage locations, which can include servers managed and controlled by the same organization (e.g., company) that manages and controls the data platform, and which can instead or in addition include data-storage resources operated by a storage provider (e.g., a cloud-storage platform) that is used by the data platform for its “internal” storage. The internal storage of a data platform is also referred to herein as the “storage platform” of the data platform. It is further noted that a given external file that a given customer stores at a given external storage location may or may not be stored in an external stage in the external storage location; that is, in some data-platform implementations, it is a customer’s choice whether to create one or more external stages (e.g., one or more external-stage objects) in the customer’s data-platform account as an organizational and functional construct for conveniently interacting via the data platform with one or more external files.

As shown, the network-based database system 102 of the cloud computing platform 101 is in communication with the cloud storage platforms 104 and 122 (e.g., AWS®, Microsoft Azure Blob Storage®, or Google Cloud Storage). The network-based database system 102 is a network-based system used for reporting and analysis of integrated data from one or more disparate sources including one or more storage locations within the cloud storage platform 104. The cloud storage platform 104 comprises a plurality of computing machines and provides on-demand computer system resources such as data storage and computing power to the network-based database system 102.

The network-based database system 102 comprises a compute service manager 108, an execution platform 110, and one or more metadata databases 112. The network-based database system 102 hosts and provides data reporting and analysis services to multiple client accounts.

The compute service manager 108 coordinates and manages operations of the network-based database system 102. The compute service manager 108 also performs query optimization and compilation as well as managing clusters of computing services that provide compute resources (also referred to as “virtual warehouses”). The compute service manager 108 can support any number of client accounts such as end-users providing data storage and retrieval requests, system administrators managing the systems and methods described herein, and other components/devices that interact with compute service manager 108.

The compute service manager 108 is also in communication with a client device 114. The client device 114 corresponds to a user of one of the multiple client accounts supported by the network-based database system 102. A user may utilize the client device 114 to submit data storage, retrieval, and analysis requests to the compute service manager 108. Client device 114 (also referred to as remote computing device or user device 114) may include one or more of a laptop computer, a desktop computer, a mobile phone (e.g., a smartphone), a tablet computer, a cloud-hosted computer, cloud-hosted serverless processes, or other computing processes or devices, and may be used (e.g., by a data provider) to access services provided by the cloud computing platform 101 (e.g., cloud computing service 103) by way of a network 106, such as the Internet or a private network. A data consumer 115 can use another computing device to access the data of the data provider (e.g., data obtained via the client device 114).

In the description below, actions are ascribed to users, particularly consumers and providers. Such actions shall be understood to be performed with respect to client device (or devices) 114 operated by such users. For example, a notification to a user may be understood to be a notification transmitted to the client device 114, input or instruction from a user may be understood to be received by way of the client device 114, and interaction with an interface by a user shall be understood to be interaction with the interface on the client device 114. In addition, database operations (joining, aggregating, analysis, etc.) ascribed to a user (consumer or provider) shall be understood to include performing such actions by the cloud computing service 103 in response to an instruction from that user.

In some embodiments, the client device 114 is configured with an application connector 128, which may be configured to perform DR configuration functions (e.g., select a time-travel duration for a period during which deleted data can be recovered). In some aspects, the DR configuration functions are used to generate one or more DR configurations 130 for communication to the network-based database system 102 via network 106. For example, DR configurations 130 can be communicated to DR manager 132 within the compute service manager 108.
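As a rough illustration of how a configured duration could translate into a recoverable window, the following Python sketch counts a threshold period back from the timestamp of a recovery request, in the spirit of the claim language on a "threshold period and a current timestamp." The class name `DRConfiguration`, the field `time_travel_days`, and the use of whole days are assumptions made for this example and do not come from the disclosure.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DRConfiguration:
    """Hypothetical DR configuration sent from a client to the DR manager."""
    time_travel_days: int  # assumed unit: whole days


def recoverable_time_range(config: DRConfiguration, request_time: datetime):
    """Return (earliest, latest) timestamps of data that can still be recovered.

    Assumption: the window is the threshold period counted back from the
    timestamp of the recovery request.
    """
    if config.time_travel_days < 0:
        raise ValueError("time_travel_days must be non-negative")
    earliest = request_time - timedelta(days=config.time_travel_days)
    return earliest, request_time


def is_recoverable(deleted_at: datetime, config: DRConfiguration,
                   request_time: datetime) -> bool:
    """True if a deletion timestamp falls inside the recoverable window."""
    earliest, latest = recoverable_time_range(config, request_time)
    return earliest <= deleted_at <= latest


if __name__ == "__main__":
    cfg = DRConfiguration(time_travel_days=7)
    now = datetime(2026, 2, 5, tzinfo=timezone.utc)
    print(recoverable_time_range(cfg, now))
```

A real system would likely derive the window from per-table lifecycle information rather than a single account-level setting; the sketch only shows the arithmetic.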

The compute service manager 108 is also coupled to one or more metadata databases 112 that store metadata about various functions and aspects associated with the network-based database system 102 and its users. For example, a metadata database 112 may include a summary of data stored in remote data storage systems as well as data available from a local cache. Additionally, a metadata database 112 may include information regarding how data is organized in remote data storage systems (e.g., the cloud storage platform 104) and the local caches. Information stored by a metadata database 112 allows systems and services to determine whether a piece of data needs to be accessed without loading or accessing the actual data from a storage device. In some embodiments, metadata database 112 is configured to store account object metadata (e.g., account objects used in connection with a replication group object).
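One common way a metadata summary can answer "does this data need to be accessed?" without touching the data is to keep a value range per file and skip files whose range cannot match a query. The sketch below illustrates that idea only; the disclosure does not say which summaries are stored, so the per-file minimum and maximum, the `FileSummary` type, and the `files_to_scan` function are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileSummary:
    """Hypothetical per-file summary kept in a metadata database."""
    name: str
    min_value: int  # smallest value of some column in the file (assumed)
    max_value: int  # largest value of that column in the file (assumed)


def files_to_scan(summaries, low, high):
    """Return names of files whose [min, max] range overlaps [low, high].

    Only the summaries are consulted; no file contents are loaded.
    """
    return [s.name for s in summaries
            if s.max_value >= low and s.min_value <= high]


if __name__ == "__main__":
    catalog = [FileSummary("f1", 0, 9), FileSummary("f2", 10, 19)]
    print(files_to_scan(catalog, 12, 15))
```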

The compute service manager 108 is further coupled to the execution platform 110, which provides multiple computing resources that execute various data storage and data retrieval tasks. As illustrated in FIG. 3, the execution platform 110 comprises a plurality of compute nodes. The execution platform 110 is coupled to storage platform 104 and cloud storage platforms 122. The storage platform 104 comprises multiple data storage devices 120-1 to 120-N. In some embodiments, the data storage devices 120-1 to 120-N are cloud-based storage devices located in one or more geographic locations. For example, the data storage devices 120-1 to 120-N may be part of a public cloud infrastructure or a private cloud infrastructure. The data storage devices 120-1 to 120-N may be hard disk drives (HDDs), solid-state drives (SSDs), storage clusters, Amazon S3™ storage systems, or any other data-storage technology. Additionally, the cloud storage platform 104 may include distributed file systems (such as Hadoop Distributed File Systems (HDFS)), object storage systems, and the like. In some embodiments, at least one internal stage 126 may reside on one or more of the data storage devices 120-1 to 120-N, and at least one external stage 124 may reside on one or more of the cloud storage platforms 122.

In some embodiments, the compute service manager 108 includes the DR manager 132. The DR manager 132 comprises suitable circuitry, interfaces, logic, and/or code and is configured to perform DR-related functions which can be based (at least partially) on one or more of the DR configurations 130. For example, the DR manager 132 can configure one or more of the DR functionalities discussed in connection with FIGS. 4-12 based on the DR configurations 130 or other configurations related to data recovery within the network-based database system 102.

In some aspects, the DR manager 132 can include one or more system functions that can be used to check the source table’s information and determine if it should be recovered by failsafe recovery (e.g., using the disclosed techniques). The different types of data recovery which may be available in the network-based database system 102 (e.g., time-travel and failsafe recovery) are discussed in connection with FIGS. 4-5. If failsafe recovery is available (e.g., to an account of a customer of the network-based database system), the DR manager 132 will undelete the partition files associated with the source table from internal storage (e.g., AWS S3 storage) (e.g., internal stages of storage platform 104), if such partition files are deleted in storage. After the partition files are made available, the DR manager 132 can create a new table based on the source table’s metadata information. The partition files can then be associated with the new table (e.g., similar to a cloning process). Since such a cloning process is only a metadata operation without copying a significant amount of data, the DR-related functions performed by the DR manager 132 are more efficient in comparison to conventional DR techniques which involve copying a significant amount of data.
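The three steps described above (undelete the partition files, create a new table from the source table's metadata, associate the files with the new table) can be sketched in Python as follows. This is a toy in-memory model under stated assumptions, not the disclosed implementation: `PartitionFile`, `Table`, `failsafe_recover`, and the dictionary standing in for internal storage are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionFile:
    name: str
    deleted: bool = False  # stands in for the "deleted file designation"


@dataclass
class Table:
    name: str
    schema: dict                                   # column name -> type name
    partition_names: list = field(default_factory=list)


def failsafe_recover(source: Table, storage: dict, new_name: str) -> Table:
    """Recover `source` as a new table without copying partition data.

    `storage` maps file names to PartitionFile objects and is a stand-in
    for internal storage (an assumption of this sketch).
    """
    # Step 1: undelete any partition files that are marked deleted in storage.
    for pname in source.partition_names:
        if pname not in storage:
            raise KeyError(f"partition file {pname!r} is no longer in storage")
        storage[pname].deleted = False

    # Step 2: create a new table from the source table's metadata (its schema).
    recovered = Table(name=new_name, schema=dict(source.schema))

    # Step 3: associate the existing files with the new table. Only file
    # names (metadata) are recorded; file contents are never read or copied.
    recovered.partition_names = list(source.partition_names)
    return recovered


if __name__ == "__main__":
    store = {"p1": PartitionFile("p1", deleted=True)}
    dropped = Table("orders", {"id": "NUMBER"}, ["p1"])
    print(failsafe_recover(dropped, store, "orders_recovered"))
```

Because step 3 records references rather than data, the cost of the sketch grows with the number of partition files and not with their size, which is the efficiency argument the paragraph makes.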

In some embodiments, during a DR process, the compute service manager 108 can use the disclosed DR functionalities to check if the files are still available or not in the storage platform 104, and recover the files if they are marked as “deleted.” After that, the execution platform 110 can scan those files in the storage platform 104.

In some embodiments, communication links between elements of the computing environment 100 are implemented via one or more data communication networks. These data communication networks may utilize any communication protocol and any type of communication medium. In some embodiments, the data communication networks are a combination of two or more data communication networks (or sub-networks) coupled to one another. In alternate embodiments, these communication links are implemented using any type of communication medium and any communication protocol.

The compute service manager 108, metadata database(s) 112, execution platform 110, and storage platform 104 are shown in FIG. 1 as individual discrete components. However, each of the compute service manager 108, metadata database(s) 112, execution platform 110, and storage platform 104 may be implemented as a distributed system (e.g., distributed across multiple systems/platforms at multiple geographic locations). Additionally, each of the compute service manager 108, metadata database(s) 112, execution platform 110, and storage platform 104 can be scaled up or down (independently of one another) depending on changes to the requests received and the changing needs of the network-based database system 102. Thus, in the described embodiments, the network-based database system 102 is dynamic and supports regular changes to meet the current data processing needs.

During a typical operation, the network-based database system 102 processes multiple jobs determined by the compute service manager 108. These jobs are scheduled and managed by the compute service manager 108 to determine when and how to execute the job. For example, the compute service manager 108 may divide the job into multiple discrete tasks and may determine what data is needed to execute each of the multiple discrete tasks. The compute service manager 108 may assign each of the multiple discrete tasks to one or more nodes of the execution platform 110 to process the task. The compute service manager 108 may determine what data is needed to process a task and further determine which nodes within the execution platform 110 are best suited to process the task. Some nodes may have already cached the data needed to process the task and, therefore, be a good candidate for processing the task. Metadata stored in a metadata database 112 assists the compute service manager 108 in determining which nodes in the execution platform 110 have already cached at least a portion of the data needed to process the task. One or more nodes in the execution platform 110 process the task using data cached by the nodes and, if necessary, data retrieved from the cloud storage platform 104. It is desirable to retrieve as much data as possible from caches within the execution platform 110 because the retrieval speed is typically much faster than retrieving data from the cloud storage platform 104.
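The cache-aware assignment described above can be illustrated with a small Python sketch that sends each task to the node already caching the most of the data it needs. The function names `assign_tasks` and `files_to_fetch`, the set-based representation of caches, and the tie-breaking rule are assumptions for this example; the disclosure does not specify a scoring rule, and the sketch ignores node load entirely.

```python
def assign_tasks(tasks: dict, node_caches: dict) -> dict:
    """Assign each task to the node that already caches most of its data.

    tasks:       task id -> set of file names the task needs
    node_caches: node id -> set of file names cached on that node,
                 as reported by metadata (an assumption of this sketch)
    Returns task id -> chosen node id.
    """
    if not node_caches:
        raise ValueError("at least one node is required")
    assignment = {}
    for task_id, needed in tasks.items():
        # Pick the node with the largest cache overlap; ties go to the
        # smallest node id so the result is deterministic.
        best = min(node_caches,
                   key=lambda n: (-len(needed & node_caches[n]), n))
        assignment[task_id] = best
    return assignment


def files_to_fetch(needed: set, cached: set) -> set:
    """Files the chosen node must still retrieve from remote storage."""
    return needed - cached


if __name__ == "__main__":
    demo_tasks = {"t1": {"a", "b"}, "t2": {"c"}}
    demo_caches = {"n1": {"a", "b"}, "n2": {"c"}}
    print(assign_tasks(demo_tasks, demo_caches))
```

A production scheduler would also weigh node load and data size; the point here is only that the choice can be made from metadata before any data is read.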

As shown in FIG. 1, the cloud computing platform 101 of the computing environment 100 separates the execution platform 110 from the storage platform 104. In this arrangement, the processing resources and cache resources in the execution platform 110 operate independently of the data storage devices 120-1 to 120-N in the cloud storage platform 104. Thus, the computing resources and cache resources are not restricted to specific data storage devices 120-1 to 120-N. Instead, all computing resources and all cache resources may retrieve data from, and store data to, any of the data storage resources in the cloud storage platform 104.

FIG. 2 is a block diagram illustrating components of the compute service manager 108, according to some example embodiments. As shown in FIG. 2, the compute service manager 108 includes an access manager 202 and a key manager 204 coupled to an access metadata database 206, which is an example of the metadata database(s) 112. Access manager 202 handles authentication and authorization tasks for the systems described herein. The key manager 204 facilitates the use of remotely stored credentials to access external resources such as data resources in a remote storage device. As used herein, the remote storage devices may also be referred to as “persistent storage devices” or “shared storage devices.” For example, the key manager 204 may create and maintain remote credential store definitions and credential objects (e.g., in the access metadata database 206). A remote credential store definition identifies a remote credential store and includes access information to access security credentials from the remote credential store. A credential object identifies one or more security credentials using non-sensitive information (e.g., text strings) that are to be retrieved from a remote credential store for use in accessing an external resource. When a request invoking an external resource is received at run time, the key manager 204 and access manager 202 use information stored in the access metadata database 206 (e.g., a credential object and a credential store definition) to retrieve security credentials used to access the external resource from a remote credential store.

A request processing service 208 manages received data storage requests and data retrieval requests (e.g., jobs to be performed on database data). For example, the request processing service 208 may determine the data to process a received query (e.g., a data storage request or data retrieval request). The data may be stored in a cache within the execution platform 110 or in a data storage device in storage platform 104.

A management console service 210 supports access to various systems and processes by administrators and other system managers. Additionally, the management console service 210 may receive a request to execute a job and monitor the workload on the system.

The compute service manager 108 also includes a job compiler 212, a job optimizer 214, and a job executor 216. The job compiler 212 parses a job into multiple discrete tasks and generates the execution code for each of the multiple discrete tasks. The job optimizer 214 determines the best method to execute the multiple discrete tasks based on the data that needs to be processed. Job optimizer 214 also handles various data pruning operations and other data optimization techniques to improve the speed and efficiency of executing the job. The job executor 216 executes the execution code for jobs received from a queue or determined by the compute service manager 108.

A job scheduler and coordinator 218 sends received jobs to the appropriate services or systems for compilation, optimization, and dispatch to the execution platform 110. For example, jobs may be prioritized and then processed in that prioritized order. In an embodiment, the job scheduler and coordinator 218 determines a priority for internal jobs that are scheduled by the compute service manager 108 with other “outside” jobs such as user queries that may be scheduled by other systems in the database but may utilize the same processing resources in the execution platform 110. In some embodiments, the job scheduler and coordinator 218 identifies or assigns particular nodes in the execution platform 110 to process particular tasks. A virtual warehouse manager 220 manages the operation of multiple virtual warehouses implemented in the execution platform 110. For example, the virtual warehouse manager 220 may generate query plans for executing received queries.

Additionally, the compute service manager 108 includes configuration and metadata manager 222, which manages the information related to the data stored in the remote data storage devices and the local buffers (e.g., the buffers in execution platform 110). Configuration and metadata manager 222 uses metadata to determine which data files need to be accessed to retrieve data for processing a particular task or job. A monitor and workload analyzer 224 oversees processes performed by the compute service manager 108 and manages the distribution of tasks (e.g., workload) across the virtual warehouses and execution nodes in the execution platform 110. The monitor and workload analyzer 224 also redistributes tasks, as needed, based on changing workloads throughout the network-based database system 102 and may further redistribute tasks based on a user (e.g., “external”) query workload that may also be processed by the execution platform 110. The configuration and metadata manager 222 and the monitor and workload analyzer 224 are coupled to a data storage device 226. The data storage device 226 in FIG. 2 represents any data storage device within the network-based database system 102. For example, data storage device 226 may represent buffers in execution platform 110, storage devices in storage platform 104, or any other storage device.

As described in embodiments herein, the compute service manager 108 validates all communication from an execution platform (e.g., the execution platform 110) to validate that the content and context of that communication are consistent with the task(s) known to be assigned to the execution platform. For example, an instance of the execution platform executing query A should not be allowed to request access to data source D (e.g., data storage device 226) that is not relevant to query A. Similarly, a given execution node (e.g., execution node 302-1) may need to communicate with another execution node (e.g., execution node 302-2), and should be disallowed from communicating with a third execution node (e.g., execution node 312-1) and any such illicit communication can be recorded (e.g., in a log or other location). Also, the information stored on a given execution node is restricted to data relevant to the current query and any other data is unusable, rendered so by destruction or encryption where the key is unavailable.

As previously mentioned, the compute service manager 108 includes the DR manager 132 configured to perform the disclosed DR functionalities which are discussed in connection with at least FIGS. 4-12.

FIG. 3 is a block diagram illustrating components of the execution platform 110, according to some example embodiments. As shown in FIG. 3, the execution platform 110 includes multiple virtual warehouses, including virtual warehouse 1 (or 301-1), virtual warehouse 2 (or 301-2), and virtual warehouse N (or 301-N). Each virtual warehouse includes multiple execution nodes that each include a data cache and a processor. The virtual warehouses can execute multiple tasks in parallel by using multiple execution nodes. As discussed herein, the execution platform 110 can add new virtual warehouses and drop existing virtual warehouses in real-time based on the current processing needs of the systems and users. This flexibility allows the execution platform 110 to quickly deploy large amounts of computing resources when needed without being forced to continue paying for those computing resources when they are no longer needed. All virtual warehouses can access data from any data storage device (e.g., any storage device in the cloud storage platform 104).

Although each virtual warehouse shown in FIG. 3 includes three execution nodes, a particular virtual warehouse may include any number of execution nodes. Further, the number of execution nodes in a virtual warehouse is dynamic, such that new execution nodes are created when additional demand is present, and existing execution nodes are deleted when they are no longer necessary.

Each virtual warehouse is capable of accessing any of the data storage devices 120-1 to 120-N shown in FIG. 1. Thus, the virtual warehouses are not necessarily assigned to a specific data storage device 120-1 to 120-N and, instead, can access data from any of the data storage devices 120-1 to 120-N within the cloud storage platform 104. Similarly, each of the execution nodes shown in FIG. 3 can access data from any of the data storage devices 120-1 to 120-N. In some embodiments, a particular virtual warehouse or a particular execution node may be temporarily assigned to a specific data storage device, but the virtual warehouse or execution node may later access data from any other data storage device.

In the example of FIG. 3, virtual warehouse 1 includes three execution nodes 302-1, 302-2, and 302-N. Execution node 302-1 includes a cache 304-1 and a processor 306-1. Execution node 302-2 includes a cache 304-2 and a processor 306-2. Execution node 302-N includes a cache 304-N and a processor 306-N. Each execution node 302-1, 302-2, and 302-N is associated with processing one or more data storage and/or data retrieval tasks. For example, a virtual warehouse may handle data storage and data retrieval tasks associated with an internal service, such as a clustering service, a materialized view refresh service, a file compaction service, a storage procedure service, or a file upgrade service. In other implementations, a particular virtual warehouse may handle data storage and data retrieval tasks associated with a particular data storage system or a particular category of data.

Similar to virtual warehouse 1 discussed above, virtual warehouse 2 includes three execution nodes 312-1, 312-2, and 312-N. Execution node 312-1 includes a cache 314-1 and a processor 316-1. Execution node 312-2 includes a cache 314-2 and a processor 316-2. Execution node 312-N includes a cache 314-N and a processor 316-N. Additionally, virtual warehouse 3 includes three execution nodes 322-1, 322-2, and 322-N. Execution node 322-1 includes a cache 324-1 and a processor 326-1. Execution node 322-2 includes a cache 324-2 and a processor 326-2. Execution node 322-N includes a cache 324-N and a processor 326-N.

In some embodiments, the execution nodes shown in FIG. 3 are stateless with respect to the data being cached by the execution nodes. For example, these execution nodes do not store or otherwise maintain state information about the execution node or the data being cached by a particular execution node. Thus, in the event of an execution node failure, the failed node can be transparently replaced by another node. Since there is no state information associated with the failed execution node, the new (replacement) execution node can easily replace the failed node without concern for recreating a particular state.

Although the execution nodes shown in FIG. 3 each includes one data cache and one processor, alternative embodiments may include execution nodes containing any number of processors and any number of caches. Additionally, the caches may vary in size among the different execution nodes. The caches shown in FIG. 3 store, in the local execution node, data that was retrieved from one or more data storage devices in the cloud storage platform 104. Thus, the caches reduce or eliminate the bottleneck problems occurring in platforms that consistently retrieve data from remote storage systems. Instead of repeatedly accessing data from the remote storage devices, the systems and methods described herein access data from the caches in the execution nodes, which is significantly faster and avoids the bottleneck problem discussed above. In some embodiments, the caches are implemented using high-speed memory devices that provide fast access to the cached data. Each cache can store data from any of the storage devices in the cloud storage platform 104.

Further, the cache resources and computing resources may vary between different execution nodes. For example, one execution node may contain significant computing resources and minimal cache resources, making the execution node useful for tasks that require significant computing resources. Another execution node may contain significant cache resources and minimal computing resources, making this execution node useful for tasks that require caching of large amounts of data. Yet another execution node may contain cache resources providing faster input-output operations, useful for tasks that require fast scanning of large amounts of data. In some embodiments, the cache resources and computing resources associated with a particular execution node are determined when the execution node is created, based on the expected tasks to be performed by the execution node.

Additionally, the cache resources and computing resources associated with a particular execution node may change over time based on changing tasks performed by the execution node. For example, an execution node may be assigned more processing resources if the tasks performed by the execution node become more processor-intensive. Similarly, an execution node may be assigned more cache resources if the tasks performed by the execution node require a larger cache capacity. Further, some nodes may be executing much slower than others due to various issues (e.g., virtualization issues and network overhead). In some example embodiments, the imbalances are addressed at the scan level using a file-stealing scheme. In particular, whenever a node process completes scanning its set of input files, it requests additional files from other nodes. If one of the other nodes receives such a request, the node analyzes its own set (e.g., how many files are left in the input file set when the request is received), and then transfers ownership of one or more of the remaining files for the duration of the current job (e.g., query). The requesting node (e.g., the file stealing node) then receives the data (e.g., header data) and downloads the files from the cloud storage platform 104 (e.g., from data storage device 120-1), and does not download the files from the transferring node. In this way, lagging nodes can transfer files via file stealing in a way that does not worsen the load on the lagging nodes.
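
By way of a non-limiting illustration, the file-stealing scheme described above can be sketched in Python as follows. The class ScanNode, its method names, and the policy of donating one file at a time while keeping at least one file for the donor are hypothetical simplifications; the disclosure does not specify how many files a node transfers.

```python
from collections import deque


class ScanNode:
    def __init__(self, name, files):
        self.name = name
        self.pending = deque(files)  # files this node still owns
        self.scanned = []

    def give_up_files(self, max_files=1):
        """Transfer ownership of up to max_files remaining files."""
        # Assumed policy: keep at least one file so the donor still has work.
        n = min(max_files, max(len(self.pending) - 1, 0))
        return [self.pending.pop() for _ in range(n)]

    def scan_one(self, peers):
        """Scan one owned file, or steal one from a peer when idle."""
        if not self.pending:
            for peer in peers:
                stolen = peer.give_up_files()
                if stolen:
                    self.pending.extend(stolen)
                    break
        if self.pending:
            # The file itself is read from shared storage, not from the donor.
            self.scanned.append(self.pending.popleft())
            return True
        return False


fast = ScanNode("fast", ["a"])
slow = ScanNode("slow", ["b", "c", "d"])
# The fast node gets several turns for each turn of the slow node.
while fast.pending or slow.pending:
    for _ in range(3):
        fast.scan_one([slow])
    slow.scan_one([fast])
```

In this sketch, only ownership (a file name) moves between nodes, which mirrors the point that the lagging node is not burdened with serving the file contents.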

Although virtual warehouses 1, 2, and N are associated with the same execution platform 110, virtual warehouses 1, …, N may be implemented using multiple computing systems at multiple geographic locations. For example, virtual warehouse 1 can be implemented by a computing system at a first geographic location, while virtual warehouses 2 and N are implemented by another computing system at a second geographic location. In some embodiments, these different computing systems are cloud-based computing systems maintained by one or more different entities.

Additionally, each virtual warehouse is shown in FIG. 3 as having multiple execution nodes. The multiple execution nodes associated with each virtual warehouse may be implemented using multiple computing systems at multiple geographic locations. For example, an instance of virtual warehouse 1 implements execution nodes 302-1 and 302-2 on one computing platform at a geographic location, and execution node 302-N at a different computing platform at another geographic location. Selecting particular computing systems to implement an execution node may depend on various factors, such as the level of resources needed for a particular execution node (e.g., processing resource requirements and cache requirements), the resources available at particular computing systems, communication capabilities of networks within a geographic location or between geographic locations, and which computing systems are already implementing other execution nodes in the virtual warehouse.

Execution platform 110 is also fault-tolerant. For example, if one virtual warehouse fails, that virtual warehouse is quickly replaced with a different virtual warehouse at a different geographic location.

A particular execution platform 110 may include any number of virtual warehouses. Additionally, the number of virtual warehouses in a particular execution platform is dynamic, such that new virtual warehouses are created when additional processing and/or caching resources are needed. Similarly, existing virtual warehouses may be deleted when the resources associated with the virtual warehouse are no longer necessary.

In some embodiments, the virtual warehouses may operate on the same data in the cloud storage platform 104, but each virtual warehouse has its execution nodes with independent processing and caching resources. This configuration allows requests on different virtual warehouses to be processed independently and with no interference between the requests. This independent processing, combined with the ability to dynamically add and remove virtual warehouses, supports the addition of new processing capacity for new users without impacting the performance observed by the existing users.

In some aspects, table data can be divided into one or more micro-partitions, which are contiguous units of storage. As used herein, the terms “partition files” (or “partition data files”) and micro-partitions are interchangeable. In this regard, source table data can be stored as multiple partition files associated with the source table.

FIG. 4 is a diagram 400 illustrating the lifecycle of a table, according to some example embodiments. Referring to FIG. 4, a source table is available to a customer account during the live stage period 402. In some aspects, the network-based database system 102 provides customers with features to ensure the availability of their data. Specifically, during the time travel period 404, customer accounts can restore a historical version of the source table for up to a first threshold period (e.g., 90 days from deletion) which can be configured by the customer account.

During the failsafe period 406, the customer account can request the network-based database system 102 to recover the source table after the time travel period 404 for up to an additional second threshold period (e.g., 7 days, or another fixed threshold, after the time travel period ends and which fixed threshold may not be changed by the customer account). After the failsafe period 406, the source table is purged during the purge period 408 and data recovery is no longer possible.
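
By way of a non-limiting illustration, the table lifecycle described above can be sketched in Python as a classification of elapsed time since deletion. The function name lifecycle_stage and the stage labels are hypothetical; the 90-day and 7-day values are the example thresholds given above, with only the first being configurable by the customer account.

```python
from datetime import datetime, timedelta

FAILSAFE_DAYS = 7  # fixed example threshold; not configurable by the customer


def lifecycle_stage(deleted_at, now, time_travel_days=90):
    """Classify data that left the live stage at deleted_at."""
    age = now - deleted_at
    if age < timedelta(0):
        return "live"  # not yet deleted at the time of the check
    if age <= timedelta(days=time_travel_days):
        return "time_travel"  # customer can restore directly
    if age <= timedelta(days=time_travel_days + FAILSAFE_DAYS):
        return "failsafe"  # recovery must be requested from the system
    return "purge"  # recovery is no longer possible


deleted = datetime(2026, 1, 1)
stage_after_93_days = lifecycle_stage(deleted, deleted + timedelta(days=93))
```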

FIG. 5 is a diagram 500 illustrating the lifecycle of a partition data file associated with a table, according to some example embodiments. Referring to FIG. 5, lifecycle stages 502-508 of a partition data file correspond to the lifecycle stages 402-408 of a source table. The partition data file is created during a creation period 502. The partition data file of the source table is available to a customer account during the creation period 402. During the time travel period 404, a live version 504 of the partition table is available, and the customer accounts can use the live version of the table for up to a first threshold period (e.g., 90 days from deletion) which can be configured by the customer account.

During the failsafe period 406, the partition data file is deleted and can be referred to as a deleted version 506. In some aspects, a partition data file is not purged during the failsafe period 406. Instead, the partition data file (or each of multiple partition data files associated with a source table) can include a deleted file designation. For example, such a deleted file designation can be the partition file version number (e.g., the live version designation can be ver. 1 and the deleted file designation can be ver. 2).

During the purge period 408, the partition data file is purged and can be referred to as a purged version 508.

In some aspects, source table data can be considered as successfully committed once the partition file is created and a corresponding metadata file associated with the current version of the table has a reference to the version of the partition file that was created.

FIG. 6 is a diagram 600 of the data recovery manager 132 in communication with a metadata compactor manager and a file delete manager which can be used in the network-based database system of FIG. 1, according to some example embodiments. Referring to FIG. 6, DR manager 132 is in communication with the metadata compactor manager 602, the file delete manager 604, and the system function manager 606.

The metadata compactor manager 602 (which can also be referred to as a file compactor, metadata compactor, or file compactor manager) comprises suitable circuitry, logic, interfaces, and/or code and is configured to monitor the lifecycle of partition files associated with one or more source tables and determine which files have to be deleted. The file delete manager 604 comprises suitable circuitry, logic, interfaces, and/or code and is configured to delete the partition files that the metadata compactor manager 602 has determined to be deleted. In some aspects, the file delete manager 604 can add a deleted file designation to a partition file (e.g., increase the version number of the file or use a different designation) to indicate the file as a deleted file (without purging the file). In this regard, a deleted file (which is not purged) can be recovered (or undeleted) when the deleted file designation is removed.

The system function manager 606 comprises suitable circuitry, logic, interfaces, and/or code and can configure one or more system functions used by the DR manager 132 in connection with DR-related functionalities. For example, the system function manager 606 can configure system function 608 of the DR manager 132 for recovering failsafe data (also referred to as recover_failsafe_data function) and system function 610 for undeleting files (also referred to as undelete_files system function). The use of system functions 608 and 610 in connection with DR-related functions is discussed further below in connection with FIGS. 8-11.

In some aspects, during the time travel period 404, a metadata layer associated with a metadata file remembers what subset of the metadata files to associate with a specific version of the source table that is relevant at a given point in time. When time crosses the associated time travel retention on the table, the metadata compactor manager 602 identifies what files can be safely moved into the failsafe period. In some aspects, the names of these partition files can be grouped into a metadata file of the table or persisted into a database. In some aspects, the list of the partition files is polled periodically by the file delete manager 604 and can be subsequently moved into the failsafe window by creating a new version of the file that has a delete marker (e.g., a deleted file designation) on it.

Even though FIG. 6 illustrates the DR manager 132 being in communication with, and separate from, the metadata compactor manager 602, the file delete manager 604, and the system function manager 606, the disclosure is not limited in this regard. In some embodiments, the DR manager 132 can incorporate one or more of the functionalities performed by the metadata compactor manager 602, the file delete manager 604, and the system function manager 606.

FIG. 7 is a diagram 700 illustrating different data recoverability cases associated with different table lifecycle stages, according to some example embodiments.

Diagram 702 illustrates a first recoverability case when the table is still alive, compaction by the metadata compactor manager 602 is not disabled, and the current timestamp is t. For the timestamps within the time travel range (e.g., [t - X days, t]), customers can recover their data by using a time travel SQL command directly (e.g., SQL command “SELECT/CLONE … AT | BEFORE …”).

For the timestamps within the failsafe time range (e.g., [t - X - 7 days, t - X days]), data can be recovered using the disclosed techniques.

For the timestamps beyond the failsafe period (e.g., [..., t - X - 7 days]), data may not be recovered and there can be no guarantee of recovery even if some data may not be purged yet.

Diagram 704 illustrates a second recoverability case when the table is still alive, compaction by the metadata compactor manager 602 was stopped at time t0, and the current timestamp is t1. For the timestamps within the time travel range (e.g., [t0 - X days, t1]), data can be recovered by using the time travel SQL command directly (e.g., SQL command “SELECT/CLONE … AT | BEFORE …”).

For the timestamps within the failsafe time range (e.g., [t1 - X - 7 days, t0 - X days]), data can be recovered using the disclosed techniques. This time window will shrink with time going forward, as t1 is the current timestamp, and eventually, the recoverable window will disappear.

For the timestamps beyond the failsafe period (e.g., [..., t1 - X - 7 days]), data may not be recovered, and there can be no guarantee of recovery even if some data may not be purged yet.

Diagram 706 illustrates a third recoverability case when the table is dropped at time t0, the current timestamp is t1, and compaction by the metadata compactor manager 602 was not stopped. For the timestamps within the time travel range (e.g., [t1 - X days, t0]), data can be recovered by using the “UNDROP …” SQL command (e.g., the command can be used to recover the table at time t0). Then, a time travel SQL command “SELECT/CLONE … AT | BEFORE …” can be executed to recover the data at a particular timestamp within the time travel range [t1 - X days, t0].

For the timestamps within the failsafe time range (e.g., [t1 - X - 7 days, t1 - X days]), data can be recovered using the disclosed techniques. However, if there was any schema change in the table, it may not work and the changes on the schema may need to be confirmed with the customer account. This time window can shrink with time going forward, as t1 is the current timestamp, and eventually, the recoverable window will disappear.

For the timestamps beyond the failsafe period (e.g., [..., t1 - X - 7 days]), data may not be recovered and there can be no guarantee of recovery even if some data may not be purged yet.

Diagram 708 illustrates a fourth recoverability case when the table is dropped at time t0, compaction by the metadata compactor manager 602 was stopped at time t1, and the current timestamp is t2. For the timestamps within the time travel range (e.g., [t1 - X days, t0]), data can be recovered by using the “UNDROP …” SQL command (e.g., the SQL command can be used to recover the table at time t0). The customer can then run the time travel SQL command “SELECT/CLONE … AT | BEFORE …” to recover the data at a particular timestamp within the time travel range [t1 - X days, t0].

For the timestamps within the failsafe time range (e.g., [t2 - X - 7 days, t1 - X days]), data can be recovered using the disclosed techniques. However, if there was any schema change in the table, it may not work, and the changes on the schema may need to be confirmed with the customer account.

For the timestamps beyond the failsafe period (e.g., [..., t2 - X - 7 days]), data may not be recovered, and there can be no guarantee of recovery even if some data may not be purged yet.
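
By way of a non-limiting illustration, the time travel and failsafe ranges stated for the four recoverability cases above can be reproduced with a single Python function. The function name recoverable_windows, its parameter names, and the representation of timestamps as day numbers are hypothetical; the arithmetic simply restates the example ranges given above, with X as the time travel retention in days.

```python
FAILSAFE_DAYS = 7  # example failsafe threshold used in the ranges above


def recoverable_windows(x_days, now, dropped_at=None, compaction_stopped_at=None):
    """Return (time_travel_range, failsafe_range) as (start, end) day numbers.

    dropped_at: day the table was dropped, or None if the table is alive.
    compaction_stopped_at: day compaction was stopped, or None if running.
    """
    # Retention boundaries stop advancing once compaction is stopped.
    frozen = now if compaction_stopped_at is None else compaction_stopped_at
    # A dropped table has no versions after the drop time.
    tt_end = now if dropped_at is None else dropped_at
    time_travel = (frozen - x_days, tt_end)
    failsafe = (now - x_days - FAILSAFE_DAYS, frozen - x_days)
    return time_travel, failsafe


# Second case: table alive, compaction stopped at t0 = 95, current time t1 = 100.
tt_range, fs_range = recoverable_windows(10, now=100, compaction_stopped_at=95)
```

In this sketch, a failsafe range whose start exceeds its end is empty, which corresponds to the recoverable window eventually disappearing as the current timestamp advances while compaction remains stopped.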

In some embodiments, the following configurations for a partition file lifecycle implementation can be used for different cloud providers:

Versioning of data on AWS® S3 storage can be enabled and a life-cycle policy is associated with the data. In this regard, when a delete API is issued to an object, S3 can create an “empty” version or tombstone for this data. Attempts to retrieve this data can result in an error. The lifecycle policy of table partition files indicates that once the data is tombstoned, it can be purged from S3 any time after 7 days. To undelete this data, one has to explicitly delete the latest version of the object.

Blob storage in Azure® provides similar soft-delete functionality for objects with the ability to specify a similar life-cycle policy. Azure® may not transparently expose how it implements the soft delete (like S3 does via versioning). However, it offers an undelete() API to recover such deleted objects.

While versioning is not leveraged, GCP has a concept of an “Event-Based Hold”. Objects that have an Event-Based Hold are not subject to life cycle policies. To emulate the behavior described in the other cloud provider sections, when an object is created an event-based hold is immediately added to it. When the object is deleted, the hold is removed and the object is made subject to the life-cycle policy which allows the cloud store to purge this object after 7 days. Thus to undelete such an object, the Event-Based Hold can be added back.
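
By way of a non-limiting illustration, the versioning-based soft delete described above for S3 can be modeled in memory with the following Python sketch. The class VersionedStore and its methods are hypothetical and do not call any cloud provider API; the sketch only shows why removing the latest (tombstone) version makes the data retrievable again.

```python
class VersionedStore:
    TOMBSTONE = object()  # stand-in for an "empty" tombstone version

    def __init__(self):
        self._versions = {}  # key -> list of versions, newest last

    def put(self, key, data):
        self._versions.setdefault(key, []).append(data)

    def delete(self, key):
        # Soft delete: stack a tombstone on top instead of removing data.
        self._versions[key].append(self.TOMBSTONE)

    def get(self, key):
        latest = self._versions[key][-1]
        if latest is self.TOMBSTONE:
            raise KeyError(f"{key} is deleted")
        return latest

    def undelete(self, key):
        # Removing the latest (tombstone) version exposes the data again.
        if self._versions[key][-1] is self.TOMBSTONE:
            self._versions[key].pop()


store = VersionedStore()
store.put("partition-1", b"rows")
store.delete("partition-1")
store.undelete("partition-1")
restored = store.get("partition-1")
```

The model omits the life-cycle policy, under which a tombstoned object may be purged after the retention period and can then no longer be undeleted.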

FIG. 8 is a diagram 800 illustrating example functionalities of the components of the data recovery manager of FIG. 6, according to some example embodiments. Referring to FIG. 8, diagram 800 illustrates example file deletion functionalities performed by the DR manager 132 and/or other components disclosed herein (e.g., the metadata compactor manager 602, the file delete manager 604, and the undelete_files system function 610).

As described above, the file delete manager 604 is a component that is responsible for deleting partition files after they are no longer referenced by table metadata. In some aspects, different components convey to the file delete manager 604 explicitly what files need to be deleted via either a deletion list or grouping the files into a metadata file or a file deletion slice (e.g., FILE_DELETION_SLICE). The file delete manager 604 can scan through this list of files and issue batch delete calls which soft-delete objects in AWS/Azure/GCP.

For a regular table, the component that gives the file delete manager 604 what files to delete is the metadata compactor manager 602. Even if the metadata compactor manager 602 is stopped, the use of a push model could result in a race condition where the file delete manager 604 un-does the work that is done by the undelete_files system function 610.

FIG. 8 summarizes this scenario. More specifically, the metadata compactor manager 602 communicates file candidates for deletion to the file delete manager 604 at operation 808. An operator 802 can issue a pause command 806 to pause metadata compaction. An undelete command 810 is then issued, which executes the undelete_files system function 610. A subsequent undelete command 812 is communicated to the cloud provider blob store 804, which returns an acknowledgment 814.

In some aspects, removing the file names from the deletion list may not be sufficient because the files can also be in deletion metadata files. Since file delete manager 604 works on a per-account basis, pausing it for an indefinite amount of time could have a potentially significant impact on operating costs.

In some aspects, the DR manager 132 can generate a notification to the customer account that communicated the data recovery request, indicating that a restore operation was successful or is guaranteed to be successful.

FIG. 9 is a flowchart of method 900 for data recovery, according to some example embodiments. Method 900 may be embodied in computer-readable instructions for execution by one or more hardware components (e.g., one or more processors) such that the operations of method 900 may be performed by components of the network-based database system 102, such as a network node (e.g., a DR manager 132 executing on a network node of the compute service manager 108) or a computing device (e.g., client device 114), one or both of which may be implemented as machine 1300 of FIG. 13 performing the disclosed functions. Accordingly, method 900 is described below, by way of example with reference thereto. However, it shall be appreciated that method 900 may be deployed on various other hardware configurations and is not intended to be limited to deployment within the network-based database system 102.

At a high level, method 900 can be based on the following functions: (a) retrieve the last known schema of the table for which data recovery is requested; (b) retrieve the list of the partition files for the version of the table corresponding to a historical timestamp; (c) recover the partition files by preventing them from being purged by cloud providers (e.g., this functionality can include deleting the “deleted version” of the file in S3 by removing the deleted file designation); and (d) create metadata corresponding to the recovered schema and partition files (e.g., also referred to as a clone-style recovery).

The following sequence of operations illustrated in FIG. 9 summarizes the data recovery process that is aided by system functions 608 and 610 (e.g., also referred to as system$recover_failsafe_data and system$undelete_files).

At operation 902, a request to recover data (e.g., a request to recover a source table) is received and decoded. The request can be associated with recovering the source table data at time t during a failsafe period.

At operation 904, an account login is performed with corresponding account privileges. For example, a user of the customer account logs in with the necessary account privileges.

At operation 906, the functions of the metadata compactor manager are paused for the particular customer account.

At operation 908, the system function system$recover_failsafe_data (e.g., system function 608) is executed to recover the data. In some aspects, the recovered data is written into a temporary recovery table in the customer account (e.g., for security reasons). In some aspects, the temporary recovery table can be referred to as “_recoverydb._recoveryschema._recovery{table_name}” inside the customer account. If such a table already exists, then a timestamp suffix is added. In some aspects, a temporary/permanent table location can be specified to recover a table directly.
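The naming rule for the temporary recovery table can be illustrated with a minimal Python sketch. The function name, the set of existing table names, and the format of the timestamp suffix are assumptions made for illustration; the disclosure only states that a timestamp suffix is added when the name is already taken.

```python
from datetime import datetime, timezone


def recovery_table_name(table_name, existing_tables, now=None):
    """Return a name for the temporary recovery table (hypothetical helper)."""
    base = f"_recoverydb._recoveryschema._recovery{table_name}"
    if base not in existing_tables:
        return base
    # Assumed suffix format: UTC time as YYYYMMDDHHMMSS.
    now = now or datetime.now(timezone.utc)
    return f"{base}_{now.strftime('%Y%m%d%H%M%S')}"
```

Passing the current time as a parameter keeps the helper deterministic, which makes the collision case easy to check.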

At operation 910, the functionalities of the metadata compactor manager can be resumed for the particular customer account.

At operation 912, the data is copied out of the recovery temporary table by cloning (e.g., based on performing a metadata operation associating the recovered partition files with the table), and the recovery temporary table is then deleted.
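The pause, recover, resume, clone, and delete sequence can be sketched as follows. This is an illustrative Python outline, not the disclosed implementation: the Compactor class and the recover_fn, clone_fn, and drop_fn callables are hypothetical stand-ins for the metadata compactor manager and the system functions. The sketch assumes that compaction should be resumed even when recovery fails, which a context manager expresses directly.

```python
from contextlib import contextmanager


class Compactor:
    """Hypothetical stand-in for one account's metadata compactor manager."""

    def __init__(self):
        self.paused = False
        self.log = []

    def pause(self):
        self.paused = True
        self.log.append("pause")

    def resume(self):
        self.paused = False
        self.log.append("resume")


@contextmanager
def compaction_paused(compactor):
    compactor.pause()            # pause compaction for the account
    try:
        yield
    finally:
        compactor.resume()       # resume even if recovery raised


def recover_table(compactor, recover_fn, clone_fn, drop_fn, source, timestamp):
    with compaction_paused(compactor):
        temp = recover_fn(source, timestamp)   # recover into a temporary table
    clone_fn(temp, source)                     # metadata-only copy out of the temp table
    drop_fn(temp)                              # delete the temporary table
    return temp
```

In this sketch the clone and the drop run after compaction has resumed, matching the order of the operations described above.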

FIG. 10 is a flowchart of method 1000 for recovering failsafe data, which can be used by the method of FIG. 9, according to some example embodiments. Method 1000 may be embodied in computer-readable instructions for execution by one or more hardware components (e.g., one or more processors) such that the operations of method 1000 may be performed by components of the network-based database system 102, such as a network node (e.g., a DR manager 132 executing on a network node of the compute service manager 108) or a computing device (e.g., client device 114), one or both of which may be implemented as machine 1300 of FIG. 13 performing the disclosed functions. Accordingly, method 1000 is described below, by way of example with reference thereto. However, it shall be appreciated that method 1000 may be deployed on various other hardware configurations and is not intended to be limited to deployment within the network-based database system 102.

At operation 1002, a determination is made on whether the account (e.g., the consumer account communicating the request for data recovery of a source table) has privileges to access the source table and create a new table.

At operation 1004, the table status is analyzed based on the table lifecycle information (e.g., using the lifecycle information of FIG. 4 and FIG. 5). For example, if the table at the requested timestamp can be recovered by time travel, a notification is returned so that time travel SQL commands can be executed. If the requested timestamp is past the recoverable bound, then an error message is returned in response to the data recovery request. If the timestamp is still in the failsafe recoverable time range, then the subsequent operations can be executed.
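The three outcomes of the table status analysis can be illustrated with a short Python sketch. It makes a simplifying assumption that is not stated in the disclosure: the lifecycle information is reduced to two fixed-length windows measured back from the current time, a time travel window followed by a failsafe window. The names Status and classify are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class Status(Enum):
    TIME_TRAVEL = "use time travel SQL commands"
    FAILSAFE = "proceed with failsafe recovery"
    EXPIRED = "past the recoverable bound"


def classify(timestamp, now, time_travel, failsafe):
    """Classify a requested timestamp against two assumed retention windows."""
    age = now - timestamp
    if age < timedelta(0):
        raise ValueError("timestamp is in the future")
    if age <= time_travel:
        return Status.TIME_TRAVEL          # caller is told to use time travel
    if age <= time_travel + failsafe:
        return Status.FAILSAFE             # subsequent operations can run
    return Status.EXPIRED                  # an error is returned to the caller
```

A real system would derive the window boundaries from per-table lifecycle records rather than from two constants.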

At operation 1006, a new target table is created (e.g., table “targetTableName”) with a recovered table schema. This can be the last known schema for the source table.

At operation 1008, the partition files of the deleted table that belong to the specified timestamp are recovered using the undelete_files system function (e.g., system function 610).

At operation 1010, a metadata operation is performed to populate the new target table with the recovered partition files.

At operation 1012, a notification of the data recovery completion is generated and sent back in response to the data recovery request. If failure is detected during the data recovery, the target table is deleted. Otherwise, a notification is returned to the account sending the DR request (e.g., a notification “recovery to a table xxx is complete from table yyy at time t” is communicated back to the consumer account sending the DR request).
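The create, undelete, populate, and notify steps, including removal of the target table on failure, can be sketched in Python as follows. The in-memory catalog dictionary and the undelete callable are hypothetical stand-ins for the table metadata store and the undelete_files system function; the sketch only illustrates the control flow.

```python
def recover_failsafe_data(catalog, schema, files, undelete, source, target, t):
    """Illustrative control flow only; all parameters are hypothetical."""
    if target in catalog:
        raise ValueError(f"target table {target} already exists")
    # Create the target table with the recovered (last known) schema.
    catalog[target] = {"schema": schema, "files": []}
    try:
        recovered = undelete(files)                  # undelete the partition files
        catalog[target]["files"] = list(recovered)   # metadata-only population
    except Exception:
        del catalog[target]                          # on failure, delete the target
        raise
    return f"recovery to table {target} is complete from table {source} at time {t}"
```

Populating the table assigns file names to the catalog entry rather than copying data, which mirrors the metadata operation described above.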

FIG. 11 is a flowchart of method 1100 for undeleting files, which can be used by the method of FIG. 10, according to some example embodiments. Method 1100 may be embodied in computer-readable instructions for execution by one or more hardware components (e.g., one or more processors) such that the operations of method 1100 may be performed by components of the network-based database system 102, such as a network node (e.g., a DR manager 132 executing on a network node of the compute service manager 108) or a computing device (e.g., client device 114), one or both of which may be implemented as machine 1300 of FIG. 13 performing the disclosed functions. Accordingly, method 1100 is described below, by way of example with reference thereto. However, it shall be appreciated that method 1100 may be deployed on various other hardware configurations and is not intended to be limited to deployment within the network-based database system 102.

In some aspects, system function 610 (e.g., the undelete_files system function) can be a building block of system function 608 (e.g., the recover_failsafe_data system function) but can also be used as a standalone function for mitigation of data corruption events. In this regard, customer data can be recovered without having to increase retention. In some aspects, the timestamp of recovery need not be provided and all the data in the failsafe is undeleted for this account and table. In some aspects, this system function will not be exposed, and can only be invoked explicitly. This functionality can be used when there is a catastrophic data availability event.

At operation 1102, a list is created of partition files to recover. In some aspects, an iteration/review is performed of the active metadata files for the table version that is on or before the timestamp specified by the customer. Each of the metadata files is unpacked to retrieve a list of partition files that are referred to by that metadata file. If this list cannot be processed in memory in one iteration, then the function will process the files by loading chunks/subsets of file names in memory for processing.
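The listing step can be sketched with Python generators so that only one chunk of file names is held in memory at a time. The data layout is an assumption made for illustration: table versions are keyed by a numeric timestamp, and each metadata file is a dictionary with a partition_files list. None of these names come from the disclosure.

```python
from itertools import islice


def select_version(versions, timestamp):
    """Pick the metadata files of the latest version on or before timestamp."""
    eligible = [ts for ts in versions if ts <= timestamp]
    if not eligible:
        raise LookupError("no table version on or before the requested timestamp")
    return versions[max(eligible)]


def partition_files(metadata_files):
    """Lazily unpack each metadata file into the partition files it refers to."""
    for meta in metadata_files:
        yield from meta["partition_files"]


def chunks(iterable, size):
    """Yield lists of at most `size` items without materializing the input."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch
```

For example, list(chunks(partition_files(select_version(versions, 10)), 2)) walks the selected version two file names at a time.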

At operation 1104, the list of file names of the partition files to process is persisted. This is the intent log of the undelete operation and can be performed through metadata files.

At operation 1106, the file delete manager for the account is notified. Before performing any undelete operations, the function notifies the file delete manager of the account. This synchronization takes the shape of a lock that exists on a per-account basis. This lock guarantees that any ongoing execution of file delete operations by the file delete manager for this account knows about the ongoing recovery operation or will acknowledge it when it runs at its scheduled time. If there are candidates in its own “deletion queue” that correspond to the list of files to undelete, then it must purge them from its queue; otherwise there is a risk that it will move these files into the failsafe zone after the undelete operation is deemed successful.
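The per-account synchronization can be illustrated with a minimal Python sketch in which a lock guards each account's deletion queue and a recovery notification purges matching candidates. The class and method names are hypothetical, and a single-process threading.Lock is used only as a stand-in for whatever distributed lock a real deployment would need.

```python
import threading
from collections import defaultdict


class FileDeleteManager:
    """Hypothetical in-memory model of a per-account deletion queue."""

    def __init__(self):
        # One lock and one queue per account. In this sketch, lock creation
        # itself is not guarded against concurrent first use.
        self._locks = defaultdict(threading.Lock)
        self._queues = defaultdict(list)

    def enqueue(self, account, file_name):
        with self._locks[account]:
            self._queues[account].append(file_name)

    def notify_recovery(self, account, files_to_undelete):
        """Purge queued deletions that overlap the files being undeleted."""
        protected = set(files_to_undelete)
        with self._locks[account]:
            queue = self._queues[account]
            purged = [f for f in queue if f in protected]
            queue[:] = [f for f in queue if f not in protected]
        return purged

    def pending(self, account):
        with self._locks[account]:
            return list(self._queues[account])
```

Because the lock is keyed by account, a recovery for one account does not block deletions for any other account.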

At operation 1108, the listed partition files are undeleted. In some aspects, the undelete function proceeds after the file delete manager knows about the recovery operation and what files it needs to ignore for an account. In some aspects, undeleting files refers to preventing them from being deleted. In S3 storage terms, this would mean removing the “delete version” of the object (e.g., the deleted file designation). The list of the partition files is then partitioned into smaller chunks, each of which can be submitted into a thread pool for undeletion. If all the files are undeleted successfully, then a notification is returned to the caller, which can include the list of files that were prevented from being deleted.
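Splitting the file list into chunks and submitting them to a thread pool can be sketched with Python's standard concurrent.futures module. The blob store is modeled as a plain dictionary that maps a file name to a flag indicating whether a delete marker is present; this model, the function names, and the default chunk size are assumptions for illustration and do not represent any cloud provider's API.

```python
from concurrent.futures import ThreadPoolExecutor


def undelete_chunk(store, chunk):
    """Clear the hypothetical delete marker for every file in one chunk."""
    for name in chunk:
        if name not in store:
            raise KeyError(f"unknown partition file: {name}")
        store[name] = False
    return chunk


def undelete_files(store, file_names, chunk_size=2, workers=4):
    """Undelete files chunk by chunk on a thread pool; return the restored names."""
    chunks = [file_names[i:i + chunk_size]
              for i in range(0, len(file_names), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order and re-raises any worker error.
        results = pool.map(lambda c: undelete_chunk(store, c), chunks)
        return [name for chunk in results for name in chunk]
```

The returned list corresponds to the notification described above, which can include the files that were prevented from being deleted.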

At operation 1110, the list of file names is asynchronously cleaned. A file delete manager can be used to clean up the list of files. The file delete manager identifies the artifacts generated by the undelete operation that can be safely deleted and purges them from S3.

FIG. 12 is a flowchart of another method 1200 for data recovery, according to some example embodiments. Method 1200 may be embodied in computer-readable instructions for execution by one or more hardware components (e.g., one or more processors) such that the operations of method 1200 may be performed by components of the network-based database system 102, such as a network node (e.g., a DR manager 132 executing on a network node of the compute service manager 108) or a computing device (e.g., client device 114), one or both of which may be implemented as machine 1300 of FIG. 13 performing the disclosed functions. Accordingly, method 1200 is described below, by way of example with reference thereto. However, it shall be appreciated that method 1200 may be deployed on various other hardware configurations and is not intended to be limited to deployment within the network-based database system 102.

At operation 1202, a request to recover historical table data is decoded. The request can be received from an account of a data provider. The historical table data can include a plurality of partition files, and each of the plurality of partition files includes a deleted file designation.

At operation 1204, a recovery process of the plurality of partition files is performed based on the request, to obtain recovered partition files. An example recovery process is discussed in connection with FIG. 10.

At operation 1206, a schema associated with the historical table data is retrieved.

At operation 1208, metadata corresponding to the retrieved schema is generated (e.g., in the form of one or more metadata files).

At operation 1210, the metadata is associated with the recovered partition files to recover the historical table data. This can be performed as a metadata operation (e.g., cloning), without copying the underlying table data.

FIG. 13 illustrates a diagrammatic representation of machine 1300 in the form of a computer system within which a set of instructions may be executed for causing machine 1300 to perform any one or more of the methodologies discussed herein, according to an example embodiment. Specifically, FIG. 13 shows a diagrammatic representation of machine 1300 in the example form of a computer system, within which instructions 1316 (e.g., software, a program, an application, an applet, an app, or other executable code) for causing the machine 1300 to perform any one or more of the methodologies discussed herein may be executed. For example, instructions 1316 may cause machine 1300 to execute any one or more operations of method 1200 (or any other technique discussed herein, for example in connection with FIG. 4–FIG. 12). As another example, instructions 1316 may cause machine 1300 to implement one or more portions of the functionalities discussed herein. In this way, instructions 1316 may transform a general, non-programmed machine into a particular machine 1300 (e.g., the client device 114, the compute service manager 108, or a node in the execution platform 110) that is specially configured to carry out any one of the described and illustrated functions in the manner described herein. In yet another embodiment, instructions 1316 may configure the client device 114, the compute service manager 108, and/or a node in the execution platform 110 to carry out any one of the described and illustrated functions in the manner described herein, which functions can be configured or performed by the DR manager 132.

In alternative embodiments, the machine 1300 operates as a standalone device or may be coupled (e.g., networked) to other machines. In a networked deployment, machine 1300 may operate in the capacity of a server machine or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. The machine 1300 may comprise, but not be limited to, a server computer, a client computer, a personal computer (PC), a tablet computer, a laptop computer, a netbook, a smartphone, a mobile device, a network router, a network switch, a network bridge, or any machine capable of executing the instructions 1316, sequentially or otherwise, that specify actions to be taken by the machine 1300. Further, while only a single machine 1300 is illustrated, the term “machine” shall also be taken to include a collection of machines 1300 that individually or jointly execute the instructions 1316 to perform any one or more of the methodologies discussed herein.

Machine 1300 includes processors 1310, memory 1330, and input/output (I/O) components 1350 configured to communicate with each other such as via a bus 1302. In some example embodiments, the processors 1310 (e.g., a central processing unit (CPU), a reduced instruction set computing (RISC) processor, a complex instruction set computing (CISC) processor, a graphics processing unit (GPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a radio-frequency integrated circuit (RFIC), another processor, or any suitable combination thereof) may include, for example, a processor 1312 and a processor 1314 that may execute the instructions 1316. The term “processor” is intended to include multi-core processors 1310 that may comprise two or more independent processors (sometimes referred to as “cores”) that may execute instructions 1316 contemporaneously. Although FIG. 13 shows multiple processors 1310, machine 1300 may include a single processor with a single core, a single processor with multiple cores (e.g., a multi-core processor), multiple processors with a single core, multiple processors with multiple cores, or any combination thereof.

The memory 1330 may include a main memory 1332, a static memory 1334, and a storage unit 1336, all accessible to the processors 1310 such as via the bus 1302. The main memory 1332, the static memory 1334, and the storage unit 1336 store the instructions 1316 embodying any one or more of the methodologies or functions described herein. The instructions 1316 may also reside, completely or partially, within the main memory 1332, within the static memory 1334, within machine storage medium 1338 of the storage unit 1336, within at least one of the processors 1310 (e.g., within the processor’s cache memory), or any suitable combination thereof, during execution thereof by the machine 1300.

The I/O components 1350 include components to receive input, provide output, produce output, transmit information, exchange information, capture measurements, and so on. The specific I/O components 1350 that are included in a particular machine 1300 will depend on the type of machine. For example, portable machines such as mobile phones will likely include a touch input device or other such input mechanisms, while a headless server machine will likely not include such a touch input device. It will be appreciated that the I/O components 1350 may include many other components that are not shown in FIG. 13. The I/O components 1350 are grouped according to functionality merely for simplifying the following discussion and the grouping is in no way limiting. In various example embodiments, the I/O components 1350 may include output components 1352 and input components 1354. The output components 1352 may include visual components (e.g., a display such as a plasma display panel (PDP), a light-emitting diode (LED) display, a liquid crystal display (LCD), a projector, or a cathode ray tube (CRT)), acoustic components (e.g., speakers), other signal generators, and so forth. The input components 1354 may include alphanumeric input components (e.g., a keyboard, a touch screen configured to receive alphanumeric input, a photo-optical keyboard, or other alphanumeric input components), point-based input components (e.g., a mouse, a touchpad, a trackball, a joystick, a motion sensor, or another pointing instrument), tactile input components (e.g., a physical button, a touch screen that provides location and/or force of touches or touch gestures or other tactile input components), audio input components (e.g., a microphone), and the like.

Communication may be implemented using a wide variety of technologies. The I/O components 1350 may include communication components 1364 operable to couple the machine 1300 to a network 1380 or devices 1370 via a coupling 1382 and a coupling 1372, respectively. For example, communication components 1364 may include a network interface component or another suitable device to interface with network 1380. In further examples, communication components 1364 may include wired communication components, wireless communication components, cellular communication components, and other communication components to provide communication via other modalities. The device 1370 may be another machine or any of a wide variety of peripheral devices (e.g., a peripheral device coupled via a universal serial bus (USB)). For example, as noted above, machine 1300 may correspond to any one of the client devices 114, the compute service manager 108, or the execution platform 110, and device 1370 may include the client device 114 or any other computing device described herein as being in communication with the network-based database system 102 or the cloud storage platform 104.

The various memories (e.g., 1330, 1332, 1334, and/or memory of the processor(s) 1310 and/or the storage unit 1336) may store one or more sets of instructions 1316 and data structures (e.g., software) embodying or utilized by any one or more of the methodologies or functions described herein. These instructions 1316, when executed by the processor(s) 1310, cause various operations to implement the disclosed embodiments.

As used herein, the terms “machine-storage medium,” “device-storage medium,” and “computer-storage medium” mean the same thing and may be used interchangeably in this disclosure. The terms refer to single or multiple storage devices and/or media (e.g., a centralized or distributed database, and/or associated caches and servers) that store executable instructions and/or data. The terms shall accordingly be taken to include, but not be limited to, solid-state memories, and optical and magnetic media, including memory internal or external to processors. Specific examples of machine-storage media, computer-storage media, and/or device-storage media include non-volatile memory, including by way of example semiconductor memory devices, e.g., erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), field-programmable gate arrays (FPGAs), and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The terms “machine-storage media,” “computer-storage media,” and “device-storage media” specifically exclude carrier waves, modulated data signals, and other such media, at least some of which are covered under the term “signal medium” discussed below.

In various example embodiments, one or more portions of the network 1380 may be an ad hoc network, an intranet, an extranet, a virtual private network (VPN), a local-area network (LAN), a wireless LAN (WLAN), a wide-area network (WAN), a wireless WAN (WWAN), a metropolitan-area network (MAN), the Internet, a portion of the Internet, a portion of the public switched telephone network (PSTN), a plain old telephone service (POTS) network, a cellular telephone network, a wireless network, a Wi-Fi® network, another type of network, or a combination of two or more such networks. For example, network 1380 or a portion of network 1380 may include a wireless or cellular network, and coupling 1382 may be a Code Division Multiple Access (CDMA) connection, a Global System for Mobile communications (GSM) connection, or another type of cellular or wireless coupling. In this example, the coupling 1382 may implement any of a variety of types of data transfer technology, such as Single Carrier Radio Transmission Technology (1xRTT), Evolution-Data Optimized (EVDO) technology, General Packet Radio Service (GPRS) technology, Enhanced Data rates for GSM Evolution (EDGE) technology, third Generation Partnership Project (3GPP) including 3G, fourth-generation wireless (4G) networks, Universal Mobile Telecommunications System (UMTS), High-Speed Packet Access (HSPA), Worldwide Interoperability for Microwave Access (WiMAX), Long Term Evolution (LTE) standard, others defined by various standard-setting organizations, other long-range protocols, or other data transfer technology.

The instructions 1316 may be transmitted or received over the network 1380 using a transmission medium via a network interface device (e.g., a network interface component included in the communication components 1364) and utilizing any one of several well-known transfer protocols (e.g., hypertext transfer protocol (HTTP)). Similarly, instructions 1316 may be transmitted or received using a transmission medium via coupling 1372 (e.g., a peer-to-peer coupling or another type of wired or wireless network coupling) to device 1370. The terms “transmission medium” and “signal medium” mean the same thing and may be used interchangeably in this disclosure. The terms “transmission medium” and “signal medium” shall be taken to include any intangible medium that is capable of storing, encoding, or carrying the instructions 1316 for execution by the machine 1300, and includes digital or analog communications signals or other intangible media to facilitate communication of such software. Hence, the terms “transmission medium” and “signal medium” shall be taken to include any form of a modulated data signal, carrier wave, and so forth. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.

The terms “machine-readable medium,” “computer-readable medium,” and “device-readable medium” mean the same thing and may be used interchangeably in this disclosure. The terms are defined to include both machine-storage media and transmission media. Thus, the terms include both storage devices/media and carrier waves/modulated data signals.

The various operations of example methods described herein may be performed, at least partially, by one or more processors that are temporarily configured (e.g., by software) or permanently configured to perform the relevant operations. Similarly, the methods described herein may be at least partially processor-implemented. For example, at least some of the operations of the disclosed methods may be performed by one or more processors. The performance of certain operations may be distributed among the one or more processors, not only residing within a single machine but also deployed across several machines. In some example embodiments, the processor or processors may be located in a single location (e.g., within a home environment, an office environment, or a server farm), while in other embodiments the processors may be distributed across several locations.

Described implementations of the subject matter can include one or more features, alone or in combination as illustrated below by way of examples.

Example 1 is a system comprising: at least one hardware processor; and at least one memory storing instructions that cause the at least one hardware processor to perform operations comprising: decoding a request to recover historical table data, the request received from an account of a data provider, the historical table data comprising a plurality of partition files, and each of the plurality of partition files including a deleted file designation; performing based on the request, a recovery process of the plurality of partition files to obtain recovered partition files; retrieving a schema associated with the historical table data; generating metadata corresponding to the retrieved schema; and associating the metadata with the recovered partition files to recover the historical table data.

In Example 2, the subject matter of Example 1 includes operations associated with decoding the request during a failsafe period associated with the account of the data provider, the failsafe period corresponding to a duration during which the plurality of partition files are inaccessible by the data provider.

In Example 3, the subject matter of Example 2 includes subject matter where the schema is a latest schema available before a beginning time instance of the failsafe period.

In Example 4, the subject matter of Examples 2–3 includes operations associated with performing a verification that the account of the data provider is authorized to generate the request to recover the historical table data based on at least one secure credential associated with sending the request.

In Example 5, the subject matter of Example 4 includes operations associated with initiating the recovery process when the verification is successful and the failsafe period has not expired.

In Example 6, the subject matter of Examples 1–5 includes subject matter where the operations for performing the recovery process further comprise generating a temporary table using the retrieved schema associated with the historical table data.

In Example 7, the subject matter of Example 6 includes operations associated with retrieving a plurality of active metadata files associated with the historical table data.

In Example 8, the subject matter of Example 7 includes operations associated with parsing the plurality of active metadata files to determine the plurality of partition files forming the historical table data.

In Example 9, the subject matter of Example 8 includes operations associated with placing a lock on previously scheduled deletions of at least one of the plurality of partition files.

In Example 10, the subject matter of Example 9 includes operations associated with removing the deleted file designation from each of the plurality of partition files to complete the recovery process.

Example 11 is a method comprising: decoding, by at least one hardware processor, a request to recover historical table data, the request received from an account of a data provider, the historical table data comprising a plurality of partition files, and each of the plurality of partition files including a deleted file designation; performing based on the request, a recovery process of the plurality of partition files to obtain recovered partition files; retrieving a schema associated with the historical table data; generating metadata corresponding to the retrieved schema; and associating the metadata with the recovered partition files to recover the historical table data.

In Example 12, the subject matter of Example 11 includes, decoding the request during a failsafe period associated with the account of the data provider, the failsafe period corresponding to a duration during which the plurality of partition files are inaccessible by the data provider.

In Example 13, the subject matter of Example 12 includes subject matter where the schema is a latest schema available before a beginning time instance of the failsafe period.

In Example 14, the subject matter of Examples 12–13 includes, performing a verification that the account of the data provider is authorized to generate the request to recover the historical table data based on at least one secure credential associated with sending the request.

In Example 15, the subject matter of Example 14 includes, initiating the recovery process when the verification is successful and the failsafe period has not expired.

In Example 16, the subject matter of Examples 11–15 includes subject matter where the performing of the recovery process further comprises: generating a temporary table using the retrieved schema associated with the historical table data.

In Example 17, the subject matter of Example 16 includes, retrieving a plurality of active metadata files associated with the historical table data.

In Example 18, the subject matter of Example 17 includes, parsing the plurality of active metadata files to determine the plurality of partition files forming the historical table data.

In Example 19, the subject matter of Example 18 includes, placing a lock on previously scheduled deletions of at least one of the plurality of partition files.

In Example 20, the subject matter of Example 19 includes, removing the deleted file designation from each of the plurality of partition files to complete the recovery process.

Example 21 is a computer-storage medium comprising instructions that, when executed by one or more processors of a machine, configure the machine to perform operations comprising: decoding, by at least one hardware processor, a request to recover historical table data, the request received from an account of a data provider, the historical table data comprising a plurality of partition files, and each of the plurality of partition files including a deleted file designation; performing based on the request, a recovery process of the plurality of partition files to obtain recovered partition files; retrieving a schema associated with the historical table data; generating metadata corresponding to the retrieved schema; and associating the metadata with the recovered partition files to recover the historical table data.

In Example 22, the subject matter of Example 21 includes operations associated with decoding the request during a failsafe period associated with the account of the data provider, the failsafe period corresponding to a duration during which the plurality of partition files are inaccessible by the data provider.

In Example 23, the subject matter of Example 22 includes subject matter where the schema is a latest schema available before a beginning time instance of the failsafe period.

In Example 24, the subject matter of Examples 22–23 includes operations associated with performing a verification that the account of the data provider is authorized to generate the request to recover the historical table data based on at least one secure credential associated with sending the request.

In Example 25, the subject matter of Example 24 includes operations associated with initiating the recovery process when the verification is successful and the failsafe period has not expired.

In Example 26, the subject matter of Examples 21–25 includes subject matter where the operations for performing the recovery process further comprise generating a temporary table using the retrieved schema associated with the historical table data.

In Example 27, the subject matter of Example 26 includes operations associated with retrieving a plurality of active metadata files associated with the historical table data.

In Example 28, the subject matter of Example 27 includes operations associated with parsing the plurality of active metadata files to determine the plurality of partition files forming the historical table data.

In Example 29, the subject matter of Example 28 includes operations associated with placing a lock on previously scheduled deletions of at least one of the plurality of partition files.

In Example 30, the subject matter of Example 29 includes operations associated with removing the deleted file designation from each of the plurality of partition files to complete the recovery process.
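The sequence in Examples 26 through 30 (temporary table, active metadata files, parsing, deletion lock, removal of the deleted file designation) can be sketched in Python as follows. This is an illustration under stated assumptions, not the disclosed implementation: `PartitionFile`, `TemporaryTable`, `parse_active_metadata`, and `recover` are hypothetical names, metadata files are modeled as dictionaries with a `"partitions"` list, storage is an in-memory dictionary, and the lock is applied to every partition file even though Example 29 only requires at least one.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionFile:
    name: str
    deleted: bool = True           # the "deleted file designation"
    deletion_locked: bool = False  # True once scheduled deletion is blocked


@dataclass
class TemporaryTable:
    schema: dict
    partitions: list = field(default_factory=list)


def parse_active_metadata(metadata_files):
    """Collect distinct partition-file names, preserving first-seen order."""
    names = []
    for meta in metadata_files:
        for name in meta["partitions"]:
            if name not in names:
                names.append(name)
    return names


def recover(schema, metadata_files, storage):
    """Rebuild a table from partition files that carry a deleted designation."""
    # Example 26: create a temporary table from the retrieved schema.
    table = TemporaryTable(schema=schema)

    # Examples 27-28: parse the active metadata files to find the partitions.
    names = parse_active_metadata(metadata_files)
    missing = [n for n in names if n not in storage]
    if missing:
        raise KeyError(f"partition files not found: {missing}")

    # Example 29: lock scheduled deletions before touching any file, so a
    # background purge cannot remove a file midway through recovery.
    for name in names:
        storage[name].deletion_locked = True

    # Example 30: clear the deleted designation and attach the files.
    for name in names:
        storage[name].deleted = False
        table.partitions.append(storage[name])
    return table
```

Locking all files before clearing any designation is a design choice of this sketch; it keeps the recovery from racing with previously scheduled deletions.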

Example 31 is at least one machine-readable medium including instructions that, when executed by processing circuitry, cause the processing circuitry to perform operations to implement any of Examples 1–30.

Example 32 is an apparatus comprising means to implement any of Examples 1–30.

Example 33 is a system to implement any of Examples 1–30.

Example 34 is a method to implement any of Examples 1–30.

Although the embodiments of the present disclosure have been described with reference to specific example embodiments, it will be evident that various modifications and changes may be made to these embodiments without departing from the broader scope of the inventive subject matter. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense. The accompanying drawings that form a part hereof show, by way of illustration, and not of limitation, specific embodiments in which the subject matter may be practiced. The embodiments illustrated are described in sufficient detail to enable those skilled in the art to practice the teachings disclosed herein. Other embodiments may be used and derived therefrom, such that structural and logical substitutions and changes may be made without departing from the scope of this disclosure. This Detailed Description, therefore, is not to be taken in a limiting sense, and the scope of various embodiments is defined only by the appended claims, along with the full range of equivalents to which such claims are entitled.

Such embodiments of the inventive subject matter may be referred to herein, individually and/or collectively, by the term “invention” merely for convenience and without intending to voluntarily limit the scope of this application to any single invention or inventive concept if more than one is disclosed. Thus, although specific embodiments have been illustrated and described herein, it should be appreciated that any arrangement calculated to achieve the same purpose may be substituted for the specific embodiments shown. This disclosure is intended to cover any adaptations or variations of various embodiments. Combinations of the above embodiments, and other embodiments not specifically described herein, will be apparent to those of skill in the art upon reviewing the above description.

In this document, the terms “a” or “an” are used, as is common in patent documents, to include one or more than one, independent of any other instances or usages of “at least one” or “one or more.” In this document, the term “or” is used to refer to a nonexclusive or, such that “A or B” includes “A but not B,” “B but not A,” and “A and B,” unless otherwise indicated. In the appended claims, the terms “including” and “in which” are used as the plain-English equivalents of the respective terms “comprising” and “wherein.” Also, in the following claims, the terms “including” and “comprising” are open-ended; that is, a system, device, article, or process that includes elements in addition to those listed after such a term in a claim is still deemed to fall within the scope of that claim.

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G06F G06F11/1469 G06F16/2282 G06F2201/80 G06F2201/835

Patent Metadata

Filing Date

October 6, 2025

Publication Date

February 5, 2026

Inventors

Yi Fang

Kedar Nitin Shah

Yantao Song
