Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
1. A method for deduplicating data on a distributed file system, the method comprising: transmitting a write request from a client to a metadata server (“MDS”), wherein the write request comprises an object identifier associated with a data object, wherein the MDS maintains metadata identifying locations of data objects stored in object stores included in the distributed file system; receiving an object store location for an object store from the MDS and a first object designator assigned to the data object by the MDS, wherein the object store is separate from the MDS and wherein the object store stores data objects, wherein both the first object designator and the object identifier uniquely identify the data object and wherein the MDS server maps object designators to object identifiers; deduplicating the data object by: transmitting a metadata request to the object store using the object store location, wherein the metadata request includes the object identifier; receiving a metadata response from the object store; determining whether the metadata response contains a second object designator; transmitting a commit request to the MDS that includes the second object designator in response to determining the metadata response contains the second object designator, wherein the second object designator allows a number of instances of the data object in the distributed file system to be determined; and transmitting the data object that includes the first object designator to the object store in response to determining the metadata response does not contain any object designator and transmitting a commit request to the MDS that includes the first object designator.
A method for deduplicating data in a distributed file system addresses the challenge of efficiently storing and managing duplicate data objects across multiple storage locations. The system includes a metadata server (MDS) that tracks the locations of data objects in object stores and assigns unique object designators to each object. When a client sends a write request for a data object, the MDS provides the client with an object store location and a first object designator for the data object. The client then checks the object store for existing copies of the data object by sending a metadata request that includes the object identifier. If the object store responds with a second object designator, the client transmits a commit request to the MDS with this designator, allowing the system to track the number of instances of the data object. If no designator is found, the client stores the data object in the object store with the first designator and sends a commit request to the MDS with this designator. This method ensures that duplicate data objects are identified and managed efficiently, reducing storage redundancy in the distributed file system.
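The client-side flow described above can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the `ObjectStore` and `MetadataServer` classes, their method names, the `designator-N` format, and the use of a SHA-256 content hash as the object identifier are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ObjectStore:
    """Hypothetical in-memory object store keyed by object identifier."""
    objects: Dict[str, Tuple[str, bytes]] = field(default_factory=dict)

    def head(self, object_id: str) -> Optional[str]:
        # Metadata request: return the stored designator, never the payload.
        entry = self.objects.get(object_id)
        return entry[0] if entry else None

    def put(self, object_id: str, designator: str, data: bytes) -> None:
        self.objects[object_id] = (designator, data)


@dataclass
class MetadataServer:
    """Hypothetical MDS: hands out designators and records commits."""
    store_location: str
    next_id: int = 0
    commits: List[Tuple[str, str]] = field(default_factory=list)

    def write_request(self, object_id: str) -> Tuple[str, str]:
        # Returns (object store location, first object designator).
        self.next_id += 1
        return self.store_location, f"designator-{self.next_id}"

    def commit(self, object_id: str, designator: str) -> None:
        self.commits.append((object_id, designator))


def write_object(data: bytes, mds: MetadataServer,
                 stores: Dict[str, ObjectStore]) -> str:
    """Run the claimed client steps and return the committed designator."""
    # Assumption: the object identifier is a content hash of the data.
    object_id = hashlib.sha256(data).hexdigest()
    location, first_designator = mds.write_request(object_id)
    store = stores[location]

    second_designator = store.head(object_id)
    if second_designator is not None:
        # Duplicate found: commit the existing designator, skip the upload.
        mds.commit(object_id, second_designator)
        return second_designator

    # No duplicate: upload with the first designator, then commit it.
    store.put(object_id, first_designator, data)
    mds.commit(object_id, first_designator)
    return first_designator
```

In this sketch, writing the same bytes twice leaves a single stored copy, and both commits carry the designator assigned during the first write.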
2. The method of claim 1, wherein the metadata request is a HEAD request.
Claim 2 narrows claim 1 by specifying that the metadata request the client sends to the object store is a HEAD request. HEAD is the HTTP method that returns only a resource's headers, such as size, modification date, or content type, without the body. Used here, it lets the client ask the object store whether a data object with the given object identifier already exists, and learn its object designator if it does, without downloading the object itself. This keeps the duplicate check cheap in bandwidth and processing compared with retrieving the full data object.
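A HEAD-based duplicate check might look like the following sketch, which uses only the Python standard library. The `X-Object-Designator` header name, the `/objects/...` path layout, and the in-process test server are assumptions for illustration; the claim does not say how the object store conveys the designator in its response.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional, Tuple

# Hypothetical contents of the object store: path -> object designator.
STORED = {"/objects/abc123": "designator-7"}


class StoreHandler(BaseHTTPRequestHandler):
    """Stand-in object store that answers HEAD with headers only."""

    def do_HEAD(self) -> None:
        designator = STORED.get(self.path)
        if designator is None:
            self.send_response(404)
            self.end_headers()
            return
        self.send_response(200)
        # Assumed header name; not taken from the patent.
        self.send_header("X-Object-Designator", designator)
        self.end_headers()

    def log_message(self, format: str, *args) -> None:
        pass  # keep the example quiet


def head_designator(host: str, port: int, path: str) -> Optional[str]:
    """Send a HEAD request and return the designator header, if present."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        response.read()  # a HEAD response carries no body
        if response.status != 200:
            return None
        return response.getheader("X-Object-Designator")
    finally:
        conn.close()


def demo() -> Tuple[Optional[str], Optional[str]]:
    """Start the stand-in store on a free local port and query it twice."""
    server = HTTPServer(("127.0.0.1", 0), StoreHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return (head_designator("127.0.0.1", port, "/objects/abc123"),
                head_designator("127.0.0.1", port, "/objects/missing"))
    finally:
        server.shutdown()
        server.server_close()
```

A missing object yields `None`, which corresponds to the branch of claim 1 in which the metadata response contains no object designator.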
3. The method of claim 1, further comprising, when the metadata response does not include any object designator, transmitting the commit request to the MDS includes the first designator after transmitting the data object to the object store.
Claim 3 adds an ordering requirement to the no-duplicate branch of claim 1. When the metadata response from the object store does not include any object designator, meaning no existing copy was found, the client first transmits the data object to the object store and only afterwards transmits the commit request containing the first object designator to the MDS. Committing after the upload means the MDS records the designator only once the object has been sent to the store, which presumably helps keep the metadata consistent with what is actually stored.
4. The method of claim 1, wherein the first object designator uniquely identifies the data object and wherein the second object designator uniquely identifies the data object.
Claim 4 specifies that each of the two object designators uniquely identifies the data object. The first object designator is the one the MDS assigns in response to the write request. The second object designator is the one the object store returns when a copy of the object already exists. Because either designator on its own identifies the object, the client can send whichever one applies in its commit request and the MDS can still resolve it to the correct data object.
5. The method of claim 1, wherein the object store is a cloud object store.
Claim 5 narrows claim 1 by specifying that the object store is a cloud object store. The deduplication steps are unchanged: the client still obtains the object store location and first object designator from the MDS, queries the store for an existing copy, and commits the appropriate designator. The only added limitation is that the store holding the data objects is hosted in a cloud environment rather than, for example, on local storage infrastructure.
6. A non-transitory computer readable storage medium comprising processor instructions for deduplicating data on a distributed file system, the instructions comprising: transmitting a write request from a client to a metadata server (“MDS”), wherein the write request comprises an object identifier associated with a data object, wherein the MDS maintains metadata identifying locations of data objects stored in object stores included in the distributed file system; receiving an object store location for an object store from the MDS and a first object designator assigned to the data object by the MDS, wherein the object store is separate from the MDS and wherein the object store stores data objects, wherein both the first object designator and the object identifier uniquely identify the data object and wherein the MDS server maps object designators to object identifiers; deduplication the data object by: transmitting a metadata request to the object store using the object store location, wherein the metadata request includes the object identifier; receiving a metadata response from the object store; determining whether the metadata response contains a second object designator; transmitting a commit request to the MDS that includes the second object designator in response to determining the metadata response contains the second object designator, wherein the second object designator allows a number of instances of the data object in the distributed file system to be determined; and transmitting the data object that includes the first object designator to the object store in response to determining the metadata response does not contain any object designator and transmitting a commit request to the MDS that includes the first object designator.
A distributed file system may suffer from redundant storage of identical data objects, consuming unnecessary storage space and bandwidth. This claim covers a non-transitory computer readable storage medium holding processor instructions that deduplicate data objects in such a system, following the same steps as the method of claim 1. The system includes a metadata server (MDS) that tracks the locations of data objects across multiple object stores and assigns unique object designators to them. When a client sends a write request for a data object, the MDS provides the object store location and a first object designator for the data object. The client then checks the object store for existing copies by sending a metadata request with the object identifier. If the object store responds with a second object designator, indicating a duplicate exists, the client sends a commit request with this designator to the MDS, which allows the number of instances to be determined. If no duplicate is found, the client stores the data object in the object store and sends a commit request with the first designator to the MDS. This avoids redundant storage while keeping the metadata accurate.
7. The non-transitory computer readable storage medium of claim 6, wherein the metadata request is a HEAD request.
Claim 7 applies the limitation of claim 2 to the storage medium of claim 6: the metadata request that the stored instructions send to the object store is a HEAD request. A HEAD request retrieves only metadata, not the associated data content, so the client can learn whether the object store already holds the data object without transferring the object. This reduces network bandwidth and processing overhead compared with a request that returns both metadata and data.
8. The non-transitory computer readable storage medium of claim 6, further comprising transmitting, when the metadata response does not include any object designator, the commit request that includes the first object designator to the MDS after transmitting the data object to the object store.
Claim 8 applies the ordering requirement of claim 3 to the storage medium of claim 6. When the metadata response from the object store does not include any object designator, so no existing copy was found, the instructions transmit the data object to the object store first and then transmit the commit request containing the first object designator to the MDS. When the metadata response does include an object designator, no upload is needed and the client commits that designator instead, as set out in claim 6.
9. The non-transitory computer readable storage medium of claim 6, wherein the first and second object designators uniquely identifies the data object.
Claim 9 applies the limitation of claim 4 to the storage medium of claim 6: the first object designator and the second object designator each uniquely identify the data object. The first is assigned by the MDS when the client requests a write, and the second is returned by the object store when a copy already exists. The claim does not specify how designators are generated or what attributes they are based on.
10. The non-transitory computer readable storage medium of claim 6, wherein the object store is a cloud object store.
Claim 10 applies the limitation of claim 5 to the storage medium of claim 6: the object store that the instructions query and write to is a cloud object store. The deduplication steps themselves are unchanged. The claim adds no further components, such as compression, encryption, or indexing modules.
11. A system for deduplicating data on a distributed file system, the system comprising a non-transitory computer readable medium and processor enabled to execute instructions for: transmitting a write request from a client to a metadata server (“MDS”), wherein the write request comprises an object identifier associated with a data object, wherein the MDS maintains metadata identifying locations of data objects stored in object stores included in the distributed file system; receiving an object store location for an object store from the MDS and a first object designator assigned to the data object by the MDS, wherein the object store is separate from the MDS and wherein the object store stores data objects, wherein both the first object designator and the object identifier uniquely identify the data object and wherein the MDS server maps object designators to object identifiers; deduplication the data object by: transmitting a metadata request to the object store using the object store location, wherein the metadata request includes the object identifier; receiving a metadata response from the object store; determining whether the metadata response contains a second object designator; transmitting a commit request to the MDS that includes the second object designator in response to determining the metadata response contains the second object designator, wherein the second object designator allows a number of instances of the data object in the distributed file system to be determined; and transmitting the data object that includes the first object designator to the object store in response to determining the metadata response does not contain any object designator and transmitting a commit request to the MDS that includes the first object designator.
A system for deduplicating data in a distributed file system addresses the challenge of efficiently storing and managing redundant data across multiple storage nodes. The system includes a metadata server (MDS) that maintains metadata mapping object identifiers to object designators and tracks the locations of data objects in object stores. When a client sends a write request for a data object, the MDS assigns a first object designator and provides the location of an object store where the data object should be stored. The client then checks the object store for existing copies of the data object by sending a metadata request with the object identifier. If the object store responds with a second object designator, indicating an existing copy, the client notifies the MDS to update its mapping, preventing redundant storage. If no existing copy is found, the client stores the data object in the object store and updates the MDS with the first object designator. This process ensures that only unique data objects are stored, reducing storage overhead while maintaining data consistency across the distributed system. The system leverages the MDS for centralized metadata management and object stores for decentralized data storage, optimizing deduplication efficiency in large-scale distributed environments.
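The claims state that the second object designator allows the number of instances of a data object to be determined, but they do not say how the MDS does so. One plausible approach is to count commits per designator, as in the sketch below. The `CommitLedger` class, its fields, and the counting scheme are assumptions for illustration only.

```python
from collections import Counter
from typing import Dict


class CommitLedger:
    """Hypothetical MDS-side bookkeeping for commit requests."""

    def __init__(self) -> None:
        # Maps each object designator to the object identifier it names.
        self.designator_to_object: Dict[str, str] = {}
        # Number of commits received per designator (assumed to equal the
        # number of logical instances of that data object).
        self.instances: Counter = Counter()

    def commit(self, object_id: str, designator: str) -> int:
        """Record a commit and return the instance count for the designator."""
        known = self.designator_to_object.setdefault(designator, object_id)
        if known != object_id:
            raise ValueError("designator already maps to a different object")
        self.instances[designator] += 1
        return self.instances[designator]
```

Under this scheme, a first commit for a designator returns 1, and each later commit that reuses the same designator for a duplicate write increments the count while the object store still holds a single copy.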
12. The system of claim 11, wherein the metadata request is a HEAD request.
Claim 12 applies the limitation of claim 2 to the system of claim 11: the metadata request sent to the object store is a HEAD request, a standard HTTP method used to retrieve header information from a server without downloading the body of the resource. The client uses it to check whether the object store already holds the data object, and to obtain the existing object designator if it does, without transferring the object. This saves bandwidth and reduces latency in the duplicate check.
13. The system of claim 11, further comprising transmitting, when the metadata response does not include any object designator, the commit request that includes the first object designator to the MDS after transmitting the data object to the object store.
Claim 13 applies the ordering requirement of claim 3 to the system of claim 11. The client sends a metadata request to the object store and checks whether the metadata response includes any object designator. If it does not, the system transmits the data object to the object store and then transmits the commit request, including the first object designator, to the MDS. If the metadata response does include an object designator, the data object is not sent to the object store at all, and the commit request carries that existing designator instead, as set out in claim 11.
14. The system of claim 11, wherein the first and second object designators uniquely identifies the data object.
Claim 14 applies the limitation of claim 4 to the system of claim 11: the first object designator, assigned by the MDS, and the second object designator, returned by the object store when a copy already exists, each uniquely identify the data object. This lets the MDS resolve either designator in a commit request to the correct data object. The claim adds no other components to the system of claim 11.
15. The system of claim 11, wherein the object store is a cloud object store.
Claim 15 applies the limitation of claim 5 to the system of claim 11: the object store is a cloud object store. The deduplication steps of claim 11 are unchanged, and the claim does not require features such as versioning, access control, or caching.
October 13, 2020