Example aspects include techniques for provisioning downstream access to requested data within a data lake with cell-level granularity. These techniques include receiving a request for downstream access to filtered data from a data lake, generating a logical view to the data lake based on the request, the logical view restricted to the filtered data, and generating a temporary storage location for storing retrieved data received from the data lake via the logical view. The techniques also include assigning a compute cluster to the logical view, and accessing, via the logical view, by the compute cluster, the filtered data including storing the filtered data within the temporary storage location.
. A device comprising:
. The device of, wherein the at least one processor is configured to:
. The device of, wherein to generate the logical view for the data lake based on the request, the at least one processor coupled with the memory and configured to execute the instructions to:
. The device of, wherein to assign the compute cluster to the logical view, the at least one processor coupled with the memory and configured to execute the instructions to:
. The device of, wherein to assign the compute cluster to the logical view, the at least one processor coupled with the memory and configured to execute the instructions to:
. The device of, wherein to assign the compute cluster to the logical view, the at least one processor coupled with the memory and configured to execute the instructions to:
. The device of, wherein the request is a first request, and the at least one processor coupled with the memory and configured to execute the instructions to:
. The device of, wherein the at least one processor coupled with the memory and configured to execute the instructions to:
. A method comprising:
. The method of, further comprising:
. The method of, wherein generating the logical view for the data lake based on the request, comprises:
. The method of, wherein assigning the compute cluster to the logical view, comprises:
. The method of, wherein assigning the compute cluster to the logical view, comprises:
. The method of, wherein assigning the compute cluster to the logical view, comprises:
. The method of, wherein the request is a first request, and further comprising:
. The method of, further comprising:
. A non-transitory computer-readable device having instructions thereon that, when executed by at least one computing device, causes the at least one computing device to perform operations comprising:
. The non-transitory computer-readable device of, wherein the operations further comprise:
. The non-transitory computer-readable device of, wherein generating the logical view for the data lake based on the request, comprises:
. The non-transitory computer-readable device of, wherein assigning the compute cluster to the logical view, comprises:
The present application for patent is a Continuation of application Ser. No. 18/328,241, entitled “ACCESS PROVISIONING FRAMEWORK WITH CELL-LEVEL SECURITY CONTROL,” filed Jun. 2, 2023, which is assigned to the assignee hereof and hereby expressly incorporated by reference herein for all purposes.
Most applications incorporate a data layer for storing information and providing the information to users and/or services. For example, many applications include database management systems for data persistence. Access to database management systems has traditionally been achieved by assigning one or more access privileges to users. As an example, a big data platform may employ file level security techniques where access is granted on a file level. In particular, a user, a process, or an application may access a file based on the privileges granted to the user, the process, or the application. However, although modern data-engineering requirements place greater importance on data security, file level techniques often grant overprivileged permissions that may lead to unauthorized data access.
The following presents a simplified summary of one or more implementations of the present disclosure in order to provide a basic understanding of such implementations. This summary is not an extensive overview of all contemplated implementations, and is intended to neither identify key or critical elements of all implementations nor delineate the scope of any or all implementations. Its sole purpose is to present some concepts of one or more implementations of the present disclosure in a simplified form as a prelude to the more detailed description that is presented later.
In some aspects, the techniques described herein relate to a device including: a memory storing instructions; and at least one processor coupled with the memory and configured to execute the instructions to: receive a request for downstream access to filtered data from a data lake; generate a logical view to the data lake based on the request, the logical view restricted to the filtered data; generate a temporary storage location for storing retrieved data received from the data lake via the logical view; assign a compute cluster to the logical view; and access, via the logical view, by the compute cluster, the filtered data including storing the filtered data within the temporary storage location.
In some aspects, the techniques described herein relate to a method including: receiving a request for downstream access to filtered data from a data lake; generating a logical view to the data lake based on the request, the logical view restricted to the filtered data; generating a temporary storage location for storing retrieved data received from the data lake via the logical view; assigning a compute cluster to the logical view; and accessing, via the logical view, by the compute cluster, the filtered data including storing the filtered data within the temporary storage location.
In some aspects, the techniques described herein relate to a non-transitory computer-readable device having instructions thereon that, when executed by at least one computing device, causes the at least one computing device to perform operations including: receiving a request for downstream access to filtered data from a data lake; generating a logical view to the data lake based on the request, the logical view restricted to the filtered data; generating a temporary storage location for storing retrieved data received from the data lake via the logical view; assigning a compute cluster to the logical view; and accessing, via the logical view, by the compute cluster, the filtered data including storing the filtered data within the temporary storage location.
Additional advantages and novel features relating to implementations of the present disclosure will be set forth in part in the description that follows, and in part will become more apparent to those skilled in the art upon examination of the following or upon learning by practice thereof.
The detailed description set forth below in connection with the appended drawings is intended as a description of various configurations and is not intended to represent the only configurations in which the concepts described herein may be practiced. The detailed description includes specific details for the purpose of providing a thorough understanding of various concepts. However, it will be apparent to those skilled in the art that these concepts may be practiced without these specific details. In some instances, well-known components are shown in block diagram form in order to avoid obscuring such concepts.
This disclosure describes techniques for implementing an access provisioning framework with cell-level security control. With the increase in data engineering applications, data is now a critical strategic asset that should be securely shared with customers and partners. For example, some business intelligence applications require high granularity. However, in a decoupled storage-compute architecture where storage provides only file level security controls, enabling this level of security via storage-only techniques produces inefficient solutions.
Aspects of the present disclosure provision downstream access to requested data within a data store with cell-level granularity. For example, a cloud computing system generates a restricted view for accessing a limited set of data of a data lake based on a downstream request, restricts usage of the view to a specific cluster, and limits job execution via the cluster to a particular group of identities identified within the request. Accordingly, the present techniques inherently provide least privilege access at a cell level granularity within a decoupled storage-compute architecture.
In particular, based on a role of a user, a computing system maps the user to one or more data usage scenarios and column(s) accessible by the data usage scenarios as defined by a data policy. Further, the computing system stores row level and column level security access permissions within a database, and dynamically updates the row level and column level security access permissions in response to changes within the computing system. Accordingly, the present techniques inherently provide least privilege access at a column level granularity, while reducing access leakage due to outdated access control information.
The drawings include a diagram showing an example of a cloud computing system, in accordance with some aspects of the present disclosure.
As illustrated, the cloud computing system includes a cloud computing platform, a plurality of source devices, a plurality of administrator devices, and a plurality of client devices. The cloud computing platform may provide the source devices and the client devices with distributed storage and access to software, services, files, and/or data via a communications network, e.g., the Internet, intranet, etc. Some examples of the source devices, the administrator devices, and the client devices include smartphone devices and computing devices, Internet of Things (IoT) devices, drones, robots, process automation equipment, sensors, control devices, vehicles, transportation equipment, tactile interaction equipment, virtual and augmented reality (VR and AR) devices, industrial machines, etc. Further, in some aspects, the source devices, the administrator devices, and the client devices include one or more applications configured to interface with the cloud computing platform.
In some aspects, the cloud computing platform is a multi-tenant environment that provides the client devices with distributed storage and access to software, services, files, and/or data via one or more networks. In a multi-tenant environment, one or more system resources of the cloud computing platform are shared among tenants but individual data associated with each tenant is logically separated. As illustrated, the cloud computing platform may further include a plurality of services and a plurality of resources. Some examples of a service include infrastructure as a service (IaaS), platform as a service (PaaS), software as a service (SaaS), database as a service (DaaS), security as a service (SECaaS), big data as a service (BDaaS), monitoring as a service (MaaS), logging as a service (LaaS), internet of things as a service (IOTaaS), identity as a service (IDaaS), analytics as a service (AaaS), function as a service (FaaS), and/or coding as a service (CaaS). Some examples of the resources may include computing units, bandwidth, data storage, application gateways, software load balancers, memory, field programmable gate arrays (FPGAs), graphics processing units (GPUs), input-output (I/O) throughput, or data/instruction cache.
Further, the cloud computing platform may include a management module, a data module, and one or more clusters. As described herein, in some aspects, the data module may store data resulting from activity by the source devices and/or the services within a data store of the data module. In some aspects, the data store includes one or more data lakes. As used herein, in some aspects, a “data lake” refers to a single, centralized repository that stores both structured and unstructured data. In some aspects, a data lake enables the client devices to quickly and easily store and access a wide variety of data in a single location. In some aspects, the data lake stores data in its raw or native format, usually as files or as binary large objects (blobs). In some aspects, the data lake includes database objects. For example, a database object may include a database table that organizes data in columns and rows. Each row represents a unique record, and each column represents a field within the record. For example, a table of contact addresses may include a row for each person and attributes (i.e., columns) for first name, last name, street address, city, state, and/or zip code. Further, the data module receives requests for data stored within the data store, and transmits responses including data stored within the data store in response to the requests. In some aspects, a “cluster” in a cloud computing environment may refer to a group of interconnected servers or virtual machines that work together to perform tasks and provide resources as a single, cohesive unit.
Further, in some aspects, the client devices employ the services to analyze the data of the data store. As an example, in some aspects, the client devices perform business intelligence operations, big data operations, and/or analytic operations over the data store using the one or more services. In particular, in some aspects, the business intelligence operations, big data operations, and/or analytic operations are performed over source data received from the source devices and/or source data generated in response to activity performed by the source devices.
The management module implements an access provisioning framework that automates data access with row and column level security over data modules typically configured to employ file level access. As illustrated, the management module includes a data management module, a cluster management module, and an access control management module. The data management module configures secure access to entities within the data store. For example, the management module receives a configuration request from an administrator device for downstream access by one or more client devices to data within the data store. In some aspects, the configuration request identifies the requested entities to be accessed within the data store and the one or more identities (e.g., user accounts, jobs, applications, and/or client devices) that will receive access to the entities. Further, in some aspects, the configuration request is generated via a graphical user interface (GUI) provided by the cloud computing platform to the administrator device.
In response, the data management module generates logical views for limited access to the requested data. In some aspects, a logical view provides cell-level access to the requested data. Further, the cell-level access is implemented via table access control, cluster access control, cluster visibility control, and/or job access control. For example, in some aspects, the data management module provides row-level security via one or more filter operations defined to limit the rows provided via the view and provides column-level security via the requested attributes of the requested data to achieve cell-level access. In addition, the data management module creates a temporary storage location for storing data retrieved via the logical view and a staging folder for storing query scripts (e.g., SQL scripts) employed during jobs over the logical view. As another example, in some aspects, the data access of members of the user group is configured so that the members are only able to access a cluster and/or have visibility to a cluster corresponding to a logical view limited to one or more particular cells. For example, the requested data is stored within the temporary storage location, and accessed by the one or more client devices downstream from the temporary storage location. Additionally, the data management module manages metadata corresponding to the data store. For example, the data management module tracks the details of the entities of the data store and generates the metadata reflecting the details of the entities of the data store.
The cluster management module generates and/or assigns the one or more clusters. As used herein, in some aspects, a “cluster” (compute cluster) refers to a set of computation resources and configurations on which data engineering, data science, and data analytics workloads run, such as production ETL pipelines, streaming analytics, ad-hoc analytics, and machine learning. Further, as described herein, each cluster is assigned to a logical view. In particular, each cluster is limited to accessing a single logical view generated by the data management module, and writes the data of the logical view to the temporary storage location associated with the logical view.
In some aspects, the cluster management module determines the type of cluster to generate for a logical view based on one or more characteristics of the requested data associated with the view within the configuration request. For example, the cluster management module identifies the size of the entities associated with the configuration request within the metadata, determines a static weight for each entity based upon the size of the entity, sums the static weights to determine the total weight value of the entities, calculates the request complexity based on the total weight value (e.g., by multiplying the total weight value by a complexity multiplier), and compares the request complexity to one or more predefined threshold values to determine the type of cluster to generate for a configuration request. Once the cluster type is determined, the cluster management module assigns the cluster to the view generated for a configuration request. Further, in some aspects, the cluster management module reconfigures the assigned cluster or reassigns the logical view to a new cluster when the sizes of the corresponding entities change and the recalculated request complexity is greater than or less than the previously-calculated request complexity.
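The weighting scheme described above can be sketched as follows. This is only an illustrative sketch: the size bands, static weights, complexity multiplier, threshold values, and cluster type names are all assumptions chosen for the example, since the disclosure does not specify them.

```python
# Hypothetical size bands: (maximum entity size in bytes, static weight).
SIZE_WEIGHTS = [(1_000_000, 1), (100_000_000, 3)]
DEFAULT_WEIGHT = 5            # assumed weight for entities above every band
COMPLEXITY_MULTIPLIER = 1.5   # assumed multiplier
# Hypothetical thresholds: (maximum complexity, cluster type).
CLUSTER_THRESHOLDS = [(5.0, "small"), (15.0, "medium")]
DEFAULT_CLUSTER = "large"


def static_weight(size_bytes: int) -> int:
    """Map an entity size to a static weight using the assumed bands."""
    for max_size, weight in SIZE_WEIGHTS:
        if size_bytes <= max_size:
            return weight
    return DEFAULT_WEIGHT


def request_complexity(entity_sizes: dict) -> float:
    """Sum the static weights and scale by the complexity multiplier."""
    total_weight = sum(static_weight(size) for size in entity_sizes.values())
    return total_weight * COMPLEXITY_MULTIPLIER


def cluster_type(complexity: float) -> str:
    """Compare the complexity to the thresholds to pick a cluster type."""
    for upper_bound, name in CLUSTER_THRESHOLDS:
        if complexity <= upper_bound:
            return name
    return DEFAULT_CLUSTER
```

Recomputing `request_complexity` after entity sizes change in the metadata, and comparing the resulting cluster type with the current one, would be one way to decide whether a cluster should be reconfigured or the view reassigned.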
The access control management module manages access within the cloud computing platform. For example, the access control management module configures access to a logical view by the cluster assigned to the logical view by the cluster management module. For instance, the access control management module creates, edits, and removes user information that defines permissions for accessing the requested data of a configuration request. For example, the access control management module creates a user group (e.g., an access control security group) within the user information for the one or more user accounts, jobs, applications, and/or client devices identified within the configuration request. Further, the access control management module provides the members of the user group with one or more privileges for employing the cluster to access the requested data within the temporary storage location via the logical view, which retrieves the requested data from the data store and stores the requested data within the temporary storage location. In addition, the privileges provided to the members of the user group are dynamically modified in response to updates to the user group. For example, upon removal of an account from a user group granted access to a particular view, the account automatically loses access to the particular view. As another example, upon addition of an account to a user group granted access to a particular view, the account automatically gains access to the particular view. Further, any changes to the view also cause updates to the data access privileges of the members of the user group.
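The group-based behavior described above, where access follows group membership rather than being granted per account, might be modeled as in the sketch below. The `AccessControl` class and its method names are hypothetical and exist only for illustration.

```python
class AccessControl:
    """Hypothetical in-memory model: privileges attach to groups, not accounts."""

    def __init__(self) -> None:
        self.group_members = {}  # group name -> set of account names
        self.group_views = {}    # group name -> set of view names

    def create_group(self, group: str, accounts) -> None:
        self.group_members[group] = set(accounts)
        self.group_views.setdefault(group, set())

    def grant_view(self, group: str, view: str) -> None:
        self.group_views[group].add(view)

    def add_member(self, group: str, account: str) -> None:
        self.group_members[group].add(account)

    def remove_member(self, group: str, account: str) -> None:
        self.group_members[group].discard(account)

    def can_access(self, account: str, view: str) -> bool:
        # Access is derived from current membership on every check, so
        # membership changes take effect without touching the account.
        return any(
            account in members and view in self.group_views.get(group, set())
            for group, members in self.group_members.items()
        )
```

Because `can_access` is evaluated against the current membership, removing an account from the group revokes its access to the view and adding an account grants it, with no per-account bookkeeping.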
In some aspects, once the management module has provisioned access in response to the configuration request, the cloud computing platform securely executes jobs over the requested data within the storage location in response to requests received from the client devices associated with the accounts identified within the configuration request, and transmits responses including the results of the jobs to the client devices. As used herein, in some aspects, a “job” refers to a plurality of computation units. Further, in some aspects, a job defines, schedules, monitors, and controls operations performed by a cluster.
Further, in some aspects, the management module deprovisions access to previously-requested data. For example, in some aspects, the provisioned access expires based upon a predefined time period. In some other examples, the management module receives a deprovisioning request from an administrator device to deprovision access to previously-requested data. In response to a deprovisioning request, the cluster management module unassigns the cluster, the data management module deletes the storage location and staging folder, and the access control management module removes permissions assigned to the cluster and the identities granted access to the cluster.
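A minimal sketch of the deprovisioning steps above is given here, assuming a hypothetical `Provision` record that tracks what was created for a configuration request. The field names and the expiry check are assumptions for illustration, not structures named in the disclosure.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Provision:
    """Hypothetical record of the resources provisioned for one request."""
    view: str
    cluster: Optional[str]
    storage_location: Optional[str]
    staging_folder: Optional[str]
    expires_at: datetime
    permissions: set = field(default_factory=set)

    def is_expired(self, now: datetime) -> bool:
        # Access expires once the predefined time period has elapsed.
        return now >= self.expires_at


def deprovision(provision: Provision) -> None:
    """Apply the three deprovisioning steps described in the text."""
    provision.cluster = None            # cluster management: unassign the cluster
    provision.storage_location = None   # data management: delete storage location
    provision.staging_folder = None     # data management: delete staging folder
    provision.permissions.clear()       # access control: remove permissions
```

The same `deprovision` routine could be invoked either when `is_expired` becomes true or when an administrator submits a deprovisioning request.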
A flow diagram illustrates an example method for implementing downstream access with cell-level granularity within the cloud computing system, in accordance with some aspects of the present disclosure. As illustrated, in some aspects, the management module first receives a downstream request from an administrator device. If the request is for new downstream access, the management module performs the following cloud management tasks: create a service principal identifier (SPN) that is used to identify the one or more user accounts, applications, and/or client devices identified within a configuration request to receive the requested downstream access, create security information within a protected store (e.g., cryptographic keys, security policies, certificates, hardware secrets, and passwords within a key vault), add the service principal identifier to a security group, create a staging folder, and generate permissions for the scripts stored within the staging folder to have the requested downstream access. Next, the management module performs the following cloud environment tasks: generate a cluster for providing downstream access, create a data store group corresponding to the service principal, add one or more accounts to the data store group, and assign permissions to the data store group and cluster. Next, the management module performs the following data tasks: create a downstream database for the requested data, create a downstream folder (e.g., a storage location) for the requested data, create a raw table within the data store for the source data associated with the configuration request, create a logical view corresponding to the requested data, create a writeback table for the requested data, and/or grant permission to the database, tables, and logical view.
If the request is for updating the source devices and/or data store objects (e.g., tables, columns, filters) to read from the data store, the management module performs the following data management tasks: create a downstream database for the requested data, create downstream folders for the requested data, create a raw table within the data store, create a view corresponding to the requested data, create a writeback table for the requested data, and grant permission to the database, tables, and logical view.
If the request is to remove downstream access, the management module performs the following deconstruction tasks: delete the downstream database, security group privileges, service principal identifier privileges, and cluster privileges, remove the staging folder, and delete the service principal identifier and security information (e.g., key vault information).
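The three request branches above can be summarized as a lookup from request type to an ordered task list. The sketch below is illustrative only: the request type keys (`"new"`, `"update"`, `"remove"`) and the task strings are hypothetical labels for the tasks named in the text, not identifiers from the disclosure.

```python
CLOUD_MANAGEMENT_TASKS = [
    "create service principal identifier",
    "create security information in protected store",
    "add service principal identifier to security group",
    "create staging folder",
    "generate permissions for staging folder scripts",
]
CLOUD_ENVIRONMENT_TASKS = [
    "generate cluster",
    "create data store group",
    "add accounts to data store group",
    "assign permissions to data store group and cluster",
]
DATA_TASKS = [
    "create downstream database",
    "create downstream folder",
    "create raw table",
    "create logical view",
    "create writeback table",
    "grant permission to database, tables, and logical view",
]
DECONSTRUCTION_TASKS = [
    "delete downstream database and privileges",
    "remove staging folder",
    "delete service principal identifier and security information",
]

# Hypothetical request type keys mapped to their ordered task lists.
PLANS = {
    "new": CLOUD_MANAGEMENT_TASKS + CLOUD_ENVIRONMENT_TASKS + DATA_TASKS,
    "update": DATA_TASKS,
    "remove": DECONSTRUCTION_TASKS,
}


def plan_tasks(request_type: str) -> list:
    """Return a copy of the ordered tasks for a downstream request type."""
    if request_type not in PLANS:
        raise ValueError(f"unknown request type: {request_type!r}")
    return list(PLANS[request_type])
```

Keeping the data tasks in a shared list reflects that the new-access and update branches describe the same data work.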
The processes described below are illustrated as a collection of blocks in a logical flow graph, which represent a sequence of operations that can be implemented in hardware, software, or a combination thereof. In the context of software, the blocks represent computer-executable instructions stored on one or more computer-readable storage media that, when executed by one or more processors, perform the recited operations. Generally, computer-executable instructions include routines, programs, objects, components, data structures, and the like that perform particular functions or implement particular abstract data types. The order in which the operations are described is not intended to be construed as a limitation, and any number of the described blocks can be combined in any order and/or in parallel to implement the processes. The operations described herein may, but need not, be implemented using the cloud computing platform. By way of example and not limitation, the method is described in the context of the cloud computing system described above. For example, the operations may be performed by the services, the resources, the management module, the data management module, the cluster management module, the access control management module, the data module, and/or the clusters.
A further flow diagram illustrates an example method for provisioning downstream access to requested data within a data lake with cell-level granularity, in accordance with some aspects of the present disclosure.
The method may include receiving a request for downstream access to filtered data from a data lake. For example, the management module receives a configuration request for providing one or more users access to a plurality of cells of a plurality of tables of a data store via an analytics application.
Accordingly, the cloud computing platform, the cloud computing device, and/or the processor executing the management module may provide means for receiving a request for downstream access to filtered data from a data lake.
The method may also include generating a logical view to the data lake based on the request, the logical view restricted to the filtered data. For example, the management module generates a logical view for accessing the plurality of the cells of the plurality of tables. In some aspects, the logical view is defined using a select statement that identifies particular attributes with a filter operation (e.g., a WHERE clause in SQL) in order to provide cell-level security.
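As a small illustration of a view that combines column selection with a row filter, the sketch below uses Python's built-in `sqlite3` module in place of a data lake engine. The `contacts` table, its sample rows, and the view name `wa_contacts_view` are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE contacts (
        first_name TEXT, last_name TEXT, street TEXT,
        city TEXT, state TEXT, zip TEXT
    );
    INSERT INTO contacts VALUES
        ('Ada',   'Lovelace', '1 Main St', 'Seattle',  'WA', '98101'),
        ('Alan',  'Turing',   '2 Oak Ave', 'Portland', 'OR', '97201'),
        ('Grace', 'Hopper',   '3 Pine Rd', 'Spokane',  'WA', '99201');

    -- Column-level restriction: only three attributes are selected.
    -- Row-level restriction: the WHERE clause keeps only WA rows.
    CREATE VIEW wa_contacts_view AS
        SELECT first_name, last_name, city
        FROM contacts
        WHERE state = 'WA';
    """
)

cursor = conn.execute("SELECT * FROM wa_contacts_view ORDER BY first_name")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
```

A consumer limited to `wa_contacts_view` sees only the intersection of the selected columns and the filtered rows, which is the cell-level effect the text describes; the street, state, and zip values and the Oregon record are not exposed through the view.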
Accordingly, the cloud computing platform, the cloud computing device, the management module, and/or the processor executing the data management module may provide means for generating a logical view to the data lake based on the request, the logical view restricted to the filtered data.
The method may also include generating a temporary storage location for storing retrieved data received from the data lake via the logical view. For example, the management module generates a storage location for storing data retrieved using the logical view.
Accordingly, the cloud computing platform, the cloud computing device, the management module, and/or the processor executing the data management module may provide means for generating a temporary storage location for storing retrieved data received from the data lake via the logical view.
The method may also include assigning a compute cluster to the logical view. For example, the management module generates a cluster and assigns the cluster to the logical view. In some aspects, the type of cluster is determined based on the one or more entities identified within the configuration request. Further, in some instances, assigning the cluster to the logical view includes exclusively providing privileges for the cluster to access the logical view, while otherwise denying the cluster access to the data store or any other logical views.
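The exclusive assignment described above amounts to a default-deny check: a cluster may read its assigned view and nothing else. The sketch below models that rule; the `ViewAssignments` class and the identifiers used with it are hypothetical.

```python
class ViewAssignments:
    """Hypothetical registry binding each cluster to exactly one logical view."""

    def __init__(self) -> None:
        self._cluster_to_view = {}

    def assign(self, cluster_id: str, view: str) -> None:
        # Each cluster is limited to a single logical view.
        if cluster_id in self._cluster_to_view:
            raise ValueError(f"cluster {cluster_id!r} is already assigned")
        self._cluster_to_view[cluster_id] = view

    def may_read(self, cluster_id: str, target: str) -> bool:
        # Default deny: unassigned clusters, other views, and the
        # underlying data store all fail this comparison.
        assigned = self._cluster_to_view.get(cluster_id)
        return assigned is not None and assigned == target
```

With this shape, granting a cluster access to a second view requires an explicit reassignment rather than an additional grant, which keeps the one-cluster-per-view constraint visible in the data model.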
Accordingly, the cloud computing platform, the cloud computing device, the management module, and/or the processor executing the cluster management module may provide means for assigning a compute cluster to the logical view.
The method may also include generating, based on the request, a user group for a downstream organization, the user group providing read-only access to the temporary storage location. For example, the management module creates a security group for the one or more users and the analytics application. Further, the security group provides read-only access to the storage location associated with the configuration request.
Accordingly, the cloud computing platform, the cloud computing device, the management module, and/or the processor executing the access control management module may provide means for generating, based on the request, a user group for a downstream organization, the user group providing read-only access to the temporary storage location.
The method may also include accessing, via the logical view, by the compute cluster, the filtered data and storing the filtered data within the temporary storage location. For example, in some aspects, the cluster receives a request to execute a job over the logical view, retrieves the data associated with the job via the logical view, and stores the data within the temporary storage location. Further, the cluster executes the job and transmits a response to the client device including the results of the job.
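The retrieve-then-store step above can be sketched with `sqlite3` standing in for the data lake and a temporary directory standing in for the temporary storage location. The `run_job` and `demo` functions, the `sales` table, and the `west_sales` view are all hypothetical names introduced for this example, and the CSV output format is an assumption.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def run_job(conn: sqlite3.Connection, view_name: str, storage_dir: Path) -> Path:
    """Read everything the view exposes and write it to the storage location.

    The view name is assumed to come from trusted provisioning metadata,
    not from end-user input.
    """
    cursor = conn.execute(f'SELECT * FROM "{view_name}"')
    out_path = storage_dir / f"{view_name}.csv"
    with out_path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow([description[0] for description in cursor.description])
        writer.writerows(cursor)
    return out_path


def demo() -> list:
    """Build a tiny table and view, run the job, and return the stored rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sales (region TEXT, amount INTEGER, rep TEXT);
        INSERT INTO sales VALUES ('west', 10, 'ann'), ('east', 20, 'bob');
        CREATE VIEW west_sales AS
            SELECT region, amount FROM sales WHERE region = 'west';
        """
    )
    with tempfile.TemporaryDirectory() as tmp:
        stored = run_job(conn, "west_sales", Path(tmp))
        with stored.open(newline="") as handle:
            return list(csv.reader(handle))
```

Downstream consumers in this sketch would read only the stored file, so they never touch the underlying table; only the columns and rows exposed by the view reach the storage location.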
Accordingly, the cloud computing platform, the cloud computing device, the management module, and/or the processorexecuting clustermay provide means for accessing, via the logical view, by the compute cluster, the filtered data and storing the filtered data within the temporary storage location.
In additional aspect, the methodincludes receiving from an application associated with the user group, a request for the filtered data within the temporary storage location; and transmitting the filtered data to the application associated with the user group. Accordingly, the cloud computing platform, the cloud computing device, and/or the processorexecuting the management modulemay provide means for receiving, from an application associated with the user group, a request for the filtered data within the temporary storage location; and transmitting the filtered data to the application associated with the user group.
In additional aspect, the methodincludes wherein generating the logical view for the data lake based on the request comprises generating the logical view to provide row-level access and column-level access to the data lake. Accordingly, the cloud computing platform, the cloud computing device, the management module, and/or the processorexecuting the data management modulemay provide means for generating the logical view to provide row-level access and column-level access to the data lake.
In additional aspect, the methodincludes wherein assigning the compute cluster to the logical view, comprises limiting access and visibility of the compute cluster to the user group; and limiting data access to the logical view to the compute cluster. Accordingly, the cloud computing platform, the cloud computing device, the management module, and/or the processorexecuting the data management moduleand the cluster management modulemay provide means for limiting access and visibility of the compute cluster to the user group; and limiting data access to the logical view to the compute cluster.
In additional aspect, the methodincludes wherein assigning the compute cluster to the logical view, comprises determining a cluster type of the compute cluster based upon the request; and generating the compute cluster having the cluster type. Accordingly, the cloud computing platform, the cloud computing device, the management module, and/or the processorexecuting the cluster management modulemay provide means for determining a cluster type of the compute cluster based upon the request; and generating the compute cluster having the cluster type.
In an additional aspect, the method includes wherein assigning the compute cluster to the logical view comprises determining one or more entities corresponding to the filtered data; calculating a complexity score based upon an entity size of each entity of the one or more entities; determining a cluster type of the compute cluster based upon the complexity score; and generating the compute cluster having the cluster type. Accordingly, the cloud computing platform, the cloud computing device, the management module, and/or the processor executing the cluster management module may provide means for determining one or more entities corresponding to the filtered data; calculating a complexity score based upon an entity size of each entity of the one or more entities; determining a cluster type of the compute cluster based upon the complexity score; and generating the compute cluster having the cluster type.
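The description does not give a formula for the complexity score, so the following is only one possible sketch. It assumes the score is the sum of the entity sizes in gigabytes, and that the cluster type is chosen from hypothetical size tiers; the thresholds and the tier names `small`, `medium`, and `large` are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical tiers: (upper bound on score in GB, cluster type name).
CLUSTER_TIERS: List[Tuple[float, str]] = [(10.0, "small"), (100.0, "medium")]
DEFAULT_TIER = "large"  # used when the score exceeds every bound


def complexity_score(entity_sizes_gb: Dict[str, float]) -> float:
    """Assumed scoring rule: total size of all requested entities."""
    return sum(entity_sizes_gb.values())


def cluster_type_for(score: float) -> str:
    """Return the first tier whose upper bound covers the score."""
    for upper_bound, name in CLUSTER_TIERS:
        if score <= upper_bound:
            return name
    return DEFAULT_TIER


# Hypothetical entities corresponding to the filtered data.
entities = {"orders": 4.0, "customers": 2.5}
score = complexity_score(entities)
cluster_type = cluster_type_for(score)
```

A weighted sum, or a score that also accounts for join depth, would fit the same two-step shape: score the entities first, then map the score to a cluster type.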
In an additional aspect, the method includes wherein the request is a first request, and further including: receiving a second request that modifies one or more entities identified within the first request; and resizing the compute cluster in response to the second request. Accordingly, the cloud computing platform, the cloud computing device, the management module, and/or the processor executing the cluster management module may provide means for receiving a second request that modifies one or more entities identified within the first request; and resizing the compute cluster in response to the second request.
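A minimal sketch of resizing in response to a second request follows. It assumes, purely for illustration, that cluster size is a node count derived from the total entity size at a fixed capacity per node; `GB_PER_NODE`, `Cluster`, and `apply_second_request` are hypothetical and not taken from the description.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional

GB_PER_NODE = 50.0  # hypothetical capacity handled by one node


@dataclass
class Cluster:
    nodes: int


def nodes_needed(entity_sizes_gb: Dict[str, float]) -> int:
    # At least one node, otherwise enough nodes to cover the total size.
    return max(1, math.ceil(sum(entity_sizes_gb.values()) / GB_PER_NODE))


def apply_second_request(
    cluster: Cluster,
    first_entities: Dict[str, float],
    changes: Dict[str, Optional[float]],
) -> Dict[str, float]:
    """Merge entity changes and resize the cluster in place.

    `changes` maps an entity name to its new size in GB, or to None
    to remove that entity from the request.
    """
    merged = dict(first_entities)
    for name, size in changes.items():
        if size is None:
            merged.pop(name, None)
        else:
            merged[name] = size
    cluster.nodes = nodes_needed(merged)
    return merged


first = {"orders": 40.0}
cluster = Cluster(nodes=nodes_needed(first))
initial_nodes = cluster.nodes
second = apply_second_request(cluster, first, {"events": 120.0})
```

Here the first request fits on one node, and adding a 120 GB entity in the second request grows the cluster to four nodes.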
In an additional aspect, the method includes deleting the logical view, the user group, the compute cluster, and the temporary storage location based upon an expiration of the request. Accordingly, the cloud computing platform, the cloud computing device, and/or the processor executing the management module may provide means for deleting the logical view, the user group, the compute cluster, and the temporary storage location based upon an expiration of the request.
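Expiration-driven cleanup can be sketched as a single teardown step that removes all four provisioned resources once the request's expiry time has passed. The `Provision` record, the resource keys, and `teardown_if_expired` are hypothetical names for this sketch; a real deployment would call the platform's own deletion operations instead of removing dictionary entries.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List

# Resources created for one request, per the description.
RESOURCE_KINDS = ("logical_view", "user_group", "compute_cluster",
                  "temporary_storage")


@dataclass
class Provision:
    expires_at: datetime
    resources: Dict[str, str]  # resource kind -> hypothetical resource id


def teardown_if_expired(provision: Provision, now: datetime) -> List[str]:
    """Delete every provisioned resource once the request has expired."""
    if now < provision.expires_at:
        return []  # request still valid, nothing is deleted
    deleted = []
    for kind in RESOURCE_KINDS:
        if provision.resources.pop(kind, None) is not None:
            deleted.append(kind)
    return deleted


start = datetime(2025, 1, 1, tzinfo=timezone.utc)
provision = Provision(
    expires_at=start + timedelta(days=7),
    resources={kind: f"{kind}-001" for kind in RESOURCE_KINDS},
)
early = teardown_if_expired(provision, start + timedelta(days=1))
late = teardown_if_expired(provision, start + timedelta(days=8))
```

Because the function pops each resource, calling it again after expiry is harmless and returns an empty list.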
In an additional aspect, the method includes wherein the data lake is a cloud-based centralized repository of structured and unstructured data. Accordingly, the cloud computing platform, the cloud computing device, and/or the processor executing the management module may provide means wherein the data lake is a cloud-based centralized repository of structured and unstructured data.
While the operations are described as being implemented by one or more computing devices, in other examples various systems of computing devices may be employed. For instance, a system of multiple devices may be used to perform any of the operations noted above in conjunction with each other. For example, a car with an internal computing device along with a mobile computing device may be employed in conjunction to perform these operations.
Referring now to, a cloud computing device (e.g., cloud computing platform) in accordance with an implementation includes additional component details as compared to. In one example, the cloud computing device includes the processor for carrying out processing functions associated with one or more of the components and functions described herein. The processor can include a single set or multiple sets of processors or multi-core processors. Moreover, the processor may be implemented as an integrated processing system and/or a distributed processing system. In an example, the processor includes, but is not limited to, any processor specially programmed as described herein, including a controller, microcontroller, a central processing unit (CPU), a graphics processing unit (GPU), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a system on chip (SoC), or other programmable logic or state machine. Further, the processor includes other processing components such as one or more arithmetic logic units (ALUs), registers, or control units.
October 9, 2025