An organization's system is configured to label a given document based on an on-cloud classification service, while maintaining confidentiality of the document's content from all entities external to the organization. The system includes: (a) an encoder configured to receive the given document and to create an embedding of the given document; (b) a deconvolution unit having a neural network, wherein weights of neurons within the neural network are defined relative to a key, and wherein the deconvolution unit is configured to receive the embedding and deconvolve it, thereby creating a scrambled document which is then sent to the on-cloud classification service; and (c) a pre-trained internal inference network, configured to: (i) receive from the on-cloud service a cloud-classification of the scrambled document, (ii) also receive a copy of the embedding, and (iii) identify, given the received cloud-classification and the embedding copy, a true label of the given document.
Legal claims defining the scope of protection, as filed with the USPTO.
. An organization's system configured to label a given document based on an on-cloud classification service, while maintaining confidentiality of the given document's content from all entities external to the organization, comprising:
. The system of, wherein said embedding is a reduced size of said given document, and wherein said scrambled document is of increased size compared to said embedding.
. The system of, wherein a type of said given document is selected from text, table, and image.
. The system of, wherein the internal inference network is a machine-learning network that is trained by: (i) a plurality of documents and respective true labels, and (ii) a plurality of respective cloud classifications resulting from submission each of the plurality of said documents, respectively, to a portion of the system that includes said encoder, said deconvolution unit, and said cloud classification service.
. The system of, wherein said key is periodically altered, and wherein said internal inference network is re-trained upon each key alteration.
. The system of, particularly adapted for labeling a text document, wherein:
. The system of, particularly adapted for labeling a given table-type document, wherein:
. The system of, wherein:
. A method enabling an organization to label a given document based on an on-cloud classification service, while maintaining confidentiality of the given document's content from all entities external to the organization, comprising:
. The method of, wherein said embedding is a reduced size of said document, and wherein said scrambled document is of increased size compared to said embedding.
. The method of, wherein a type of said given document is selected from text, table, and image.
. The method of, wherein the internal inference network is a machine-learning network that is trained by (i) a plurality of documents and respective true labels, and (ii) a plurality of cloud classifications resulting from said encoding, deconvolution, and transfer of same documents, respectively, through said cloud classification service.
. The method of, further comprising periodically altering said key, and further re-training said internal inference network upon each key alteration.
. A multi-organization system for commonly training a common on-cloud classification service by labeled given documents submitted from all organizations, while maintaining confidentiality of the documents' contents of each organization from all entities external to that organization, comprising:
. The multi-organization system of, wherein upon completion of the common training by labeled documents from all organizations, said common on-cloud classification service is ready to provide confidential documents classification to each said organizations.
. The multi-organization system of, wherein during real-time labeling of new documents, each organization's sub-system comprising:
. The system of, wherein said embedding is a reduced size of said new document, and wherein said scrambled image is of increased size compared to said embedding.
. The system of, wherein a type of said document is selected from text, table, and image.
. The system of, wherein the internal inference network of each organization is a machine-learning network that is trained by a plurality of documents and respective true labels, and a plurality of respective common cloud classification vectors resulting from said encoding, deconvolution, and submission to the common cloud classification service.
. The system of, wherein said key in each organization is periodically altered, and each organization's internal inference network is re-trained upon each key alteration.
. An organization's system configured to label a given document based on an on-cloud classification service, while maintaining confidentiality of the given document's content from all entities external to the organization, comprising:
. The system of, wherein said activations vector is a vector compressed relative to the entire activations created during the passage of the embedding through the deconvolution unit, and wherein said compression is performed by a second encoder.
. The system of, wherein said second encoder is a trained or untrained encoder.
. The system of, wherein said embedding is a reduced size of said given document, and wherein said scrambled document is of increased size compared to said embedding.
. The system of, wherein a type of said given document is selected from text, table, and image.
. The system of, wherein the internal inference network is a machine-learning network that is trained by: (i) a plurality of documents embeddings and respective true labels, (ii) said activations vectors, respectively, and (iii) a plurality of respective cloud classifications resulting from submission each of the plurality of said documents, respectively, to a portion of the system that includes said first encoder, said deconvolution unit, and said cloud classification service.
. The system of, particularly adapted for labeling a text document, wherein:
. The system of, particularly adapted for labeling a given table-type document, wherein:
. The system of, wherein:
. A method enabling an organization to label a given document based on an on-cloud classification service, while maintaining confidentiality of the given document's content from all entities external to the organization, comprising:
. The method of, wherein said activations vector is a vector compressed relative to the entire activations created during the passage of the embedding through the deconvolution unit, and wherein said compression is performed by a second encoder.
. The method of, wherein said embedding is a reduced size of said document, and wherein said scrambled document is of increased size compared to said embedding.
. The method of, wherein a type of said given document is selected from text, table, and image.
. The method of, wherein the internal inference network is a machine-learning network that is trained by (i) a plurality of documents and respective true labels, and (ii) a plurality of cloud classifications resulting from said encoding, deconvolution, and transfer of same documents, respectively, through said cloud classification service.
. A multi-organization system for commonly training a common on-cloud classification service by labeled given documents submitted from all organizations, while maintaining confidentiality of the documents' contents of each organization from all entities external to that organization, comprising:
. The multi-organization system of, wherein upon completion of the common training by labeled classification vectors from all organizations, said common on-cloud classification service is ready to provide confidential documents' classifications to each said organizations.
. The multi-organization system of, wherein during run-time labeling of new documents, each organization's sub-system comprising:
. The system of, wherein said on-cloud classification service, during training, further receives scrambled documents created by the deconvolution unit, and during run-time, the on-cloud classification service also further receives scrambled documents that are created by the deconvolution unit.
. The system of, wherein said embedding is a reduced size of said new document, and wherein said scrambled image is of increased size compared to said embedding.
. The system of, wherein a type of said document is selected from text, table, and image.
Complete technical specification and implementation details from the patent document.
This application is a continuation-in-part of International Application No. PCT/IL2022/051112 filed Oct. 20, 2022, which designated the U.S. and claims priority to IL 287685 filed Oct. 28, 2021, the entire contents of each of which are hereby incorporated by reference.
The invention relates, in general, to the field of machine learning services conducted between two parties. More particularly, the invention relates to a system structure enabling the provision of machine learning services from the cloud to organizations while maintaining the privacy of the interchanged material between the organization and the service provider.
Recent advances in cloud-based machine learning services (CMLS) capabilities have allowed individuals and organizations to access state-of-the-art algorithms that were only accessible to a few until recently. These capabilities, however, come with two significant risks: a) the leakage of sensitive data which is sent for processing by the cloud, and b) the ability of third parties to gain insight into the organization's data by analyzing the output of the cloud's machine learning model.
The term “organization” used herein, refers to any entity, business or otherwise, owned or operated by one or more people. This term should not limit the invention to any type or size of organization.
The accuracy of a machine learning system typically depends on the volume of training it has experienced, among other parameters. However, while many organizations need to classify their documents with high accuracy, their ability to use the high-accuracy machine learning systems that are publicly available in the cloud is limited, mainly due to privacy or secrecy regulations. They are therefore forced to develop and use in-house resources. For example, in many cases, a hospital that needs to classify its images as benign or malignant cannot utilize external resources (even those owned by other hospitals), given the requirement to keep its patients' data strictly private. When used herein, the term “cloud” refers to a computing facility or service operated by an entity other than the client.
In another aspect, large cloud enterprises own or have access to a vast number (hundreds of millions, even billions) of documents (such as text documents or images). For example, Google Inc. has trained machine-learning systems using a considerable portion of the publicly available internet documents, resulting in a highly accurate classification system. Having such an incredible classification system, Google Inc., like other cloud enterprises, offers its classification capabilities and pre-trained models in the form of remote services over the cloud. To enjoy such services' capabilities, the customer must transfer its documents to the cloud before receiving a respective classification vector for each document sent.
However, as noted, many organizations cannot utilize these high-accuracy and pre-trained services offered over the cloud, given the requirement to strictly keep patients' or customers' privacy or their own commercial secrets confidential.
The prior art has offered three options to allow organizations to use machine learning services over the cloud while maintaining privacy:
It is an object of the invention to provide a system that enables sending documents to a cloud's machine learning service for classification while maintaining strict privacy and secrecy of the data throughout the entire process.
Another object of the invention is to apply said system's capability to various types of documents, such as image, text, or table-type documents.
It is still another object of the invention to provide a joint system enabling a plurality of separate organizations to train a common-cumulative cloud service and utilize this common service to provide document classification in a manner that each organization keeps its own data secret and private both to the cloud provider and to the other organizations sharing this joint service.
It is still another object of the invention to provide a system that operates in a real one-time pad configuration, where the key is randomly modified for each specific document sent to the cloud, while maintaining a high classification quality.
It is still another object of the invention to provide said system with a simple structure, high reliability, and ease of training.
Other objects and advantages of the invention will become apparent as the description proceeds.
The invention relates to an organization's system configured to label a given document based on an on-cloud classification service, while maintaining confidentiality of the document's content from all entities external to the organization, comprising: (a) an encoder configured to receive the given document, and to create an embedding of the given document; (b) a deconvolution unit having a neural network, wherein weights of neurons within the neural network are defined relative to a key, said deconvolution unit being configured to receive said embedding and deconvolve the embedding, thereby to create a scrambled document which is then sent to the on-cloud classification service; and (c) a pre-trained internal inference network, configured to: (i) receive from said on-cloud service a cloud-classification of said scrambled document, (ii) also receive a copy of said embedding, and (iii) infer, given said received cloud-classification and said embedding copy, a true label of the given document.
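The data flow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented implementation: the average-pooling `encode`, the dense expanding layer used in place of a real deconvolution network, the `placeholder_cloud_service`, and all sizes are hypothetical stand-ins chosen only to show which values stay inside the organization and which are sent out.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]


def encode(document: Sequence[float], embedding_size: int = 4) -> Vector:
    """Toy encoder (assumption): average-pool the document into a short embedding.

    Trailing values that do not fill a whole chunk are ignored.
    """
    if len(document) < embedding_size:
        raise ValueError("document is shorter than the embedding")
    chunk = len(document) // embedding_size
    return [sum(document[i * chunk:(i + 1) * chunk]) / chunk
            for i in range(embedding_size)]


def make_scrambler(key: int, embedding_size: int,
                   output_size: int) -> Callable[[Vector], Vector]:
    """Stand-in for the deconvolution unit: an expanding layer whose
    weights are derived deterministically from the key."""
    rng = random.Random(key)
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(embedding_size)]
               for _ in range(output_size)]

    def scramble(embedding: Vector) -> Vector:
        return [math.tanh(sum(w * x for w, x in zip(row, embedding)))
                for row in weights]

    return scramble


def placeholder_cloud_service(scrambled: Vector, num_classes: int = 3) -> Vector:
    """Hypothetical external classifier: softmax over strided sums of its input."""
    scores = [sum(scrambled[c::num_classes]) for c in range(num_classes)]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_document(document: Sequence[float], key: int,
                   infer: Callable[[Vector, Vector], object]) -> object:
    embedding = encode(document)
    scrambled = make_scrambler(key, len(embedding), 16)(embedding)
    # Only the scrambled document leaves the organization.
    cloud_vector = placeholder_cloud_service(scrambled)
    # The internal inference network sees the cloud output and the embedding copy.
    return infer(cloud_vector, embedding)
```

Here `infer` plays the role of the pre-trained internal inference network and is passed in as a callable, so the sketch makes no claim about how that network is built.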
In an embodiment of the invention, the embedding is a reduced size of the given document, and the scrambled document is of increased size compared to said embedding.
In an embodiment of the invention, the type of said given document is selected from text, table, and image.
In an embodiment of the invention, the internal inference network is a machine-learning network trained by (a) a plurality of documents and respective true labels and (b) a plurality of respective cloud classifications resulting from the submission of the same documents, respectively, to a portion of the system that includes said encoder, said deconvolution unit, and said cloud classification service.
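The training inputs named above (cloud classifications, embeddings, and true labels) can be illustrated with a deliberately simple learner. The sketch below assumes a nearest-centroid classifier in place of the machine-learning network; that choice, the feature concatenation, and the function names are illustrative assumptions only.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Sample = Tuple[Vector, Vector, str]  # (cloud classification, embedding, true label)


def features(cloud_vector: Vector, embedding: Vector) -> Vector:
    """Join the two inputs of the internal inference network into one vector."""
    return list(cloud_vector) + list(embedding)


def train_inference_network(samples: Sequence[Sample]) -> Dict[str, Vector]:
    """Nearest-centroid stand-in: one mean feature vector per true label."""
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = {}
    for cloud_vector, embedding, label in samples:
        x = features(cloud_vector, embedding)
        if label not in sums:
            sums[label] = [0.0] * len(x)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def infer(model: Dict[str, Vector], cloud_vector: Vector, embedding: Vector) -> str:
    """Return the label whose centroid is closest to the new sample."""
    x = features(cloud_vector, embedding)
    return min(model, key=lambda label: sum((a - b) ** 2
                                            for a, b in zip(model[label], x)))
```

In practice the cloud classifications in `samples` would be obtained by passing each training document through the encoder, the deconvolution unit, and the cloud service, as the embodiment states.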
In an embodiment of the invention, the key is periodically altered, and the internal inference network is re-trained upon each key alteration.
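The coupling between key alteration and re-training can be made explicit in code. The following sketch assumes a hypothetical `retrain` callback that rebuilds the internal inference network for a given key; the class name and the 128-bit key size are illustrative choices, not taken from the specification.

```python
import secrets
from typing import Any, Callable


class KeyedPipeline:
    """Holds the current key and the inference model trained for that key."""

    def __init__(self, retrain: Callable[[int], Any], key_bits: int = 128) -> None:
        self._retrain = retrain
        self._key_bits = key_bits
        self.key = secrets.randbits(key_bits)
        self.inference_model = retrain(self.key)

    def rotate_key(self) -> None:
        """Draw a new key and immediately re-train the inference model for it."""
        new_key = secrets.randbits(self._key_bits)
        while new_key == self.key:  # guard against an (unlikely) repeat
            new_key = secrets.randbits(self._key_bits)
        self.key = new_key
        self.inference_model = self._retrain(new_key)
```

Keeping both steps in one method means the stored model can never be paired with a key it was not trained for.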
The invention also relates to a method enabling an organization to label a given document based on an on-cloud classification service while maintaining the confidentiality of the given document's content from all entities external to the organization, comprising: (a) encoding said given document, resulting in an embedding of the given document; (b) deconvolving said embedding by use of a deconvolution unit comprising a neural network, wherein weights of neurons within the neural network are defined relative to a key, thereby to create a scrambled document, and sending the scrambled document to the on-cloud classification service; (c) using a pre-trained internal inference network to: (i) receive from said on-cloud service a cloud-classification of said scrambled document, (ii) to also receive a copy of said embedding, and (iii) to infer, given said received cloud-classification and said embedding copy, a true label of said given document.
In an embodiment of the invention, the embedding is a reduced size of the document, wherein the scrambled document is of increased size compared to said embedding.
In an embodiment of the invention, the type of the given document is selected from text, table, and image.
In an embodiment of the invention, the internal inference network is a machine-learning network trained by a plurality of documents and respective true labels and a plurality of respective cloud classifications resulting from the encoding, deconvolution, and submission to the cloud classification service.
In an embodiment of the invention, the method further comprises periodically altering the key and re-training the internal inference network upon each key alteration.
The invention also relates to a multi-organization system for commonly training a common on-cloud classification service by labeled given documents submitted from all organizations, while maintaining the confidentiality of the documents' contents of each organization from all entities external to that organization, comprising: (A) a training sub-system in each organization comprising: (a) an encoder configured to receive a given document, and to create an embedding of the given document; (b) a deconvolution unit having a neural network, wherein weights of neurons within the neural network are defined relative to a key, said deconvolution unit being configured to receive said embedding, deconvolve the embedding, thereby to create a scrambled document which is then sent for training to the common on-cloud classification service, together with the respective label of that given document.
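A rough sketch of the training-time data flow follows, assuming each organization derives its scrambling weights from its own private key and the shared service only collects (scrambled document, label) pairs. The class names, the dense `tanh` layer standing in for the deconvolution unit, and the sizes are hypothetical.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


class OrganizationTrainer:
    """Per-organization training sub-system with a private key."""

    def __init__(self, key: int, embedding_size: int, output_size: int) -> None:
        rng = random.Random(key)
        self._weights = [[rng.uniform(-1.0, 1.0) for _ in range(embedding_size)]
                         for _ in range(output_size)]

    def prepare(self, embedding: Vector, label: str) -> Tuple[Vector, str]:
        """Scramble the embedding; the label travels with the scrambled document."""
        scrambled = [math.tanh(sum(w * x for w, x in zip(row, embedding)))
                     for row in self._weights]
        return scrambled, label


class CommonCloudService:
    """Collects labeled scrambled documents from all organizations."""

    def __init__(self) -> None:
        self.training_set: List[Tuple[Vector, str]] = []

    def submit(self, scrambled: Vector, label: str) -> None:
        self.training_set.append((scrambled, label))
```

The service never receives an embedding or a key, only the scrambled vectors and their labels.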
In an embodiment of the invention, upon completion of the common training by labeled documents from all organizations, the common on-cloud classification service is ready to provide confidential document classifications to each of said organizations.
In an embodiment of the invention, during real-time labeling of new documents, each organization's sub-system comprises: (a) an encoder configured to receive a new un-labeled document and to create an embedding of the new document; (b) a deconvolution unit having a neural network, wherein weights of neurons within the neural network are defined relative to a key, said deconvolution unit being configured to receive said embedding and deconvolve the embedding, thereby to create a scrambled document which is then sent to the on-cloud classification service; and (c) a pre-trained internal inference network, configured to (i) receive from said on-cloud service a common cloud-classification vector of said scrambled document, (ii) also receive a copy of said embedding, and (iii) infer, given said received common cloud-classification vector and said embedding copy, a true label of said un-labeled document.
In an embodiment of the invention, the embedding is a reduced size of the new document, and the scrambled image is of increased size compared to said embedding.
In an embodiment of the invention, the document type is selected from text, table, and image.
In an embodiment of the invention, the internal inference network of each organization is a machine-learning network that is trained by (a) a plurality of documents and respective true labels and (b) a plurality of respective common cloud classification vectors resulting from the encoding, deconvolution, and transfer through the common cloud classification service.
In an embodiment of the invention, the key in each organization is periodically altered, and each organization's internal inference network is re-trained upon each key alteration.
In an embodiment of the invention, the system is adapted for labeling a text document, wherein: (a) said text document is separated into a plurality of sentences; (b) each sentence is inserted separately into said encoder as a given document; and (c) the pre-trained internal inference network infers a true label of each of said sentences, respectively.
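The per-sentence handling can be sketched as follows. The regular-expression splitter is a simplifying assumption (the specification does not say how sentences are separated), and `label_one` is a hypothetical callable standing for the full encode, scramble, classify, and infer path applied to one sentence.

```python
import re
from typing import Callable, List, Tuple


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def label_text_document(text: str,
                        label_one: Callable[[str], object]) -> List[Tuple[str, object]]:
    """Treat every sentence as its own 'given document' and label it separately."""
    return [(sentence, label_one(sentence)) for sentence in split_sentences(text)]
```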
In an embodiment of the invention, the system is adapted for labeling a given table-type document, wherein: (a) the encoder has the form of a row/tuple to image converter; (b) the encoder receives at its input each row of said given table-type document separately; and (c) the pre-trained internal inference network infers a true label of each of said rows, respectively.
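One possible shape for a row-to-image converter is sketched below. The specification text here does not describe the conversion itself, so the scheme shown (min-max scaling of numeric fields to 0..255 and zero-padding into a square grid) is purely an assumption for illustration, as are the function and parameter names.

```python
import math
from typing import List, Sequence


def row_to_image(row: Sequence[float], lows: Sequence[float],
                 highs: Sequence[float]) -> List[List[int]]:
    """Map one numeric table row to a small square grayscale image."""
    pixels: List[int] = []
    for value, low, high in zip(row, lows, highs):
        span = (high - low) or 1.0              # avoid division by zero
        clipped = min(max(value, low), high)    # keep the value inside its column range
        pixels.append(round(255 * (clipped - low) / span))
    side = math.ceil(math.sqrt(len(pixels)))
    pixels += [0] * (side * side - len(pixels))  # zero-pad to a full square
    return [pixels[r * side:(r + 1) * side] for r in range(side)]
```

For example, `row_to_image([10, 0, 2], [0, 0, 0], [10, 10, 10])` yields the 2x2 image `[[255, 0], [51, 0]]`.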
In an embodiment of the invention: (a) additional documents, whose respective labels are known, are fed into the encoder in addition to the given document; (b) a concatenation unit is used to concatenate the distinct embeddings created by the encoder for said given document and said additional documents, thereby forming a combined vector V; (c) the combined vector V is fed into the deconvolution unit; and (d) the pre-trained internal inference network is configured to: (i) receive from the on-cloud service a cloud-classification of said scrambled document, (ii) also receive a copy of the embedding and a label of each of the additional documents; and (iii) infer a true label of the given document based on the received cloud-classification, the labels of each of the additional documents, and the embedding copy.
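The concatenation step can be sketched as below. The ordering (given document first, additional documents after) and the function name are assumptions; the text above only states that the distinct embeddings are concatenated into V and that the known labels are passed to the inference network.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def build_combined_vector(
    given_embedding: Vector,
    additional: Sequence[Tuple[Vector, str]],
) -> Tuple[Vector, List[str]]:
    """Return the combined vector V and the known labels of the additional documents.

    V is what would be fed to the deconvolution unit; the labels stay inside the
    organization and go to the internal inference network.
    """
    combined: Vector = list(given_embedding)
    known_labels: List[str] = []
    for embedding, label in additional:
        combined.extend(embedding)
        known_labels.append(label)
    return combined, known_labels
```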
The invention also relates to an organization's system configured to label a given document based on an on-cloud classification service, while maintaining confidentiality of the given document's content from all entities external to the organization, comprising: (A) a first encoder configured to receive the given document, and to create an embedding of the given document; (B) a deconvolution unit having a neural network, wherein weights of neurons within the neural network are defined relative to a key, the deconvolution unit being configured to receive the embedding and deconvolve the embedding, thereby to create a scrambled document which is then sent to the on-cloud classification service; and (C) a pre-trained internal inference network, configured to: (a) receive from the on-cloud service a cloud-classification of the scrambled document, (b) also receive a copy of the embedding, (c) also receive an activations vector reflecting activations created at the deconvolution unit during transfer of the embedding through it, and (d) infer, given the received cloud-classification, the embedding copy, and the activations vector, a true label of the given document; wherein the key is a unique key which is randomly generated for each document.
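The per-document key and the activations vector can be illustrated with a two-layer toy network. This is a sketch under assumptions: dense `tanh` layers stand in for the deconvolution unit, the hidden layer's outputs are taken as the activations, and Python's `random.Random` (which is not a cryptographic generator) is used only to turn the key into weights for demonstration.

```python
import math
import random
import secrets
from typing import List, Tuple

Vector = List[float]


def scramble_with_fresh_key(embedding: Vector, hidden_size: int = 6,
                            output_size: int = 12) -> Tuple[Vector, Vector]:
    """Scramble one document with a key generated just for it.

    Returns (scrambled_document, activations). The scrambled document is what
    would be sent to the cloud; the activations stay inside the organization.
    """
    key = secrets.randbits(128)   # unique random key for this document
    rng = random.Random(key)      # demo only: not a cryptographic generator
    w1 = [[rng.uniform(-1.0, 1.0) for _ in embedding] for _ in range(hidden_size)]
    w2 = [[rng.uniform(-1.0, 1.0) for _ in range(hidden_size)]
          for _ in range(output_size)]
    activations = [math.tanh(sum(w * x for w, x in zip(row, embedding)))
                   for row in w1]
    scrambled = [math.tanh(sum(w * h for w, h in zip(row, activations)))
                 for row in w2]
    return scrambled, activations
```

Because the key changes on every call, the inference network cannot be trained for one fixed key; the activations give it per-document information about the scrambling that was applied.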
In an embodiment of the invention, the activations vector is a vector compressed relative to the entire activations created during the passage of the embedding through the deconvolution unit, and the compression is performed by a second encoder.
In an embodiment of the invention, the second encoder is a trained or untrained encoder.
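An untrained second encoder can be as simple as a fixed random projection, sketched below. The Gaussian projection, the scaling, and the function name are assumptions made for illustration; the text above only states that the second encoder compresses the full activations and may be trained or untrained.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def make_untrained_encoder(input_size: int, output_size: int,
                           seed: int = 0) -> Callable[[Vector], Vector]:
    """Build a fixed (never trained) linear map from full activations to a short vector."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(input_size)
    projection = [[rng.gauss(0.0, scale) for _ in range(input_size)]
                  for _ in range(output_size)]

    def compress(activations: Vector) -> Vector:
        if len(activations) != input_size:
            raise ValueError("unexpected activations length")
        return [sum(w * a for w, a in zip(row, activations)) for row in projection]

    return compress
```

A trained variant would replace `projection` with learned weights while keeping the same interface.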
In an embodiment of the invention, the embedding is a reduced size of the given document, and the scrambled document is of increased size compared to the embedding.
In an embodiment of the invention, a type of the given document is selected from text, table, and image.
In an embodiment of the invention, the internal inference network is a machine-learning network that is trained by: (i) a plurality of document embeddings and respective true labels, (ii) the activations vectors, respectively, and (iii) a plurality of respective cloud classifications resulting from submission of each of the plurality of the documents, respectively, to a portion of the system that includes the first encoder, the deconvolution unit, and the cloud classification service.
The invention also relates to a method for enabling an organization to label a given document based on an on-cloud classification service, while maintaining confidentiality of the given document's content from all entities external to the organization, comprising: (A) encoding the given document, resulting in an embedding of the given document; (B) deconvolving the embedding by use of a deconvolution unit comprising a neural network, wherein weights of neurons within the neural network are defined relative to a key, thereby to create a scrambled document, and sending the scrambled document to the on-cloud classification service; and (C) using a pre-trained internal inference network to: (a) receive from the on-cloud service a cloud-classification of the scrambled document, (b) also receive a copy of the embedding, (c) also receive an activations vector reflecting activations created at the deconvolution unit during transfer of the embedding through it, and (d) infer, given the received cloud-classification, the embedding copy, and the activations vector, a true label of the given document; wherein the key is a unique key which is randomly generated for each document.
In an embodiment of the invention, the activations vector is a vector compressed relative to the entire activations created during the passage of the embedding through the deconvolution unit, and the compression is performed by a second encoder.
In an embodiment of the invention, the embedding is a reduced size of the document, and the scrambled document is of increased size compared to the embedding.
In an embodiment of the invention, a type of the given document is selected from text, table, and image.
In an embodiment of the invention, the internal inference network is a machine-learning network that is trained by (i) a plurality of documents and respective true labels, and (ii) a plurality of cloud classifications resulting from the encoding, deconvolution, and transfer of same documents, respectively, through the cloud classification service.
The invention also relates to a multi-organization system for commonly training a common on-cloud classification service by labeled given documents submitted from all organizations, while maintaining confidentiality of the documents' contents of each organization from all entities external to that organization, comprising: a training sub-system in each organization comprising: (a) a first encoder configured to receive a given document, and to create an embedding of the given document; (b) a deconvolution unit having a neural network, wherein weights of neurons within the neural network are defined relative to a key, the deconvolution unit being configured to receive the embedding, deconvolve the embedding, thereby to create an activations vector which is then sent for training to the common on-cloud classification service, together with the respective label of that given document; wherein the key is a unique key which is randomly generated for each document.
In an embodiment of the invention, upon completion of the common training by labeled classification vectors from all organizations, the common on-cloud classification service is ready to provide confidential documents' classifications to each of said organizations.
In an embodiment of the invention, during run-time labeling of new documents, each organization's sub-system comprises: (A) a first encoder configured to receive a new un-labeled document, and to create an embedding of the new document; and (B) a deconvolution unit having a neural network, wherein weights of neurons within the neural network are defined relative to the key, said deconvolution unit being configured to receive the embedding and deconvolve the embedding, thereby to create an activations vector which is then sent to the on-cloud classification service, which, given the activations vector, returns the label of the document.
In an embodiment of the invention, the on-cloud classification service, during training, further receives scrambled documents created by the deconvolution unit, and during run-time, the on-cloud classification service also further receives scrambled documents that are created by the deconvolution unit.
In an embodiment of the invention, the embedding is a reduced size of the new document, and the scrambled image is of increased size compared to the embedding.
March 3, 2026