Methods, systems, apparatuses, and computer program products are described herein that enable execution of a large AI model on a memory-constrained target device that is communicatively connected to a parameter server, which stores a master copy of the AI model. The AI model may be dissected into smaller portions (e.g., layers or sub-layers), and each portion may be executed as efficiently as possible on the target device. After execution of one portion of the AI model is finished, another portion of the AI model may be downloaded and executed at the target device. To improve efficiency, the input samples may be divided into microbatches, and a plurality of microbatches executing in sequential order may form a minibatch. The size of the group of microbatches or minibatch can be adjusted to reduce the communication overhead. Multi-level parallel parameters reduction may be performed at the parameter server and the target device.
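The layer-at-a-time execution described above can be illustrated with a minimal sketch. It is not the patented implementation: the `ParameterServer` class, its `fetch_layer` method, and the use of a scalar `(scale, bias)` pair as a "layer" are all hypothetical stand-ins chosen to keep the example self-contained. The sketch only shows the ordering: download one portion of the model, run every microbatch of the minibatch through it, then move on to the next portion.

```python
from typing import List, Tuple

Layer = Tuple[float, float]  # hypothetical toy layer: (scale, bias)


class ParameterServer:
    """Hypothetical stand-in that holds the master copy of the model."""

    def __init__(self, layers: List[Layer]) -> None:
        self._layers = layers
        self.downloads = 0  # counts how many layer transfers were made

    def num_layers(self) -> int:
        return len(self._layers)

    def fetch_layer(self, index: int) -> Layer:
        # In a real system this would be a network transfer.
        self.downloads += 1
        return self._layers[index]


def split_into_microbatches(samples: List[float], size: int) -> List[List[float]]:
    """Divide the input samples into consecutive microbatches."""
    return [samples[i:i + size] for i in range(0, len(samples), size)]


def run_minibatch(server: ParameterServer, samples: List[float],
                  microbatch_size: int) -> List[float]:
    """Run a forward pass with only one layer resident on the device at a time."""
    microbatches = split_into_microbatches(samples, microbatch_size)
    for layer_index in range(server.num_layers()):
        # Download one portion of the model; the previous one is dropped.
        scale, bias = server.fetch_layer(layer_index)
        # Execute the microbatches in sequential order on this portion.
        microbatches = [[scale * x + bias for x in mb] for mb in microbatches]
    # Flatten the per-microbatch outputs back into one list.
    return [x for mb in microbatches for x in mb]
```

Because each layer is fetched once per minibatch rather than once per microbatch, a larger group of microbatches spreads the cost of a single download over more computation.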
3. The system of claim 2, wherein the weight updater is further configured to update the AI model with the average of the received gradients.
4. The system of claim 2, wherein the set of microbatches comprises a plurality of microbatches that are configured to be executed in sequential order, the set of microbatches forming a minibatch that comprises a number of samples per update for training of the AI model.
5. The system of claim 2, wherein the microbatch size is configurable based on a rate of executing the set of microbatches at the target device and a rate of communication between the target device and the parameter server.
7. The system of claim 1, wherein the transmitter is further configured to transmit another portion of the AI model to another target device; and the weight updater is further configured to receive gradients from the another target device to perform reduction of parameters for the another portion of the AI model.
11. The method of claim 9, wherein the set of microbatches comprises a plurality of microbatches that are configured to be executed in sequential order, the set of microbatches forming a minibatch that comprises a number of samples per update for training of the AI model.
12. The method of claim 9, wherein the microbatch size is configurable based on a rate of executing the set of microbatches at the target device and a rate of communication between the target device and the parameter server.
18. The computer program product of claim 16, wherein the set of microbatches comprises a plurality of microbatches that are configured to be executed in sequential order, the set of microbatches forming a minibatch that comprises a number of samples per update for training of the AI model.
19. The computer program product of claim 15, wherein the microbatch size is configurable based on a rate of executing the set of microbatches at the target device and a rate of communication between the target device and the parameter server.
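Two ideas recur in the claims above: updating the model with the average of the received gradients, and choosing the microbatch configuration from the execution rate and the communication rate. The sketch below illustrates both under stated assumptions. The function names, the plain gradient-descent step with a `learning_rate` parameter, and the sizing rule (use enough microbatches that their compute time covers one layer transfer) are illustrative choices, not details taken from the claims.

```python
import math
from typing import List


def average_gradients(gradients: List[List[float]]) -> List[float]:
    """Element-wise mean of gradient vectors received for one model portion."""
    if not gradients:
        raise ValueError("at least one gradient vector is required")
    count = len(gradients)
    return [sum(column) / count for column in zip(*gradients)]


def apply_update(weights: List[float], avg_gradient: List[float],
                 learning_rate: float) -> List[float]:
    """Assumed update rule: plain gradient descent on the averaged gradient."""
    return [w - learning_rate * g for w, g in zip(weights, avg_gradient)]


def microbatches_to_cover_transfer(layer_bytes: float,
                                   link_bytes_per_second: float,
                                   seconds_per_microbatch: float) -> int:
    """Hypothetical heuristic: smallest number of microbatches whose total
    compute time is at least the time needed to transfer one layer."""
    transfer_seconds = layer_bytes / link_bytes_per_second
    return max(1, math.ceil(transfer_seconds / seconds_per_microbatch))
```

For example, if transferring a layer takes 2 seconds and one microbatch executes in 0.5 seconds, this heuristic suggests grouping at least 4 microbatches per layer so that computation and communication are roughly balanced.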
September 30, 2019
September 6, 2022