A machine learning-based service processes requests using both compressed and complete versions of a model, which reduces response times for requests to process data. A host of the service may receive one or more requests to process data. In response, the host begins processing the data using a compressed version of the model that is already stored in the host's memory, and starts loading the larger, more accurate complete version of the model into that memory. Once the complete version is loaded, the host switches to it to process the remaining portion of the data.
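The switching behavior described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation. It assumes that a model is any callable taking one data item, that the complete model is loaded on a background thread, and that the host checks for the complete model before each item. The names `Host`, `start_loading`, `wait_until_loaded`, and `process` are hypothetical and chosen only for illustration.

```python
import threading
from typing import Callable, Iterable, List, Optional

# Assumption: a "model" is any callable that maps one input item to one result.
Model = Callable[[str], str]


class Host:
    """Hypothetical host that serves with a compressed model until the
    complete model has finished loading, then switches to the complete one."""

    def __init__(self, compressed_model: Model,
                 load_complete_model: Callable[[], Model]) -> None:
        self._compressed = compressed_model      # already resident in memory
        self._complete: Optional[Model] = None   # set once loading finishes
        self._loaded = threading.Event()
        self._loader = threading.Thread(
            target=self._load, args=(load_complete_model,), daemon=True
        )

    def _load(self, load_complete_model: Callable[[], Model]) -> None:
        # Runs in the background; stands in for reading a large model into memory.
        self._complete = load_complete_model()
        self._loaded.set()

    def start_loading(self) -> None:
        """Begin loading the complete model without blocking request handling."""
        self._loader.start()

    def wait_until_loaded(self, timeout: Optional[float] = None) -> bool:
        """Block until the complete model is loaded; returns False on timeout."""
        return self._loaded.wait(timeout)

    def process(self, items: Iterable[str]) -> List[str]:
        """Process items one by one, switching models as soon as the complete
        model becomes available, even partway through a request."""
        results = []
        for item in items:
            if self._loaded.is_set() and self._complete is not None:
                model = self._complete
            else:
                model = self._compressed
            results.append(model(item))
        return results
```

In this sketch a caller would construct a `Host`, call `start_loading()`, and then call `process()` for each incoming request. Items handled before loading completes go through the compressed model, and the remainder go through the complete model. Checking per item is one possible granularity; a real service might switch per batch or per request instead.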
Legal claims defining the scope of protection, as filed with the USPTO.
14. The method as recited in claim 6, wherein the request indicates the model.
Cooperative Patent Classification codes for this invention.
March 5, 2019
September 13, 2022