In an approach for compressing a neural network, a processor receives a neural network, wherein the neural network has been trained on a set of training data. A processor receives a compression ratio. A processor compresses the neural network based on the compression ratio using an optimization model to solve for sparse weights. A processor re-trains the compressed neural network with the sparse weights. A processor outputs the re-trained neural network.
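The compression step above can be illustrated with a small sketch. The abstract does not say which optimization model is used, so the one below is an assumption chosen for simplicity: find the sparse vector closest (in squared distance) to the trained weights, subject to a limit on the number of nonzero entries. That particular problem is solved exactly by keeping the largest-magnitude weights. The function names are hypothetical, the ratio is taken as zero weights to nonzero weights as defined in claim 2, and re-training is not shown.

```python
def nonzero_budget(num_weights, ratio):
    """Number of weights that may stay nonzero for a zero:nonzero ratio."""
    # zeros / nonzeros = ratio and zeros + nonzeros = num_weights,
    # so nonzeros = num_weights / (1 + ratio).
    return max(1, int(num_weights / (1.0 + ratio)))


def solve_sparse_weights(weights, ratio):
    """Assumed optimization model: minimize sum((w_i - s_i)^2) over s
    with at most k nonzero entries. The minimizer keeps the k
    largest-magnitude weights and zeroes the rest."""
    k = nonzero_budget(len(weights), ratio)
    by_magnitude = sorted(
        range(len(weights)), key=lambda i: abs(weights[i]), reverse=True
    )
    keep = set(by_magnitude[:k])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]


# Illustrative trained weights (made-up values).
trained = [0.9, -0.05, 0.4, 0.01, -0.7, 0.02, 0.03, -0.3]
sparse = solve_sparse_weights(trained, ratio=3.0)
print(sparse)
```

With eight weights and a ratio of 3, two weights stay nonzero and six are set to zero.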
Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
2. The method of claim 1, wherein the compression ratio is a ratio of zero weights to nonzero weights that is based on an amount of memory storage available for running the re-trained neural network on a computing device.
A method for compressing a neural network in which the compression ratio is tied to the memory of the target device. The compression ratio is defined as the ratio of zero weights to nonzero weights, and it is based on the amount of memory storage available for running the re-trained neural network on a computing device. A device with less available memory calls for a higher ratio, meaning more weights are set to zero and the model's memory footprint shrinks. As in claim 1, the trained network is compressed to that ratio using an optimization model that solves for sparse weights, and the compressed network is then re-trained with those sparse weights before it is output. Tying the ratio to available memory allows the same trained network to be deployed on devices with different memory constraints while limiting the loss of accuracy caused by compression.
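As a rough illustration of how a zero-to-nonzero ratio might follow from a memory budget, the sketch below makes a simplifying assumption that is not in the claim: only nonzero weights consume storage, each at a fixed number of bytes, and the index overhead of a sparse format is ignored. The function name and the example figures are hypothetical.

```python
def compression_ratio_for_memory(num_weights, bytes_per_weight, memory_bytes):
    """Return the zero:nonzero ratio needed to fit the memory budget.

    Assumes only nonzero weights are stored (sparse-format index
    overhead is ignored for simplicity).
    """
    max_nonzero = min(num_weights, memory_bytes // bytes_per_weight)
    if max_nonzero <= 0:
        raise ValueError("memory budget cannot hold even one weight")
    num_zero = num_weights - max_nonzero
    return num_zero / max_nonzero


# One million 4-byte weights, but only 1,000,000 bytes available:
# 250,000 weights can stay nonzero, so 750,000 must be zero.
ratio = compression_ratio_for_memory(1_000_000, 4, 1_000_000)
print(ratio)  # 3.0

# Enough memory for every weight: no zeros are required.
print(compression_ratio_for_memory(1_000_000, 4, 8_000_000))  # 0.0
```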
3. The method of claim 2, wherein the computing device is a mobile device.
The method of claim 2 applied where the computing device is a mobile device. The compression ratio of zero weights to nonzero weights is therefore based on the amount of memory storage available on the mobile device for running the re-trained neural network. The remaining steps are unchanged: a trained neural network and a compression ratio are received, the network is compressed to that ratio using an optimization model that solves for sparse weights, the compressed network is re-trained with the sparse weights, and the re-trained network is output. The claim narrows the method to the case where the compressed network is intended to run on a mobile device.
5. The method of claim 1, wherein re-training the compressed neural network with the sparse weights comprises minimizing to a pre-defined value an average squared difference between a label for an input of the neural network and final output of the neural network based on the set of sparse weights.
This invention relates to re-training a compressed neural network with sparse weights. Neural networks often require significant computational and memory resources, so compression is used to reduce their size, but compression can degrade accuracy. The claim addresses this in the re-training step of claim 1: the compressed network is re-trained by minimizing, to a pre-defined value, the average squared difference between the label for an input and the final output of the network computed with the set of sparse weights. In other words, re-training adjusts the sparse weights until the mean squared error falls to a chosen target. This is intended to recover accuracy lost during compression while the network stays sparse, which is useful where networks must run under resource constraints, such as on edge devices or in real-time systems. The optimization itself is standard; what the claim specifies is that it is applied to the sparse weight configuration and driven to a pre-defined error value.
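A minimal sketch of this re-training step follows, under several assumptions that are not in the claim: the "network" is a single linear layer, plain gradient descent is used, weights that were zeroed during compression are held at zero, and the data, learning rate, and target value are made up. All names are hypothetical.

```python
def mean_squared_error(weights, inputs, labels):
    """Average squared difference between labels and model outputs."""
    total = 0.0
    for x, y in zip(inputs, labels):
        pred = sum(w * xi for w, xi in zip(weights, x))
        total += (pred - y) ** 2
    return total / len(inputs)


def retrain_sparse(weights, inputs, labels, target_mse, lr=0.1, max_steps=10_000):
    """Gradient descent on MSE that updates only the nonzero weights,
    stopping once the MSE reaches the pre-defined target value."""
    mask = [w != 0.0 for w in weights]  # zeroed weights stay zero
    w = list(weights)
    n = len(inputs)
    for _ in range(max_steps):
        if mean_squared_error(w, inputs, labels) <= target_mse:
            break
        errors = [
            sum(wi * xi for wi, xi in zip(w, x)) - y
            for x, y in zip(inputs, labels)
        ]
        grads = [
            2.0 * sum(e * x[j] for e, x in zip(errors, inputs)) / n
            for j in range(len(w))
        ]
        w = [wj - lr * g if keep else 0.0 for wj, g, keep in zip(w, grads, mask)]
    return w, mean_squared_error(w, inputs, labels)


# Toy data consistent with the sparse weights [1.0, 0.0, -1.0].
inputs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
labels = [1.0, 0.0, -1.0, 0.0]
compressed = [0.5, 0.0, -0.2]  # middle weight was zeroed by compression

retrained, final_mse = retrain_sparse(compressed, inputs, labels, target_mse=1e-4)
print(retrained, final_mse)
```

The mask is one simple way to keep the compression ratio fixed during re-training; the source does not state how sparsity is preserved.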
7. The computer program product of claim 6, wherein the compression ratio is a ratio of zero weights to nonzero weights that is based on an amount of memory storage available for running the re-trained neural network on a computing device.
This invention relates to compressing neural network models for deployment on computing devices with limited memory. The problem addressed is the high memory consumption of neural networks, particularly on edge devices or in other resource-constrained environments. The solution is a computer program product whose instructions compress a trained neural network by reducing the number of nonzero weights and then re-train it. The compression ratio, defined as the ratio of zero weights to nonzero weights, is based on the amount of memory storage available for running the re-trained neural network on the computing device, so that the network fits within the device's memory. A higher compression ratio corresponds to more weights being set to zero. Selecting the ratio from the available memory allows a network to be deployed on devices with different memory capacities, such as mobile devices, embedded systems, or IoT devices, while re-training limits the loss of accuracy.
8. The computer program product of claim 6, wherein the computing device is a mobile device.
A computer program product for compressing a neural network in which the computing device that runs the re-trained neural network is a mobile device. The program instructions receive a trained neural network and a compression ratio, compress the network to that ratio using an optimization model that solves for sparse weights, re-train the compressed network with the sparse weights, and output the re-trained network. The claim narrows the product to the case where the compressed network is intended to run on a mobile device, where memory storage is limited.
10. The computer program product of claim 6, wherein the program instructions to re-train the compressed neural network with the sparse weights comprises program instructions to minimize to a pre-defined value an average squared difference between a label for an input of the neural network and final output of the neural network based on the set of sparse weights.
This invention relates to re-training a compressed neural network so that it retains accuracy. The problem addressed is the resource demand of large neural networks, particularly when they are deployed in resource-constrained environments. The neural network is first trained, then compressed so that many of its weights are set to zero (sparse weights). The program instructions then re-train the compressed network with those sparse weights by minimizing, to a pre-defined value, the average squared difference (mean squared error) between the label for an input and the final output of the network. This step is intended to recover accuracy lost to compression while keeping the reduced number of nonzero parameters, making the network suitable for edge computing, mobile devices, and embedded systems where computational resources are limited.
12. The computer system of claim 11, wherein the compression ratio is a ratio of zero weights to nonzero weights that is based on an amount of memory storage available for running the re-trained neural network on a computing device.
A computer system for neural network compression reduces memory usage by reducing the number of nonzero weights in a trained neural network. The system addresses the challenge of deploying large neural networks on resource-constrained devices. The compression ratio is the ratio of zero weights to nonzero weights, and it is based on the amount of memory storage available for running the re-trained neural network on the computing device. The system compresses the network to that ratio using an optimization model that solves for sparse weights, then re-trains the compressed network with the sparse weights to compensate for accuracy lost during compression. Basing the ratio on the device's available memory allows the network to run on devices with limited memory resources.
13. The computer system of claim 12, wherein the computing device is a mobile device.
The computer system of claim 12 applied where the computing device is a mobile device. The compression ratio of zero weights to nonzero weights is based on the amount of memory storage available on the mobile device for running the re-trained neural network. The system compresses the trained network to that ratio using an optimization model that solves for sparse weights, re-trains the compressed network with the sparse weights, and outputs the re-trained network. The claim narrows the system to the case where the compressed network is intended to run on a mobile device.
March 13, 2019
April 23, 2024