Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
1. A convolutional operation device, comprising: a Dynamic Voltage Frequency Scaling device, an instruction storage unit, a control unit, a data access unit, an interconnection module, a primary computation module and multiple secondary computation modules, wherein the instruction storage unit is configured to store an instruction read in by the data access unit; the control unit is configured to read the instruction from the instruction storage unit and decode the instruction into a control signal for controlling an operation of other modules, where the other modules comprise the data access unit, the primary computation module, and the multiple secondary computation modules; the data access unit is configured to perform data or instruction read/write operation between an external address space and the convolutional operation device; the multiple secondary computation modules are configured to implement convolutional operation of input data and convolution kernels in a convolutional neural network algorithm; the interconnection module is configured for data transfer between the primary computation module and the secondary computation modules; the primary computation module is configured to splice intermediate vectors of all the input data into an intermediate result and perform subsequent operation on the intermediate result; the Dynamic Voltage Frequency Scaling device is configured to acquire working state information of the convolutional operation device and send voltage frequency scaling information to the convolutional operation device according to the working state information of the convolutional operation device, where the voltage frequency scaling information is configured to instruct the convolutional operation device to scale its working voltage or working frequency, wherein the primary computation module includes a first storage unit, a first computation unit, and a first data dependency relationship judgment unit.
A convolutional operation device is designed to efficiently perform convolutional operations in neural networks, addressing the need for high-performance, energy-efficient processing in deep learning applications. The device includes a Dynamic Voltage Frequency Scaling (DVFS) unit that monitors the device's working state and adjusts voltage and frequency to optimize power consumption and performance. An instruction storage unit holds instructions fetched by a data access unit, which manages data and instruction transfers between the device and external memory. A control unit decodes instructions into control signals that govern the operation of other modules, including the data access unit, a primary computation module, and multiple secondary computation modules. The secondary computation modules execute convolutional operations on input data and convolution kernels, while an interconnection module facilitates data transfer between the primary and secondary modules. The primary computation module processes intermediate results by splicing intermediate vectors and performing subsequent operations. It includes a first storage unit for data buffering, a first computation unit for arithmetic operations, and a first data dependency relationship judgment unit to manage data dependencies. This architecture enhances computational efficiency and adaptability in neural network processing.
2. The convolutional operation device of claim 1, wherein the primary computation module is further configured to: add the intermediate result and offset data and perform an activation operation.
In the convolutional operation device of claim 1, the secondary computation modules perform the convolution itself, multiplying input data by convolution kernel weights and summing the products, and the primary computation module splices the returned intermediate vectors into an intermediate result. Claim 2 adds two further steps in the primary computation module: it adds offset (bias) data to the intermediate result and then applies an activation operation to the sum. The activation introduces non-linearity into the output, which is what lets a convolutional neural network (CNN) model patterns that a purely linear filter cannot. Bias addition and activation are therefore completed inside the device, in the same module that assembles the convolution result.
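The bias-and-activation step can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names, the choice of ReLU as the activation, and the sample values are assumptions made only for illustration.

```python
def relu(x: float) -> float:
    """Example activation (an assumption; claim 4 allows any nonlinear function)."""
    return x if x > 0.0 else 0.0


def add_offset_and_activate(intermediate, offset, activation=relu):
    """Add offset (bias) data to the intermediate result, then activate each element."""
    if len(intermediate) != len(offset):
        raise ValueError("intermediate result and offset data must have equal length")
    return [activation(value + bias) for value, bias in zip(intermediate, offset)]


intermediate_result = [1.5, -2.0, 0.25]  # hypothetical spliced convolution outputs
offset_data = [0.5, 0.5, -0.25]          # hypothetical bias values
print(add_offset_and_activate(intermediate_result, offset_data))  # [2.0, 0.0, 0.0]
```

The activation is passed as a parameter so that the same step works with any of the functions named in claim 4.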
3. The convolutional operation device of claim 1, wherein the multiple secondary computation modules are configured to: concurrently compute respective output scalars using the same input data and their respective convolutional kernels.
Claim 3 specifies how the multiple secondary computation modules of claim 1 divide the work. All secondary modules use the same input data, and each has its own convolution kernel. Each module computes one output scalar from that shared input and its own kernel, and the modules do so concurrently. Because several kernels are applied to the same input at once rather than one after another, the output scalars for all of those kernels become available in one parallel step. The primary computation module does not create this input data; per claims 6 and 16 it reads the data from its own storage and sends it to the secondary modules through the interconnection module.
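A small Python sketch of this arrangement follows. It is illustrative only: the function names and sample values are assumptions, and a thread pool merely models concurrency in software, whereas the claim describes separate hardware modules.

```python
from concurrent.futures import ThreadPoolExecutor


def output_scalar(window, kernel):
    """One secondary module: dot product of the shared input with its own kernel."""
    return sum(x * w for x, w in zip(window, kernel))


def run_secondary_modules(window, kernels):
    """Compute one output scalar per kernel, all from the same input window."""
    with ThreadPoolExecutor(max_workers=len(kernels)) as pool:
        # pool.map returns results in the same order as the kernels.
        return list(pool.map(lambda kernel: output_scalar(window, kernel), kernels))


window = [1.0, 2.0, 3.0, 4.0]  # hypothetical input data in one convolutional window
kernels = [                    # hypothetical kernels, one per secondary module
    [1.0, 0.0, 0.0, 1.0],
    [0.5, 0.5, 0.5, 0.5],
    [0.0, -1.0, 1.0, 0.0],
]
print(run_secondary_modules(window, kernels))  # [5.0, 5.0, 1.0]
```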
4. The convolutional operation device of claim 1, wherein an active function active used by the primary computation module is any nonlinear function including sigmoid, tanh, relu, and softmax.
Claim 4 specifies the activation function used by the primary computation module of claim 1: it may be any nonlinear function, with sigmoid, tanh, ReLU, and softmax named as examples. Sigmoid and tanh map outputs to a bounded range, (0, 1) and (-1, 1) respectively. ReLU (Rectified Linear Unit) sets negative values to zero and passes positive values through unchanged. Softmax converts a vector of outputs into a probability distribution and is commonly used for multi-class classification. Because the claim covers any nonlinear function, the device is not tied to one activation or one network architecture. As in claim 1, the convolution itself is carried out by the secondary computation modules; the primary computation module applies the activation to the assembled result.
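For reference, the four named functions can be written in a few lines of Python using their standard mathematical definitions. This is a sketch for illustration, not the device's implementation.

```python
import math


def sigmoid(x: float) -> float:
    """Maps any real number into (0, 1); branches to avoid overflow in exp."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def tanh(x: float) -> float:
    """Maps any real number into (-1, 1)."""
    return math.tanh(x)


def relu(x: float) -> float:
    """Zero for negative inputs, identity for positive inputs."""
    return max(0.0, x)


def softmax(values):
    """Turns a vector into a probability distribution (non-negative, sums to 1)."""
    largest = max(values)  # subtracting the maximum keeps exp() from overflowing
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


print(sigmoid(0.0), tanh(0.0), relu(-3.0))  # 0.5 0.0 0.0
```

Unlike the other three, softmax operates on a whole vector rather than on one element at a time.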
5. The convolutional operation device of claim 1, wherein the interconnection module forms a data path of continuous or discrete data between the primary computation module and the multiple secondary computation modules, and the interconnection module is any structure in a tree structure, a ring structure, a grid structure, a hierarchical interconnection structure, and a bus structure.
Claim 5 describes the interconnection module of claim 1. The module forms the data path between the primary computation module and the multiple secondary computation modules, and that path may carry continuous or discrete data. The claim lists the structures the module may take: a tree, a ring, a grid, a hierarchical interconnection, or a bus. As in claim 1, the secondary modules perform the convolution, and the primary module splices the intermediate vectors and performs the subsequent operations; the interconnection module carries the data between them. The claim names the alternative topologies but says nothing about switching between them during operation.
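To make one of these options concrete, the sketch below models a tree-structured interconnection that combines the output scalars of the secondary modules into one vector, merging neighbours level by level (claim 16 describes the scalars being spliced step by step in the interconnection module). The binary fan-in, the function name, and the sample values are assumptions; the claim does not fix any of them.

```python
def splice_through_tree(output_scalars):
    """Merge per-module scalars pairwise, one tree level at a time, preserving order."""
    level = [[scalar] for scalar in output_scalars]  # leaves: one scalar per module
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            left = level[i]
            right = level[i + 1] if i + 1 < len(level) else []  # odd node passes through
            next_level.append(left + right)
        level = next_level
    return level[0] if level else []


# Hypothetical scalars returned by five secondary modules.
print(splice_through_tree([5.0, 5.0, 1.0, -2.0, 0.5]))  # [5.0, 5.0, 1.0, -2.0, 0.5]
```

The result is the same vector a bus or ring would deliver; the topology changes how many merge steps are needed, not the content.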
6. The convolutional operation device of claim 1, wherein: the first storage unit is configured to cache the input data and output data used by the primary computation module in a computation process; the first computation unit is configured to complete various computational functions of the primary computation module; and the first data dependency relationship judgment unit is configured as a port through which the first computation unit reads and writes the first storage unit to ensure data read/write consistency of the first storage unit, and configured to read an input neuron vector from the first storage unit, to send the input neuron vector to the multiple secondary computation modules through the interconnection module, and to send an intermediate result vector from the interconnection module to the first computation unit.
Claim 6 details the three components inside the primary computation module of claim 1. The first storage unit caches the input data and output data that the primary computation module uses during computation. The first computation unit carries out the module's computational functions. The first data dependency relationship judgment unit is the port through which the first computation unit reads and writes the first storage unit, and it ensures that those reads and writes stay consistent. The same unit also reads the input neuron vector from the first storage unit, sends it to the multiple secondary computation modules through the interconnection module, and passes the intermediate result vector coming back from the interconnection module to the first computation unit. The convolution itself is performed in the secondary modules; the primary module supplies their input and handles the result.
7. The convolutional operation device of claim 1, wherein each secondary computation module of the multiple secondary computation modules includes: a second computation unit configured to receive the control signal sent by the control unit and perform arithmetic logical operation; a second data dependency relationship judgment unit configured to perform a read/write operation on a second storage unit and a third storage unit in a computation process to ensure read/write consistency of the second storage unit and the third storage unit; the second storage unit configured to cache the input data and the output scalar obtained by computation of the secondary computation module; and the third storage unit configured to cache the convolutional kernel required by the secondary computation module in the computation process.
Claim 7 details the components inside each secondary computation module of claim 1. A second computation unit receives the control signal sent by the control unit and performs arithmetic and logical operations. A second storage unit caches the input data and the output scalar that the module computes, and a third storage unit caches the convolution kernel the module needs. A second data dependency relationship judgment unit performs the read/write operations on the second and third storage units during computation and ensures that reads and writes to both stay consistent. Because each module holds its own kernel and its own copy of the input data, the secondary modules can compute in parallel without contending for a shared store.
8. The convolutional operation device of claim 7, wherein the first data dependency relationship judgment unit and the second data dependency relationship judgment unit are configured to: ensure the read/write consistency, judge whether a dependency relationship is formed between data of a control signal which has yet not been performed and a control signal which is under execution, if a dependency relationship is not formed between data of a first control signal that has yet not been performed and a second control signal that is under execution, allow the control signal to be sent immediately, and allow the control signal to be sent only after all control signals the control signal depends on are performed if a dependency relationship is formed between data of the first control signal and the second control signal.
Claim 8 describes how the first and second data dependency relationship judgment units of claim 7 ensure read/write consistency. Each unit judges whether the data of a control signal that has not yet been performed depends on a control signal that is currently under execution. If no dependency exists, the pending control signal is allowed to be sent immediately. If a dependency exists, the pending control signal is sent only after every control signal it depends on has been performed. This ordering prevents an operation from reading or writing data before an earlier operation on that data has finished, while still letting independent control signals proceed without waiting.
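One common way to make this kind of judgment is to compare the address ranges that each control signal reads and writes. The Python sketch below takes that approach as an assumption; the claim does not say how the dependency is detected, and the `ControlSignal` fields and the address values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlSignal:
    name: str
    reads: range   # hypothetical: storage addresses the signal reads
    writes: range  # hypothetical: storage addresses the signal writes


def overlaps(a: range, b: range) -> bool:
    """True if two contiguous address ranges share at least one address."""
    return len(a) > 0 and len(b) > 0 and a.start < b.stop and b.start < a.stop


def depends_on(pending: ControlSignal, executing: ControlSignal) -> bool:
    """Read-after-write, write-after-write, or write-after-read on the same addresses."""
    return (
        overlaps(pending.reads, executing.writes)
        or overlaps(pending.writes, executing.writes)
        or overlaps(pending.writes, executing.reads)
    )


def can_send(pending: ControlSignal, under_execution) -> bool:
    """A pending signal may be sent only if it depends on nothing still executing."""
    return not any(depends_on(pending, signal) for signal in under_execution)


load = ControlSignal("load", reads=range(0, 0), writes=range(0, 16))
compute = ControlSignal("compute", reads=range(8, 24), writes=range(32, 48))
store = ControlSignal("store", reads=range(64, 80), writes=range(96, 112))

print(can_send(compute, [load]))  # False: compute reads addresses load is writing
print(can_send(store, [load]))    # True: no shared addresses, send immediately
```

Once `load` has been performed and leaves the list of executing signals, `can_send(compute, [])` returns `True` and the held signal can be sent.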
9. The convolutional operation device of claim 1, wherein the data access unit reads in at least one of the input data, the offset data, and the convolutional kernels from the external address space.
Claim 9 specifies what the data access unit of claim 1 reads from the external address space: at least one of the input data, the offset data, and the convolution kernels. The external address space is the address space outside the device with which the data access unit exchanges data and instructions, as set out in claim 1. The offset data is the bias that the primary computation module adds to the intermediate result (claim 2), and the convolution kernels are the weights used by the secondary computation modules. The claim is satisfied if the data access unit reads any one of these three kinds of data; it does not require all three to come from the external address space.
10. The convolutional operation device of claim 1, wherein the Dynamic Voltage Frequency Scaling device includes: an information acquisition unit configured to acquire the working state information of the convolutional operation device in real time; and a voltage frequency scaling unit configured to send the voltage frequency scaling information to the convolutional operation device according to the working state information of the convolutional operation device, where the voltage frequency scaling information is configured to instruct the convolutional operation device to scale its working voltage or working frequency, wherein the voltage frequency scaling information includes first voltage frequency scaling information.
Claim 10 details the Dynamic Voltage Frequency Scaling (DVFS) device of claim 1. It contains two units. An information acquisition unit acquires the working state information of the convolutional operation device in real time. A voltage frequency scaling unit then sends voltage frequency scaling information to the convolutional operation device according to that working state information, instructing the device to scale its working voltage or working frequency. The claim does not list what the working state information contains; the dependent claims name the operating speed of the device or of individual units (claims 11 and 12) and whether a unit is idle (claim 14). The voltage frequency scaling information includes "first voltage frequency scaling information", whose use is defined in claim 11.
11. The convolutional operation device of claim 10, wherein the working state information of the convolutional operation device includes an operating speed of the convolutional operation device, and the voltage frequency scaling unit is configured to: if the operating speed of the convolutional operation device is higher than a target speed, send the first voltage frequency scaling information to the convolutional operation device, where the first voltage frequency scaling information is configured to instruct the convolutional operation device to decrease its working frequency or working voltage, and the target speed is an operating speed of the convolutional operation device if a user requirement is met.
Claim 11 defines when the first voltage frequency scaling information of claim 10 is sent. The working state information includes the operating speed of the convolutional operation device. The voltage frequency scaling unit compares this speed with a target speed, defined as the operating speed at which the user's requirement is met. If the operating speed is higher than the target speed, the unit sends the first voltage frequency scaling information, which instructs the device to decrease its working frequency or working voltage. The device is then running faster than the user needs, so slowing it down reduces power consumption while the requirement is still met.
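The rule reduces to a single comparison, sketched below in Python. The function name, the string used to represent the scaling information, and the speed values are assumptions for illustration; the claim does not define units for the operating speed or a format for the scaling information.

```python
from typing import Optional


def first_scaling_info(operating_speed: float, target_speed: float) -> Optional[str]:
    """Return a 'decrease' instruction when the device outruns the target speed."""
    if operating_speed > target_speed:
        # Faster than the user requirement needs: lower frequency or voltage.
        return "decrease working frequency or working voltage"
    return None  # The claim specifies no action for this case.


# Hypothetical speeds, e.g. items processed per second.
print(first_scaling_info(operating_speed=45.0, target_speed=30.0))
print(first_scaling_info(operating_speed=30.0, target_speed=30.0))  # None
```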
12. The convolutional operation device of claim 10, wherein the working state information of the convolutional operation device includes an operating speed of the data access unit and an operating speed of the primary computation module, the voltage frequency scaling information includes second voltage frequency scaling information, and the voltage frequency scaling unit is further configured to: in response to determining, according to the operating speed of the data access unit and the operating speed of the primary computation module, that a running time of the data access unit exceeds a running time of the primary computation module, send the second voltage frequency scaling information to the primary computation module, where the second voltage frequency scaling information is configured to instruct the primary computation module to decrease its working frequency or working voltage, wherein the voltage frequency scaling information includes third voltage frequency scaling information.
Claim 12 adds a rule for balancing the data access unit against the primary computation module. The working state information of claim 10 here includes the operating speed of the data access unit and the operating speed of the primary computation module. From these two speeds the voltage frequency scaling unit determines the running time of each unit. If the running time of the data access unit exceeds that of the primary computation module, the unit sends second voltage frequency scaling information to the primary computation module, instructing it to decrease its working frequency or working voltage. In that case the primary computation module would otherwise finish early and wait for data, so slowing it down should lower power consumption without lengthening the overall task. The claim also states that the voltage frequency scaling information includes third voltage frequency scaling information; its use is defined in claim 13.
13. The convolutional operation device of claim 12, wherein the voltage frequency scaling unit is further configured to: in response to determining, according to the operating speed of the data access unit and the operating speed of the primary computation module, that the running time of the primary computation module exceeds the running time of the data access unit, send the third voltage frequency scaling information to the data access unit, where the third voltage frequency scaling information is configured to instruct the data access unit to decrease its working frequency or working voltage.
Claim 13 covers the opposite case to claim 12. The voltage frequency scaling unit again determines the running times of the data access unit and the primary computation module from their operating speeds. If the running time of the primary computation module exceeds that of the data access unit, the unit sends the third voltage frequency scaling information to the data access unit, instructing it to decrease its working frequency or working voltage. Here the data access unit is the faster of the two, so it is the one slowed down. Taken together, claims 12 and 13 throttle whichever unit would otherwise finish first, reducing power consumption while the slower unit continues to set the overall pace.
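The two balancing rules of claims 12 and 13 can be sketched together in Python. The claims say only that running times are determined "according to" the operating speeds; computing running time as workload divided by speed is an assumption, as are the function names and sample figures.

```python
from typing import Optional, Tuple


def running_time(workload: float, operating_speed: float) -> float:
    """Assumed model: time to finish equals amount of work divided by speed."""
    if operating_speed <= 0.0:
        raise ValueError("operating speed must be positive")
    return workload / operating_speed


def balance(data_access_time: float, primary_time: float) -> Optional[Tuple[str, str]]:
    """Return (unit to slow down, which scaling information), or None if balanced."""
    if data_access_time > primary_time:
        # Claim 12: the primary computation module would finish first.
        return ("primary computation module", "second voltage frequency scaling information")
    if primary_time > data_access_time:
        # Claim 13: the data access unit would finish first.
        return ("data access unit", "third voltage frequency scaling information")
    return None


access_time = running_time(workload=120.0, operating_speed=40.0)   # 3.0
compute_time = running_time(workload=100.0, operating_speed=50.0)  # 2.0
print(balance(access_time, compute_time))
# ('primary computation module', 'second voltage frequency scaling information')
```

In both branches the instruction sent is a decrease of working frequency or working voltage; only the recipient differs.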
14. The convolutional operation device of claim 12, wherein the working state information of the convolutional operation device includes working state information of at least one unit in the instruction storage unit, the control unit, the data access unit, the interconnection module, the primary computation module, and the multiple secondary computation modules, wherein a count of the at least one unit is less than or equal to a count of the multiple secondary computation modules plus five, the voltage frequency scaling information includes fourth voltage frequency scaling information, and the voltage frequency scaling unit is configured to: in response to determining, according to the working state information of a target unit, that the target unit is in an idle state, send the fourth voltage frequency scaling information to the target unit, where the fourth voltage frequency scaling information is configured to instruct the target unit to decrease its working frequency or working voltage, and the target unit is any one of the at least one units.
Claim 14 adds idle-state power reduction to the device of claim 12. The working state information covers at least one of the device's units: the instruction storage unit, the control unit, the data access unit, the interconnection module, the primary computation module, and the multiple secondary computation modules. The number of monitored units is at most the number of secondary computation modules plus five, which is simply the total number of units in the device (the five single units plus the secondary modules). If the working state information of a monitored unit, called the target unit, shows that it is idle, the voltage frequency scaling unit sends fourth voltage frequency scaling information to that unit, instructing it to decrease its working frequency or working voltage. Under this rule idle units are throttled individually, and units that are still working are not affected.
15. The convolutional operation device of claim 14, wherein the voltage frequency scaling information includes fifth voltage frequency scaling information, and the voltage frequency scaling unit is further configured to: in response to determining, according to the working state information of the target unit, that the target unit returns to a working state, send the fifth voltage frequency scaling information to the target unit, where the fifth voltage frequency scaling information is configured to instruct the target unit to increase its working voltage or working frequency.
Claim 15 complements claim 14 by restoring a unit when it becomes active again. If the working state information of the target unit shows that the unit has returned to a working state, the voltage frequency scaling unit sends fifth voltage frequency scaling information to it, instructing the unit to increase its working voltage or working frequency. Together, claims 14 and 15 lower the voltage or frequency of a unit while it is idle and raise it again when the unit resumes work, so each unit's power setting follows its own activity.
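The idle and wake-up rules of claims 14 and 15 can be sketched as a small state check in Python. The `State` enum, the function name, and the choice to act only when a unit changes state are assumptions; the claims say only that the information is sent in response to determining that the unit is idle or has returned to a working state.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    IDLE = "idle"
    WORKING = "working"


def scaling_for_target_unit(previous: State, current: State) -> Optional[str]:
    """Pick the scaling information for one target unit from its state change."""
    if previous is State.WORKING and current is State.IDLE:
        # Claim 14: the unit has gone idle, so decrease frequency or voltage.
        return "fourth voltage frequency scaling information (decrease)"
    if previous is State.IDLE and current is State.WORKING:
        # Claim 15: the unit has returned to work, so increase voltage or frequency.
        return "fifth voltage frequency scaling information (increase)"
    return None  # No state change, nothing to send in this sketch.


# Hypothetical history of one secondary computation module.
history = [State.WORKING, State.IDLE, State.IDLE, State.WORKING]
for before, after in zip(history, history[1:]):
    print(before.value, "->", after.value, ":", scaling_for_target_unit(before, after))
```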
16. A method for performing a single-layer convolutional neural network forward operation, comprising: pre-storing an input/output instruction at a starting address of an instruction storage unit; if operation is started, a control unit reading the input/output instruction from the starting address of the instruction storage unit, and a data access unit reading, according to a control signal decoded from the input/output instruction, all corresponding convolutional neural network computational instructions from an external address space, and caching all the instructions in the instruction storage unit; the control unit reading in a next input/output instruction from the instruction storage unit, and the data access unit reading all data required by a primary computation module from the external address space to a first storage unit of the primary computation module according to a control signal decoded from the next input/output instruction; the control unit reading in another input/output instruction from the instruction storage unit, and the data access unit reading convolutional kernel data required by secondary computation modules from the external address space according to a control signal decoded from another input/output instruction; the control unit reading in a next CONFIG instruction from the instruction storage unit, and the convolutional operation device configuring various constants required by computation of a present layer of a neural network according to a control signal decoded from the next CONFIG instruction; the control unit reading in a next COMPUTE instruction from the instruction storage unit, and the primary computation module sending input data in a convolutional window to the multiple secondary computation modules through an interconnection module according to a control signal decoded from the next COMPUTE instruction, storing the input data in second storage unit of the multiple secondary computation modules, and moving the convolutional window according to the instruction; computation units of the multiple secondary computation modules reading convolutional kernels from a third storage unit according to the control signal decoded from the COMPUTE instruction, reading the input data from the second storage units, completing convolutional operation of the input data and the convolutional kernels, and returning obtained output scalars through the interconnection module; splicing the output scalars returned by the multiple secondary computation modules into complete intermediate vectors step by step in the interconnection module; the primary computation module obtaining the intermediate vectors returned by the interconnection module, moving the convolutional window to traverse all the input data, splicing all the returned intermediate vectors into an intermediate result, reading offset data from the first storage unit according to the control signal decoded from the COMPUTE instruction, and adding the offset data and the intermediate result together to obtain an offset result through a vector addition unit; and then an activation unit activating the offset result and writing final output data back into the first storage unit; the control unit reading in yet another input/output instruction from the instruction storage unit, and the data access unit storing the output data of the first storage unit to a specified address of the external address space according to a control signal decoded from the next input/output instruction, then ending the operation.
This invention relates to a method for performing a single-layer convolutional neural network (CNN) forward operation using a specialized hardware architecture. The method addresses the computational inefficiency and latency issues in traditional CNN implementations by optimizing data flow and instruction handling. The process begins by pre-storing an input/output instruction at a starting address in an instruction storage unit. Upon operation initiation, a control unit reads this instruction and directs a data access unit to fetch all convolutional neural network computational instructions from an external memory, caching them in the instruction storage unit. Subsequent input/output instructions are then read to load required input data into a primary computation module's first storage unit. Another instruction retrieves convolutional kernel data for secondary computation modules. A CONFIG instruction configures constants needed for the current neural network layer. A COMPUTE instruction triggers the primary computation module to send input data within a convolutional window to multiple secondary computation modules via an interconnection module. These modules store the input data, read corresponding convolutional kernels, perform convolution operations, and return output scalars. The interconnection module splices these scalars into intermediate vectors, which the primary computation module combines into an intermediate result. Offset data is then added to this result, followed by activation processing to produce final output data, which is stored back in the first storage unit. A final input/output instruction directs the data access unit to write the output data to a specified external memory address, completing the operation. This method enhances efficiency by m
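The COMPUTE step described above can be sketched in software as a rough illustration. The sketch assumes a single-channel 2D input, no padding, one kernel per secondary computation module, and ReLU as the activation; none of these specifics come from the claim, and the function names `secondary_module` and `forward_layer` are hypothetical.

```python
def secondary_module(window, kernel):
    """One secondary computation module: multiply the window by its
    kernel element-wise and sum, giving a single output scalar."""
    return sum(w * k
               for w_row, k_row in zip(window, kernel)
               for w, k in zip(w_row, k_row))


def forward_layer(inputs, kernels, offsets, stride=1):
    """Single-layer forward pass: convolve, add offset, activate.

    inputs  : 2D list (assumed single channel)
    kernels : list of 2D lists, one per secondary module (assumption)
    offsets : one offset (bias) value per kernel
    """
    kh, kw = len(kernels[0]), len(kernels[0][0])
    out_h = (len(inputs) - kh) // stride + 1
    out_w = (len(inputs[0]) - kw) // stride + 1

    intermediate_result = []
    # The primary module moves the convolutional window over all input data.
    for i in range(0, out_h * stride, stride):
        for j in range(0, out_w * stride, stride):
            window = [row[j:j + kw] for row in inputs[i:i + kh]]
            # Each secondary module returns one scalar; the scalars are
            # spliced into one intermediate vector for this window.
            vector = [secondary_module(window, k) for k in kernels]
            intermediate_result.append(vector)

    # Primary module: vector addition of the offsets, then activation
    # (ReLU is an assumption; the claim only says "activating").
    return [[max(0.0, v + b) for v, b in zip(vec, offsets)]
            for vec in intermediate_result]
```

Calling `forward_layer` with a 3x3 input and two 2x2 kernels yields four output vectors of length two, one vector per window position and one entry per kernel.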
17. The method of claim 16, wherein the method further includes: acquiring working state information of the convolutional operation device in real time; and sending voltage frequency scaling information to the convolutional operation device according to the working state information of the convolutional operation device, where the voltage frequency scaling information is configured to instruct the convolutional operation device to scale its working voltage or working frequency.
This invention relates to optimizing the performance of convolutional operation devices, such as those used in deep learning or image processing systems, by dynamically adjusting their voltage and frequency based on real-time working state information. Convolutional operations are computationally intensive, so a device that always runs at a fixed voltage and frequency can waste power. The invention addresses this by acquiring the device's working state information in real time. The claim does not specify which metrics make up the working state; the dependent claims that follow use operating speeds, running times, and whether a unit is idle or working. Based on this information, the system sends voltage frequency scaling information, which instructs the device to scale its working voltage or working frequency. The device can therefore lower these parameters when less performance is needed and raise them again when more is needed, balancing performance against power consumption as the workload varies.
18. The method of claim 17, wherein the working state information of the convolutional operation device includes an operating speed of the convolutional operation device, the voltage frequency scaling information includes first voltage frequency scaling information, and sending the voltage frequency scaling information to the convolutional operation device according to the working state information of the convolutional operation device includes: if the operating speed of the convolutional operation device is higher than a target speed, sending the first voltage frequency scaling information to the convolutional operation device, where the first voltage frequency scaling information is configured to instruct the convolutional operation device to decrease its working frequency or working voltage, and the target speed is an operating speed of the convolutional operation device if a user requirement is met.
This invention relates to optimizing the performance and energy efficiency of convolutional operation devices, such as those used in deep learning or neural network processing. The problem addressed is the inefficient use of power in convolutional operation devices, which often operate at fixed or suboptimal voltage and frequency settings, leading to unnecessary energy consumption or performance bottlenecks. The invention provides a method for dynamically adjusting the voltage and frequency scaling of a convolutional operation device based on its working state. The working state includes the device's operating speed, which is compared to a target speed that ensures user requirements are met. If the device's operating speed exceeds this target speed, the system sends first voltage frequency scaling information to the device. This information instructs the device to reduce its working frequency or voltage, thereby conserving energy while maintaining performance. The target speed is determined as the operating speed that satisfies the user's performance demands, ensuring that adjustments do not compromise functionality. This dynamic scaling helps balance power consumption and computational efficiency, particularly in resource-constrained or energy-sensitive applications.
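The comparison against the target speed can be sketched as a small decision function. This is an illustration only: the `ScalingInfo` type, the function name, and the string values are hypothetical, and the claim does not say how speed is measured or by how much the frequency or voltage is reduced.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScalingInfo:
    """Hypothetical container for voltage frequency scaling information."""
    recipient: str  # who receives the information
    action: str     # "decrease" or "increase"


def first_scaling_info(operating_speed: float,
                       target_speed: float) -> Optional[ScalingInfo]:
    """Return the first voltage frequency scaling information when the
    device runs faster than the speed that meets the user requirement."""
    if operating_speed > target_speed:
        # Faster than needed: instruct the device to lower its working
        # frequency or working voltage.
        return ScalingInfo(recipient="convolutional_operation_device",
                           action="decrease")
    # At or below the target speed this claim sends nothing.
    return None
```

For example, `first_scaling_info(120.0, 100.0)` returns a "decrease" instruction, while `first_scaling_info(80.0, 100.0)` returns `None`.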
19. The method of claim 18, wherein the working state information of the convolutional operation device includes an operating speed of the data access unit and an operating speed of the primary computation module, the voltage frequency scaling information includes second voltage frequency scaling information, and sending the voltage frequency scaling information to the convolutional operation device according to the working state information of the convolutional operation device further includes: sending the second voltage frequency scaling information to the primary computation module, according to the operating speed of the data access unit and the operating speed of the primary computation module, in response to a running time of the data access unit being determined to exceed a running time of the primary computation module, where the second voltage frequency scaling information is configured to instruct the primary computation module to decrease its working frequency or working voltage, wherein the voltage frequency scaling information includes third voltage frequency scaling information.
This invention relates to optimizing power consumption in convolutional operation devices, particularly by dynamically adjusting voltage and frequency scaling based on operational states. The technology addresses inefficiencies in conventional systems where fixed voltage and frequency settings lead to unnecessary power consumption or performance bottlenecks. The method involves monitoring the working state of a convolutional operation device, including the operating speeds of its data access unit and primary computation module. The running times of the two units are determined from these operating speeds and compared. If the data access unit's running time exceeds that of the primary computation module, the primary computation module receives second voltage frequency scaling information instructing it to reduce its working frequency or voltage. This prevents the computation module from operating faster than necessary when the data access unit is the limiting factor, so power is saved without lengthening the overall running time. The claim also states that the voltage frequency scaling information includes third voltage frequency scaling information; its use is defined in claim 20.
20. The method of claim 19, wherein sending the voltage frequency scaling information to the convolutional operation device according to the working state information of the convolutional operation device further includes: according to the operating speed of the data access unit and the operating speed of the primary computation module, in response to the running time of the primary computation module being determined to exceed the running time of the data access unit, sending the third voltage frequency scaling information to the data access unit, where the third voltage frequency scaling information is configured to instruct the data access unit to decrease its working frequency or working voltage.
This invention relates to optimizing the performance of convolutional operation devices, particularly in scenarios where computational and data access units operate at mismatched speeds. The problem addressed is wasted power caused by an imbalance between the data access unit and the primary computation module: the faster unit finishes early and then waits for the slower one. This claim covers the case opposite to claim 19. When the primary computation module's running time exceeds that of the data access unit, the system sends third voltage frequency scaling information to the data access unit, instructing it to reduce its working frequency or voltage. Because the primary computation module is the limiting factor in this case, slowing the data access unit reduces power consumption without lengthening the overall running time. The running times are determined from the operating speeds of the two units.
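The two running-time comparisons in claims 19 and 20 can be sketched together. The sketch assumes running time is estimated as workload divided by operating speed; the claims only say the decision is made "according to" the operating speeds, so this formula, the workload parameters, and all names are hypothetical.

```python
from typing import Optional, Tuple


def running_time(workload: float, operating_speed: float) -> float:
    """Assumed estimate: amount of work divided by operating speed."""
    return workload / operating_speed


def balance_units(access_workload: float, access_speed: float,
                  compute_workload: float,
                  compute_speed: float) -> Optional[Tuple[str, str]]:
    """Pick the unit that finishes early and tell it to slow down.

    Returns (recipient, action), or None when the times are equal.
    """
    access_time = running_time(access_workload, access_speed)
    compute_time = running_time(compute_workload, compute_speed)

    if access_time > compute_time:
        # Claim 19: data access is the bottleneck, so the primary
        # computation module gets the second scaling information.
        return ("primary_computation_module", "decrease")
    if compute_time > access_time:
        # Claim 20: computation is the bottleneck, so the data access
        # unit gets the third scaling information.
        return ("data_access_unit", "decrease")
    return None
```

In both branches the slower unit is left untouched, so only the unit that would otherwise wait is scaled down.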
21. The method of claim 19, wherein the working state information of the convolutional operation device includes working state information of at least one units in the instruction storage unit, the control unit, the data access unit, the interconnection module, the primary computation module, and the multiple secondary computation modules, wherein a count of the at least one units is less than or equal to a count of the multiple secondary computation modules plus five, the voltage frequency scaling information includes fourth voltage frequency scaling information, and sending the voltage frequency scaling information to the convolutional operation device according to the working state information of the convolutional operation device further includes: according to the working state information of a target unit, in response to the target unit being determined to be in an idle state, sending the fourth voltage frequency scaling information to the target unit, where the fourth voltage frequency scaling information is configured to instruct the target unit to decrease its working frequency or working voltage, where the target unit is any one of the at least one units.
This invention relates to optimizing power consumption in convolutional operation devices, particularly in neural network accelerators. Convolutional operations are computationally intensive, and these devices include multiple units that can operate at varying frequencies and voltages. The problem addressed is inefficient power usage, where idle units continue to operate at high frequencies and voltages, wasting energy. The method involves monitoring the working state of at least one unit within the convolutional operation device, chosen from the instruction storage unit, control unit, data access unit, interconnection module, primary computation module, and multiple secondary computation modules. The number of monitored units is at most the count of secondary computation modules plus five, which is the total number of units listed: the five single units plus every secondary computation module. If a monitored unit (referred to as the target unit) is determined to be in an idle state, fourth voltage frequency scaling information is sent to that unit, instructing it to reduce its working frequency or voltage. This approach lowers power consumption by scaling down idle units while leaving active units unaffected.
22. The method of claim 21, wherein the voltage frequency scaling information includes fifth voltage frequency scaling information, and sending the voltage frequency scaling information to the convolutional operation device according to the working state information of the convolutional operation device further includes: according to the working state information of the target unit, in response to the target unit being determined to return to a working state, sending the fifth voltage frequency scaling information to the target unit, where the fifth voltage frequency scaling information is configured to instruct the target unit to increase its working voltage or working frequency.
This invention relates to dynamic voltage and frequency scaling in convolutional operation devices, such as those used in neural network processing. The problem addressed is restoring performance after a unit has been scaled down while idle. The method builds on claim 21, in which an idle target unit is instructed to decrease its working frequency or voltage. When the working state information shows that the target unit has returned to a working state, the system sends fifth voltage frequency scaling information to that unit. This information instructs the unit to increase its working voltage or working frequency, so that full computational performance is available again when the unit has work to do. Together, claims 21 and 22 let each monitored unit run at reduced power while idle and at higher voltage or frequency while working.
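The idle and wake-up handling in claims 21 and 22 can be sketched per unit. The sketch assumes the scaling information is sent only when a unit changes state, and that each unit's state is reported as a boolean; both are assumptions, and all names are hypothetical.

```python
from typing import List, Optional, Tuple


def all_units(num_secondary: int) -> List[str]:
    """The units that may be monitored: five single units plus every
    secondary computation module, hence at most num_secondary + 5."""
    fixed = ["instruction_storage_unit", "control_unit",
             "data_access_unit", "interconnection_module",
             "primary_computation_module"]
    secondary = [f"secondary_computation_module_{i}"
                 for i in range(num_secondary)]
    return fixed + secondary


def scaling_for_unit(unit: str, was_idle: bool,
                     is_idle: bool) -> Optional[Tuple[str, str]]:
    """Return (recipient, action) for one target unit, or None."""
    if is_idle and not was_idle:
        # Claim 21: unit became idle, send the fourth scaling
        # information (decrease frequency or voltage).
        return (unit, "decrease")
    if was_idle and not is_idle:
        # Claim 22: unit returned to a working state, send the fifth
        # scaling information (increase voltage or frequency).
        return (unit, "increase")
    # No state change: nothing is sent in this sketch (assumption).
    return None
```

With four secondary computation modules, `all_units(4)` lists nine units, matching the "plus five" bound in claim 21.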
September 8, 2020