Convolutional Layer Acceleration Unit, Embedded System Having the Same, and Method for Operating the Embedded System

Published: February 14, 2023

Assignee: not available in the USPTO data we have

Patent Claims

15 claims

Legal claims defining the scope of protection, as filed with the USPTO.

2. The method of claim 1, wherein the LISF corresponds to a system comprising a platform, a device, a context, a command queue, and a kernel.

3. The method of claim 2, wherein the platform corresponds to a heterogeneous platform using at least one Central Processing Unit (CPU) and one Graphics Processing Unit (GPU).

4. The method of claim 2, wherein the device comprises actual processors for performing the mathematical operations.

5. The method of claim 2, wherein the context comprises an entity for managing the resources in a device set.

6. The method of claim 2, wherein the command queue comprises an entity for executing a kernel and performing memory mapping/unmapping and synchronization.

7. The method of claim 2, wherein the kernel comprises a code running on the device.

8. The method of claim 1, wherein the mathematical operations include operations in a neural network.

9. The method of claim 1, wherein the parallelization managing FE allocates a device memory, copies data from a host to a device, sets a kernel, and then copies back results of an operation.

10. The method of claim 9, wherein instances of the kernel are executed in parallel while each of the instances is processing a single work item.

11. The method of claim 9, wherein instances of the kernel are executed together as multiple work items as a part of a work group.

12. The method of claim 11, wherein an instance of each kernel in the work group communicates with an additional instance.

13. The method of claim 9, wherein the parallelization managing FE manages a parallel-processing queue for performing parallel processing depending on a number of devices in the embedded system.

15. The method of claim 14, wherein the parallelization managing FE divides rows of matrix A by a number of OpenCL devices, and a size of a sub-matrix resulting from division is determined by a number of corresponding OpenCL devices and a number of usable OpenCL devices.

16. The method of claim 1, wherein the acceleration managing FE shares a memory between a host and devices to minimize the cost of the mathematical operations, and each device performs mathematical routines without copying data between the host and the device by accessing a vector and a matrix of the host using a memory address.

17. The method of claim 1, wherein the acceleration managing FE determines a size of a work group to allow each device to perform parallel processing.
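
Illustrative note on claim 15 (not part of the patent text): the row-wise division of matrix A across devices can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the claimed implementation. The helper names `split_rows` and `matvec_partitioned` are hypothetical, the devices are simulated by an ordinary loop, and the rule that leftover rows go to the first devices is an assumption the claims do not state.

```python
def split_rows(num_rows, num_devices):
    """Return one (start, stop) row range per device (hypothetical helper)."""
    if num_devices <= 0:
        raise ValueError("need at least one device")
    base, extra = divmod(num_rows, num_devices)
    ranges = []
    start = 0
    for dev in range(num_devices):
        # Assumption: the first `extra` devices each take one additional row.
        size = base + (1 if dev < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def matvec_partitioned(a, x, num_devices):
    """Compute y = A x block by block, one row block per simulated device."""
    y = []
    for start, stop in split_rows(len(a), num_devices):
        # Row blocks are independent, so real devices could process them in parallel.
        for row in a[start:stop]:
            y.append(sum(r * v for r, v in zip(row, x)))
    return y


if __name__ == "__main__":
    print(split_rows(10, 3))
    print(matvec_partitioned([[1, 2], [3, 4], [5, 6]], [1, 1], 2))
```

Because each sub-matrix holds whole rows of A, the partial results can simply be concatenated in device order to form the output vector.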

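Illustrative note on claims 10, 11, and 17 (not part of the patent text): the relationship between work items, work groups, and a chosen work-group size can be sketched as follows. This is a sketch under assumptions, not the method the patent uses. The function names are hypothetical, and the selection rule (the largest size that divides the global size and does not exceed a device limit) is only one plausible policy, motivated by the OpenCL 1.x requirement that the global size be a multiple of the work-group size.

```python
def pick_work_group_size(global_size, max_group_size):
    """Largest size <= max_group_size that evenly divides global_size."""
    for size in range(min(global_size, max_group_size), 0, -1):
        if global_size % size == 0:
            return size
    raise ValueError("global_size and max_group_size must be positive")


def enumerate_work_items(global_size, group_size):
    """Yield (group_id, local_id, global_id) for every work item."""
    for global_id in range(global_size):
        # Each kernel instance handles one work item; items that share a
        # group_id belong to the same work group.
        yield global_id // group_size, global_id % group_size, global_id


if __name__ == "__main__":
    size = pick_work_group_size(100, 64)
    print(size)
    print(list(enumerate_work_items(4, 2)))
```

In this model, `global_id` identifies a single kernel instance as in claim 10, while instances with the same `group_id` form the work group of claim 11.
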
Patent Metadata

Filing Date

Unknown

Publication Date

February 14, 2023

Inventors

Seung-Tae HONG
