2-D graphical symbols for representing semantic meaning of a video clip

Published: March 22, 2022

Assignee: not available in the USPTO data

Inventors: not available in the USPTO data

Technical Abstract

P feature encoding values are obtained for each of the Q frames in a video clip by applying image transformations to each frame and performing computations of a specific succession of convolution and pooling layers of a CNN-based deep learning model, followed by operations of a nested invariance pooling layer. Each feature encoding value is then converted from a real number to a corresponding integer value within a range designated for color display intensity according to a quantization scheme. A 2-D graphical symbol that contains N×N pixels is formed by placing the respective color display intensities into the N×N pixels according to a data arrangement pattern that represents all frames of the video clip in the form of P×Q feature encoding values, such that the 2-D graphical symbol possesses a semantic meaning of the video clip that can be recognized via an image classification task using another trained CNN-based deep learning model.

Patent Claims

13 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method of creating a two-dimension (2-D) graphical symbol for representing semantic meaning of a video clip comprising:
receiving a video stream in a computing system configured for performing computations of Cellular Neural Networks or Cellular Nonlinear Networks (CNN) based deep learning models, wherein the video stream includes a number of frames with each frame containing a 2-D image in time order;
extracting a video clip from the received video stream, the video clip containing Q frames, where Q is a positive integer;
converting each frame to a resolution suitable as an input image to a first CNN based deep learning model;
obtaining a vector of P feature encoding values of each frame by a set of image transformations of each frame along with performing computations of a specific succession of convolution and pooling layers of the first CNN based deep learning model followed with operations of a nested invariance pooling layer, wherein the feature encoding values are real numbers, and P is a multiple of 512;
converting each of the P feature encoding values from the real number to a corresponding integer value within a range designated for color display intensity in accordance with a quantization scheme; and
forming a two-dimension (2-D) graphical symbol that contains N×N pixels by placing respective color display intensities into the N×N pixels according to a data arrangement pattern for representing all frames of the video clip in form of P×Q feature encoding values, such that the 2-D graphical symbol possesses a semantic meaning of the video clip and the semantic meaning can be recognized via a second CNN based deep learning model with a set of trained filter coefficients, where N is a positive integer.
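The nested invariance pooling step named in claim 1 can be illustrated with a small sketch. The claim does not say which statistic is applied at which level, so the nesting below (average over spatial positions, standard deviation across scales, maximum across rotations) is an assumption chosen for illustration, as are the function name `nested_invariance_pool` and the `maps[rotation][scale][channel]` layout. The input stands for feature maps that the first model would produce for transformed copies of one frame.

```python
from statistics import fmean, pstdev


def nested_invariance_pool(maps):
    """Collapse feature maps of transformed copies of a frame into one vector.

    maps[r][s][c] is a flat list of activations of channel c for
    rotation r and scale s (hypothetical layout, not from the claims).
    Returns one real number per channel.
    """
    n_channels = len(maps[0][0])
    vector = []
    for c in range(n_channels):
        per_rotation = []
        for rotation in maps:
            # Level 1 (assumed): average over spatial positions.
            per_scale = [fmean(scale[c]) for scale in rotation]
            # Level 2 (assumed): population standard deviation across scales.
            per_rotation.append(pstdev(per_scale))
        # Level 3 (assumed): maximum across rotations.
        vector.append(max(per_rotation))
    return vector
```

With 512 channels in the final convolution stage, a single pass of this kind would yield 512 values per frame, which is consistent with P being a multiple of 512.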

2. The method of claim 1, wherein the semantic meaning of the video clip comprises an action.

3. The method of claim 1, wherein the Q frames are sequentially chosen from the video stream.

4. The method of claim 1, wherein the Q frames are arbitrarily chosen from the video stream and rearranged in time order.
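The two frame-selection options in claims 3 and 4 can be sketched as follows. This is a minimal illustration, not the patented procedure: the function names are hypothetical, and random sampling is only one possible reading of "arbitrarily chosen".

```python
import random


def select_sequential(frames, q, start=0):
    """Pick Q consecutive frames beginning at index `start` (claim 3)."""
    if start < 0 or start + q > len(frames):
        raise ValueError("not enough frames for a sequential clip")
    return frames[start:start + q]


def select_arbitrary(frames, q, seed=None):
    """Pick Q distinct frames at random, then restore time order (claim 4)."""
    if q > len(frames):
        raise ValueError("not enough frames to choose from")
    rng = random.Random(seed)
    # Sorting the sampled indices puts the chosen frames back in time order.
    indices = sorted(rng.sample(range(len(frames)), q))
    return [frames[i] for i in indices]
```

For example, `select_sequential(stream, 64, start=30)` would return frames 30 through 93 of a hypothetical `stream` list.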

5. The method of claim 1, wherein the CNN based deep learning model is based on VGG(Visual Geometry Group)-16 model that contains 13 convolution layers and 5 max pooling layers.
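The layer counts in claim 5 can be checked against the widely used VGG-16 convolutional configuration. The sketch below assumes a 224×224 input (the claims do not state a resolution), 3×3 convolutions that preserve spatial size, and 2×2 max pooling with stride 2; the names `VGG16_CONFIG` and `summarize` are illustrative.

```python
# Output channels of each convolution layer; "M" marks a max pooling layer.
VGG16_CONFIG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                512, 512, 512, "M", 512, 512, 512, "M"]


def summarize(config, input_size=224):
    """Return (conv layers, pooling layers, final spatial size, final channels)."""
    conv_layers = sum(1 for item in config if item != "M")
    pool_layers = config.count("M")
    size, channels = input_size, 3
    for item in config:
        if item == "M":
            size //= 2       # 2x2 max pooling with stride 2 halves each side
        else:
            channels = item  # padded 3x3 convolution keeps the spatial size
    return conv_layers, pool_layers, size, channels
```

Under these assumptions the final stage has 512 channels on a 7×7 grid, which fits the requirement in claim 1 that P be a multiple of 512.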

6. The method of claim 1, wherein the quantization scheme is a non-linear quantization based on K-means clustering of each of the P feature encoding values obtained using a training dataset.
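A non-linear quantization of the kind described in claim 6 might look like the following sketch for a single feature. It assumes a plain 1-D Lloyd's algorithm with quantile-based initialization, and it maps the index of the nearest centroid onto a 0 to 255 intensity range; the number of clusters, the intensity range, and the function names are all assumptions. In a full pipeline each of the P features would get its own centroids from the training dataset.

```python
def kmeans_1d(values, k, iterations=50):
    """Cluster training values of one feature into k sorted centroids."""
    if k < 1 or not values:
        raise ValueError("need k >= 1 and at least one training value")
    data = sorted(values)
    # Deterministic start: centroids at evenly spaced quantiles of the data.
    centroids = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def quantize_kmeans(value, centroids, max_intensity=255):
    """Map a real value to the intensity level of its nearest centroid."""
    if len(centroids) == 1:
        return 0
    index = min(range(len(centroids)), key=lambda j: abs(value - centroids[j]))
    return round(index * max_intensity / (len(centroids) - 1))
```

Because the centroids follow the density of the training values, intensity levels are spent where the feature values actually cluster instead of being spread evenly over the full range.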

7. The method of claim 1, wherein the quantization scheme is a linear quantization based on boundaries determined by empirical observations of all of the feature encoding values obtained using a training dataset.
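The linear scheme in claim 7 can be sketched as clipping to two boundaries and scaling to the intensity range. The boundary values `lower` and `upper` stand for the empirically determined limits, which the claims do not give, and the 0 to 255 range is an assumption.

```python
def quantize_linear(value, lower, upper, max_intensity=255):
    """Clip a real value to [lower, upper] and scale it to 0..max_intensity."""
    if upper <= lower:
        raise ValueError("upper boundary must exceed lower boundary")
    clipped = min(max(value, lower), upper)
    return round((clipped - lower) / (upper - lower) * max_intensity)
```

Values outside the observed boundaries saturate at the darkest or brightest intensity instead of wrapping around.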

8. The method of claim 1, wherein the data arrangement pattern for representing all frames of the video clip comprises arranging all of the P feature encoding values of each frame in a square format such that there are Q square images contained in the 2-D graphical symbol.

9. The method of claim 8, wherein the Q square images are separated from one another by at least one pixel.
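The per-frame square layout of claims 8 and 9 can be sketched as a grid of tiles. Several details here are assumptions: each tile has side ceil(sqrt(P)) with unused pixels left at a background value (512 is not a perfect square), tiles are placed in row-major order, and `gap` is the separating margin in pixels. The function name is hypothetical.

```python
import math


def arrange_square_tiles(frames, n, gap=1, background=0):
    """Place one square tile per frame into an n x n symbol.

    frames[q] is the list of P quantized intensities of frame q.
    """
    p = len(frames[0])
    side = math.isqrt(p - 1) + 1            # ceil(sqrt(p)) for p >= 1
    per_row = (n + gap) // (side + gap)     # tiles that fit along one side
    if per_row == 0 or len(frames) > per_row * per_row:
        raise ValueError("frames do not fit in the symbol")
    symbol = [[background] * n for _ in range(n)]
    for q, frame in enumerate(frames):
        top = (q // per_row) * (side + gap)
        left = (q % per_row) * (side + gap)
        for i, intensity in enumerate(frame):
            symbol[top + i // side][left + i % side] = intensity
    return symbol
```

In this layout each tile is a snapshot of one moment in time, and the order of tiles across the symbol encodes the time order of the frames.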

10. The method of claim 1, wherein the data arrangement pattern for representing all frames of the video clip comprises arranging each of the P feature encoding values of all Q frames in a rectangular format such that there are P rectangular images contained in the 2-D graphical symbol.

11. The method of claim 10, wherein the P rectangular images are separated from one another by at least one pixel.
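The per-feature rectangular layout of claims 10 and 11 transposes the previous idea: each rectangle holds one feature across all Q frames. In this sketch the rectangle dimensions `rect_h` and `rect_w`, the row-major ordering, and the function name are assumptions made for illustration.

```python
def arrange_feature_rectangles(frames, n, rect_h, rect_w, gap=1, background=0):
    """Place one rect_h x rect_w rectangle per feature into an n x n symbol.

    frames[t][f] is the quantized intensity of feature f in frame t.
    """
    q, p = len(frames), len(frames[0])
    if rect_h * rect_w < q:
        raise ValueError("rectangle too small to hold all frames")
    per_row = (n + gap) // (rect_w + gap)
    per_col = (n + gap) // (rect_h + gap)
    if p > per_row * per_col:
        raise ValueError("features do not fit in the symbol")
    symbol = [[background] * n for _ in range(n)]
    for f in range(p):
        top = (f // per_row) * (rect_h + gap)
        left = (f % per_row) * (rect_w + gap)
        for t in range(q):
            # Frames fill the rectangle in time order, row by row.
            symbol[top + t // rect_w][left + t % rect_w] = frames[t][f]
    return symbol
```

Here the pixels inside one rectangle trace how a single feature evolves over the clip, so temporal change appears as a local pattern.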

12. The method of claim 1, wherein the Q frames are so chosen that the P feature encoding values of all Q frames can be fit within the 2-D graphical symbol.
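The fitting constraint in claim 12 reduces to simple arithmetic once a layout is fixed. The sketch below compares a dense upper bound with a tiled layout that uses square tiles of side ceil(sqrt(P)) and a pixel gap; the symbol size N = 224 used in the example is an assumption, since the claims only require N to be a positive integer.

```python
import math


def dense_capacity(n, p):
    """Upper bound on Q when every pixel carries one feature value."""
    return (n * n) // p


def tiled_capacity(n, p, gap=1):
    """Q that fits when each frame occupies a padded square tile."""
    side = math.isqrt(p - 1) + 1          # ceil(sqrt(p)) for p >= 1
    per_row = (n + gap) // (side + gap)
    return per_row * per_row


print(dense_capacity(224, 512))   # 98
print(tiled_capacity(224, 512))   # 81
```

Under these assumptions a 224×224 symbol could hold at most 98 frames of 512 values each, and 81 frames when each frame is stored as a 23×23 tile separated by one pixel.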

13. The method of claim 1, wherein the computing system comprises a semi-conductor chip containing digital circuits dedicated for performing the convolutional neural networks algorithm.

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G06V, G06F, H04N

Patent Metadata

Filing Date: November 5, 2019

Publication Date: March 22, 2022
