P feature encoding values are obtained for each of the Q frames in a video clip by image transformations of each frame along with computations of a specific succession of convolution and pooling layers of a CNN based deep learning model, followed by operations of a nested invariance pooling layer. Each feature encoding value is then converted from a real number to a corresponding integer value within a range designated for color display intensity, according to a quantization scheme. A 2-D graphical symbol containing N×N pixels is formed by placing the respective color display intensities into the N×N pixels according to a data arrangement pattern that represents all frames of the video clip in the form of P×Q feature encoding values, such that the 2-D graphical symbol possesses a semantic meaning of the video clip that can be recognized via an image classification task using another trained CNN based deep learning model.
Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
1. A method of creating a two-dimension (2-D) graphical symbol for representing semantic meaning of a video clip comprising: receiving a video stream in a computing system configured for performing computations of Cellular Neural Networks or Cellular Nonlinear Networks (CNN) based deep learning models, wherein the video stream includes a number of frames with each frame containing a 2-D image in time order; extracting a video clip from the received video stream, the video clip containing Q frames, where Q is a positive integer; converting each frame to a resolution suitable as an input image to a first CNN based deep learning model; obtaining a vector of P feature encoding values of each frame by a set of image transformations of each frame along with performing computations of a specific succession of convolution and pooling layers of the first CNN based deep learning model followed with operations of a nested invariance pooling layer, wherein the feature encoding values are real numbers, and P is a multiple of 512; converting each of the P feature encoding values from the real number to a corresponding integer value within a range designated for color display intensity in accordance with a quantization scheme; and forming a two-dimension (2-D) graphical symbol that contains N×N pixels by placing respective color display intensities into the N×N pixels according to a data arrangement pattern for representing all frames of the video clip in form of P×Q feature encoding values, such that the 2-D graphical symbol possesses a semantic meaning of the video clip and the semantic meaning can be recognized via a second CNN based deep learning model with a set of trained filter coefficients, where N is a positive integer.
This invention relates to a method for generating a two-dimensional (2-D) graphical symbol that represents the semantic meaning of a video clip, condensing the clip into a single image that another model can classify. The process begins by receiving a video stream whose frames are 2-D images in time order. A video clip of Q frames is extracted from the stream, where Q is a positive integer, and each frame is converted to a resolution suitable as input to a first deep learning model based on Cellular Neural Networks or Cellular Nonlinear Networks (CNN). Each frame undergoes a set of image transformations and passes through a specific succession of convolution and pooling layers, followed by a nested invariance pooling layer, producing a vector of P real-valued feature encoding values, where P is a multiple of 512. Each value is quantized to an integer within a range designated for color display intensity. The intensities for all Q frames (P×Q values in total) are then placed into an N×N pixel grid according to a data arrangement pattern, forming the 2-D graphical symbol. The semantic meaning of the clip can be recognized from this symbol by a second CNN based deep learning model with trained filter coefficients.
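To make the shapes in claim 1 concrete, here is a minimal sketch of the data flow from feature vectors to an N×N symbol. It assumes the first model's output is already available: the hypothetical `fake_features` function stands in for the CNN layers and the nested invariance pooling layer, which are not reproduced. The linear quantization to 0 to 255 and the plain row-major placement are illustrative assumptions, not the quantization scheme or data arrangement pattern of the claims.

```python
import random


def fake_features(q, p, seed=0):
    """Hypothetical stand-in for the first CNN model: Q vectors of P real values."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(p)] for _ in range(q)]


def quantize_linear(x, lo=-1.0, hi=1.0, levels=256):
    """Map a real value to an integer intensity in [0, levels - 1], clipping to [lo, hi]."""
    t = (min(max(x, lo), hi) - lo) / (hi - lo)
    return min(int(t * levels), levels - 1)


def build_symbol(features, n):
    """Place all P x Q intensities into an N x N grid, row-major, frame after frame."""
    q, p = len(features), len(features[0])
    if p % 512 != 0:
        raise ValueError("claim 1 requires P to be a multiple of 512")
    if p * q > n * n:
        raise ValueError("P x Q values do not fit in an N x N symbol")
    flat = [quantize_linear(v) for frame in features for v in frame]
    flat += [0] * (n * n - len(flat))  # unused pixels stay at intensity 0
    return [flat[r * n:(r + 1) * n] for r in range(n)]


symbol = build_symbol(fake_features(q=8, p=512), n=64)
print(len(symbol), len(symbol[0]))  # 64 64
```

With Q = 8 and P = 512 the 4096 values exactly fill a 64×64 grid; the real method would feed such a grid to the second CNN based model for classification.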
2. The method of claim 1 , wherein the semantic meaning of the video clip comprises an action.
This dependent claim narrows claim 1: the semantic meaning carried by the 2-D graphical symbol includes an action. In other words, the second CNN based deep learning model that classifies the symbol can recognize what action takes place in the video clip, such as a person walking or running, which makes the method applicable to tasks such as activity recognition or video surveillance.
3. The method of claim 1 , wherein the Q frames are sequentially chosen from the video stream.
This dependent claim specifies how the Q frames of the clip are taken from the video stream: they are chosen sequentially, that is, as Q consecutive frames in the order they appear in the stream, without skipping or reordering. A contiguous selection preserves the temporal continuity of the clip, so the resulting 2-D graphical symbol reflects an uninterrupted segment of the video.
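One simple reading of "sequentially chosen" is a contiguous window of Q frames. The sketch below assumes that reading and works on frame indices only; the `start` offset is a hypothetical parameter, not something the claim names.

```python
def choose_sequential(num_frames, q, start=0):
    """Indices of Q consecutive frames, beginning at frame `start`."""
    if q <= 0 or start < 0 or start + q > num_frames:
        raise ValueError("a window of Q frames does not fit in the stream")
    return list(range(start, start + q))


clip = choose_sequential(num_frames=300, q=16, start=40)
print(clip[0], clip[-1])  # 40 55
```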
4. The method of claim 1 , wherein the Q frames are arbitrarily chosen from the video stream and rearranged in time order.
This dependent claim is the alternative to claim 3: the Q frames may be chosen arbitrarily from the video stream rather than as a contiguous run, but once chosen they are rearranged in time order. The clip therefore still presents its frames chronologically when their feature encoding values are placed into the 2-D graphical symbol, even though the frames need not be adjacent in the original stream.
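A sketch of claim 4's selection on frame indices. Random sampling is used here only as one example of an "arbitrary" choice (the claim does not say how frames are picked); the essential step is that the chosen indices are sorted back into time order.

```python
import random


def choose_arbitrary(num_frames, q, seed=None):
    """Pick Q distinct frame indices at random, then put them back in time order."""
    if not 0 < q <= num_frames:
        raise ValueError("Q must be between 1 and the number of frames")
    picked = random.Random(seed).sample(range(num_frames), q)
    return sorted(picked)  # rearranged in time order


clip = choose_arbitrary(num_frames=300, q=16, seed=7)
```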
5. The method of claim 1 , wherein the CNN based deep learning model is based on VGG(Visual Geometry Group)-16 model that contains 13 convolution layers and 5 max pooling layers.
This dependent claim specifies that the CNN based deep learning model is based on the VGG-16 architecture from the Visual Geometry Group, which contains 13 convolution layers and 5 max pooling layers. The convolution layers extract increasingly abstract features from each frame, from low-level edges and textures to higher-level shapes and objects, while the max pooling layers progressively reduce the spatial dimensions. In the context of claim 1, these layers correspond to the specific succession of convolution and pooling layers whose output is passed to the nested invariance pooling layer to produce the P feature encoding values of each frame.
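The layer counts in claim 5 match the standard VGG-16 feature extractor. The listing below uses the widely published VGG-16 layer configuration (configuration "D" in the original VGG paper, not a listing taken from the patent) and checks both counts and the shape of the last feature map for a 224×224 input. The 512 channels of that map are consistent with P being a multiple of 512, although the claims do not state this link.

```python
# VGG-16 convolutional stages: integers are 3x3 convolution output channels
# (stride 1, padding 1), "M" is a 2x2 max pooling layer with stride 2.
VGG16_FEATURES = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                  512, 512, 512, "M", 512, 512, 512, "M"]


def summarize(cfg, input_size=224):
    """Count conv and pooling layers and trace the output shape (C, H, W)."""
    convs = sum(1 for layer in cfg if layer != "M")
    pools = cfg.count("M")
    size, channels = input_size, 3
    for layer in cfg:
        if layer == "M":
            size //= 2          # each max pooling halves height and width
        else:
            channels = layer    # padded 3x3 convolution keeps the spatial size
    return convs, pools, (channels, size, size)


print(summarize(VGG16_FEATURES))  # (13, 5, (512, 7, 7))
```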
6. The method of claim 1 , wherein the quantization scheme is a non-linear quantization based on K-means clustering of each of the P feature encoding values obtained using a training dataset.
This dependent claim specifies a non-linear quantization scheme for converting the real-valued feature encoding values into color display intensities. For each of the P feature encoding values, the values obtained over a training dataset are grouped with K-means clustering, and the resulting clusters define the quantization levels. Because the clusters follow the actual distribution of the values, levels are concentrated where values are dense and spread out where they are sparse, so the integer intensities preserve more of the distinctions between feature values than uniformly spaced levels would. The scheme is derived from the training data rather than tuned by hand.
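A sketch of claim 6, assuming "K-means clustering of each of the P feature encoding values" means a separate one-dimensional K-means per feature position over the training set, with each new value then mapped to the rank of its nearest centroid. The quantile initialization, K = 8, P = 3 and the Gaussian toy data are illustrative only; a 0 to 255 intensity range would correspond to K = 256.

```python
import random


def kmeans_1d(values, k, iters=50):
    """Lloyd's K-means on scalar values; returns the k centroids in ascending order."""
    data = sorted(values)
    # start from evenly spaced quantiles of the training values
    cents = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda c: abs(x - cents[c]))
            buckets[nearest].append(x)
        # an empty cluster keeps its previous centroid
        new = [sum(b) / len(b) if b else cents[i] for i, b in enumerate(buckets)]
        if new == cents:
            break
        cents = new
    return sorted(cents)


def quantize_kmeans(x, cents):
    """Intensity level = rank of the nearest centroid (centroids sorted ascending)."""
    return min(range(len(cents)), key=lambda c: abs(x - cents[c]))


# Toy training set: 200 feature vectors with P = 3 (a real P is a multiple of 512).
rng = random.Random(0)
train = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(200)]
tables = [kmeans_1d([vec[j] for vec in train], k=8) for j in range(3)]
levels = [quantize_kmeans(v, tables[j]) for j, v in enumerate(train[0])]
```

Keeping one centroid table per feature position lets each feature get levels matched to its own distribution, at the cost of storing P tables.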
7. The method of claim 1 , wherein the quantization scheme is a linear quantization based on boundaries determined by empirical observations of all of the feature encoding values obtained using a training dataset.
This dependent claim specifies the alternative, linear quantization scheme. All of the feature encoding values obtained over a training dataset are observed empirically to determine boundaries for the range of values, and the interval between those boundaries is divided uniformly into the integer levels of the color display intensity range. Unlike claim 6, which clusters each of the P feature encoding values separately, the boundaries here are derived from all feature encoding values together, and the spacing of levels between them is linear rather than adapted to the data distribution.
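The claim says only that the boundaries come from empirical observation of the training values. The sketch below assumes, for illustration, that they are taken as low and high percentiles of all pooled values (the 1% and 99% defaults are hypothetical) and that the range between them is split into 256 equal steps, with out-of-range values clipped.

```python
def empirical_bounds(train_values, low_pct=0.01, high_pct=0.99):
    """Lower and upper boundaries read off the training distribution (percentiles here)."""
    data = sorted(train_values)
    last = len(data) - 1
    return data[round(low_pct * last)], data[round(high_pct * last)]


def quantize_linear(x, lo, hi, levels=256):
    """Equal-width steps between lo and hi; values outside are clipped to the end levels."""
    if x <= lo:
        return 0
    if x >= hi:
        return levels - 1
    return min(int((x - lo) / (hi - lo) * levels), levels - 1)


# All feature encoding values observed on a toy training set, pooled together.
train = [i / 100 for i in range(101)]
lo, hi = empirical_bounds(train)
codes = [quantize_linear(v, lo, hi) for v in train]
```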
8. The method of claim 1 , wherein the data arrangement pattern for representing all frames of the video clip comprises arranging all of the P feature encoding values of each frame in a square format such that there are Q square images contained in the 2-D graphical symbol.
This dependent claim specifies one data arrangement pattern for the 2-D graphical symbol. All P feature encoding values of a frame are arranged together in a square, so each frame becomes one small square image and the symbol contains Q such square images, one per frame of the clip. The symbol thus reads as a sequence of per-frame feature images, which the second CNN based deep learning model classifies to recognize the semantic meaning of the clip.
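A sketch of the square arrangement, assuming each frame's P already-quantized values are written row-major into the smallest square that holds them (padding with 0) and the Q squares are tiled in reading order without gaps. P = 1024 is used in the example because it is a multiple of 512 and a perfect square (32×32); the padding rule and tiling order are assumptions.

```python
import math


def frame_to_square(values):
    """Lay one frame's P values row-major into the smallest s x s square, s = ceil(sqrt(P))."""
    s = math.isqrt(len(values) - 1) + 1  # ceil(sqrt(P)) for P >= 1
    padded = list(values) + [0] * (s * s - len(values))
    return [padded[r * s:(r + 1) * s] for r in range(s)]


def tile_squares(frames, n):
    """Place the Q frame squares left to right, top to bottom on an N x N canvas."""
    squares = [frame_to_square(f) for f in frames]
    s = len(squares[0])
    per_row = n // s
    if per_row * per_row < len(squares):
        raise ValueError("Q squares do not fit in the N x N symbol")
    canvas = [[0] * n for _ in range(n)]
    for k, sq in enumerate(squares):
        top, left = (k // per_row) * s, (k % per_row) * s
        for r in range(s):
            canvas[top + r][left:left + s] = sq[r]
    return canvas


# Q = 4 frames of P = 1024 values; frame k is filled with intensity k + 1 for visibility.
frames = [[k + 1] * 1024 for k in range(4)]
canvas = tile_squares(frames, n=64)
```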
9. The method of claim 8 , wherein the Q square images are separated from one another by at least one pixel.
This dependent claim refines claim 8: the Q square images within the 2-D graphical symbol are separated from one another by at least one pixel. The separation keeps the boundaries between frames distinct, so the values belonging to one frame do not run directly into those of its neighbors. The gaps do, however, use pixels of the N×N symbol that could otherwise hold feature encoding values.
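Separating the squares costs canvas space. The arithmetic below is a sketch assuming square tiles of side s laid out in a grid with the same gap horizontally and vertically: k tiles in a row need k*s + (k - 1)*gap <= N pixels. The values N = 224 and P = 1024 (s = 32) are illustrative, not taken from the claims.

```python
def squares_that_fit(n, s, gap=1):
    """Number of s x s squares that fit in an N x N symbol with `gap` blank pixels between neighbors.

    k squares in a row need k*s + (k - 1)*gap <= n, i.e. k <= (n + gap) // (s + gap).
    """
    per_row = (n + gap) // (s + gap)
    return per_row * per_row


def square_origin(k, n, s, gap=1):
    """Top-left (row, column) of the k-th square in reading order."""
    per_row = (n + gap) // (s + gap)
    return (k // per_row) * (s + gap), (k % per_row) * (s + gap)


print(squares_that_fit(224, 32, gap=0), squares_that_fit(224, 32, gap=1))  # 49 36
```

In this example a one-pixel separator reduces the capacity from 49 frames to 36.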
10. The method of claim 1 , wherein the data arrangement pattern for representing all frames of the video clip comprises arranging each of the P feature encoding values of all Q frames in a rectangular format such that there are P rectangular images contained in the 2-D graphical symbol.
This dependent claim specifies the alternative data arrangement pattern. Instead of grouping values by frame, each of the P feature encoding values is collected across all Q frames and arranged in a rectangle, so the 2-D graphical symbol contains P rectangular images, one per feature. Each rectangle shows how a single feature encoding value changes over the duration of the clip, and the second CNN based deep learning model recognizes the semantic meaning of the clip from the symbol as a whole.
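Here the grouping is transposed relative to claim 8: each rectangle collects one feature position across all Q frames, so it reads as that feature's trajectory over time. The sketch assumes the per-frame values are already quantized, that values are written row-major in time order with 0 padding, and that the rectangle size h×w (hypothetical parameters) satisfies h*w >= Q.

```python
def feature_rectangles(features, h, w):
    """For each of the P positions, lay its Q values (one per frame) into an h x w rectangle.

    `features` is a list of Q frames, each a list of P quantized values.
    Returns P rectangles, each a list of h rows of w pixels.
    """
    q, p = len(features), len(features[0])
    if h * w < q:
        raise ValueError("rectangle too small for Q frames")
    rects = []
    for j in range(p):
        series = [features[t][j] for t in range(q)] + [0] * (h * w - q)
        rects.append([series[r * w:(r + 1) * w] for r in range(h)])
    return rects


# Q = 8 frames, P = 512 features; toy intensities so positions are easy to check.
features = [[(t * 10 + j) % 256 for j in range(512)] for t in range(8)]
rects = feature_rectangles(features, h=2, w=4)
print(rects[5])  # [[5, 15, 25, 35], [45, 55, 65, 75]]
```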
11. The method of claim 10 , wherein the P rectangular images are separated from one another by at least one pixel.
This dependent claim refines claim 10: the P rectangular images within the 2-D graphical symbol are separated from one another by at least one pixel, so the boundary between the time series of adjacent features stays distinct. As with claim 9, the separating pixels reduce the space available for feature encoding values within the N×N symbol.
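A sketch of tiling equally sized rectangles with a separator, assuming reading-order placement and a background intensity of 0 for the gap pixels. The claim only requires a separation of at least one pixel; it does not specify the gap's intensity, and the example sizes are illustrative.

```python
def tile_with_gap(rects, n, gap=1, background=0):
    """Tile equally sized rectangles in reading order on an N x N canvas,
    leaving `gap` background pixels between neighbors horizontally and vertically."""
    h, w = len(rects[0]), len(rects[0][0])
    per_row = (n + gap) // (w + gap)
    per_col = (n + gap) // (h + gap)
    if per_row * per_col < len(rects):
        raise ValueError("rectangles do not fit with the requested gap")
    canvas = [[background] * n for _ in range(n)]
    for k, rect in enumerate(rects):
        top = (k // per_row) * (h + gap)
        left = (k % per_row) * (w + gap)
        for r in range(h):
            canvas[top + r][left:left + w] = rect[r]
    return canvas


# P = 512 rectangles of 2 x 4 pixels (Q = 8), non-zero toy intensities 1..255.
rects = [[[k % 255 + 1] * 4 for _ in range(2)] for k in range(512)]
canvas = tile_with_gap(rects, n=224, gap=1)
```

With N = 224, 45 rectangles fit per row and 75 rows fit, so the 512 rectangles use only part of the canvas.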
12. The method of claim 1 , wherein the Q frames are so chosen that the P feature encoding values of all Q frames can be fit within the 2-D graphical symbol.
This dependent claim constrains the choice of Q: the frames are chosen so that the P feature encoding values of all Q frames fit within the 2-D graphical symbol. Since the symbol has N×N pixels and each frame contributes P values, the number of frames is limited by the symbol size and by any separating pixels the data arrangement pattern uses. For example, with one value per pixel and no separators, P×Q cannot exceed N×N.
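Claim 12 ties the clip length to the symbol size. Under two assumed layouts (values packed back to back in raster order, or one square per frame with one-pixel separators) and one value per pixel, the largest admissible Q follows directly. N = 224 is an illustrative value, not one stated in the claims.

```python
import math


def max_frames(n, p, layout="raster", gap=1):
    """Largest Q whose P x Q feature encoding values fit in an N x N symbol (one value per pixel).

    "raster": values packed back to back, so Q <= N*N // P.
    "square": one ceil(sqrt(P)) x ceil(sqrt(P)) square per frame, `gap` pixels between squares.
    """
    if layout == "raster":
        return (n * n) // p
    if layout == "square":
        s = math.isqrt(p - 1) + 1          # side of the smallest square holding P values
        per_row = (n + gap) // (s + gap)   # squares per row (and per column)
        return per_row * per_row
    raise ValueError(f"unknown layout: {layout!r}")


for p in (512, 1024):
    print(p, max_frames(224, p), max_frames(224, p, layout="square"))
# 512 98 81
# 1024 49 36
```

For P = 512, raster packing admits up to 98 frames, while 23×23 squares with one-pixel gaps admit 81.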
13. The method of claim 1 , wherein the computing system comprises a semi-conductor chip containing digital circuits dedicated for performing the convolutional neural networks algorithm.
This dependent claim specifies that the computing system includes a semiconductor chip containing digital circuits dedicated to performing the convolutional neural network algorithm. Such dedicated circuits carry out the computationally intensive operations of the CNN based deep learning models, such as convolution and pooling, more efficiently than general-purpose processors, reducing latency and power consumption. This hardware acceleration makes it practical to run the method in real time or on edge devices.
November 5, 2019
March 22, 2022