Legal claims defining the scope of protection, as filed with the USPTO.
1. A method to detect a logo in images in video frames selected from a video stream, comprising: applying a saliency analysis and segmentation of selected regions in a selected video frame to determine segmented likely logo regions; processing the segmented likely logo regions with feature matching using correlation to generate a first match, neural network classification using a convolutional neural network to generate a second match, and text recognition using character segmentation and string matching to generate a third match; and deciding a most likely logo match by combining results from the first match, the second match, and the third match.
2. The method of claim 1, wherein the saliency analysis comprises: applying a discrete cosine transform (DCT) on the segmented likely logo regions of an image in a selected video frame to determine spectral saliency of each segmented likely logo region.
3. The method of claim 1, wherein saliency detection comprises: applying a discrete cosine transform (DCT) on the segmented likely logo regions of an image in a selected video frame to determine spectral saliency of each likely logo region; and measuring multi-scale similarity at two higher scales and a smaller scale of the spectral saliency of each likely logo region.
4. The method of claim 3, wherein the multi-scale similarity measures include orientation gradient histograms, hue, saturation, value (HSV) histograms, and stroke width transform (SWT) statistics which include total number of strokes, number of horizontal strokes, number of vertical strokes, stroke density, and number of loops.
5. The method of claim 1, wherein segmentation comprises: applying a stroke width transform (SWT) analysis to the selected regions to generate SWT statistics; applying a graph based segmentation algorithm to establish word boxes around likely logo character strings; and analyzing each of the word boxes to produce a set of character segmentations to delineate the characters in the likely logo character strings.
6. The method of claim 1 further comprising: combining neighboring keypoint regions with consistent aspect ratios and size to generate a new keypoint and region.
7. The method of claim 1 further comprising: detecting and combining edge segments in a keypoint region; and binning sample points on selected edges according to angle and distance with reference to a dominant orientation of the selected edges.
8. The method of claim 1 further comprising: using multiple text classifiers for robust logo text detection.
9. The method of claim 1 further comprising: using stroke heuristics to select the text classifier.
10. The method of claim 1 further comprising: using N-gram matching to recognize a segment.
11. An apparatus comprising: at least one processor; and a memory in communication with the at least one processor, the memory including non-transitory computer-readable code which, when executed, cause the at least one processor to at least: apply a saliency analysis and segmentation of selected regions in a selected video frame to determine segmented likely logo regions; process the segmented likely logo regions with feature matching using correlation to generate a first match, neural network classification using a convolutional neural network to generate a second match, and text recognition using character segmentation and string matching to generate a third match; and decide a most likely logo match by combining results from the first match, the second match, and the third match.
12. The apparatus of claim 11, wherein the saliency analysis comprises: applying a discrete cosine transform (DCT) on the segmented likely logo regions of an image in a selected video frame to determine spectral saliency of each segmented likely logo region.
13. The apparatus of claim 11, wherein saliency detection comprises: applying a discrete cosine transform (DCT) on the segmented likely logo regions of an image in a selected video frame to determine spectral saliency of each likely logo region; and measuring multi-scale similarity at two higher scales and a smaller scale of the spectral saliency of each likely logo region.
14. The apparatus of claim 13, wherein the multi-scale similarity measures include orientation gradient histograms, hue, saturation, value (HSV) histograms, and stroke width transform (SWT) statistics which include total number of strokes, number of horizontal strokes, number of vertical strokes, stroke density, and number of loops.
15. A non-transitory computer-readable storage medium storing code which, when executed, cause a machine to at least: apply a saliency analysis and segmentation of selected regions in a selected video frame to determine segmented likely logo regions; process the segmented likely logo regions with feature matching using correlation to generate a first match, neural network classification using a convolutional neural network to generate a second match, and text recognition using character segmentation and string matching to generate a third match; and decide a most likely logo match by combining results from the first match, the second match, and the third match.
16. The computer-readable storage medium of claim 15, wherein segmentation comprises: applying a stroke width transform (SWT) analysis to the selected regions to generate SWT statistics; applying a graph based segmentation algorithm to establish word boxes around likely logo character strings; and analyzing each of the word boxes to produce a set of character segmentations to delineate the characters in the likely logo character strings.
17. The computer-readable storage medium of claim 15 further comprising: combining neighboring keypoint regions with consistent aspect ratios and size to generate a new keypoint and region.
18. The computer-readable storage medium of claim 15 further comprising: detecting and combining edge segments in a keypoint region; and binning sample points on selected edges according to angle and distance with reference to a dominant orientation of the selected edges.
19. The computer-readable storage medium of claim 15 further comprising: using multiple text classifiers for robust logo text detection.
20. The computer-readable storage medium of claim 15 further comprising: using stroke heuristics to select the text classifier.
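The independent claims recite combining a first (feature), second (neural network), and third (text) match into a most likely logo match, and claim 10 recites N-gram matching, but the claims do not say how the results are combined or how N-grams are scored. The following is a minimal illustrative sketch only, assuming a weighted sum of per-logo scores for the combination and a Dice coefficient over character trigrams for the text match. All function names, weights, logo names, and score values are hypothetical and are not taken from the patent.

```python
from collections import defaultdict


def char_ngrams(text, n=3):
    """Return the set of character n-grams of a padded, lowercased string."""
    padded = " " * (n - 1) + text.lower() + " " * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Dice coefficient over character n-grams, in [0, 1] (assumed metric)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def text_match(recognized, known_logos, n=3):
    """Score each known logo name against the recognized character string."""
    return {name: ngram_similarity(recognized, name, n) for name in known_logos}


def combine_matches(feature, cnn, text, weights=(1.0, 1.0, 1.0)):
    """Combine three {logo: score} results by weighted sum (assumed rule).

    Returns (best_label, combined_scores); best_label is None when all
    three inputs are empty.
    """
    combined = defaultdict(float)
    for scores, weight in zip((feature, cnn, text), weights):
        for label, score in scores.items():
            combined[label] += weight * score
    if not combined:
        return None, {}
    best = max(combined, key=combined.get)
    return best, dict(combined)


if __name__ == "__main__":
    logos = ["acme", "globex", "initech"]              # hypothetical logo set
    feature_scores = {"acme": 0.55, "globex": 0.60}    # hypothetical first match
    cnn_scores = {"acme": 0.70, "initech": 0.20}       # hypothetical second match
    text_scores = text_match("acne", logos)            # third match, misread text
    best, combined = combine_matches(feature_scores, cnn_scores, text_scores)
    print(best, round(combined[best], 3))
```

In this sketch the feature matcher alone would favor a different logo, while the combined score lets agreement between the classifier and the partially misread text outweigh it. A real system could equally use voting, learned weights, or confidence thresholds; the claims leave that choice open.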
June 26, 2018