Embodiments of the present invention provide a coding and decoding method for images or videos that improves coding and decoding efficiency. The method includes: establishing a visual dictionary that includes one or more visual words; extracting features from a specific object in an image; determining, by using a feature matching method, whether a visual word in the visual dictionary matches the specific object; obtaining the index of the matched visual word and a geometric relationship between the specific object and the matched visual word, wherein the geometric relationship is represented by a project parameter; and entropy coding the index of the matched visual word and the project parameter instead of entropy coding the specific object.
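The encoder-side steps (matching an object against the dictionary, then deriving the index and the project parameter) can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `fit_similarity` and `match_word`, the tolerances, and the feature layout (a descriptor tuple paired with an `(x, y)` position) are all assumptions made for illustration. The geometric relationship is restricted here to scale, rotation, and translation, whereas the source also mentions affine relationships, and the entropy coding stage is omitted.

```python
import cmath
import math


def fit_similarity(src, dst):
    """Least-squares fit of dst ~ a*src + b, with points treated as complex numbers.

    abs(a) is the scale, cmath.phase(a) the rotation, and b the translation.
    """
    p = [complex(x, y) for x, y in src]
    q = [complex(x, y) for x, y in dst]
    pm = sum(p) / len(p)
    qm = sum(q) / len(q)
    denom = sum(abs(pi - pm) ** 2 for pi in p)
    a = sum((qi - qm) * (pi - pm).conjugate() for pi, qi in zip(p, q)) / denom
    b = qm - a * pm
    return a, b


def match_word(obj_feats, dictionary, desc_tol=0.1, geo_tol=1.0):
    """Return (index, params) for the first consistent visual word, else None.

    obj_feats and each dictionary word are lists of (descriptor, (x, y)).
    desc_tol and geo_tol are hypothetical thresholds, not values from the source.
    """
    for index, word in enumerate(dictionary):
        # Step 1: pair each object feature with its nearest word feature.
        pairs = []
        for desc, xy in obj_feats:
            best = min(word, key=lambda f: math.dist(f[0], desc))
            if math.dist(best[0], desc) <= desc_tol:
                pairs.append((best[1], xy))
        if len(pairs) < 3:
            continue
        # Step 2: check that the paired positions agree on one transform.
        a, b = fit_similarity([s for s, _ in pairs], [d for _, d in pairs])
        if abs(a) == 0:
            continue
        err = max(abs(a * complex(*s) + b - complex(*d)) for s, d in pairs)
        if err <= geo_tol:
            params = {
                "scale": abs(a),
                "rotation": cmath.phase(a),
                "tx": b.real,
                "ty": b.imag,
            }
            return index, params
    return None
```

In this sketch the returned index and parameter dictionary are the symbols that would be passed to an entropy coder in place of the object's pixels.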
Legal claims defining the scope of protection, as filed with the USPTO.
1. A coding method for images or videos, comprising: establishing a visual dictionary, wherein, the visual dictionary comprises one or more visual words; extracting features from a specific object in an image; determining whether there is a visual word in the visual dictionary matching the specific object by using a feature matching method; obtaining the index of the visual word matched and a geometric relationship between the specific object and the visual word matched; wherein, the geometric relationship is represented by a project parameter; entropy coding the index of the visual word matched and the project parameter instead of entropy coding the specific object.
2. The method of claim 1, further comprising: calculating differences between the image and the visual word matched; coding the differences by using a sparse coding method or a traditional coding method to obtain residuals; entropy coding the residuals with the index of the visual word matched and the project parameter.
3. The method of claim 1, wherein, each visual word comprises a visual object or a texture object, and corresponding features thereof.
4. The method of claim 1, wherein, the project parameter comprises magnification, deflation, rotation, affine, relative position.
5. The method of claim 1, wherein, determining whether there is a visual word in the visual dictionary matching the specific object comprises: comparing extracted local features of the specific object with local features of a visual word in the visual dictionary to obtain a local feature pair which comprises two identical or similar local features respectively extracted from the specific object and obtained from the visual word; calculating geometric distributions of the local features corresponding to the local feature pair, respectively in the specific object and in the visual word; determining whether the geometric distributions of the local features corresponding to the local feature pair, respectively in the specific object and the visual word, are consistent; considering the visual word as matching the specific object if the two geometric distributions are consistent.
6. The method of claim 5, wherein, before comparing extracted local features of the specific object with local features of a visual word in a visual dictionary, the method further comprises: combining the local features of each specific object to obtain a global feature; searching the visual dictionary for a candidate visual word with the most similar global feature with that of the specific object.
7. The method of claim 6, wherein, SIFT algorithm is used to extract the local features of the specific object.
8. A decoding method for images or videos, comprising: entropy decoding a code stream of an image to obtain an index and a project parameter of a visual word; obtaining an image of a visual object from a visual dictionary according to the index of the visual word; adjusting the image of the visual object with reference to the project parameter; overlapping adjusted images of all of visual objects to obtain a decoded image.
9. The method of claim 8, further comprising: entropy decoding the code stream to obtain residuals; reversely decoding the residuals to obtain differences between the image to be decoded and the visual word; overlapping the adjusted image of all of the visual objects and the differences to obtain a decoded image.
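The decoding steps of claims 8 and 9 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation. It assumes that entropy decoding has already produced a list of `(index, params)` symbols, that images are plain 2D lists of integers, and that the project parameter is reduced to an integer magnification plus a position (`scale`, `top`, `left`). The function names `adjust` and `decode` and those parameter keys are hypothetical.

```python
def adjust(patch, scale):
    """Magnify a 2D patch by an integer factor using nearest-neighbour sampling."""
    rows, cols = len(patch), len(patch[0])
    return [
        [patch[r // scale][c // scale] for c in range(cols * scale)]
        for r in range(rows * scale)
    ]


def decode(symbols, dictionary, height, width, residual=None):
    """Rebuild an image from decoded (index, params) symbols.

    symbols:    list of (index, {"scale": int, "top": int, "left": int})
    dictionary: list of 2D patches, one per visual word
    residual:   optional 2D list of differences added after overlapping
    """
    canvas = [[0] * width for _ in range(height)]
    for index, params in symbols:
        # Look up the visual object and adjust it by the project parameter.
        patch = adjust(dictionary[index], params["scale"])
        top, left = params["top"], params["left"]
        # Overlap the adjusted object onto the canvas, clipping at the borders.
        for r, row in enumerate(patch):
            for c, value in enumerate(row):
                if 0 <= top + r < height and 0 <= left + c < width:
                    canvas[top + r][left + c] = value
    if residual is not None:
        # Claim 9: add the decoded differences to the overlapped objects.
        canvas = [
            [pixel + diff for pixel, diff in zip(canvas_row, diff_row)]
            for canvas_row, diff_row in zip(canvas, residual)
        ]
    return canvas
```

In this sketch later symbols overwrite earlier ones where objects overlap. The source does not specify how overlapping regions are resolved, so that ordering is an assumption.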
November 6, 2014
February 23, 2016