A look-ahead system and method for pan and zoom detection in video sequences is disclosed. The system and method use motion vectors in a reference coordinate system to identify pans and zooms in video sequences. The identification of pans and zooms enables parameter switching for improved encoding in various video standards (e.g., H.264) and improved video retrieval of documentary movies and other video sequences in video databases or other storage devices.
Legal claims defining the scope of protection, as filed with the USPTO.
1. A method of detecting at least one of a pan and a zoom in a video sequence, comprising: receiving a video sequence, the video sequence comprising a plurality of frames; determining a set of motion vectors for each frame in the plurality of the frames; determining a motion angle for each motion vector; detecting a scene cut in the plurality of the frames based on the motion vectors of the frames; selecting a set of frames from the plurality of the frames responsive to detection of the scene cut; identifying at least two largest regions in each frame, wherein a first largest region includes motion vectors having a first orientation and the second largest region includes motion vectors having a second orientation; determining percentages of each frame covered by each of the at least two largest regions; determining a statistical measure of the motion vectors for at least one of the two largest regions using a look-ahead detector; and comparing the percentages and statistical measure to threshold values to identify whether the plurality of frames includes at least one of a pan and a zoom.
2. The method of claim 1, wherein selecting a set of frames from the plurality of the frames responsive to detection of the scene cut comprises: selecting a set of video frames from the plurality of the frames that includes all the frames in the video sequence up to and including a frame just before the detected scene cut.
3. The method of claim 1, wherein frame differences and motion vectors of the plurality of the frames are used to detect a scene cut.
4. The method of claim 1, wherein the motion angles are computed in one from the group of coordinate systems consisting of polar, Cartesian, spherical and cylindrical coordinate systems.
5. The method of claim 1, wherein the percentages of each frame covered by the at least two largest regions are determined from the number of pixels in each region as a percentage of the total number of pixels in a frame.
6. The method of claim 1, wherein the statistical measure is a variance of the motion angles of the motion vectors within at least one of the identified two largest regions in the frame.
7. A non-transitory computer-readable storage medium storing executable computer program code for detecting at least one of a pan and a zoom in a video sequence, comprising computer program code for: receiving a video sequence, the video sequence comprising a plurality of frames; determining a set of motion vectors for each frame in the plurality of the frames; determining a motion angle for each motion vector; detecting a scene cut in the plurality of the frames based on the motion vectors of the frames; selecting a set of frames from the plurality of the frames responsive to detection of the scene cut; identifying at least two largest regions in each frame, wherein a first largest region includes motion vectors having a first orientation and the second largest region includes motion vectors having a second orientation; determining percentages of each frame covered by each of the at least two largest regions; determining a statistical measure of the motion vectors for at least one of the two largest regions using a look-ahead detector; and comparing the percentages and statistical measure to threshold values to identify whether the plurality of frames includes at least one of a pan and a zoom.
8. The computer-readable storage medium of claim 7, wherein the computer program code for selecting a set of frames from the plurality of the frames responsive to detection of the scene cut comprises code for: selecting a set of video frames from the plurality of the frames that includes all the frames in the video sequence up to and including a frame just before the detected scene cut.
9. The computer-readable storage medium of claim 7, wherein frame differences and motion vectors of the plurality of the frames are used to detect a scene cut.
10. The computer-readable storage medium of claim 7, wherein the motion angles are computed in one from the group of coordinate systems consisting of polar, Cartesian, spherical and cylindrical coordinate systems.
11. The computer-readable storage medium of claim 7, wherein the percentages of each frame covered by the at least two largest regions are determined from the number of pixels in each region as a percentage of the total number of pixels in a frame.
12. The computer-readable storage medium of claim 7, wherein the statistical measure is a variance of the motion angles of the motion vectors within at least one of the identified two largest regions in the frame.
13. A method of detecting at least one of a pan and a zoom in a video sequence, comprising: detecting motion vectors to detect a scene cut in a plurality of frames; selecting a set of frames from the video sequence in response to detection of a scene cut; identifying at least two regions in each selected frame, a first region having motion vectors of a first orientation and a second region having motion vectors of a second orientation; determining amounts of each frame covered by each of the at least two regions; and comparing the detected amounts to threshold values, and based on comparisons, classifying the video sequence as having a pan or zoom.
14. The method of claim 13, wherein selecting a set of frames from the plurality of the frames responsive to detection of the scene cut comprises: selecting a set of video frames from the plurality of the frames that includes all the frames in the video sequence up to and including a frame just before the detected scene cut.
15. The method of claim 13, wherein frame differences and motion vectors of the plurality of the frames are used to detect a scene cut.
16. The method of claim 13, wherein the motion angles are computed in one from the group of coordinate systems consisting of polar, Cartesian, spherical and cylindrical coordinate systems.
17. The method of claim 13, wherein the amounts of each frame covered by the at least two regions are determined from the number of pixels in each region as a percentage of the total number of pixels in a frame.
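The steps recited in the claims (motion angles, two largest regions by orientation, coverage percentages, angle variance, threshold comparison) can be illustrated with a rough Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the patent: the function names `frame_features` and `classify_sequence`, the use of orientation bins as a stand-in for "regions", the number of bins, the averaging across frames, the decision rule, and all threshold values. Scene-cut detection and the look-ahead detector are not modeled; the index of the cut is passed in as `cut_index`.

```python
import numpy as np

def frame_features(mv, n_bins=8):
    """Return (pct_first, pct_second, angle_var) for one frame.

    mv: array of shape (rows, cols, 2) holding one (dx, dy) motion vector
    per block. A "region" here is simply the set of blocks whose motion
    angle falls in the same orientation bin (an assumption of this sketch).
    """
    dx = mv[..., 0].ravel().astype(float)
    dy = mv[..., 1].ravel().astype(float)
    total = dx.size
    moving = np.hypot(dx, dy) > 0          # zero vectors have no orientation
    if not moving.any():
        return 0.0, 0.0, 0.0
    angles = np.arctan2(dy[moving], dx[moving])   # polar motion angle
    width = 2 * np.pi / n_bins
    # Bins are centred on 0, width, 2*width, ... so that axis-aligned
    # motion does not straddle a bin boundary.
    bins = np.floor(((angles + width / 2) % (2 * np.pi)) / width).astype(int) % n_bins
    counts = np.bincount(bins, minlength=n_bins)
    order = np.argsort(counts)[::-1]
    first, second = order[0], order[1]
    # Coverage is measured against all blocks in the frame, moving or not.
    pct_first = 100.0 * counts[first] / total
    pct_second = 100.0 * counts[second] / total
    # Variance of the angles in the largest region, taken as deviation from
    # the bin centre so the wrap-around at +/-pi does not inflate it.
    dev = (angles[bins == first] - first * width + np.pi) % (2 * np.pi) - np.pi
    return pct_first, pct_second, float(np.var(dev))

def classify_sequence(frames_mv, cut_index, pan_pct=70.0, zoom_pct=10.0, max_var=0.05):
    """Classify the frames before a scene cut as "pan", "zoom" or "none".

    The rule and thresholds below are hypothetical placeholders.
    """
    selected = frames_mv[:cut_index]       # up to the frame just before the cut
    if len(selected) == 0:
        return "none"
    feats = np.array([frame_features(mv) for mv in selected])
    pct_first, pct_second, var_first = feats.mean(axis=0)
    if pct_first >= pan_pct and var_first <= max_var:
        return "pan"                       # one dominant, tightly aligned region
    if pct_first < pan_pct and pct_second >= zoom_pct:
        return "zoom"                      # no dominant orientation
    return "none"
```

For example, a field in which every block moves by `(1, 0)` is labeled a pan by this rule, while a field of vectors pointing away from the frame centre is labeled a zoom. A real detector would need thresholds tuned on actual footage and a spatial notion of region rather than orientation bins alone.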
January 11, 2010
January 7, 2014