US-9609307

Method of converting 2D video to 3D video using machine learning

Published: March 28, 2017

Assignee: not available in the USPTO data

Inventors: not available in the USPTO data

Technical Abstract

Machine learning method that learns to convert 2D video to 3D video from a set of training examples. Uses machine learning to perform any or all of the 2D to 3D conversion steps of identifying and locating objects, masking objects, modeling object depth, generating stereoscopic image pairs, and filling gaps created by pixel displacement for depth effects. Training examples comprise inputs and outputs for the conversion steps. The machine learning system generates transformation functions that generate the outputs from the inputs; these functions may then be used on new 2D videos to automate or semi-automate the conversion process. Operator input may be used to augment the results of the machine learning system. Illustrative representations for conversion data in the training examples include object tags to identify objects and locate their features, Bézier curves to mask object regions, and point clouds or geometric shapes to model object depth.

Patent Claims

14 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A machine learning method of converting 2D video to 3D video, comprising: obtaining a training set comprising a plurality of conversions, each conversion comprising a 2D scene comprising one or more 2D frames; a corresponding 3D conversion dataset that describes conversion of said 2D scene to 3D, comprising inputs and outputs for 2D to 3D conversion steps, said 2D to 3D conversion steps comprising obtaining said one or more 2D frames; locating and identifying an object in one or more object frames within said one or more 2D frames, each object frame containing an image of at least a portion of said object; generating an object mask for said object in said one or more object frames, said object mask identifying one or more masked pixels representing said object in said one or more object frames; generating an object depth model that assigns a pixel depth to one or more of said one or more masked pixels; generating a stereoscopic image pair for each of said one or more object frames based on said object depth model, said stereoscopic image pair comprising a left image and a right image; and, generating one or more gap filling pixel values for one or more missing pixels in said left image or in said right image; training a machine learning system on said training set; obtaining a 2D video; applying said machine learning system to said 2D video to automatically perform one or more of said 2D to 3D conversion steps on said 2D video; and, accepting input from an operator to modify or complete one or more of said 2D to 3D conversion steps on said 2D video.
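The last two conversion steps in claim 1 (generating a stereoscopic pair from a depth model, then filling the missing pixels that displacement leaves behind) can be illustrated on a single scanline. This is a minimal sketch under assumptions that are not in the patent: depth is expressed as an integer horizontal disparity per pixel, the sign convention for the left and right images is arbitrary, and `shift_row` and `fill_gaps` are hypothetical helper names. The fill shown is a naive nearest-neighbor copy, not the learned gap filling the claim describes.

```python
from typing import List, Optional

Row = List[Optional[int]]  # None marks a missing pixel


def shift_row(row: List[int], disparity: List[int], direction: int) -> Row:
    """Displace each pixel horizontally by its disparity (assumed convention:
    direction=+1 for the left image, -1 for the right image)."""
    out: Row = [None] * len(row)
    # Paint far pixels (small disparity) first so nearer pixels overwrite them.
    order = sorted(range(len(row)), key=lambda x: disparity[x])
    for x in order:
        target = x + direction * disparity[x]
        if 0 <= target < len(row):
            out[target] = row[x]
    return out


def fill_gaps(row: Row) -> Row:
    """Naive fill: copy the nearest valid pixel to the left, else to the right."""
    filled = list(row)
    for x, value in enumerate(row):
        if value is None:
            left = next((row[i] for i in range(x - 1, -1, -1)
                         if row[i] is not None), None)
            right = next((row[i] for i in range(x + 1, len(row))
                          if row[i] is not None), None)
            filled[x] = left if left is not None else right
    return filled


# A bright object (value 50) in front of a darker background (value 10).
row = [10, 10, 50, 50, 10, 10]
disparity = [0, 0, 1, 1, 0, 0]

left_image = shift_row(row, disparity, +1)
right_image = shift_row(row, disparity, -1)
left_filled = fill_gaps(left_image)
```

Shifting the object leaves one missing pixel in each image, on the side the object moved away from. A nearest-neighbor copy can smear foreground pixels into such a gap, so it only stands in for the gap filling step here.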

2. The method of claim 1 wherein said machine learning system performs said generating an object mask for said object in said one or more object frames; and, said corresponding 3D conversion dataset comprises a masking input comprising an identity of said object; and, a location of one or more feature points of said object in said one or more object frames; and, a masking output comprising a path comprising one or more segments, each segment comprising a curve defined by one or more control points, wherein said path is a boundary of said object mask.
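The masking output in claim 2 is a closed path of curve segments whose control points define the mask boundary. The sketch below shows one way such a path could be turned into masked pixels. It assumes cubic Bézier segments (the abstract mentions Bézier curves, but the claim does not fix the curve order), flattens each segment into short line pieces, and uses an even-odd ray test at pixel centers. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point, Point, Point]  # four control points (assumed cubic)


def cubic_bezier(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Evaluate a cubic Bézier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    w0, w1, w2, w3 = u * u * u, 3 * u * u * t, 3 * u * t * t, t * t * t
    return (w0 * p0[0] + w1 * p1[0] + w2 * p2[0] + w3 * p3[0],
            w0 * p0[1] + w1 * p1[1] + w2 * p2[1] + w3 * p3[1])


def flatten_path(segments: Sequence[Segment], steps: int = 16) -> List[Point]:
    """Approximate the closed path by a polygon, sampling each segment."""
    return [cubic_bezier(*seg, i / steps) for seg in segments for i in range(steps)]


def inside(poly: Sequence[Point], x: float, y: float) -> bool:
    """Even-odd ray casting: count polygon edges crossed by a ray to the right."""
    hit = False
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                hit = not hit
    return hit


def rasterize_mask(segments: Sequence[Segment], width: int, height: int) -> List[List[bool]]:
    """Mark a pixel as masked when its center lies inside the path."""
    poly = flatten_path(segments)
    return [[inside(poly, x + 0.5, y + 0.5) for x in range(width)]
            for y in range(height)]
```

Storing the boundary as control points rather than as a pixel bitmap keeps the training output compact and resolution independent, which is one plausible reason to represent masks this way.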

3. The method of claim 1 wherein said machine learning system performs said generating an object depth model; and, said corresponding 3D conversion dataset comprises an object depth model input comprising said object mask; and, an object depth model output comprising one or more regions within said object mask; and, a planar or curved 3D surface associated with each of said one or more regions.
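The depth model output in claim 3 pairs regions of the object mask with a planar or curved 3D surface. As a rough illustration of the planar case only, the sketch below assigns each masked pixel a depth from the plane of its region. The plane parameterization `depth = a*x + b*y + c`, the region names, and the `assign_depths` function are assumptions made for this example.

```python
from typing import Dict, Set, Tuple

Pixel = Tuple[int, int]
Plane = Tuple[float, float, float]  # (a, b, c) with depth = a*x + b*y + c (assumed form)


def assign_depths(mask: Set[Pixel],
                  regions: Dict[str, Tuple[Set[Pixel], Plane]]) -> Dict[Pixel, float]:
    """Give every masked pixel the depth of its region's plane at that pixel."""
    depths: Dict[Pixel, float] = {}
    for _name, (pixels, (a, b, c)) in regions.items():
        # Only pixels that belong to both the region and the object mask get a depth.
        for (x, y) in pixels & mask:
            depths[(x, y)] = a * x + b * y + c
    return depths


mask = {(0, 0), (1, 0), (2, 0)}
regions = {
    "front": ({(0, 0), (1, 0)}, (0.0, 0.0, 5.0)),    # flat plane at constant depth
    "slanted": ({(2, 0), (3, 0)}, (1.0, 0.0, 2.0)),  # depth grows with x
}
depths = assign_depths(mask, regions)
```

A point cloud output, as in claim 4, would replace the per-region plane with an explicit 3D point per masked pixel.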

4. The method of claim 1 wherein said machine learning system performs said generating an object depth model; and, said corresponding 3D conversion dataset comprises an object depth model input comprising said object mask; and, an object depth model output comprising a point cloud of 3D points, each of said 3D points associated with a pixel within said object mask.

5. The method of claim 1 wherein said machine learning system performs said generating one or more gap filling pixel values; said generating one or more gap filling pixel values comprises generating a clean plate frame from one or more of said one or more 2D frames; and, copying pixel values from said clean plate frame to said one or more missing pixels; and, said corresponding 3D conversion dataset comprises a clean plate input comprising one or more of said one or more 2D frames; and, a clean plate output comprising said clean plate frame associated with said one or more 2D frames.
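Claim 5 fills gaps by generating a clean plate frame and copying its pixel values into the missing pixels. The claim has the machine learning system produce the clean plate. The sketch below substitutes a simple per-pixel temporal median over the frames, a common non-learned stand-in that works when a moving object covers each background pixel in fewer than half of the frames. Both function names are hypothetical, and the copy reads from the same pixel coordinates, which ignores any displacement of the background.

```python
from statistics import median
from typing import List, Optional

Frame = List[List[int]]
GappedFrame = List[List[Optional[int]]]  # None marks a missing pixel


def clean_plate(frames: List[Frame]) -> Frame:
    """Per-pixel temporal median across frames (stand-in for a learned clean plate)."""
    height, width = len(frames[0]), len(frames[0][0])
    return [[median(f[y][x] for f in frames) for x in range(width)]
            for y in range(height)]


def fill_from_plate(image: GappedFrame, plate: Frame) -> Frame:
    """Copy clean plate values into the missing pixels; keep all other pixels."""
    return [[plate[y][x] if value is None else value
             for x, value in enumerate(row)]
            for y, row in enumerate(image)]


# Background is [1, 2, 3, 4]; an object with value 9 moves one pixel per frame.
frames = [[[9, 2, 3, 4]], [[1, 9, 3, 4]], [[1, 2, 9, 4]]]
plate = clean_plate(frames)
filled = fill_from_plate([[1, None, 9, 4]], plate)
```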

6. The method of claim 1 wherein said machine learning system performs said generating an object mask for said object in said one or more object frames; said generating an object depth model; said generating one or more gap filling pixel values; and, wherein said generating one or more gap filling pixel values comprises generating a clean plate frame from one or more of said one or more 2D frames; and, copying pixel values from said clean plate frame to said one or more missing pixels; and, said corresponding 3D conversion dataset comprises a masking input comprising an identity of said object; and, a location of one or more feature points of said object in said one or more object frames; a masking output comprising a path comprising one or more segments, each segment comprising a curve defined by one or more control points, wherein said path is a boundary of said object mask; an object depth model input comprising said object mask; an object depth model output comprising one or more of a region model comprising one or more regions within said object mask; a planar or curved 3D surface associated with each of said one or more regions; and, a point cloud of 3D points, each of said 3D points associated with a pixel within said object mask; a clean plate input comprising one or more of said one or more 2D frames; and, a clean plate output comprising said clean plate frame associated with said one or more 2D frames.

7. The method of claim 1, wherein said generating an object mask for said object in said one or more object frames comprises defining a 3D space associated with said one or more object frames; obtaining a 3D object model of said object; and, defining a position and orientation of said 3D object model in said 3D space that aligns said 3D object model with said image of at least a portion of said object in said one or more object frames; and, said assigns a pixel depth to one or more of said one or more masked pixels comprises associates a point in said 3D object model in said 3D space with each masked pixel; and, assigns a depth of said point in said 3D space to said pixel depth for the associated masked pixel.

8. The method of claim 7, wherein said obtaining a 3D object model of said object comprises obtaining 3D scanner data captured from said object; and, converting said 3D scanner data into said 3D object model.

9. The method of claim 8, wherein said obtaining said 3D scanner data comprises obtaining data from a time-of-flight system or a light-field system.

10. The method of claim 8, wherein said obtaining said 3D scanner data comprises obtaining data from a triangulation system.

11. The method of claim 8, wherein said converting said 3D scanner data into said 3D object model comprises retopologizing said 3D scanner data to form said 3D object model from a reduced number of polygons or parameterized surfaces.

12. The method of claim 7, further comprising dividing said 3D object model into object parts, wherein said object parts may have motion relative to one another; augmenting said 3D object model with one or more degrees of freedom that reflect said motion relative to one another of said object parts; and, determining values of each of said one or more degrees of freedom that align said image of said at least a portion of said object in a plurality of frames of said one or more object frames with said 3D object model modified by said values of said one or more degrees of freedom.

13. The method of claim 12, wherein said determining values of each of said one or more degrees of freedom comprises selecting one or more features in each of said object parts, each having coordinates in said 3D object model; determining pixel locations of said one or more features in said one or more object frames; and, calculating a position and orientation of one of said object parts and calculating said values of each of said one or more degrees of freedom to align a projection of said coordinates in said 3D model onto a camera plane with said pixel locations in said one or more object frames.
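The alignment in claim 13 amounts to choosing degree-of-freedom values so that the projected 3D feature coordinates land on the observed pixel locations. The sketch below reduces this to one degree of freedom under several assumptions that are not in the patent: a pinhole camera with unit focal length, a part whose position is already known (`offset`), a single hinge angle about the vertical axis, and a brute-force grid search in place of a real optimizer. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Pixel = Tuple[float, float]


def rotate_y(p: Vec3, angle: float) -> Vec3:
    """Rotate a model point about the vertical (y) axis by the hinge angle."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


def project(p: Vec3, focal: float = 1.0) -> Pixel:
    """Pinhole projection onto the camera plane (assumes z > 0)."""
    x, y, z = p
    return (focal * x / z, focal * y / z)


def place(p: Vec3, angle: float, offset: Vec3) -> Vec3:
    """Apply the hinge rotation, then move the part to its position in the scene."""
    x, y, z = rotate_y(p, angle)
    return (x + offset[0], y + offset[1], z + offset[2])


def reprojection_error(angle: float, features: Sequence[Vec3],
                       pixels: Sequence[Pixel], offset: Vec3) -> float:
    """Sum of squared distances between projected features and observed pixels."""
    total = 0.0
    for p, (u, v) in zip(features, pixels):
        pu, pv = project(place(p, angle, offset))
        total += (pu - u) ** 2 + (pv - v) ** 2
    return total


def fit_angle(features: Sequence[Vec3], pixels: Sequence[Pixel],
              offset: Vec3, steps: int = 3600) -> float:
    """Grid search over [-pi/2, pi/2] for the angle with the lowest error."""
    candidates: List[float] = [-math.pi / 2 + math.pi * i / steps
                               for i in range(steps + 1)]
    return min(candidates,
               key=lambda a: reprojection_error(a, features, pixels, offset))
```

In practice the position, orientation, and all degrees of freedom would be solved together, typically with a nonlinear least-squares method, but the objective being minimized has the same shape as `reprojection_error`.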

14. The method of claim 13, wherein said determining pixel locations of said one or more features in said one or more object frames comprises selecting said pixel locations in one or more key frames; and, tracking said features across one or more non-key frames using a computer.

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

H04N G06T

Patent Metadata

Filing Date: December 14, 2015

Publication Date: March 28, 2017
