There are disclosed various methods, apparatuses and computer program products for video encoding and decoding. In some embodiments the method for video encoding comprises obtaining compressed volumetric video data representing a three-dimensional scene or object (71); capsulating the compressed volumetric video data into a data structure (72); obtaining data of a two-dimensional projection of at least a part of the three-dimensional scene as seen from a certain viewport (73); and including the data of the two-dimensional projection into the data structure (74).
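The four encoding steps (71 to 74) can be illustrated with a minimal sketch. Everything named here (`Track`, `Container`, `encapsulate`, and the track labels `"volumetric"` and `"projection_2d"`) is a hypothetical stand-in chosen for illustration, not the container format or API described in the patent; the samples are opaque byte strings assumed to be already compressed.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class Track:
    # Hypothetical track label; the disclosure does not prescribe these strings.
    kind: str
    samples: List[bytes] = field(default_factory=list)


@dataclass
class Container:
    # Stand-in for "the data structure" holding one or more tracks.
    tracks: List[Track] = field(default_factory=list)


def encapsulate(volumetric_samples: Iterable[bytes],
                projection_samples: Iterable[bytes]) -> Container:
    """Place compressed volumetric data and 2D projection data in one container.

    Steps 71 and 73 (obtaining the data) are represented by the two arguments;
    steps 72 and 74 are the two appends below.
    """
    container = Container()
    # Step 72: capsulate the compressed volumetric video data.
    container.tracks.append(Track("volumetric", list(volumetric_samples)))
    # Step 74: include the two-dimensional projection data in the same structure.
    container.tracks.append(Track("projection_2d", list(projection_samples)))
    return container
```

A caller would pass its own compressed payloads, for example `encapsulate([frame0, frame1], [view0, view1])`, and receive a single container carrying both representations.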
Legal claims defining the scope of protection, as filed with the USPTO.
1. An apparatus comprising: at least one processor; and at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to: obtain compressed volumetric video data representing a three-dimensional scene or object; capsulate the compressed volumetric video data into a data structure as a track or a multi-track; project at least a part of the three-dimensional scene as seen from a certain viewport to one or more two-dimensional projections; obtain data of the one or more two-dimensional projections; include the data of the one or more two-dimensional projections into the data structure, characterized in that the apparatus is further caused to: include information of the viewport into a timed metadata track to signal viewport information including viewport position, angle and camera parameters.
2. The apparatus according to claim 1, wherein the apparatus is further caused to: include the data of the one or more two-dimensional projections with higher quality or resolution than other capsulated volumetric video data.
3. The apparatus according to claim 1 wherein the apparatus is further caused to: form point cloud encoded data units representing a dataset of points within the three-dimensional scene or object; map the point cloud encoded data units to individual tracks within the data structure based on their types; construct a viewport track based on the data of the one or more two-dimensional projections; and construct a viewport synchronization track comprising projection data linking a two-dimensional viewport track and one or more three-dimensional viewing tracks.
4. The apparatus according to claim 3 wherein the apparatus is further caused to: include virtual camera information related to the viewport for the viewport synchronization track.
5. The apparatus according to claim 3 wherein the apparatus is further caused to: include viewport tracks with data of a viewport synchronization track associated with the viewport track.
6. The apparatus according to claim 1 wherein the apparatus is further caused to: include the data of the one or more two-dimensional projections into the data structure as one or more previously-rendered two-dimensional videos of one or more three-dimensional scenes or objects including the three-dimensional scene or object.
7. The apparatus according to claim 1 wherein the apparatus is further caused to: include data of two or more alternatives for the one or more two-dimensional projections into the data structure.
8. A method comprising: obtaining compressed volumetric video data representing a three-dimensional scene or object; capsulating the compressed volumetric video data into a data structure as a track or a multi-track; projecting at least a part of the three-dimensional scene as seen from a certain viewport to one or more two-dimensional projections; obtaining data of the one or more two-dimensional projections; including the data of the one or more two-dimensional projections into the data structure, characterized in that the method further comprises: including information of the viewport into a timed metadata track to signal viewport information including viewport position, angle and camera parameters.
9. The method according to claim 8 further comprising: including the data of the one or more two-dimensional projections with higher quality or resolution than other capsulated volumetric video data.
10. The method according to claim 8 further comprising: forming point cloud encoded data units representing a dataset of points within the three-dimensional scene or object; mapping the point cloud encoded data units to individual tracks within the data structure based on their types; constructing a viewport track based on the data of the one or more two-dimensional projections; and constructing a viewport synchronization track comprising projection data linking a two-dimensional viewport track and one or more three-dimensional viewing tracks.
11. The method according to claim 10 further comprising: including virtual camera information related to the viewport for the viewport synchronization track.
12. The method according to claim 10 further comprising: including viewport tracks with data of a viewport synchronization track associated with the viewport track.
13. The method according to claim 8 further comprising: including the data of the one or more two-dimensional projections into the data structure as one or more previously-rendered two-dimensional videos of one or more three-dimensional scenes or objects including that three-dimensional scene or object.
14. The method according to claim 8 further comprising: including data of two or more alternatives for the one or more two-dimensional projections into the data structure.
15. An apparatus comprising: at least one processor; and at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to: receive a data structure as a track or multi-track comprising compressed volumetric video data representing a three-dimensional scene or object and data of a two-dimensional projection of at least a part of the three-dimensional scene as seen from a certain viewport; and select one of: the three-dimensional scene or object, or the two-dimensional projection for presentation, characterized in that the apparatus is further caused to: receive information of the viewport from a timed metadata track to obtain viewport information including viewport position, angle and camera parameters.
16. The apparatus according to claim 15 wherein the apparatus is further caused to: render the data of the two-dimensional projection with higher quality or resolution than other capsulated volumetric video data.
17. The apparatus according to claim 15 wherein the apparatus is further caused to: reconstruct from a viewport synchronization track projection data linking a two-dimensional viewport track and one or more three-dimensional viewing tracks; reconstruct data of the two-dimensional projection from a viewport track; map point cloud decoded data units from individual tracks within the data structure based on their types; and reconstruct the three-dimensional scene or object based on the point cloud decoded data units.
18. The apparatus according to claim 15 wherein the apparatus is further caused to: receive virtual camera information related to the viewport from the viewport synchronization track.
19. The apparatus according to claim 15 wherein the apparatus is further caused to: receive data of the two-dimensional projection from the data structure as one or more pre-rendered two-dimensional videos of one or more three-dimensional scenes or objects including the three-dimensional scene or object.
20. A method for decoding comprising: receiving a data structure as a track or a multi-track comprising compressed volumetric video data representing a three-dimensional scene or object and data of a two-dimensional projection of at least a part of the three-dimensional scene as seen from a certain viewport; and selecting one of: the three-dimensional scene or object, or the two-dimensional projection for presentation, characterized in that the method further comprises: receiving information of the viewport from a timed metadata track to obtain viewport information including viewport position, angle and camera parameters.
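As an informal illustration of the receiving side in claims 15 and 20, the sketch below models a timed metadata track as a time-ordered list of viewport samples and shows one possible selection between the two presentations. The names `ViewportSample`, `viewport_at`, and `select_presentation` are hypothetical, the single `fov_deg` field is only a placeholder for "camera parameters", and the selection rule (prefer 3D when the device can render it) is an assumption; the claims do not state how the selection is made.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ViewportSample:
    time: float                           # presentation time in seconds (assumed unit)
    position: Tuple[float, float, float]  # viewport position
    angle: Tuple[float, float, float]     # viewport angle, e.g. yaw/pitch/roll (assumed)
    fov_deg: float                        # placeholder for camera parameters


def viewport_at(track: List[ViewportSample], t: float) -> Optional[ViewportSample]:
    """Return the latest sample whose time is <= t, or None if t precedes the track.

    Assumes the samples in `track` are sorted by time.
    """
    times = [s.time for s in track]
    i = bisect_right(times, t)
    return track[i - 1] if i > 0 else None


def select_presentation(can_render_3d: bool, has_projection: bool) -> str:
    """Pick one presentation; the preference order here is an assumption."""
    if can_render_3d:
        return "volumetric"
    if has_projection:
        return "projection_2d"
    raise ValueError("no presentable representation available")
```

With samples at 0.0 s and 1.0 s, `viewport_at(track, 0.5)` returns the 0.0 s sample, which a player could use to interpret the 2D projection it chose to present.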
December 7, 2020
January 14, 2025