Point-Based Modeling of Human Clothing

Published: July 15, 2025

Assignee: Not available in the USPTO data we have

Inventors: Artur Andreevich GRIGOREV, Victor Sergeevich LEMPITSKY, Ilya Dmitrievich ZAKHARKIN, Kirill Yevgenevich MAZUR

Patent Claims

8 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method for training a draping network for modeling an outfit on a person, the outfit corresponding to a body pose and a body shape of a person, the method comprises: providing a set of frames of persons, each person wearing an outfit and the frames being video sequences in which each person makes movements; obtaining for each frame among the frames Skinned Multi-Person Linear (SMPL) meshes corresponding to a pose and a shape of a body of a person included in the frame; obtaining, for each frame among the frames, outfit mesh corresponding to the pose and the shape of the body of the person included in the frame; generating initial point clouds as a set of vertices of the SMPL meshes for each frame; setting randomly initialized d-dimensional code vector corresponding to outfit style encoding for each person; inputting the initial point clouds in a Cloud Transformer neural network of the draping network and inputting the outfit code vectors in a Multi-Layer Perceptron (MLP) neural network encoder; processing the outfit code vectors with an MLP encoder neural network and passing the output of the MLP encoder neural network to the Cloud Transformer neural network, to deform the initial point clouds providing the output of the MLP encoder neural network and output the predicted point cloud of the outfit for each frame; obtaining, after processing all frames from the set of frames of persons, pre-trained draping network including weights of the trained MLP encoder neural network, weights of the trained Cloud Transformer neural network, outfit code vectors of encoding styles of all persons; and inputting, by pre-trained draping network, appropriate style of outfit, corresponding one of the vectors and one of the point clouds, on a body shape and a body pose of a user.

2. A method for obtaining predicted point cloud of outfit and an outfit code vector from an image of a material person in outfit for modeling the outfit on a person, the outfit being adapted to a body pose and a body shape of a person, the method comprises: obtaining, by detecting device, the image of the material person in the outfit; predicting Skinned Multi-Person Linear (SMPL) mesh in the desired pose and body shape from the image by the SMPLify method; generating an initial point cloud as a vertice of the SMPL mesh for the image; predicting a binary outfit mask corresponding pixels of the outfit in the image by a segmentation network; initializing, with random values, d-dimensional outfit code vector for outfit style encoding for the image; inputting the initial point cloud and the outfit code vector into a pre-trained draping network; obtaining a predicted point cloud of the outfit from the pre-trained draping network output; projecting the outfit point cloud to a black-and-white image with camera parameters of the image of the person; comparing, by obtaining a loss function, a projection of the predicted point cloud on the image with a ground truth binary outfit mask corresponding the pixels of the outfit in the image via a chamfer distance between a two-dimensional (2D) point clouds, which are projections of three-dimensional (3D) point clouds; optimizing the outfit code vector based on the obtained loss function; and inputting, by the obtained outfit code vector, predicted point cloud of the outfit of the image on any body shape and any body pose of a user.

3. The method of claim 2, training of the pre-trained network comprising: providing a set of frames of persons, each person wearing an outfit and the frames being video sequences in which each person makes movements; obtaining for each frame among the frames Skinned Multi-Person Linear (SMPL) meshes corresponding to a pose and a shape of a body of a person included in the frame; obtaining, for each frame among the frames, outfit mesh corresponding to the pose and the shape of the body of the person included in the frame; generating an initial point cloud as a set of vertices of the SMPL meshes for each frame; setting randomly initialized d-dimensional code vector corresponding to outfit style encoding for each person; inputting the initial point cloud in a Cloud Transformer neural network of a draping network and inputting the outfit code vectors in a Multi-Layer Perceptron (MLP) neural network encoder; processing the outfit code vector with an MLP encoder neural network and passing the output of the MLP encoder neural network to the Cloud Transformer neural network, to deform the initial point cloud providing the output of the MLP encoder neural network and output the predicted point cloud of the outfit for each frame; obtaining, after processing all frames from the set of frames of persons, pre-trained draping network including weights of the trained MLP encoder neural network, weights of the trained Cloud Transformer neural network, outfit code vectors of encoding styles of all persons; and inputting, by pre-trained draping network, appropriate style of outfit, corresponding one of the vectors and one of the point clouds, on a body shape and a body pose of the user.

4. A method for modeling outfit on a person, the outfit being adapted to a body pose and a body shape of any person, the method comprises: providing a color video stream of a first person; choosing, by a user, an outfit corresponding to a video of a second person in an outfit; obtaining a predicted point cloud of outfit and an outfit code vector according to a method for any of frame included in the video; initializing, with random values, n-dimensional appearance descriptor vector corresponding to each point of the point cloud; generating, by rasterization block, a 16-channeled image tensor with the use of the 3D coordinates of each point and a neural descriptor of each point, and a binary black-white mask corresponding to the pixels of an image covered by the points; processing, by a rendering network, the 16-channeled image tensor along with the binary black-white mask for obtaining outfit red-green-blue (RGB) color image and an outfit mask; optimizing rendering network weights and appearance descriptors values based on a ground truth video-sequence of a person to obtain the desired outfit appearance; imaging to the user, by a screen, video of the first person in the outfit of second person, by the predicted rendered outfitting image given body pose and body shape, wherein the user is inputs videos of a person and views the learned colored outfitting model retargeted to new body shapes and new body poses, rendered on top of a new video to dress the person from a video in an outfit chosen by the user.

5. The method of claim 4, further comprising imaging, to the user, colored outfitting model over the user, the user being a real person.

6. The method of claim 4, wherein the method of obtaining a predicted point cloud of outfit and an outfit code vector for any of frame included in the video comprises: obtaining, by detecting device, an image of a material person in outfit; predicting Skinned Multi-Person Linear (SMPL) mesh in the desired pose and body shape from the image by the SMPLify method; generating an initial point cloud as a vertice of the SMPL mesh for the image; predicting a binary outfit mask corresponding the pixels of the outfit in the image by a segmentation network; initializing, with random values, d-dimensional outfit code vector for outfit style encoding for the image; inputting the initial point cloud and the outfit code vector into a pre-trained draping network, obtaining a predicted point cloud of the outfit from the pre-trained draping network output; projecting the outfit point cloud to a black-and-white image with camera parameters of the image of the person; comparing, by obtaining a loss function, a projection of the predicted point cloud on the image with a ground truth binary outfit mask corresponding the pixels of the outfit in the image via a chamfer distance between a two-dimensional (2D) point clouds, which are projections of three-dimensional (3D) point clouds; optimizing the outfit code vector based on the obtained loss function; and inputting, by the obtained outfit code vector, predicted point cloud of the outfit of the image on any body shape and any body pose of the user.

7. A system for modeling outfit on a person, comprising: a detecting device connected to a computer system comprising a processor configured to be implemented as an operating unit connected to a display screen and a selection interface, wherein the detecting device is configured to obtain a color video stream of first person in real time, wherein the selection interface being configured to receive an input by a user choosing an outfit based on a video of a second person in the outfit, wherein the display screen is configured to display the first person in real time in the outfit selected by the user from the video based on data received from the operation unit, and wherein a draping network for modeling the outfit on the person based on a body pose and a body shape of the person is trained by: providing a set of frames of persons, each person wearing an outfit and the frames being video sequences in which each person makes movements; obtaining for each frame among the frames Skinned Multi-Person Linear (SMPL) meshes corresponding to a pose and a shape of a body of a person included in the frame; obtaining, for each frame among the frames, outfit mesh corresponding to the pose and the shape of the body of the person included in the frame; generating initial point clouds as a set of vertices of the SMPL meshes for each frame; inputting the initial point cloud in a Cloud Transformer neural network of the draping network and inputting the outfit code vectors in a Multi-Layer Perceptron (MLP) neural network encoder; and processing the outfit code vectors with an MLP encoder neural network and passing the output of the MLP encoder neural network to the Cloud Transformer neural network, to deform the initial point cloud providing the output of the MLP encoder neural network and output the predicted point cloud of the outfit for each frame.

8. The system of claim 7, wherein the user is the first person.
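
Claim 1 recites a randomly initialized d-dimensional outfit code vector per person that is passed through an MLP encoder before it conditions the Cloud Transformer. The following is a minimal sketch of that first stage only, not the patented implementation. The claims do not state the code dimension, layer sizes, activation, or initialization scheme, so the values used here (Gaussian codes, a ReLU hidden layer, uniform weight initialization) and all function names are assumptions made for illustration. The Cloud Transformer deformation step is not sketched.

```python
import math
import random


def init_code_vectors(person_ids, d, seed=0):
    """Create one randomly initialized d-dimensional code vector per person.

    Gaussian initialization is an assumption; the claim only says 'randomly initialized'.
    """
    rng = random.Random(seed)
    return {pid: [rng.gauss(0.0, 1.0) for _ in range(d)] for pid in person_ids}


def init_mlp(sizes, seed=0):
    """Build a list of (weights, bias) layers for the given layer sizes (hypothetical layout)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def mlp_forward(layers, x):
    """Run the encoder: affine layers with ReLU on every layer except the last."""
    for i, (weights, bias) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


# Usage: two persons, 8-dimensional codes, encoder mapping 8 -> 16 -> 4.
codes = init_code_vectors(["person_a", "person_b"], d=8)
encoder = init_mlp([8, 16, 4])
style_embedding = mlp_forward(encoder, codes["person_a"])
```

In training as claimed, both the code vectors and the encoder weights would be updated jointly with the Cloud Transformer weights; this sketch shows only the forward pass.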
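
Claims 2 and 6 compare the projection of the predicted outfit point cloud with a binary outfit mask via a chamfer distance between 2D point clouds. The sketch below illustrates one common way to compute such a loss. It assumes a simple pinhole camera with a single focal length and a principal point, and a symmetric mean nearest-neighbour formulation; the claims specify neither, and the function names are hypothetical.

```python
import math


def project_points(points_3d, focal, cx, cy):
    """Pinhole projection of camera-space 3D points to 2D pixel coordinates."""
    projected = []
    for x, y, z in points_3d:
        if z <= 0:
            continue  # skip points at or behind the camera plane
        projected.append((focal * x / z + cx, focal * y / z + cy))
    return projected


def mask_to_points(mask):
    """Collect (column, row) coordinates of the nonzero pixels of a binary mask."""
    return [
        (float(col), float(row))
        for row, line in enumerate(mask)
        for col, value in enumerate(line)
        if value
    ]


def chamfer_2d(a, b):
    """Symmetric chamfer distance: mean nearest-neighbour distance in both directions."""
    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


# Usage: a 2x2 mask whose right column is outfit, and two points that project onto it.
mask = [[0, 1], [0, 1]]
cloud = [(0.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
loss = chamfer_2d(project_points(cloud, focal=1.0, cx=1.0, cy=0.0), mask_to_points(mask))
```

This brute-force version is quadratic in the number of points and is not differentiable as written. Optimizing the outfit code vector by gradient descent, as the claims describe, would need the same computation expressed in an automatic differentiation framework.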
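
Claim 4 recites a rasterization block that turns the points and their neural descriptors into a 16-channel image tensor plus a binary mask of covered pixels. A minimal sketch of point rasterization with a depth buffer follows. It assumes each descriptor has 16 entries, that points are already projected to pixel coordinates, and that each point covers exactly one pixel; none of these details are given in the claims, and the function name is hypothetical.

```python
def rasterize(points_2d, depths, descriptors, height, width):
    """Splat per-point descriptors into an H x W x C image, keeping the nearest point per pixel.

    Returns (image, mask), where mask[row][col] is 1 for pixels covered by any point.
    """
    channels = len(descriptors[0])
    image = [[[0.0] * channels for _ in range(width)] for _ in range(height)]
    mask = [[0] * width for _ in range(height)]
    zbuf = [[float("inf")] * width for _ in range(height)]
    for (u, v), z, desc in zip(points_2d, depths, descriptors):
        col, row = int(round(u)), int(round(v))
        # Keep the point only if it lands inside the image and is closer than what is stored.
        if 0 <= row < height and 0 <= col < width and z < zbuf[row][col]:
            zbuf[row][col] = z
            image[row][col] = list(desc)
            mask[row][col] = 1
    return image, mask


# Usage: two points compete for pixel (row 0, col 1); a third falls outside the image.
points = [(1.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
depths = [2.0, 1.0, 1.0]
descs = [[0.1] * 16, [0.9] * 16, [0.5] * 16]
image, mask = rasterize(points, depths, descs, height=2, width=3)
```

The resulting tensor and mask correspond to the inputs that the claim passes to the rendering network, which then produces the RGB outfit image and outfit mask.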

Patent Metadata

Filing Date

Unknown

Publication Date

July 15, 2025

Inventors

Artur Andreevich GRIGOREV

Victor Sergeevich LEMPITSKY

Ilya Dmitrievich ZAKHARKIN

Kirill Yevgenevich MAZUR
