Method, Apparatus, and Storage Medium for Recognizing Image Object

Published: January 18, 2022

Assignee: not available in the USPTO data

Patent Claims

17 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method for recognizing a target object in a target image, the method comprising: obtaining, by a device comprising a memory storing instructions and a processor in communication with the memory, an image recognition instruction, the image recognition instruction carrying object identification information used for indicating a target object in a target image; obtaining, by the device, an instruction feature vector matching the image recognition instruction; obtaining, by the device, an image feature vector set matching the target image, the image feature vector set comprising an i-th image feature vector for indicating an image feature of the target image in an i-th scale, and i being a positive integer; and recognizing, by the device, the target object from the target image according to the instruction feature vector and the image feature vector set by: obtaining, by the device, a target image feature vector in the image feature vector set, the target image feature vector indicating an image feature vector of the target image obtained through a first neural network model, and the first neural network model being obtained through training with a plurality of first sample images, obtaining, by the device, a change-image feature vector in the image feature vector set, the change-image feature vector comprising T variable image feature vectors of the target image obtained through a second neural network model, the second neural network model being obtained through training with a plurality of second sample images, and T being a positive integer, determining, by the device according to the instruction feature vector and the target image feature vector, an object feature vector matching the object identification information, and recognizing, by the device, the target object from the target image according to the object feature vector and the change-image feature vector.

2. The method according to claim 1, wherein the recognizing the target object from the target image according to the object feature vector and the change-image feature vector comprises: obtaining, by the device, a vector parameter of the object feature vector; scaling, by the device, each variable image feature vector in the change-image feature vector according to the vector parameter, and performing a conversion operation on the scaled change-image feature vector, to obtain T intermediate image feature vectors; inputting, by the device, the object feature vector as an initial feature vector into a third neural network model, the third neural network model being obtained through training with a plurality of third sample images; inputting, by the device, the T intermediate image feature vectors sequentially into the third neural network model; determining, by the device according to an output result from the third neural network model, a target region that matches the object identification information and is in the target image; and recognizing, by the device, the target object from the target region.
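
The data flow recited in claim 2 can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration and is not taken from the claims: the "vector parameter" is taken to be the L2 norm of the object feature vector, the "conversion operation" is an element-wise tanh, and the third neural network model is replaced by a trivial recurrent update. The function names `scale_and_convert` and `run_sequential` are hypothetical.

```python
import math
from typing import List


def l2_norm(v: List[float]) -> float:
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def scale_and_convert(change_vectors: List[List[float]],
                      object_vec: List[float]) -> List[List[float]]:
    """Rescale each of the T variable vectors, then apply a conversion.

    Assumption: the vector parameter is the L2 norm of the object
    feature vector, and the conversion operation is tanh.
    """
    target_norm = l2_norm(object_vec)
    intermediates = []
    for v in change_vectors:
        norm = l2_norm(v) or 1.0  # avoid dividing by zero
        intermediates.append([math.tanh(x * target_norm / norm) for x in v])
    return intermediates


def run_sequential(object_vec: List[float],
                   intermediates: List[List[float]]) -> List[float]:
    """Toy stand-in for the third model: a simple recurrent update.

    The object feature vector is the initial state; the T intermediate
    vectors are consumed one at a time, in order.
    """
    state = list(object_vec)
    for x in intermediates:
        state = [math.tanh(0.5 * s + 0.5 * xi) for s, xi in zip(state, x)]
    return state


object_vec = [3.0, 4.0]                               # L2 norm is 5.0
change_vectors = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # T = 3
intermediates = scale_and_convert(change_vectors, object_vec)
output = run_sequential(object_vec, intermediates)
```

A trained recurrent model would replace `run_sequential`; the sketch only shows the order of operations: scale, convert, initialize with the object feature vector, then feed the T vectors sequentially.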

3. The method according to claim 2, wherein the determining, by the device according to the output result from the third neural network model, the target region comprises: obtaining, by the device according to the output result, a probability vector matching the target image, the probability vector comprising a j-th probability element indicating a probability that a j-th pixel location in the target image is in the target region, and j being a positive integer; obtaining, by the device, at least one target probability element indicating a probability greater than a threshold from the probability vector; and determining, by the device, the target region according to at least one pixel location indicated by the at least one target probability element in the target image.
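
The thresholding step in claim 3 can be sketched as follows. This is a minimal illustration, assuming the probability vector is the image flattened row by row and a threshold of 0.5; neither detail is specified in the claim. The function name `target_region` is hypothetical, and indices are 0-based here, whereas the claim counts j from 1.

```python
from typing import List, Tuple


def target_region(prob: List[float], width: int,
                  threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return (row, col) pixel locations whose probability exceeds the threshold.

    Assumption: prob[j] belongs to the pixel at row j // width and
    column j % width (row-major flattening).
    """
    region = []
    for j, p in enumerate(prob):
        if p > threshold:
            region.append(divmod(j, width))
    return region


# A 2 x 3 image flattened into six probability elements.
probs = [0.1, 0.9, 0.8, 0.2, 0.7, 0.3]
region = target_region(probs, width=3)
# region == [(0, 1), (0, 2), (1, 1)]
```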

4. The method according to claim 2, wherein before the inputting the object feature vector as the initial feature vector into the third neural network model, the method further comprises: training, by the device, the third neural network model according to the plurality of third sample images by: obtaining a target value matching a training input value of the third neural network model and a training output value outputted by the third neural network model, and adjusting the third neural network model according to the target value and the training output value by using a loss function.
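
The training loop in claim 4 (compare the training output value with the target value through a loss function, then adjust the model) can be illustrated on a deliberately tiny model. The sketch below assumes a single logistic unit, a binary cross-entropy loss, and plain gradient descent; the claim names no particular loss function or optimizer, and the real third model would be far larger.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def bce(target: float, output: float) -> float:
    """Binary cross-entropy between a target value and a model output."""
    return -(target * math.log(output) + (1.0 - target) * math.log(1.0 - output))


def mean_loss(samples: List[Tuple[float, float]], w: float, b: float) -> float:
    return sum(bce(t, sigmoid(w * x + b)) for x, t in samples) / len(samples)


def train(samples: List[Tuple[float, float]],
          lr: float = 0.5, epochs: int = 200) -> Tuple[float, float]:
    """Adjust the parameters (w, b) from the target and output values."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, target in samples:
            output = sigmoid(w * x + b)   # training output value
            grad = output - target        # d(loss)/dz for sigmoid + BCE
            w -= lr * grad * x
            b -= lr * grad
    return w, b


# (training input value, target value) pairs; toy data, not from the patent.
samples = [(-2.0, 0.0), (-1.0, 0.0), (1.0, 1.0), (2.0, 1.0)]
loss_before = mean_loss(samples, 0.0, 0.0)
w, b = train(samples)
loss_after = mean_loss(samples, w, b)
```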

5. The method according to claim 1, wherein the determining, by the device according to the instruction feature vector and the target image feature vector, the object feature vector matching the object identification information comprises: obtaining, by the device, a coordinate vector matching the target image; splicing, by the device, the instruction feature vector, the target image feature vector, and the coordinate vector, to obtain a spliced feature vector; and inputting, by the device, the spliced feature vector into a fourth neural network model, to obtain the object feature vector, the fourth neural network model being obtained through training with a plurality of sample objects.
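
The splicing step in claim 5 amounts to concatenation. The sketch below assumes that the image features form a grid with one feature vector per location, and that the coordinate vector is a normalized (x, y) pair in [-1, 1]. Both are illustrative assumptions, since the claim does not fix the layout or the coordinate encoding. The fourth neural network model is not modeled, and `coordinate_vector` and `splice` are hypothetical names.

```python
from typing import List


def coordinate_vector(row: int, col: int, height: int, width: int) -> List[float]:
    """Hypothetical encoding: (x, y) normalized to the range [-1, 1]."""
    x = 2.0 * col / (width - 1) - 1.0 if width > 1 else 0.0
    y = 2.0 * row / (height - 1) - 1.0 if height > 1 else 0.0
    return [x, y]


def splice(instruction_vec: List[float],
           image_features: List[List[List[float]]]) -> List[List[List[float]]]:
    """Concatenate instruction, image, and coordinate vectors per location.

    image_features[row][col] is the image feature vector at that location.
    """
    height, width = len(image_features), len(image_features[0])
    spliced = []
    for r in range(height):
        spliced_row = []
        for c in range(width):
            spliced_row.append(
                instruction_vec
                + image_features[r][c]
                + coordinate_vector(r, c, height, width)
            )
        spliced.append(spliced_row)
    return spliced


instruction_vec = [0.5, -0.5]
image_features = [[[1.0], [2.0]],
                  [[3.0], [4.0]]]  # a 2 x 2 grid of 1-dimensional features
spliced = splice(instruction_vec, image_features)
# spliced[0][0] == [0.5, -0.5, 1.0, -1.0, -1.0]
```

Appending coordinates gives the downstream model access to position, which matters when the instruction refers to an object by location.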

6. The method according to claim 1, wherein after the recognizing, by the device, the target object from the target image, the method further comprises: performing, by the device, an image processing operation on the target object, the image processing operation comprising at least one of the following operations: a cropping operation on the target object, or an editing operation on the target object.
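
One possible form of the cropping operation in claim 6 is cutting out the bounding box of the recognized region. The sketch assumes the image is a list of pixel rows and the region is a list of (row, col) locations; the claim does not say how cropping is performed, and `bounding_box` and `crop` are hypothetical names.

```python
from typing import List, Tuple


def bounding_box(region: List[Tuple[int, int]]) -> Tuple[int, int, int, int]:
    """Smallest (top, left, bottom, right) box containing every location."""
    rows = [r for r, _ in region]
    cols = [c for _, c in region]
    return min(rows), min(cols), max(rows), max(cols)


def crop(image: List[List[int]], region: List[Tuple[int, int]]) -> List[List[int]]:
    """Cut the bounding box of the region out of the image."""
    top, left, bottom, right = bounding_box(region)
    return [row[left:right + 1] for row in image[top:bottom + 1]]


image = [[0, 1, 2, 3],
         [4, 5, 6, 7],
         [8, 9, 10, 11]]
region = [(0, 1), (0, 2), (1, 1)]
cropped = crop(image, region)
# cropped == [[1, 2], [5, 6]]
```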

7. An apparatus for recognizing a target object in a target image, the apparatus comprising: a memory storing instructions; and a processor in communication with the memory, wherein, when the processor executes the instructions, the processor is configured to cause the apparatus to: obtain an image recognition instruction, the image recognition instruction carrying object identification information used for indicating a target object in a target image, obtain an instruction feature vector matching the image recognition instruction, obtain an image feature vector set matching the target image, the image feature vector set comprising an i-th image feature vector for indicating an image feature of the target image in an i-th scale, and i being a positive integer, and recognize the target object from the target image according to the instruction feature vector and the image feature vector set by: obtaining a target image feature vector in the image feature vector set, the target image feature vector indicating an image feature vector of the target image obtained through a first neural network model, and the first neural network model being obtained through training with a plurality of first sample images, obtaining a change-image feature vector in the image feature vector set, the change-image feature vector comprising T variable image feature vectors of the target image obtained through a second neural network model, the second neural network model being obtained through training with a plurality of second sample images, and T being a positive integer, determining, according to the instruction feature vector and the target image feature vector, an object feature vector matching the object identification information, and recognizing the target object from the target image according to the object feature vector and the change-image feature vector.

8. The apparatus according to claim 7, wherein, when the processor is configured to cause the apparatus to recognize the target object from the target image according to the object feature vector and the change-image feature vector, the processor is configured to cause the apparatus to: obtain a vector parameter of the object feature vector; scale each variable image feature vector in the change-image feature vector according to the vector parameter, and perform a conversion operation on the scaled change-image feature vector, to obtain T intermediate image feature vectors; input the object feature vector as an initial feature vector into a third neural network model, the third neural network model being obtained through training with a plurality of third sample images; input the T intermediate image feature vectors sequentially into the third neural network model; determine, according to an output result from the third neural network model, a target region that matches the object identification information and is in the target image; and recognize the target object from the target region.

9. The apparatus according to claim 8, wherein, when the processor is configured to cause the apparatus to determine, according to the output result from the third neural network model, the target region, the processor is configured to cause the apparatus to: obtain, according to the output result, a probability vector matching the target image, the probability vector comprising a j-th probability element indicating a probability that a j-th pixel location in the target image is in the target region, and j being a positive integer; obtain at least one target probability element indicating a probability greater than a threshold from the probability vector; and determine the target region according to at least one pixel location indicated by the at least one target probability element in the target image.

10. The apparatus according to claim 8, wherein, when the processor is configured to cause the apparatus to input the object feature vector as the initial feature vector into the third neural network model, the processor is configured to cause the apparatus to: train the third neural network model according to the plurality of third sample images by: obtaining a target value matching a training input value of the third neural network model and a training output value outputted by the third neural network model, and adjusting the third neural network model according to the target value and the training output value by using a loss function.

11. The apparatus according to claim 7, wherein, when the processor is configured to cause the apparatus to determine, according to the instruction feature vector and the target image feature vector, the object feature vector matching the object identification information, the processor is configured to cause the apparatus to: obtain a coordinate vector matching the target image; splice the instruction feature vector, the target image feature vector, and the coordinate vector, to obtain a spliced feature vector; and input the spliced feature vector into a fourth neural network model, to obtain the object feature vector, the fourth neural network model being obtained through training with a plurality of sample objects.

12. The apparatus according to claim 7, wherein, after the processor is configured to cause the apparatus to recognize the target object from the target image, the processor is configured to cause the apparatus to: perform an image processing operation on the target object, the image processing operation comprising at least one of the following operations: a cropping operation on the target object, or an editing operation on the target object.

13. A non-transitory computer readable storage medium storing computer readable instructions, the computer readable instructions, when executed by a processor, causing the processor to perform: obtaining an image recognition instruction, the image recognition instruction carrying object identification information used for indicating a target object in a target image; obtaining an instruction feature vector matching the image recognition instruction; obtaining an image feature vector set matching the target image, the image feature vector set comprising an i-th image feature vector for indicating an image feature of the target image in an i-th scale, and i being a positive integer; and recognizing the target object from the target image according to the instruction feature vector and the image feature vector set by: obtaining a target image feature vector in the image feature vector set, the target image feature vector indicating an image feature vector of the target image obtained through a first neural network model, and the first neural network model being obtained through training with a plurality of first sample images, obtaining a change-image feature vector in the image feature vector set, the change-image feature vector comprising T variable image feature vectors of the target image obtained through a second neural network model, the second neural network model being obtained through training with a plurality of second sample images, and T being a positive integer, determining, according to the instruction feature vector and the target image feature vector, an object feature vector matching the object identification information, and recognizing the target object from the target image according to the object feature vector and the change-image feature vector.

14. The non-transitory computer readable storage medium according to claim 13, wherein, when the computer readable instructions cause the processor to perform recognizing the target object from the target image according to the object feature vector and the change-image feature vector, the computer readable instructions cause the processor to perform: obtaining a vector parameter of the object feature vector; scaling each variable image feature vector in the change-image feature vector according to the vector parameter, and performing a conversion operation on the scaled change-image feature vector, to obtain T intermediate image feature vectors; inputting the object feature vector as an initial feature vector into a third neural network model, the third neural network model being obtained through training with a plurality of third sample images; inputting the T intermediate image feature vectors sequentially into the third neural network model; determining, according to an output result from the third neural network model, a target region that matches the object identification information and is in the target image; and recognizing the target object from the target region.

15. The non-transitory computer readable storage medium according to claim 14, wherein, when the computer readable instructions cause the processor to perform determining, according to the output result from the third neural network model, the target region, the computer readable instructions cause the processor to perform: obtaining, according to the output result, a probability vector matching the target image, the probability vector comprising a j-th probability element indicating a probability that a j-th pixel location in the target image is in the target region, and j being a positive integer; obtaining at least one target probability element indicating a probability greater than a threshold from the probability vector; and determining the target region according to at least one pixel location indicated by the at least one target probability element in the target image.

16. The non-transitory computer readable storage medium according to claim 14, wherein, before the computer readable instructions cause the processor to perform inputting the object feature vector as the initial feature vector into the third neural network model, the computer readable instructions further cause the processor to perform: training the third neural network model according to the plurality of third sample images by: obtaining a target value matching a training input value of the third neural network model and a training output value outputted by the third neural network model, and adjusting the third neural network model according to the target value and the training output value by using a loss function.

17. The non-transitory computer readable storage medium according to claim 13, wherein, when the computer readable instructions cause the processor to perform determining, according to the instruction feature vector and the target image feature vector, the object feature vector matching the object identification information, the computer readable instructions cause the processor to perform: obtaining a coordinate vector matching the target image; splicing the instruction feature vector, the target image feature vector, and the coordinate vector, to obtain a spliced feature vector; and inputting the spliced feature vector into a fourth neural network model, to obtain the object feature vector, the fourth neural network model being obtained through training with a plurality of sample objects.

Patent Metadata

Filing Date

Unknown

Publication Date

January 18, 2022

Inventors

Ruiyu LI
