An apparatus is provided for classifying target objects into a known-object group and an unknown-object group. The apparatus includes a speech/image data storage unit configured to store a spoken sound of the name of an object and an image of the object; a unit configured to calculate a speech confidence level of the spoken name of the object with reference to the spoken sound of the name of a known object; a unit configured to calculate an image confidence level of the image of the object with reference to an image of a known object; and a unit configured to compare an evaluation value, obtained by combining the speech confidence level and the image confidence level, with a threshold value, and to classify the target object into an object group determined according to whether the spoken sound of its name and its image are known or unknown.
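The abstract does not say how the two confidence levels are combined. The Python sketch below assumes the ratio definitions given in claim 1, computed in the log domain so that each ratio becomes a difference of log-likelihoods. It also assumes a logistic combination whose weights (`w0`, `w_s`, `w_i`) and threshold are hypothetical placeholders that would normally be fitted on labeled known/unknown examples. It illustrates the idea only and is not the patented implementation.

```python
import math


def log_confidence(loglik_target: float, loglik_best: float) -> float:
    """Log of the confidence ratio likelihood_target / likelihood_best.

    In the log domain the ratio is a difference, which avoids underflow
    for long utterances or high-dimensional image features.
    """
    return loglik_target - loglik_best


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function (no overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def evaluation_value(log_cs: float, log_ci: float,
                     w0: float = 4.0, w_s: float = 1.0, w_i: float = 1.0) -> float:
    """Hypothetical combination: logistic of a weighted sum of log confidences."""
    return _sigmoid(w0 + w_s * log_cs + w_i * log_ci)


def classify_known(log_cs: float, log_ci: float, threshold: float = 0.5,
                   w0: float = 4.0, w_s: float = 1.0, w_i: float = 1.0) -> str:
    """Compare the evaluation value with a (hypothetical) threshold."""
    value = evaluation_value(log_cs, log_ci, w0, w_s, w_i)
    return "known" if value >= threshold else "unknown"


# Usage: speech log-likelihood -10 vs. best phoneme sequence -9,
# image log-likelihood -21 vs. model maximum -20.
log_cs = log_confidence(-10.0, -9.0)
log_ci = log_confidence(-21.0, -20.0)
print(classify_known(log_cs, log_ci))
```

A confidence of 0 in the log domain means the known-object model explains the input as well as the best unconstrained alternative; increasingly negative values indicate a poorer fit.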
Claims
1. An object classification apparatus comprising: a speech/image data storage unit configured to store a spoken sound of the name of an object and an image of the object; a speech confidence level calculation unit configured to calculate a speech confidence level of the spoken sound of the name of the object with reference to a speech model of the name of a known object, the speech confidence level being the ratio of the speech likelihood of the name of the object for the speech model of the name of the known object to the highest speech likelihood among the speech likelihoods calculated over phoneme sequences for the spoken sound of the name of the object; an image confidence level calculation unit configured to calculate an image confidence level of the image of the object with reference to an image model of a known object, the image confidence level being the ratio of the image likelihood of the object for the image model of the known object to the highest image likelihood that the image model of the known object may take; and an object classification unit configured to compare an evaluation value, which is a combination of the speech confidence level and the image confidence level, with a threshold value, and to classify a target object into an object group determined according to whether the spoken sound of the name and the image are known or unknown.
2. The apparatus according to claim 1, wherein the object classification unit is configured to classify objects into a group of objects whose spoken names and images are both known, and a group of objects whose spoken names and images are both unknown.
3. The apparatus according to claim 1, wherein the object classification unit is configured to classify objects into a group of objects whose spoken names and images are both known, and a group of objects for which at least one of the spoken name and the image is unknown.
4. The apparatus according to claim 1, wherein the object classification unit is configured to classify objects into a group of objects whose spoken names and images are both known, a group of objects for which exactly one of the spoken name and the image is unknown, and a group of objects whose spoken names and images are both unknown.
5. An object recognition apparatus comprising: the object classification apparatus of claim 1; and an object recognition unit configured to recognize which known object a target object is, the target object being one classified into the group of objects whose spoken names and images are both known.
6. An object classification method using a classification apparatus that includes a data storage unit configured to store a spoken sound of the name of an object and an image of the object, the method comprising: calculating a speech confidence level of the spoken sound of the name of the object with reference to a speech model of the name of a known object, the speech confidence level being the ratio of the speech likelihood of the name of the object for the speech model of the name of the known object to the highest speech likelihood among the speech likelihoods calculated over phoneme sequences for the spoken sound of the name of the object; calculating an image confidence level of the image of the object with reference to an image model of a known object, the image confidence level being the ratio of the image likelihood of the object for the image model of the known object to the highest image likelihood that the image model of the known object may take; and calculating an evaluation value by combining the speech confidence level and the image confidence level, comparing the evaluation value with a threshold value, and classifying a target object into an object group determined according to whether the spoken sound of the name and the image are known or unknown.
7. An object recognition method using a classification apparatus that includes a data storage unit configured to store a spoken sound of the name of an object and an image of the object, the object recognition method comprising: calculating a speech confidence level of the spoken sound of the name of the object with reference to a speech model of the name of a known object, the speech confidence level being the ratio of the speech likelihood of the name of the object for the speech model of the name of the known object to the highest speech likelihood among the speech likelihoods calculated over phoneme sequences for the spoken sound of the name of the object; calculating an image confidence level of the image of the object with reference to an image model of a known object, the image confidence level being the ratio of the image likelihood of the object for the image model of the known object to the highest image likelihood that the image model of the known object may take; calculating an evaluation value by combining the speech confidence level and the image confidence level, comparing the evaluation value with a threshold value, and classifying a target object into an object group determined according to whether the spoken sound of the name and the image are known or unknown; and recognizing which known object the target object is, the target object being one classified into the group of objects whose spoken names and images are both known.
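Claims 4, 5 and 7 can be illustrated together. The Python sketch below scores the target against every known object using the ratio definitions of claims 1, 6 and 7 in log form, keeps the best-scoring known object, and then splits the result into the three groups of claim 4. Only targets in the "both known" group are passed on to recognition, as in claims 5 and 7. The logistic combination, its weights, the two thresholds `t_low` and `t_high`, and the names `KnownObjectScores` and `classify_and_recognize` are all hypothetical. The claims do not say how the three groups are separated.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class KnownObjectScores:
    speech_loglik: float     # log-likelihood of the utterance under this object's name model
    image_loglik: float      # log-likelihood of the image under this object's image model
    image_loglik_max: float  # highest log-likelihood this image model may take


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def evaluate(s: KnownObjectScores, phoneme_best_loglik: float,
             w0: float = 4.0, w_s: float = 1.0, w_i: float = 1.0) -> float:
    """Hypothetical evaluation value for one known object."""
    log_cs = s.speech_loglik - phoneme_best_loglik  # log speech confidence
    log_ci = s.image_loglik - s.image_loglik_max    # log image confidence
    return _sigmoid(w0 + w_s * log_cs + w_i * log_ci)


def classify_and_recognize(candidates: Dict[str, KnownObjectScores],
                           phoneme_best_loglik: float,
                           t_low: float = 0.2,
                           t_high: float = 0.8) -> Tuple[str, Optional[str]]:
    """Return (group, recognized_name); a name is returned only for 'both known'."""
    if not candidates:
        return "both unknown", None
    values = {name: evaluate(s, phoneme_best_loglik) for name, s in candidates.items()}
    best_name = max(values, key=values.get)
    best_value = values[best_name]
    if best_value < t_low:
        return "both unknown", None
    if best_value < t_high:
        return "one unknown", None
    # Recognition step: only objects in the 'both known' group are identified.
    return "both known", best_name


known = {
    "cup": KnownObjectScores(-100.0, -50.0, -49.0),
    "ball": KnownObjectScores(-120.0, -80.0, -60.0),
}
print(classify_and_recognize(known, phoneme_best_loglik=-100.0))
```

A single combined value cannot show which modality is unknown. Splitting it with two thresholds is therefore only a coarse stand-in for the three-way grouping of claim 4. Classifying on the two confidence levels separately would be one alternative.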
December 21, 2012
October 28, 2014