In one embodiment, an apparatus comprises a memory and a processor. The memory stores visual data captured by one or more sensors. The processor detects one or more first objects in the visual data based on a machine learning model and one or more first reference templates. The processor further determines, based on an object ontology, that the visual data is expected to contain a second object, wherein the object ontology indicates that the second object is related to the one or more first objects. The processor further detects the second object in the visual data based on the machine learning model and a second reference template. The processor further determines, based on an inference rule, that the visual data is expected to contain a third object. The processor further detects the third object in the visual data based on the machine learning model and a third reference template.
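The three-stage flow described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and none of it comes from the patent: the object labels, the `ONTOLOGY_PARENT` and `RULES` tables, and the `detect` function, which is only a stand-in for the machine learning model and reference template matching (here the "visual data" is simulated as a set of labels that are present in the scene).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InferenceRule:
    scene: str           # hypothetical name of the visual scene
    required: frozenset  # objects that must already be detected
    implied: str         # object the rule says to look for next


# Hypothetical ontology: child label -> parent label.
ONTOLOGY_PARENT = {"wheel": "car", "headlight": "car"}

# Hypothetical inference rules.
RULES = [
    InferenceRule("traffic stop", frozenset({"wheel", "car"}), "police officer"),
]


def detect(visual_data, label, templates):
    """Stand-in for the ML model plus reference template matching.

    Succeeds only if a template for the label is available and the
    simulated visual data actually contains that label.
    """
    return label in templates and label in visual_data


def related_objects(detected):
    """Ontology step: objects related to what has been detected so far."""
    return {ONTOLOGY_PARENT[o] for o in detected if o in ONTOLOGY_PARENT}


def implied_objects(detected):
    """Inference-rule step: objects implied by rules whose premises hold."""
    return {r.implied for r in RULES if r.required <= detected}


def recognize(visual_data, first_labels, templates):
    # Stage 1: detect the first objects with the first reference templates.
    detected = {l for l in first_labels if detect(visual_data, l, templates)}
    # Stage 2: the ontology suggests a second object; try to detect it.
    for second in related_objects(detected):
        if detect(visual_data, second, templates):
            detected.add(second)
    # Stage 3: an inference rule suggests a third object; try to detect it.
    for third in implied_objects(detected):
        if detect(visual_data, third, templates):
            detected.add(third)
    return detected
```

In this sketch, each later stage only searches for objects that an earlier stage made plausible, which is the narrowing effect the abstract describes; a real system would replace `detect` with actual model inference.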
Legal claims defining the scope of protection, as filed with the USPTO.
1. An apparatus, comprising: a memory to store visual data captured by one or more sensors; and a processor to: detect one or more first objects in the visual data based on a machine learning model and one or more first reference templates, wherein the one or more first reference templates are for object recognition of the one or more first objects; determine, based on an object ontology, that the visual data is expected to contain a second object, wherein the object ontology indicates that the second object is related to the one or more first objects; detect the second object in the visual data based on the machine learning model and a second reference template, wherein the second reference template is for object recognition of the second object; determine, based on an inference rule, that the visual data is expected to contain a third object; and detect the third object in the visual data based on the machine learning model and a third reference template, wherein the third reference template is for object recognition of the third object.
2. The apparatus of claim 1, further comprising the one or more sensors, wherein the one or more sensors comprise a camera.
3. The apparatus of claim 1, further comprising a communication interface to: obtain, via a network, the second reference template from a reference template repository, wherein the second reference template is obtained based at least in part on determining that the visual data is expected to contain the second object; and obtain, via the network, the third reference template from the reference template repository, wherein the third reference template is obtained based at least in part on determining that the visual data is expected to contain the third object.
4. The apparatus of claim 1, wherein the object ontology comprises a representation of a hierarchy of objects at a plurality of levels of abstraction.
5. The apparatus of claim 4, wherein the processor to determine, based on the object ontology, that the visual data is expected to contain the second object is further to: determine, based on the hierarchy of objects, that the second object is a parent of the one or more first objects.
6. The apparatus of claim 5, wherein: the one or more first objects comprise a plurality of first objects; and the hierarchy of objects indicates that the second object is a common parent of the plurality of first objects.
7. The apparatus of claim 4, wherein the processor to determine, based on the object ontology, that the visual data is expected to contain the second object is further to: determine, based on the hierarchy of objects, that the second object is a child of the one or more first objects.
8. The apparatus of claim 1, wherein the inference rule comprises a plurality of conditions associated with recognizing a particular visual scene.
9. The apparatus of claim 8, wherein the plurality of conditions indicate that the particular visual scene is associated with the one or more first objects, the second object, and the third object.
10. The apparatus of claim 9, wherein the processor is further to: identify the inference rule from a plurality of inference rules, wherein the inference rule is identified based on a determination that the visual data and the particular visual scene associated with the inference rule each comprise the one or more first objects and the second object.
11. The apparatus of claim 9, wherein the processor is further to: determine that the visual data comprises the particular visual scene associated with the inference rule, wherein the visual data satisfies the plurality of conditions associated with recognizing the particular visual scene.
12. A system, comprising: a camera to capture visual data representing an environment; a memory to store one or more first reference templates associated with object recognition of one or more first objects; a communication interface to receive, over a network from a reference template repository, a second reference template associated with object recognition of a second object and a third reference template associated with object recognition of a third object; and one or more processing devices to: detect the one or more first objects in the visual data based on a machine learning model and the one or more first reference templates; determine, based on an object ontology, that the visual data is expected to contain the second object, wherein the object ontology indicates that the second object is related to the one or more first objects; detect the second object in the visual data based on the machine learning model and the second reference template; determine, based on an inference rule, that the visual data is expected to contain the third object; and detect the third object in the visual data based on the machine learning model and the third reference template.
13. The system of claim 12, wherein the one or more processing devices comprise: an object recognition processor to: detect the one or more first objects in the visual data based on the machine learning model and the one or more first reference templates; detect the second object in the visual data based on the machine learning model and the second reference template; and detect the third object in the visual data based on the machine learning model and the third reference template; a semantic processor to determine, based on the object ontology, that the visual data is expected to contain the second object; and an inference processor to determine, based on the inference rule, that the visual data is expected to contain the third object.
14. The system of claim 12, further comprising: a cache to store a plurality of reference templates, wherein the plurality of reference templates comprises: the one or more first reference templates; the second reference template; or the third reference template; and a cache warmer to: determine that the plurality of reference templates may be needed for object recognition; retrieve the plurality of reference templates from the reference template repository; and store the plurality of reference templates in the cache.
15. The system of claim 12, wherein the object ontology comprises a representation of a hierarchy of objects at a plurality of levels of abstraction.
16. The system of claim 15, wherein the one or more processing devices to determine, based on the object ontology, that the visual data is expected to contain the second object are further to: determine, based on the hierarchy of objects, that the second object is a parent of the one or more first objects; or determine, based on the hierarchy of objects, that the second object is a child of the one or more first objects.
17. The system of claim 12, wherein the inference rule comprises a plurality of conditions associated with recognizing a particular visual scene, wherein the plurality of conditions indicate that the particular visual scene comprises the one or more first objects, the second object, and the third object.
18. The system of claim 17, wherein the one or more processing devices are further to: determine that the visual data comprises the particular visual scene associated with the inference rule, wherein the visual data satisfies the plurality of conditions associated with recognizing the particular visual scene.
19. At least one machine accessible storage medium having instructions stored thereon, wherein the instructions, when executed on a machine, cause the machine to: obtain visual data captured by one or more sensors; detect one or more first objects in the visual data based on a machine learning model and one or more first reference templates, wherein the one or more first reference templates are for object recognition of the one or more first objects; determine, based on an object ontology, that the visual data is expected to contain a second object, wherein the object ontology indicates that the second object is related to the one or more first objects; detect the second object in the visual data based on the machine learning model and a second reference template, wherein the second reference template is for object recognition of the second object; determine, based on an inference rule, that the visual data is expected to contain a third object; and detect the third object in the visual data based on the machine learning model and a third reference template, wherein the third reference template is for object recognition of the third object.
20. The storage medium of claim 19, wherein: the object ontology comprises a representation of a hierarchy of objects at a plurality of levels of abstraction; and the instructions that cause the machine to determine, based on the object ontology, that the visual data is expected to contain the second object further cause the machine to: determine, based on the hierarchy of objects, that the second object is a parent of the one or more first objects; or determine, based on the hierarchy of objects, that the second object is a child of the one or more first objects.
21. The storage medium of claim 19, wherein the inference rule comprises a plurality of conditions associated with recognizing a particular visual scene, wherein the plurality of conditions indicate that the particular visual scene comprises the one or more first objects, the second object, and the third object.
22. The storage medium of claim 21, wherein the instructions further cause the machine to: determine that the visual data comprises the particular visual scene associated with the inference rule, wherein the visual data satisfies the plurality of conditions associated with recognizing the particular visual scene.
23. A method, comprising: obtaining visual data captured by one or more sensors; detecting one or more first objects in the visual data based on a machine learning model and one or more first reference templates, wherein the one or more first reference templates are for object recognition of the one or more first objects; determining, based on an object ontology, that the visual data is expected to contain a second object, wherein the object ontology indicates that the second object is related to the one or more first objects; detecting the second object in the visual data based on the machine learning model and a second reference template, wherein the second reference template is for object recognition of the second object; determining, based on an inference rule, that the visual data is expected to contain a third object; and detecting the third object in the visual data based on the machine learning model and a third reference template, wherein the third reference template is for object recognition of the third object.
24. The method of claim 23, wherein: the object ontology comprises a representation of a hierarchy of objects at a plurality of levels of abstraction; and determining, based on the object ontology, that the visual data is expected to contain the second object comprises: determining, based on the hierarchy of objects, that the second object is a parent of the one or more first objects; or determining, based on the hierarchy of objects, that the second object is a child of the one or more first objects.
25. The method of claim 23, wherein: the inference rule comprises a plurality of conditions associated with recognizing a particular visual scene, wherein the plurality of conditions indicate that the particular visual scene comprises the one or more first objects, the second object, and the third object; and the method further comprises determining that the visual data comprises the particular visual scene associated with the inference rule, wherein the visual data satisfies the plurality of conditions associated with recognizing the particular visual scene.
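Several of the dependent claims above turn on two data structures: a hierarchy of objects (parent, common parent, and child relationships) and an inference rule expressed as a set of conditions tied to a visual scene. The following is a minimal sketch of how such lookups might work. It is not drawn from the patent: the `PARENT` and `RULES` tables, the object and scene names, and all function names are hypothetical, and each rule's conditions are simplified to "these objects are present".

```python
# Hypothetical hierarchy: each object maps to its parent (None at the root).
PARENT = {
    "vehicle": None,
    "car": "vehicle",
    "wheel": "car",
    "headlight": "car",
    "license plate": "car",
}

# Hypothetical rules: scene name -> objects whose presence forms the conditions.
RULES = {
    "parked car": {"wheel", "car", "parking meter"},
    "kitchen": {"stove", "sink", "refrigerator"},
}


def ancestors(obj):
    """Return the chain of parents of obj, nearest first."""
    chain = []
    node = PARENT.get(obj)
    while node is not None:
        chain.append(node)
        node = PARENT.get(node)
    return chain


def common_parent(objects):
    """Nearest object that is an ancestor of every given object, or None."""
    objects = list(objects)
    if not objects:
        return None
    shared = set(ancestors(objects[0]))
    for obj in objects[1:]:
        shared &= set(ancestors(obj))
    # Walk upward from the first object so the nearest shared ancestor wins.
    for candidate in ancestors(objects[0]):
        if candidate in shared:
            return candidate
    return None


def children(obj):
    """Objects whose direct parent is obj, in sorted order."""
    return sorted(c for c, p in PARENT.items() if p == obj)


def identify_rule(detected):
    """First scene whose rule covers every object detected so far, or None."""
    for scene, objs in RULES.items():
        if detected and detected <= objs:
            return scene
    return None


def expected_objects(scene, detected):
    """Objects the rule still expects but that are not yet detected."""
    return RULES[scene] - detected


def scene_recognized(scene, detected):
    """True once every condition (object) of the rule is satisfied."""
    return RULES[scene] <= detected
```

With these hypothetical tables, detecting a wheel and a headlight points to "car" as a common parent; once "car" is also detected, the "parked car" rule is identified and "parking meter" becomes the next object to look for. The scene counts as recognized only when all of the rule's objects have been detected.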
September 25, 2018
July 21, 2020