A computer vision system configured for detection and recognition of objects in video and still imagery in a live or historical setting uses a teacher-student object detector training approach to yield a merged student model capable of detecting all of the classes of objects any of the teacher models is trained to detect. Further, training is simplified by providing an iterative training process wherein a relatively small number of images is labeled manually as initial training data, after which an iterated model cooperates with a machine-assisted labeling process and an active learning process where detector model accuracy improves with each iteration, yielding improved computational efficiency. Further, synthetic data is generated by which an object of interest can be placed in a variety of settings sufficient to permit training of models. A user interface guides the operator in the construction of a custom model capable of detecting a new object.
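The abstract does not say how teacher outputs are combined or how images are chosen for manual labeling. The following is only a minimal sketch of one plausible reading: detections from several teacher models are pooled per image, confident results are accepted as machine-assisted labels, and images with uncertain detections are sent to the operator. The names `Detection`, `merge_teacher_outputs`, and `route_images`, and the thresholds 0.8 and 0.3, are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    label: str    # class name predicted by a teacher model
    score: float  # confidence in [0, 1]
    box: tuple    # (center_x, center_y, width, height)


def merge_teacher_outputs(per_teacher):
    """Pool the detections of every teacher for a single image.

    Assumption: each teacher covers its own classes, so a plain union
    is used here; no duplicate suppression is attempted.
    """
    merged = []
    for detections in per_teacher:
        merged.extend(detections)
    return merged


def route_images(predictions, accept=0.8, reject=0.3):
    """Split images into auto-labeled and operator-review groups.

    predictions maps image id -> list of Detection.
    Detections below `reject` are dropped as noise. An image whose
    remaining detections all reach `accept` is auto-labeled; any image
    with a detection between the two thresholds goes to the operator.
    """
    auto_labeled, needs_review = {}, []
    for image_id, detections in predictions.items():
        kept = [d for d in detections if d.score >= reject]
        if any(d.score < accept for d in kept):
            needs_review.append(image_id)
        else:
            auto_labeled[image_id] = kept
    return auto_labeled, needs_review
```

In this sketch the operator-corrected images would be added to the training data for the next iteration, so the review queue should shrink as the iterated model improves.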
Legal claims defining the scope of protection, as filed with the USPTO.
2. The method of claim 1 wherein new unlabeled data is supplied to the system production model, the iterated model, and the optimizing step and the processing step includes processing the new unlabeled data.
3. The method of claim 1 wherein at least one of the system production model and the iterated model is a single shot multibox detector.
4. The method of claim 1 wherein classification comprises determining the probability distribution of the presence of any of the objects of interest, or the background, at an anchor box.
5. The method of claim 1 wherein classification is modeled as a softmax function to output confidence of a foreground class or a background class.
6. The method of claim 1 wherein regression is modeled as a non-linear multivariate regression function.
7. The method of claim 6 wherein the multivariate regression function outputs a four-dimensional vector representing center coordinates, width and height of the bounding box enclosing the object in the image.
8. The method of claim 1 wherein the second training dataset is only partly labeled.
9. The method of claim 1 wherein the system production model is interoperable with the iterated model.
10. The method of claim 1 wherein at least the second training dataset comprises at least in part synthetic data.
11. The method of claim 1 wherein at least one of the first and second machine learned models is selected from a group comprising a single shot multibox detector and a low shot learning detector.
12. The method of claim 1 wherein the system training output is provided to an operator for correction and the corrected output is processed in a second iteration of the processing step.
13. The method of claim 1 comprising the further step of providing a validation dataset to the system production model and the iterated model.
14. The method of claim 1 in which the iterated model comprises a plurality of iterated models, each comprising a second machine learned model capable, following training, of detecting and classifying at least one newly specified object.
15. The method of claim 1 wherein at least the second training dataset comprises in part video snippets.
16. The method of claim 1 wherein only part of the second training dataset is labeled.
18. The system of claim 17 wherein at least one of the first and second machine learned models is selected from a group comprising a single shot multibox detector and a low shot learning detector.
19. The system of claim 17 in which the second machine learned model comprises a plurality of iterated models, each capable, following training, of detecting and classifying at least one newly specified object.
20. The system of claim 17 wherein new unlabeled data is processed in both the system production model and the iterated model.
21. The system of claim 17 wherein the system training output is provided to an operator for correction and the instructions cause the processor to reiterate execution of the process including the corrected output.
22. The system of claim 17 where at least the second training dataset comprises in part video snippets.
24. The storage media of claim 23 wherein the second training dataset comprises at least in part video snippets.
25. The storage media of claim 23 wherein the second training dataset comprises at least in part synthetic data.
26. The storage media of claim 23 wherein classification is modeled as a softmax function to output confidence of a foreground class or a background class.
27. The storage media of claim 23 wherein regression is modeled as a non-linear multivariate regression function.
28. The storage media of claim 27 wherein the multivariate regression function outputs a four-dimensional vector representing center coordinates, width and height of the bounding box enclosing the object in the image.
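The dependent claims describe classification as a softmax over foreground and background classes at an anchor box, and regression as producing a four-dimensional vector of center coordinates, width, and height. The following is a minimal illustrative sketch of those two outputs, not the claimed implementation. The function names, the example class list, and the conversion to corner coordinates are assumptions added for illustration.

```python
import math


def softmax(logits):
    """Turn raw class scores into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_anchor(logits, class_names):
    """Return the most probable class at one anchor box and its confidence.

    Assumption: class_names lists the background class alongside the
    objects of interest, in the same order as the logits.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]


def center_to_corners(box):
    """Convert (center_x, center_y, width, height) to (x_min, y_min, x_max, y_max)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```

For example, `classify_anchor([0.1, 2.0, -1.0], ["background", "car", "person"])` selects `"car"`, and `center_to_corners((10, 20, 4, 6))` gives the corner form of the same box for drawing or overlap computations.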
October 4, 2022
April 25, 2023