Building Deep Learning Ensembles with Diverse Targets

Published January 11, 2022

Assignee not available in the USPTO data we have

Patent Claims

30 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A computer-implemented method of training an ensemble machine learning system comprising an ensemble, wherein the ensemble comprises a plurality of ensemble members, the method comprising: training, by a computer system, individually the plurality of ensemble members on a training data set, wherein each of the plurality of ensemble members is trained according to an associated objective for the ensemble member; and after training the plurality of ensemble members, training, by the computer system, a consolidated machine learning system, wherein: the consolidated machine learning system comprises the plurality of ensemble members and a joint optimization machine learning system, such that an output from each of the plurality of ensemble members is input to the joint optimization machine learning system; the joint optimization machine learning system is trained to optimize a shared objective for the ensemble; and each of the plurality of ensemble members is trained according to both the shared objective for the ensemble and the associated objective for the ensemble member, such that, in the training of the consolidated machine learning system, partial derivatives of the shared objective from the joint optimization machine learning system are back-propagated to the outputs of the plurality of ensemble members.

2. The computer-implemented method of claim 1, wherein the associated objective for each of the plurality of ensemble members is unique.

3. The computer-implemented method of claim 1, wherein: each of the plurality of ensemble members comprises an output detector node; and the associated objective comprises a subset of the training data set as a target for each output detector node.

4. The computer-implemented method of claim 1, wherein: each of the plurality of ensemble members comprises an output detector node; the training data set comprises a first subset and a second subset that is disjoint from the first subset; and the associated objective comprises: a first value for the output detector node when a training data item falls within the first subset of the training data set; and a second value for the output detector node when the training data item falls within the second subset of the training data set.
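
As an illustration only, and not part of the claims, the subset-based target of claim 4 could be sketched as below. The function name `detector_target`, the default values 1.0 and 0.0, and the use of item identifiers stored in Python sets are all assumptions made for this example; the claim does not specify them, nor what happens to an item in neither subset.

```python
def detector_target(item_id, first_subset, second_subset,
                    first_value=1.0, second_value=0.0):
    """Return the target for one member's output detector node.

    Hypothetical helper: the first subset maps to first_value and the
    second (disjoint) subset maps to second_value.
    """
    if item_id in first_subset:
        return first_value
    if item_id in second_subset:
        return second_value
    # Assumption: an item in neither subset gives this member no target.
    return None

# Two disjoint subsets of a toy training set, identified by item id.
first = {"a", "b"}
second = {"c", "d"}
assert first.isdisjoint(second)

targets = {item: detector_target(item, first, second) for item in ["a", "c", "e"]}
```

Giving each ensemble member a different pair of subsets is one way the associated objectives could be made unique, as claim 2 requires.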

5. The computer-implemented method of claim 1, wherein training the consolidated machine learning system comprises: computing, by the computer system, feed-forward activations for each of the plurality of ensemble members for a training data item from a training data set; computing, by the computer system, feed-forward activations for the joint optimization machine learning system for the training data item; back propagating, by the computer system, partial derivatives of the shared objective through the joint optimization machine learning system; computing, by the computer system, a weighted sum of the partial derivatives of the shared objective and a derivative of the associated objective for each of the plurality of ensemble members; estimating, by the computer system, an update term for each of the plurality of ensemble members according to the weighted sum; and updating, by the computer system, learned parameters of each of the plurality of ensemble members according to the update term.

6. The computer-implemented method of claim 5, wherein estimating the update term comprises: back propagating, by the computer system, a derivative of the weighted sum through each of the plurality of ensemble members.
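
As an illustration only, and not part of the claims, the steps of claims 5 and 6 can be sketched on a toy system. Everything concrete here is an assumption made for the example: each ensemble member is a single logistic unit, the joint optimization system is one logistic unit over the member outputs, both objectives are squared errors, the weight multiplies the shared-objective term, and the names `consolidated_step`, `members`, and `joint` are hypothetical.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def consolidated_step(x, shared_target, member_targets, members, joint, weight, lr):
    """One training step of the toy consolidated system; returns the joint output."""
    # Feed-forward activations for each ensemble member.
    ys = [sigmoid(m["w"] * x + m["b"]) for m in members]
    # Feed-forward activation for the joint optimization system.
    z = sigmoid(sum(v * y for v, y in zip(joint["v"], ys)) + joint["c"])
    # Back-propagate the shared objective 0.5 * (z - t)^2 through the joint
    # system, down to the member outputs.
    dz = (z - shared_target) * z * (1.0 - z)
    shared_grads = [dz * v for v in joint["v"]]
    # Weighted sum of the shared-objective derivative and the derivative of
    # each member's own objective 0.5 * (y - t_i)^2.
    combined = [weight * g + (y - t)
                for g, y, t in zip(shared_grads, ys, member_targets)]
    # Back-propagate the weighted sum through each member and update it.
    for m, y, g in zip(members, ys, combined):
        da = g * y * (1.0 - y)
        m["w"] -= lr * da * x
        m["b"] -= lr * da
    # Update the joint system on the shared objective alone.
    for i, y in enumerate(ys):
        joint["v"][i] -= lr * dz * y
    joint["c"] -= lr * dz
    return z

members = [{"w": 0.5, "b": 0.0}, {"w": -0.5, "b": 0.0}]
joint = {"v": [1.0, -1.0], "c": 0.0}
history = [consolidated_step(1.0, 1.0, [1.0, 0.0], members, joint,
                             weight=0.5, lr=0.5) for _ in range(200)]
```

In this toy run the joint output moves toward the shared target of 1.0 over the 200 steps, while each member is still pulled toward its own target.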

7. The computer-implemented method of claim 5, wherein estimating the update term comprises: storing, by the computer system, current values of learned parameters of each of the plurality of ensemble members as stored values; determining, by the computer system, updated values of the learned parameters of each of the plurality of ensemble members for a positive-example training data item; adding, by the computer system, a difference between the stored values and the updated values of the learned parameters to an accumulated gradient estimate for the training data set; and resetting, by the computer system, the learned parameters to the stored values.
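
As an illustration only, and not part of the claims, the store, update, accumulate, and reset sequence of claim 7 might look like the sketch below. The member is assumed to be a two-parameter linear model trained by one stochastic gradient step on squared error; `sgd_step` and `accumulate_update_estimate` are hypothetical names, and the claim does not say how the updated values are determined.

```python
def sgd_step(params, item, lr=0.1):
    """Hypothetical in-place update: one SGD step for y = w * x + b."""
    x, target = item
    err = params[0] * x + params[1] - target
    params[0] -= lr * err * x
    params[1] -= lr * err

def accumulate_update_estimate(params, item, accumulated, update=sgd_step):
    stored = list(params)                   # store the current learned parameters
    update(params, item)                    # determine updated values for the item
    for k in range(len(params)):
        accumulated[k] += stored[k] - params[k]   # stored minus updated
    params[:] = stored                      # reset to the stored values
    return accumulated

params = [0.0, 0.0]
accumulated = [0.0, 0.0]
accumulate_update_estimate(params, (2.0, 1.0), accumulated)
```

With plain gradient descent, the stored-minus-updated difference equals the learning rate times the gradient, so the accumulator collects a scaled gradient estimate without leaving any change in the parameters themselves.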

8. The computer-implemented method of claim 5, wherein estimating the update term comprises: storing, by the computer system, current values of learned parameters of each of the plurality of ensemble members as stored values; determining, by the computer system, first updated values of the learned parameters of each of the plurality of ensemble members for a negative-example training data item; resetting, by the computer system, the learned parameters to the stored values; determining, by the computer system, second updated values of the learned parameters of each of the plurality of ensemble members for a positive-example training data item; adding, by the computer system, an averaged difference between the first updated values and the second updated values of the learned parameters to an accumulated gradient estimate for the training data set; and resetting, by the computer system, the learned parameters to the stored values.

9. The computer-implemented method of claim 5, wherein the weighted sum comprises a weight applied to the partial derivatives of the shared objective relative to the derivative of the associated objective for each of the plurality of ensemble members.

10. The computer-implemented method of claim 9, further comprising: controlling, by the computer system, the weight according to a training progress of each of the plurality of ensemble members.

11. The computer-implemented method of claim 10, wherein controlling the weight according to the training progress of each of the plurality of ensemble members comprises: reducing, by the computer system, the weight as each of the plurality of ensemble members reaches convergence.
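
As an illustration only, and not part of the claims, one possible way to reduce the weight of claims 9 to 11 as the members converge is sketched below. The claims do not define how convergence is measured; this example assumes it is the relative loss improvement of the least-converged member between two checks, and the names `scheduled_weight` and `reference` are hypothetical.

```python
def scheduled_weight(base_weight, prev_losses, curr_losses, reference=0.1):
    """Scale base_weight down as the members' loss improvements shrink.

    Assumption: improvements at or above `reference` keep the full weight,
    and the weight falls linearly to zero as improvement approaches zero.
    """
    # Relative improvement of the member that is still improving the most.
    progress = max((p - c) / max(abs(p), 1e-12)
                   for p, c in zip(prev_losses, curr_losses))
    scale = min(1.0, max(0.0, progress / reference))
    return base_weight * scale

early = scheduled_weight(0.5, [1.0, 2.0], [0.5, 1.9])    # members still improving
late = scheduled_weight(0.5, [1.0, 2.0], [0.99, 1.99])   # members nearly converged
done = scheduled_weight(0.5, [1.0, 2.0], [1.0, 2.0])     # no further improvement
```

Taking the maximum over members means the weight drops only once every member has slowed down, which is one reading of "as each of the plurality of ensemble members reaches convergence".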

12. The computer-implemented method of claim 1, wherein the plurality of ensemble members comprises a plurality of different machine learning system types.

13. The computer-implemented method of claim 1, wherein the plurality of ensemble members comprises a single machine learning system type.

14. The computer-implemented method of claim 13, wherein the single machine learning system type comprises a neural network.

15. The computer-implemented method of claim 14, wherein each neural network comprises a same number of layers, a same number of nodes within each of the layers, and a same arrangement of directed arc connections between the nodes.

16. A computer system for training an ensemble machine learning system comprising an ensemble, wherein the ensemble comprises a plurality of ensemble members, the computer system comprising: a processor; and a memory coupled to the processor, the memory storing: the plurality of ensemble members; a joint optimization machine learning system; and instructions that, when executed by the processor, cause the computer system to: train individually the plurality of ensemble members on a training data set, wherein each of the plurality of ensemble members is trained according to an associated objective for the ensemble member; and after training the plurality of ensemble members, train a consolidated machine learning system, wherein: the consolidated machine learning system comprises the plurality of ensemble members and the joint optimization machine learning system, such that an output from each of the plurality of ensemble members is input to the joint optimization machine learning system; the joint optimization machine learning system is trained according to a shared objective; and each of the plurality of ensemble members is trained according to both the shared objective and the associated objective for the ensemble member such that, in the training of the consolidated machine learning system, partial derivatives of the shared objective from the joint optimization machine learning system are back-propagated to the outputs of the plurality of ensemble members.

17. The computer system of claim 16, wherein the associated objective for each of the plurality of ensemble members is unique.

18. The computer system of claim 16, wherein: each of the plurality of ensemble members comprises an output detector node; and the associated objective comprises a subset of the training data set as a target for each output detector node.

19. The computer system of claim 16, wherein: each of the plurality of ensemble members comprises an output detector node; the training data set comprises a first subset and a second subset that is disjoint from the first subset; and the associated objective comprises: a first value for the output detector node when a training data item falls within the first subset of the training data set; and a second value for the output detector node when the training data item falls within the second subset of the training data set.

20. The computer system of claim 16, wherein the instructions cause the computer system to train the consolidated machine learning system by causing the computer system to: compute feed-forward activations for each of the plurality of ensemble members for a training data item from a training data set; compute feed-forward activations for the joint optimization machine learning system for the training data item; back propagate partial derivatives of the shared objective through the joint optimization machine learning system; compute a weighted sum of the partial derivatives of the shared objective and a derivative of the associated objective for each of the plurality of ensemble members; estimate an update term for each of the plurality of ensemble members according to the weighted sum; and update learned parameters of each of the plurality of ensemble members according to the update term.

21. The computer system of claim 20, wherein the instructions cause the computer system to estimate the update term by causing the computer system to: back propagate a derivative of the weighted sum through each of the plurality of ensemble members.

22. The computer system of claim 20, wherein the instructions cause the computer system to estimate the update term by causing the computer system to: store current values of learned parameters of each of the plurality of ensemble members as stored values; determine updated values of the learned parameters of each of the plurality of ensemble members for a positive-example training data item; add a difference between the stored values and the updated values of the learned parameters to an accumulated gradient estimate for the training data set; and reset the learned parameters to the stored values.

23. The computer system of claim 20, wherein the instructions cause the computer system to estimate the update term by causing the computer system to: store current values of learned parameters of each of the plurality of ensemble members as stored values; determine first updated values of the learned parameters of each of the plurality of ensemble members for a negative-example training data item; reset the learned parameters to the stored values; determine second updated values of the learned parameters of each of the plurality of ensemble members for a positive-example training data item; add an averaged difference between the first updated values and the second updated values of the learned parameters to an accumulated gradient estimate for the training data set; and reset the learned parameters to the stored values.

24. The computer system of claim 20, wherein the weighted sum comprises a weight applied to the partial derivatives of the shared objective relative to the derivative of the associated objective for each of the plurality of ensemble members.

25. The computer system of claim 24, wherein the instructions further cause the computer system to: control the weight according to a training progress of each of the plurality of ensemble members.

26. The computer system of claim 25, wherein the instructions cause the computer system to control the weight according to the training progress of each of the plurality of ensemble members by causing the computer system to: reduce the weight as each of the plurality of ensemble members reaches convergence.

27. The computer system of claim 16, wherein the plurality of ensemble members comprises a plurality of different machine learning system types.

28. The computer system of claim 16, wherein the plurality of ensemble members comprises a single machine learning system type.

29. The computer system of claim 28, wherein the single machine learning system type comprises a neural network.

30. The computer system of claim 29, wherein each neural network comprises a same number of layers, a same number of nodes within each of the layers, and a same arrangement of directed arc connections between the nodes.

Patent Metadata

Filing Date

Unknown

Publication Date

January 11, 2022

Inventors

James K. BAKER