Feature-Augmented Neural Networks and Applications of Same

Published: December 13, 2016

Assignee: not available in the USPTO data

Inventors: Geoffrey G. Zweig, Tomas Mikolov

Patent Claims

20 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method performed using one or more processing devices, the method comprising: receiving a word input vector at an input layer of a neural network, the word input vector representing an individual word from an input sequence of words; receiving a topic feature vector at the input layer of the neural network, the topic feature vector being separate from the word input vector and representing topics expressed in the input sequence of words; using the neural network to generate an output vector at an output layer of the neural network based at least on the word input vector and the topic feature vector, wherein using the neural network includes, by a hidden layer of the neural network: modifying the word input vector using a first learned matrix; and modifying the topic feature vector using a second learned matrix that is separate from the first learned matrix, wherein the output vector represents a word probability given the word input vector and the topic feature vector; and performing a natural language processing operation based at least on the word probability represented by the output vector.

2. The method of claim 1, wherein using the neural network includes: by the hidden layer of the neural network, modifying a time-delayed hidden-state vector with a third learned matrix, wherein the time-delayed hidden-state vector represents an output of the hidden layer in a prior time instance, wherein the word input vector, the topic feature vector, and the time-delayed hidden-state vector are separate vectors, and wherein the first learned matrix, the second learned matrix, and the third learned matrix are separate matrices.

3. The method of claim 2, wherein using the neural network includes, by the output layer of the neural network: modifying the output of the hidden layer with a fourth learned matrix; and modifying the topic feature vector with a fifth learned matrix, wherein the first learned matrix, the second learned matrix, the third learned matrix, the fourth learned matrix, and the fifth learned matrix are separate matrices.

4. The method of claim 2, wherein using the neural network includes, by the hidden layer: performing a first multiplication operation of the word input vector by the first learned matrix to generate a first multiplication output; performing a second multiplication operation of the topic feature vector by the second learned matrix to generate a second multiplication output; performing a third multiplication operation of the time-delayed hidden-state vector by the third learned matrix to generate a third multiplication output; and summing the first multiplication output, the second multiplication output, and the third multiplication output to generate the output of the hidden layer.

5. The method of claim 1, further comprising: generating the topic feature vector using a Latent Dirichlet Allocation (LDA) technique; and as subsequent words from the input sequence are processed using the neural network, incrementally generating next topic feature vectors based at least on previous feature vectors.

6. The method of claim 5, wherein said incrementally generating comprises applying a decay factor to previous topic feature vectors for previous words that have already been processed by the neural network.

7. The method of claim 1, wherein the input sequence of words is part of an input document.

8. A system comprising: at least one processing device; and at least one computer readable medium storing instructions which, when executed by the at least one processing device, cause the at least one processing device to: receive a word input vector at an input layer of a neural network, the word input vector representing an individual word from an input sequence of words; receive a topic feature vector at the input layer of the neural network, the topic feature vector being separate from the word input vector and representing topics expressed in the input sequence of words; use the neural network to generate an output vector at an output layer of the neural network based at least on the word input vector and the topic feature vector, wherein using the neural network includes, by a hidden layer of the neural network: modifying the word input vector using a first learned matrix; and modifying the topic feature vector using a second learned matrix that is separate from the first learned matrix, wherein the output vector represents a word probability given the word input vector and the topic feature vector; and perform a natural language processing operation based at least on the word probability represented by the output vector.

9. The system of claim 8, wherein the instructions, when executed by the at least one processing device, cause the at least one processing device to: by the hidden layer of the neural network, modify a time-delayed hidden-state vector with a third learned matrix, wherein the time-delayed hidden-state vector represents an output of the hidden layer in a prior time instance, wherein the word input vector, the topic feature vector, and the time-delayed hidden-state vector are separate vectors, and wherein the first learned matrix, the second learned matrix, and the third learned matrix are separate matrices.

10. The system of claim 9, wherein the instructions, when executed by the at least one processing device, cause the at least one processing device to: by the output layer of the neural network: modify the output of the hidden layer with a fourth learned matrix; and modify the topic feature vector with a fifth learned matrix, wherein the first learned matrix, the second learned matrix, the third learned matrix, the fourth learned matrix, and the fifth learned matrix are separate matrices.

11. The system of claim 9, wherein the instructions, when executed by the at least one processing device, cause the at least one processing device to: by the hidden layer of the neural network: perform a first multiplication operation of the word input vector by the first learned matrix to generate a first multiplication output; perform a second multiplication operation of the topic feature vector by the second learned matrix to generate a second multiplication output; perform a third multiplication operation of the time-delayed hidden-state vector by the third learned matrix to generate a third multiplication output; and sum the first multiplication output, the second multiplication output, and the third multiplication output to generate the output of the hidden layer.

12. The system of claim 8, wherein the instructions, when executed by the at least one processing device, cause the at least one processing device to: generate the topic feature vector using a Latent Dirichlet Allocation (LDA) technique; and as subsequent words from the input sequence are processed using the neural network, incrementally generate next topic feature vectors based at least on previous feature vectors.

13. The system of claim 12, wherein the instructions, when executed by the at least one processing device, cause the at least one processing device to: apply a decay factor to previous topic feature vectors for previous words that have already been processed by the neural network.

14. The system of claim 8, wherein the input sequence of words is part of an input document.

15. At least one computer readable storage medium storing instructions which, when executed by at least one processing device, cause the at least one processing device to perform acts comprising: receiving a word input vector at an input layer of a neural network, the word input vector representing an individual word from an input sequence of words; receiving a topic feature vector at the input layer of the neural network, the topic feature vector being separate from the word input vector and representing topics expressed in the input sequence of words; using the neural network to generate an output vector at an output layer of the neural network based at least on the word input vector and the topic feature vector, wherein using the neural network includes, by a hidden layer of the neural network: modifying the word input vector using a first learned matrix; and modifying the topic feature vector using a second learned matrix that is separate from the first learned matrix, wherein the output vector represents a word probability given the word input vector and the topic feature vector; and performing a natural language processing operation based at least on the word probability represented by the output vector.

16. The at least one computer readable storage medium of claim 15, wherein using the neural network includes: by the hidden layer of the neural network, modifying a time-delayed hidden-state vector with a third learned matrix, wherein the time-delayed hidden-state vector represents an output of the hidden layer in a prior time instance, wherein the word input vector, the topic feature vector, and the time-delayed hidden-state vector are separate vectors, and wherein the first learned matrix, the second learned matrix, and the third learned matrix are separate matrices.

17. The at least one computer readable storage medium of claim 16, wherein using the neural network includes, by the output layer of the neural network: modifying the output of the hidden layer with a fourth learned matrix; and modifying the topic feature vector with a fifth learned matrix, wherein the first learned matrix, the second learned matrix, the third learned matrix, the fourth learned matrix, and the fifth learned matrix are separate matrices.

18. The at least one computer readable storage medium of claim 16, wherein using the neural network includes, by the hidden layer: performing a first multiplication operation of the word input vector by the first learned matrix to generate a first multiplication output; performing a second multiplication operation of the topic feature vector by the second learned matrix to generate a second multiplication output; performing a third multiplication operation of the time-delayed hidden-state vector by the third learned matrix to generate a third multiplication output; and summing the first multiplication output, the second multiplication output, and the third multiplication output to generate the output of the hidden layer.

19. The at least one computer readable storage medium of claim 15, the acts further comprising: generating the topic feature vector using a Latent Dirichlet Allocation (LDA) technique; and as subsequent words from the input sequence are processed using the neural network, incrementally generating next topic feature vectors based at least on previous feature vectors.

20. The at least one computer readable storage medium of claim 19, wherein said incrementally generating comprises applying a decay factor to previous topic feature vectors for previous words that have already been processed by the neural network.
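
The data flow recited in claims 1 through 6 can be illustrated with a short plain-Python sketch. It is a reading of the claim language under stated assumptions, not the patented implementation. The matrix names W, F, U, V and G, the sigmoid and softmax nonlinearities, the random parameter values, the toy per-word topic table, and the geometric-interpolation form of the decay update are all assumptions introduced here. The claims themselves recite only the separate learned matrices, the three-way sum in the hidden layer, a word probability at the output, and the use of a decay factor.

```python
import math
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def add(*vectors):
    """Element-wise sum of equally sized vectors."""
    return [sum(parts) for parts in zip(*vectors)]

def sigmoid(values):
    return [1.0 / (1.0 + math.exp(-x)) for x in values]

def softmax(values):
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in values]
    total = sum(exps)
    return [e / total for e in exps]

def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]

def decayed_topic_update(prev_topics, word_topics, decay=0.9):
    """Incremental topic vector (claims 5-6). ASSUMED form: geometric
    interpolation between the previous vector and the current word's
    topic distribution, renormalized to sum to 1."""
    blended = [(p ** decay) * (w ** (1.0 - decay))
               for p, w in zip(prev_topics, word_topics)]
    total = sum(blended)
    return [b / total for b in blended]

def step(word_vec, topic_vec, prev_hidden, params):
    """One time step: hidden layer (claims 1, 2, 4), output layer (claim 3)."""
    # Three separate products, then a sum (claim 4). The sigmoid is an assumption.
    hidden = sigmoid(add(matvec(params["W"], word_vec),
                         matvec(params["F"], topic_vec),
                         matvec(params["U"], prev_hidden)))
    # Fourth and fifth matrices at the output layer (claim 3). Softmax is an assumption.
    probs = softmax(add(matvec(params["V"], hidden),
                        matvec(params["G"], topic_vec)))
    return hidden, probs

# Hypothetical toy sizes and randomly initialized (untrained) parameters.
VOCAB, TOPICS, HIDDEN = 4, 2, 3
rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

params = {
    "W": rand_matrix(HIDDEN, VOCAB),   # first matrix: word input -> hidden
    "F": rand_matrix(HIDDEN, TOPICS),  # second matrix: topic features -> hidden
    "U": rand_matrix(HIDDEN, HIDDEN),  # third matrix: previous hidden -> hidden
    "V": rand_matrix(VOCAB, HIDDEN),   # fourth matrix: hidden -> output
    "G": rand_matrix(VOCAB, TOPICS),   # fifth matrix: topic features -> output
}

# Hypothetical per-word topic distributions, standing in for values an LDA model would supply.
WORD_TOPICS = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5], [0.7, 0.3]]

hidden = [0.0] * HIDDEN
topics = [1.0 / TOPICS] * TOPICS
for word_id in [0, 3, 1]:  # a toy input sequence of word indices
    topics = decayed_topic_update(topics, WORD_TOPICS[word_id])
    hidden, probs = step(one_hot(word_id, VOCAB), topics, hidden, params)

print(probs)  # distribution over the next word given the word and topic inputs
```

In this sketch the topic vector enters through its own matrices (F at the hidden layer, G at the output layer) rather than being concatenated into the word vector, which is one way to reflect the claims' requirement that the vectors and matrices be separate. A decay value close to 1 makes the topic vector change slowly as new words arrive.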

Patent Metadata

Filing Date: Unknown

Publication Date: December 13, 2016

Inventors: Geoffrey G. Zweig, Tomas Mikolov
