Artificial Intelligence System for Modeling Emotions Elicited by Videos

Published: November 16, 2021

Assignee: not available in the USPTO data we have

Technical Abstract

Patent Claims

20 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A non-transitory computer storage medium storing executable code, wherein the executable code configures a computing system to perform a process comprising: for each video of a plurality of videos in a training data set: accessing a plurality of emotional reactions provided by a plurality of users at a plurality of timecodes of the video, wherein an emotional reaction provided by a first user at a timecode of the video represents emotion experienced by the first user when viewing video content at the timecode of the video, generating, based on machine learning analysis of the plurality of emotional reactions, a reaction timeline representing intensities and categories of emotional reactions elicited at the plurality of timecodes of the video, and identifying, based on content of the video, features of the video occurring at one or more of the plurality of timecodes, wherein the features of the video comprise visual features in image frames of the video or audio features of the video; for each video of the plurality of videos, training a machine learning model to predict emotional reactions elicited by features of the video by: providing the features of the video occurring at the one or more of the plurality of timecodes as input into the machine learning model, providing the reaction timeline as an expected output of the machine learning model based at least in part on the features of the video occurring at one or more of the plurality of timecodes, and training parameters of the machine learning model to predict the expected output from the input; upon completion of the training, storing the parameters of the machine learning model; receiving a new video not in the training data set; providing features of the new video as input to the machine learning model; and predicting, using the parameters of the machine learning model when provided with the features of the new video as input to the machine learning model, emotional reactions that will likely be experienced by a second user when the second user watches the new video as elicited by features of the new video, wherein the emotional reactions likely to be experienced by the second user are predicted by the machine learning model prior to the second user viewing the new video.
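
As an illustration of the reaction timeline recited in claim 1, the following minimal Python sketch buckets user-submitted reactions by timecode and scores each emotion category by the share of reacting users who reported it. The claim calls for a machine learning analysis without specifying one; the simple counting here is only a stand-in. The function name `build_reaction_timeline`, the 5-second bucket size, and the sample data are assumptions made for illustration.

```python
from collections import defaultdict


def build_reaction_timeline(reactions, bucket_seconds=5):
    """Aggregate (user_id, timecode_seconds, category) tuples into a timeline.

    Returns {bucket_start: {category: intensity}}, where intensity is the
    fraction of all reacting users who reported that category in the bucket.
    This counting rule is a hypothetical stand-in for the claimed analysis.
    """
    users = {user for user, _, _ in reactions}
    if not users:
        return {}
    seen = defaultdict(set)  # (bucket_start, category) -> users who reacted
    for user, timecode, category in reactions:
        bucket = int(timecode // bucket_seconds) * bucket_seconds
        seen[(bucket, category)].add(user)
    timeline = defaultdict(dict)
    for (bucket, category), bucket_users in seen.items():
        timeline[bucket][category] = len(bucket_users) / len(users)
    return dict(sorted(timeline.items()))


# Hypothetical reactions from four users watching one video.
reactions = [
    ("u1", 3.0, "joy"),
    ("u2", 4.5, "joy"),
    ("u4", 1.0, "joy"),
    ("u3", 12.0, "surprise"),
    ("u1", 13.0, "surprise"),
    ("u2", 14.9, "fear"),
]
timeline = build_reaction_timeline(reactions)
print(timeline)
```

A timeline in this shape can serve as the expected output when training a model on the features observed at the same timecodes.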

2. The non-transitory computer storage medium of claim 1, wherein the machine learning model comprises an artificial neural network, the non-transitory computer storage medium storing further executable code, wherein the further executable code configures the computing system to perform a further process comprising training the parameters using backpropagation.
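
To make the backpropagation training of claim 2 concrete, here is a small pure-Python sketch of a one-hidden-layer network fitted by stochastic gradient descent. It is not the patented model: the network size, learning rate, squared-error loss, and the two hypothetical input features (`explosion`, `laughter`) mapped to a "surprise" intensity are all assumptions chosen to keep the example short.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(w1, b1, w2, b2, x):
    """Return (hidden activations, output intensity) for one feature vector."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    out = sigmoid(sum(w * hj for w, hj in zip(w2, h)) + b2)
    return h, out


def train(samples, hidden=3, epochs=2000, lr=0.5, seed=0):
    """Fit the network with backpropagation; return (params, per-epoch MSE)."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, y in samples:
            h, out = forward(w1, b1, w2, b2, x)
            total += (out - y) ** 2
            # Backward pass: gradients of the squared error w.r.t. pre-activations.
            d_out = 2 * (out - y) * out * (1 - out)
            d_h = [d_out * w2[j] * h[j] * (1 - h[j]) for j in range(hidden)]
            # Gradient-descent updates of the stored parameters.
            for j in range(hidden):
                w2[j] -= lr * d_out * h[j]
                b1[j] -= lr * d_h[j]
                for i in range(n_in):
                    w1[j][i] -= lr * d_h[j] * x[i]
            b2 -= lr * d_out
        losses.append(total / len(samples))
    return (w1, b1, w2, b2), losses


# Hypothetical training pairs: [explosion, laughter] -> "surprise" intensity.
samples = [([1.0, 0.0], 0.9), ([0.0, 1.0], 0.1), ([1.0, 1.0], 0.8), ([0.0, 0.0], 0.1)]
params, losses = train(samples)
_, surprise_for_explosion = forward(*params, [1.0, 0.0])
_, surprise_for_laughter = forward(*params, [0.0, 1.0])
print(losses[0], losses[-1], surprise_for_explosion, surprise_for_laughter)
```

The returned `params` tuple plays the role of the stored parameters that are later applied to features of a new video.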

3. The non-transitory computer storage medium of claim 1, wherein a machine learning system comprises the machine learning model and at least one additional machine learning model trained to identify features in video content, the non-transitory computer storage medium storing further executable code, wherein the further executable code configures the computing system to perform a further process comprising: identifying, based on providing content of the new video to the at least one additional machine learning model, a feature timeline representing occurrence of the features of the new video at one or more of a plurality of timecodes of the new video, wherein providing the features of the new video as input to the machine learning model comprises inputting the feature timeline of the new video into the machine learning model, and wherein predicting the emotional reactions elicited by the features of the new video comprises predicting, using the machine learning model, a reaction timeline representing occurrence of at least one category of emotional reaction at one or more of the plurality of timecodes of the new video.

4. The non-transitory computer storage medium of claim 3, the non-transitory computer storage medium storing further executable code, wherein the further executable code configures the computing system to perform a further process comprising: identifying a third user predicted to enjoy the new video based at least partly on the reaction timeline; and outputting the new video for presentation to the third user.
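
Claim 4 does not say how a user "predicted to enjoy" the video is chosen from the reaction timeline. One possible approach, sketched below purely as an assumption, averages the predicted intensity of each emotion category over the timeline and scores it against per-user preference weights. The names `summarize` and `users_likely_to_enjoy`, the weights, and the 0.3 threshold are hypothetical.

```python
def summarize(timeline):
    """Mean predicted intensity per emotion category across all timecodes."""
    totals = {}
    for reactions in timeline.values():
        for category, intensity in reactions.items():
            totals[category] = totals.get(category, 0.0) + intensity
    count = len(timeline) or 1
    return {category: total / count for category, total in totals.items()}


def users_likely_to_enjoy(timeline, preferences, threshold=0.3):
    """Return users whose preference-weighted score meets the threshold,
    ordered from highest to lowest score (hypothetical scoring rule)."""
    profile = summarize(timeline)
    scores = {
        user: sum(weight * profile.get(category, 0.0) for category, weight in prefs.items())
        for user, prefs in preferences.items()
    }
    selected = [user for user, score in scores.items() if score >= threshold]
    return sorted(selected, key=lambda user: -scores[user])


# Hypothetical predicted reaction timeline for a new video.
predicted = {0: {"joy": 0.8}, 5: {"joy": 0.6, "fear": 0.2}, 10: {"fear": 0.4}}
# Hypothetical per-user weights: positive means the user seeks that emotion.
preferences = {
    "alice": {"joy": 1.0, "fear": -0.5},
    "bob": {"fear": 1.0},
    "cara": {"joy": 0.5, "fear": 0.5},
}
audience = users_likely_to_enjoy(predicted, preferences)
print(audience)
```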

5. The non-transitory computer storage medium of claim 3, the non-transitory computer storage medium storing further executable code, wherein the further executable code configures the computing system to perform a further process comprising outputting a graphical representation of the reaction timeline as feedback to a user interface for creating or editing the new video.
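
The graphical representation in claim 5 could take many forms. As a rough sketch only, the code below renders a reaction timeline as one text row per emotion category, with denser characters for higher intensity. The character ramp, the 5-second bucket spacing, and the function name `render_timeline` are assumptions; an editing tool would more plausibly draw a chart.

```python
LEVELS = " .:-=+*#"  # low to high intensity (hypothetical character ramp)


def render_timeline(timeline, bucket_seconds=5):
    """Render {bucket_start: {category: intensity}} as one text row per category."""
    categories = sorted({c for reactions in timeline.values() for c in reactions})
    if not categories:
        return ""
    buckets = range(0, max(timeline) + bucket_seconds, bucket_seconds)
    width = max(len(c) for c in categories)
    rows = []
    for category in categories:
        cells = ""
        for bucket in buckets:
            intensity = timeline.get(bucket, {}).get(category, 0.0)
            index = min(int(intensity * len(LEVELS)), len(LEVELS) - 1)
            cells += LEVELS[index]
        rows.append(f"{category.ljust(width)} |{cells}|")
    return "\n".join(rows)


# Hypothetical predicted timeline covering buckets 0, 5, 10, and 15 seconds.
predicted = {0: {"joy": 0.9}, 5: {"joy": 0.5, "fear": 0.2}, 15: {"fear": 1.0}}
chart = render_timeline(predicted)
print(chart)
```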

6. The non-transitory computer storage medium of claim 3, wherein the at least one additional machine learning model comprises a convolutional neural network trained to identify visual features in image frames of video content, the non-transitory computer storage medium storing further executable code, wherein the further executable code configures the computing system to perform a further process comprising: inputting image frames of the new video into the convolutional neural network; and using an output of the convolutional neural network to generate the feature timeline of the new video, wherein the feature timeline includes at least one visual feature occurring at one or more of the plurality of timecodes of the new video.

7. The non-transitory computer storage medium of claim 3, wherein the at least one additional machine learning model is trained to identify audio features in an audio waveform of video content, the non-transitory computer storage medium storing further executable code, wherein the further executable code configures the computing system to perform process comprising: inputting an audio waveform of the new video into the at least one additional machine learning model; and using an output of the at least one additional machine learning model to identify the feature timeline of the new video, wherein the feature timeline includes at least one audio feature occurring at one or more of the plurality of timecodes of the new video.
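
Claim 7 relies on a trained model to find audio features in the waveform. The sketch below substitutes a much simpler, hand-written detector to show what an audio feature timeline might look like: it flags one-second windows whose root-mean-square amplitude exceeds a threshold. The feature name `loud_audio`, the window length, the threshold, and the synthetic waveform are all assumptions.

```python
import math


def loud_segments(samples, sample_rate, window_seconds=1.0, threshold=0.5):
    """Map window start time (seconds) to {"loud_audio": rms} for loud windows.

    A fixed RMS threshold stands in for the trained audio model of the claim.
    """
    window = int(sample_rate * window_seconds)
    timeline = {}
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        if rms >= threshold:
            timeline[start / sample_rate] = {"loud_audio": round(rms, 3)}
    return timeline


# Synthetic waveform at a toy rate of 8 samples per second:
# one quiet second, one loud second, one silent second.
sample_rate = 8
waveform = [0.1] * 8 + [0.9, -0.9] * 4 + [0.0] * 8
audio_features = loud_segments(waveform, sample_rate)
print(audio_features)
```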

8. The non-transitory computer storage medium of claim 3, wherein the at least one additional machine learning model is trained to identify key words or phrases in a dialogue transcript of video content, the non-transitory computer storage medium storing further executable code, wherein the further executable code configures the computing system to perform a further process comprising: inputting a dialogue transcript of the new video into the at least one additional machine learning model; and using an output of the at least one additional machine learning model to identify the feature timeline of the new video, wherein the feature timeline includes at least one key word or phrase occurring at one or more of the plurality of timecodes of the new video.
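
For the transcript-based feature timeline of claim 8, a simplified sketch follows. It matches a fixed set of single key words against timestamped dialogue lines and groups the hits into 5-second buckets. The claim describes a trained model and also covers phrases; the fixed keyword set, the single-word matching, the transcript format `(start_seconds, text)`, and the function name are assumptions made to keep the example self-contained.

```python
import re


def keyword_feature_timeline(transcript, keywords, bucket_seconds=5):
    """Return {bucket_start: sorted key words spoken in that bucket}.

    transcript is a list of (start_seconds, text) pairs; keywords is a set of
    lowercase single words (a stand-in for the trained model of the claim).
    """
    timeline = {}
    for start, text in transcript:
        words = set(re.findall(r"[a-z']+", text.lower()))
        hits = words & keywords
        if hits:
            bucket = int(start // bucket_seconds) * bucket_seconds
            timeline.setdefault(bucket, set()).update(hits)
    return {bucket: sorted(found) for bucket, found in sorted(timeline.items())}


# Hypothetical dialogue transcript and keyword set.
transcript = [
    (1.2, "Look out, it's a trap!"),
    (6.0, "We made it. I love you."),
    (8.5, "Run!"),
    (21.0, "Nothing to see here."),
]
keywords = {"trap", "run", "love"}
dialogue_features = keyword_feature_timeline(transcript, keywords)
print(dialogue_features)
```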

9. A system comprising: a computer-readable memory storing executable instructions; and one or more computing devices in communication with the computer-readable memory, the one or more computing devices programmed by the executable instructions to at least: for each video of a plurality of videos in a training data set: access a plurality of emotional reactions provided by a plurality of users at a plurality of timecodes of the video, generate, based on machine learning analysis of the plurality of emotional reactions, a reaction timeline representing intensities and categories of emotional reactions elicited at the plurality of timecodes of the video, and identify, based on content of the video, features of the video occurring at one or more of the plurality of timecodes, wherein the features of the video comprise visual features in image frames of the video or audio features of the video; for each video of the plurality of videos, train a machine learning model to predict emotional reactions elicited by features of the video, wherein to train the machine learning model to predict emotional reactions elicited by the features of the video, the one or more computing devices are further programmed by the executable instructions to at least: provide the features of the video occurring at the one or more of the plurality of timecodes as input into the machine learning model, provide the reaction timeline as an expected output of the machine learning model based at least in part on the features of the video occurring at one or more of the plurality of timecodes, and train parameters of the machine learning model to predict the expected output from the input; upon completion of the training, store the parameters of the machine learning model; receive a new video not in the training data set; provide features of the new video as input to the machine learning model; and predict emotional reactions likely to be experienced by a first user when watching the new video as elicited by the features of the new video using the parameters of the machine learning model.

10. The system of claim 9, wherein the machine learning model comprises an artificial neural network, the computer-readable memory storing further executable instructions, wherein the one or more computing devices are further programmed to by the further executable instructions to at least train the parameters using backpropagation.

11. The system of claim 9, wherein a machine learning system comprises the machine learning model and at least one additional machine learning model trained to identify features in video content, the computer-readable memory storing further executable instructions, wherein the one or more computing devices are further programmed to by the further executable instructions to at least: identify, based on providing content of the new video to the at least one additional machine learning model, a feature timeline representing occurrence of the features of the new video at one or more of a plurality of timecodes of the new video, wherein to provide the features of the new video as input to the machine learning model, the one or more computing devices are further programmed to input the feature timeline of the new video into the machine learning model, and wherein to predict the emotional reactions elicited by the features of the new video, the one or more computing devices are further programmed to predict, using the machine learning model, a reaction timeline representing occurrence of at least one category of emotional reaction at one or more of the plurality of timecodes of the new video.

12. The system of claim 11, the computer-readable memory storing further executable instructions, wherein the one or more computing devices are further programmed to by the further executable instructions to at least: identify a second user predicted to enjoy the new video based at least partly on the reaction timeline; and output the new video for presentation to the second user.

13. The system of claim 11, the computer-readable memory storing further executable instructions, wherein the one or more computing devices are further programmed to by the further executable instructions to at least output a graphical representation of the reaction timeline as feedback to a user interface for creating or editing the new video.

14. The system of claim 11, wherein the at least one additional machine learning model comprises a convolutional neural network trained to identify visual features in image frames of video content, the computer-readable memory storing further executable instructions, wherein the one or more computing devices are further programmed to by the further executable instructions to at least: input image frames of the new video into the convolutional neural network; and use an output of the convolutional neural network to identify the feature timeline of the new video, wherein the feature timeline includes at least one visual feature occurring at one or more of the plurality of timecodes of the new video.

15. The system of claim 11, wherein the at least one additional machine learning model is trained to identify audio features in an audio waveform of video content, the computer-readable memory storing further executable instructions, wherein the one or more computing devices are further programmed to by the further executable instructions to at least: input an audio waveform of the new video into the at least one additional machine learning model; and use an output of the at least one additional machine learning model to identify the feature timeline of the new video, wherein the feature timeline includes at least one audio feature occurring at one or more of the plurality of timecodes of the new video.

16. A computer-implemented method comprising, as performed by at least one computing device configured to execute specific instructions: for each video of a plurality of videos in a training data set: accessing a plurality of emotional reactions provided by a plurality of users at a plurality of timecodes of the video, generating, based on machine learning analysis of the plurality of emotional reactions, a reaction timeline representing intensities and categories of emotional reactions elicited at the plurality of timecodes of the video, and identifying, based on content of the video, features of the video occurring at one or more of the plurality of timecodes, wherein the features of the video comprise visual features in image frames of the video or audio features of the video; for each video of the plurality of videos, training a machine learning model to predict emotional reactions elicited by features of the video by: providing the features of the video occurring at the one or more of the plurality of timecodes as input into the machine learning model, providing the reaction timeline as an expected output of the machine learning model based at least in part on the features of the video occurring at one or more of the plurality of timecodes, and training parameters of the machine learning model to predict the expected output from the input; upon completion of the training, storing the parameters of the machine learning model; receiving a new video not in the training data set; providing features of the new video as input to the machine learning model; and predicting emotional reactions likely to be experienced by a first user when watching the new video as elicited by the features of the new video using the parameters of the machine learning model.

17. The computer-implemented method of claim 16, wherein a machine learning system comprises the machine learning model and at least one additional machine learning model trained to identify features in video content, the computer-implemented method further comprising: identifying, based on providing content of the new video to the at least one additional machine learning model, a feature timeline representing occurrence of the features of the new video at one or more of a plurality of timecodes of the new video, wherein providing features of the new video as input to the machine learning model comprises inputting the feature timeline of the new video into the machine learning model, and wherein predicting emotional reactions elicited by the features of the new video comprises predicting, using the machine learning model, a reaction timeline representing occurrence of at least one category of emotional reaction at one or more of the plurality of timecodes of the new video.

18. The computer-implemented method of claim 17, wherein the at least one additional machine learning model comprises a convolutional neural network trained to identify visual features in image frames of video content, the computer-implemented method further comprising: inputting image frames of the new video into the convolutional neural network; and using an output of the convolutional neural network to identify the feature timeline of the new video, wherein the feature timeline includes at least one visual feature occurring at one or more of the plurality of timecodes of the new video.

19. The computer-implemented method of claim 16 further comprising: determining a second user predicted to enjoy the new video based at least partly on output of the machine learning model; and outputting the new video for presentation to the second user.

20. The computer-implemented method of claim 16, wherein predicting the emotional reactions likely to be experienced by the first user when watching the new video comprises predicting, prior to the first user watching the new video, the emotional reactions likely to be experienced by the first user when watching the new video.

Patent Metadata

Filing Date

Unknown

Publication Date

November 16, 2021

Inventors

Charles Shearer Dorner
