Multimodal Sequential Recommendation with Window Co-Attention

Published: July 25, 2023

Assignee: not available in the USPTO data we have

Inventors: Handong Zhao, Zhankui He, Zhe Lin, Zhaowen Wang, Ajinkya Gorakhnath Kale

Technical Abstract

Patent Claims

15 claims

Legal claims defining the scope of protection, as filed with the USPTO.

3. The method as recited in claim 2, the identifying the content item recommendation further including generating, by the modality-wise attentive encoder module and for each candidate content item of the multiple candidate content items, a corresponding candidate encoding based on a content item encoding of the content item identifier of the candidate content item and modality encodings of the multiple modalities for the content item.

4. The method as recited in claim 2, further comprising applying, in generating the feature outputs, a sliding window masking operation so that the feature outputs for each content item in the content item sequence are generated by analyzing the content item sequence encoding and interaction between the content item sequence encoding and the multiple modality encodings only for other content items in the content item sequence encoding within a threshold distance of the content item.

5. The method as recited in claim 4, further comprising generating the feature outputs by ignoring one or more content items in the content item sequence encoding that are not within the threshold distance of the content item.

6. The method as recited in claim 1, the creating including, for each content item in the content item sequence, maintaining a one-to-one correspondence between the content item encoding for the content item and each of the multiple modalities corresponding to the content item.

7. The method as recited in claim 1, the multiple modalities including multiple of textual, visual, audible, and categorical.

8. The method as recited in claim 1, further comprising terminating presentation of the recommendation to the user in response to an event.

9. The method as recited in claim 1, wherein the content item modality encoding module generates an encoding of text information by generating a vector encoding of the text information.

10. The method as recited in claim 1, wherein the content item modality encoding module generates an encoding of visual information by generating a vector encoding of the visual information.

13. The computing device as recited in claim 12, the identifying the content item recommendation further including generating, by the modality-wise attentive encoder module and for each candidate content item of the multiple candidate content items, a corresponding candidate encoding based on a content item encoding of the content item identifier of the candidate content item and modality encodings of the multiple modalities for the content item.

14. The computing device as recited in claim 12, the operations further comprising applying, in generating the feature outputs, a sliding window masking operation so that the feature outputs for each content item in the content item sequence are generated by analyzing the content item sequence encoding and interaction between the content item sequence encoding and the multiple modality encodings only for other content items in the content item sequence encoding within a threshold distance of the content item.

15. The computing device as recited in claim 14, the operations further comprising generating the feature outputs by ignoring one or more content items in the content item sequence encoding that are not within the threshold distance of the content item.

16. The computing device as recited in claim 11, the creating including, for each content item in the content item sequence, maintaining a one-to-one correspondence between the content item encoding for the content item and each of the multiple modalities corresponding to the content item.

17. The computing device as recited in claim 16, the multiple modalities including multiple of textual, visual, audible, and categorical.

19. The system in claim 18, wherein creating the aggregated information encoding includes applying a sliding window masking operation so that data for each content item in the content item sequence and the multiple modality encodings are analyzed only for other content items in the content item sequence encoding within a threshold distance of the content item.

20. The system in claim 18, the multiple modalities including multiple of textual, visual, audible, and categorical.
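Claims 4, 5, 14, 15, and 19 describe a sliding window masking operation: the feature outputs for a content item are computed only from items within a threshold distance of it, and items outside that distance are ignored. The following is a minimal sketch of that idea, not the patented implementation. It assumes a symmetric window, plain scaled dot-product attention, and a single attention pass rather than the co-attention between the sequence encoding and the modality encodings that the claims recite. The names `sliding_window_mask`, `masked_attention`, and `threshold` are hypothetical and do not come from the patent.

```python
import math


def sliding_window_mask(seq_len, threshold):
    """Return a boolean matrix; mask[i][j] is True when item j lies
    within `threshold` positions of item i (assumed symmetric window)."""
    return [[abs(i - j) <= threshold for j in range(seq_len)]
            for i in range(seq_len)]


def masked_attention(queries, keys, values, threshold):
    """Scaled dot-product attention restricted to a sliding window.

    Illustrative only: each output row is a weighted average of the
    value rows whose positions fall inside the window around that row.
    """
    n = len(queries)
    dim = len(queries[0])
    mask = sliding_window_mask(n, threshold)
    outputs = []
    for i in range(n):
        # Score only positions inside the window; all others are ignored.
        scores = {
            j: sum(q * k for q, k in zip(queries[i], keys[j])) / math.sqrt(dim)
            for j in range(n) if mask[i][j]
        }
        # Softmax over the in-window scores (shifted for numerical stability).
        peak = max(scores.values())
        weights = {j: math.exp(s - peak) for j, s in scores.items()}
        total = sum(weights.values())
        outputs.append([
            sum(weights[j] / total * values[j][d] for j in weights)
            for d in range(len(values[0]))
        ])
    return outputs
```

With `threshold=0` each item attends only to itself, so the output equals the input values. Raising the threshold widens the neighborhood that contributes to each feature output.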

Patent Metadata

Filing Date: Unknown

Publication Date: July 25, 2023

Inventors: Handong Zhao, Zhankui He, Zhe Lin, Zhaowen Wang, Ajinkya Gorakhnath Kale
