11711581

Multimodal Sequential Recommendation with Window Co-Attention

Published: July 25, 2023
Assignee: not available in the USPTO data we have
Patent Claims
15 claims

Legal claims defining the scope of protection, as filed with the USPTO.

3. The method as recited in claim 2, the identifying the content item recommendation further including generating, by the modality-wise attentive encoder module and for each candidate content item of the multiple candidate content items, a corresponding candidate encoding based on a content item encoding of the content item identifier of the candidate content item and modality encodings of the multiple modalities for the content item.
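
Claim 3 does not say how the content item encoding and the modality encodings are combined into a candidate encoding. As a rough illustration only, the sketch below assumes dot-product attention in which the item-ID vector acts as the query over its modality vectors and the weighted sum is added back to the item vector. The function name `candidate_encoding`, the two-dimensional vectors, and the fusion rule are all hypothetical and are not taken from the patent.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def candidate_encoding(item_vec, modality_vecs):
    """Fuse an item-ID vector with its modality vectors (assumed fusion rule).

    The item vector scores each modality vector, the scores are normalized
    with a softmax, and the weighted modality vectors are added to the
    item vector.
    """
    scores = [dot(item_vec, m) for m in modality_vecs]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    fused = list(item_vec)
    for w, m in zip(weights, modality_vecs):
        for k, value in enumerate(m):
            fused[k] += w * value
    return fused, weights


# Toy inputs: one item-ID vector and two modality vectors (say, text and image).
item_vec = [1.0, 0.0]
modality_vecs = [[1.0, 0.0], [0.0, 1.0]]
fused, weights = candidate_encoding(item_vec, modality_vecs)
```

In this toy case the first modality vector is more similar to the item vector, so it receives the larger attention weight.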

4. The method as recited in claim 2, further comprising applying, in generating the feature outputs, a sliding window masking operation so that the feature outputs for each content item in the content item sequence are generated by analyzing the content item sequence encoding and interaction between the content item sequence encoding and the multiple modality encodings only for other content items in the content item sequence encoding within a threshold distance of the content item.

5. The method as recited in claim 4, further comprising generating the feature outputs by ignoring one or more content items in the content item sequence encoding that are not within the threshold distance of the content item.
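
The sliding window masking described in claims 4 and 5 can be sketched as a boolean mask applied before attention weights are normalized. This is a minimal sketch under stated assumptions: distance is taken to be the symmetric index distance |i - j| (the claims do not say whether the window is symmetric or looks only backward), and the helper names `sliding_window_mask` and `masked_softmax` are hypothetical.

```python
import math


def sliding_window_mask(seq_len, threshold):
    """mask[i][j] is True when position j is within `threshold` steps of i."""
    return [[abs(i - j) <= threshold for j in range(seq_len)]
            for i in range(seq_len)]


def masked_softmax(scores, mask_row):
    """Softmax over one row of scores; masked-out positions get weight 0."""
    kept = [s for s, keep in zip(scores, mask_row) if keep]
    peak = max(kept)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) if keep else 0.0
            for s, keep in zip(scores, mask_row)]
    total = sum(exps)
    return [e / total for e in exps]


# Five items in the sequence, window threshold of one position.
mask = sliding_window_mask(seq_len=5, threshold=1)

# Attention weights for the first item: only positions 0 and 1 contribute,
# and items outside the threshold distance are ignored.
weights = masked_softmax([0.2, 1.0, 0.5, 0.3, 0.9], mask[0])
```

Because each position is always within distance zero of itself, every row of the mask keeps at least one entry, so the normalization never divides by zero.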

6. The method as recited in claim 1, the creating including, for each content item in the content item sequence, maintaining a one-to-one correspondence between the content item encoding for the content item and each of the multiple modalities corresponding to the content item.
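
One way to picture the one-to-one correspondence in claim 6 is as index-aligned lists: position i in the item encoding list and position i in every modality list all refer to the same content item. The container below is a hypothetical illustration of that alignment check, not a structure described in the patent; the class name `SequenceEncodings` and its fields are assumptions.

```python
from dataclasses import dataclass


@dataclass
class SequenceEncodings:
    """Index-aligned encodings for one content item sequence (hypothetical)."""
    item_ids: list        # content item identifiers, in sequence order
    item_vecs: list       # one item encoding per identifier
    modality_vecs: dict   # modality name -> list with one vector per item

    def is_aligned(self):
        """True when every item has exactly one vector for each modality."""
        n = len(self.item_ids)
        return len(self.item_vecs) == n and all(
            len(vecs) == n for vecs in self.modality_vecs.values()
        )


aligned = SequenceEncodings(
    item_ids=["a", "b"],
    item_vecs=[[0.1], [0.2]],
    modality_vecs={"textual": [[1.0], [2.0]], "visual": [[3.0], [4.0]]},
)

# Missing the visual vector for the second item, so alignment is broken.
misaligned = SequenceEncodings(
    item_ids=["a", "b"],
    item_vecs=[[0.1], [0.2]],
    modality_vecs={"textual": [[1.0], [2.0]], "visual": [[3.0]]},
)
```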

7. The method as recited in claim 1, the multiple modalities including multiple of textual, visual, audible, and categorical.

8. The method as recited in claim 1, further comprising terminating presentation of the recommendation to the user in response to an event.

9. The method as recited in claim 1, wherein the content item modality encoding module generates an encoding of text information by generating a vector encoding of the text information.

10. The method as recited in claim 1, wherein the content item modality encoding module generates an encoding of visual information by generating a vector encoding of the visual information.

13. The computing device as recited in claim 12, the identifying the content item recommendation further including generating, by the modality-wise attentive encoder module and for each candidate content item of the multiple candidate content items, a corresponding candidate encoding based on a content item encoding of the content item identifier of the candidate content item and modality encodings of the multiple modalities for the content item.

14. The computing device as recited in claim 12, the operations further comprising applying, in generating the feature outputs, a sliding window masking operation so that the feature outputs for each content item in the content item sequence are generated by analyzing the content item sequence encoding and interaction between the content item sequence encoding and the multiple modality encodings only for other content items in the content item sequence encoding within a threshold distance of the content item.

15. The computing device as recited in claim 14, the operations further comprising generating the feature outputs by ignoring one or more content items in the content item sequence encoding that are not within the threshold distance of the content item.

16. The computing device as recited in claim 11, the creating including, for each content item in the content item sequence, maintaining a one-to-one correspondence between the content item encoding for the content item and each of the multiple modalities corresponding to the content item.

17. The computing device as recited in claim 16, the multiple modalities including multiple of textual, visual, audible, and categorical.

19. The system in claim 18, wherein creating the aggregated information encoding includes applying a sliding window masking operation so that data for each content item in the content item sequence and the multiple modality encodings are analyzed only for other content items in the content item sequence encoding within a threshold distance of the content item.

20. The system in claim 18, the multiple modalities including multiple of textual, visual, audible, and categorical.

Patent Metadata

Filing Date

Unknown

Publication Date

July 25, 2023

Inventors

Handong Zhao
Zhankui He
Zhe Lin
Zhaowen Wang
Ajinkya Gorakhnath Kale

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “Multimodal Sequential Recommendation with Window Co-Attention” (11711581). https://patentable.app/patents/11711581
