10050919

Highly Parallel Scalable Distributed Email Threading Algorithm

Published: August 14, 2018
Assignee: not available in USPTO data we have
Patent Claims
20 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method comprising:
  identifying, by a computing device, a plurality of email subjects in an email corpus stored on a storage device;
  performing on each computing node of a plurality of nodes, each of said nodes including one or more processors coupled to a memory:
    retrieving a given email subject from the plurality of email subjects, wherein the given email subject is only retrieved by a single node;
    identifying a plurality of emails which are associated with the given email subject;
    storing, on the node, a plurality of emails associated with the given email subject; and
    reconstructing one or more email threads from the plurality of emails by determining relationships between the plurality of emails based at least in part on:
      header information responsive to determining the header information is in a given format; and
      content of the plurality of emails responsive to determining the header information is not in the given format; and
  conveying the reconstructed one or more email threads to a database for storage.
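The reconstruction step of claim 1 can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the `Email` record, the angle-bracket test standing in for "a given format", and the quoted-body containment test standing in for content-based matching. The claims do not specify any of these details.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Email:
    msg_id: str
    in_reply_to: Optional[str]  # None when the header is missing or unusable
    body: str


def header_usable(e: Email) -> bool:
    # Assumed "given format": an angle-bracketed identifier such as <id@host>.
    ref = e.in_reply_to
    return ref is not None and ref.startswith("<") and ref.endswith(">")


def find_parent(e: Email, emails: list) -> Optional[Email]:
    if header_usable(e):
        # Header path: follow the In-Reply-To reference.
        for c in emails:
            if c.msg_id == e.in_reply_to:
                return c
        return None
    # Content path (assumed heuristic): the parent is the longest email
    # whose body appears verbatim inside this email's body.
    candidates = [
        c for c in emails
        if c is not e and c.body.strip()
        and len(c.body) < len(e.body) and c.body in e.body
    ]
    return max(candidates, key=lambda c: len(c.body), default=None)


def reconstruct_threads(emails: list) -> dict:
    """Group message ids by the id of their thread root."""
    parent = {e.msg_id: find_parent(e, emails) for e in emails}
    threads = {}
    for e in emails:
        root, seen = e, set()
        # Walk up to the root; the seen set guards against reference cycles.
        while parent[root.msg_id] is not None and root.msg_id not in seen:
            seen.add(root.msg_id)
            root = parent[root.msg_id]
        threads.setdefault(root.msg_id, []).append(e.msg_id)
    return threads
```

In this sketch all emails passed to `reconstruct_threads` are assumed to share one subject already, matching the per-subject grouping the claim describes.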

2. The method as recited in claim 1, further comprising utilizing a distributed queue accessible by each of the plurality of nodes for storing the plurality of email subjects, and wherein each of the plurality of nodes are configured to perform said retrieving, storing, and reconstructing in parallel.
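As a rough illustration of the queue in claim 2, the sketch below uses Python's in-process `queue.Queue` and threads as stand-ins for a distributed queue and separate computing nodes. The function name `run_nodes` and the `handle` callback are hypothetical. The point it shows is that each subject is dequeued exactly once, so only one node ever works on it.

```python
import queue
import threading


def run_nodes(subjects, num_nodes, handle):
    """Drain a shared queue of subjects with num_nodes parallel workers.

    Returns a list of (node_id, subject) pairs recording which node
    claimed which subject.
    """
    q = queue.Queue()
    for s in subjects:
        q.put(s)

    claimed = []
    lock = threading.Lock()

    def node(node_id):
        while True:
            try:
                subject = q.get_nowait()  # each item is handed out once
            except queue.Empty:
                return
            handle(subject)  # thread this subject's emails
            with lock:
                claimed.append((node_id, subject))

    workers = [threading.Thread(target=node, args=(i,)) for i in range(num_nodes)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return claimed
```

A real deployment would replace the in-process queue with a networked one; that choice is outside what the claim text specifies.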

3. The method as recited in claim 1, further comprising organizing a database using an email subject as a primary key and a relaxed hash of derived email content as a secondary key.
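The claims do not define "relaxed hash" or "derived email content". One possible reading, sketched below purely as an assumption, is a hash computed after normalizing the body (dropping quoted lines, collapsing whitespace, lowercasing) so that near-identical copies collide. The subject normalization and the in-memory `index` dictionary are likewise hypothetical stand-ins for the database.

```python
import hashlib
import re


def normalize_subject(subject: str) -> str:
    # Assumed normalization: strip leading "Re:" / "Fw:" / "Fwd:" prefixes.
    stripped = re.sub(r"^\s*((re|fwd?)\s*:\s*)+", "", subject, flags=re.IGNORECASE)
    return stripped.strip().lower()


def relaxed_hash(body: str) -> str:
    # Assumed "derived content": the body without quoted lines,
    # with whitespace collapsed and case folded.
    lines = [ln for ln in body.splitlines() if not ln.lstrip().startswith(">")]
    text = re.sub(r"\s+", " ", " ".join(lines)).strip().lower()
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def put(index: dict, subject: str, body: str, msg_id: str) -> tuple:
    """Store msg_id under (primary key, secondary key) and return the key."""
    key = (normalize_subject(subject), relaxed_hash(body))
    index.setdefault(key, []).append(msg_id)
    return key
```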

4. The method as recited in claim 1, further comprising: determining a status of an email thread is incomplete prior to conveying the email thread to the database responsive to determining a message identifier in the email thread references an email that is not found in the plurality of emails associated with the given email subject; and marking the email thread as incomplete with a list of message identifiers with missing emails responsive to determining the status is incomplete.
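The completeness check in claim 4 amounts to a set difference between referenced and present message identifiers. The sketch below assumes a hypothetical representation in which each email is a dict with a `msg_id` string and a `references` list.

```python
def thread_status(thread: list) -> dict:
    """Mark a thread incomplete when it references emails that are absent."""
    present = {e["msg_id"] for e in thread}
    referenced = {ref for e in thread for ref in e["references"]}
    missing = sorted(referenced - present)
    return {"complete": not missing, "missing_ids": missing}
```

For example, a thread whose only email references `<a>` would be reported as incomplete with `missing_ids` equal to `["<a>"]`.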

5. The method as recited in claim 4, further comprising: identifying, on a first node, a first email thread as an incomplete email thread; identifying, on a second node, a second email thread as an incomplete email thread; and merging the first email thread and the second email thread responsive to determining that a first email of the first email thread references a second email of the second email thread.

6. The method as recited in claim 5, wherein the first email thread is associated with a first email subject, and wherein the second email thread is associated with a second email subject.
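Claims 5 and 6 describe merging two incomplete threads, possibly filed under different subjects, when an email in the first references an email in the second. A minimal sketch follows, assuming each thread is a list of dicts with hypothetical `msg_id`, `subject`, and `references` fields. How the two nodes exchange their incomplete threads is not modeled here.

```python
def merge_if_linked(first: list, second: list):
    """Merge two incomplete threads if an email in `first` references one in `second`.

    Returns None when no such reference exists.
    """
    ids_second = {e["msg_id"] for e in second}
    refs_first = {ref for e in first for ref in e["references"]}
    if not refs_first & ids_second:
        return None
    merged = first + second
    present = {e["msg_id"] for e in merged}
    # The merged thread may still be incomplete; recompute what is missing.
    missing = sorted({ref for e in merged for ref in e["references"]} - present)
    return {"emails": merged, "missing_ids": missing}
```

Because the link is found through message identifiers rather than subjects, a reply whose subject line was edited can still be attached to its original thread.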

7. The method as recited in claim 1, further comprising: receiving an incremental batch of emails; and threading the incremental batch of emails by forming threads from the batch of emails, wherein threading the incremental batch of emails only updates threads related to emails in the incremental batch of emails.
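The incremental behavior in claim 7 can be sketched as follows. The data layout is an assumption: `threads` maps a thread id to its message ids, `owner` maps a message id to its thread id, and each batch email is a dict with `msg_id` and `in_reply_to`. The sketch also assumes the batch lists parents before their replies.

```python
def thread_incremental(threads: dict, owner: dict, batch: list) -> set:
    """Attach batch emails to existing threads or start new ones.

    Returns the ids of the threads that were touched; all others stay unchanged.
    """
    touched = set()
    for e in batch:
        # Reuse the parent's thread when the parent is known;
        # otherwise this email starts a new thread keyed by its own id.
        thread_id = owner.get(e["in_reply_to"], e["msg_id"])
        threads.setdefault(thread_id, []).append(e["msg_id"])
        owner[e["msg_id"]] = thread_id
        touched.add(thread_id)
    return touched
```

Only the threads in the returned set would need to be written back to storage, which is the property the claim emphasizes.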

8. A system comprising:
  a database; and
  a plurality of computing nodes, each of said nodes including one or more processors coupled to a memory;
  wherein each node of the plurality of nodes is configured to:
    retrieve a given email subject from a plurality of email subjects, wherein the given email subject is only retrieved by a single node;
    store, on the node, a plurality of emails associated with a corresponding email subject, wherein the plurality of emails are retrieved from the database; and
    reconstruct one or more email threads from the plurality of emails by determining relationships between the plurality of emails based at least in part on:
      header information responsive to determining the header information is in a given format; and
      content of the plurality of emails responsive to determining the header information is not in the given format; and
    convey the reconstructed one or more email threads to a database for storage.

9. The system as recited in claim 8, wherein each of the plurality of nodes are configured to perform said retrieving, storing, and reconstructing in parallel.

10. The system as recited in claim 9, wherein the database is configured to use an email subject as a primary key and a relaxed hash of derived email content as a secondary key.

11. The system as recited in claim 9, wherein determining the relationships between the plurality of emails further comprises removing a sent date from the header information.
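Claim 11 does not say why the sent date is removed. One plausible reading, offered only as an assumption, is that copies of the same message can carry different timestamps, so the date is dropped before headers are compared. The sketch below uses Python's standard `email` parser; the helper name `headers_without_date` is hypothetical.

```python
from email import message_from_string


def headers_without_date(raw: str) -> dict:
    """Parse raw message text and return its headers minus the Date field."""
    msg = message_from_string(raw)
    return {name.lower(): value for name, value in msg.items() if name.lower() != "date"}
```

With the date removed, two copies of a message that differ only in their `Date` header yield identical header dictionaries.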

12. The system as recited in claim 8, wherein a first node of the plurality of nodes is configured to identify a first email thread as an incomplete email thread, wherein a second node of the plurality of nodes is configured to identify a second email thread as an incomplete email thread, and wherein the first node is further configured to merge the first email thread and the second email thread responsive to determining that the first email of the first email thread references the second email of the second email thread.

13. The system as recited in claim 12, wherein the first email thread is associated with a first email subject, and wherein the second email thread is associated with a second email subject.

14. The system as recited in claim 8, wherein the system is further configured to: receive an incremental batch of emails; and thread the incremental batch of emails by forming threads from the batch of emails, wherein threading the incremental batch of emails only updates threads related to emails in the incremental batch of emails.

15. A non-transitory computer readable storage medium storing program instructions, wherein the program instructions are executable by a processor to:
  identify a plurality of email subjects in an email corpus stored on a storage device;
  perform on each computing node of a plurality of nodes, each of said nodes including one or more processors coupled to a memory:
    retrieve a given email subject from the plurality of email subjects, wherein the given email subject is only retrieved by a single node;
    identify a plurality of emails which are associated with the given email subject;
    store, on the node, a plurality of emails associated with the given email subject; and
    reconstruct one or more email threads from the plurality of emails by determining relationships between the plurality of emails based at least in part on:
      header information responsive to determining the header information is in a given format; and
      content of the plurality of emails responsive to determining the header information is not in the given format; and
  convey the reconstructed one or more email threads to a database for storage.

16. The non-transitory computer readable storage medium as recited in claim 15, wherein the program instructions are further executable by a processor to utilize a distributed queue accessible by each of the plurality of nodes for storing the plurality of email subjects, and wherein each of the plurality of nodes are configured to perform said retrieving, storing, and reconstructing in parallel.

17. The non-transitory computer readable storage medium as recited in claim 15, wherein the program instructions are further executable by a processor to organize a database using an email subject as a primary key and a relaxed hash of derived email content as a secondary key.

18. The non-transitory computer readable storage medium as recited in claim 15, wherein the program instructions are further executable by a processor to: determine a status of an email thread is incomplete prior to conveying the email thread to the database responsive to determining a message identifier in the email thread references an email that is not found in the plurality of emails associated with the given email subject; and mark the email thread as incomplete with a list of message identifiers with missing emails responsive to determining the status is incomplete.

19. The non-transitory computer readable storage medium as recited in claim 18, wherein the program instructions are further executable by a processor to: identify, on a first node, a first email thread as an incomplete email thread; identify, on a second node, a second email thread as an incomplete email thread; and merge the first email thread and the second email thread responsive to determining that a first email of the first email thread references a second email of the second email thread.

20. The non-transitory computer readable storage medium as recited in claim 19, wherein the first email thread is associated with a first email subject, and wherein the second email thread is associated with a second email subject.

Patent Metadata

Filing Date: Unknown
Publication Date: August 14, 2018
Inventors: Nilesh Salpe, Vaijayanti Bharadwaj

Cite as: Patentable. “HIGHLY PARALLEL SCALABLE DISTRIBUTED EMAIL THREADING ALGORITHM” (10050919). https://patentable.app/patents/10050919

© 2026 Patentable. All rights reserved.

Patentable is a research and drafting-assistant tool, not a law firm, and does not provide legal advice. Documents we generate are drafts for review by a licensed patent attorney.