US-8938508

Correlating web and email attributes to detect spam

Published: January 20, 2015
Assignee: not available in USPTO data we have
Inventors: not available in USPTO data we have
Technical Abstract

A computer correlates web and email attributes to detect spam. A security module on a client collects attributes of a web site to which an email address was submitted and attributes of an email message sent to the email address that was previously submitted. The security module analyzes the attributes of the web site and the email message to determine whether the email message was sent to the email address responsive to the submission of the email address to the web site. Based on the analysis, the security module determines whether the email message is spam. A machine learning module on a security server establishes training data describing the attributes of the web site to which email addresses were submitted and attributes of legitimate emails received in response to the address submissions. The machine learning module generates an attributes classifier for the security module for spam detection.
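The client-side bookkeeping described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name `SubmissionLog`, its methods, and the attribute keys are all hypothetical. It only shows how a security module might remember which web sites an email address was submitted to, so that a later message to that address can be paired with those sites for analysis.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Attributes = Dict[str, str]  # hypothetical flat attribute record


class SubmissionLog:
    """Remembers the web sites each email address was submitted to (illustrative only)."""

    def __init__(self) -> None:
        self._sites: Dict[str, List[Attributes]] = defaultdict(list)

    def record_submission(self, email_address: str, site_attrs: Attributes) -> None:
        # Called when the user submits an email address to a web site.
        self._sites[email_address.lower()].append(site_attrs)

    def candidates(self, recipient: str, mail_attrs: Attributes) -> List[Tuple[Attributes, Attributes]]:
        # Pair the incoming message with every site the recipient address was given to.
        return [(site, mail_attrs) for site in self._sites.get(recipient.lower(), [])]


log = SubmissionLog()
log.record_submission("User@client.example", {"dns_name": "shop.example"})
pairs = log.candidates("user@client.example", {"from_domain": "shop.example"})
```

Each (site, message) pair would then be handed to the attributes classifier; a message addressed to an address with no recorded submission yields no pairs.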

Patent Claims
18 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method of detecting spam email messages comprising: using a computer to perform steps comprising: collecting attributes of a web site to which an email address was submitted; collecting attributes of an email message sent to the email address; identifying a degree of correlation between at least one of the collected attributes of the web site and at least one of the collected attributes of the email message, the identifying comprising using a classifier to analyze the at least one collected attribute of the web site and the at least one collected attribute of the email message, wherein the analysis is based at least in part on a plurality of weights describing different values that represent the relative importances of the collected attributes of the web site and email message, wherein the classifier is generated by training on training data describing attributes of training web sites to which email addresses were submitted and legitimate emails received responsive to the submissions of the email addresses to the training web sites, generating the classifier comprising: generating feature vectors from the training data, the feature vectors having features describing the attributes of the training web sites and having features describing the attributes of the legitimate emails received responsive to the submissions of the email addresses to the training web sites; and training the classifier using the feature vectors, the training causing the classifier to learn weights describing relative importances of the features in recognizing when email messages are received in response to email addresses submitted to web sites; and determining whether the email message is spam responsive at least in part to the degree of correlation, a stronger correlation indicating a decreased likelihood that the email message is spam.
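To make the weighted-correlation idea in claim 1 concrete, here is a small sketch under stated assumptions. The three pairwise features, the attribute keys, the frequency-based "training", and the 0.5 threshold are all invented for illustration; the claim does not prescribe a particular learning algorithm or feature set. The sketch builds feature vectors from (web site, legitimate email) pairs, derives one weight per feature, and treats a stronger weighted correlation as a lower likelihood of spam.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Attrs = Dict[str, str]


def _same(a: Optional[str], b: Optional[str]) -> float:
    # 1.0 only when both values are present and equal.
    return float(a is not None and a == b)


def feature_vector(site: Attrs, mail: Attrs) -> List[float]:
    """Hypothetical pairwise features comparing a web site with an email message."""
    return [
        _same(site.get("dns_name"), mail.get("from_domain")),
        _same(site.get("country"), mail.get("country")),
        _same(site.get("registrar"), mail.get("registrar")),
    ]


def learn_weights(legit_pairs: Sequence[Tuple[Attrs, Attrs]]) -> List[float]:
    """Toy training: weight each feature by how often it holds in legitimate pairs."""
    vectors = [feature_vector(site, mail) for site, mail in legit_pairs]
    totals = [sum(column) for column in zip(*vectors)]
    norm = sum(totals) or 1.0
    return [t / norm for t in totals]  # weights sum to 1 when any feature fired


def correlation(weights: Sequence[float], site: Attrs, mail: Attrs) -> float:
    return sum(w * x for w, x in zip(weights, feature_vector(site, mail)))


def is_spam(weights: Sequence[float], site: Attrs, mail: Attrs, threshold: float = 0.5) -> bool:
    # A stronger correlation means the message is less likely to be spam.
    return correlation(weights, site, mail) < threshold


LEGIT_PAIRS = [
    ({"dns_name": "shop.example", "country": "US", "registrar": "R1"},
     {"from_domain": "shop.example", "country": "US", "registrar": "R1"}),
    ({"dns_name": "news.example", "country": "DE", "registrar": "R2"},
     {"from_domain": "news.example", "country": "DE", "registrar": "R3"}),
]
weights = learn_weights(LEGIT_PAIRS)
```

With this toy data the domain and country features each receive twice the weight of the registrar feature, because the registrar matched in only one of the two legitimate pairs.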

2. The method of claim 1, wherein collecting attributes of the web site to which an email address was submitted comprises: collecting one or more primary attributes describing the web site; and collecting one or more secondary attributes derived from the primary attributes.

3. The method of claim 2, wherein the primary attributes describing the web site comprise at least one of an Internet Protocol (IP) address and a Domain Name System (DNS) name of a web server hosting the web site.

4. The method of claim 2, wherein the secondary attributes derived from the primary attributes comprise at least one of geolocation data describing a geographic location of a web server hosting the web site, whether an IP address of the web server is known to be associated with an Internet Service Provider (ISP), information about a domain name registrar at which the DNS name for the web server is registered, and information about a registrant of the DNS name.
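The split between primary and secondary web site attributes in claims 2 through 4 might be modeled as below. This is only a sketch: the dataclass names and the three in-memory lookup tables are hypothetical stand-ins for whatever geolocation, ISP, and WHOIS sources a real system would query, and the addresses come from documentation-reserved ranges.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SitePrimary:
    ip_address: str  # IP address of the web server hosting the site
    dns_name: str    # DNS name of that web server


@dataclass(frozen=True)
class SiteSecondary:
    country: Optional[str]     # geolocation derived from the IP address
    is_isp_address: bool       # whether the IP is known to belong to an ISP
    registrar: Optional[str]   # registrar of the DNS name
    registrant: Optional[str]  # registrant of the DNS name


# Stand-in data sources (assumptions for illustration, not real services).
GEO_BY_IP: Dict[str, str] = {"203.0.113.10": "US"}
ISP_IPS = {"198.51.100.7"}
WHOIS_BY_DOMAIN: Dict[str, Dict[str, str]] = {
    "shop.example": {"registrar": "Example Registrar", "registrant": "Example Shop LLC"},
}


def derive_secondary(primary: SitePrimary) -> SiteSecondary:
    """Derive secondary attributes from the primary ones via the stand-in tables."""
    whois = WHOIS_BY_DOMAIN.get(primary.dns_name, {})
    return SiteSecondary(
        country=GEO_BY_IP.get(primary.ip_address),
        is_isp_address=primary.ip_address in ISP_IPS,
        registrar=whois.get("registrar"),
        registrant=whois.get("registrant"),
    )


secondary = derive_secondary(SitePrimary("203.0.113.10", "shop.example"))
```

Unknown IP addresses or domains simply yield `None` fields here, which leaves it to the downstream analysis to decide how to treat missing attributes.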

5. The method of claim 1, wherein collecting attributes of an email message sent to the email address comprises: collecting one or more primary attributes describing the email message; and collecting one or more secondary attributes derived from the primary attributes.

6. The method of claim 5, wherein the primary attributes describing the email message comprise at least one of a DNS name of a “from” address of the email message, an IP address of a mail server involved in sending the email message, a DNS name of the mail server involved in sending the email message, and attributes of a mail session involved in transmitting the email message.

7. The method of claim 5, wherein the secondary attributes derived from the primary attributes comprise at least one of geolocation data describing a geographic location of the mail server involved in sending the email message, whether the IP address of the mail server is known to be associated with an ISP, information about a domain name registrar at which the DNS of the web server is registered, and information about a registrant of the DNS name.
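As an illustration of collecting the primary email attributes named in claim 6, the sketch below pulls the domain of the "from" address and any bracketed IPv4 addresses in the Received headers out of a raw message, using Python's standard `email` package. The function name, the returned keys, and the sample message are assumptions made for this example; real Received headers vary widely and a production parser would need to be more tolerant.

```python
import re
from email import message_from_string
from email.utils import parseaddr
from typing import Dict, List, Union

# Matches IPv4 literals written in square brackets, as commonly seen in Received headers.
BRACKETED_IPV4 = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def primary_email_attributes(raw: str) -> Dict[str, Union[str, List[str]]]:
    """Extract a few primary attributes from a raw RFC 5322 style message."""
    msg = message_from_string(raw)
    _, address = parseaddr(msg.get("From", ""))
    from_domain = address.rpartition("@")[2].lower()
    received = msg.get_all("Received") or []
    relay_ips = [ip for header in received for ip in BRACKETED_IPV4.findall(header)]
    return {"from_domain": from_domain, "relay_ips": relay_ips}


RAW = (
    "Received: from mail.shop.example (mail.shop.example [203.0.113.25])\n"
    "\tby mx.client.example; Tue, 20 Jan 2015 10:00:00 +0000\n"
    "From: Shop <orders@shop.example>\n"
    "To: user@client.example\n"
    "Subject: Welcome\n"
    "\n"
    "Thanks for signing up.\n"
)
attrs = primary_email_attributes(RAW)
```

Secondary attributes such as geolocation or registrar information could then be derived from `from_domain` and `relay_ips` by lookups against external data sources.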

8. A non-transitory computer-readable storage medium storing executable computer program instructions for detecting spam email messages, the computer program instructions comprising instructions for: collecting attributes of a web site to which an email address was submitted; collecting attributes of an email message sent to the email address; identifying a degree of correlation between at least one of the collected attributes of the web site and at least one of the collected attributes of the email message, the identifying comprising using a classifier to analyze the at least one collected attribute of the web site and the at least one collected attribute of the email message, wherein the analysis is based at least in part on a plurality of weights describing different values that represent the relative importances of the collected attributes of the web site and email message, wherein the classifier is generated by training on training data describing attributes of training web sites to which email addresses were submitted and legitimate emails received responsive to the submissions of the email addresses to the training web sites, generating the classifier comprising: generating feature vectors from the training data, the feature vectors having features describing the attributes of the training web sites and having features describing the attributes of the legitimate emails received responsive to the submissions of the email addresses to the training web sites; and training the classifier using the feature vectors, the training causing the classifier to learn weights describing relative importances of the features in recognizing when email messages are received in response to email addresses submitted to web sites; and determining whether the email message is spam responsive at least in part to the degree of correlation, a stronger correlation indicating a decreased likelihood that the email message is spam.

9. The computer-readable storage medium of claim 8, wherein the computer program instructions for collecting attributes of the web site to which an email address was submitted comprise instructions for: collecting one or more primary attributes describing the web site; and collecting one or more secondary attributes derived from the primary attributes.

10. The computer-readable storage medium of claim 9, wherein the primary attributes describing the web site comprise at least one of an IP address and a DNS name of a web server hosting the web site.

11. The computer-readable storage medium of claim 9, wherein the secondary attributes derived from the primary attributes comprise at least one of geolocation data describing a geographic location of a web server hosting the web site, whether an IP address of the web server is known to be associated with an ISP, information about a domain name registrar at which the DNS name for the web server is registered, and information about a registrant of the DNS name.

12. The computer-readable storage medium of claim 8, wherein the computer program instructions for collecting attributes of an email message sent to the email address comprise instructions for: collecting one or more primary attributes describing the email message; and collecting one or more secondary attributes derived from the primary attributes.

13. The computer-readable storage medium of claim 12, wherein the primary attributes describing the email message comprise at least one of a DNS name of a “from” address of the email message, an IP address of a mail server involved in sending the email message, a DNS name of the mail server involved in sending the email message, and attributes of a mail session involved in transmitting the email message.

14. The computer-readable storage medium of claim 12, wherein the secondary attributes derived from the primary attributes comprise at least one of geolocation data describing a geographic location of the mail server involved in sending the email message, whether the IP address of the mail server is known to be associated with an ISP, information about a domain name registrar at which the DNS of the web server is registered, and information about a registrant of the DNS name.

15. A computer system for detecting spam email messages comprising: a processor for executing computer program instructions; and a non-transitory computer-readable medium storing executable computer program instructions executable to perform steps comprising: collecting attributes of a web site to which an email address was submitted; collecting attributes of an email message sent to the email address; identifying a degree of correlation between at least one of the collected attributes of the web site and at least one of the collected attributes of the email message, the identifying comprising using a classifier to analyze the at least one collected attribute of the web site and the at least one collected attribute of the email message, wherein the analysis is based at least in part on a plurality of weights describing different values that represent the relative importances of the collected attributes of the web site and email message, wherein the classifier is generated by training on training data describing attributes of training web sites to which email addresses were submitted and legitimate emails received responsive to the submissions of the email addresses to the training web sites, generating the classifier comprising: generating feature vectors from the training data, the feature vectors having features describing the attributes of the training web sites and having features describing the attributes of the legitimate emails received responsive to the submissions of the email addresses to the training web sites; and training the classifier using the feature vectors, the training causing the classifier to learn weights describing relative importances of the features in recognizing when email messages are received in response to email addresses submitted to web sites; and determining whether the email message is spam responsive at least in part to the degree of correlation, a stronger correlation indicating a decreased likelihood that the email message is spam.

16. The computer system of claim 15, wherein collecting attributes of the web site to which an email address was submitted comprises: collecting one or more primary attributes describing the web site; and collecting one or more secondary attributes derived from the primary attributes.

17. The computer system of claim 15, wherein the classifier is generated by training on training data describing attributes of web sites to which email addresses were submitted and legitimate emails received responsive to the submissions of the email addresses to the web sites.

18. The method of claim 1, wherein generating the classifier comprises generating a decision tree classifier having a tree structure including branch nodes and leaf nodes, wherein a leaf node represents a classification indicating whether an attribute of a web site is important in spam detection and contains a confidence score measuring a confidence in the represented classification.
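The tree structure described in claim 18 can be pictured with a small data-structure sketch. Everything here is hypothetical: the node classes, the boolean attribute tests, and the hand-built example tree are illustrative choices, and the claim does not say how the tree is grown or what each branch tests. The sketch only shows branch nodes routing an observation to a leaf that carries a classification together with a confidence score.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    important: bool    # classification represented by this leaf
    confidence: float  # confidence in that classification, from 0.0 to 1.0


@dataclass
class Branch:
    attribute: str     # name of the boolean attribute tested at this node
    if_true: "Node"
    if_false: "Node"


Node = Union[Leaf, Branch]


def classify(node: Node, observed: Dict[str, bool]) -> Leaf:
    """Walk from the root to a leaf, following the observed attribute values."""
    while isinstance(node, Branch):
        # Attributes missing from the observation are treated as False.
        node = node.if_true if observed.get(node.attribute, False) else node.if_false
    return node


# Hand-built example tree (assumed values, not learned from data).
TREE = Branch(
    "same_domain",
    Leaf(important=True, confidence=0.9),
    Branch(
        "same_country",
        Leaf(important=True, confidence=0.6),
        Leaf(important=False, confidence=0.8),
    ),
)
result = classify(TREE, {"same_domain": False, "same_country": True})
```

In this example the observation fails the first test and passes the second, so it lands on the leaf with confidence 0.6.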

Patent Metadata

Filing Date

July 22, 2010

Publication Date

January 20, 2015

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “Correlating web and email attributes to detect spam” (US-8938508). https://patentable.app/patents/US-8938508
