A URL verification service is provided that is used to evaluate the trustworthiness of universal resource locators (URLs). As a user browses the world wide web, the URL for a web page to which the user is browsing is captured by the service. The URL has a second level domain corresponding to a web site. The URL verification service identifies a proposed brand that should be associated with the URL if the URL is trustworthy. The proposed brand and the second level domain are used as database queries to query a database such as a search engine database. The results of the database query are processed to determine whether the proposed brand is legitimately associated with the URL. To ensure that the proposed brand is identified accurately, the URL verification service gathers brand information using web page content, secure sockets layer certificate content, or other web site attributes.
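As an illustration only, the brand identification step could be sketched roughly as follows in Python. The sketch extracts brand text by discarding the top level domain from the second level domain, then checks it against a web site attribute (here a page title) using the collapsed-string, acronym, and numeric-substitution comparisons recited in the claims. All function names, the `NUMERIC_WORDS` table, and the restriction to single-label top level domains are assumptions made for this example and do not come from the patent.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical, deliberately tiny table of "linguistically-equivalent"
# substitutions for digits; a real table would be larger.
NUMERIC_WORDS = {"1": "one", "2": "to", "4": "for", "8": "ate"}


def second_level_domain(url: str) -> str:
    """Return the last two host labels, e.g. 'citibank.com'.

    Assumption: the top level domain is a single label (no 'co.uk').
    """
    host = (urlsplit(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])


def brand_text(sld: str) -> str:
    """Discard the top level domain from the second level domain."""
    return sld.rsplit(".", 1)[0]


def collapsed(text: str) -> str:
    """Remove whitespace and lowercase the attribute text."""
    return re.sub(r"\s+", "", text).lower()


def is_acronym(candidate: str, attribute_text: str) -> bool:
    """Compare the candidate to the word-start characters of the text."""
    words = re.findall(r"[A-Za-z0-9]+", attribute_text)
    initials = "".join(word[0] for word in words).lower()
    return len(candidate) > 1 and candidate in initials


def substitute_numerals(candidate: str) -> str:
    """Replace each digit with its assumed word equivalent."""
    return "".join(NUMERIC_WORDS.get(ch, ch) for ch in candidate)


def propose_brand(url: str, attribute_text: str) -> Optional[str]:
    """Return the URL's brand text if the attribute corroborates it."""
    brand = brand_text(second_level_domain(url))
    if not brand:
        return None
    for candidate in (brand, substitute_numerals(brand)):
        if candidate in collapsed(attribute_text):
            return brand
        if is_acronym(candidate, attribute_text):
            return brand
    return None
```

Under these assumptions, `propose_brand("http://www.bankofamerica.com/login", "Bank of America | Online Banking")` yields `"bankofamerica"` through the collapsed-string comparison, while a page whose title has no relation to its domain yields `None`.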
Legal claims defining the scope of protection, as filed with the USPTO.
1. A computer-implemented method for determining whether or not web page universal resource locators (URLs) are associated with trusted or fraudulent web sites, comprising: as a user browses the world wide web using computing equipment, obtaining a URL for a web page of unknown authenticity, wherein the URL comprises a second level domain and wherein the second level domain comprises a top level domain; obtaining, with computing equipment, a web site attribute that is associated with the URL and separate from the URL; identifying, with computing equipment, a proposed brand that is associated with the URL using information from the URL and information from the web site attribute, wherein identifying the proposed brand comprises: extracting proposed brand text from the URL by discarding the top level domain from the second level domain; and comparing the extracted proposed brand text with the information from the web site attribute to identify the proposed brand; querying, with computing equipment, a search engine to determine whether the proposed brand is legitimately associated with the URL, comprising: directing the search engine to produce a link count of how many web links exist to the second level domain; and directing the search engine to produce a ranked list of which second level domains contain the proposed brand; and computing, with computing equipment, a Boolean trust factor using the link count and the ranked list.
2. The method defined in claim 1, wherein the web site attribute comprises a web page title identified by title tags in the web page, the method further comprising: using the web page title in identifying the proposed brand.
3. The method defined in claim 1, wherein the web site attribute comprises meta-tag information in the web page, the method further comprising: using the meta-tag information in identifying the proposed brand.
4. The method defined in claim 1, wherein the web site attribute comprises copyright information from a footer of the web page, the method further comprising: using the copyright information in identifying the proposed brand.
5. The method defined in claim 1, wherein the second level domain corresponds to a web site that has an associated secure sockets layer (SSL) certificate containing organization name and common name fields and wherein the web site attribute comprises the organization name or the common name, the method further comprising: using the organization name or common name in identifying the proposed brand.
6. The method defined in claim 1 wherein identifying the proposed brand comprises: obtaining a text string from the web attribute; removing spaces from the text string to produce a collapsed version of the text string; and attempting to match at least a portion of the second level domain to at least a portion of the collapsed version of the text string.
7. The method defined in claim 1 wherein identifying the proposed brand comprises: obtaining a text string from the web attribute; and attempting to identify whether at least some characters in the second level domain are an acronym by comparing those characters to the text string.
8. The method defined in claim 1 wherein identifying the proposed brand comprises: obtaining a text string from the web attribute; and determining whether at least some characters in the second level domain form an acronym for words in the text string by comparing the characters in the second level domain to word-start characters in the text string.
9. The method defined in claim 1 wherein identifying the proposed brand comprises: obtaining a text string from the web attribute; making linguistically-equivalent substitutions for numeric strings in the second level domain; and determining whether at least a portion of the second level domain in which the linguistically-equivalent substitutions have been made matches at least a portion of the text string obtained from the web attribute.
10. The method defined in claim 1 wherein the computing equipment with which the user browses the world wide web communicates with a server over a communications network, the method further comprising: examining a cache of previous URL verification results at the computing equipment to determine whether the proposed brand is legitimately associated with the URL; and if examination of the cache reveals that it is unknown whether the proposed brand is legitimately associated with the URL, sending a URL verification request containing the URL from the computing equipment to the server, wherein querying the search engine to determine whether the proposed brand is legitimately associated with the URL comprises using the server to query the search engine to determine whether the proposed brand is legitimately associated with the URL in response to the URL verification request from the computing equipment.
11. A computer-implemented method for determining whether or not web page universal resource locators (URLs) are associated with trusted or fraudulent web sites, comprising: as a user browses the world wide web using computing equipment, obtaining a URL for a web page of unknown authenticity, wherein the URL contains a second level domain; obtaining, with computing equipment, a web page title string from text located between title tags in the web page; comparing, with computing equipment, at least a portion of the second level domain to the web page title string to identify a proposed brand; querying, with computing equipment, a database using at least the proposed brand; using results from querying the database to determine, with computing equipment, whether the proposed brand is legitimately associated with the URL, wherein: querying the database comprises providing the database with at least a portion of the second level domain and the proposed brand as query terms; and using the results comprises performing computations on a link count that represents how many web page links exist to the second level domain and determining which position the second level domain holds in a ranked list of second level domains corresponding to web sites containing the proposed brand; and computing, with computing equipment, a Boolean trust factor using the link count and the ranked list.
12. The computer-implemented method defined in claim 11 wherein querying the database comprises transmitting a search engine query to a search engine over the internet.
13. The computer-implemented method defined in claim 11 wherein querying the database comprises providing the database with at least a portion of the second level domain and the proposed brand as query terms.
14. The computer-implemented method defined in claim 11 wherein comparing the portion of the second level domain to the web page title string to identify the proposed brand comprises determining whether the portion of the second level domain matches a collapsed string formed by removing spaces from the web page title string.
15. The computer-implemented method defined in claim 11 wherein comparing the portion of the second level domain to the web page title string to identify the proposed brand comprises determining whether the portion of the second level domain serves as an acronym for words in the web page title string.
16. The computer-implemented method defined in claim 11 wherein comparing the portion of the second level domain to the web page title string to identify the proposed brand comprises comparing a version of the second level domain in which linguistically-equivalent terms have been substituted for numeric terms to the web page title string.
17. The method defined in claim 11 wherein the computing equipment with which the user browses the world wide web is user computing equipment that communicates with a server over a communications network, the method further comprising: preprocessing the URL at the user computing equipment to determine its status; and if the status of the URL is uncertain, sending a URL verification request to the server, wherein querying the database comprises using the server to query the database.
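The independent claims end by computing a Boolean trust factor from a link count and a ranked list of second level domains, without stating how the two are combined. A minimal Python sketch of one possible combination follows. The thresholds `MIN_LINK_COUNT` and `MAX_RANK`, the function names, and the rule that both conditions must hold are assumptions for illustration; the search engine results are passed in as plain values rather than fetched from any real service.

```python
from typing import Optional, Sequence

# Hypothetical thresholds; the claims do not specify any values.
MIN_LINK_COUNT = 1000
MAX_RANK = 3


def rank_of(sld: str, ranked_slds: Sequence[str]) -> Optional[int]:
    """Return the 1-based position of sld in the ranked list, or None."""
    normalized = [domain.lower() for domain in ranked_slds]
    try:
        return normalized.index(sld.lower()) + 1
    except ValueError:
        return None


def trust_factor(
    sld: str,
    link_count: int,
    ranked_slds: Sequence[str],
    min_links: int = MIN_LINK_COUNT,
    max_rank: int = MAX_RANK,
) -> bool:
    """Assumed rule: trusted only if the domain is well linked AND ranks
    near the top of the domains returned for the proposed brand."""
    rank = rank_of(sld, ranked_slds)
    return link_count >= min_links and rank is not None and rank <= max_rank
```

For example, with these assumed thresholds a domain that has many inbound links and appears first in the ranked list for its proposed brand would be trusted, while a look-alike domain absent from the list would not, regardless of its link count.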
June 30, 2005
August 3, 2010