A system, computer-readable medium, and method for searching for recently altered documents on the World Wide Web are provided. The method selects a server to be searched, or crawled, by a Web crawler based on a user-selected ranking. Servers are ranked by a filter program that compares a user query with the content of each server and considers the frequency at which that content is altered. A top percentage of the ranked servers is crawled, and the recently altered information, such as hyperlinks, is then provided to the user.
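The ranking and selection steps summarized above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `Server` record, the `weight` parameter, the weighted-sum combination of the two criteria, and the rounding rule for the top percentage are all assumptions introduced for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Server:
    url: str
    content_score: float    # similarity of server content to the query (assumed 0..1)
    changes_per_day: float  # observed alteration frequency


def rank_servers(servers, weight=0.5):
    """Order servers by a weighted sum of content score and change frequency.

    The weighted sum is an assumption; the source only says ranking is based
    on at least one of the two criteria.
    """
    # Normalize frequency so both criteria share a 0..1 scale.
    max_freq = max((s.changes_per_day for s in servers), default=0.0) or 1.0

    def score(s):
        return weight * s.content_score + (1 - weight) * (s.changes_per_day / max_freq)

    return sorted(servers, key=score, reverse=True)


def select_top(ranked, percentage):
    """Return the top `percentage` of ranked servers (at least one if any exist)."""
    if not ranked:
        return []
    count = max(1, math.ceil(len(ranked) * percentage / 100))
    return ranked[:count]


servers = [
    Server("a.example", 0.9, 1.0),
    Server("b.example", 0.2, 10.0),
    Server("c.example", 0.1, 0.5),
    Server("d.example", 0.8, 8.0),
]
to_crawl = select_top(rank_servers(servers), percentage=50)
```

Only the servers in `to_crawl` would then be handed to the crawler, which is how a user-supplied percentage limits the amount of crawling performed.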
Legal claims defining the scope of protection, as filed with the USPTO.
1. System for monitoring the World Wide Web (WWW), comprising: a user interface coupled to the WWW operable to obtain user information, wherein the user information includes a query; a ranking component operable to rank a set of servers wherein each one of the set of servers is coupled to the WWW and wherein the ranking is based on at least one of: 1) a comparison of content on each server with the query; and 2) a frequency at which content on each server is altered; and a search engine coupled to the WWW including a Web crawler operable to search at least one of the ranked servers in order of rank based on the query and generate search results wherein the search results refer to content on ranked servers that satisfy the query.
2. The system of claim 1, wherein the user information includes a keyword.
3. The system of claim 1, wherein the user information includes a search interval value.
4. The system of claim 1, wherein the user information includes a percentage searched value.
5. The system of claim 1, wherein the frequency includes the number of alterations per day.
6. The system of claim 1, wherein the frequency includes the number of alterations per week.
7. The system of claim 1, wherein the frequency can be the number of alterations per month.
8. The system of claim 1, wherein the frequency can be the number of alterations per year.
9. The system of claim 1, wherein the frequency can be an average of (1) the number of alterations per day, (2) the number of alterations per week, (3) the number of alterations per month, and (4) the number of alterations per year.
10. The system of claim 1, wherein the comparison of content on each server with the query is accomplished by comparing a content vector for each server with the user information to obtain a content score for each server.
11. A method adapted for obtaining information from the World Wide Web (WWW) comprising the steps of: obtaining a query; calculating a content score of a first document having a first address on the WWW wherein the content score is based on comparing a content vector for the first document with the query; ranking the first document in a set of documents based on at least one of: 1) the content score; and 2) a frequency at which document content is altered; selecting a highest ranked document from the set of documents; and crawling a first processing device on which the highest ranked document is stored to obtain a first altered document.
12. The method of claim 11, further comprising: providing a hyperlink of the first altered document to a user.
13. The method of claim 11, further comprising the steps of: obtaining a search interval from a user; and crawling the first processing device periodically, using the search interval.
14. The method of claim 11, further comprising the steps of: notifying a user that the content of the first document has changed.
15. The method of claim 11, wherein the step of calculating further includes: obtaining the content vector of the first document.
16. The method of claim 11, wherein the query includes a key-word.
17. The method of claim 11, wherein the frequency is based on a last modified field in the first document.
18. A machine readable medium having instructions stored thereon that when executed by a processor cause a system to: obtain a query; calculate a content score of a first document having a first address on the World Wide Web (WWW) wherein the content score is based on comparing a content vector for the first document with the query; rank the first document in a set of documents based on at least one of: 1) the content score; and 2) a frequency at which content on the document is altered; select the highest ranked document from the set of documents; and crawl a first processing device on which the highest ranked document is stored to obtain a first altered document.
19. The machine readable medium of claim 18, further comprising instructions that when executed cause a processor to: provide a hyperlink of the first altered document to a user.
20. The machine readable medium of claim 18, further comprising instructions that when executed cause a processor to: obtain a search interval from a user; and crawl the first processing device periodically, using the search interval.
21. The machine readable medium of claim 18, further comprising instructions that when executed cause a processor to: notify a user that the content of the first document has changed.
22. The machine readable medium of claim 18, further comprising instructions that when executed cause a processor to: obtain the content vector of the first document.
23. The machine readable medium of claim 18 wherein: the query includes a keyword.
24. The machine readable medium of claim 18 wherein: the frequency is based on a last modified field in the document.
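The content score of claims 10, 11, and 18 and the averaged frequency of claim 9 can be illustrated with a short sketch. The claims do not say how a content vector is built or compared with the query, so the term-count vector and the cosine similarity used here are assumptions, as are the function names `content_vector`, `content_score`, and `average_frequency`.

```python
import math
from collections import Counter


def content_vector(text):
    """Build a simple term-count vector (assumed representation)."""
    return Counter(text.lower().split())


def content_score(doc_vector, query):
    """Cosine similarity between a document vector and the query terms.

    Cosine similarity is an assumed comparison; the claims only require
    that the content vector be compared with the query.
    """
    query_vector = content_vector(query)
    dot = sum(doc_vector[term] * count for term, count in query_vector.items())
    doc_norm = math.sqrt(sum(v * v for v in doc_vector.values()))
    query_norm = math.sqrt(sum(v * v for v in query_vector.values()))
    norm = doc_norm * query_norm
    return dot / norm if norm else 0.0


def average_frequency(per_day, per_week, per_month, per_year):
    """Plain mean of the four alteration counts listed in claim 9."""
    return (per_day + per_week + per_month + per_year) / 4


doc = content_vector("stock prices rise")
score = content_score(doc, "stock")
freq = average_frequency(1, 7, 30, 365)
```

A document sharing no terms with the query scores 0.0 under this comparison, so it would rank on alteration frequency alone if both criteria were combined.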
November 29, 1999
June 15, 2004