Method for Automatic Deduction of Rules for Matching Content to Categories

Published: May 16, 2006

Assignee: not available in the USPTO data

Inventors: William F. Conroy, Desiree D. G. Gosby

Patent Claims

20 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method of classifying document content within a strange taxonomy, the strange taxonomy comprising a plurality of first categories in a computer document storage organizational scheme and a plurality of first documents, each first document tagged with at least one first category according to the strange taxonomy, the method comprising the steps of: spidering the strange taxonomy to generate at least one pairing of each first document with each first category with which the each first document is tagged, said strange taxonomy having an internal organizational structure that cannot be viewed by a user who is interacting with the strange taxonomy; creating a rule generation document representing each of the at least one pairings; parsing a second document according to the rule generation document; and classifying the parsed second document into a particular first category, said classifying comprising submitting the parsed second document to a classification engine.

2. The method of claim 1, wherein the step of spidering the plurality of first documents comprises spidering to retrieve at least one of metadata, a storage location, and a category tag.
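Claims 1 and 2 begin by spidering the taxonomy into pairings of each document with each category it is tagged with, retrieving the document's metadata and storage location along the way. The sketch below illustrates that step under stated assumptions: the taxonomy is modeled as a hypothetical in-memory tree (`Category`), and `spider` and `Pairing` are illustrative names, not the patent's implementation. A document tagged with two categories yields two pairings.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """Hypothetical in-memory stand-in for one category node of a taxonomy."""
    name: str
    documents: dict[str, dict] = field(default_factory=dict)  # storage location -> metadata
    children: list["Category"] = field(default_factory=list)


@dataclass
class Pairing:
    location: str   # storage location of the first document
    category: str   # category tag the document carries
    metadata: dict  # metadata retrieved while spidering (claim 2)


def spider(root: Category) -> list[Pairing]:
    """Walk the taxonomy depth-first and emit one pairing per (document, category) tag."""
    pairings = []
    stack = [root]
    while stack:
        node = stack.pop()
        for location, metadata in node.documents.items():
            pairings.append(Pairing(location, node.name, metadata))
        stack.extend(reversed(node.children))  # reversed so children are visited in order
    return pairings
```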

3. The method of claim 1, wherein the step of spidering the plurality of first documents tagged with at least one first category according to the strange taxonomy comprises the steps of: spidering the strange taxonomy with a first spider, the first spider adapted to the strange taxonomy being spidered; creating a third document using the first spider, the third document describing the strange taxonomy, the third document comprising a link to each of the first documents; and spidering the strange taxonomy with a second spider by spidering the third document created by the first spider, the second spider operable to access each of the first documents through the links in the third document.

4. The method of claim 3, wherein the step of creating the third document comprises creating an XML document.
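Claims 3 and 4 split spidering in two: a first spider that understands the strange taxonomy writes an XML "third document" describing it, and a generic second spider then works only from that XML. A minimal sketch using Python's standard `xml.etree.ElementTree` follows. The element and attribute names (`taxonomy`, `category`, `document`, `href`) and the dictionary index passed to `first_spider` are assumptions chosen for illustration. The second spider here only returns the link/category pairs it would follow; actually fetching each linked document is left out.

```python
import xml.etree.ElementTree as ET


def first_spider(index: dict[str, list[str]]) -> str:
    """Taxonomy-specific spider (hypothetical): turn the taxonomy's own index
    (category -> document links) into a neutral XML 'third document'."""
    root = ET.Element("taxonomy")
    for category, links in index.items():
        cat_el = ET.SubElement(root, "category", name=category)
        for href in links:
            ET.SubElement(cat_el, "document", href=href)
    return ET.tostring(root, encoding="unicode")


def second_spider(xml_text: str) -> list[tuple[str, str]]:
    """Generic spider: read only the XML description and collect its links,
    paired with the category each link was listed under."""
    root = ET.fromstring(xml_text)
    return [
        (doc.get("href"), cat.get("name"))
        for cat in root.iter("category")
        for doc in cat.iter("document")
    ]
```

Because the second spider depends only on the XML format, the same code can serve any taxonomy for which a first spider exists; that separation is also what would let the XML be handed to other document-searching software (claim 7).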

5. A method of classifying document content within a taxonomy, the taxonomy comprising a plurality of first categories in a computer document storage organizational scheme and a plurality of first documents, each first document tagged with at least one first category according to the taxonomy, the method comprising the steps of: spidering the taxonomy to generate at least one pairing of each first document with each first category with which the each first document is tagged; creating a rule generation document representing each of the at least one pairings; parsing a second document according to the rule generation document; and classifying the parsed second document into a particular first category, said classifying comprising submitting the parsed second document to a classification engine, wherein the taxonomy comprises a strange taxonomy and wherein the step of spidering the plurality of first documents tagged with at least one first category according to the taxonomy comprises the steps of: spidering the strange taxonomy with a first spider, the first spider adapted to the strange taxonomy being spidered; creating a third document using the first spider, the third document describing the strange taxonomy, the third document comprising a link to each of the first documents; and spidering the strange taxonomy with a second spider by spidering the third document created by the first spider, the second spider operable to access each of the first documents through the links in the third document, wherein the steps of spidering the strange taxonomy with the first spider and creating a third document comprise steps taken after the second document is classified into the taxonomy, the second document thereby becoming a first document within the plurality of first documents.

6. A method of classifying document content within a taxonomy, the taxonomy comprising a plurality of first categories in a computer document storage organizational scheme and a plurality of first documents, each first document tagged with at least one first category according to the taxonomy, the method comprising the steps of: spidering the taxonomy to generate at least one pairing of each first document with each first category with which the each first document is tagged; creating a rule generation document representing each of the at least one pairings; parsing a second document according to the rule generation document; and classifying the parsed second document into a particular first category, said classifying comprising submitting the parsed second document to a classification engine, wherein the taxonomy comprises a strange taxonomy and wherein the step of spidering the plurality of first documents tagged with at least one first category according to the taxonomy comprises the steps of: spidering the strange taxonomy with a first spider, the first spider adapted to the strange taxonomy being spidered; creating a third document using the first spider, the third document describing the strange taxonomy, the third document comprising a link to each of the first documents; and spidering the strange taxonomy with a second spider by spidering the third document created by the first spider, the second spider operable to access each of the first documents through the links in the third document, wherein the step of spidering the strange taxonomy with a second spider comprises the step of spidering the strange taxonomy with a second spider after the second document is presented for classification within the taxonomy.

7. The method of claim 3, further comprising making the third document available for use by document-searching software.

8. The method of claim 1, wherein the step of creating a rule generation document comprises the steps of: receiving a plurality of first-document-category pairings produced by the spidering step; extracting at least one of a keyword and a pattern of keywords from each of the first documents within the plurality of first documents; associating each at least one of a keyword and a pattern of keywords in each of the first documents with the at least one first category of the first document from which the at least one of a keyword and a pattern of keywords was extracted; and generating rules for mapping at least one of a keyword and a pattern of keywords to the first category.

9. The method of claim 8, wherein the step of associating each at least one of a keyword and a pattern of keywords in each of the first documents with the at least one first category of the first document from which the at least one of a keyword and a pattern of keywords was extracted further comprises parsing each first document.

10. The method of claim 8, wherein the step of associating each at least one of a keyword and a pattern of keywords in each of the first documents with the at least one first category of the first document from which the at least one of a keyword and a pattern of keywords was extracted further comprises reading keywords from the metadata of each first document.
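Claims 8 to 10 build the rule generation document from those pairings: extract keywords from each first document, either by parsing its body (claim 9) or by reading keywords from its metadata (claim 10), and associate them with the document's category. The sketch below makes simplifying assumptions: only single keywords are handled (not keyword patterns), the stop-word list is illustrative, and the "rules" are kept as an in-memory mapping from keyword to a `Counter` of categories rather than written out as a document. `extract_keywords` and `generate_rules` are hypothetical names.

```python
import re
from collections import Counter, defaultdict
from collections.abc import Iterable
from typing import Optional

STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "is"}  # illustrative only


def extract_keywords(text: str, meta_keywords: Iterable[str] = ()) -> Counter:
    """Parse the document body (claim 9) and add keywords read from its metadata (claim 10)."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    words.extend(k.lower() for k in meta_keywords)
    return Counter(words)


def generate_rules(pairings: list[tuple[str, str]],
                   texts: dict[str, str],
                   meta: Optional[dict[str, list[str]]] = None) -> dict[str, Counter]:
    """Associate each extracted keyword with the category of its source document.

    The result maps keyword -> Counter(category -> occurrences), i.e. a rule
    "this keyword points to these categories with these weights".
    """
    meta = meta or {}
    rules: dict[str, Counter] = defaultdict(Counter)
    for doc_id, category in pairings:
        for word, count in extract_keywords(texts[doc_id], meta.get(doc_id, [])).items():
            rules[word][category] += count
    return dict(rules)
```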

11. The method of claim 1, wherein the rule generation document comprises rules for mapping from at least one of a keyword and a pattern of keywords to one or more first categories, the step of parsing a second document according to the rule generation document comprises the steps of: parsing the second document to determine at least one of a keyword and a pattern of keywords; looking up the at least one of a keyword and a pattern of keywords of the second document in the rule generation document to find at least one of the first categories associated with the at least one of a keyword and a pattern of keywords of the second document; scoring the found at least one first category according to a predetermined criteria; and determining from the scoring the at least one first category comprising the classification of the second document.
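Claim 11 parses a second document, looks its keywords up in the rules, scores the categories found, and picks the classification from those scores. The hypothetical `classify` function below sketches this with the simplest scoring criterion, summing each keyword's per-category weights, and returns every category tied for the top score. The rule table is assumed to map each keyword to a `Counter` of category weights.

```python
import re
from collections import Counter


def classify(text: str, rules: dict[str, Counter]) -> list[str]:
    """Parse a second document, look up each keyword in the rule table, sum the
    per-category weights, and return every category tied for the top score."""
    scores: Counter = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):   # parse into keywords
        scores.update(rules.get(word, Counter()))      # look up and accumulate
    if not scores:
        return []                                      # no rule matched
    best = max(scores.values())
    return [category for category, score in scores.items() if score == best]
```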

12. The method of claim 11, wherein the step of scoring according to a predetermined criteria comprises scoring by at least one of: similarity to at least one pattern of keywords associated with a first category; frequency of keywords in a first category; commonality of keywords among documents in a first category; absence of particular keywords among documents in a first category; and uniqueness of keywords in a first category.
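Claim 12 lists several possible scoring criteria. The sketch below combines two of them, keyword frequency in a category and uniqueness of a keyword across categories, by weighting each keyword's frequency by one over the number of categories it appears in. That IDF-like weighting is a choice made here for illustration, not something the claim specifies; the other criteria (pattern similarity, commonality, absence) could be added as further weights. `score_categories` is a hypothetical name.

```python
from collections import Counter


def score_categories(keywords: list[str], rules: dict[str, Counter],
                     use_uniqueness: bool = True) -> dict[str, float]:
    """Score each candidate category for a parsed document.

    frequency:  how often the keyword occurred in the category's documents
    uniqueness: keywords shared by many categories count for less
                (weight = 1 / number of categories containing the keyword)
    """
    scores: dict[str, float] = {}
    for word in keywords:
        per_category = rules.get(word)
        if not per_category:
            continue  # keyword unknown to the rules; contributes nothing
        weight = 1.0 / len(per_category) if use_uniqueness else 1.0
        for category, freq in per_category.items():
            scores[category] = scores.get(category, 0.0) + freq * weight
    return scores
```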

13. The method of claim 12, wherein the step of determining from the scoring at least one first category further comprises the steps of selecting one of: a) the at least one first category having a score comprising an extrema among the alternatives; b) at least one first category having a score in a predetermined relationship to a predetermined threshold score; and c) at least one first category having a particular predetermined score.

14. The method of claim 13, wherein the step of selecting further comprises selecting the at least one first category having the first-in-time score meeting the selection criteria.
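Claims 13 and 14 describe how categories are picked from the scores: the extremum (taken here as the maximum), a threshold test, or a particular score, with the first-in-time qualifying score winning. In this sketch the "predetermined relationship" to the threshold is assumed to be "greater than or equal to", and "first-in-time" is modeled as the order in which (category, score) pairs were produced. `select` and `select_first` are hypothetical names.

```python
from typing import Optional


def select(scored: list[tuple[str, float]], mode: str = "max",
           threshold: float = 0.0, target: Optional[float] = None) -> list[str]:
    """Return categories meeting the criterion, in the order their scores were produced."""
    if mode == "max":        # a) extremum among the alternatives
        best = max((s for _, s in scored), default=None)
        return [c for c, s in scored if s == best]
    if mode == "threshold":  # b) assumed relationship: score >= threshold
        return [c for c, s in scored if s >= threshold]
    if mode == "exact":      # c) a particular predetermined score
        return [c for c, s in scored if s == target]
    raise ValueError(f"unknown mode: {mode}")


def select_first(scored: list[tuple[str, float]], **kwargs) -> Optional[str]:
    """Claim 14: keep only the first-in-time category meeting the criterion."""
    hits = select(scored, **kwargs)
    return hits[0] if hits else None
```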

15. The method of claim 1, wherein the step of classifying the parsed second document into at least one first category comprises at least one of the steps of adding data to the metadata of the second document identifying the at least one first category, tagging the second document according to the taxonomy, and storing the second document in a location associated with the at least one first category.

16. The method of claim 1, wherein the step of classifying the parsed second document into a first category further comprises tagging the parsed second document.
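Claims 15 and 16 list ways to record a classification: add it to the document's metadata, tag the document, or store it in a location tied to the category. The filesystem-based sketch below does all three, assuming one directory per category under a hypothetical `store_root` and plain dictionaries for metadata and tags; `file_document` is an illustrative name.

```python
import os
import shutil
import tempfile


def file_document(path: str, metadata: dict, category: str, store_root: str) -> str:
    """Record a classification result in all three ways listed in claim 15."""
    metadata.setdefault("categories", []).append(category)  # data added to the metadata
    metadata.setdefault("tags", set()).add(category)         # tag per the taxonomy
    target_dir = os.path.join(store_root, category)          # location for the category
    os.makedirs(target_dir, exist_ok=True)
    return shutil.copy2(path, target_dir)  # returns the path of the stored copy


if __name__ == "__main__":
    # Small demo in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "memo.txt")
        with open(src, "w") as f:
            f.write("payroll update")
        meta: dict = {}
        print(file_document(src, meta, "hr", os.path.join(tmp, "store")), meta)
```

Copying rather than moving keeps the original in place; a real deployment might instead move the file or only record a link, depending on the storage scheme.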

17. A method of classifying document content within a taxonomy, the taxonomy comprising a plurality of first categories in a computer document storage organizational scheme and a plurality of first documents, each first document tagged with at least one first category according to the taxonomy, the method comprising the steps of: spidering the taxonomy to generate at least one pairing of each first document with each first category with which the each first document is tagged; creating a rule generation document representing each of the at least one pairings; parsing a second document according to the rule generation document; and classifying the parsed second document into a particular first category, said classifying the parsed second document into the particular first category comprising submitting the parsed second document to a classification engine, wherein the taxonomy comprises a plurality of strange taxonomies, and further wherein: the step of creating a rule generation document comprises generating a single rule generation document for the plurality of strange taxonomies; and the step of classifying the parsed second document into at least one first category comprises the steps of: classifying the parsed second document into one strange taxonomy within the plurality of strange taxonomies; and classifying the parsed second document into one category within the plurality of categories within the strange taxonomy; the method operable to select one strange taxonomy among the plurality of strange taxonomies within which to classify the second document.
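Claim 17 covers several strange taxonomies served by a single rule generation document, with the method first choosing a taxonomy and then a category inside it. One possible sketch, assuming the shared rule table keys each keyword's weights by (taxonomy, category) pairs and that the taxonomy with the highest total score wins; `classify_two_stage` is a hypothetical name.

```python
import re
from collections import Counter
from typing import Optional

# One rule table shared by every strange taxonomy (assumed layout):
# keyword -> Counter keyed by (taxonomy, category).
Rules = dict[str, Counter]


def classify_two_stage(text: str, rules: Rules) -> Optional[tuple[str, str]]:
    """Stage 1 picks the taxonomy with the highest total score; stage 2 picks
    the best-scoring category inside that taxonomy only."""
    by_pair: Counter = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        by_pair.update(rules.get(word, Counter()))
    if not by_pair:
        return None
    by_taxonomy: Counter = Counter()
    for (taxonomy, _category), score in by_pair.items():
        by_taxonomy[taxonomy] += score
    chosen = by_taxonomy.most_common(1)[0][0]
    category = max((c for t, c in by_pair if t == chosen),
                   key=lambda c: by_pair[(chosen, c)])
    return chosen, category
```

Choosing the taxonomy by total score, rather than by the single best (taxonomy, category) pair, lets several moderate matches in one taxonomy outweigh one strong match in another; that is a design choice of this sketch, not of the claim.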

18. A method of classifying document content within a taxonomy, the taxonomy comprising a plurality of first categories in a computer document storage organizational scheme and a plurality of first documents, each first document tagged with at least one first category according to the taxonomy, the method comprising the steps of: spidering the taxonomy to generate at least one pairing of each first document with each first category with which the each first document is tagged; creating a rule generation document representing each of the at least one pairings; parsing a second document according to the rule generation document; and classifying the parsed second document into a particular first category, said classifying the parsed second document into the particular first category comprising submitting the parsed second document to a classification engine, wherein the taxonomy comprises a hierarchy of strange taxonomies, and further wherein: the step of creating a rule generation document comprises at least one of: generating at least one rule within the rule generation document for each strange taxonomy within the hierarchy of strange taxonomies; and creating a rule generation document for each level of the hierarchy of strange taxonomies; and the step of classifying the parsed second document into at least one first category comprises the steps of: classifying the parsed second document into at least one strange taxonomy within the hierarchy of strange taxonomies; and classifying the parsed second document into at least one first category within the at least one strange taxonomy within the hierarchy of strange taxonomies.
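Claim 18 extends this to a hierarchy of strange taxonomies, with rules generated per level. The sketch below assumes a hypothetical `Node` type carrying one rule table per level (keyword to a `Counter` over that node's children) and descends greedily, picking the best child at each level until it reaches a leaf or finds no matching keywords.

```python
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """One level of a hypothetical taxonomy hierarchy. `rules` maps a keyword
    to a Counter over this node's children (one rule table per level)."""
    name: str
    rules: dict[str, Counter] = field(default_factory=dict)
    children: dict[str, "Node"] = field(default_factory=dict)


def classify_hierarchy(text: str, node: Node) -> list[str]:
    """Descend level by level, choosing the best-scoring child at each step,
    and return the path of chosen taxonomies/categories."""
    words = re.findall(r"[a-z]+", text.lower())
    path = []
    while node.children:
        scores: Counter = Counter()
        for w in words:
            scores.update(node.rules.get(w, Counter()))
        if not scores:
            break  # no evidence at this level; stop here
        best = scores.most_common(1)[0][0]
        path.append(best)
        node = node.children[best]
    return path
```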

19. A method of classifying document content within a taxonomy, the taxonomy comprising a plurality of first categories in a computer document storage organizational scheme and a plurality of first documents, each first document tagged with at least one first category according to the taxonomy, the method comprising the steps of: spidering the taxonomy to generate at least one pairing of each first document with each first category with which the each first document is tagged; creating a rule generation document representing each of the at least one pairings; parsing a second document according to the rule generation document; and classifying the parsed second document into a particular first category, said classifying the parsed second document into the particular first category comprising submitting the parsed second document to a classification engine, wherein the rule generation document comprises rules for mapping from at least one of a keyword and a pattern of keywords to one or more first categories, and wherein the step of parsing the second document according to the rule generation document comprises the steps of: finding no keywords in the parsed second document similar to keywords in the rule generation document; creating a new category within the taxonomy; and classifying the second document in the new category.
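Claim 19 handles a document whose keywords match nothing in the rules by creating a new category for it. A minimal sketch follows; naming the new category after the document's most frequent word with a `new-` prefix is a hypothetical convention, since the claim does not say how new categories are named, and `classify_or_create` is an illustrative name.

```python
import re
from collections import Counter


def classify_or_create(text: str, rules: dict[str, Counter], taxonomy: set[str]) -> str:
    """Return the best-matching category, or add a new category to the
    taxonomy and return it when no keyword of the document is in the rules."""
    words = re.findall(r"[a-z]+", text.lower())
    scores: Counter = Counter()
    for w in words:
        scores.update(rules.get(w, Counter()))
    if scores:
        return scores.most_common(1)[0][0]
    # No similar keywords found: create a new category (hypothetical naming).
    new_name = "new-" + (Counter(words).most_common(1)[0][0] if words else "untitled")
    taxonomy.add(new_name)
    return new_name
```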

20. A method for categorizing the content of a new document within a strange taxonomy, the strange taxonomy comprising a plurality of first categories and a plurality of first documents within at least one of the first categories, wherein a root node for the strange taxonomy has been provided, the plurality of first documents being stored on a computer-readable storage device, the method being implemented through execution of computer readable program code by a processor of a computer system, said computer readable program code being stored on a computer usable medium, the method comprising the steps of: automatically spidering the strange taxonomy to identify each first category and each document among the plurality of first documents classified within each respective first category; automatically forming pairs for each of the first documents, each pair comprising one of the first documents and the category within which the one of the first documents is classified; automatically extracting at least one of a keyword and a pattern of keywords from each of the first documents in each of the first categories; automatically associating at least one of a keyword and a pattern of keywords extracted from each of the first documents within each of the first categories with the first category in which the first documents are classified; automatically generating rules, each rule mapping at least one of a keyword and patterns of keywords to the first category in which the first documents containing the at least one of a keyword and a pattern of keywords are classified; automatically parsing an unclassified document to determine new keywords therein; and automatically classifying the unclassified document into at least one of a new category and a first category having documents containing at least one of keywords and patterns of keywords similar to the new keywords.
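Claim 20 strings the steps together. The end-to-end sketch below assumes a flat taxonomy held in memory as `{category: {doc_id: text}}`. Rules count how many documents in each category contain a keyword, and an unmatched document goes into a new category whose `uncategorized-` name is a hypothetical convention. Rules are rebuilt on every call, which resembles re-spidering the taxonomy after each classification (compare claims 5 and 6) but would be too slow for a large collection. All function names are illustrative.

```python
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


def build_rules(taxonomy: dict[str, dict[str, str]]) -> dict[str, Counter]:
    """Spider the taxonomy, form (document, category) pairs, extract keywords,
    and map each keyword to the categories whose documents contain it."""
    pairs = [(doc_id, cat) for cat, docs in taxonomy.items() for doc_id in docs]
    rules: dict[str, Counter] = defaultdict(Counter)
    for doc_id, cat in pairs:
        for word in set(tokenize(taxonomy[cat][doc_id])):
            rules[word][cat] += 1  # number of documents in `cat` containing `word`
    return dict(rules)


def categorize(text: str, taxonomy: dict[str, dict[str, str]], doc_id: str) -> str:
    """Classify an unclassified document into an existing or a new category,
    then file it so the next spidering pass treats it as a first document."""
    rules = build_rules(taxonomy)
    scores: Counter = Counter()
    for word in tokenize(text):  # the document's new keywords
        scores.update(rules.get(word, Counter()))
    category = scores.most_common(1)[0][0] if scores else "uncategorized-" + doc_id
    taxonomy.setdefault(category, {})[doc_id] = text
    return category
```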

Patent Metadata

Filing Date

Unknown