A three-phase process and system is disclosed for automatically and adaptively filtering and classifying electronic text-based messages, such as e-mail, e-commerce transactions, CGI forms, and optically scanned and textualized written and facsimile messages. In the first phase of processing, the message is subjected to one or more feature extraction methodologies. The output signals from the first phase are then clustered in the second phase of processing using one or more clustering methodologies. The second phase yields five suggested characteristics of the message: attitude, issue or problem, request, customer type, and author education level. In the third phase, a human operator interface presents the original message along with the proposed properties and allows an operator to correct or tune the properties, with the corrections and tuning being fed back into the network of feature extraction and clustering methodologies. Finally, the architecture of the system is such that feature extraction and clustering methodologies may be added, updated, or removed in a modular fashion to allow the system to be customized to various applications and to allow the system to be modernized as new algorithms become available.
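The three phases described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the keyword list, the class name `AttitudeClusterer`, the starting centroids, and the running-mean update are all assumptions chosen for illustration, and only one of the five properties (attitude) is modeled. Phase 1 is reduced to a keyword-frequency vector, phase 2 to a nearest-centroid assignment (the assignment step of k-means), and phase 3 to an operator correction that nudges the chosen centroid toward the message.

```python
from collections import Counter
import re

# Hypothetical keyword vocabulary; the source does not specify one.
KEYWORDS = ["refund", "broken", "thanks", "angry", "order"]


def extract_features(body: str) -> list[float]:
    """Phase 1: keyword frequencies, normalized by message length."""
    words = re.findall(r"[a-z']+", body.lower())
    counts = Counter(words)
    total = max(len(words), 1)  # avoid division by zero on empty bodies
    return [counts[k] / total for k in KEYWORDS]


def squared_distance(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


class AttitudeClusterer:
    """Phase 2 and 3: nearest-centroid suggestion plus operator feedback."""

    def __init__(self, centroids: dict[str, list[float]]):
        self.centroids = {label: list(c) for label, c in centroids.items()}
        # Each starting centroid is treated as one observed example.
        self.counts = {label: 1 for label in centroids}

    def suggest(self, features: list[float]) -> str:
        """Phase 2: propose the label whose centroid is closest."""
        return min(
            self.centroids,
            key=lambda label: squared_distance(features, self.centroids[label]),
        )

    def correct(self, features: list[float], label: str) -> None:
        """Phase 3: move the operator-chosen centroid toward the message
        using a running mean (an assumed update rule)."""
        self.counts[label] += 1
        n = self.counts[label]
        old = self.centroids[label]
        self.centroids[label] = [c + (f - c) / n for c, f in zip(old, features)]


# Assumed starting centroids, one per attitude label.
clusterer = AttitudeClusterer({
    "negative": [0.0, 0.2, 0.0, 0.2, 0.0],
    "positive": [0.0, 0.0, 0.2, 0.0, 0.0],
})
features = extract_features("Thanks, the order arrived. Thanks again!")
suggested = clusterer.suggest(features)  # proposed attitude for the operator
```

If the operator disagrees with `suggested`, calling `clusterer.correct(features, chosen_label)` adjusts the model without retraining from scratch, which is the spirit of the feedback loop in the third phase. Swapping `extract_features` or the clusterer for another method leaves the rest untouched, loosely mirroring the modular architecture the abstract describes.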
Legal claims defining the scope of protection, as filed with the USPTO.
1. A system for filtering and modeling electronic text messages comprising: a message reception means for receiving an electronic text message into the system, said text message having a header and a body, said body containing a natural language text message from an author; a feature extraction means for performing natural language analysis of the text message from the message reception means, said feature extraction means producing one or more output signals relating to any of keyword frequencies, word co-occurrence statistics, a dimensionally-reduced representation of the keyword frequencies, phoneme frequencies, structural pattern statistics for any of sentences, paragraphs and pages, estimated education level of the author, and customer type; a clustering means receiving said output signals from said feature extraction means, said clustering means producing a set of assigned properties based upon the content of the body of the electronic message, said assigned properties including at least one of attitude, one or more issues presented, one or more requests, an author type, and an author's education level; and a learning process which receives said assigned properties and performs relevance ranking and query by example, and which is capable of learning changes to said assigned properties submitted via a user interface such that rules and thresholds used in said feature extraction means and/or clustering means are updated automatically in real time without operator intervention.
2. The system for filtering and modeling electronic text messages of claim 1 wherein said message reception means includes an electronic mail reception means.
3. The system for filtering and modeling electronic text messages of claim 2 wherein said electronic mail reception means includes a means for receiving electronic mail using the Internet Network Information Center RFC821 Simple Mail Transfer Protocol.
4. The system for filtering and modeling electronic text messages of claim 2 wherein said electronic mail reception means includes a means for receiving electronic mail using the protocol of the International Telecommunications Union Recommendation X.400.
5. The system for filtering and modeling electronic text messages of claim 1, further comprising an electronic mail transmission means.
6. The system for filtering and modeling electronic text messages of claim 5 wherein said electronic mail transmission means includes a means for transmitting electronic mail using the Simple Mail Transfer Protocol.
7. The system for filtering and modeling electronic text messages of claim 5 wherein said electronic mail transmission means includes a means for transmitting electronic mail using the protocol of the International Telecommunications Union Recommendation X.400.
8. The system for filtering and modeling electronic text messages of claim 1 wherein said message reception means further comprises an interface means to an asynchronous data network.
9. The system for filtering and modeling electronic text messages of claim 8 wherein said asynchronous data network further comprises an Ethernet local area network.
10. The system for filtering and modeling electronic text messages of claim 8 wherein said asynchronous data network further comprises a protocol handler with Transfer Control Protocol/Internet Protocol Internet capabilities.
11. The system for filtering and modeling electronic text messages of claim 1, further comprising an interface to an asynchronous data network.
12. The system for filtering and modeling electronic text messages of claim 11 wherein said asynchronous data network further comprises an Ethernet local area network.
13. The system for filtering and modeling electronic text messages of claim 11 wherein said asynchronous data network further comprises a protocol handler with Transfer Control Protocol/Internet Protocol Internet capabilities.
14. The system for filtering and modeling electronic text messages of claim 1 wherein said message reception means further comprises a database interface means.
15. The system for filtering and modeling electronic text messages of claim 14 wherein said database means is a database with an open database interface.
16. The system for filtering and modeling electronic text messages of claim 1, further comprising an interface to a database means.
17. The system for filtering and modeling electronic text messages of claim 16 wherein said database means comprises an open database interface.
18. The system for filtering and modeling electronic text messages of claim 1 wherein said feature extraction means further comprises a keyword frequency analysis means which outputs a multi-dimensional keyword frequency signal.
19. The system for filtering and modeling electronic text messages of claim 1 wherein said feature extraction means further comprises a morphological process means for increasing the probability of pattern recognition.
20. The system for filtering and modeling electronic text messages of claim 1 wherein said feature extraction means further comprises a natural language processing means.
21. The system for filtering and modeling electronic text messages of claim 1 wherein said feature extraction means further comprises a dimensional reduction means which employs thesauri.
22. The system for filtering and modeling electronic text messages of claim 1 wherein said feature extraction means further comprises a word co-occurrence analysis means, which outputs a word co-occurrence statistics signal.
23. The system for filtering and modeling electronic text messages of claim 1 wherein said feature extraction means further comprises a syllabic analysis means which outputs a phoneme frequency signal.
24. The system for filtering and modeling electronic text messages of claim 1 wherein said feature extraction means further comprises word-level sentence, paragraph and page structure analysis means which outputs a structural pattern signal.
25. The system for filtering and modeling electronic text messages of claim 1 wherein said feature extraction means further comprises an author profile estimation means, outputting an author profile signal.
26. The system for filtering and modeling electronic text messages of claim 1 wherein said feature extraction means further comprises an author education level estimation means, outputting an author education level signal.
27. The system for filtering and modeling electronic text messages of claim 1 wherein said clustering means further comprises a k-means means for producing message tags in the message tag set.
28. The system for filtering and modeling electronic text messages of claim 1 wherein said clustering means further comprises an isodata means for producing message tags in the message tag set.
29. The system for filtering and modeling electronic text messages of claim 1 wherein said clustering means further comprises a backpropagation learning analysis means for producing message tags in the message tag set.
30. The system for filtering and modeling electronic text messages of claim 1 wherein said message tag set further comprises an author's attitude tag.
31. The system for filtering and modeling electronic text messages of claim 1 wherein said message tag set further comprises an issue-problem tag.
32. The system for filtering and modeling electronic text messages of claim 1 wherein said message tag set further comprises a request tag.
33. The system for filtering and modeling electronic text messages of claim 1 wherein said message tag set further comprises an author's profile tag.
34. The system for filtering and modeling electronic text messages of claim 1 wherein said message tag set further comprises an author's education level tag.
35. The system for filtering and modeling electronic text messages of claim 1 further comprises a learning means which includes: a tagged message reception means for receiving said tagged messages from said clustering means; a network update means which is capable of modifying parameters, thresholds, and coefficients within said feature extraction means and within said clustering means; and a user interface means for presenting the received electronic text message and said message tag set, receiving operator input modifying said message tag set, and providing network updates to the system via said network update means.
36. A process for filtering and modeling electronic text messages of asynchronous communications systems comprising the steps of: receiving an electronic text-based message via a reception media, said text message having a header and a body, said body containing a natural language text message from an author; performing feature extraction by performing natural language analysis of the text message to produce one or more output signals relating to any of keyword frequencies, word co-occurrence statistics, a dimensionally-reduced representation of the keyword frequencies, phoneme frequencies, structural pattern statistics for any of sentences, paragraphs, and pages, estimated education level of the author, and customer type; performing clustering according to said feature extraction output signals to produce a set of assigned properties based upon the content of the body of the electronic message, said assigned properties including an attitude, one or more issues presented, one or more requests, an author type, and an author's education level; and performing a learning process by receiving said assigned properties, executing relevance ranking and query by example, and learning changes to said assigned properties submitted via a user interface such that rules and thresholds used in said feature extraction means and/or clustering means are updated automatically in real time without operator intervention.
37. A process for filtering and modeling electronic text messages of asynchronous communications systems of claim 36 wherein said step of feature extraction further comprises performing keyword analysis on said text message.
38. A process for filtering and modeling electronic text messages of asynchronous communications systems of claim 36 wherein said step of feature extraction further comprises performing morphology on said text message.
39. A process for filtering and modeling electronic text messages of asynchronous communications systems of claim 36 wherein said step of feature extraction further comprises performing natural language processing on said text message.
40. A process for filtering and modeling electronic text messages of asynchronous communications systems of claim 36 wherein said step of feature extraction further comprises performing dimensional reduction of said signals using thesauri.
41. A process for filtering and modeling electronic text messages of asynchronous communications systems of claim 36 wherein said step of feature extraction further comprises performing co-occurrence statistical analysis.
42. A process for filtering and modeling electronic text messages of asynchronous communications systems of claim 36 wherein said step of feature extraction further comprises performing syllabic analysis.
43. A process for filtering and modeling electronic text messages of asynchronous communications systems of claim 36 wherein said step of feature extraction further comprises performing word analysis.
44. A process for filtering and modeling electronic text messages of asynchronous communications systems of claim 36 wherein said step of clustering further comprises performing k-means techniques.
45. A process for filtering and modeling electronic text messages of asynchronous communications systems of claim 36 wherein said step of clustering further comprises performing isodata techniques.
46. A process for filtering and modeling electronic text messages of asynchronous communications systems of claim 36 wherein said step of clustering further comprises performing auto-indexing techniques.
47. A process for filtering and modeling electronic text messages of asynchronous communications systems of claim 36 wherein said step of clustering further comprises performing backpropagation learning algorithm techniques.
48. A process for filtering and modeling electronic text messages of asynchronous communications systems of claim 36 further comprising the steps: presentation of the electronic text message and the message tags to a user via a user interface; receiving corrections to said message tags via said user interface from said user; and automatically modifying logic within said determination of inherent factor within said text message.
49. A computer-readable medium containing a data structure for storing property tags for electronic text-based messages comprising: an identifier link to a received electronic text-based message; an entry for an author's apparent attitude; an entry for an issue raised by the message; an entry for a request made in the message; an entry for a demographic profile indication for the author; and an entry for an estimated education level of the author.
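The data structure recited in claim 49 can be pictured as a simple record with one identifier and five entries. The following Python sketch is illustrative only: the class name `MessagePropertyTags`, the field names, the use of optional strings, and the `apply_correction` helper are assumptions, since the claim does not specify types, encodings, or how corrections are stored.

```python
from dataclasses import dataclass, asdict, replace
from typing import Optional


@dataclass(frozen=True)
class MessagePropertyTags:
    """Hypothetical record mirroring the entries listed in claim 49."""

    message_id: str                             # identifier link to the received message
    attitude: Optional[str] = None              # author's apparent attitude
    issue: Optional[str] = None                 # issue raised by the message
    request: Optional[str] = None               # request made in the message
    demographic_profile: Optional[str] = None   # demographic profile indication
    education_level: Optional[str] = None       # estimated education level

    def apply_correction(self, **changes: Optional[str]) -> "MessagePropertyTags":
        """Return a corrected copy; dataclasses.replace raises TypeError
        if a field name is not part of the record."""
        return replace(self, **changes)


# Example values are invented for illustration.
suggested = MessagePropertyTags(
    message_id="msg-001",
    attitude="neutral",
    issue="late delivery",
    request="refund",
)
corrected = suggested.apply_correction(attitude="negative")
record = asdict(corrected)  # plain dict, convenient for storage or logging
```

Keeping the record immutable means the suggested tags and the operator-corrected tags exist as separate values, so the difference between them is available to whatever learning step consumes the corrections.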
June 1, 1999
April 6, 2004