Corpus-Based System and Method for Acquiring Polar Adjectives

Published September 10, 2013

Assignee: not available in USPTO data

Patent Claims

22 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method for generating a polar vocabulary comprising: extracting textual content from reviews in a corpus of reviews, each of the reviews including an author's rating; identifying a set of frequent nouns from the textual content of the reviews; extracting adjectival terms from the textual content of the reviews, each adjectival term being associated in the textual content with one of the frequent nouns; and with a processor, generating a polar vocabulary including at least some of the extracted adjectival terms, a polarity measure being associated with each adjectival term in the vocabulary which is based on the ratings of the reviews from which the adjectival term was extracted, the generating of the polar vocabulary comprising identifying a set of positive reviews and a set of negative reviews, based on the ratings, and computing, for an identified adjectival term, a measure of its occurrence in the positive and negative sets of reviews, the polarity measure of the term being based on the measure of occurrence.
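
The positive/negative split and occurrence count in claim 1 can be illustrated with a minimal Python sketch. It assumes a 1-5 star rating scale and hypothetical thresholds (4 or more is positive, 2 or less is negative, anything in between is discarded). The claim specifies neither. Occurrence is counted here as the number of reviews that mention the term, which is only one reasonable reading of "measure of occurrence".

```python
# Hypothetical thresholds on an assumed 1-5 star scale (not given in the claim).
POSITIVE_MIN = 4
NEGATIVE_MAX = 2


def split_by_rating(reviews):
    """Partition (rating, text) pairs into positive and negative review texts.

    Reviews rated strictly between the two thresholds are left out.
    """
    positive = [text for rating, text in reviews if rating >= POSITIVE_MIN]
    negative = [text for rating, text in reviews if rating <= NEGATIVE_MAX]
    return positive, negative


def review_occurrences(term, texts):
    """Count texts containing `term` as a whole, case-insensitive token.

    Tokenization is a plain whitespace split, so punctuation is not stripped.
    """
    return sum(term in text.lower().split() for text in texts)


reviews = [
    (5, "Fast printer with sharp output"),
    (4, "Quiet and fast"),
    (1, "Slow printer and noisy"),
    (3, "Average printer"),
]
positive, negative = split_by_rating(reviews)
print(review_occurrences("fast", positive), review_occurrences("fast", negative))  # 2 0
```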

2. The method of claim 1, wherein the identifying of the set of frequent nouns comprises: parsing the textual content to identify a set of nouns; optionally, filtering the identified nouns; computing frequencies of the nouns in the corpus of reviews; and identifying a set of the most frequent nouns.
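
The noun-selection steps of claim 2 can be sketched as follows. The sketch assumes the parser has already produced (token, tag) pairs that use a "NOUN" tag. The optional filtering step is modeled as a hypothetical stoplist, and the cut-off top_k is an illustrative parameter rather than a value taken from the patent.

```python
from collections import Counter


def frequent_nouns(tagged_reviews, top_k=10, stoplist=frozenset()):
    """Return the top_k most frequent nouns across all reviews.

    tagged_reviews: list of reviews, each a list of (token, tag) pairs; the
    "NOUN" tag is an assumed tagset, and `stoplist` is a hypothetical filter.
    """
    counts = Counter(
        token.lower()
        for review in tagged_reviews
        for token, tag in review
        if tag == "NOUN" and token.lower() not in stoplist
    )
    return [noun for noun, _ in counts.most_common(top_k)]


tagged = [
    [("Printer", "NOUN"), ("is", "VERB"), ("fast", "ADJ")],
    [("cheap", "ADJ"), ("ink", "NOUN"), ("for", "ADP"), ("printer", "NOUN")],
    [("thing", "NOUN"), ("printer", "NOUN"), ("ink", "NOUN")],
]
print(frequent_nouns(tagged, top_k=2, stoplist={"thing"}))  # ['printer', 'ink']
```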

3. The method of claim 1, wherein the extracting of the adjectival terms includes parsing the textual content to identify an adjectival term which is in a relation with one of the identified frequent nouns.
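
Below is one possible parser rule for claim 3. It assumes the dependency output arrives as (head, relation, dependent) triples with Universal Dependencies labels, and it keeps only the attributive "amod" relation (as in "fast printer"). Predicative uses such as "the printer is fast" would need an additional rule. The function name is hypothetical.

```python
def adjectival_terms(dependencies, frequent_nouns):
    """Return (adjective, noun) pairs whose noun is one of the frequent nouns.

    dependencies: iterable of (head, relation, dependent) triples; only the
    UD "amod" relation, where the head is a noun and the dependent an
    adjective, is handled in this sketch.
    """
    return [
        (dependent.lower(), head.lower())
        for head, relation, dependent in dependencies
        if relation == "amod" and head.lower() in frequent_nouns
    ]


deps = [
    ("printer", "amod", "fast"),
    ("toner", "amod", "expensive"),
    ("day", "amod", "sunny"),    # "day" is not a frequent noun, so it is skipped
    ("printer", "det", "the"),   # not an adjectival relation
]
print(adjectival_terms(deps, {"printer", "toner"}))
# [('fast', 'printer'), ('expensive', 'toner')]
```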

4. The method of claim 1, wherein the measure of occurrence is based on a relative frequency of occurrence of the adjectival term in the positive and negative sets of reviews.
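
A relative-frequency measure in the spirit of claim 4 could normalize each count by the size of its review set and then compare the two rates. The score (f_pos - f_neg) / (f_pos + f_neg) and the labeling margin used below are assumptions for illustration, because the claim gives no exact formula. Terms without a clear lean receive no label, which keeps the output to the positive/negative choice of claim 6.

```python
def relative_frequency_score(pos_count, n_pos, neg_count, n_neg):
    """Score in [-1, 1] comparing a term's rate in positive vs. negative reviews."""
    f_pos = pos_count / n_pos  # share of positive reviews containing the term
    f_neg = neg_count / n_neg  # share of negative reviews containing the term
    total = f_pos + f_neg
    return 0.0 if total == 0 else (f_pos - f_neg) / total


def polarity_label(score, margin=0.2):
    """Map a score to "positive" or "negative", or None if it is within the margin."""
    if score > margin:
        return "positive"
    if score < -margin:
        return "negative"
    return None


# A term seen in 30 of 100 positive reviews and 5 of 50 negative reviews.
score = relative_frequency_score(30, 100, 5, 50)
print(round(score, 2), polarity_label(score))  # 0.5 positive
```

Normalizing by set size prevents a term from looking positive merely because positive reviews outnumber negative ones in the corpus.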

5. The method of claim 1, wherein the measure of occurrence considers negation of a term.
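
Claim 5 does not say how negation is taken into account. One simple option, sketched here purely as an assumption, credits a negated mention (for example, "not fast" in a negative review) to the opposite review set. The negator list and the one-token look-back are hypothetical simplifications, and tokens are assumed to be lowercased.

```python
# Hypothetical negator list; a real system would rely on the parser.
NEGATORS = {"not", "never", "no", "hardly"}


def count_with_negation(term, reviews):
    """Count mentions of `term` supporting the positive vs. negative set.

    reviews: list of (tokens, is_positive) with lowercased tokens. A mention
    directly preceded by a negator is credited to the opposite set.
    """
    counts = {"pos": 0, "neg": 0}
    for tokens, is_positive in reviews:
        for i, token in enumerate(tokens):
            if token != term:
                continue
            previous = tokens[i - 1] if i > 0 else ""
            negated = previous in NEGATORS or previous.endswith("n't")
            # Negation flips which set this mention supports.
            supports_positive = is_positive != negated
            counts["pos" if supports_positive else "neg"] += 1
    return counts


reviews = [
    (["very", "fast", "printer"], True),
    (["not", "fast", "at", "all"], False),  # "not fast" in a negative review
    (["it", "isn't", "fast"], True),        # "isn't fast" in a positive review
]
print(count_with_negation("fast", reviews))  # {'pos': 2, 'neg': 1}
```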

6. The method of claim 1, wherein the polarity measure is selected from positive and negative polarity.

7. The method of claim 1, further comprising filtering the identified adjectival terms to remove objective terms.

8. The method of claim 7, wherein the filtering of the identified adjectival terms to remove objective terms includes retrieving objectivity scores for each context of one of the adjectival terms in a set of contexts recognized in a lexical resource and removing the adjectival term if its objectivity score meets or exceeds a threshold value.
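
The objectivity filter of claim 8 can be sketched with a lexical resource that lists one objectivity score per sense (context) of a word, in the style of SentiWordNet. Three choices here are assumptions: averaging the per-sense scores before the comparison, keeping terms that are absent from the resource, and the 0.75 threshold. The toy scores are made up for illustration.

```python
from statistics import mean


def remove_objective(terms, objectivity, threshold=0.75):
    """Drop terms whose mean per-sense objectivity meets or exceeds `threshold`.

    objectivity: term -> list of per-sense objectivity scores in [0, 1];
    terms missing from the resource are kept (an assumption).
    """
    kept = []
    for term in terms:
        scores = objectivity.get(term)
        if scores and mean(scores) >= threshold:
            continue  # judged objective, e.g. a color or a technical feature
        kept.append(term)
    return kept


# Toy scores for illustration only.
objectivity = {"fast": [0.5, 0.625], "black": [1.0, 0.875], "wireless": [1.0]}
print(remove_objective(["fast", "black", "wireless", "flimsy"], objectivity))
# ['fast', 'flimsy']
```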

9. The method of claim 1, further comprising using a lexical resource to identify an error in the assignment of polarity to a term and changing the polarity of that term.
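
Claim 9 does not say how an error is detected. The minimal sketch below assumes the lexical resource supplies a (positive, negative) score pair for each term. It treats a corpus-assigned label as an error only when the resource favors the opposite polarity by at least a hypothetical margin, min_gap. The example scores are invented.

```python
def correct_polarity(vocab, lexicon, min_gap=0.5):
    """Flip corpus-assigned labels that the lexical resource clearly contradicts.

    vocab: term -> "positive" or "negative", as derived from the review corpus.
    lexicon: term -> (positive_score, negative_score); hypothetical format.
    """
    corrected = dict(vocab)
    for term, label in vocab.items():
        if term not in lexicon:
            continue  # no evidence in the resource: keep the corpus label
        pos, neg = lexicon[term]
        if label == "positive" and neg - pos >= min_gap:
            corrected[term] = "negative"
        elif label == "negative" and pos - neg >= min_gap:
            corrected[term] = "positive"
    return corrected


vocab = {"sturdy": "negative", "cheap": "positive", "noisy": "negative"}
lexicon = {"sturdy": (0.75, 0.0), "cheap": (0.25, 0.5), "noisy": (0.0, 0.625)}
print(correct_polarity(vocab, lexicon))
# {'sturdy': 'positive', 'cheap': 'positive', 'noisy': 'negative'}
```

Requiring a margin leaves domain-specific readings (such as "cheap" for a printer) to the corpus unless the resource disagrees strongly.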

10. The method of claim 1, wherein the rating is intended to reflect an author's opinion of an item which is the subject of the textual content.

11. The method of claim 1, wherein the rating is selected from the group consisting of a ratio, a percentage, a score, a textual comment selected from a finite set of textual comments, and combinations thereof.
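
Because claim 11 allows ratings in several formats, a vocabulary builder would plausibly normalize them before it splits reviews into sets. The sketch below maps ratios, percentages, plain scores, and a finite set of textual verdicts onto [0, 1]. The verdict vocabulary and the 5-point maximum assumed for plain scores are hypothetical.

```python
# Hypothetical finite set of textual verdicts and their normalized values.
TEXT_RATINGS = {"poor": 0.0, "fair": 0.25, "good": 0.5, "very good": 0.75, "excellent": 1.0}


def normalize_rating(rating, score_max=5):
    """Map a rating given as a score, ratio, percentage or verdict onto [0, 1]."""
    if isinstance(rating, (int, float)):
        return rating / score_max  # plain score on an assumed 0..score_max scale
    text = rating.strip().lower()
    if text.endswith("%"):
        return float(text[:-1]) / 100  # percentage, e.g. "80%"
    if "/" in text:
        numerator, denominator = text.split("/")
        return float(numerator) / float(denominator)  # ratio, e.g. "4/5"
    return TEXT_RATINGS[text]  # raises KeyError for verdicts outside the set


for r in (4, "4/5", "80%", "Very good"):
    print(r, normalize_rating(r))
```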

12. The method of claim 1, further comprising inputting the polar vocabulary to an opinion mining system.

13. The method of claim 1, further comprising obtaining the corpus of reviews from an opinion website by filtering the reviews on the website to identify reviews relating to a selected class of goods or services.

14. The method of claim 13, wherein the class of goods relates to printers.

15. The method of claim 1, wherein the extracting textual content from reviews in a corpus of reviews comprises extracting textual content from at least 1000 reviews.

17. A system for performing the method of claim 1 comprising memory which stores instructions for performing the method and a processor in communication with the memory for executing the instructions.

18. An opinion mining system comprising: memory which stores a polar vocabulary generated by the method of claim 1; memory, which stores an opinion mining component for extracting an opinion from input text using the polar vocabulary; and a processor which implements the opinion mining component.
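
As a rough illustration of claim 18, an opinion mining component could look up vocabulary terms in the input text and tally their polarities. This lexicon-lookup sketch is an assumption and is far simpler than a full opinion mining system: it ignores negation and the noun each adjective modifies. The vocabulary shown is made up.

```python
import re


def extract_opinion(text, vocab):
    """Return the vocabulary hits in `text` and an overall polarity label.

    vocab: term -> "positive" or "negative". Each hit counts +1 or -1.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [(token, vocab[token]) for token in tokens if token in vocab]
    balance = sum(1 if polarity == "positive" else -1 for _, polarity in hits)
    if balance > 0:
        overall = "positive"
    elif balance < 0:
        overall = "negative"
    else:
        overall = "neutral"
    return hits, overall


vocab = {"fast": "positive", "sharp": "positive", "noisy": "negative"}
print(extract_opinion("Fast and sharp prints, but a bit noisy.", vocab))
# ([('fast', 'positive'), ('sharp', 'positive'), ('noisy', 'negative')], 'positive')
```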

19. A computer program product comprising a non-transitory recording medium containing instructions, which when executed on a computer causes the computer to perform the method of claim 1.

20. A system for generating a polar vocabulary comprising: a parser which extracts textual content from reviews in a corpus of reviews, parses the corpus of reviews to identify nouns from the textual content of the reviews, from which a set of frequent nouns is identified by associating the identified nouns with respective frequencies in the corpus of reviews, based on a number of their occurrences, and extracts adjectival terms from the textual content of the reviews, the adjectival terms being extracted with a parser rule which identifies an adjectival term which is in a relation with one of the frequent nouns; and a vocabulary generator which generates a polar vocabulary comprising adjectival terms identified by the parser, a polarity measure being associated with each adjectival term in the vocabulary which is based on the ratings of the reviews from which the adjectival term was extracted; a processor which implements the parser and vocabulary generator.

21. The system of claim 20, further comprising a filtering component for removing adjectival terms which are determined to be objective.

22. A method comprising: retrieving a corpus of reviews, each of the reviews including an author-generated rating of an item and textual content comprising the author's comments about the item; based on the rating, assigning each review to one of a set of sub-corpora; identifying frequent nouns from the textual content of the reviews, comprising parsing the corpus of reviews to identify nouns from the textual content of the reviews and identifying a set of frequent nouns based on respective frequencies of the identified nouns in the corpus of reviews; extracting a set of adjectival terms from the textual content of the reviews, each adjectival term being associated in the textual content with one of the frequent nouns; computing, for each of the adjectival terms in the set, a measure of the occurrences of the adjectival term in each of the sub-corpora; and generating a polar vocabulary including at least some of the extracted adjectival terms, a polarity measure being associated with each adjectival term in the vocabulary which is based on the measure of occurrences of the term in each of the sub-corpora.
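
Claim 22 generalizes the positive/negative split to an arbitrary set of sub-corpora. A minimal sketch follows. It assumes one sub-corpus per star level, and it defines the polarity measure as the frequency-weighted mean star level minus the scale midpoint. Both choices are assumptions and are not taken from the patent.

```python
from collections import defaultdict


def sub_corpus_profile(reviews, term):
    """Map each star level to the share of its reviews that mention `term`.

    reviews: list of (stars, tokens); each star level is one sub-corpus.
    """
    sizes = defaultdict(int)
    hits = defaultdict(int)
    for stars, tokens in reviews:
        sizes[stars] += 1
        if term in tokens:
            hits[stars] += 1
    return {stars: hits[stars] / sizes[stars] for stars in sizes}


def polarity_from_profile(profile, midpoint=3):
    """Frequency-weighted mean star level minus the scale midpoint.

    Values above 0 suggest positive polarity, values below 0 negative.
    """
    total = sum(profile.values())
    if total == 0:
        return 0.0
    return sum(stars * share for stars, share in profile.items()) / total - midpoint


reviews = [
    (5, ["fast", "printer"]), (5, ["great"]), (4, ["fast"]),
    (1, ["slow"]), (2, ["fast", "but", "noisy"]), (2, ["noisy"]),
]
profile = sub_corpus_profile(reviews, "fast")
print(profile, polarity_from_profile(profile))
# {5: 0.5, 4: 1.0, 1: 0.0, 2: 0.5} 0.75
```

Using per-sub-corpus shares rather than raw counts keeps a heavily populated star level from dominating the measure.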

23. The method of claim 22, further comprising incorporating the polar vocabulary into an opinion mining system.

Patent Metadata

Filing Date

Unknown

Publication Date

September 10, 2013

Inventors

Caroline Brun
