One or more classification algorithms are applied to at least one natural language document in order to extract both attributes and values of a given product. Supervised classification algorithms, semi-supervised classification algorithms, unsupervised classification algorithms or combinations of such classification algorithms may be employed for this purpose. The at least one natural language document may be obtained via a public communication network. Two or more attributes (or two or more values) thus identified may be merged to form one or more attribute phrases or value phrases. Once attributes and values have been extracted in this manner, association or linking operations may be performed to establish attribute-value pairs that are descriptive of the product. In a presently preferred embodiment, an (unsupervised) algorithm is used to generate seed attributes and values which can then support a supervised or semi-supervised classification algorithm.
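The merging step described above can be illustrated with a minimal sketch. It assumes the words of a sentence have already been labeled as attributes or values by some classifier, and it uses pointwise mutual information (PMI) over adjacent words as the correlation measure. PMI, the function names `pmi_scores` and `merge_phrases`, the threshold, and the toy corpus are all illustrative assumptions; the abstract does not specify which correlation measure is used.

```python
import math
from collections import Counter


def pmi_scores(sentences):
    """Pointwise mutual information for every adjacent word pair in the corpus.

    PMI is an assumed stand-in for the unspecified correlation measure.
    """
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    return {
        (a, b): math.log((c / n_bi) / ((unigrams[a] / n_uni) * (unigrams[b] / n_uni)))
        for (a, b), c in bigrams.items()
    }


def merge_phrases(words, labels, scores, threshold=0.0):
    """Merge adjacent words that share a label and are sufficiently correlated.

    labels[i] is 'attr', 'value', or None; unlabeled words are dropped.
    Returns a list of (phrase, label) tuples in sentence order.
    """
    phrases = []  # each entry: ([word, ...], label)
    for i, (word, label) in enumerate(zip(words, labels)):
        if label is None:
            continue
        can_extend = (
            i > 0
            and labels[i - 1] == label
            and scores.get((words[i - 1], word), float("-inf")) > threshold
        )
        if can_extend:
            phrases[-1][0].append(word)  # previous word ended the last phrase
        else:
            phrases.append(([word], label))
    return [(" ".join(ws), label) for ws, label in phrases]


# Hypothetical toy corpus and labels, for illustration only.
corpus = [
    ["battery", "life", "is", "ten", "hours"],
    ["long", "battery", "life"],
    ["battery", "life", "matters"],
    ["ten", "hours", "of", "battery", "life"],
]
labels = ["attr", "attr", None, "value", "value"]
print(merge_phrases(corpus[0], labels, pmi_scores(corpus)))
# [('battery life', 'attr'), ('ten hours', 'value')]
```

Raising `threshold` makes merging more conservative; with a very high threshold every labeled word stays a single-word attribute or value.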
Legal claims defining the scope of protection, as filed with the USPTO.
1. A method for associating a plurality of attributes and a plurality of values for a product within at least one natural language document to define attribute-value pairs, the method comprising: determining, by a computer, correlations between two or more attributes of the plurality of attributes; identifying at least one attribute phrase based on the correlations between the two or more attributes; determining correlations between two or more values of the plurality of values; identifying at least one value phrase based on the correlations between the two or more values; associating an attribute of the plurality of attributes or an attribute phrase of the at least one attribute phrase with a value of the plurality of values or a value phrase of the at least one value phrase based on syntactic dependency therebetween; and storing the attribute or attribute phrase and the associated value or value phrase as an attribute-value pair.
2. The method of claim 1, further comprising: determining correlations between another attribute of the plurality of attributes or another value phrase of the at least one value phrase and another value of the plurality of values or another value phrase of the at least one value phrase; associating the other attribute or the other attribute phrase with the other value or the other value phrase based on correlations therebetween; and storing the other attribute or other attribute phrase and the associated other value or other value phrase as another attribute-value pair.
3. The method of claim 1, further comprising: identifying another attribute of the plurality of attributes or another value phrase of the at least one value phrase that is in proximity to another value of the plurality of values or another value phrase of the at least one value phrase; associating the other attribute or the other attribute phrase with the other value or the other value phrase based on the proximity therebetween; and storing the other attribute or other attribute phrase and the associated other value or other value phrase as another attribute-value pair.
4. An apparatus for associating a plurality of attributes and a plurality of values within at least one natural language document to define attribute-value pairs, comprising: a correlation module operative to determine correlations between two or more attributes of the plurality of attributes, and to determine correlations between two or more values of the plurality of values; a phrase determination module operative to identify at least one attribute phrase based on the correlations between the two or more attributes, and to identify at least one value phrase based on the correlations between the two or more values; a syntactic dependency module operative to associate an attribute of the plurality of attributes or an attribute phrase of the at least one attribute phrase with a value of the plurality of values or a value phrase of the at least one value phrase based on syntactic dependency therebetween; and a machine readable store storing the attribute or attribute phrase and the associated value or value phrase as an attribute-value pair.
5. The apparatus of claim 4, wherein the correlation module is operative to determine correlations between another attribute of the plurality of attributes or another value phrase of the at least one value phrase and another value of the plurality of values or another value phrase of the at least one value phrase, the apparatus further comprising: an associating module operative to associate the other attribute or the other attribute phrase with the other value or the other value phrase based on correlations therebetween, wherein the machine readable store is further operative to store the other attribute or other attribute phrase and the associated other value or other value phrase as another attribute-value pair.
6. The apparatus of claim 4, further comprising: a proximity module operative to identify another attribute of the plurality of attributes or another value phrase of the at least one value phrase that is in proximity to another value of the plurality of values or another value phrase of the at least one value phrase, wherein the associating module is further operative to associate the other attribute or the other attribute phrase with the other value or the other value phrase based on the proximity therebetween, and wherein the machine readable store is further operative to store the other attribute or other attribute phrase and the associated other value or other value phrase as another attribute-value pair.
7. A computer-readable medium having stored thereon executable instructions that, when executed, cause a computer to: determine correlations between two or more attributes of a plurality of attributes for a product within at least one natural language document; identify at least one attribute phrase based on the correlations between the two or more attributes; determine correlations between two or more values of a plurality of values for the product within the at least one natural language document; identify at least one value phrase based on the correlations between the two or more values; associate an attribute of the plurality of attributes or an attribute phrase of the at least one attribute phrase with a value of the plurality of values or a value phrase of the at least one value phrase based on syntactic dependency therebetween; and store the attribute or attribute phrase and the associated value or value phrase as an attribute-value pair.
8. The computer-readable medium of claim 7, further comprising executable instructions that, when executed, cause the computer to: determine correlations between another attribute of the plurality of attributes or another value phrase of the at least one value phrase and another value of the plurality of values or another value phrase of the at least one value phrase; associate the other attribute or the other attribute phrase with the other value or the other value phrase based on correlations therebetween; and store the other attribute or other attribute phrase and the associated other value or other value phrase as another attribute-value pair.
9. The computer-readable medium of claim 7, further comprising executable instructions that, when executed, cause the computer to: identify another attribute of the plurality of attributes or another value phrase of the at least one value phrase that is in proximity to another value of the plurality of values or another value phrase of the at least one value phrase; associate the other attribute or the other attribute phrase with the other value or the other value phrase based on the proximity therebetween; and store the other attribute or other attribute phrase and the associated other value or other value phrase as another attribute-value pair.
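The association steps recited in the claims (linking by syntactic dependency, with proximity as a further basis) might be sketched as follows. This is an illustrative sketch only, not the patented implementation: the `Span` type, the `link_pairs` function, the `max_distance` cutoff, the two-pass ordering, and the example dependency edge are all hypothetical, and the dependency edges are assumed to come from an external parser that is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    text: str
    start: int  # index of the first token
    end: int    # index one past the last token


def link_pairs(attributes, values, dependencies, max_distance=3):
    """Pair attribute spans with value spans.

    dependencies: set of (head_index, dependent_index) token pairs, assumed
    to be produced by an external dependency parser.
    Pass 1 links spans joined by a dependency edge; pass 2 links the
    remaining attributes to the nearest unused value within max_distance
    intervening tokens.
    """
    def covers(span, idx):
        return span.start <= idx < span.end

    def dependent(a, v):
        return any(
            (covers(a, h) and covers(v, d)) or (covers(v, h) and covers(a, d))
            for h, d in dependencies
        )

    def gap(a, v):
        # Number of tokens strictly between the two spans (0 if adjacent).
        return max(a.start - v.end, v.start - a.end, 0)

    pairs, used_values, linked_attrs = [], set(), set()

    # Pass 1: syntactic dependency.
    for attr in attributes:
        for value in values:
            if value not in used_values and dependent(attr, value):
                pairs.append((attr.text, value.text))
                used_values.add(value)
                linked_attrs.add(attr)
                break

    # Pass 2: proximity fallback for attributes still unlinked.
    for attr in attributes:
        if attr in linked_attrs:
            continue
        candidates = [
            v for v in values
            if v not in used_values and gap(attr, v) <= max_distance
        ]
        if candidates:
            best = min(candidates, key=lambda v: gap(attr, v))
            pairs.append((attr.text, best.text))
            used_values.add(best)

    return pairs


# Tokens: battery(0) life(1) is(2) ten(3) hours(4) ,(5) weight(6) 2(7) kg(8)
attrs = [Span("battery life", 0, 2), Span("weight", 6, 7)]
vals = [Span("ten hours", 3, 5), Span("2 kg", 7, 9)]
deps = {(4, 1)}  # hypothetical edge: "hours" governs "life"
print(link_pairs(attrs, vals, deps))
# [('battery life', 'ten hours'), ('weight', '2 kg')]
```

Running the dependency pass over all attributes before any proximity matching keeps a nearby but syntactically unrelated attribute from claiming a value that another attribute is directly linked to.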
April 30, 2007
June 28, 2011