US-8086592

Apparatus and method for associating unstructured text with structured data

Published: December 27, 2011

Assignee: not available in the source USPTO data

Inventors: not available in the source USPTO data

Technical Abstract

A computer readable storage medium includes executable instructions to receive a semantic abstraction describing at least one underlying data source. The semantic abstraction includes at least one dimension with at least one dimension value. Unstructured text is parsed into parsed text units. A dimension value is matched to a parsed text unit to form matched content. An indication of the matched content is stored.
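The flow in the abstract (parse text, match dimension values, store an indication of each match) can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Match` record, the `match_dimension_values` function, the dictionary used to stand in for the semantic abstraction, and the whitespace-based parsing are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """Hypothetical record indicating that a dimension value matched a text unit."""
    dimension: str
    value: str
    unit_index: int


def match_dimension_values(semantic_abstraction, text):
    """Match dimension values against parsed text units.

    `semantic_abstraction` is assumed to be a dict mapping each dimension
    name to a list of its values; the patent does not prescribe this shape.
    """
    # Parse: split on whitespace, strip surrounding punctuation, lowercase.
    units = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    units = [u for u in units if u]

    matches = []
    for dimension, values in semantic_abstraction.items():
        wanted = {v.lower(): v for v in values}
        for i, unit in enumerate(units):
            if unit in wanted:
                # Store an indication of the matched content.
                matches.append(Match(dimension, wanted[unit], i))
    return matches


abstraction = {"Country": ["France", "Japan"], "Product": ["Laptop", "Phone"]}
matches = match_dimension_values(
    abstraction, "The laptop shipped to France arrived late."
)
for m in matches:
    print(m)
```

Here each match records which dimension and value were found and the position of the parsed text unit, which is one simple way to keep "an indication of the matched content".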

Patent Claims

17 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A non-transitory computer readable storage medium, comprising executable instructions to: receive a semantic abstraction describing at least one underlying data source, wherein the semantic abstraction includes data model objects including source dimensions and corresponding source dimension values; form an index with the source dimensions and the corresponding source dimension values derived from the data model objects of the semantic abstraction; parse secondary source unstructured text into a plurality of parsed text units, wherein the secondary source unstructured text is from a data source separate from the underlying data source; match a source dimension value from the index to a parsed text unit to form matched content; store the matched content in a table specifying a source dimension of the underlying data source, corresponding source dimension values and a count for a parsed text unit from the external source unstructured text that matches a dimension value; create an annotation in the table indicative of the number of matches between a selected dimension value and parsed text units; and process the table to rank the relevance of the external source unstructured text to the semantic abstraction of the underlying data source.
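The index, table, and ranking steps recited in claim 1 might be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the function names `build_index`, `build_match_table`, and `rank_documents` are hypothetical, the text is assumed to be already parsed into lowercase units, and scoring relevance as the total match count is a simplification chosen here because the claim does not specify a ranking formula.

```python
from collections import Counter


def build_index(abstraction):
    """Map each lowercased dimension value to its (dimension, value) pair."""
    index = {}
    for dimension, values in abstraction.items():
        for value in values:
            index[value.lower()] = (dimension, value)
    return index


def build_match_table(index, parsed_units):
    """Return rows of (dimension, value, count) for values found in the units."""
    counts = Counter(index[u] for u in parsed_units if u in index)
    return [(dim, val, n) for (dim, val), n in sorted(counts.items())]


def rank_documents(index, documents):
    """Rank document names by total match count, highest first (assumed metric)."""
    scored = []
    for name, units in documents.items():
        table = build_match_table(index, units)
        scored.append((sum(n for _, _, n in table), name))
    return [name for _, name in sorted(scored, key=lambda s: (-s[0], s[1]))]


abstraction = {"Region": ["Europe", "Asia"], "Product": ["Laptop"]}
index = build_index(abstraction)
documents = {
    "complaint_1": ["laptop", "broke", "europe", "laptop"],
    "news_2": ["asia", "markets", "rose"],
    "memo_3": ["meeting", "moved"],
}
table = build_match_table(index, documents["complaint_1"])
ranking = rank_documents(index, documents)
print(table)
print(ranking)
```

The count column plays the role of the annotation described in the claim: it records how many parsed text units matched a given dimension value.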

2. The computer readable storage medium of claim 1 wherein the secondary source unstructured text is supplied from one of a local repository and a remote repository.

3. The computer readable storage medium of claim 1 wherein the secondary source unstructured text is selected from customer complaints, screen scrapings, news feeds, text files, Really Simple Syndication (RSS) feeds, data streams and emails.

4. The computer readable storage medium of claim 1 wherein the secondary source unstructured text is provided in response to a query.

5. The computer readable storage medium of claim 1 wherein the executable instructions to parse include executable instructions to tokenize the unstructured text to form tokenized text, stem the tokenized text, and remove stop words to produce the parsed text units.
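The three parsing stages named in claim 5 (tokenize, stem, remove stop words) can be sketched with the Python standard library alone. The suffix-stripping `naive_stem` function below is a deliberately crude stand-in for a real stemmer such as Porter's, and the `STOP_WORDS` set is a small illustrative list; neither comes from the patent.

```python
import re

# Illustrative stop word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "was", "were", "and", "of", "to", "in"}


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def naive_stem(token):
    """Strip one common suffix if at least four characters remain.

    This is a toy stemmer used only for illustration.
    """
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[: -len(suffix)]
    return token


def parse(text):
    """Tokenize, stem, then drop stop words to produce parsed text units."""
    stems = [naive_stem(t) for t in tokenize(text)]
    return [s for s in stems if s not in STOP_WORDS]


units = parse("The phones were delayed and the chargers failed.")
print(units)
```

The minimum stem length keeps short words such as "was" or "this" from being mangled before the stop word filter runs.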

6. The computer readable storage medium of claim 1 further comprising executable instructions to create a coordinate in the table indicative of a match between a plurality of source dimension values and the parsed text unit.

7. The computer readable storage medium of claim 6 further comprising executable instructions to rank the coordinate.
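One possible reading of the coordinate in claims 6 and 7 is a combination of dimension values that co-occur in the same parsed text unit. The sketch below follows that reading, which is an interpretation rather than something the claims spell out. It treats each text unit as a list of tokens, and the names `find_coordinates` and `rank_coordinates`, as well as ranking by occurrence count, are assumptions.

```python
from collections import Counter


def find_coordinates(index, text_units):
    """Count coordinates: sets of two or more dimension values in one unit.

    `index` maps a lowercased token to a (dimension, value) pair.
    """
    coords = Counter()
    for tokens in text_units:
        matched = sorted({index[t] for t in tokens if t in index})
        if len(matched) >= 2:
            coords[tuple(matched)] += 1
    return coords


def rank_coordinates(coords):
    """Order coordinates by how often they occur, most frequent first."""
    return sorted(coords.items(), key=lambda kv: (-kv[1], kv[0]))


index = {
    "france": ("Country", "France"),
    "japan": ("Country", "Japan"),
    "laptop": ("Product", "Laptop"),
}
text_units = [
    ["laptop", "late", "france"],
    ["france", "laptop", "refund"],
    ["japan", "laptop"],
    ["france", "weather"],  # only one dimension value, so no coordinate
]
ranked = rank_coordinates(find_coordinates(index, text_units))
for coordinate, count in ranked:
    print(coordinate, count)
```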

8. The computer readable storage medium of claim 1 further comprising executable instructions to simultaneously display a dimension value and associated unstructured text.

9. The computer readable storage medium of claim 1 further comprising executable instructions to apply a business intelligence tool to the matched content to generate a report.

10. A method for implementation by one or more data processors, the method comprising: receiving, by at least one processor, a semantic abstraction describing at least one data source, the semantic abstraction comprising data model objects including source dimensions and corresponding source dimension values; forming, by at least one processor, an index with the source dimensions and the corresponding source dimension values derived from the data model objects of the semantic abstraction; parsing, by at least one processor, unstructured text into a plurality of parsed text units; matching, by at least one processor, a source dimension value to a parsed text unit to form matched content; storing, by at least one processor, the matched content in a table; and creating, by at least one processor, an annotation in the table indicative of number of matches between a selected dimension value and parsed text units.

11. The method in accordance with claim 10, wherein the table specifies a source dimension of the data source, corresponding source dimension values and a count for a parsed text unit from the unstructured text that matches a dimension value.

12. The method in accordance with claim 11 further comprising: processing, by at least one processor, the table to rank the relevance of the unstructured text to the semantic abstraction of the data source.

13. The method in accordance with claim 10, wherein the unstructured text is obtained from a secondary source separate from the at least one data source.

14. The method in accordance with claim 10 further comprising: creating, by at least one processor, a coordinate in the table indicative of a match between a plurality of source dimension values and the parsed text unit; and ranking, by at least one processor, the coordinate.

15. A method to associate unstructured text with structured data, the method being implemented by one or more data processors and comprising: receiving, by at least one processor, structured data from at least one first data source, the structured data comprising source dimensions and corresponding source dimension values; forming, by at least one processor, an index with the source dimensions and the corresponding source dimension values derived from the structured data; receiving, by at least one processor, the unstructured text from a second data source; parsing, by at least one processor, the unstructured text into a plurality of parsed text units; matching, by at least one processor, a source dimension value of the source dimension values to a parsed text unit of the plurality of parsed text units to form matched content; creating, by at least one processor, an annotation in a table indicative of number of matches between a selected dimension value and parsed text units; processing, by at least one processor the table, to rank the matched content, the rank indicating relevance of the associating of the parsed text unit with the corresponding source dimension value.

16. The method in accordance with claim 15, wherein the table stores the matched content and specifies a count for a parsed text unit that matches a dimension value.

17. The method in accordance with claim 15, wherein the parsing comprises tokenizing the unstructured text to form tokenized text, stemming the tokenized text, and removing stop words to produce the parsed text units.

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G06F

Patent Metadata

Filing Date

November 30, 2007

Publication Date

December 27, 2011
