A data handling method combines search capabilities with analytical functionality. The invention provides advantages when dealing with structured documents (such as electronic catalogs, XML documents, text documents, HTML documents, Internet documents, etc.) and other data stored in a computer system. Various embodiments include simplified ways to express search/analysis requests against a data set and also to express the results of such requests.
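To make the notion of a search request concrete, the following is a minimal sketch of query expressions built from name/value pairs combined with operators, including a "match N of M" operator of the kind the claims mention. It is an illustration only, not the patent's implementation: the names `pair`, `and_`, `match_n_of`, `select`, and the sample `CATALOG` are hypothetical, documents are modeled as flat dictionaries, and "match N of M" is assumed to mean "at least N of the M sub-expressions hold".

```python
from typing import Callable, Dict, List

# Assumption: a document is a flat mapping of field names to values.
Document = Dict[str, str]
# Assumption: a query expression is modeled as a predicate over one document.
Query = Callable[[Document], bool]


def pair(name: str, value: str) -> Query:
    """Name/value pair: true when the document's field `name` equals `value`."""
    return lambda doc: doc.get(name) == value


def and_(*queries: Query) -> Query:
    """Boolean AND over any number of sub-expressions."""
    return lambda doc: all(q(doc) for q in queries)


def match_n_of(n: int, *queries: Query) -> Query:
    """'Match N of M', read here as: at least n of the sub-expressions hold."""
    return lambda doc: sum(1 for q in queries if q(doc)) >= n


def select(collection: List[Document], query: Query) -> List[Document]:
    """Return the subset of the collection indicated by the query."""
    return [doc for doc in collection if query(doc)]


# Hypothetical sample data standing in for an electronic catalog.
CATALOG: List[Document] = [
    {"type": "shirt", "color": "red", "brand": "acme"},
    {"type": "shirt", "color": "blue", "brand": "acme"},
    {"type": "hat", "color": "red", "brand": "zenith"},
    {"type": "shirt", "color": "red", "brand": "zenith"},
]

red_shirts = select(CATALOG, and_(pair("type", "shirt"), pair("color", "red")))
two_of_three = select(
    CATALOG,
    match_n_of(2, pair("type", "shirt"), pair("color", "red"), pair("brand", "acme")),
)
```

Modeling each expression as a predicate keeps the operators composable: any operator can take the output of any other as a sub-expression.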
Legal claims defining the scope of protection, as filed with the USPTO.
1. A method of selecting desired documents from a document collection comprising: reading a first query expression, said first query expression indicating a first subset of said collection; reading an analysis indicator of an analysis to be performed on said first subset; and generating an analysis output by performing said analysis on said first subset; wherein said analysis comprises a function that returns indications of N subsets of said first subset, and wherein said analysis output is initially returned as one or more second query expressions each associated with one of said N subsets of said first subset.
2. The method of claim 1 wherein said first query expression comprises one or more name/value pairs.
3. The method of claim 1 wherein said first query expression further comprises one or more name/value pairs combined with one or more operators.
4. The method of claim 3 wherein said operators comprise standard boolean operators, statistical operators such as match N of M, and threshold expressions allowing different weighting for different parts of an expression.
5. The method of claim 1 wherein said first query expression further comprises one or more indications of linguistic tools.
6. The method of claim 1 further comprising reading one or more parameters after said analysis indicator, said one or more parameters specifying modification of said analysis.
7. The method of claim 1 further comprising: reading a redirection indicator indicating that said analysis results are to be evaluated as queries to a document set.
8. The method of claim 1 further comprising: determining if said first subset contains sufficient data; and if not, modifying said query using a fallback procedure to generate a larger first subset.
9. The method of claim 8 further comprising: reading a second query expression outside of said redirection indicator and combining said second query expression with one or more query expressions output by said analysis.
10. The method of claim 1 further comprising: reading an analysis parameter indicating a desired number of second query expressions to be returned by said analysis.
11. The method of claim 1 wherein said function comprises a rollup function that returns said indications of said N subsets of said first subset, each of said N subsets including documents that contain at least one of the top N most frequent values from an indicated field expression.
12. The method of claim 1 wherein said function comprises a categorize function that returns said indications of said N subsets of said first subset, each of said N subsets including documents identified by determining patterns of name/value pairs that tend to occur together.
13. The method of claim 1 wherein said function comprises a partition function that returns said indications of said N subsets of said first subset based on the number of the documents that appear in more than one of said N subsets and the number of the documents in each of said N subsets.
14. Computer-readable media embodying instructions executable by a computer to perform a method of selecting desired documents from a document collection, the method comprising: reading a first query expression, said first query expression indicating a first subset of said collection; reading an analysis indicator of an analysis to be performed on said first subset; and generating an analysis output by performing said analysis on said first subset; wherein said analysis comprises a function that returns indications of N subsets of said first subset; and wherein said analysis output is initially returned as one or more second query expressions each associated with one of said N subsets of said first subset.
15. The media of claim 14 wherein said first query expression comprises one or more name/value pairs.
16. The media of claim 14 wherein said first query expression further comprises one or more name/value pairs combined with one or more operators.
17. The media of claim 16 wherein said operators comprise standard boolean operators, statistical operators such as match N of M, and threshold expressions allowing different weighting for different parts of an expression.
18. The media of claim 14 wherein said first query expression further comprises one or more indications of linguistic tools.
19. The media of claim 14, wherein the method further comprises reading one or more parameters after said analysis indicator, said one or more parameters specifying modification of said analysis.
20. The media of claim 14, wherein the method further comprises: reading a redirection indicator indicating that said analysis results are to be evaluated as queries to a document set.
21. The media of claim 14, wherein the method further comprises: determining if said first subset contains sufficient data; and if not, modifying said query using a fallback procedure to generate a larger first subset.
22. The media of claim 20, wherein the method further comprises: reading a second query expression outside of said redirection indicator and combining said second query expression with one or more query expressions output by said analysis.
23. The media of claim 14, wherein the method further comprises: reading an analysis parameter indicating a desired number of second query expressions to be returned by said analysis.
24. The media of claim 14 wherein said function comprises a rollup function that returns said indications of said N subsets of said first subset, each of said N subsets including documents that contain at least one of the top N most frequent values from an indicated field expression.
25. The media of claim 14 wherein said function comprises a categorize function that returns said indications of said N subsets of said first subset, each of said N subsets including documents identified by determining patterns of name/value pairs that tend to occur together.
26. The media of claim 14 wherein said function comprises a partition function that returns said indications of said N subsets of said first subset based on the number of the documents that appear in more than one of said N subsets and the number of the documents in each of said N subsets.
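The central idea of the independent claims, that an analysis returns its result as further query expressions rather than as raw documents, can be sketched for the rollup function. This is one possible reading under stated assumptions, not the patented implementation: queries are modeled as conjunctions of name/value pairs held in a dictionary, and `matches`, `select`, `rollup`, `redirect`, and the sample `CATALOG` are hypothetical names introduced for illustration.

```python
from collections import Counter
from typing import Dict, List

# Assumption: documents and queries are both flat name/value mappings;
# a query matches a document when every one of its pairs is present.
Document = Dict[str, str]
Query = Dict[str, str]


def matches(doc: Document, query: Query) -> bool:
    return all(doc.get(name) == value for name, value in query.items())


def select(collection: List[Document], query: Query) -> List[Document]:
    """Evaluate a query expression: return the subset it indicates."""
    return [doc for doc in collection if matches(doc, query)]


def rollup(collection: List[Document], first_query: Query,
           field: str, n: int) -> List[Query]:
    """Return up to n second query expressions, one per most frequent value
    of `field` within the first subset. Each one narrows the first query."""
    first_subset = select(collection, first_query)
    counts = Counter(doc[field] for doc in first_subset if field in doc)
    return [{**first_query, field: value} for value, _ in counts.most_common(n)]


def redirect(collection: List[Document],
             queries: List[Query]) -> List[List[Document]]:
    """Redirection, as assumed here: evaluate the analysis output as queries."""
    return [select(collection, q) for q in queries]


# Hypothetical sample data.
CATALOG: List[Document] = [
    {"type": "shirt", "color": "red"},
    {"type": "shirt", "color": "red"},
    {"type": "shirt", "color": "red"},
    {"type": "shirt", "color": "blue"},
    {"type": "shirt", "color": "blue"},
    {"type": "shirt", "color": "green"},
    {"type": "hat", "color": "red"},
]

second_queries = rollup(CATALOG, {"type": "shirt"}, field="color", n=2)
subsets = redirect(CATALOG, second_queries)
```

Because the analysis output has the same form as its input, a caller can display the returned expressions, combine them with other expressions, or evaluate them directly, which is what `redirect` does in this sketch.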
October 27, 2000
October 12, 2004