8010534

Identifying Related Objects Using Quantum Clustering

Published: August 30, 2011
Assignee: not available in the USPTO data we have

Patent Claims
22 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method of clustering objects, implemented on a computing device, the method comprising: constructing a feature-object database of multiple objects, wherein the constructing comprises, for each of the multiple objects, obtaining the object; extracting features from the object to obtain extracted features; calculating feature values for the extracted features, wherein at least some of the features values are continuous; selecting extracted features to obtain selected features and corresponding selected feature values; quantizing the selected feature values to obtain quantized selected feature values, wherein quantizing comprises identifying the selected feature values as discrete or continuous and transforming the identified continuous feature values into discrete feature values; and building the feature-object database having keys comprising quantized selected feature values; building a connected objects database; building a directed graph of connected objects from the connected objects database, wherein the connected objects database is built from the feature-object database; identifying clusters of connected objects; and evaluating the clusters of identified objects to designate groups of related objects.
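The database-building steps of claim 1 can be pictured with a minimal sketch that uses plain Python dictionaries. It assumes the feature values have already been quantized into string keys. The function names, the sample data, and the `min_shared` threshold are hypothetical illustrations, not details taken from the patent.

```python
from collections import defaultdict
from itertools import combinations


def build_feature_object_db(objects):
    """Map each quantized feature key to the set of object ids that contain it."""
    db = defaultdict(set)
    for obj_id, feature_keys in objects.items():
        for key in feature_keys:
            db[key].add(obj_id)
    return dict(db)


def build_connected_objects_db(feature_object_db, min_shared=2):
    """Count shared feature keys per object pair; keep pairs meeting the threshold."""
    shared = defaultdict(int)
    for obj_ids in feature_object_db.values():
        for a, b in combinations(sorted(obj_ids), 2):
            shared[(a, b)] += 1
    return {pair: n for pair, n in shared.items() if n >= min_shared}


# Hypothetical objects, each described by already-quantized feature keys.
objects = {
    "doc1": {"len:B", "word:alpha", "word:beta"},
    "doc2": {"len:B", "word:alpha", "word:gamma"},
    "doc3": {"len:A", "word:delta"},
}
feature_db = build_feature_object_db(objects)
connected = build_connected_objects_db(feature_db, min_shared=2)
print(connected)  # {('doc1', 'doc2'): 2}
```

Keying the first database by feature rather than by object means that only objects sharing at least one key are ever compared, which is one plausible reason to build the connected objects database from it.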

2. The method of claim 1 wherein the objects comprise documents or files.

3. The method of claim 1 wherein the identifying clusters comprises: walking the directed graph of connected objects to determine which objects share a system defined number of features to be considered a cluster.
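One way to picture the graph walk in claim 3 is the sketch below. It assumes the connected pairs were already filtered to those sharing the system-defined number of features, and it treats a cluster as a set of nodes reachable from one another when edges are followed in either direction. That reading, along with the names `build_directed_graph` and `walk_clusters`, is an assumption made for illustration and is not specified by the claim.

```python
from collections import defaultdict


def build_directed_graph(connected_pairs):
    """Store one directed edge per connected pair, plus a reverse index for walking."""
    successors = defaultdict(set)
    predecessors = defaultdict(set)
    for source, target in connected_pairs:
        successors[source].add(target)
        predecessors[target].add(source)
    return successors, predecessors


def walk_clusters(successors, predecessors):
    """Walk the graph from each unvisited node and collect the nodes reached."""
    nodes = set(successors) | set(predecessors)
    visited = set()
    clusters = []
    for start in sorted(nodes):
        if start in visited:
            continue
        cluster = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in visited:
                continue
            visited.add(node)
            cluster.add(node)
            # Assumption: follow edges in both directions (weak connectivity).
            stack.extend(successors.get(node, ()))
            stack.extend(predecessors.get(node, ()))
        clusters.append(cluster)
    return clusters


# Hypothetical pairs that already meet the shared-feature threshold.
pairs = [("doc1", "doc2"), ("doc3", "doc2"), ("doc4", "doc5")]
succ, pred = build_directed_graph(pairs)
clusters = walk_clusters(succ, pred)
print(clusters)
```

The reverse index matters here: with edges `doc1 -> doc2` and `doc3 -> doc2`, a walk that followed only outgoing edges would place `doc3` in a cluster of its own.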

4. The method of claim 1 wherein transforming the identified continuous selected feature values into discrete selected feature values comprises: assigning value ranges for the continuous selected feature values, each value range having a corresponding representative code.

5. The method of claim 4 wherein transforming the identified continuous feature values into discrete feature values further comprises preparing a synthetic vocabulary wherein each term in the vocabulary represents a selected feature and the corresponding selected feature value, the representation comprising the corresponding representative code.
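The quantization described in claims 4 and 5 can be sketched as follows. The range boundaries, the letter codes, the `feature:code` term format, and the rule that a Python `float` marks a continuous value are all assumptions chosen for illustration. The claims require only that each value range has a representative code and that each vocabulary term represents a feature together with its value.

```python
from bisect import bisect_right

# Hypothetical range boundaries and their representative codes:
# value < 0.25 -> "A", 0.25 <= value < 0.5 -> "B", and so on.
BOUNDARIES = [0.25, 0.5, 0.75]
CODES = ["A", "B", "C", "D"]


def quantize(value, boundaries=BOUNDARIES, codes=CODES):
    """Return the representative code of the range that contains value."""
    return codes[bisect_right(boundaries, value)]


def to_term(feature_name, value):
    """Build a synthetic vocabulary term from a feature and its value."""
    if isinstance(value, float):
        # Assumption: floats are the continuous values; replace with a range code.
        return f"{feature_name}:{quantize(value)}"
    # Discrete values are used as they are.
    return f"{feature_name}:{value}"


terms = [to_term("tf", 0.31), to_term("tf", 0.34), to_term("lang", "en")]
print(terms)  # ['tf:B', 'tf:B', 'lang:en']
```

Because 0.31 and 0.34 fall in the same range, they produce the same term, so two objects with slightly different continuous values can still share a database key.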

6. A storage medium having instructions stored thereon which when executed by a processor cause the processor to perform actions comprising: constructing a feature-object database of multiple objects, wherein the constructing comprises, for each of the multiple objects, obtaining the object; extracting features from the object to obtain extracted features; calculating feature values for the extracted features, wherein at least some of the features values are continuous; selecting extracted features to obtain selected features and corresponding selected feature values; quantizing the selected feature values to obtain quantized selected feature values, wherein quantizing comprises identifying the selected feature values as discrete or continuous and transforming the identified continuous selected feature values into discrete selected feature values; and building the feature-object database having keys comprising quantized selected feature values; building a connected objects database; building a directed graph of connected objects from the connected objects database, wherein the connected objects database is built from the feature-object database; identifying clusters of connected objects; and evaluating the clusters of identified objects to designate groups of related objects.

7. The storage medium of claim 6 wherein the objects comprise documents or files.

8. The storage medium of claim 6 wherein the identifying clusters comprises: walking the directed graph of connected objects.

9. The storage medium of claim 6 wherein transforming the identified continuous selected feature values into discrete selected feature values comprises: assigning value ranges for the identified continuous selected feature values, each value range having a corresponding representative code.

10. The storage medium of claim 9 wherein transforming the identified continuous selected feature values into discrete selected feature values further comprises preparing a synthetic vocabulary wherein each term in the vocabulary represents a selected feature and the corresponding selected feature value, the representation comprising the corresponding representative code.

11. A computing device comprising: a processor, a memory, and a storage device, wherein the storage device includes a storage medium having instructions thereon which when executed cause the computing device to perform operations comprising: constructing a feature-object database of multiple objects, wherein the constructing comprises, for each of the multiple objects, obtaining the object; extracting features from the object; calculating feature values for the extracted features, wherein at least some of the features values are continuous; selecting features to obtain selected features and corresponding selected feature values; quantizing the selected feature values to obtain quantized selected feature values, wherein quantizing comprises identifying the selected feature values as discrete or continuous and transforming the identified continuous selected feature values into discrete selected feature values; and building the feature-object database having keys comprising quantized selected feature values; building a connected objects database; building a directed graph of connected objects from the connected objects database, wherein the connected objects database is built from the feature-object database; identifying clusters of connected objects; and evaluating the clusters of identified objects to determine groups of related objects.

12. The computing device of claim 11 wherein the objects comprise documents or files.

13. The computing device of claim 11 wherein the identifying clusters comprises: walking the directed graph of connected objects.

14. The computing device of claim 11 wherein transforming the identified continuous selected feature values into discrete selected feature values comprises: assigning value ranges for the identified continuous selected feature values, each value range having a corresponding representative code.

15. The computing device of claim 14 wherein transforming the identified continuous selected feature values into discrete selected feature values further comprises preparing a synthetic vocabulary wherein each term in the vocabulary represents a selected feature and the corresponding selected feature value, the representation comprising the representative code.

16. A method, implemented on a computing device, of identifying related documents by clustering a plurality of documents, the method comprising: constructing a feature-document database of a plurality of documents, wherein the constructing comprises, for each of the documents, obtaining one of the plurality of documents as a current document; extracting words from the current document to obtain extracted words; forming features from the extracted words; calculating feature values for the features, wherein at least some of the features values are continuous; selecting some of the features to obtain selected features and corresponding feature values; quantizing the selected feature value to obtain quantized selected feature values, wherein quantizing comprises identifying the selected feature values as discrete or continuous and transforming the identified continuous selected feature values into discrete selected feature values; and building the feature-document database having keys comprising quantized selected features; building a connected documents database; building a directed graph of connected objects from the connected objects database, wherein the connected objects database is built from the feature-object database; identifying clusters of connected documents; and evaluating the clusters of identified documents to determine groups of related documents.

17. The method of claim 16 wherein the forming features from the extracted words comprises: organizing the extracted words into shingles; hashing each shingle; and selecting some shingles to be features.
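The three steps of claim 17 (organize words into shingles, hash each shingle, select some shingles) might look like the sketch below. The shingle size, the use of MD5 as the hash, and the rule of keeping the smallest hash values are assumptions chosen for illustration. The claim does not say which hash or selection rule is used.

```python
import hashlib


def shingles(words, size=3):
    """Return overlapping runs of `size` consecutive words."""
    return [tuple(words[i:i + size]) for i in range(len(words) - size + 1)]


def hash_shingle(shingle):
    """Hash a shingle to a stable integer (MD5 chosen only for illustration)."""
    digest = hashlib.md5(" ".join(shingle).encode("utf-8")).hexdigest()
    return int(digest, 16)


def select_features(words, size=3, keep=4):
    """Keep the `keep` smallest shingle hashes as the document's features."""
    hashes = {hash_shingle(s) for s in shingles(words, size)}
    return sorted(hashes)[:keep]


# Hypothetical documents that differ only in their final word.
doc_a = "the quick brown fox jumps over the lazy dog".split()
doc_b = "the quick brown fox jumps over the lazy cat".split()
features_a = select_features(doc_a)
features_b = select_features(doc_b)
print(len(set(features_a) & set(features_b)), "shared features")
```

A stable hash such as MD5 is used instead of Python's built-in `hash` because the built-in is randomized per process for strings, which would make stored features incomparable across runs.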

18. The method of claim 16 wherein constructing the feature-document database further comprises breaking the current document into paragraphs and wherein forming features from the extracted words comprises forming word groups that co-occur in a paragraph.

19. The method of claim 18 wherein the word groups comprise pairs or triples, and the words in the word groups need not be immediately adjacent.
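The paragraph-level word groups of claims 18 and 19 can be illustrated with a short sketch. The blank-line paragraph splitting, the lowercase tokenizer, and the alphabetical ordering of words within a group are assumptions made for illustration. The claims require only that the grouped words co-occur in a paragraph and need not be adjacent.

```python
import re
from itertools import combinations


def paragraph_word_groups(text, group_size=2):
    """Return the set of word groups that co-occur within any one paragraph."""
    groups = set()
    for paragraph in re.split(r"\n\s*\n", text):
        # Deduplicate and sort so each group has one canonical ordering.
        words = sorted(set(re.findall(r"[a-z0-9]+", paragraph.lower())))
        groups.update(combinations(words, group_size))
    return groups


# Hypothetical two-paragraph document.
text = "Red fox runs.\n\nBlue fox."
pairs = paragraph_word_groups(text, group_size=2)
triples = paragraph_word_groups(text, group_size=3)
print(sorted(pairs))
print(sorted(triples))
```

In this example `("red", "runs")` is a valid pair even though "fox" sits between the two words, while `("blue", "red")` is not formed because those words appear in different paragraphs.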

20. The method of claim 16 wherein the identifying clusters comprises walking the directed graph of connected documents.

21. The method of claim 16 wherein transforming the identified continuous selected feature values into discrete selected feature values: assigning value ranges for the identified continuous selected feature values, each value range having a corresponding representative code.

22. The method of claim 21 wherein transforming the identified continuous selected feature values into discrete selected feature values further comprises preparing a synthetic vocabulary wherein each term in the vocabulary represents a selected feature and the corresponding selected feature value, the representation comprising the corresponding representative code.

Patent Metadata

Filing Date

Unknown

Publication Date

August 30, 2011

Inventors

Herbert L. Roitblat
Brian Golbere

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “IDENTIFYING RELATED OBJECTS USING QUANTUM CLUSTERING” (8010534). https://patentable.app/patents/8010534
