US-8819451

Techniques for representing keywords in an encrypted search index to prevent histogram-based attacks

Published: August 26, 2014
Assignee: not available in USPTO data
Inventors: not available in USPTO data
Technical Abstract

A method and system for cryptographically indexing, searching for, and retrieving documents is provided. In some embodiments, an encryption system is provided that generates a document index that allows users to retrieve documents by performing encrypted queries for keywords associated with the documents. In some embodiments, each keyword maps to the same number of encrypted document identifiers. In some embodiments, an extractor graph is employed to map an indication of each keyword to a number of buckets storing encrypted document identifiers. In some embodiments, an order-preserving encryption system is provided. The encryption system uses an ordered index that maps encrypted instances of ordered attribute values to documents that are associated with those values. The ordered index enables queries containing query operators that rely on order, such as less than (“<”) or greater than (“>”), to be successfully performed on encrypted attribute values.
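The ordered index described in the abstract can be illustrated with a small sketch. It assumes integer attribute values from a small, known domain and uses a toy order-preserving encoding: a secret, strictly increasing random table whose `seed` plays the role of a key. `make_ope_table` and `OrderedIndex` are hypothetical names, and `random.Random` is not a cryptographic generator, so this only shows why `<` and `>` still work on encrypted values. It is not a secure order-preserving encryption scheme.

```python
import bisect
import random


def make_ope_table(domain_size: int, seed: int) -> list[int]:
    """Toy order-preserving encoding for integers 0..domain_size-1 (illustration only)."""
    rng = random.Random(seed)  # stands in for a secret key; NOT cryptographically secure
    table, current = [], 0
    for _ in range(domain_size):
        current += rng.randint(1, 1000)  # a positive gap keeps the mapping strictly increasing
        table.append(current)
    return table


class OrderedIndex:
    """Server-side index that only ever sees encrypted (order-preserved) values."""

    def __init__(self) -> None:
        self._keys: list[int] = []  # encrypted attribute values, kept sorted
        self._docs: list[str] = []  # document IDs, parallel to _keys

    def add(self, enc_value: int, doc_id: str) -> None:
        i = bisect.bisect_right(self._keys, enc_value)
        self._keys.insert(i, enc_value)
        self._docs.insert(i, doc_id)

    def less_than(self, enc_bound: int) -> list[str]:
        # Order is preserved, so "<" on ciphertexts matches "<" on plaintexts.
        return self._docs[: bisect.bisect_left(self._keys, enc_bound)]

    def greater_than(self, enc_bound: int) -> list[str]:
        return self._docs[bisect.bisect_right(self._keys, enc_bound):]


if __name__ == "__main__":
    enc = make_ope_table(200, seed=42)  # kept by the client
    index = OrderedIndex()
    for value, doc in [(30, "doc-a"), (90, "doc-c"), (45, "doc-b")]:
        index.add(enc[value], doc)
    print(index.less_than(enc[50]))     # ['doc-a', 'doc-b']
    print(index.greater_than(enc[45]))  # ['doc-c']
```

The server compares ciphertexts directly, which is what makes range queries possible; the same property means the server learns the relative order of all stored values.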

Patent Claims
21 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method in a networked computer system having a memory and a processor for searching for documents, the method comprising: generating a searchable encrypted keyword index for a plurality of documents by identifying a plurality of keywords, and for each of the identified plurality of keywords, encrypting the keyword to generate an encrypted instance of the keyword; identifying the documents that contain the keyword, for each of the identified documents, determining a document identifier, with a processor, generating a concatenated string by sequentially joining a string of the keyword to a string of the determined document identifier into the concatenated string, and encrypting the concatenated string by applying an encryption function to the concatenated string, and sending to a document storage service the encrypted instance of the keyword along with each of the encrypted concatenated strings wherein the document storage server associates the encrypted instance of the keyword with each of the encrypted concatenated strings that is an encryption of that keyword concatenated with a document identifier; and identifying an encrypted document that matches a query by inputting a keyword received from a user, encrypting the inputted keyword to generate an encrypted instance of the inputted keyword, sending the encrypted instance of the inputted keyword to the document storage service, receiving from the document storage service at least one encrypted concatenated string associated with the encrypted instance of the inputted keyword, and decrypting the received at least one encrypted concatenated string to identify both the keyword of the encrypted concatenated string and the document identifier of the encrypted concatenated string.
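A minimal Python sketch of the index construction and query flow in claim 1 follows. Several pieces are assumptions, not taken from the patent: the keyword's "encrypted instance" is modeled as an HMAC-SHA256 token (deterministic, so the storage service can match it, but not decryptable), the encryption function is a toy HMAC-SHA256 counter-mode stream cipher standing in for a real cipher such as AES-GCM, and a NUL byte (`SEP`) separates the keyword from the document identifier. All function names are hypothetical.

```python
import hashlib
import hmac
import os
from collections import defaultdict

SEP = b"\x00"  # assumption: keywords and document IDs never contain a NUL byte
NONCE_LEN = 16


def _keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    # Toy HMAC-SHA256 counter-mode keystream; a stand-in for a real cipher.
    blocks = [hmac.new(key, nonce + i.to_bytes(8, "big"), hashlib.sha256).digest()
              for i in range((n + 31) // 32)]
    return b"".join(blocks)[:n]


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = os.urandom(NONCE_LEN)  # fresh nonce: equal plaintexts give different ciphertexts
    return nonce + bytes(p ^ k for p, k in zip(plaintext, _keystream(key, nonce, len(plaintext))))


def decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:NONCE_LEN], blob[NONCE_LEN:]
    return bytes(c ^ k for c, k in zip(body, _keystream(key, nonce, len(body))))


def keyword_token(key: bytes, keyword: str) -> str:
    # Deterministic "encrypted instance" of the keyword, so the server can match it exactly.
    return hmac.new(key, keyword.encode(), hashlib.sha256).hexdigest()


def build_index(docs: dict[str, set[str]], k_tok: bytes, k_enc: bytes) -> dict[str, list[bytes]]:
    """docs maps document ID -> keywords; the result is what gets sent to the storage service."""
    index: dict[str, list[bytes]] = defaultdict(list)
    keywords = sorted({kw for kws in docs.values() for kw in kws})
    for kw in keywords:
        token = keyword_token(k_tok, kw)
        for doc_id in sorted(d for d, kws in docs.items() if kw in kws):
            combined = kw.encode() + SEP + doc_id.encode()  # keyword joined to document ID
            index[token].append(encrypt(k_enc, combined))
    return dict(index)


def search(index: dict[str, list[bytes]], keyword: str, k_tok: bytes, k_enc: bytes) -> list[tuple[str, str]]:
    blobs = index.get(keyword_token(k_tok, keyword), [])  # the storage service's lookup
    results = []
    for blob in blobs:
        kw, doc_id = decrypt(k_enc, blob).split(SEP, 1)  # recover both keyword and document ID
        results.append((kw.decode(), doc_id.decode()))
    return results


if __name__ == "__main__":
    k_tok, k_enc = os.urandom(32), os.urandom(32)
    idx = build_index({"doc1": {"cloud", "crypto"}, "doc2": {"crypto"}}, k_tok, k_enc)
    print(search(idx, "crypto", k_tok, k_enc))  # [('crypto', 'doc1'), ('crypto', 'doc2')]
```

Because `encrypt` draws a fresh nonce each time, no two entries share a ciphertext, so the service learns only how many entries sit under each token. That per-keyword count is the histogram leak the dependent claims address.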

2. The method of claim 1, further comprising: for each of the identified plurality of keywords, determining a number of concatenated strings to generate for the keyword, and if the number of concatenated strings generated for the keyword is less than the determined number, repeatedly: randomly selecting one of the document identifiers, generating a concatenated string by concatenating the keyword to the determined document identifier, and encrypting the concatenated string so that the determined number of concatenated strings is generated for the keyword.

3. The method of claim 2 wherein the determined number of concatenated strings to generate for the keyword is based at least in part on the number of documents containing the keyword contained by the most documents.

4. The method of claim 2 wherein the determined number of concatenated strings to generate for the keyword is based at least in part on the total number of documents.

5. The method of claim 2 wherein the determined number of concatenated strings to generate is the same for each of the identified plurality of keywords.
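Claims 2 through 5 pad each keyword's list of entries up to a chosen size. The sketch below works on plaintext posting lists before encryption and pads with repeats of document identifiers drawn from the keyword's own documents; that is one reading of "randomly selecting one of the document identifiers", and it is the reading claim 10 states explicitly. The default target is the size of the largest posting list (claim 3); passing `target=len(docs)` gives the claim 4 variant, and either way every keyword ends up the same size (claim 5). `padded_postings` is a hypothetical name.

```python
import random
from collections import Counter
from typing import Optional


def padded_postings(docs: dict[str, set[str]], target: Optional[int] = None,
                    rng: Optional[random.Random] = None) -> dict[str, list[str]]:
    """Map each keyword to `target` document IDs, padding with random repeats."""
    rng = rng if rng is not None else random.Random()
    postings: dict[str, list[str]] = {}
    for doc_id in sorted(docs):
        for kw in sorted(docs[doc_id]):
            postings.setdefault(kw, []).append(doc_id)
    if target is None:
        # Claim 3 variant: size of the posting list of the most common keyword.
        target = max((len(ids) for ids in postings.values()), default=0)
    for ids in postings.values():
        real = list(ids)  # sample only from documents that really contain the keyword
        while len(ids) < target:
            ids.append(rng.choice(real))
    return postings


if __name__ == "__main__":
    docs = {"d1": {"flu", "rash"}, "d2": {"flu"}, "d3": {"flu", "cough"}}
    print(Counter(len(ids) for ids in padded_postings(docs).values()))  # Counter({3: 3})
```

Once each entry is encrypted with a randomized cipher, repeated identifiers produce unrelated ciphertexts, so the server sees the same entry count for every keyword instead of a frequency histogram. Padding with the keyword's own documents means a query never returns a document that lacks the keyword, only duplicates.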

6. The method of claim 1, further comprising: concatenating a count value to the concatenated string prior to encryption, the count value corresponding to the number of times the document identifier and the keyword have been used together to generate a concatenated string.

7. The method of claim 6, further comprising: after decrypting the received at least one encrypted concatenated string, discarding the encrypted concatenated strings having a count value greater than zero.
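Claims 6 and 7 tag each concatenated string with a count so that padding repeats can be recognized and dropped after decryption. The sketch below assumes a toy HMAC-SHA256 counter-mode cipher (a stand-in for a real cipher) and a NUL separator, and it zero-pads the count to four digits so counts below 10,000 do not change the plaintext length. The `threshold` parameter mirrors the predetermined threshold of claim 11; claim 7 corresponds to `threshold=0`. Function names are hypothetical.

```python
import hashlib
import hmac
import os

SEP = b"\x00"  # assumption: keywords and document IDs contain no NUL byte
NONCE_LEN = 16


def _keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    # Toy HMAC-SHA256 counter-mode keystream; a stand-in for a real cipher.
    blocks = [hmac.new(key, nonce + i.to_bytes(8, "big"), hashlib.sha256).digest()
              for i in range((n + 31) // 32)]
    return b"".join(blocks)[:n]


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = os.urandom(NONCE_LEN)
    return nonce + bytes(p ^ k for p, k in zip(plaintext, _keystream(key, nonce, len(plaintext))))


def decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:NONCE_LEN], blob[NONCE_LEN:]
    return bytes(c ^ k for c, k in zip(body, _keystream(key, nonce, len(body))))


def encrypt_with_counts(keyword: str, doc_ids: list[str], k_enc: bytes) -> list[bytes]:
    """Encrypt keyword || doc ID || count, where count = prior uses of this pair."""
    uses: dict[str, int] = {}
    blobs = []
    for doc_id in doc_ids:
        count = uses.get(doc_id, 0)  # 0 the first time; 1, 2, ... for padding repeats
        uses[doc_id] = count + 1
        plaintext = SEP.join([keyword.encode(), doc_id.encode(), f"{count:04d}".encode()])
        blobs.append(encrypt(k_enc, plaintext))
    return blobs


def decode_results(blobs: list[bytes], k_enc: bytes, threshold: int = 0) -> list[str]:
    """Decrypt and keep only entries whose count is <= threshold."""
    kept = []
    for blob in blobs:
        _kw, doc_id, count = decrypt(k_enc, blob).split(SEP)
        if int(count) <= threshold:
            kept.append(doc_id.decode())
    return kept


if __name__ == "__main__":
    key = os.urandom(32)
    blobs = encrypt_with_counts("flu", ["d2", "d1", "d2", "d2"], key)  # d2 repeated as padding
    print(decode_results(blobs, key))  # ['d2', 'd1']
```

One plausible side benefit of the count is that every plaintext under a keyword becomes unique, so even a deterministic cipher would not expose the padding repeats as identical ciphertexts.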

8. A computing system configured to process a query, the computing system comprising: a memory storing computer-executable instructions of: a mapping component configured to, for each of a plurality of keywords, identify documents associated with that keyword, for each of the identified documents, determine a document identifier associated with that document, generate a concatenated string by concatenating an indication of that keyword to the document identifier to sequentially join a string of the keyword and a string of the document identifier into a single string that is the concatenated string, encrypt the concatenated string, and cause to be stored at a document storage server an association of that keyword to the encrypted concatenated string; a query component configured to submit an indication of search terms to the document storage server; a receive component configured to receive a plurality of encrypted concatenated strings from the document storage server; a decrypt component configured to, for each of the plurality of received encrypted concatenated strings, decrypt that encrypted concatenated string, and retrieve a document identifier from the decrypted concatenated string, and a display component configured to display an indication of the retrieved document identifiers; and a processor that executes the computer-executable instructions stored in the memory.

9. The computing system of claim 8 wherein generating a concatenated string further comprises concatenating a count value to the document identifier, the count value corresponding to the number of times the document identifier has been encrypted for a particular keyword.

10. The computing system of claim 9 wherein the mapping component is further configured to: determine an explicit number of encrypted concatenated strings to associate with each keyword; and if the number of identified documents associated with a keyword is less than the explicit number, repeating the following steps so that the total number of encrypted concatenated strings associated with the keyword equals the explicit number: randomly select one of the identified documents, determine the document identifier associated with the selected document, increment a count value corresponding to the number of times the document identifier has been encrypted for a particular keyword, generate a concatenated string by concatenating an indication of the keyword and the count value to the document identifier, encrypt the concatenated string, and cause to be stored at a document storage server an association of the keyword to the encrypted concatenated string.

11. The computing system of claim 10 wherein the decrypt component is further configured to: retrieve a count value from the decrypted concatenated string; and if the retrieved count value is greater than a predetermined threshold, discard the decrypted concatenated string from which the count value was retrieved.

12. The computing system of claim 8 wherein in response to receiving an indication of search terms, the document storage server invokes an extractor graph to identify a plurality of encrypted concatenated strings to return.

13. The computing system of claim 8 wherein causing to be stored at a document storage server an association of that keyword to the encrypted concatenated string includes invoking an extractor graph to determine where the encrypted concatenated string is to be stored.

14. The computing system of claim 13 wherein the decrypt component is further configured to: retrieve an indication of a keyword from the decrypted concatenated string; and if the retrieved keyword does not match one of the search terms, discard the decrypted concatenated string from which the keyword was retrieved.
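Claims 12 through 14 place encrypted entries in buckets chosen through an extractor graph, have the server return those buckets, and let the client discard entries that belong to other keywords. The sketch below is heavily simplified: `neighbors` derives a fixed number of distinct bucket indices from a public SHA-256 hash of the keyword token, which yields a pseudo-random bipartite graph but does not check the expansion properties that make a graph an extractor. The toy HMAC-SHA256 counter-mode cipher, the NUL separator, the HMAC keyword token, and every function name are assumptions.

```python
import hashlib
import hmac
import os

SEP = b"\x00"  # assumption: keywords and document IDs contain no NUL byte
NONCE_LEN = 16


def _keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    # Toy HMAC-SHA256 counter-mode keystream; a stand-in for a real cipher.
    blocks = [hmac.new(key, nonce + i.to_bytes(8, "big"), hashlib.sha256).digest()
              for i in range((n + 31) // 32)]
    return b"".join(blocks)[:n]


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = os.urandom(NONCE_LEN)
    return nonce + bytes(p ^ k for p, k in zip(plaintext, _keystream(key, nonce, len(plaintext))))


def decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:NONCE_LEN], blob[NONCE_LEN:]
    return bytes(c ^ k for c, k in zip(body, _keystream(key, nonce, len(body))))


def keyword_token(key: bytes, keyword: str) -> str:
    # Deterministic keyed token for the keyword (assumption: HMAC-SHA256).
    return hmac.new(key, keyword.encode(), hashlib.sha256).hexdigest()


def neighbors(token: str, num_buckets: int, degree: int) -> list[int]:
    """Hypothetical stand-in for an extractor graph: `degree` distinct buckets per token."""
    if not 0 < degree <= num_buckets:
        raise ValueError("need 0 < degree <= num_buckets")
    picked: list[int] = []
    i = 0
    while len(picked) < degree:
        digest = hashlib.sha256(f"{token}:{i}".encode()).digest()  # public, so the server can recompute it
        bucket = int.from_bytes(digest[:8], "big") % num_buckets
        if bucket not in picked:
            picked.append(bucket)
        i += 1
    return picked


def store(buckets: list[list[bytes]], token: str, blobs: list[bytes], degree: int) -> None:
    nbrs = neighbors(token, len(buckets), degree)
    for i, blob in enumerate(blobs):
        buckets[nbrs[i % degree]].append(blob)  # spread entries over the keyword's buckets


def server_lookup(buckets: list[list[bytes]], token: str, degree: int) -> list[bytes]:
    # Returns whole buckets, so entries for other keywords may come back too.
    return [blob for b in neighbors(token, len(buckets), degree) for blob in buckets[b]]


def client_filter(blobs: list[bytes], keyword: str, k_enc: bytes) -> list[str]:
    # Drop decrypted entries whose embedded keyword is not the search term.
    hits = []
    for blob in blobs:
        kw, doc_id = decrypt(k_enc, blob).split(SEP, 1)
        if kw.decode() == keyword:
            hits.append(doc_id.decode())
    return hits


if __name__ == "__main__":
    k_tok, k_enc = os.urandom(32), os.urandom(32)
    buckets: list[list[bytes]] = [[] for _ in range(8)]
    postings = {"flu": ["d1", "d2", "d3"], "rash": ["d2"], "cough": ["d3", "d4"]}
    for kw, ids in postings.items():
        blobs = [encrypt(k_enc, kw.encode() + SEP + d.encode()) for d in ids]
        store(buckets, keyword_token(k_tok, kw), blobs, degree=3)
    raw = server_lookup(buckets, keyword_token(k_tok, "flu"), degree=3)
    print(sorted(client_filter(raw, "flu", k_enc)))  # ['d1', 'd2', 'd3']
```

Because buckets are shared, the server's response can mix entries from several keywords; only the client, after decryption, can tell which entries actually match, which is the role of the keyword check in claim 14.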

15. A computer-readable storage device storing computer-executable instructions for causing a computing device to generate encrypted keyword-document identifier pairs for an index, by a method comprising: for each of a plurality of keywords, identifying the documents that contain the keyword; for each of the identified documents, retrieving a document identifier for the document; generating keyword-document identifier pair that contains the keyword and the retrieved document identifier as a combined string formed by sequentially joining a string of the keyword and a string of the retrieved document identifier; and encrypting the generated keyword-document identifier pair by applying an encryption function to the combined string such that the keyword and the retrieved document identifier of the combined string are encrypted as a single combined string so that the keyword and the retrieved document identifier of the encrypted keyword-document identifier pair cannot be separately decrypted; and sending to a document storage service a mapping of each of the plurality of keywords to the one or more encrypted keyword-document identifier pairs that contain the keyword of the encrypted keyword-document identifier pairs.

16. The computer-readable storage device of claim 15 including: for at least some of the plurality of keywords, selecting a number of encrypted keyword-document identifier pairs for the keyword; and generating additional encrypted keyword-document identifier pairs with information indicating that the additional encrypted keyword-document identifier pair can be discarded.

17. The computer-readable storage device of claim 16 wherein the selected number of encrypted keyword-document identifier pairs for the keyword is based at least in part on the number of documents containing the keyword contained by the most documents.

18. The computer-readable storage device of claim 16 wherein the selected number of encrypted keyword-document identifier pairs for the keyword is based at least in part on the total number of documents.

19. The computer-readable storage device of claim 16 wherein the selected number of encrypted keyword-document identifier pairs for the keyword is the same for each of the keywords.

20. The computer-readable storage device of claim 15 comprising: including a count value in the keyword-document identifier pair prior to encryption, the count value corresponding to the number of times the keyword-document identifier pair has been generated.

21. The computer-readable storage device of claim 15 wherein the additional encrypted keyword-document identifier pairs for a keyword include a different keyword to indicate that the additional encrypted keyword-document identifier pair is to be discarded.
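Claims 15, 16, and 21 describe encrypting each keyword-document identifier pair as one combined string and padding with extra pairs that carry a different keyword, which tells the client to discard them. In this sketch the marker keyword `DUMMY_KEYWORD`, the fixed 64-byte plaintext length, the toy HMAC-SHA256 counter-mode cipher, and all function names are assumptions. Fixed-length padding matters here: with a length-preserving cipher, a dummy keyword of a different length would otherwise be visible in the ciphertext size.

```python
import hashlib
import hmac
import os
import random
from typing import Optional

SEP = b"\x00"                # assumption: no NUL bytes in keywords or document IDs
NONCE_LEN = 16
PLAINTEXT_LEN = 64           # assumption: every pair fits in 62 bytes plus a 2-byte length
DUMMY_KEYWORD = "\x01dummy"  # hypothetical reserved marker that no real keyword may use


def _keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    # Toy HMAC-SHA256 counter-mode keystream; a stand-in for a real cipher.
    blocks = [hmac.new(key, nonce + i.to_bytes(8, "big"), hashlib.sha256).digest()
              for i in range((n + 31) // 32)]
    return b"".join(blocks)[:n]


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = os.urandom(NONCE_LEN)
    return nonce + bytes(p ^ k for p, k in zip(plaintext, _keystream(key, nonce, len(plaintext))))


def decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:NONCE_LEN], blob[NONCE_LEN:]
    return bytes(c ^ k for c, k in zip(body, _keystream(key, nonce, len(body))))


def _pad(data: bytes) -> bytes:
    # 2-byte length prefix, then zero fill, so every ciphertext has the same size.
    if len(data) > PLAINTEXT_LEN - 2:
        raise ValueError("pair too long for PLAINTEXT_LEN")
    return len(data).to_bytes(2, "big") + data.ljust(PLAINTEXT_LEN - 2, b"\x00")


def _unpad(data: bytes) -> bytes:
    return data[2:2 + int.from_bytes(data[:2], "big")]


def encrypt_pairs_with_dummies(keyword: str, doc_ids: list[str], target: int, k_enc: bytes,
                               rng: Optional[random.Random] = None) -> list[bytes]:
    """Encrypt each keyword || doc ID pair as one string, then pad to `target` entries."""
    rng = rng if rng is not None else random.Random()  # not cryptographic; toy only
    blobs = [encrypt(k_enc, _pad(keyword.encode() + SEP + d.encode())) for d in doc_ids]
    while len(blobs) < target:
        # The dummy pair carries a different keyword, which marks it as discardable.
        filler = f"{rng.getrandbits(32):08x}".encode()
        blobs.append(encrypt(k_enc, _pad(DUMMY_KEYWORD.encode() + SEP + filler)))
    rng.shuffle(blobs)  # hide which positions hold padding
    return blobs


def open_results(blobs: list[bytes], keyword: str, k_enc: bytes) -> list[str]:
    hits = []
    for blob in blobs:
        kw, doc_id = _unpad(decrypt(k_enc, blob)).split(SEP, 1)
        if kw.decode() == keyword:  # dummies and other keywords are dropped
            hits.append(doc_id.decode())
    return hits


if __name__ == "__main__":
    key = os.urandom(32)
    blobs = encrypt_pairs_with_dummies("rash", ["d2"], target=3, k_enc=key)
    print(len(blobs), {len(b) for b in blobs})  # 3 {80}
    print(open_results(blobs, "rash", key))     # ['d2']
```

Because the keyword and document identifier are encrypted together as one string, the server cannot separate them, and it cannot distinguish a dummy pair from a real one without the key.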

Patent Metadata

Filing Date

May 28, 2009

Publication Date

August 26, 2014

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “Techniques for representing keywords in an encrypted search index to prevent histogram-based attacks” (US-8819451). https://patentable.app/patents/US-8819451

© 2026 Patentable. All rights reserved.

Patentable is a research and drafting-assistant tool, not a law firm, and does not provide legal advice. Documents we generate are drafts for review by a licensed patent attorney.