9639577

Systems and Methods for Determining Membership of an Element Within a Set Using a Minimum of Resources

Published: May 2, 2017
Assignee: not available in USPTO data
Patent Claims
9 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A computer-implemented method for scanning data, the method comprising: storing Golomb-Compressed Sequence (GCS) pre-filter data in a random-access memory; storing a GCS pre-filter index to the GCS pre-filter data in the random-access memory; storing GCS data on a non-transitory storage drive; storing a GCS index to the GCS data in the random-access memory; scanning at least a portion of data associated with a file to determine whether any portion of the scanned data matches contents of the GCS pre-filter data or GCS data; generating a plurality of hashes from the scanned data; sorting the plurality of hashes of the scanned data; deduplicating the sorted plurality of hashes; querying the pre-filter index with the plurality of sorted and deduplicated hashes to determine whether the GCS pre-filter data is associated with the scanned data; upon determining the GCS pre-filter data is associated with the scanned data: identifying the location of the GCS pre-filter data associated with the scanned data; retrieving the GCS pre-filter data from the identified location; analyzing at least a portion of the retrieved GCS pre-filter data; and generating a notification indicating a match to the scanned data is found in the GCS pre-filter data; upon determining at least one of the sorted and deduplicated hashes is not associated with the GCS pre-filter data, querying the GCS index with the at least one sorted and deduplicated hash not associated with the GCS pre-filter data to determine whether the GCS data is associated with the scanned data; upon determining the GCS data is associated with the scanned data: identifying the location of the GCS data associated with the scanned data; retrieving the GCS data from the identified location; analyzing at least a portion of the retrieved GCS data; generating a notification indicating a match to the scanned data is found in the GCS data; upon determining at least one of the sorted and deduplicated hashes is not associated with the GCS pre-filter data or the GCS data: querying a database with the at least one sorted and deduplicated hash not associated with the GCS pre-filter data or GCS data; determining whether data in the database is associated with the scanned data based on the querying; and upon determining data in the database is associated with the scanned data: identifying the location of data in the database associated with the scanned data; retrieving the data in the database from the identified location; analyzing at least a portion of the retrieved data in the database; generating a notification indicating a match to the scanned data is found in the database.
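
The tiered flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the fixed-size chunking, the truncated SHA-256 hash, and the names `hash_chunks` and `tiered_lookup` are assumptions made for illustration, and plain Python sets and a dict stand in for the GCS pre-filter, the GCS data, and the database.

```python
import hashlib


def hash_chunks(data: bytes, chunk_size: int = 4) -> list[int]:
    """Hash fixed-size chunks of the scanned data (chunking scheme is an assumption)."""
    return [
        int.from_bytes(hashlib.sha256(data[i:i + chunk_size]).digest()[:8], "big")
        for i in range(0, len(data), chunk_size)
    ]


def tiered_lookup(data, prefilter, gcs, database, chunk_size=4):
    """Return (tier, hash) notifications, consulting the cheapest tier first."""
    # Sort the hashes, then deduplicate by dropping repeats of the previous value.
    hashes = sorted(hash_chunks(data, chunk_size))
    unique = [h for i, h in enumerate(hashes) if i == 0 or h != hashes[i - 1]]

    notifications = []
    for h in unique:
        if h in prefilter:        # tier 1: stand-in for the pre-filter held in RAM
            notifications.append(("prefilter", h))
        elif h in gcs:            # tier 2: stand-in for GCS data on the storage drive
            notifications.append(("gcs", h))
        elif h in database:       # tier 3: stand-in for the full database query
            notifications.append(("database", h))
    return notifications
```

Only hashes that miss a tier fall through to the next, more expensive one, which mirrors the "upon determining ... is not associated with" conditions in the claim.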

2. The method of claim 1, further comprising: identifying a location of GCS data for each of the plurality of sorted and deduplicated hashes that is associated with the scanned data; generating a list of file offsets that enable a single hard disk drive read request based on the identified plurality of locations of GCS data.
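
One way to picture the offset list of claim 2 is sketched below in Python. The index layout is an assumption, not something the claims specify: `index_keys` holds the smallest hash stored in each on-disk block, `index_offsets` holds that block's file offset, and `locate_blocks` and `read_blocks` are hypothetical helper names.

```python
import bisect


def locate_blocks(hashes, index_keys, index_offsets):
    """Map each hash to the offset of the block that could contain it.

    index_keys must be sorted ascending; index_offsets[i] is the file offset
    of the block whose smallest hash is index_keys[i] (assumed layout).
    """
    offsets = set()
    for h in hashes:
        i = bisect.bisect_right(index_keys, h) - 1
        if i >= 0:                      # hashes below the first key have no block
            offsets.add(index_offsets[i])
    return sorted(offsets)              # ascending, each block listed once


def read_blocks(f, offsets, block_size):
    """Read the listed blocks in ascending offset order, i.e. one forward pass."""
    blocks = {}
    for off in offsets:
        f.seek(off)
        blocks[off] = f.read(block_size)
    return blocks
```

Because the query hashes are already sorted and deduplicated, several hashes that land in the same block produce a single offset, and visiting the offsets in ascending order avoids seeking back and forth across the drive.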

3. The method of claim 2, further comprising: retrieving the GCS data from the identified plurality of locations; and analyzing at least a portion of the retrieved GCS data.
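
For readers unfamiliar with Golomb-compressed sequences, the following Python sketch shows what "analyzing" a retrieved block could involve: decoding it sequentially and checking for a target hash. It is a generic illustration under stated assumptions, not the encoding the patent prescribes. It uses the Rice variant (divisor `2**p`), stores bits in a plain list for readability rather than packing them, and `gcs_encode` and `gcs_contains` are hypothetical names.

```python
def gcs_encode(values, p):
    """Golomb-Rice encode sorted non-negative ints as a list of bits (divisor 2**p)."""
    bits = []
    prev = 0
    for v in values:
        delta = v - prev                       # store gaps, not absolute values
        q, r = delta >> p, delta & ((1 << p) - 1)
        bits.extend([1] * q + [0])             # quotient in unary, 0-terminated
        bits.extend((r >> i) & 1 for i in range(p - 1, -1, -1))  # p-bit remainder
        prev = v
    return bits


def gcs_contains(bits, p, target):
    """Decode gaps one by one and stop as soon as the target is reached or passed."""
    pos, value = 0, 0
    while pos < len(bits):
        q = 0
        while bits[pos] == 1:                  # read the unary quotient
            q += 1
            pos += 1
        pos += 1                               # skip the 0 terminator
        r = 0
        for _ in range(p):                     # read the p-bit remainder
            r = (r << 1) | bits[pos]
            pos += 1
        value += (q << p) | r
        if value == target:
            return True
        if value > target:                     # values are sorted, so we can stop
            return False
    return False
```

A sequence must be decoded from its start, which is why an index pointing at block boundaries is useful: only the block that could hold the target needs to be read and decoded.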

4. The method of claim 3, further comprising: determining, based on the analysis of the GCS data associated with the plurality of sorted and deduplicated hashes, whether to perform additional data querying.

5. The method of claim 1, further comprising: upon determining the GCS data is not associated with the scanned data, determining, based on the query of the GCS index, whether to perform additional data querying.

6. A computing device configured to scan data, comprising: a hardware processor; memory in electronic communication with the processor; instructions stored in the memory, the instructions being executable by the processor to: store Golomb-Compressed Sequence (GCS) pre-filter data in a random-access memory; store a GCS pre-filter index to the GCS pre-filter data in the random-access memory; store GCS data on a non-transitory storage drive; store a GCS index to the GCS data in the random-access memory; scan at least a portion of data associated with a file to determine whether any portion of the scanned data matches contents of the GCS pre-filter data or GCS data; generate a plurality of hashes from the scanned data; sort the plurality of hashes of the scanned data; deduplicate the sorted plurality of hashes; query the pre-filter index with the plurality of sorted and deduplicated hashes to determine whether the GCS pre-filter data is associated with the scanned data; upon determining the GCS pre-filter data is associated with the scanned data, identify the location of the GCS pre-filter data associated with the scanned data: retrieve the GCS pre-filter data from the identified location; analyze at least a portion of the retrieved GCS pre-filter data; and generate a notification indicating a match to the scanned data is found in the GCS pre-filter data; upon determining at least one of the sorted and deduplicated hashes is not associated with the GCS pre-filter data, query the GCS index with the at least one sorted and deduplicated hash not associated with the GCS pre-filter data to determine whether the GCS data is associated with the scanned data; upon determining the GCS data is associated with the scanned data: identify the location of the GCS data associated with the scanned data; retrieve the GCS data from the identified location; analyze at least a portion of the retrieved GCS data; generate a notification indicating a match to the scanned data is found in the GCS data; upon determining at least one of the sorted and deduplicated hashes is not associated with the GCS pre-filter data or the GCS data: querying a database with the at least one sorted and deduplicated hash not associated with the GCS pre-filter data or GCS data; determining whether data in the database is associated with the scanned data based on the querying; and upon determining data in the database is associated with the scanned data: identify the location of data in the database associated with the scanned data; retrieve the data in the database from the identified location; analyze at least a portion of the retrieved data in the database; generate a notification indicating a match to the scanned data is found in the database.

7. The computing device of claim 6, wherein the instructions are executable by the processor to: identify a location of GCS data for each of the plurality of sorted and deduplicated hashes that is associated with the scanned data.

8. The computing device of claim 7, wherein the instructions are executable by the processor to: generate a list of file offsets that enable a single hard disk drive read request to acquire the GCS data from the identified plurality of locations of GCS data; retrieve the GCS data from the identified plurality of locations in a single sweep of a magnetic head of a hard disk drive; analyze at least a portion of the retrieved GCS data; and determine, based on the analysis of the GCS data associated with the plurality of sorted and deduplicated hashes, whether to perform additional data querying.

9. A computer-program product for scanning data, the computer-program product comprising a non-transitory computer-readable medium storing instructions thereon, the instructions being executable by a processor to: store Golomb-Compressed Sequence (GCS) pre-filter data in a random-access memory; store a GCS pre-filter index to the GCS pre-filter data in the random-access memory; store GCS data on a non-transitory storage drive; store a GCS index to the GCS data in the random-access memory; scan at least a portion of data associated with a file to determine whether any portion of the scanned data matches contents of the GCS pre-filter data or GCS data; generate a plurality of hashes from the scanned data; sort the plurality of hashes of the scanned data; deduplicate the sorted plurality of hashes; query the pre-filter index with the plurality of sorted and deduplicated hashes to determine whether the GCS pre-filter data is associated with the scanned data; upon determining the GCS pre-filter data is associated with the scanned data, identify the location of the GCS pre-filter data associated with the scanned data: retrieve the GCS pre-filter data from the identified location; analyze at least a portion of the retrieved GCS pre-filter data; and generate a notification indicating a match to the scanned data is found in the GCS pre-filter data; upon determining at least one of the sorted and deduplicated hashes is not associated with the GCS pre-filter data, query the GCS index with the at least one sorted and deduplicated hash not associated with the GCS pre-filter data to determine whether the GCS data is associated with the scanned data; upon determining the GCS data is associated with the scanned data: identify the location of the GCS data associated with the scanned data; retrieve the GCS data from the identified location; analyze at least a portion of the retrieved GCS data; generate a notification indicating a match to the scanned data is found in the GCS data; upon determining at least one of the sorted and deduplicated hashes is not associated with the GCS pre-filter data or the GCS data: querying a database with the at least one sorted and deduplicated hash not associated with the GCS pre-filter data or GCS data; determining whether data in the database is associated with the scanned data based on the querying; and upon determining data in the database is associated with the scanned data: identify the location of data in the database associated with the scanned data; retrieve the data in the database from the identified location; analyze at least a portion of the retrieved data in the database; generate a notification indicating a match to the scanned data is found in the database.

Patent Metadata

Filing Date

Unknown

Publication Date

May 2, 2017

Inventors

Everett Lai
Kenneth Coleman
Qun Li
Yuval Tarsi

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “SYSTEMS AND METHODS FOR DETERMINING MEMBERSHIP OF AN ELEMENT WITHIN A SET USING A MINIMUM OF RESOURCES” (9639577). https://patentable.app/patents/9639577
