Legal claims defining the scope of protection, as filed with the USPTO.
1. A computer-implemented method for scanning data, the method comprising:
    storing Golomb-Compressed Sequence (GCS) pre-filter data in a random-access memory;
    storing a GCS pre-filter index to the GCS pre-filter data in the random-access memory;
    storing GCS data on a non-transitory storage drive;
    storing a GCS index to the GCS data in the random-access memory;
    scanning at least a portion of data associated with a file to determine whether any portion of the scanned data matches contents of the GCS pre-filter data or GCS data;
    generating a plurality of hashes from the scanned data;
    sorting the plurality of hashes of the scanned data;
    deduplicating the sorted plurality of hashes;
    querying the pre-filter index with the plurality of sorted and deduplicated hashes to determine whether the GCS pre-filter data is associated with the scanned data;
    upon determining the GCS pre-filter data is associated with the scanned data:
        identifying the location of the GCS pre-filter data associated with the scanned data;
        retrieving the GCS pre-filter data from the identified location;
        analyzing at least a portion of the retrieved GCS pre-filter data; and
        generating a notification indicating a match to the scanned data is found in the GCS pre-filter data;
    upon determining at least one of the sorted and deduplicated hashes is not associated with the GCS pre-filter data, querying the GCS index with the at least one sorted and deduplicated hash not associated with the GCS pre-filter data to determine whether the GCS data is associated with the scanned data;
    upon determining the GCS data is associated with the scanned data:
        identifying the location of the GCS data associated with the scanned data;
        retrieving the GCS data from the identified location;
        analyzing at least a portion of the retrieved GCS data;
        generating a notification indicating a match to the scanned data is found in the GCS data;
    upon determining at least one of the sorted and deduplicated hashes is not associated with the GCS pre-filter data or the GCS data:
        querying a database with the at least one sorted and deduplicated hash not associated with the GCS pre-filter data or GCS data;
        determining whether data in the database is associated with the scanned data based on the querying; and
        upon determining data in the database is associated with the scanned data:
            identifying the location of data in the database associated with the scanned data;
            retrieving the data in the database from the identified location;
            analyzing at least a portion of the retrieved data in the database;
            generating a notification indicating a match to the scanned data is found in the database.
2. The method of claim 1, further comprising: identifying a location of GCS data for each of the plurality of sorted and deduplicated hashes that is associated with the scanned data; generating a list of file offsets that enable a single hard disk drive read request based on the identified plurality of locations of GCS data.
3. The method of claim 2, further comprising: retrieving the GCS data from the identified plurality of locations; and analyzing at least a portion of the retrieved GCS data.
4. The method of claim 3, further comprising: determining, based on the analysis of the GCS data associated with the plurality of sorted and deduplicated hashes, whether to perform additional data querying.
5. The method of claim 1, further comprising: upon determining the GCS data is not associated with the scanned data, determining, based on the query of the GCS index, whether to perform additional data querying.
6. A computing device configured to scan data, comprising:
    a hardware processor;
    memory in electronic communication with the processor;
    instructions stored in the memory, the instructions being executable by the processor to:
        store Golomb-Compressed Sequence (GCS) pre-filter data in a random-access memory;
        store a GCS pre-filter index to the GCS pre-filter data in the random-access memory;
        store GCS data on a non-transitory storage drive;
        store a GCS index to the GCS data in the random-access memory;
        scan at least a portion of data associated with a file to determine whether any portion of the scanned data matches contents of the GCS pre-filter data or GCS data;
        generate a plurality of hashes from the scanned data;
        sort the plurality of hashes of the scanned data;
        deduplicate the sorted plurality of hashes;
        query the pre-filter index with the plurality of sorted and deduplicated hashes to determine whether the GCS pre-filter data is associated with the scanned data;
        upon determining the GCS pre-filter data is associated with the scanned data, identify the location of the GCS pre-filter data associated with the scanned data:
            retrieve the GCS pre-filter data from the identified location;
            analyze at least a portion of the retrieved GCS pre-filter data; and
            generate a notification indicating a match to the scanned data is found in the GCS pre-filter data;
        upon determining at least one of the sorted and deduplicated hashes is not associated with the GCS pre-filter data, query the GCS index with the at least one sorted and deduplicated hash not associated with the GCS pre-filter data to determine whether the GCS data is associated with the scanned data;
        upon determining the GCS data is associated with the scanned data:
            identify the location of the GCS data associated with the scanned data;
            retrieve the GCS data from the identified location;
            analyze at least a portion of the retrieved GCS data;
            generate a notification indicating a match to the scanned data is found in the GCS data;
        upon determining at least one of the sorted and deduplicated hashes is not associated with the GCS pre-filter data or the GCS data:
            querying a database with the at least one sorted and deduplicated hash not associated with the GCS pre-filter data or GCS data;
            determining whether data in the database is associated with the scanned data based on the querying; and
            upon determining data in the database is associated with the scanned data:
                identify the location of data in the database associated with the scanned data;
                retrieve the data in the database from the identified location;
                analyze at least a portion of the retrieved data in the database;
                generate a notification indicating a match to the scanned data is found in the database.
7. The computing device of claim 6, wherein the instructions are executable by the processor to: identify a location of GCS data for each of the plurality of sorted and deduplicated hashes that is associated with the scanned data.
8. The computing device of claim 7, wherein the instructions are executable by the processor to: generate a list of file offsets that enable a single hard disk drive read request to acquire the GCS data from the identified plurality of locations of GCS data; retrieve the GCS data from the identified plurality of locations in a single sweep of a magnetic head of a hard disk drive; analyze at least a portion of the retrieved GCS data; and determine, based on the analysis of the GCS data associated with the plurality of sorted and deduplicated hashes, whether to perform additional data querying.
9. A computer-program product for scanning data, the computer-program product comprising a non-transitory computer-readable medium storing instructions thereon, the instructions being executable by a processor to:
    store Golomb-Compressed Sequence (GCS) pre-filter data in a random-access memory;
    store a GCS pre-filter index to the GCS pre-filter data in the random-access memory;
    store GCS data on a non-transitory storage drive;
    store a GCS index to the GCS data in the random-access memory;
    scan at least a portion of data associated with a file to determine whether any portion of the scanned data matches contents of the GCS pre-filter data or GCS data;
    generate a plurality of hashes from the scanned data;
    sort the plurality of hashes of the scanned data;
    deduplicate the sorted plurality of hashes;
    query the pre-filter index with the plurality of sorted and deduplicated hashes to determine whether the GCS pre-filter data is associated with the scanned data;
    upon determining the GCS pre-filter data is associated with the scanned data, identify the location of the GCS pre-filter data associated with the scanned data:
        retrieve the GCS pre-filter data from the identified location;
        analyze at least a portion of the retrieved GCS pre-filter data; and
        generate a notification indicating a match to the scanned data is found in the GCS pre-filter data;
    upon determining at least one of the sorted and deduplicated hashes is not associated with the GCS pre-filter data, query the GCS index with the at least one sorted and deduplicated hash not associated with the GCS pre-filter data to determine whether the GCS data is associated with the scanned data;
    upon determining the GCS data is associated with the scanned data:
        identify the location of the GCS data associated with the scanned data;
        retrieve the GCS data from the identified location;
        analyze at least a portion of the retrieved GCS data;
        generate a notification indicating a match to the scanned data is found in the GCS data;
    upon determining at least one of the sorted and deduplicated hashes is not associated with the GCS pre-filter data or the GCS data:
        querying a database with the at least one sorted and deduplicated hash not associated with the GCS pre-filter data or GCS data;
        determining whether data in the database is associated with the scanned data based on the querying; and
        upon determining data in the database is associated with the scanned data:
            identify the location of data in the database associated with the scanned data;
            retrieve the data in the database from the identified location;
            analyze at least a portion of the retrieved data in the database;
            generate a notification indicating a match to the scanned data is found in the database.
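For orientation only, the control flow recited in the independent claims (hash, sort, deduplicate, then query the pre-filter, the GCS data, and the database in turn) can be sketched in Python as below. This is an illustrative sketch and not the patented implementation. Every name in it (`chunk_hashes`, `SortedTier`, `tiered_lookup`, `offsets_for_single_read`) is hypothetical, as are the fixed-size chunking and the truncated SHA-256 hash. Each tier is modeled as a plain sorted list searched by bisection, standing in for a real Golomb-coded sequence and its index.

```python
import hashlib
from bisect import bisect_left


def chunk_hashes(data, chunk_size=4):
    """Hash fixed-size chunks of the scanned data (chunking scheme is an assumption)."""
    return [
        int.from_bytes(hashlib.sha256(data[i:i + chunk_size]).digest()[:8], "big")
        for i in range(0, len(data), chunk_size)
    ]


def sorted_unique(hashes):
    """Sort the hashes and drop duplicates, as the claims recite before querying."""
    return sorted(set(hashes))


class SortedTier:
    """Stand-in for one lookup tier (pre-filter, GCS data, or database).

    A real GCS stores Golomb-coded gaps between sorted hashes; a plain
    sorted list with binary search is used here purely for illustration.
    """

    def __init__(self, name, values):
        self.name = name
        self.values = sorted(set(values))

    def contains(self, h):
        i = bisect_left(self.values, h)
        return i < len(self.values) and self.values[i] == h


def tiered_lookup(hashes, tiers):
    """Query tiers in order; only hashes unmatched so far fall through to the next tier."""
    remaining = sorted_unique(hashes)
    matches = {}
    for tier in tiers:
        hits = [h for h in remaining if tier.contains(h)]
        if hits:
            matches[tier.name] = hits
        remaining = [h for h in remaining if not tier.contains(h)]
    return matches, remaining


def offsets_for_single_read(hits, location_of):
    """Build an ascending, duplicate-free offset list for the matched hashes
    (hypothetical take on the file-offset list of claims 2 and 8)."""
    return sorted({location_of[h] for h in hits})


if __name__ == "__main__":
    tiers = [
        SortedTier("prefilter", [10, 20]),  # would live in RAM
        SortedTier("gcs", [30]),            # would live on the storage drive
        SortedTier("database", [40]),       # last-resort lookup
    ]
    print(tiered_lookup([40, 10, 30, 10, 99], tiers))
```

In this sketch each tier sees only the hashes that earlier tiers did not resolve, so the cheaper in-memory pre-filter absorbs as many queries as it can before the drive or the database is consulted. Sorting the offsets in ascending order is one plausible way to let a drive service them in a single pass; the claims do not specify how the offset list is built.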
May 2, 2017