US-10671540

Cache aware searching based on one or more files in one or more buckets in remote storage

Published: June 2, 2020

Assignee: not available in the USPTO data

Inventors: not available in the USPTO data

Technical Abstract

Embodiments are disclosed for performing cache aware searching. In response to a search query, a first bucket and a second bucket in remote storage are identified for processing the search query. A determination is made that a first file in the first bucket is present in a cache when the search query is received. In response to the search query, a search is performed using the first file based on the determination that the first file is present in the cache when the search query is received, and the search is performed using a second file from the second bucket once the second file is stored in the cache.

Patent Claims

30 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method, comprising: in response to a search query on a plurality of events, identifying a first bucket in remote storage for processing the search query, the first bucket storing a subset of the plurality of events having timestamps within a defined time range of the first bucket, wherein the first bucket is identified based on the defined time range; making a determination that a first file in the first bucket is present in a cache when the search query is received; and performing, in response to the search query, a search using the first file based on the determination that the first file is present in the cache when the search query is received.
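
The three steps of claim 1 (identify a bucket by its time range, check whether one of its files is already cached, search using that file) can be illustrated with a minimal Python sketch. The `Bucket` class, the `(bucket name, file name)` cache keys, and the half-open time ranges are assumptions made for illustration only; the claim does not prescribe any data layout.

```python
from dataclasses import dataclass, field


@dataclass
class Bucket:
    """Hypothetical bucket holding events with timestamps in [start, end)."""
    name: str
    start: int
    end: int
    files: list = field(default_factory=list)


def identify_buckets(buckets, query_start, query_end):
    """Pick buckets whose defined time range overlaps the query's time range."""
    return [b for b in buckets if b.start < query_end and query_start < b.end]


def cached_files(bucket, cache):
    """Return the bucket's files that are already in the cache.

    `cache` is assumed to be a set of (bucket name, file name) pairs
    captured at the moment the query is received.
    """
    return [f for f in bucket.files if (bucket.name, f) in cache]


def search_plan(buckets, cache, query_start, query_end):
    """List the (bucket, file) pairs that can be searched straight from cache."""
    plan = []
    for bucket in identify_buckets(buckets, query_start, query_end):
        for file_name in cached_files(bucket, cache):
            plan.append((bucket.name, file_name))
    return plan
```

Files of a matching bucket that are not yet cached are simply left out of this plan; handling them once they arrive in the cache is the subject of the dependent claims.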

2. The method of claim 1 , further comprising: determining a length of time to download, to the cache, a second file in a second bucket; and performing, in response to the search query, the search using a third file that is in the cache based on the length of time without downloading the second file.
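
One way to read claim 2 is as a cost check: estimate how long the second file would take to download and, if that is too long, search with a third file that is already cached. The sketch below assumes a simple size-over-bandwidth estimate and a fixed wait threshold; both, along with the function and parameter names, are hypothetical and not taken from the claim.

```python
def plan_search(second_file, size_bytes, bandwidth_bytes_per_s,
                cached_third_file, max_wait_seconds):
    """Decide whether to download `second_file` or fall back to a cached file.

    The download time is estimated as size / bandwidth (an assumption).
    If the estimate exceeds `max_wait_seconds`, the search proceeds with
    `cached_third_file` and nothing is downloaded.
    """
    eta = size_bytes / bandwidth_bytes_per_s
    if eta > max_wait_seconds:
        return {"search_with": cached_third_file, "download": None, "eta": eta}
    return {"search_with": second_file, "download": second_file, "eta": eta}
```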

3. The method of claim 1 , wherein the first bucket comprises a metadata file comprising a host name, a source name, and a source type, and wherein performing the search using the first file comprises: determining a length of time to download, to the cache, the metadata file in the first bucket, wherein the search using the first file is performed based on the length of time without downloading the metadata file.

4. The method of claim 1 , further comprising: performing, in response to the search query, the search using a second file from a second bucket once the second file is stored in the cache, wherein the second file is a metadata file; and eliminating the second bucket from the search query in response to processing the metadata file.

5. The method of claim 1 , further comprising: performing, in response to the search query, the search using a second file from a second bucket once the second file is stored in the cache, wherein the second bucket comprises a metadata file comprising a host name, a source name, and a source type, and wherein performing the search using the second file comprises: comparing the host name, source name, and the source type to the search query to obtain a comparison result indicating that at least one of host name, source name, and the source type does not match the search query; and eliminating, from the search, the second bucket based on the comparison result, wherein eliminating the second bucket comprises not searching a remaining file in the second bucket.

6. The method of claim 1 , further comprising: performing, in response to the search query, the search using a second file from a second bucket once the second file is stored in the cache, wherein the second bucket comprises a metadata file comprising a host name, a source name, and a source type, and wherein performing the search using the second file comprises: comparing the host name, source name, and the source type to the search query to obtain a comparison result indicating that the host name, source name, and the source type matches the search query; and performing, using a third file in the second bucket, the search based on the comparison result.
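
Claims 5 and 6 describe two outcomes of the same metadata comparison: a mismatch on any of host name, source name, or source type eliminates the bucket, while a full match lets the search continue with another file in the bucket. A minimal sketch follows, assuming the metadata and the query are plain dictionaries with hypothetical keys `host`, `source`, and `sourcetype`, and that a field the query does not constrain counts as matching.

```python
FIELDS = ("host", "source", "sourcetype")


def metadata_matches(metadata, query):
    """True only if every field constrained by the query equals the metadata."""
    for key in FIELDS:
        wanted = query.get(key)
        if wanted is not None and metadata.get(key) != wanted:
            return False
    return True


def files_to_search(bucket_files, metadata, query):
    """Return the remaining files to search, or [] if the bucket is eliminated.

    "metadata" is a hypothetical file name for the bucket's metadata file,
    which has already been processed by the time this is called.
    """
    if not metadata_matches(metadata, query):
        return []  # bucket eliminated: no remaining file is searched
    return [f for f in bucket_files if f != "metadata"]
```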

7. The method of claim 1 , further comprising: identifying an order on a list of buckets for processing the search query, wherein the first bucket is processed out of the order based on the first file being in the cache when the search query is received.
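
The out-of-order processing in claim 7 can be sketched as a stable reordering: buckets that already have a file in the cache move to the front, and all other buckets keep their original relative order. Representing buckets as names and the cache as a set of bucket names is an assumption made for brevity.

```python
def processing_order(ordered_buckets, buckets_with_cached_file):
    """Move cached buckets ahead of uncached ones, otherwise keep the order.

    Python's sorted() is stable, and False sorts before True, so buckets
    with a cached file come first without disturbing the rest of the list.
    """
    return sorted(ordered_buckets,
                  key=lambda b: b not in buckets_with_cached_file)
```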

8. The method of claim 1 , further comprising: performing, in response to the search query, the search using a second file from a second bucket once the second file is stored in the cache; and selecting the second file for eviction after performing the search using the second file based on a relative size of the second file in the cache.
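
Claim 8 selects a file for eviction based on its relative size in the cache, after the search has used it. The claim does not say whether larger or smaller files are preferred, so the sketch below assumes, purely for illustration, that the largest already-searched file is evicted first. The function name and the dictionary layout are hypothetical.

```python
def select_for_eviction(cache_sizes, searched):
    """Pick the largest cached file that the search has already used.

    cache_sizes: file name -> size in bytes for every file in the cache.
    searched: set of file names the search has finished with.
    Largest-first is an assumed policy, not one stated in the claim.
    Returns None when no searched file is in the cache.
    """
    candidates = [name for name in cache_sizes if name in searched]
    if not candidates:
        return None
    return max(candidates, key=lambda name: cache_sizes[name])
```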

9. The method of claim 1 , further comprising: performing, in response to the search query, the search using a second file from a second bucket once the second file is stored in the cache; in response to the search query, copying the second bucket from the remote storage to the cache, the second bucket including first data associated with the search query; identifying a first file type associated with the second file in the second bucket, wherein the second file is associated with a usage status; accessing, based on the search query, a third bucket from the remote storage, the third bucket including second data associated with the search query; identifying a third file in the third bucket having the first file type; and copying, in response to the usage status indicating that the second file was used in processing the search query, the third file from the remote storage to the cache.

10. The method of claim 1 , further comprising: performing, in response to the search query, the search using a second file from a second bucket once the second file is stored in the cache; identifying a file type associated with a third file in the first bucket; identifying a fourth file in the second bucket having the file type; and copying the fourth file based on a usage status associated with the third file, wherein the usage status indicates that the third file was used during the processing of the search query.

11. The method of claim 1 , further comprising: performing, in response to the search query, the search using a second file from a second bucket once the second file is stored in the cache; identifying a first file type associated with a third file in the first bucket; identifying a fourth file in the second bucket having the first file type; making a first determination whether to copy the fourth file based on a usage status associated with the third file, wherein the usage status indicates whether the third file was used during the processing of the search query; making a second determination, based at least in part on the first determination indicating to copy the fourth file, whether to copy a fifth file of the second bucket having a second file type; and copying, based on the second determination, the fifth file from the remote storage to the cache.
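
Claims 9 through 11 share one idea: whether a file of a given type was actually used while processing the query in one bucket informs whether the same-typed file in another bucket is copied to the cache. A minimal sketch, assuming the usage status is a boolean per file type and that file types are known from a hypothetical name-to-type mapping; the chained second determination of claim 11 is not modeled here.

```python
def files_to_prefetch(used_by_type, next_bucket_files):
    """Choose files of the next bucket to copy from remote storage.

    used_by_type: file type -> True if a file of that type was used while
        processing the query in an earlier bucket (the usage status).
    next_bucket_files: file name -> file type for the next bucket.
    File types with no recorded usage status are not prefetched
    (an assumption of this sketch).
    """
    return [name for name, file_type in next_bucket_files.items()
            if used_by_type.get(file_type, False)]
```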

12. The method of claim 1 , further comprising: tracking a first wait time to access one or more files of the first bucket; tracking a first search time to process the search query using the first bucket; and calculating a prefetch lookahead based on a ratio of the first wait time to the first search time, wherein the prefetch lookahead represents a target number of buckets in a bucket prefetch window, wherein the bucket prefetch window indicates which buckets are candidates for prefetching, wherein prefetching comprises copying one or more files of a bucket from the remote storage to the cache.
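
The prefetch lookahead of claim 12 is derived from the ratio of wait time to search time. The sketch below assumes the ratio is rounded up to a whole number of buckets and clamped between 1 and a hypothetical upper bound; the rounding, the clamping, and the names are illustrative assumptions, since the claim only states that the lookahead is based on the ratio.

```python
import math


def prefetch_lookahead(wait_time_s, search_time_s, max_lookahead=16):
    """Target number of buckets in the prefetch window.

    Computed as ceil(wait / search), clamped to [1, max_lookahead].
    A non-positive search time falls back to max_lookahead.
    """
    if search_time_s <= 0:
        return max_lookahead
    ratio = wait_time_s / search_time_s
    return max(1, min(max_lookahead, math.ceil(ratio)))


def prefetch_window(pending_buckets, lookahead):
    """The next `lookahead` pending buckets are candidates for prefetching."""
    return pending_buckets[:lookahead]
```

Under these assumptions, a search that waits three times as long as it computes would aim to keep three buckets in flight, so that downloads overlap with searching.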

13. The method of claim 1 , further comprising: identifying a file type associated with the first file, wherein the file type is one of a metadata file type, a Bloom filter file type, an index file type, and a journal file type.

14. The method of claim 1 , wherein each event in the subset of the plurality of events comprises a portion of raw machine data.

15. The method of claim 1 , further comprising: performing, in response to the search query, the search using a second file from a second bucket once the second file is stored in the cache.

16. The method of claim 1 , wherein the remote storage is located in a cloud storage or a storage in an on-premises environment.

17. A computer system, comprising: a data store comprising a cache and a remote storage; and an indexer configured to spawn a search process that: in response to a search query on a plurality of events, identifies a first bucket in the remote storage for processing the search query, the first bucket storing a subset of the plurality of events having timestamps within a defined time range of the first bucket, wherein the first bucket is identified based on the defined time range, makes a determination that a first file in the first bucket is present in the cache when the search query is received, and performs, in response to the search query, a search using the first file based on the determination that the first file is present in the cache when the search query is received.

18. The computer system of claim 17 , wherein the search process further: determines a length of time to download, to the cache, a second file in a second bucket; and performs, in response to the search query, the search using a third file that is in the cache based on the length of time without downloading the second file.

19. The computer system of claim 17 , wherein the first bucket comprises a metadata file comprising a host name, a source name, and a source type, and wherein performing the search using the first file comprises: determining a length of time to download, to the cache, the metadata file in the first bucket, wherein the search using the first file is performed based on the length of time without downloading the metadata file.

20. The computer system of claim 17 , wherein the search process further: performs, in response to the search query, the search using a second file from a second bucket once the second file is stored in the cache, wherein the second file is a metadata file; and eliminates the second bucket from the search query in response to processing the metadata file.

21. The computer system of claim 17 , wherein the search process further: performs, in response to the search query, the search using a second file from a second bucket once the second file is stored in the cache, wherein the second bucket comprises a metadata file comprising a host name, a source name, and a source type, and wherein performing the search using the second file comprises: comparing the host name, source name, and the source type to the search query to obtain a comparison result indicating that at least one of host name, source name, and the source type does not match the search query; and eliminating, from the search, the second bucket based on the comparison result, wherein eliminating the second bucket comprises not searching a remaining file in the second bucket.

22. The computer system of claim 17 , wherein the search process further: performs, in response to the search query, the search using a second file from a second bucket once the second file is stored in the cache, wherein the second bucket comprises a metadata file comprising a host name, a source name, and a source type, and wherein performing the search using the second file comprises: comparing the host name, source name, and the source type to the search query to obtain a comparison result indicating that the host name, source name, and the source type matches the search query; and performing, using a third file in the second bucket, the search based on the comparison result.

23. The computer system of claim 17 , wherein the search process further: identifies an order on a list of buckets for processing the search query, wherein the first bucket is processed out of the order based on the first file being in the cache when the search query is received.

24. The computer system of claim 17 , wherein the search process further: performs, in response to the search query, the search using a second file from a second bucket once the second file is stored in the cache; and selects the second file for eviction after performing the search using the second file based on a relative size of the second file in the cache.

25. A non-transitory computer-readable medium comprising instructions, execution of which in a computer system causes the computer system to: in response to a search query on a plurality of events, identify a first bucket in remote storage for processing the search query, the first bucket storing a subset of the plurality of events having timestamps within a defined time range of the first bucket, wherein the first bucket is identified based on the defined time range; make a determination that a first file in the first bucket is present in a cache when the search query is received; and perform, in response to the search query, a search using the first file based on the determination that the first file is present in the cache when the search query is received.

26. The non-transitory computer-readable medium of claim 25 , wherein the instructions further cause the computer system to: determine a length of time to download, to the cache, a second file in a second bucket; and perform, in response to the search query, the search using a third file that is in the cache based on the length of time without downloading the second file.

27. The non-transitory computer-readable medium of claim 25 , wherein the instructions further cause the computer system to: perform, in response to the search query, the search using a second file from a second bucket once the second file is stored in the cache, wherein the second bucket comprises a metadata file comprising a host name, a source name, and a source type, and wherein performing the search using the second file comprises: comparing the host name, source name, and the source type to the search query to obtain a comparison result indicating that at least one of host name, source name, and the source type does not match the search query; and eliminating, from the search, the second bucket based on the comparison result, wherein eliminating the second bucket comprises not searching a remaining file in the second bucket.

28. The non-transitory computer-readable medium of claim 25 , wherein the instructions further cause the computer system to: perform, in response to the search query, the search using a second file from a second bucket once the second file is stored in the cache, wherein the second bucket comprises a metadata file comprising a host name, a source name, and a source type, and wherein performing the search using the second file comprises: comparing the host name, source name, and the source type to the search query to obtain a comparison result indicating that the host name, source name, and the source type matches the search query; and performing, using a third file in the second bucket, the search based on the comparison result.

29. The non-transitory computer-readable medium of claim 25 , wherein the instructions further cause the computer system to: identify an order on a list of buckets for processing the search query, wherein the first bucket is processed out of the order based on the first file being in the cache when the search query is received.

30. The non-transitory computer-readable medium of claim 25 , wherein the instructions further cause the computer system to: perform, in response to the search query, the search using a second file from a second bucket once the second file is stored in the cache; and select the second file for eviction after performing the search using the second file based on a relative size of the second file in the cache.

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G06F

Patent Metadata

Filing Date

July 30, 2018

Publication Date

June 2, 2020
