Legal claims defining the scope of protection, as filed with the USPTO.
1. A method by a computer comprising: for each of a plurality of log records received from a host machine node as part of a log stream being output from a software source via the host machine node: identifying a template identifier within a template repository for a template string matching an invariant string of the log record; and identifying an attribute identifier in an attribute repository for an attribute string matching a variant string of the log record; partitioning the log records into a plurality of batches, each of the plurality of batches comprising a plurality of log records, each of the plurality of batches defined by a data structure comprising: a batched log record identifier that uniquely identifies the batch relative to the other batches; the template identifier for each of the plurality of log records within the batch; the attribute identifier for each of the plurality of log records within the batch; and a list of timestamps of the plurality of log records within the batch; and for each batch of the plurality of batches, storing the data structure for the batch into a log repository in a computer readable medium, the storing comprising: for each batch of the plurality of batches, performing data compression on the data structure for the batch to generate a compressed data structure; and for each batch of the plurality of batches, separately communicating, through a data network to a data server, an instruction to write the compressed data structure for the batch into memory of a log repository.
2. The method of claim 1, wherein the identifying a template identifier within the template repository for a template string matching an invariant string of the log record, comprises: parsing content of the log record to generate strings; comparing the strings to template strings within the template repository; identifying one of the strings of the log record as the invariant string based on a match between the one of the strings and the template string; and identifying the attribute identifier associated with the template string.
3. The method of claim 1, wherein the identifying a template identifier within the template repository for a template string matching an invariant string of the log record, comprises: parsing content of the log records to generate strings; comparing the strings of the log records to template strings within the template repository; identifying one of the strings of selected ones of the log records as the invariant string of the selected ones of the log records based on at least a threshold number of matches occurring between the one of the strings of the selected ones of the log records to a same one of the template strings within the template repository; and identifying the attribute identifier associated with the one of the template strings.
4. The method of claim 1, wherein the identifying a template identifier within the template repository for a template string matching an invariant string of the log record, comprises: parsing content of a sequence of the log record to generate strings; comparing the strings to template strings within the template repository that are ordered in a defined sequence corresponding to an output sequence from a software source on the host machine node; identifying one of the strings of the log record as the invariant string based on a match between the one of the strings and one of the template strings and further based on a previous match identified between one of the strings of a previous one of the log records and a previous one of the template strings in the defined sequence; and identifying the attribute identifier associated with the one of the template strings.
5. The method of claim 1, further comprising: generating a new template identifier for the invariant string of the log record based on identifying that no template string in the template repository matches the invariant string of the log record; and storing the new template identifier and the invariant string of the log record in the template repository with a logical association between the new template identifier and the invariant string of the log record.
6. The method of claim 1, further comprising: generating a new attribute identifier for the variant string of the log record based on identifying that no attribute string in the attribute repository matches the variant string of the log record; and storing the new attribute identifier associated with the variant string of the log record in the attribute repository with a logical association between the new attribute identifier and the variant string of the log record.
7. The method of claim 1, wherein: for each of the plurality of batches, the log records within the batch have timestamps within a defined time period.
8. The method of claim 1, further comprising: receiving a search query defining a search term and a time period to be searched; determining a range of log identifiers to search based on the time period; selecting among the plurality of batches of the log records based on the range of log identifiers; retrieving at least one compressed data structure corresponding to at least one batch of the plurality of batches of the log records from the log repository based on the selecting; for each batch of the at least one batch of the plurality of batches retrieved from the log repository, performing decompression on the compressed data structure; for each of the plurality of log records of the batch, identifying the template identifier and the attribute identifier of the log record; retrieving the template string corresponding to the template identifier from the template repository; retrieving the attribute string corresponding to the attribute identifier from the attribute repository; and generating the log record based on the template string and the attribute string; searching for the search term defined by the search query among the log records; and returning the log records containing the search term as a response to the search query.
9. The method of claim 1, further comprising: decompressing the plurality of batches of the log records retrieved from the log repository before the identifying the template identifier and the attribute identifier of the log record.
10. The method of claim 1, wherein each log record corresponds to a routine, wherein identifying a template identifier within a template repository for a template string matching an invariant string of the log record comprises matching the invariant string with the invariant string of at least one other log record corresponding to the routine; and wherein identifying an attribute identifier in an attribute repository for an attribute string matching a variant string of the log record comprises identifying an attribute identifier in an attribute repository for an attribute string matching a variant string of the log record that is different from the variant string of at least one other corresponding log record.
11. A computer program product comprising: a non-transitory computer readable storage medium having computer readable program code embodied therewith, the computer readable program code comprising: computer readable program code to, for each of a plurality of log records received from a host machine node as part of a log stream from a software source via the host machine node, perform: identifying a template identifier within a template repository for a template string matching an invariant string of the log record; and identifying an attribute identifier in an attribute repository for an attribute string matching a variant string of the log record; computer readable program code to partition the log records into a plurality of batches, each of the plurality of batches comprising a plurality of log records, each of the plurality of batches defined by a data structure comprising: a batched log record identifier that uniquely identifies the batch relative to the other batches; the template identifier for each of the plurality of log records within the batch; the attribute identifier for each of the plurality of log records within the batch; and a list of timestamps of the plurality of log records within the batch; and computer readable program code to, for each batch of the plurality of batches, store the data structure for the batch into a log repository, the storing comprising: for each batch of the plurality of batches, performing data compression on the data structure for the batch to generate a compressed data structure; and for each batch of the plurality of batches, separately communicating, through a data network to a data server, an instruction to write the compressed data structure for the batch into memory of a log repository.
12. The computer program product of claim 11, wherein the computer readable program code to identify a template identifier within the template repository for a template string matching an invariant string of the log record, comprises: computer readable program code to parse content of the log record to generate strings; computer readable program code to compare the strings to template strings within the template repository; computer readable program code to identify one of the strings of the log record as the invariant string based on a match between the one of the strings and the template string; and computer readable program code to identify the attribute identifier associated with the template string.
13. The computer program product of claim 11, wherein the computer readable program code to identify a template identifier within the template repository for a template string matching an invariant string of the log record, comprises: computer readable program code to parse content of the log records to generate strings; computer readable program code to compare the strings of the log records to template strings within the template repository; computer readable program code to identify one of the strings of selected ones of the log records as the invariant string of the selected ones of the log records based on at least a threshold number of matches occurring between the one of the strings of the selected ones of the log records to a same one of the template strings within the template repository; and computer readable program code to identify the attribute identifier associated with the one of the template strings.
14. The computer program product of claim 11, wherein the computer readable program code to identify a template identifier within the template repository for a template string matching an invariant string of the log record, comprises: computer readable program code to parse content of a sequence of the log record to generate strings; computer readable program code to compare the strings to template strings within the template repository that are ordered in a defined sequence corresponding to an output sequence from a software source on the host machine node; computer readable program code to identify one of the strings of the log record as the invariant string based on a match between the one of the strings and one of the template strings and further based on a previous match identified between one of the strings of a previous one of the log records and a previous one of the template strings in the defined sequence; and computer readable program code to identify the attribute identifier associated with the one of the template strings.
15. The computer program product of claim 11, further comprising: computer readable program code to generate a new template identifier for the invariant string of the log record based on identifying that no template string in the template repository matches the invariant string of the log record; and computer readable program code to store the new template identifier and the invariant string of the log record in the template repository with a logical association between the new template identifier and the invariant string of the log record.
16. The computer program product of claim 11, further comprising: computer readable program code to generate a new attribute identifier for the variant string of the log record based on identifying that no attribute string in the attribute repository matches the variant string of the log record; and computer readable program code to store the new attribute identifier associated with the variant string of the log record in the attribute repository with a logical association between the new attribute identifier and the variant string of the log record.
17. The computer program product of claim 11, further comprising: computer readable program code to receive a search query defining a search term and a time period to be searched; computer readable program code to determine a range of log identifiers to search based on the time period; computer readable program code to select among the plurality of batches of the log records based on the range of log identifiers; computer readable program code to retrieve at least one compressed data structure corresponding to at least one batch of the plurality of batches of the log records from the log repository based on the selecting; for each batch of the plurality of batches retrieved from the log repository, computer readable program code to perform decompression on the compressed data structure; for each of the plurality of log records of the batch, computer readable program code to identify the template identifier and the attribute identifier of the log record; computer readable program code to retrieve the template string corresponding to the template identifier from the template repository; computer readable program code to retrieve the attribute string corresponding to the attribute identifier from the attribute repository; and computer readable program code to generate the log record based on the template string and the attribute string; computer readable program code to search for the search term defined by the search query among the log records; and computer readable program code to return the log records containing the search term as a response to the search query.
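Illustrative sketch (not part of the claims). The write path of claims 1 and 5 to 7 and the read path of claim 8 can be pictured with a small Python example. Everything named here is an assumption made for illustration: the rule that numeric tokens form the variant string, the "<*>" slot marker, the 60-second batching window, the JSON plus zlib encoding, and the in-memory dict that stands in for the data server and log repository. The claims do not prescribe any of these choices.

```python
import json
import re
import zlib

VARIANT = re.compile(r"\d+(?:\.\d+)*")  # assumption: numeric tokens are the variant part
SLOT = "<*>"       # hypothetical marker left in the invariant (template) string
SEP = "\x1f"       # hypothetical separator between variant values
WINDOW = 60        # assumption: each batch covers 60 seconds of timestamps


class Repository:
    """Maps strings to integer identifiers, creating a new one on a miss."""

    def __init__(self):
        self.ids = {}      # string -> identifier
        self.strings = []  # identifier -> string

    def identify(self, text):
        if text not in self.ids:
            self.ids[text] = len(self.strings)
            self.strings.append(text)
        return self.ids[text]


def split_record(message):
    """Return (invariant string, variant string) for one log message."""
    return VARIANT.sub(SLOT, message), SEP.join(VARIANT.findall(message))


def rebuild(template, attribute):
    """Regenerate a log message from its template and attribute strings."""
    values = attribute.split(SEP) if attribute else []
    parts = template.split(SLOT)
    message = parts[0]
    for value, part in zip(values, parts[1:]):
        message += value + part
    return message


def store(records, templates, attributes, log_repository):
    """Group (timestamp, message) pairs into batches and store each one compressed."""
    batches = {}
    for timestamp, message in records:
        invariant, variant = split_record(message)
        batch_id = timestamp // WINDOW
        batch = batches.setdefault(batch_id, {
            "batch_id": batch_id, "template_ids": [],
            "attribute_ids": [], "timestamps": []})
        batch["template_ids"].append(templates.identify(invariant))
        batch["attribute_ids"].append(attributes.identify(variant))
        batch["timestamps"].append(timestamp)
    for batch_id, batch in batches.items():
        # One separate write per batch; a dict stands in for the remote data server.
        log_repository[batch_id] = zlib.compress(json.dumps(batch).encode("utf-8"))


def search(term, start, end, templates, attributes, log_repository):
    """Return (timestamp, message) pairs in [start, end] whose message contains term."""
    hits = []
    for batch_id in range(start // WINDOW, end // WINDOW + 1):
        blob = log_repository.get(batch_id)
        if blob is None:
            continue
        batch = json.loads(zlib.decompress(blob))
        for timestamp, template_id, attribute_id in zip(
                batch["timestamps"], batch["template_ids"], batch["attribute_ids"]):
            message = rebuild(templates.strings[template_id],
                              attributes.strings[attribute_id])
            if start <= timestamp <= end and term in message:
                hits.append((timestamp, message))
    return hits


if __name__ == "__main__":
    templates, attributes, repo = Repository(), Repository(), {}
    store([(5, "user 42 logged in from 10.0.0.7"),
           (70, "disk 3 is 91 percent full")], templates, attributes, repo)
    print(search("logged in", 0, 59, templates, attributes, repo))
```

In this sketch the batch identifier is derived from the timestamp window, so a query's time period maps directly onto a range of batch identifiers, and only those batches are decompressed. A message that already contains the slot marker would not round-trip; a real implementation would need an escaping scheme.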
August 14, 2018