Two Stage Log Normalization

Published: December 13, 2016

Assignee: Not available in the USPTO data

Inventors: Phillip A.J. Cooper, Jevon J.C. Hill, Fiona L. Lam, Kalvinder P. Singh

Patent Claims

17 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method for two stage log normalization, the method comprising: retrieving, by one or more computer processors, a message format and a plurality of parameters from one or more log files; determining, by one or more computer processors, a classification for one or more first sequence files, wherein the one or more first sequence files includes the message format from the one or more log files; determining, by one or more computer processors, a classification of error for the one or more first sequence files; determining, by one or more computer processors, whether there is a high confidence in the classification of error for the one or more first sequence files; and responsive to a determination that there is not a high confidence in the classification of error for the one or more first sequence files, determining, by one or more computer processors, whether there is an improvement in confidence in the classification of error from one or more second sequence files, wherein the one or more second sequence files includes the message format and the plurality of parameters from the one or more log files, wherein determining whether there is an improvement in confidence in the classification of error from the one or more second sequence files includes: creating the one or more second sequence files, wherein creating includes retrieving a corresponding unique parameter identifier (ID) for each of the one or more log files and sequencing a corresponding unique message ID for each of the one or more log files with the respective unique parameter ID; re-classifying the one or more second sequence files, wherein re-classifying includes determining a classification of error for the one or more second sequence files; and determining a level of similarity between the classification of error for the one or more second sequence files and a plurality of existing trained data.
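The two-stage flow in claim 1 can be illustrated with a short Python sketch. Every concrete choice in it is an assumption made for illustration and is not taken from the patent. Parameters are assumed to be integers and hex literals. The registry class `Ids`, the bag-of-tokens cosine similarity, the nearest-neighbour classifier, the 0.8 confidence threshold, and the use of separate trained examples for each stage are all hypothetical. Stage one compares sequences of message IDs. Stage two compares sequences of (message ID, parameter ID) pairs and runs only when stage one is not confident.

```python
import re
from collections import Counter
from math import sqrt

# Assumption: hex literals and integers are the variable parameters of a line;
# everything else is the fixed message format.
PARAM_RE = re.compile(r"\b(?:0x[0-9a-fA-F]+|\d+)\b")

class Ids:
    """Hypothetical registry that hands out one stable integer ID per distinct key."""
    def __init__(self):
        self.table = {}

    def __call__(self, key):
        return self.table.setdefault(key, len(self.table))

def sequences(lines, msg_ids, param_ids):
    """Build the first (message IDs) and second (message ID, parameter ID) sequences."""
    first, second = [], []
    for line in lines:
        mid = msg_ids(PARAM_RE.sub("<*>", line))            # unique message ID
        pid = param_ids(tuple(PARAM_RE.findall(line)))      # unique parameter ID
        first.append(mid)
        second.append((mid, pid))
    return first, second

def cosine(a, b):
    """Bag-of-tokens cosine similarity of two sequences."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = sqrt(sum(v * v for v in ca.values()) * sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def classify(seq, trained):
    """Nearest-neighbour error label and its similarity level."""
    return max(((label, cosine(seq, ex)) for label, ex in trained),
               key=lambda pair: pair[1], default=(None, 0.0))

def two_stage(first, second, trained_first, trained_second, threshold=0.8):
    """Run stage 2 only when the stage-1 similarity is below the threshold."""
    label, level = classify(first, trained_first)
    if level >= threshold:
        return label, level, "stage1"            # high confidence: stop here
    label2, level2 = classify(second, trained_second)
    if level2 > level:
        return label2, level2, "stage2"          # parameters improved confidence
    return label, level, "stage1-low"            # no improvement: keep stage-1 result
```

Keeping the format-only comparison as the default path means parameter values are only consulted when they might resolve an ambiguous match. The claims do not say how "high" confidence is measured, so the threshold here is only a placeholder.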

2. The method of claim 1, wherein determining a classification for one or more first sequence files, further comprises: creating, by one or more computer processors, the one or more first sequence files, wherein creating includes retrieving the corresponding unique message identifier (ID) for each of the one or more log files; and utilizing, by one or more computer processors, conventional machine learning processes to train data for the classification of the one or more first sequence files.
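Claim 2 calls for "conventional machine learning processes" without naming one. The Python sketch below uses multinomial naive Bayes as an assumed stand-in and pairs it with first-stage sequence creation, where each log line is reduced to the unique ID of its message format. The helper `first_sequence`, the shared `msg_ids` dictionary, and the parameter pattern are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Assumption: integers and hex literals are the variable parameters of a line.
PARAM_RE = re.compile(r"\b(?:0x[0-9a-fA-F]+|\d+)\b")

def first_sequence(lines, msg_ids):
    """Build a first-stage sequence: one unique message ID per log line.

    `msg_ids` is a shared dict from message format to integer ID, so the same
    format always gets the same ID across training and classification.
    """
    return [msg_ids.setdefault(PARAM_RE.sub("<*>", line), len(msg_ids))
            for line in lines]

class NaiveBayes:
    """Multinomial naive Bayes over message IDs with add-one smoothing."""

    def fit(self, sequences, labels):
        self.priors = Counter(labels)            # sequence files per error label
        self.counts = defaultdict(Counter)       # message-ID counts per label
        self.vocab = set()
        for seq, label in zip(sequences, labels):
            self.counts[label].update(seq)
            self.vocab.update(seq)
        return self

    def predict(self, seq):
        total = sum(self.priors.values())

        def log_posterior(label):
            counts = self.counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            return math.log(self.priors[label] / total) + sum(
                math.log((counts[t] + 1) / denom) for t in seq)

        return max(self.priors, key=log_posterior)
```

Any other conventional classifier could be substituted. Naive Bayes is used here only because it trains directly on message-ID counts and fits in a few lines.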

3. The method of claim 1, wherein determining a classification of error for the one or more first sequence files, further comprises: determining, by one or more computer processors, one or more similarities between the one or more first sequence files and a plurality of existing trained data.
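Claim 3 does not define how the similarities are measured. One possible order-aware measure is Python's `difflib.SequenceMatcher.ratio()`, which returns 2M/T, where M is the number of matched elements and T is the combined length of both sequences. The function `similarities` below is a hypothetical helper that scores one first-stage sequence against each labelled trained example and ranks the results.

```python
from difflib import SequenceMatcher

def similarities(first_seq, trained):
    """Score one first-stage sequence against every trained example, best first.

    `trained` is a list of (error label, message-ID sequence) pairs.
    """
    scores = [(label, SequenceMatcher(None, first_seq, example).ratio())
              for label, example in trained]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Because `SequenceMatcher` respects order, two sequence files that contain the same messages in a different order score below 1.0. For example, `[0, 1]` against `[1, 0]` scores 0.5.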

4. The method of claim 1, wherein determining whether there is a high confidence in the classification of error for the one or more first sequence files, further comprises: determining, by one or more computer processors, a level of similarity between the classification of error for the one or more first sequence files and a plurality of existing trained data.

5. The method of claim 4, further comprising: responsive to a determination that the level of similarity between the classification of error for the one or more first sequence files and the plurality of existing trained data is high, determining, by one or more computer processors, that there is a high confidence in the classification of error; and responsive to a determination that the level of similarity between the classification of error for the one or more first sequence files and the plurality of existing trained data is low, determining, by one or more computer processors, that there is not a high confidence in the classification of error.
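Claims 4 and 5 only distinguish a "high" level of similarity from a "low" one. The sketch below assumes that the level is the best similarity score from a ranked search and that 0.8 is a hypothetical cut-off between the two cases. The name `stage_one_confidence` is also illustrative.

```python
from typing import List, Optional, Tuple

# Hypothetical cut-off: claims 4 and 5 only distinguish "high" from "low" similarity.
HIGH_SIMILARITY = 0.8

def stage_one_confidence(
    ranked: List[Tuple[str, float]], threshold: float = HIGH_SIMILARITY
) -> Tuple[Optional[str], float, bool]:
    """Return (error label, similarity level, is_high_confidence).

    `ranked` holds (label, similarity) pairs sorted best first, e.g. the
    output of a similarity search against the trained data.
    """
    if not ranked:
        # No trained data to compare against, so confidence cannot be high.
        return None, 0.0, False
    label, level = ranked[0]
    return label, level, level >= threshold
```

When the returned flag is `False`, claim 1 proceeds to the second, parameter-aware stage.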

6. The method of claim 1, further comprising: responsive to a determination that the level of similarity between the classification of error for the one or more second sequence files and the plurality of existing trained data has improved over a level of similarity between a classification of error for the one or more first sequence files and the plurality of existing trained data, determining, by one or more computer processors, there is an improvement in confidence in the classification of error for the one or more second sequence files; and responsive to a determination that the level of similarity between the classification of error for the one or more second sequence files and the plurality of existing trained data has not improved over a level of similarity between a classification of error for the one or more first sequence files and the plurality of existing trained data, determining, by one or more computer processors, there is not an improvement in confidence in the classification of error for the one or more second sequence files.
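The improvement test in claim 6 compares the similarity level reached by the second sequence files with the level reached by the first. The Python sketch below takes the level to be the best `difflib.SequenceMatcher` ratio against the trained examples for each stage. It assumes that second-stage tokens are (message ID, parameter ID) tuples and treats only a strictly higher level as an improvement. `best_level` and `confidence_improved` are hypothetical names.

```python
from difflib import SequenceMatcher

def best_level(seq, trained):
    """Highest similarity ratio between `seq` and any trained example (0.0 if none)."""
    return max((SequenceMatcher(None, seq, example).ratio() for _, example in trained),
               default=0.0)

def confidence_improved(first_seq, second_seq, trained_first, trained_second):
    """Return (improved, stage-1 level, stage-2 level) in the spirit of claim 6.

    first_seq holds message IDs; second_seq holds (message ID, parameter ID) pairs.
    """
    level_1 = best_level(first_seq, trained_first)
    level_2 = best_level(second_seq, trained_second)
    return level_2 > level_1, level_1, level_2
```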

7. A computer program product for two stage log normalization, the computer program product comprising: one or more computer readable storage media and program instructions stored on the one or more computer readable storage media, the program instructions comprising: program instructions to retrieve a message format and a plurality of parameters from one or more log files; program instructions to determine a classification for one or more first sequence files, wherein the one or more first sequence files includes the message format from the one or more log files; program instructions to determine a classification of error for the one or more first sequence files; program instructions to determine whether there is a high confidence in the classification of error for the one or more first sequence files; and responsive to a determination that there is not a high confidence in the classification of error for the one or more first sequence files, program instructions to determine whether there is an improvement in confidence in the classification of error from one or more second sequence files, wherein the one or more second sequence files includes the message format and the plurality of parameters from the one or more log files, wherein determining whether there is an improvement in confidence in the classification of error from the one or more second sequence files includes: creating the one or more second sequence files, wherein creating includes retrieving a corresponding unique parameter identifier (ID) for each of the one or more log files and sequencing a corresponding unique message ID for each of the one or more log files with the respective unique parameter ID; re-classifying the one or more second sequence files, wherein re-classifying includes determining a classification of error for the one or more second sequence files; and determining a level of similarity between the classification of error for the one or more second sequence files and a plurality of existing trained data.

8. The computer program product of claim 7, wherein program instructions to determine a classification for one or more first sequence files, further comprises: program instructions to create the one or more first sequence files, wherein creating includes retrieving the corresponding unique message identifier (ID) for each of the one or more log files; and program instructions to utilize conventional machine learning processes to train data for the classification of the one or more first sequence files.

9. The computer program product of claim 7, wherein program instructions to determine a classification of error for the one or more first sequence files, further comprises: program instructions to determine one or more similarities between the one or more first sequence files and a plurality of existing trained data.

10. The computer program product of claim 7, wherein program instructions to determine whether there is a high confidence in the classification of error for the one or more first sequence files, further comprises: program instructions to determine a level of similarity between the classification of error for the one or more first sequence files and a plurality of existing trained data.

11. The computer program product of claim 10, further comprising: responsive to a determination that the level of similarity between the classification of error for the one or more first sequence files and the plurality of existing trained data is high, program instructions to determine that there is a high confidence in the classification of error; and responsive to a determination that the level of similarity between the classification of error for the one or more first sequence files and the plurality of existing trained data is low, program instructions to determine that there is not a high confidence in the classification of error.

12. The computer program product of claim 7, further comprising: responsive to a determination that the level of similarity between the classification of error for the one or more second sequence files and the plurality of existing trained data has improved over a level of similarity between a classification of error for the one or more first sequence files and the plurality of existing trained data, program instructions to determine there is an improvement in confidence in the classification of error for the one or more second sequence files; and responsive to a determination that the level of similarity between the classification of error for the one or more second sequence files and the plurality of existing trained data has not improved over a level of similarity between a classification of error for the one or more first sequence files and the plurality of existing trained data, program instructions to determine there is not an improvement in confidence in the classification of error for the one or more second sequence files.

13. A computer system for two stage log normalization, the computer system comprising: one or more computer processors; one or more computer readable storage media; program instructions stored on at least one of the one or more computer readable storage media for execution by at least one of the one or more computer processors, the program instructions comprising: program instructions to retrieve a message format and a plurality of parameters from one or more log files; program instructions to determine a classification for one or more first sequence files, wherein the one or more first sequence files includes the message format from the one or more log files; program instructions to determine a classification of error for the one or more first sequence files; program instructions to determine whether there is a high confidence in the classification of error for the one or more first sequence files; and responsive to a determination that there is not a high confidence in the classification of error for the one or more first sequence files, program instructions to determine whether there is an improvement in confidence in the classification of error from one or more second sequence files, wherein the one or more second sequence files includes the message format and the plurality of parameters from the one or more log files, wherein determining whether there is an improvement in confidence in the classification of error from the one or more second sequence files includes: creating the one or more second sequence files, wherein creating includes retrieving a corresponding unique parameter identifier (ID) for each of the one or more log files and sequencing a corresponding unique message ID for each of the one or more log files with the respective unique parameter ID; re-classifying the one or more second sequence files, wherein re-classifying includes determining a classification of error for the one or more second sequence files; and determining a level of similarity between the classification of error for the one or more second sequence files and a plurality of existing trained data.

14. The computer system of claim 13, wherein program instructions to determine a classification for one or more first sequence files, further comprises: program instructions to create the one or more first sequence files, wherein creating includes retrieving the corresponding unique message identifier (ID) for each of the one or more log files; and program instructions to utilize conventional machine learning processes to train data for the classification of the one or more first sequence files.

15. The computer system of claim 13, wherein program instructions to determine whether there is a high confidence in the classification of error for the one or more first sequence files, further comprises: program instructions to determine a level of similarity between the classification of error for the one or more first sequence files and a plurality of existing trained data.

16. The computer system of claim 15, further comprising: responsive to a determination that the level of similarity between the classification of error for the one or more first sequence files and the plurality of existing trained data is high, program instructions to determine that there is a high confidence in the classification of error; and responsive to a determination that the level of similarity between the classification of error for the one or more first sequence files and the plurality of existing trained data is low, program instructions to determine that there is not a high confidence in the classification of error.

17. The computer system of claim 13, further comprising: responsive to a determination that the level of similarity between the classification of error for the one or more second sequence files and the plurality of existing trained data has improved over a level of similarity between a classification of error for the one or more first sequence files and the plurality of existing trained data, program instructions to determine there is an improvement in confidence in the classification of error for the one or more second sequence files; and responsive to a determination that the level of similarity between the classification of error for the one or more second sequence files and the plurality of existing trained data has not improved over a level of similarity between a classification of error for the one or more first sequence files and the plurality of existing trained data, program instructions to determine there is not an improvement in confidence in the classification of error for the one or more second sequence files.

Patent Metadata

Filing Date

Unknown