Methods and Apparatus for Recommending Computer Program Updates Utilizing a Trained Model

Published: May 11, 2021

Assignee: not available in the USPTO data

Inventors: Shengtian Zhou, Mohammad Mejbah ul Alam, Justin Gottschlich

Patent Claims

25 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method comprising: parsing a software file into a plurality of abstract syntax trees (ASTs), the ASTs including a plurality of subtrees corresponding to a plurality of functions of the software files; generating a plurality of code vectors representative of one or more semantic properties of the plurality of subtrees; identifying a plurality of clusters for the plurality of subtrees; assigning a cluster identifier and a function label to the plurality of subtrees; storing the plurality of subtrees into a tree database and mapping the plurality of subtrees to respective ones of cluster identifiers and function names; training a model based on a feature vector and the plurality of clusters stored in the tree database, the feature vector including descriptive information corresponding to a function of at least one of the plurality of clusters; and predicting the cluster identifier for at least one of the plurality of subtrees, based on the trained model, to identify a name of the function.

2. The method of claim 1, further including utilizing a k-nearest neighbors algorithm (KNN) to train the model.

3. The method of claim 1, further including retrieving a plurality of subtrees from at least one of the tree database or a subtree encoder to extract features of the subtrees.

4. The method of claim 3, further including initiating a training mode when the subtrees are retrieved from the tree database and initiating an inference mode when the subtrees are retrieved from the subtree encoder.

5. The method of claim 1, further including determining a list of functions in a cluster that can replace a function corresponding to a legacy software.

6. The method of claim 5, further including ranking the list of functions based on text-based similarity between functions in the list of functions and the function corresponding to the legacy software.
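
As an illustration only, the flow recited in claims 1 and 2 (parse functions into AST subtrees, encode them as vectors, cluster them, store them in a tree database, then predict a cluster with k-nearest neighbors) might be sketched as follows. The claims do not specify the encoder or the clustering algorithm, so everything concrete here is an assumption: the node-type-count encoder, the greedy cosine-threshold clustering, the 0.95 threshold, and the names `function_subtrees`, `code_vector`, `cluster`, and `knn_predict` are all hypothetical stand-ins. Python's own `ast` module plays the role of the parser.

```python
import ast
import math
from collections import Counter

SOURCE = '''
def add(a, b):
    return a + b

def plus(x, y):
    return x + y

def total(items):
    result = 0
    for item in items:
        result += item
    return result

def sum_all(values):
    acc = 0
    for v in values:
        acc += v
    return acc
'''

def function_subtrees(source):
    """Parse source text and return one AST subtree per top-level function."""
    tree = ast.parse(source)
    return [node for node in tree.body if isinstance(node, ast.FunctionDef)]

def code_vector(subtree):
    """Assumed encoder: count AST node types (a stand-in, not the patent's encoder)."""
    return Counter(type(node).__name__ for node in ast.walk(subtree))

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def cluster(vectors, threshold=0.95):
    """Assumed clustering: join the first cluster whose seed vector is similar enough."""
    seeds, ids = [], []
    for vec in vectors:
        for cid, seed in enumerate(seeds):
            if cosine(vec, seed) >= threshold:
                ids.append(cid)
                break
        else:
            seeds.append(vec)
            ids.append(len(seeds) - 1)
    return ids

def knn_predict(query, database, k=1):
    """Majority vote over the k stored vectors most similar to the query."""
    ranked = sorted(database, key=lambda row: cosine(query, row["vector"]), reverse=True)
    votes = Counter(row["cluster_id"] for row in ranked[:k])
    return votes.most_common(1)[0][0]

# Build the "tree database": each subtree mapped to a cluster id and function name.
subtrees = function_subtrees(SOURCE)
vectors = [code_vector(t) for t in subtrees]
cluster_ids = cluster(vectors)
tree_db = [
    {"name": t.name, "cluster_id": c, "vector": v}
    for t, c, v in zip(subtrees, cluster_ids, vectors)
]

# Predict the cluster of an unseen function and list the names stored in it.
query = function_subtrees("def combine(p, q):\n    return p + q\n")[0]
predicted = knn_predict(code_vector(query), tree_db, k=1)
names = [row["name"] for row in tree_db if row["cluster_id"] == predicted]
print(predicted, names)
```

With KNN, "training" amounts to storing the labelled vectors; the prediction step then looks up the nearest stored entries. A learned embedding would likely separate functions by behavior far better than raw node counts, which only capture syntactic shape.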

7. An apparatus comprising: a software parser to generate a plurality of abstract syntax trees (ASTs) based on a plurality of software files, the ASTs including a plurality of subtrees corresponding to a plurality of functions of the software files; a subtree encoder to generate a plurality of code vectors representative of one or more semantic properties of the plurality of subtrees; a function identifier to determine a plurality of clusters for the plurality of subtrees, the function identifier to assign a cluster identifier and a function label to the plurality of subtrees; a tree database to store the plurality of subtrees and map the plurality of subtrees to respective ones of cluster identifiers and function names; and a processor to: train a model based on a feature vector and the plurality of clusters stored in the tree database, the feature vector including descriptive information corresponding to a function of at least one of the plurality of clusters; and predict the cluster identifier for at least one of the plurality of subtrees, based on the trained model, to identify a name of the function.

8. The apparatus of claim 7, wherein the processor includes a model trainer to train the model based on a k-nearest neighbors algorithm (KNN).

9. The apparatus of claim 7, further including a feature extractor to receive a plurality of subtrees from at least one of the tree database or the subtree encoder to extract features of the subtrees.

10. The apparatus of claim 9, wherein the feature extractor is to initiate a training mode when the subtrees are retrieved from the tree database and initiates an inference mode when the subtrees are provided by the subtree encoder.

11. The apparatus of claim 7, wherein the processor further includes an inference generator in an inference mode to utilize the trained model to predict a cluster identifier based on a feature vector.

12. The apparatus of claim 7, further including a ranking generator to determine a list of functions in a cluster that can replace a function corresponding to a legacy software.

13. The apparatus of claim 12, wherein the ranking generator is to rank the list of functions based on text-based similarity between functions in the list of functions and the function corresponding to the legacy software.
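
The mode selection described in claims 9 through 11 (training when subtrees come from the tree database, inference when they come from the subtree encoder) could be sketched roughly as below. This is a minimal illustration under stated assumptions: the `Source`, `Mode`, and `SubtreeBatch` types, the `process` function, and the use of plain numeric tuples as feature vectors are hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    TREE_DATABASE = "tree_database"
    SUBTREE_ENCODER = "subtree_encoder"


class Mode(Enum):
    TRAINING = "training"
    INFERENCE = "inference"


@dataclass
class SubtreeBatch:
    source: Source
    # Training items are (feature_vector, cluster_id) pairs;
    # inference items are bare feature vectors.
    items: list


def select_mode(batch):
    """Pick the mode from where the subtrees came from."""
    if batch.source is Source.TREE_DATABASE:
        return Mode.TRAINING
    return Mode.INFERENCE


def nearest_cluster(features, examples):
    """1-nearest-neighbor lookup by squared Euclidean distance."""
    def squared_distance(example):
        stored, _cluster_id = example
        return sum((a - b) ** 2 for a, b in zip(features, stored))
    return min(examples, key=squared_distance)[1]


def process(batch, examples):
    """Route a batch to training (store examples) or inference (predict ids)."""
    mode = select_mode(batch)
    if mode is Mode.TRAINING:
        examples.extend(batch.items)
        return mode, None
    return mode, [nearest_cluster(f, examples) for f in batch.items]


examples = []
train_mode, _ = process(
    SubtreeBatch(Source.TREE_DATABASE, [((0.0, 0.0), 0), ((5.0, 5.0), 1)]),
    examples,
)
infer_mode, predicted = process(
    SubtreeBatch(Source.SUBTREE_ENCODER, [(0.5, 0.2), (4.0, 6.0)]),
    examples,
)
print(train_mode, infer_mode, predicted)
```

Keying the mode off the data source keeps the caller from having to pass a separate flag, which appears to be the intent of the claim language.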

14. A non-transitory computer readable storage medium comprising instructions that, when executed, cause a processor to at least: parse a software file into a plurality of abstract syntax trees (ASTs), the ASTs including a plurality of subtrees corresponding to a plurality of functions of the software files; generate a plurality of code vectors representative of one or more semantic properties of the plurality of subtrees; identify a plurality of clusters for the plurality of subtrees; assign a cluster identifier and a function label to the plurality of subtrees; store the plurality of subtrees into a tree database and map the plurality of subtrees to respective ones of cluster identifiers and function names; train a model based on a feature vector and the plurality of clusters stored in the tree database, the feature vector including descriptive information corresponding to a function of at least one of the plurality of clusters; and predict the cluster identifier for at least one of the plurality of subtrees, based on the trained model, to identify a name of the function.

15. The non-transitory computer readable storage medium as defined in claim 14, wherein the instructions, when executed, cause the processor to train the model based on a k-nearest neighbors algorithm (KNN).

16. The non-transitory computer readable storage medium as defined in claim 14, wherein the instructions, when executed, cause the processor to retrieve a plurality of subtrees from at least one of the tree database or a subtree encoder to extract features of the subtrees.

17. The non-transitory computer readable storage medium as defined in claim 16, wherein the instructions, when executed, cause the processor to enter a training mode when the subtrees are retrieved from the tree database and enter an inference mode when the subtrees are retrieved from the subtree encoder.

18. The non-transitory computer readable storage medium as defined in claim 14, wherein the instructions, when executed, cause the processor to determine a list of functions in a cluster that can replace a function corresponding to a legacy software.

19. The non-transitory computer readable storage medium as defined in claim 18, wherein the instructions, when executed, cause the processor to rank the list of functions based on text-based similarity between functions in the list of functions and the function corresponding to the legacy software.

20. The non-transitory computer readable storage medium as defined in claim 14, wherein the instructions, when executed, cause the processor to generate a viewable list of functions for a developer to review.

21. An apparatus for evolving computer programs, the apparatus comprising: means for parsing, the means for parsing to parse a plurality of abstract syntax trees (ASTs) based on a plurality of software files, the ASTs including a plurality of subtrees corresponding to a plurality of functions of the software files; means for encoding, the means for encoding to generate a plurality of code vectors representative of one or more semantic properties of the plurality of subtrees; means for determining, the means for determining to determine a plurality of clusters for the plurality of subtrees and to assign a cluster identifier and a function label to the plurality of subtrees; means for storing, the means for storing to store the plurality of subtrees and map the plurality of subtrees to respective ones of cluster identifiers and function names; and means for processing, the means for processing to: train a model based on a feature vector and the plurality of clusters, the feature vector including descriptive information corresponding to a function of at least one of the plurality of clusters; and predict the cluster identifier for at least one of the plurality of subtrees, based on the trained model, to identify a name of the function.

22. The apparatus of claim 21, wherein the means for processing are to train the model based on a k-nearest neighbors algorithm (KNN).

23. The apparatus of claim 21, wherein the means for processing are to operate in an inference mode to utilize the trained model to predict a cluster identifier based on a feature vector.

24. The apparatus of claim 21, further including a means for generating, the means for generating to determine a list of functions in a cluster that can replace a function corresponding to a legacy software.

25. The apparatus of claim 24, wherein the means for generating are to list functions based on text-based similarity between functions in the list of functions and the function corresponding to the legacy software.
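
The ranking step recited in claims 6, 13, 19, and 25 orders candidate replacement functions by text-based similarity to the legacy function. The claims do not name a similarity measure, so the sketch below assumes one: the character-level ratio from `difflib.SequenceMatcher` in the Python standard library. The function `rank_replacements` and the sample sources are hypothetical.

```python
from difflib import SequenceMatcher


def rank_replacements(legacy_source, candidates):
    """Return candidate names ordered from most to least textually similar.

    `candidates` maps a function name to its source text. The similarity
    metric (SequenceMatcher ratio) is an assumption, not taken from the claims.
    """
    def similarity(name):
        return SequenceMatcher(None, legacy_source, candidates[name]).ratio()

    return sorted(candidates, key=similarity, reverse=True)


legacy = (
    "def total(items):\n"
    "    result = 0\n"
    "    for item in items:\n"
    "        result += item\n"
    "    return result\n"
)

# Functions assumed to share a cluster with the legacy function.
cluster_members = {
    "sum_all": "def sum_all(values):\n    return sum(values)\n",
    "total_v2": (
        "def total_v2(items):\n"
        "    acc = 0\n"
        "    for item in items:\n"
        "        acc += item\n"
        "    return acc\n"
    ),
}

ranked = rank_replacements(legacy, cluster_members)
print(ranked)
```

A list like `ranked` is the kind of output claim 20 describes presenting to a developer for review. Text similarity favors candidates that look like the legacy code, so a shorter or faster equivalent may rank lower even when it is the better replacement.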

Patent Metadata

Filing Date: Unknown

Publication Date: May 11, 2021

Inventors: Shengtian Zhou, Mohammad Mejbah ul Alam, Justin Gottschlich
