Legal claims defining the scope of protection, as filed with the USPTO.
1. A method comprising:
   parsing a software file into a plurality of abstract syntax trees (ASTs), the ASTs including a plurality of subtrees corresponding to a plurality of functions of the software files;
   generating a plurality of code vectors representative of one or more semantic properties of the plurality of subtrees;
   identifying a plurality of clusters for the plurality of subtrees;
   assigning a cluster identifier and a function label to the plurality of subtrees;
   storing the plurality of subtrees into a tree database and mapping the plurality of subtrees to respective ones of cluster identifiers and function names;
   training a model based on a feature vector and the plurality of clusters stored in the tree database, the feature vector including descriptive information corresponding to a function of at least one of the plurality of clusters; and
   predicting the cluster identifier for at least one of the plurality of subtrees, based on the trained model, to identify a name of the function.
2. The method of claim 1, further including utilizing a k-nearest neighbors algorithm (KNN) to train the model.
3. The method of claim 1, further including retrieving a plurality of subtrees from at least one of the tree database or a subtree encoder to extract features of the subtrees.
4. The method of claim 3, further including initiating a training mode when the subtrees are retrieved from the tree database and initiating an inference mode when the subtrees are retrieved from the subtree encoder.
5. The method of claim 1, further including determining a list of functions in a cluster that can replace a function corresponding to a legacy software.
6. The method of claim 5, further including ranking the list of functions based on text-based similarity between functions in the list of functions and the function corresponding to the legacy software.
7. An apparatus comprising:
   a software parser to generate a plurality of abstract syntax trees (ASTs) based on a plurality of software files, the ASTs including a plurality of subtrees corresponding to a plurality of functions of the software files;
   a subtree encoder to generate a plurality of code vectors representative of one or more semantic properties of the plurality of subtrees;
   a function identifier to determine a plurality of clusters for the plurality of subtrees, the function identifier to assign a cluster identifier and a function label to the plurality of subtrees;
   a tree database to store the plurality of subtrees and map the plurality of subtrees to respective ones of cluster identifiers and function names; and
   a processor to:
      train a model based on a feature vector and the plurality of clusters stored in the tree database, the feature vector including descriptive information corresponding to a function of at least one of the plurality of clusters; and
      predict the cluster identifier for at least one of the plurality of subtrees, based on the trained model, to identify a name of the function.
8. The apparatus of claim 7, wherein the processor includes a model trainer to train the model based on a k-nearest neighbors algorithm (KNN).
9. The apparatus of claim 7, further including a feature extractor to receive a plurality of subtrees from at least one of the tree database or the subtree encoder to extract features of the subtrees.
10. The apparatus of claim 9, wherein the feature extractor is to initiate a training mode when the subtrees are retrieved from the tree database and initiates an inference mode when the subtrees are provided by the subtree encoder.
11. The apparatus of claim 7, wherein the processor further includes an inference generator in an inference mode to utilize the trained model to predict a cluster identifier based on a feature vector.
12. The apparatus of claim 7, further including a ranking generator to determine a list of functions in a cluster that can replace a function corresponding to a legacy software.
13. The apparatus of claim 12, wherein the ranking generator is to rank the list of functions based on text-based similarity between functions in the list of functions and the function corresponding to the legacy software.
14. A non-transitory computer readable storage medium comprising instructions that, when executed, cause a processor to at least:
   parse a software file into a plurality of abstract syntax trees (ASTs), the ASTs including a plurality of subtrees corresponding to a plurality of functions of the software files;
   generate a plurality of code vectors representative of one or more semantic properties of the plurality of subtrees;
   identify a plurality of clusters for the plurality of subtrees;
   assign a cluster identifier and a function label to the plurality of subtrees;
   store the plurality of subtrees into a tree database and map the plurality of subtrees to respective ones of cluster identifiers and function names;
   train a model based on a feature vector and the plurality of clusters stored in the tree database, the feature vector including descriptive information corresponding to a function of at least one of the plurality of clusters; and
   predict the cluster identifier for at least one of the plurality of subtrees, based on the trained model, to identify a name of the function.
15. The non-transitory computer readable storage medium as defined in claim 14, wherein the instructions, when executed, cause the processor to train the model based on a k-nearest neighbors algorithm (KNN).
16. The non-transitory computer readable storage medium as defined in claim 14, wherein the instructions, when executed, cause the processor to retrieve a plurality of subtrees from at least one of the tree database or a subtree encoder to extract features of the subtrees.
17. The non-transitory computer readable storage medium as defined in claim 16, wherein the instructions, when executed, cause the processor to enter a training mode when the subtrees are retrieved from the tree database and enter an inference mode when the subtrees are retrieved from the subtree encoder.
18. The non-transitory computer readable storage medium as defined in claim 14, wherein the instructions, when executed, cause the processor to determine a list of functions in a cluster that can replace a function corresponding to a legacy software.
19. The non-transitory computer readable storage medium as defined in claim 18, wherein the instructions, when executed, cause the processor to rank the list of functions based on text-based similarity between functions in the list of functions and the function corresponding to the legacy software.
20. The non-transitory computer readable storage medium as defined in claim 14, wherein the instructions, when executed, cause the processor to generate a viewable list of functions for a developer to review.
21. An apparatus for evolving computer programs, the apparatus comprising:
   means for parsing, the means for parsing to parse a plurality of abstract syntax trees (ASTs) based on a plurality of software files, the ASTs including a plurality of subtrees corresponding to a plurality of functions of the software files;
   means for encoding, the means for encoding to generate a plurality of code vectors representative of one or more semantic properties of the plurality of subtrees;
   means for determining, the means for determining to determine a plurality of clusters for the plurality of subtrees and to assign a cluster identifier and a function label to the plurality of subtrees;
   means for storing, the means for storing to store the plurality of subtrees and map the plurality of subtrees to respective ones of cluster identifiers and function names; and
   means for processing, the means for processing to:
      train a model based on a feature vector and the plurality of clusters, the feature vector including descriptive information corresponding to a function of at least one of the plurality of clusters; and
      predict the cluster identifier for at least one of the plurality of subtrees, based on the trained model, to identify a name of the function.
22. The apparatus of claim 21, wherein the means for processing are to train the model based on a k-nearest neighbors algorithm (KNN).
23. The apparatus of claim 21, wherein the means for processing are to operate in an inference mode to utilize the trained model to predict a cluster identifier based on a feature vector.
24. The apparatus of claim 21, further including a means for generating, the means for generating to determine a list of functions in a cluster that can replace a function corresponding to a legacy software.
25. The apparatus of claim 24, wherein the means for generating are to list functions based on text-based similarity between functions in the list of functions and the function corresponding to the legacy software.
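Illustrative sketch (not part of the claims as filed). The steps recited in claims 1, 2, 5, and 6 can be pictured with a small Python example: parse source into function subtrees, turn each subtree into a vector, keep the subtrees in a lookup table alongside cluster identifiers, predict a cluster for a new function with k-nearest neighbors, and rank the functions in that cluster by text similarity. Everything here is an assumption made for illustration: the node-type counts stand in for the "code vectors", the cluster labels in `ASSUMED_CLUSTERS` are assigned by hand rather than discovered by a clustering step, the dictionary `tree_db` stands in for the tree database, and all function and variable names are hypothetical. It is not the implementation described in the patent.

```python
import ast
import difflib
from collections import Counter

# Assumed toy feature set: AST node types counted per function subtree.
NODE_TYPES = ["BinOp", "Compare", "For", "While", "If", "Call", "Return", "Subscript"]

def function_subtrees(source):
    """Parse source text and return {function name: FunctionDef subtree}."""
    tree = ast.parse(source)
    return {n.name: n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}

def code_vector(subtree):
    """Toy stand-in for a code vector: counts of the selected node types."""
    counts = Counter(type(n).__name__ for n in ast.walk(subtree))
    return [counts[t] for t in NODE_TYPES]

def knn_predict(train, query, k=3):
    """Majority vote over the k training vectors closest to the query."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(cluster for _, cluster in nearest)
    return votes.most_common(1)[0][0]

def rank_by_text_similarity(legacy_text, candidates):
    """Sort (name, source text) pairs by similarity to the legacy text, best first."""
    def score(text):
        return difflib.SequenceMatcher(None, legacy_text, text).ratio()
    return sorted(candidates, key=lambda c: score(c[1]), reverse=True)

LIBRARY = '''
def total(xs):
    s = 0
    for x in xs:
        s = s + x
    return s

def product(xs):
    p = 1
    for x in xs:
        p = p * x
    return p

def biggest(a, b):
    if a > b:
        return a
    return b

def smallest(a, b):
    if a < b:
        return a
    return b
'''

# Hand-assigned cluster identifiers (assumption; the clustering step is omitted).
ASSUMED_CLUSTERS = {
    "total": "reduce_loop", "product": "reduce_loop",
    "biggest": "pairwise_select", "smallest": "pairwise_select",
}

# Stand-in for the tree database: function name -> (subtree, cluster identifier).
tree_db = {name: (node, ASSUMED_CLUSTERS[name])
           for name, node in function_subtrees(LIBRARY).items()}
train = [(code_vector(node), cid) for node, cid in tree_db.values()]

LEGACY = '''
def accumulate(values):
    acc = 0
    for v in values:
        acc = acc + v
    return acc
'''
legacy_node = function_subtrees(LEGACY)["accumulate"]
predicted = knn_predict(train, code_vector(legacy_node), k=3)

# Candidate replacements: functions stored under the predicted cluster.
# ast.unparse requires Python 3.9 or later.
candidates = [(name, ast.unparse(node))
              for name, (node, cid) in tree_db.items() if cid == predicted]
ranked = rank_by_text_similarity(ast.unparse(legacy_node), candidates)
print(predicted, [name for name, _ in ranked])
```

In this sketch the legacy function `accumulate` has the same node-type counts as `total` and `product`, so the vote among its three nearest neighbors selects the `reduce_loop` cluster, and the two functions in that cluster are then ordered by `difflib` text similarity. A real system would likely use learned embeddings and an actual clustering algorithm in place of the hand-built vectors and labels.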
May 11, 2021