Automated Software Program Repair

Published: September 1, 2020

Assignee: not available in the USPTO data

Inventors: Hiroaki YOSHIDA, Mukul R. PRASAD

Technical Abstract

Patent Claims

20 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method comprising: generating a first abstract syntax tree with respect to a first iteration of first source code of a first software program, the first iteration excluding a particular change in a particular portion of the first source code; generating a second abstract syntax tree with respect to a second iteration of the first source code, the second iteration including the particular change in the particular portion, the particular change including a plurality of modifications made with respect to the particular portion of the first source code; identifying a first sub-tree of the first abstract syntax tree that corresponds to the particular portion with respect to the first iteration of the first source code; identifying a plurality of second sub-trees of the second abstract syntax tree that correspond to the particular portion with respect to the second iteration of the first source code; generating a first textual representation of the first sub-tree; generating a plurality of second textual representations in which a respective second textual representation is generated for each of the second sub-trees; performing a difference determination between the first textual representation and each of the second textual representations; identifying, from the second textual representations based on the difference determination, one or more differing textual representations that differ from the first textual representation, each differing textual representation corresponding to one or more respective modifications of the particular change; determining a smallest-sized set of the differing textual representations that corresponds to a same particular event as the particular change, the particular event occurring with respect to the first source code from the first iteration to the second iteration; identifying, as secondary textual representations, the differing textual representations that are outside of the smallest-sized set, the secondary textual representations corresponding to secondary modifications of the plurality of modifications; identifying, as secondary trees, the second sub-trees that correspond to the secondary textual representations; modifying the second abstract syntax tree by removing the secondary trees from the second abstract syntax tree; obtaining a third iteration of the first source code by regenerating the first source code based on the modified second abstract syntax tree; and performing repair operations with respect to one or more of the first source code and second source code of a second software program based on the third iteration of the first source code.
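
The tree-handling steps of claim 1 can be illustrated with a minimal sketch, assuming Python source and the standard `ast` module (Python 3.9+ for `ast.unparse`). The helper names `function_node`, `differing_statements`, and `without_statements` are hypothetical, and treating each top-level statement of one function as a "second sub-tree" is a simplifying assumption rather than anything the claim prescribes.

```python
import ast


def function_node(tree: ast.Module, name: str) -> ast.FunctionDef:
    """Return the sub-tree for the function `name` (standing in for the 'particular portion')."""
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == name:
            return node
    raise ValueError(f"no function named {name!r}")


def differing_statements(before_src: str, after_src: str, name: str) -> list[int]:
    """Indices of statements in the changed function whose text is absent from the original."""
    first_subtree = function_node(ast.parse(before_src), name)
    second_subtrees = function_node(ast.parse(after_src), name).body
    # Textual representations of the statements that make up the first sub-tree.
    first_texts = {ast.unparse(stmt) for stmt in first_subtree.body}
    # A second sub-tree "differs" when its text does not occur in the first iteration.
    return [i for i, stmt in enumerate(second_subtrees)
            if ast.unparse(stmt) not in first_texts]


def without_statements(after_src: str, name: str, secondary: set[int]) -> str:
    """Remove the secondary sub-trees and regenerate source text (a 'third iteration')."""
    tree = ast.parse(after_src)
    func = function_node(tree, name)
    kept = [s for i, s in enumerate(func.body) if i not in secondary]
    func.body = kept or [ast.Pass()]  # keep the function syntactically valid
    return ast.unparse(tree)


BEFORE = "def area(w, h):\n    return w + h\n"
AFTER = "def area(w, h):\n    print('debug')\n    return w * h\n"

differing = differing_statements(BEFORE, AFTER, "area")
# Assume some later step decided that statement 0 (the print) is secondary.
third_iteration = without_statements(AFTER, "area", {0})
```

In this toy input both statements of the changed function differ from the original, and dropping the one assumed to be secondary leaves a regenerated function that contains only the arithmetic change.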

2. The method of claim 1, wherein performing the repair operations with respect to the second source code includes: identifying one or more errors in the second source code based on executing a test suite with respect to the second source code; and identifying one or more repair candidates for the one or more errors based on the third iteration of the first source code.

3. The method of claim 2, wherein identifying the one or more repair candidates based on the third iteration of the first source code is based on the one or more repair candidates having a code pattern similar to that of the third iteration of the first source code.
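
Claim 3 does not say how pattern similarity is measured. As one possible reading, the sketch below ranks candidate snippets by character-level similarity to the third iteration using `difflib.SequenceMatcher` from the standard library. The function name `rank_repair_candidates` and the 0.6 threshold are assumptions made for illustration only.

```python
import difflib


def rank_repair_candidates(reference: str, candidates: list[str],
                           threshold: float = 0.6) -> list[str]:
    """Return candidates at least `threshold` similar to `reference`, most similar first."""
    scored = [(difflib.SequenceMatcher(None, reference, c).ratio(), c)
              for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored if score >= threshold]


ranked = rank_repair_candidates("return w * h", ["import os", "return a * b"])
```

Textual similarity is a crude proxy for a "code pattern"; a structural comparison of syntax trees would be another reasonable choice.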

4. The method of claim 1, further comprising: identifying a particular second sub-tree that corresponds to a particular differing textual representation that is included in the smallest-sized set, the identifying of the particular second sub-tree being based on the particular second sub-tree having a larger number of levels than the other second sub-trees that correspond to the other differing textual representations included in the smallest-sized set; identifying a plurality of additional sub-trees that are sub-trees of the particular second sub-tree; generating a plurality of additional textual representations in which a respective additional textual representation is generated for each of the additional sub-trees; performing an additional difference determination between the first textual representation and each of the additional textual representations; identifying, based on the additional difference determination, one or more additional differing textual representations that differ from the first textual representation, each additional differing textual representation corresponding to one or more respective modifications of the particular change; determining an additional smallest-sized set of the differing textual representations that corresponds to the same particular event as the first textual representation; identifying, as additional secondary textual representations, the additional differing textual representations that are outside of the additional smallest-sized set, the additional secondary textual representations corresponding to the secondary modifications of the plurality of modifications; and identifying, as additional secondary trees, the additional sub-trees that correspond to the additional secondary textual representations; wherein modifying the second abstract syntax tree further includes removing the additional secondary trees from the second abstract syntax tree.
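
The selection step in claim 4 (pick the sub-tree with the most levels, then descend into its own sub-trees) might look like the following sketch, again assuming Python's `ast` module. The names `levels`, `deepest`, and `child_statements` are hypothetical, and counting every AST node type as a level is an assumption.

```python
import ast


def levels(node: ast.AST) -> int:
    """Number of levels in the sub-tree rooted at `node` (a leaf counts as 1)."""
    return 1 + max((levels(child) for child in ast.iter_child_nodes(node)), default=0)


def deepest(subtrees: list[ast.AST]) -> ast.AST:
    """The sub-tree with the largest number of levels."""
    return max(subtrees, key=levels)


def child_statements(node: ast.AST) -> list[ast.stmt]:
    """Statement-level sub-trees nested directly inside `node`."""
    return [c for c in ast.iter_child_nodes(node) if isinstance(c, ast.stmt)]


statements = ast.parse("x = 1\nif a:\n    y = 2\n    z = 3\n").body
target = deepest(statements)            # the `if` statement
additional = child_statements(target)   # its two nested assignments
additional_texts = [ast.unparse(s) for s in additional]
```

The additional textual representations produced this way could then go through the same difference and minimization steps as the original sub-trees.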

5. The method of claim 1, wherein determining the smallest-sized set includes: performing an event correspondence determination with respect to the particular change, the event correspondence determination identifying the particular event as corresponding to the particular change; performing the event correspondence determination with respect to each possible set of a plurality of possible sets of differing textual representations in which each possible set of differing textual representations includes one or more differing textual representations; identifying, as matching sets and based on the event correspondence determinations made with respect to the plurality of possible sets, which of the plurality of possible sets of differing textual representations correspond to the particular event; and identifying, as the smallest-sized set, a particular matching set of the plurality of possible sets that includes the fewest number of differing textual representations.
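
The search described in claim 5 can be sketched as an exhaustive enumeration of candidate sets in increasing size, stopping at the first set that reproduces the event. The function `smallest_matching_set` and the `matches_event` callback are hypothetical names; in practice the callback would apply the chosen modifications and rerun a test, analysis, or build.

```python
from itertools import combinations
from typing import Callable, Hashable, Iterable, Optional


def smallest_matching_set(
    differing: Iterable[Hashable],
    matches_event: Callable[[frozenset], bool],
) -> Optional[frozenset]:
    """Return a smallest subset of `differing` for which `matches_event` holds, or None."""
    items = list(differing)
    # Trying sizes 1, 2, ... guarantees the first match has the fewest members.
    for size in range(1, len(items) + 1):
        for subset in combinations(items, size):
            candidate = frozenset(subset)
            if matches_event(candidate):
                return candidate
    return None


# Toy event check: the event is reproduced whenever modification 1 is present.
smallest = smallest_matching_set([0, 1, 2], lambda chosen: 1 in chosen)
secondary = {0, 1, 2} - smallest
```

Enumerating every subset grows exponentially with the number of modifications, so this sketch is only practical for small changes.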

6. The method of claim 5, wherein performing the event correspondence determination with respect to the particular change includes: identifying the particular event as a fault introduction event that corresponds to the particular change based on identifying a first software test of the first source code that passed without the particular change included in the first source code and that failed with the particular change included in the first source code; identifying the particular event as a fault correction event that corresponds to the particular change based on identifying a second software test of the first source code that failed without the particular change included in the first source code and that passed with the particular change included in the first source code; identifying the particular event as a defect introduction event that corresponds to the particular change based on a first defect not being identified from a first static analysis performed on the first source code without the particular change being included in the first source code and based on the first defect being identified from a second static analysis performed on the first source code with the particular change included in the first source code; identifying the particular event as a defect correction event that corresponds to the particular change based on a second defect that is identified from a third static analysis performed on the first source code with the particular change included in the first source code and based on the second defect not being identified from a fourth static analysis performed on the first source code with the particular change included in the first source code; or identifying the particular event as a platform migration event that corresponds to the particular change based on a first build of the first source code with the particular change included therein having an error that is omitted with respect to a second build of the first source code with the particular change included therein, the first build being performed using a first version of a particular platform and the second build being performed using a second version of the particular platform.
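
The five event types in claim 6 reduce to comparisons of paired observations. The sketch below encodes them as small Python functions. All names are hypothetical, and the defect correction branch assumes the comparison mirrors defect introduction (defect found without the change, absent with it), which is an interpretation rather than the literal claim wording.

```python
from enum import Enum
from typing import Optional


class Event(Enum):
    FAULT_INTRODUCTION = "fault introduction"
    FAULT_CORRECTION = "fault correction"
    DEFECT_INTRODUCTION = "defect introduction"
    DEFECT_CORRECTION = "defect correction"
    PLATFORM_MIGRATION = "platform migration"


def classify_by_test(passed_without: bool, passed_with: bool) -> Optional[Event]:
    """Compare one test's outcome without and with the change."""
    if passed_without and not passed_with:
        return Event.FAULT_INTRODUCTION
    if not passed_without and passed_with:
        return Event.FAULT_CORRECTION
    return None


def classify_by_static_analysis(found_without: bool, found_with: bool) -> Optional[Event]:
    """Compare whether a defect is reported without and with the change."""
    if not found_without and found_with:
        return Event.DEFECT_INTRODUCTION
    if found_without and not found_with:  # assumed mirror of the introduction case
        return Event.DEFECT_CORRECTION
    return None


def classify_by_build(error_on_first_version: bool,
                      error_on_second_version: bool) -> Optional[Event]:
    """Compare builds of the changed code on two versions of the same platform."""
    if error_on_first_version and not error_on_second_version:
        return Event.PLATFORM_MIGRATION
    return None
```

A predicate built from one of these functions could serve as the event check when searching for the smallest set of modifications.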

7. The method of claim 1, wherein the particular change introduces a particular error in the first source code and the method further comprises: determining that a sub-portion of the particular portion corresponds to the particular error based on a comparison between the first iteration of the first source code and the third iteration of the first source code; wherein performing the repair operations includes modifying the sub-portion in response to determining that the sub-portion corresponds to the particular error.
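
One simple way to realize the comparison in claim 7 is a line-level diff between the first and third iterations. The sketch below uses `difflib.ndiff` from the Python standard library. The function name `changed_lines` is hypothetical, and line granularity is an assumption (the claim does not fix how the sub-portion is delimited).

```python
import difflib


def changed_lines(first_iteration: str, third_iteration: str) -> list[str]:
    """Lines present in the third iteration but not in the first."""
    diff = difflib.ndiff(first_iteration.splitlines(), third_iteration.splitlines())
    # ndiff prefixes added lines with "+ "; hint lines start with "? " and are skipped.
    return [line[2:] for line in diff if line.startswith("+ ")]


first = "def area(w, h):\n    return w + h"
third = "def area(w, h):\n    return w * h"
suspect = changed_lines(first, third)
```

Because the third iteration has had the secondary modifications stripped out, whatever still differs from the first iteration is a natural candidate for the sub-portion tied to the error.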

8. One or more non-transitory computer-readable storage media configured to store instructions that, in response to being executed, cause a system to perform operations, the operations comprising: generating a first abstract syntax tree with respect to a first iteration of first source code of a first software program, the first iteration excluding a particular change in a particular portion of the first source code; generating a second abstract syntax tree with respect to a second iteration of the first source code, the second iteration including the particular change in the particular portion, the particular change including a plurality of modifications made with respect to the particular portion of the first source code; identifying a first sub-tree of the first abstract syntax tree that corresponds to the particular portion with respect to the first iteration of the first source code; identifying a plurality of second sub-trees of the second abstract syntax tree that correspond to the particular portion with respect to the second iteration of the first source code; generating a first textual representation of the first sub-tree; generating a plurality of second textual representations in which a respective second textual representation is generated for each of the second sub-trees; performing a difference determination between the first textual representation and each of the second textual representations; identifying, from the second textual representations based on the difference determination, one or more differing textual representations that differ from the first textual representation, each differing textual representation corresponding to one or more respective modifications of the particular change; determining a smallest-sized set of the differing textual representations that corresponds to a same particular event as the particular change, the particular event occurring with respect to the first source code from the first iteration to the second iteration; identifying, as secondary textual representations, the differing textual representations that are outside of the smallest-sized set, the secondary textual representations corresponding to secondary modifications of the plurality of modifications; identifying, as secondary trees, the second sub-trees that correspond to the secondary textual representations; modifying the second abstract syntax tree by removing the secondary trees from the second abstract syntax tree; obtaining a third iteration of the first source code by regenerating the first source code based on the modified second abstract syntax tree; and performing repair operations with respect to one or more of the first source code and second source code of a second software program based on the third iteration of the first source code.

9. The one or more computer-readable storage media of claim 8, wherein performing the repair operations with respect to the second source code includes: identifying one or more errors in the second source code based on executing a test suite with respect to the second source code; and identifying one or more repair candidates for the one or more errors based on the third iteration of the first source code.

10. The one or more computer-readable storage media of claim 9, wherein identifying the one or more repair candidates based on the third iteration of the first source code is based on the one or more repair candidates having a code pattern similar to that of the third iteration of the first source code.

11. The one or more computer-readable storage media of claim 8, wherein the operations further comprise: identifying a particular second sub-tree that corresponds to a particular differing textual representation that is included in the smallest-sized set, the identifying of the particular second sub-tree being based on the particular second sub-tree having a larger number of levels than the other second sub-trees that correspond to the other differing textual representations included in the smallest-sized set; identifying a plurality of additional sub-trees that are sub-trees of the particular second sub-tree; generating a plurality of additional textual representations in which a respective additional textual representation is generated for each of the additional sub-trees; performing an additional difference determination between the first textual representation and each of the additional textual representations; identifying, based on the additional difference determination, one or more additional differing textual representations that differ from the first textual representation, each additional differing textual representation corresponding to one or more respective modifications of the particular change; determining an additional smallest-sized set of the differing textual representations that corresponds to the same particular event as the first textual representation; identifying, as additional secondary textual representations, the additional differing textual representations that are outside of the additional smallest-sized set, the additional secondary textual representations corresponding to the secondary modifications of the plurality of modifications; and identifying, as additional secondary trees, the additional sub-trees that correspond to the additional secondary textual representations; wherein modifying the second abstract syntax tree further includes removing the additional secondary trees from the second abstract syntax tree.

12. The one or more computer-readable storage media of claim 8, wherein determining the smallest-sized set includes: performing an event correspondence determination with respect to the particular change, the event correspondence determination identifying the particular event as corresponding to the particular change; performing the event correspondence determination with respect to each possible set of a plurality of possible sets of differing textual representations in which each possible set of differing textual representations includes one or more differing textual representations; identifying, as matching sets and based on the event correspondence determinations made with respect to the plurality of possible sets, which of the plurality of possible sets of differing textual representations correspond to the particular event; and identifying, as the smallest-sized set, a particular matching set of the plurality of possible sets that includes the fewest number of differing textual representations.

13. The one or more computer-readable storage media of claim 12, wherein performing the event correspondence determination with respect to the particular change includes: identifying the particular event as a fault introduction event that corresponds to the particular change based on identifying a first software test of the first source code that passed without the particular change included in the first source code and that failed with the particular change included in the first source code; identifying the particular event as a fault correction event that corresponds to the particular change based on identifying a second software test of the first source code that failed without the particular change included in the first source code and that passed with the particular change included in the first source code; identifying the particular event as a defect introduction event that corresponds to the particular change based on a first defect not being identified from a first static analysis performed on the first source code without the particular change being included in the first source code and based on the first defect being identified from a second static analysis performed on the first source code with the particular change included in the first source code; identifying the particular event as a defect correction event that corresponds to the particular change based on a second defect that is identified from a third static analysis performed on the first source code with the particular change included in the first source code and based on the second defect not being identified from a fourth static analysis performed on the first source code with the particular change included in the first source code; or identifying the particular event as a platform migration event that corresponds to the particular change based on a first build of the first source code with the particular change included therein having an error that is omitted with respect to a second build of the first source code with the particular change included therein, the first build being performed using a first version of a particular platform and the second build being performed using a second version of the particular platform.

14. The one or more computer-readable storage media of claim 8, wherein the particular change introduces a particular error in the first source code and the operations further comprise: determining that a sub-portion of the particular portion corresponds to the particular error based on a comparison between the first iteration of the first source code and the third iteration of the first source code; wherein performing the repair operations includes modifying the sub-portion in response to determining that the sub-portion corresponds to the particular error.

15. A system comprising: one or more computer-readable storage media configured to store instructions; and one or more processors communicatively coupled to the one or more computer-readable storage media and configured to, in response to execution of the instructions, cause the system to perform operations, the operations comprising: generating a first abstract syntax tree with respect to a first iteration of first source code of a first software program, the first iteration excluding a particular change in a particular portion of the first source code; generating a second abstract syntax tree with respect to a second iteration of the first source code, the second iteration including the particular change in the particular portion, the particular change including a plurality of modifications made with respect to the particular portion of the first source code; identifying a first sub-tree of the first abstract syntax tree that corresponds to the particular portion with respect to the first iteration of the first source code; identifying a plurality of second sub-trees of the second abstract syntax tree that correspond to the particular portion with respect to the second iteration of the first source code; generating a first textual representation of the first sub-tree; generating a plurality of second textual representations in which a respective second textual representation is generated for each of the second sub-trees; performing a difference determination between the first textual representation and each of the second textual representations; identifying, from the second textual representations based on the difference determination, one or more differing textual representations that differ from the first textual representation, each differing textual representation corresponding to one or more respective modifications of the particular change; determining a smallest-sized set of the differing textual representations that corresponds to a same particular event as the particular change, the particular event occurring with respect to the first source code from the first iteration to the second iteration; identifying, as secondary textual representations, the differing textual representations that are outside of the smallest-sized set, the secondary textual representations corresponding to secondary modifications of the plurality of modifications; identifying, as secondary trees, the second sub-trees that correspond to the secondary textual representations; modifying the second abstract syntax tree by removing the secondary trees from the second abstract syntax tree; obtaining a third iteration of the first source code by regenerating the first source code based on the modified second abstract syntax tree; and performing repair operations with respect to one or more of the first source code and second source code of a second software program based on the third iteration of the first source code.

16. The system of claim 15, wherein performing the repair operations with respect to the second source code includes: identifying one or more errors in the second source code based on executing a test suite with respect to the second source code; and identifying one or more repair candidates for the one or more errors based on the third iteration of the first source code.

17. The system of claim 15, wherein the operations further comprise: identifying a particular second sub-tree that corresponds to a particular differing textual representation that is included in the smallest-sized set, the identifying of the particular second sub-tree being based on the particular second sub-tree having a larger number of levels than the other second sub-trees that correspond to the other differing textual representations included in the smallest-sized set; identifying a plurality of additional sub-trees that are sub-trees of the particular second sub-tree; generating a plurality of additional textual representations in which a respective additional textual representation is generated for each of the additional sub-trees; performing an additional difference determination between the first textual representation and each of the additional textual representations; identifying, based on the additional difference determination, one or more additional differing textual representations that differ from the first textual representation, each additional differing textual representation corresponding to one or more respective modifications of the particular change; determining an additional smallest-sized set of the differing textual representations that corresponds to the same particular event as the first textual representation; identifying, as additional secondary textual representations, the additional differing textual representations that are outside of the additional smallest-sized set, the additional secondary textual representations corresponding to the secondary modifications of the plurality of modifications; and identifying, as additional secondary trees, the additional sub-trees that correspond to the additional secondary textual representations; wherein modifying the second abstract syntax tree further includes removing the additional secondary trees from the second abstract syntax tree.

18. The system of claim 15, wherein determining the smallest-sized set includes: performing an event correspondence determination with respect to the particular change, the event correspondence determination identifying the particular event as corresponding to the particular change; performing the event correspondence determination with respect to each possible set of a plurality of possible sets of differing textual representations in which each possible set of differing textual representations includes one or more differing textual representations; identifying, as matching sets and based on the event correspondence determinations made with respect to the plurality of possible sets, which of the plurality of possible sets of differing textual representations correspond to the particular event; and identifying, as the smallest-sized set, a particular matching set of the plurality of possible sets that includes the fewest number of differing textual representations.

19. The system of claim 18, wherein performing the event correspondence determination with respect to the particular change includes: identifying the particular event as a fault introduction event that corresponds to the particular change based on identifying a first software test of the first source code that passed without the particular change included in the first source code and that failed with the particular change included in the first source code; identifying the particular event as a fault correction event that corresponds to the particular change based on identifying a second software test of the first source code that failed without the particular change included in the first source code and that passed with the particular change included in the first source code; identifying the particular event as a defect introduction event that corresponds to the particular change based on a first defect not being identified from a first static analysis performed on the first source code without the particular change being included in the first source code and based on the first defect being identified from a second static analysis performed on the first source code with the particular change included in the first source code; identifying the particular event as a defect correction event that corresponds to the particular change based on a second defect that is identified from a third static analysis performed on the first source code with the particular change included in the first source code and based on the second defect not being identified from a fourth static analysis performed on the first source code with the particular change included in the first source code; or identifying the particular event as a platform migration event that corresponds to the particular change based on a first build of the first source code with the particular change included therein having an error that is omitted with respect to a second build of the first source code with the particular change included therein, the first build being performed using a first version of a particular platform and the second build being performed using a second version of the particular platform.

20. The system of claim 15, wherein the particular change introduces a particular error in the first source code and the operations further comprise: determining that a sub-portion of the particular portion corresponds to the particular error based on a comparison between the first iteration of the first source code and the third iteration of the first source code; wherein performing the repair operations includes modifying the sub-portion in response to determining that the sub-portion corresponds to the particular error.

Patent Metadata

Filing Date: Unknown

Publication Date: September 1, 2020

Inventors: Hiroaki YOSHIDA, Mukul R. PRASAD
