APSEC2020 Keynote
- 1. Automated Program Repair
Abhik Roychoudhury
National University of Singapore
abhik@comp.nus.edu.sg
ACKNOWLEDGEMENT: National Cyber Security Research program from NRF Singapore
- 2. SUPPOSE I AM UNWELL TODAY
Which one is more
manageable?
- 3. Beyond Error Detection
In the absence of formal specifications, analyze the
buggy program and its artifacts such as execution
traces via various heuristics to glean a specification
about how it can pass tests and what could have gone
wrong!
Specification Inference (application: self-healing)
[Figure labels: Buggy Program, Tests]
- 7. Over-fitting
[Figure: tests with oracles, buggy program → symbolic formulae → program repair → patched program]
Tests: (ip1, op1), (ip2, op2), (ip3, op3), …
AVOID patches that merely hard-code the test outcomes:
    if (ip1) return op1
    else if (ip2) return op2
    else …
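A minimal sketch of why such a patch is undesirable. The absolute-value task, the test suite, and both patch functions are hypothetical and not from the talk; they only illustrate that a patch can pass every test while generalizing badly.

```python
# Hypothetical test suite: (input, expected output) pairs for an absolute-value task.
TESTS = [(-2, 2), (0, 0), (3, 3)]

def overfitted_patch(x):
    """Hard-codes the test outcomes, the pattern the slide says to avoid."""
    for inp, out in TESTS:
        if x == inp:
            return out
    return x  # arbitrary fallback for inputs outside the test suite

def general_patch(x):
    """A patch that captures the intended behaviour for all inputs."""
    return -x if x < 0 else x

def passes_all(patch):
    """True if the patch reproduces the expected output of every test."""
    return all(patch(inp) == out for inp, out in TESTS)

print(passes_all(overfitted_patch), passes_all(general_patch))
print(overfitted_patch(-7), general_patch(-7))
```

Both patches are plausible with respect to the tests, yet only the second behaves sensibly on the unseen input -7.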
- 8. Example
Test id   a    b    c    Oracle        Pass
1        -1   -1   -1    INVALID       yes
2         1    1    1    EQUILATERAL   yes
3         2    2    3    ISOSCELES     yes
4         2    3    2    ISOSCELES     yes
5         3    2    2    ISOSCELES     no
6         2    3    4    SCALENE       no

1 int triangle(int a, int b, int c){
2     if (a <= 0 || b <= 0 || c <= 0)
3         return INVALID;
4     if (a == b && b == c)
5         return EQUILATERAL;
6     if (a == b || b != c) // bug!
7         return ISOSCELES;
8     return SCALENE;
9 }

Correct fix for line 6:
    (a == b || b == c || a == c)
Traverse all mutations of line 6?
It is hard to generate the fix this way, since (a == c) or (c == a) never
appears anywhere else in the program.
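To make the difficulty concrete, the sketch below ports the function to Python and tries every operator mutation of the condition `a == b || b != c`. The mutation operators (swapping `==`/`!=` and `||`/`&&`) are an assumption chosen for illustration, not necessarily the search space any particular tool uses.

```python
from itertools import product

# Test suite from the slide: (a, b, c, expected classification).
TESTS = [
    (-1, -1, -1, "INVALID"),
    (1, 1, 1, "EQUILATERAL"),
    (2, 2, 3, "ISOSCELES"),
    (2, 3, 2, "ISOSCELES"),
    (3, 2, 2, "ISOSCELES"),
    (2, 3, 4, "SCALENE"),
]

REL = {"==": lambda x, y: x == y, "!=": lambda x, y: x != y}
BOOL = {"||": lambda p, q: p or q, "&&": lambda p, q: p and q}

def triangle(a, b, c, cond):
    """Python port of the slide's function with the buggy condition replaced by `cond`."""
    if a <= 0 or b <= 0 or c <= 0:
        return "INVALID"
    if a == b and b == c:
        return "EQUILATERAL"
    if cond(a, b, c):
        return "ISOSCELES"
    return "SCALENE"

def mutants():
    """Yield (text, predicate) for every operator mutation of `a == b || b != c`."""
    for r1, op, r2 in product(REL, BOOL, REL):
        text = f"a {r1} b {op} b {r2} c"
        pred = (lambda a, b, c, r1=r1, op=op, r2=r2:
                BOOL[op](REL[r1](a, b), REL[r2](b, c)))
        yield text, pred

def failing_tests(pred):
    """Return the ids of the tests that fail when `pred` is used as the condition."""
    return [i + 1 for i, (a, b, c, want) in enumerate(TESTS)
            if triangle(a, b, c, pred) != want]

results = {text: failing_tests(pred) for text, pred in mutants()}
plausible = [text for text, fails in results.items() if not fails]
print(results)
print("plausible mutants:", plausible)
```

Under these assumed mutation operators no mutant passes all six tests, because none of them can introduce the missing comparison between `a` and `c`.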
- 9. Example
Automatically generate the constraint
    f(2,2,3) ∧ f(2,3,2) ∧ f(3,2,2) ∧ ¬f(2,3,4)
Solution
    f(a,b,c) = (a == b || b == c || a == c)
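One way to picture solving such a constraint is a small enumerative search. The sketch below assumes a component set of three equality atoms and looks for the smallest disjunction that holds on the ISOSCELES inputs and not on the SCALENE input. It is an illustration only, not the synthesis procedure of any specific tool.

```python
from itertools import combinations

# f must hold on the ISOSCELES inputs and must not hold on the SCALENE input.
MUST_HOLD = [(2, 2, 3), (2, 3, 2), (3, 2, 2)]
MUST_NOT_HOLD = [(2, 3, 4)]

# Assumed component set: equality between each pair of parameters.
ATOMS = {
    "a == b": lambda a, b, c: a == b,
    "b == c": lambda a, b, c: b == c,
    "a == c": lambda a, b, c: a == c,
}

def satisfies(names):
    """Check the disjunction of the named atoms against the constraint."""
    f = lambda a, b, c: any(ATOMS[n](a, b, c) for n in names)
    return all(f(*t) for t in MUST_HOLD) and not any(f(*t) for t in MUST_NOT_HOLD)

def synthesize():
    """Return the smallest satisfying disjunction, or None if there is none."""
    for size in range(1, len(ATOMS) + 1):
        for names in combinations(ATOMS, size):
            if satisfies(names):
                return " || ".join(names)
    return None

print(synthesize())
```

No single atom or pair of atoms satisfies all four conjuncts, so the search only succeeds with the full three-way disjunction.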
- 10. Comparison
Syntax-based repair
• Where to fix, which line?
• Generate patches in the candidate line.
• Validate the candidate patches against the correctness criterion.
Syntax-based schematic:
    for e in Search-space {
        Validate e against Tests
    }

Semantics-based repair
• Where to fix, which line(s)?
• What values should be returned by those lines, e.g. <inp == 1, ret == 0>?
• What are the expressions which will return such values?
Semantics-based schematic:
    for t in Tests {
        generate repair constraint Ψt
    }
    Synthesize e from ∧t Ψt
- 12. Example
inhibit   up_sep   down_sep   Observed o/p   Oracle   Pass
1         0        100        0              0        yes
1         11       110        0              1        no
0         100      50         1              1        yes
1         -20      60         0              1        no
0         0        10         0              0        yes

int is_upward(int inhibit, int up_sep, int down_sep) {
    int bias;
    if (inhibit)
        bias = down_sep; // bias = up_sep + 100
    else
        bias = up_sep;
    if (bias > down_sep)
        return 1;
    else
        return 0;
}
- 13. Debugging
• Given a test suite T
– fail(s) ≡ # of failing executions in which s occurs
– pass(s) ≡ # of passing executions in which s occurs
– allfail ≡ total # of failing executions
– allpass ≡ total # of passing executions
• allfail + allpass = |T|
• Can also use other metrics, such as Ochiai.

$$\mathrm{Score}(s) = \frac{\mathit{fail}(s)/\mathit{allfail}}{\mathit{fail}(s)/\mathit{allfail} + \mathit{pass}(s)/\mathit{allpass}}$$

[Figure: repair loop. Inputs: buggy program, test suite. Steps: investigate what this statement should be; generate a fixed statement. Decision: YES / NO. Output: fixed program]
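As an illustration, Score(s) can be computed for the `is_upward` example from the earlier slide. The sketch assumes a hand-instrumented Python port that records the statements it executes (the labels are the C statements as strings) and a test suite with at least one passing and one failing test.

```python
def is_upward_traced(inhibit, up_sep, down_sep):
    """Python port of the buggy is_upward that also records which statements ran."""
    covered = ["if (inhibit)"]
    if inhibit:
        covered.append("bias = down_sep")
        bias = down_sep
    else:
        covered.append("bias = up_sep")
        bias = up_sep
    covered.append("if (bias > down_sep)")
    if bias > down_sep:
        covered.append("return 1")
        return 1, covered
    covered.append("return 0")
    return 0, covered

# (inhibit, up_sep, down_sep, oracle) rows from the test table.
TESTS = [(1, 0, 100, 0), (1, 11, 110, 1), (0, 100, 50, 1), (1, -20, 60, 1), (0, 0, 10, 0)]

def suspiciousness(tests):
    """Compute Score(s) for every executed statement (needs >= 1 pass and >= 1 fail)."""
    fail, passed, allfail, allpass = {}, {}, 0, 0
    for inhibit, up_sep, down_sep, oracle in tests:
        out, covered = is_upward_traced(inhibit, up_sep, down_sep)
        ok = out == oracle
        allpass += ok
        allfail += not ok
        bucket = passed if ok else fail
        for s in covered:
            bucket[s] = bucket.get(s, 0) + 1
    scores = {}
    for s in set(fail) | set(passed):
        f = fail.get(s, 0) / allfail
        p = passed.get(s, 0) / allpass
        scores[s] = f / (f + p) if f + p > 0 else 0.0
    return scores

scores = suspiciousness(TESTS)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking[0], scores[ranking[0]])
```

On this suite the assignment `bias = down_sep` is executed by both failing tests and only one passing test, so it receives the highest score and becomes the candidate fix location.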
- 15. Symbolic Execution (Inset)
#include <stdlib.h>            /* abort */

int add100(int x);             /* helper assumed to be defined elsewhere */

int test_me(int Climb, int Up) {
    int sep, upward;
    if (Climb > 0) {
        sep = Up;
    } else {
        sep = add100(Up);
    }
    if (sep > 150) {
        upward = 1;
    } else {
        upward = 0;
    }
    if (upward < 0) {
        abort();
    } else
        return upward;
}
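The paths of `test_me` can be enumerated by hand to see what a symbolic executor would collect. The sketch below assumes `add100` simply adds 100 and uses a brute-force search over a small input range as a stand-in for a constraint solver; the function names `paths` and `witness` are hypothetical.

```python
from itertools import product

def add100(x):
    # Assumption: the helper adds 100, as its name suggests.
    return x + 100

def paths():
    """Enumerate branch choices of test_me; return (path condition, predicate, result)."""
    out = []
    for climb_pos, sep_big in product([True, False], repeat=2):
        sep = (lambda up: up) if climb_pos else add100
        sep_txt = "Up" if climb_pos else "Up + 100"
        upward = 1 if sep_big else 0
        # `upward < 0` is decided concretely: upward is 0 or 1, so abort is unreachable.
        cond_txt = (f"Climb {'>' if climb_pos else '<='} 0 and "
                    f"{sep_txt} {'>' if sep_big else '<='} 150")
        pred = (lambda climb, up, cp=climb_pos, sb=sep_big, sep=sep:
                (climb > 0) == cp and (sep(up) > 150) == sb)
        out.append((cond_txt, pred, upward))
    return out

def witness(pred, domain=range(-200, 201)):
    """Brute-force stand-in for a solver: find inputs satisfying a path condition."""
    for climb, up in product((-1, 1), domain):
        if pred(climb, up):
            return climb, up
    return None

for text, pred, result in paths():
    print(f"{text}  ->  returns {result}, witness {witness(pred)}")
```

Each feasible path yields one path condition and one concrete witness input, which is the kind of information a repair tool later turns into constraints.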
- 17. Example
• Accumulated constraints
– f(1,11,110) > 110
– f(1,0,100) ≤ 100
– …
• Find an f satisfying these constraints
– By fixing the set of operators appearing in f
• Candidate methods
– Search over the space of expressions
– Program synthesis with a fixed set of operators
– Can also be achieved by second-order constraint solving
• Generated fix
– f(inhibit,up_sep,down_sep) = up_sep + 100
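A small search over expressions shows how such constraints can pin down the fix. In this sketch the constraints are derived from the three `inhibit == 1` rows of the test table (the oracle decides whether `f(...) > down_sep` must hold). The constant pool and the expression shapes are assumptions made for illustration.

```python
from itertools import product

VARS = ("inhibit", "up_sep", "down_sep")
CONSTS = (1, 10, 100)  # assumed constant pool

# One constraint per test with inhibit == 1: (inputs, must `f > down_sep` hold?).
CONSTRAINTS = [
    ({"inhibit": 1, "up_sep": 11, "down_sep": 110}, True),
    ({"inhibit": 1, "up_sep": 0, "down_sep": 100}, False),
    ({"inhibit": 1, "up_sep": -20, "down_sep": 60}, True),
]

def candidates():
    """Yield (text, evaluator) for expressions built from VARS, CONSTS, + and -."""
    for v in VARS:
        yield v, (lambda env, v=v: env[v])
    for c in CONSTS:
        yield str(c), (lambda env, c=c: c)
    for v, c in product(VARS, CONSTS):
        yield f"{v} + {c}", (lambda env, v=v, c=c: env[v] + c)
        yield f"{v} - {c}", (lambda env, v=v, c=c: env[v] - c)
    for v, w in product(VARS, repeat=2):
        yield f"{v} + {w}", (lambda env, v=v, w=w: env[v] + env[w])
        yield f"{v} - {w}", (lambda env, v=v, w=w: env[v] - env[w])

def satisfies(evaluate):
    """True if the candidate makes every test take the branch its oracle requires."""
    return all((evaluate(env) > env["down_sep"]) == want for env, want in CONSTRAINTS)

solutions = [text for text, evaluate in candidates() if satisfies(evaluate)]
print(solutions)
```

Within this assumed search space the only expression that meets all three constraints is `up_sep + 100`, matching the generated fix on the slide.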
- 18. Second-order Reasoning
• Two approaches
– Get properties of function f via symbolic execution, and
synthesize a function f satisfying these properties.
– Directly solve for function f by building a second-order
symbolic execution engine.
• Allow for existentially quantified second-order variables.
• Restrict their interpretation to a language, e.g. linear
integer arithmetic
    Term = Var | Constant | Term + Term | Term - Term | Constant * Term
• Example SAT
– ∃f. f(0) > 0 ∧ f(1) ≤ 0
– Satisfying solution: f = λx. 1 - x
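The example can be reproduced with a tiny search. Over a single variable x, every term of the linear grammar normalizes to a*x + b, so the sketch below searches small integer coefficient pairs. The coefficient bound and the "smallest coefficients first" ordering are assumptions, and a real engine would use a constraint solver rather than enumeration.

```python
from itertools import product

def synthesize_linear(bound=3):
    """Search for integers (a, b) such that f(x) = a*x + b meets f(0) > 0 and f(1) <= 0."""
    # Try pairs with small absolute coefficients first.
    pairs = sorted(product(range(-bound, bound + 1), repeat=2),
                   key=lambda ab: (abs(ab[0]) + abs(ab[1]), ab))
    for a, b in pairs:
        f = lambda x, a=a, b=b: a * x + b
        if f(0) > 0 and f(1) <= 0:
            return a, b
    return None

print(synthesize_linear())
```

The first pair found is a = -1, b = 1, that is f(x) = 1 - x.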
- 25. Repair Constraint
• SemFix work (ICSE 2013)
– Example: for an identified expression e to be fixed
• [ X > 0 ] ∧ f(t) == X for each test t
• DirectFix work (ICSE 2015)
– Whole program as repair constraint
– Use the principle of minimality to synthesize a minimal patch.
• Angelix work (ICSE 2016)
– Example: for identified expressions e1, e2, … to be fixed
– [ (X == 1) ∨ (X == 2) ∨ (X == 3) ] ∧ f(t) == X for each test t.
– [ (X == 1 ∧ Y == 1) ∨ (X == 2 ∧ Y == 2) ] ∧ f(t) == X ∧ g(t) == Y for each test t.
- 29. Test-equivalence
scanf("%d", &x);
for (i = 0; i < 10; i++)
    if (x - i > 0)
        printf("1");
    else
        printf("0");

Consider all inequalities
    αx ± βi [> ≥ = ≠] γ

Sequence of values                  Equivalence class (x = 4)
{T, T, T, T, T, T, T, T, T, T}      {x > 0, x > 1, …}
{T, T, T, T, T, T, T, T, T, F}      {x - i > -5, …}
{T, T, T, T, T, T, T, T, F, T}      EMPTY
{T, T, T, T, T, T, T, T, F, F}      {x - i > -4, …}
{T, T, T, T, T, T, T, F, T, T}      EMPTY
{T, T, T, T, T, T, T, F, T, F}      EMPTY
{T, T, T, T, T, T, T, F, F, T}      EMPTY
…
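The grouping can be sketched by evaluating each candidate condition on the test input x = 4 and bucketing candidates by the sequence of truth values they produce across the ten iterations. The coefficient ranges are assumptions, and only the operators `>` and `>=` are enumerated here to keep the sketch small.

```python
from itertools import product

X = 4  # test input from the slide

# Assumed operator subset for this sketch.
OPS = {
    ">": lambda l, r: l > r,
    ">=": lambda l, r: l >= r,
}

def signature(alpha, beta, op, gamma):
    """Truth values of `alpha*x + beta*i op gamma` over the ten loop iterations."""
    return tuple(OPS[op](alpha * X + beta * i, gamma) for i in range(10))

def equivalence_classes():
    """Group candidate conditions by the value sequence they produce on this test."""
    classes = {}
    # Assumed small coefficient ranges; beta in {-1, 0, 1} covers the ± sign.
    for alpha, beta, op, gamma in product((1,), (-1, 0, 1), OPS, range(-5, 6)):
        text = f"{alpha}*x + {beta}*i {op} {gamma}"
        classes.setdefault(signature(alpha, beta, op, gamma), []).append(text)
    return classes

classes = equivalence_classes()
for seq, members in classes.items():
    print("".join("T" if v else "F" for v in seq), members[:3])
```

Candidates in the same class are indistinguishable on this test, so a repair tool only needs to run one representative per class.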
- 31. Combat over-fitting: Fuzz Testing
• Distinguish crashing and crash-free patches (practical)
• Crashing patches may (1) partially fix the crash or (2) unexpectedly introduce a new crash
[Figure: patch search space with regions labelled crashing patches, crash-free patches, and correct patches]
[Figure: buggy program P → test generation (auto-generate tests) → test cases → repair → patched program P]
- 32. Fix2Fit
char* strncpy(char* s, char* t, int n) {
    for (int i = 0; i < n; i++) // buffer overflow or data leakage
        t[i] = s[i];
}
Intended behaviour: copy the first n characters of s to t.

ID   Plausible patch
p1   i < n && i != 3
p2   i < 5
p3   i < n && i < strlen(s)

Patch partition refinement:
{p1, p2, p3}
    test s = "foo", n = 5:            {p1, p3}  {p2}
    mutated test s = "fo", n = 5:     {p1}  {p3}
[Figure labels: correct patch, crashing patch]
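The partition refinement can be simulated in a few lines. This sketch models only the source buffer (a NUL-terminated string) and treats a read past its end as a crash, roughly as a sanitizer would report it. The destination buffer is not modelled, and `run_patch` and `partition` are hypothetical helper names.

```python
def run_patch(cond, s, n):
    """Simulate the copy loop with a patched loop condition.

    Returns "crash" on an out-of-bounds read of s, else the tuple of indices read.
    """
    buf = list(s) + ["\0"]
    read = []
    i = 0
    while cond(i, n, s):
        if i >= len(buf):
            return "crash"
        read.append(i)
        i += 1
    return tuple(read)

# The three plausible patches, as loop conditions.
PATCHES = {
    "p1": lambda i, n, s: i < n and i != 3,
    "p2": lambda i, n, s: i < 5,
    "p3": lambda i, n, s: i < n and i < len(s),
}

def partition(groups, s, n):
    """Split each group of patches by their behaviour on test (s, n)."""
    refined = []
    for group in groups:
        by_behaviour = {}
        for name in group:
            by_behaviour.setdefault(run_patch(PATCHES[name], s, n), []).append(name)
        refined.extend(by_behaviour.values())
    return refined

step0 = [["p1", "p2", "p3"]]
step1 = partition(step0, "foo", 5)  # original test
step2 = partition(step1, "fo", 5)   # mutated test
print(step1)
print(step2)
```

The first test separates p2, which reads past the end of s, from p1 and p3. The mutated test then separates p1 from p3, because p1 still reads index 2 while p3 stops at the string length.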
- 35. Application: Security
“The C and C++ programming languages are notoriously insecure yet remain indispensable. Developers
therefore resort to a multi-pronged approach to find security issues before adversaries. These include
manual, static, and dynamic program analysis. Dynamic bug finding tools, or "sanitizers", can find bugs
that elude other types of analysis because they observe the actual execution of a program, and can
therefore directly observe incorrect program behavior as it happens.” Song et al., 2018.
[Figure: time to fix; number of vulnerabilities in 2019 (overall number of new vulnerabilities: 20,362)]
- 36. Combat Overfitting: Constraint Extraction
[Figure: buggy program P → repair, guided by constraints → patched program P]
• Program vulnerabilities can be formalized as violations of
constraints, e.g. buffer overflow:
    access(buffer) < base(buffer) + size(buffer)
char getValue(char[] arr, int index) {
    int len = size(arr);
    if (index <= len) // error location
        return arr[index];
    return 0;
}
Failing input: arr = {1, 2, 3}, index = 3
Concrete buggy state: arr[3]
Abstracted constraint violation: index >= len
The constraints serve as additional specifications to fix the bug for all tests.
- 37. Constraint Propagation
ExtractFix
[Figure: 𝜑’ at the fix location, {P}, 𝜑 at the crashing location]
• Propagate the crash-free constraint 𝜑 from the crash location
to the fix location by calculating the weakest precondition:
    [e ⟼ e’]𝜑’ {P} 𝜑
• The goal of repair is to ensure 𝜑’ is satisfied at the fix
location.
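For the `getValue` example, the idea can be illustrated by checking candidate guards against the crash-free constraint `index < len`. This is a sketch under stated assumptions: the candidate guards are hypothetical, indices are taken to be non-negative, and a bounded enumeration stands in for the solver query a real tool would issue. It is not the ExtractFix algorithm itself.

```python
from itertools import product

# Candidate replacements e' for the guard at the fix location (hypothetical list).
CANDIDATE_GUARDS = {
    "index <= len": lambda index, length: index <= length,  # original, buggy
    "index != len": lambda index, length: index != length,
    "index < len": lambda index, length: index < length,
    "index < len - 1": lambda index, length: index < length - 1,
}

# Bounded domain of (index, len) states; non-negative indices only.
DOMAIN = list(product(range(0, 8), range(0, 6)))

def safe(guard):
    """A guard is safe if every state it admits satisfies index < len at arr[index]."""
    return all(index < length for index, length in DOMAIN if guard(index, length))

def admitted(guard):
    """States that the guard lets through to the array access."""
    return {(i, l) for i, l in DOMAIN if guard(i, l)}

safe_guards = [name for name, g in CANDIDATE_GUARDS.items() if safe(g)]
# Prefer the weakest safe guard: the one admitting the most states.
weakest = max(safe_guards, key=lambda name: len(admitted(CANDIDATE_GUARDS[name])))
print(safe_guards, weakest)
```

Choosing the weakest safe guard avoids rejecting valid accesses: `index < len - 1` is also crash-free but needlessly excludes the last element.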
- 43. Reference Solution and Incorrect Student Program
Example:
Write a Python program which
* Given a sorted sequence seq
* Counts the number of elements smaller than x

Reference Solution
    def search(x, seq):
        for i in range(len(seq)):
            if x <= seq[i]:
                return i
        return len(seq)

Incorrect Student Program
    def search(e, lst):
        for j in range(len(lst)):
            if e < lst[j]:
                return j
            else:
                j = j + 1
        return len(lst) + 1

Repair
    def search(e, lst):
        for j in range(len(lst)):
            if e <= lst[j]:
                return j
            else:
                pass
        return len(lst)

Refactored Correct Solution
    def search(x, seq):
        for i in range(len(seq)):
            if x <= seq[i]:
                return i
            else:
                pass
        return len(seq)
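A quick differential check makes the difference between the programs visible. The three functions below are the slide's programs renamed to `reference`, `student`, and `repaired` so they can coexist. The bounded input enumeration in `counterexamples` is an assumption made for illustration, not the validation procedure of any specific tool.

```python
from itertools import product

def reference(x, seq):
    for i in range(len(seq)):
        if x <= seq[i]:
            return i
    return len(seq)

def student(e, lst):
    for j in range(len(lst)):
        if e < lst[j]:
            return j
        else:
            j = j + 1  # no effect: the for loop reassigns j
    return len(lst) + 1

def repaired(e, lst):
    for j in range(len(lst)):
        if e <= lst[j]:
            return j
        else:
            pass
    return len(lst)

def counterexamples(candidate, max_len=3, values=range(4)):
    """Inputs (x, sorted seq) on which `candidate` disagrees with the reference."""
    found = []
    for n in range(max_len + 1):
        for seq in product(values, repeat=n):
            if list(seq) != sorted(seq):
                continue  # the task only considers sorted sequences
            for x in values:
                if candidate(x, list(seq)) != reference(x, list(seq)):
                    found.append((x, list(seq)))
    return found

print(counterexamples(student)[:3])
print(counterexamples(repaired))
```

The student program disagrees with the reference whenever x equals an element or exceeds all elements, while the repaired program agrees on every enumerated input.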
- 44. Most Relevant Results
Semantic Program Repair Using a Reference Implementation. ICSE 2018.
Angelix: Scalable Multiline Program Patch Synthesis via Symbolic Analysis. ICSE 2016.
DirectFix: Looking for Simple Program Repairs. ICSE 2015.
SemFix: Program Repair via Semantic Analysis. ICSE 2013.
Symbolic execution with second order existential constraints. ESEC-FSE 2018.
Crash-Avoiding Program Repair. ISSTA 2019.
A Feasibility Study of Using Automated Program Repair for Introductory Programming Assignments. ESEC-FSE 2017.
http://www.comp.nus.edu.sg/~tsunami/ https://www.comp.nus.edu.sg/~nsoe-tss/
- 45. Perspective
Automated Program Repair
C. Le Goues, M. Pradel, A. Roychoudhury
Review Article, Communications of the ACM, 2019.
abhik@comp.nus.edu.sg
https://www.comp.nus.edu.sg/~abhik