APSEC 2020 Keynote

Automated Program Repair
Abhik Roychoudhury
National University of Singapore
abhik@comp.nus.edu.sg
ACKNOWLEDGEMENT: National Cyber Security Research program from NRF Singapore

SUPPOSE I AM UNWELL TODAY
Which one is more
manageable?

Beyond Error Detection
Specification Inference (application: self-healing)
In the absence of formal specifications, analyze the buggy program and its artifacts, such as execution traces, via various heuristics to glean a specification of how it can pass the tests and what could have gone wrong.
[Figure labels: Buggy Program, Tests]

Repair: Why?
Education
Productivity
Security

Search
Applicability
Scalability
Over-fitting
Large program?
Large search space?
Ack: figure from C. Le Goues

Program Repair
[Figure: repair flow; labels: Buggy Program, Tests]

Over-fitting
[Figure labels: Tests with oracles, Buggy Program, Symbolic Formulae, Program Repair, Patched Program]
Tests: (ip1,op1), (ip2,op2), (ip3,op3), …
AVOID
if (ip1) return op1
else if (ip2) return op2
else …
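A minimal sketch of why the pattern above over-fits. The intended behavior (`intended`), the three tests and the held-out inputs are all hypothetical, chosen only for illustration: a patch with one branch per test input passes the test suite but not inputs outside it.

```python
# Hypothetical intended behavior; the tests below only sample it.
def intended(a, b):
    return a if a >= b else b

TESTS = [((1, 2), 2), ((5, 3), 5), ((0, 0), 0)]

def overfitting_patch(a, b):
    # One branch per test input: the pattern to avoid.
    for inp, out in TESTS:
        if (a, b) == inp:
            return out
    return 0  # arbitrary value for every input outside the test suite

def general_patch(a, b):
    return max(a, b)

def passes_all(patch, tests):
    return all(patch(*inp) == out for inp, out in tests)

# Held-out inputs, with oracles taken from the intended behavior.
HELD_OUT = [(inp, intended(*inp)) for inp in [(7, 4), (-3, -9)]]

for name, patch in [("overfitting", overfitting_patch), ("general", general_patch)]:
    print(name, passes_all(patch, TESTS), passes_all(patch, HELD_OUT))
```

Both patches are indistinguishable on the given tests; only the held-out inputs separate them.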

Example
Test id  a   b   c   oracle       Pass
1        -1  -1  -1  INVALID      pass
2        1   1   1   EQUILATERAL  pass
3        2   2   3   ISOSCELES    pass
4        2   3   2   ISOSCELES    pass
5        3   2   2   ISOSCELES    fail
6        2   3   4   SCALENE      fail
1 int triangle(int a, int b, int c){
2   if (a <= 0 || b <= 0 || c <= 0)
3     return INVALID;
4   if (a == b && b == c)
5     return EQUILATERAL;
6   if (a == b || b != c) // bug!
7     return ISOSCELES;
8   return SCALENE;
9 }
Correct fix
(a == b || b == c || a == c)
Traverse all mutations of line 6?
Hard to generate the fix, since (a == c) or (c == a) never appears anywhere else in the program!
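A sketch of a syntax-based search over line 6, assuming a mutation space that only swaps the two relational operators and the logical connective while keeping the operands (a, b) and (b, c). This mutation space is an assumption for illustration, not the search space of any particular tool. Under it, a few mutants pass all six tests, yet none matches the correct condition, because `a == c` cannot be produced.

```python
import itertools
import operator

INVALID, EQUILATERAL, ISOSCELES, SCALENE = "INVALID", "EQUILATERAL", "ISOSCELES", "SCALENE"
TESTS = [((-1, -1, -1), INVALID), ((1, 1, 1), EQUILATERAL), ((2, 2, 3), ISOSCELES),
         ((2, 3, 2), ISOSCELES), ((3, 2, 2), ISOSCELES), ((2, 3, 4), SCALENE)]

def triangle(a, b, c, cond6):
    """The program from the slide, with the condition of line 6 as a parameter."""
    if a <= 0 or b <= 0 or c <= 0:
        return INVALID
    if a == b and b == c:
        return EQUILATERAL
    if cond6(a, b, c):
        return ISOSCELES
    return SCALENE

REL = {"==": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}
LOGIC = {"||": lambda p, q: p or q, "&&": lambda p, q: p and q}

def mutants_of_line6():
    # Operator mutations of `a == b || b != c`; the operands never change.
    for (n1, r1), (nl, l), (n2, r2) in itertools.product(REL.items(), LOGIC.items(), REL.items()):
        cond = lambda a, b, c, r1=r1, l=l, r2=r2: l(r1(a, b), r2(b, c))
        yield f"a {n1} b {nl} b {n2} c", cond

def passes_tests(cond6):
    return all(triangle(*inp, cond6) == out for inp, out in TESTS)

def correct_cond(a, b, c):
    return a == b or b == c or a == c

def is_correct(cond6, domain=range(1, 5)):
    # Compare against the correct fix on every triple of a small domain.
    return all(triangle(a, b, c, cond6) == triangle(a, b, c, correct_cond)
               for a, b, c in itertools.product(domain, repeat=3))

PLAUSIBLE = [(text, cond) for text, cond in mutants_of_line6() if passes_tests(cond)]
for text, cond in PLAUSIBLE:
    print(text, "correct" if is_correct(cond) else "over-fits the tests")
```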

Example
Automatically generate the constraint
f(2,2,3) ∧ f(2,3,2) ∧ f(3,2,2) ∧ ¬f(2,3,4)
Solution
f(a,b,c) = (a == b || b == c || a == c)
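A sketch of how a condition could be synthesized from this constraint. It assumes a small hypothetical component set (equalities and disequalities over a, b, c) and enumerates disjunctions smallest first; this plain enumeration stands in for the solver-based synthesis used by the actual tools.

```python
import itertools

POSITIVE = [(2, 2, 3), (2, 3, 2), (3, 2, 2)]  # f must hold on these inputs
NEGATIVE = [(2, 3, 4)]                        # f must not hold on these inputs

# Hypothetical component set for the synthesizer.
ATOMS = [
    ("a == b", lambda a, b, c: a == b),
    ("b == c", lambda a, b, c: b == c),
    ("a == c", lambda a, b, c: a == c),
    ("a != b", lambda a, b, c: a != b),
    ("b != c", lambda a, b, c: b != c),
    ("a != c", lambda a, b, c: a != c),
]

def satisfies(f):
    return all(f(*t) for t in POSITIVE) and not any(f(*t) for t in NEGATIVE)

def synthesize():
    """Return the smallest disjunction of atoms that satisfies the constraint."""
    for size in range(1, len(ATOMS) + 1):
        for combo in itertools.combinations(ATOMS, size):
            f = lambda a, b, c, combo=combo: any(g(a, b, c) for _, g in combo)
            if satisfies(f):
                return " || ".join(name for name, _ in combo)
    return None

print(synthesize())
```

No single atom and no pair of atoms satisfies all four conjuncts, so the search only succeeds once it reaches the three equalities.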

Comparison
Syntax-based Schematic
• Where to fix, which line?
• Generate patches in the candidate line.
• Validate the candidate patches against the correctness criterion.
for e in Search-space {
  Validate e against Tests
}
Semantics-based Schematic
• Where to fix, which line(s)?
• What values should be returned by those lines, e.g. <inp == 1, ret == 0>?
• What are the expressions which will return such values?
for t in Tests {
  generate repair constraint Ψt
}
Synthesize e from ∧t Ψt

Specification Inference [ICSE13]
var = f(live_vars) // X
[Figure: the Program runs on test input t; concrete execution with concrete values, then symbolic execution; with the oracle (expected output), the output is a value-set or constraint.]

Example
inhibit  up_sep  down_sep  Observed o/p  Oracle  Pass
1        0       100       0             0       pass
1        11      110       0             1       fail
0        100     50        1             1       pass
1        -20     60        0             1       fail
0        0       10        0             0       pass
int is_upward(int inhibit, int up_sep, int down_sep) {
  int bias;
  if (inhibit)
    bias = down_sep; // bias = up_sep + 100
  else
    bias = up_sep;
  if (bias > down_sep)
    return 1;
  else
    return 0;
}
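A sketch of specification inference on this example. The suspicious right-hand side on the `inhibit` branch is replaced by a hole, and for each test the values of the hole that make the test pass are collected. The bounded integer domain is an assumption that stands in for symbolic execution; `hole` and `passing_values` are illustrative names.

```python
TESTS = [((1, 0, 100), 0), ((1, 11, 110), 1), ((0, 100, 50), 1),
         ((1, -20, 60), 1), ((0, 0, 10), 0)]

def is_upward(inhibit, up_sep, down_sep, hole=None):
    """The buggy program; if `hole` is given, it replaces the suspicious right-hand side."""
    if inhibit:
        bias = down_sep if hole is None else hole
    else:
        bias = up_sep
    return 1 if bias > down_sep else 0

def passing_values(test, domain=range(-200, 201)):
    """Values of the hole (within a bounded domain) for which the test passes."""
    (inhibit, up_sep, down_sep), oracle = test
    return [v for v in domain if is_upward(inhibit, up_sep, down_sep, hole=v) == oracle]

OBSERVED = [is_upward(*inp) for inp, _ in TESTS]
print("observed:", OBSERVED)
for test in TESTS:
    values = passing_values(test)
    print(test, "hole in", (min(values), max(values)))
```

For the test (1, 11, 110) with oracle 1, every passing value exceeds 110, which is the per-test requirement that the replacement expression must meet.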

Debugging
• Given a test-suite T
– fail(s) ≡ # of failing executions in which s occurs
– pass(s) ≡ # of passing executions in which s occurs
– allfail ≡ total # of failing executions
– allpass ≡ total # of passing executions
• allfail + allpass = |T|
• Score(s) = (fail(s) / allfail) / (fail(s) / allfail + pass(s) / allpass)
• Other metrics, such as Ochiai, can also be used.
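A sketch of the score computation. The coverage table is hand-derived from the `is_upward` example and its five tests (tests are indexed from 0 in table order), so it is an illustration rather than tool output, and the function assumes at least one passing and one failing test.

```python
# Which tests execute which statement of is_upward (hand-derived).
COVERAGE = {
    "bias = down_sep": {0, 1, 3},
    "bias = up_sep": {2, 4},
    "if (bias > down_sep)": {0, 1, 2, 3, 4},
    "return 1": {2},
    "return 0": {0, 1, 3, 4},
}
PASSED = [True, False, True, False, True]  # outcome of each test

def suspiciousness(coverage, passed):
    """Score(s) as defined above; assumes allfail > 0 and allpass > 0."""
    allfail = sum(1 for ok in passed if not ok)
    allpass = sum(1 for ok in passed if ok)
    scores = {}
    for stmt, tests in coverage.items():
        fail_ratio = sum(1 for t in tests if not passed[t]) / allfail
        pass_ratio = sum(1 for t in tests if passed[t]) / allpass
        total = fail_ratio + pass_ratio
        scores[stmt] = fail_ratio / total if total > 0 else 0.0
    return scores

SCORES = suspiciousness(COVERAGE, PASSED)
for stmt, score in sorted(SCORES.items(), key=lambda kv: -kv[1]):
    print(f"{score:.2f}  {stmt}")
```

The statement executed by both failing tests and only one passing test ranks first, so it is the one handed to the repair step.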
[Figure: repair loop; labels: Buggy Program, Test Suite, "Investigate what this statement should be", "Generate a fixed statement", Fixed Program, YES / NO]

Symbolic Execution (Inset)
int test_me(int Climb, int Up) {
  int sep, upward;
  if (Climb > 0) {
    sep = Up;
  } else {
    sep = add100(Up);
  }
  if (sep > 150) {
    upward = 1;
  } else {
    upward = 0;
  }
  if (upward < 0) {
    abort();
  } else
    return upward;
}
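A sketch of what symbolic execution computes for `test_me`: one path condition per syntactic path, plus a feasibility check. It assumes that `add100(Up)` returns `Up + 100` (the slide does not define it) and uses a bounded search as a stand-in for a constraint solver.

```python
from itertools import product

def add100(up):
    # Assumption: add100 is not defined on the slide; taken to return up + 100.
    return up + 100

def path_conditions():
    """Yield (outcome, path condition) for every syntactic path of test_me."""
    for climb_pos, sep_big, upward_neg in product([True, False], repeat=3):
        def pc(climb, up, climb_pos=climb_pos, sep_big=sep_big, upward_neg=upward_neg):
            sep = up if climb_pos else add100(up)  # value of sep on this path
            upward = 1 if sep_big else 0           # value of upward on this path
            return ((climb > 0) == climb_pos
                    and (sep > 150) == sep_big
                    and (upward < 0) == upward_neg)
        outcome = "abort" if upward_neg else f"return {1 if sep_big else 0}"
        yield outcome, pc

def find_witness(pc, ups=range(-300, 301)):
    """Bounded stand-in for a constraint solver: search a small input domain."""
    for climb, up in product((0, 1), ups):
        if pc(climb, up):
            return climb, up
    return None

FEASIBLE = []
for outcome, pc in path_conditions():
    witness = find_witness(pc)
    if witness is not None:
        FEASIBLE.append((outcome, witness))
        print(outcome, "reached by (Climb, Up) =", witness)
```

Within the searched domain, the four paths that end in `abort` have no witness, since `upward` is always 0 or 1 by the time it is compared with 0.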

Example
• Accumulated constraints
– f(1,11,110) > 110 ∧
– f(1,0,100) ≤ 100 ∧
– …
• Find an f satisfying this constraint
– By fixing the set of operators appearing in f
• Candidate methods
– Search over the space of expressions
– Program synthesis with a fixed set of operators
– Can also be achieved by second-order constraint solving
• Generated fix
– f(inhibit, up_sep, down_sep) = up_sep + 100
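A sketch of the "search over the space of expressions" option, using only the two constraints shown above. The expression space (a constant, or one variable plus a constant in a bounded range) is an assumption made for illustration.

```python
# Each constraint: (test input, predicate on the value f must return for it).
CONSTRAINTS = [
    ((1, 11, 110), lambda v: v > 110),
    ((1, 0, 100), lambda v: v <= 100),
]
VARS = ["inhibit", "up_sep", "down_sep"]

def candidates(consts=range(-200, 201)):
    """Hypothetical expression space: `c` and `var + c` for bounded constants c."""
    for c in consts:
        yield str(c), (lambda inhibit, up_sep, down_sep, c=c: c)
    for idx, name in enumerate(VARS):
        for c in consts:
            text = name if c == 0 else f"{name} {'+' if c > 0 else '-'} {abs(c)}"
            yield text, (lambda *args, idx=idx, c=c: args[idx] + c)

SOLUTIONS = [text for text, f in candidates()
             if all(check(f(*inp)) for inp, check in CONSTRAINTS)]
print(SOLUTIONS)
```

Within this space the two constraints already pin down a single expression, the one listed as the generated fix.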

Second-order Reasoning
• Two approaches
– Get properties of the function f via symbolic execution, and synthesize a function f satisfying these properties.
– Directly solve for the function f by building a second-order symbolic execution engine.
• Allow for existentially quantified second-order variables.
• Restrict their interpretation to a language, e.g. linear integer arithmetic
Term = Var | Constant | Term + Term | Term – Term | Constant * Term
• Example SAT
– ρ(0) > 0 ∧ ρ(1) ≤ 0
– Satisfying solution ρ = λx. 1 – x
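A sketch of solving the example for a function-valued unknown by enumerating terms of the restricted language. The depth bound (one operator) and the constants {0, 1} are assumptions; a second-order symbolic execution engine would hand this search to a solver instead of enumerating.

```python
import itertools

# Atoms of the term language: the variable x and the constants 0 and 1.
ATOMS = [("x", lambda x: x), ("0", lambda x: 0), ("1", lambda x: 1)]
CONSTANTS = [0, 1]

def terms_up_to_depth_one():
    """Enumerate Var | Constant | Term + Term | Term - Term | Constant * Term."""
    yield from ATOMS
    for (n1, f1), (n2, f2) in itertools.product(ATOMS, repeat=2):
        yield f"{n1} + {n2}", (lambda x, f1=f1, f2=f2: f1(x) + f2(x))
        yield f"{n1} - {n2}", (lambda x, f1=f1, f2=f2: f1(x) - f2(x))
    for c in CONSTANTS:
        for name, f in ATOMS:
            yield f"{c} * {name}", (lambda x, c=c, f=f: c * f(x))

def satisfies(f):
    # The example constraint: the unknown maps 0 above zero and 1 to at most zero.
    return f(0) > 0 and f(1) <= 0

SOLUTIONS = [text for text, f in terms_up_to_depth_one() if satisfies(f)]
print(SOLUTIONS)
```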

First order vs.
Second order

Combat Over-fitting: Symbolic Inference
[Figure labels: Tests with oracles, Buggy Program, Symbolic Formulae, Program Repair, Patched Program, TCAS]

Repair
Workflow

Simplified Workflow, but
Applicability
Over-fitting
Scalability
[DirectFix, ICSE15]

Workflows
Applicability
Over-fitting
Scalability

Repair Constraint
• SemFix work (ICSE 2013)
– Example: for an identified expression e to be fixed
• [ X > 0 ] ∧ f(t) == X for each test t
• DirectFix work (ICSE 2015)
– Whole Program as repair constraint
– Use the principle of minimality to synthesize a minimal patch.
• Angelix work (ICSE 2016)
– Example: for identified expressions e1, e2, … to be fixed
– [ (X == 1) ∨ (X == 2) ∨ (X == 3) ] ∧ f(t) == X for each test t.
– [ (X == 1 ∧ Y == 1) ∨ (X == 2 ∧ Y == 2) ] ∧ f(t) == X ∧ g(t) == Y for each test t.

PATCH
QUALITY

(Test-based) Program Repair
Syntax-based Schematic
for e in Search-space {
  Validate e against Tests
}
Semantic Schematic
for t in Tests {
  generate repair constraint Ψt
}
Synthesize e from ∧t Ψt

Middle Way
中道
Madhyamāpratipada

Test-equivalence
scanf("%d", &x);
for (i = 0; i < 10; i++)
  if (x - i > 0)
    printf("1");
  else
    printf("0");
Consider all inequalities 𝛼𝑥 ± 𝛽𝑖 [> ≥ = ≠] 𝛾
Sequence of values              Equivalence class (x = 4)
{T, T, T, T, T, T, T, T, T, T}  {x > 0, x > 1, …}
{T, T, T, T, T, T, T, T, T, F}  {x - i > -5, …}
{T, T, T, T, T, T, T, T, F, T}  EMPTY
{T, T, T, T, T, T, T, T, F, F}  {x - i > -4, …}
{T, T, T, T, T, T, T, F, T, T}  EMPTY
{T, T, T, T, T, T, T, F, T, F}  EMPTY
{T, T, T, T, T, T, T, F, F, T}  EMPTY
…
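A sketch of the partitioning idea for the test x = 4: candidate conditions that produce the same sequence of branch outcomes fall into one class, so a single execution can validate the whole class. The candidate list is a small hypothetical sample of the inequality space.

```python
from collections import defaultdict

# A few candidate replacements for the condition `x - i > 0` (illustrative sample).
CANDIDATES = {
    "x > 0": lambda x, i: x > 0,
    "x > 1": lambda x, i: x > 1,
    "x - i > -5": lambda x, i: x - i > -5,
    "x - i > -4": lambda x, i: x - i > -4,
    "x - i > 0": lambda x, i: x - i > 0,
    "x - i >= 1": lambda x, i: x - i >= 1,
}

def trace(cond, x):
    """Sequence of branch outcomes over the ten loop iterations."""
    return tuple(cond(x, i) for i in range(10))

def partition(candidates, x):
    classes = defaultdict(list)
    for text, cond in candidates.items():
        classes[trace(cond, x)].append(text)
    return dict(classes)

CLASSES = partition(CANDIDATES, x=4)
for seq, members in CLASSES.items():
    print("".join("T" if v else "F" for v in seq), members)
```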

Repair
Efficiency
[TOSEM18]

Combat over-fitting: Fuzz Testing
Distinguish crashing and crash-free patches (practical)
Crashing patches may (1) partially fix the crash or (2) unexpectedly introduce a new crash
[Figure: search space of patches; labels: Crashing patches, Crash-free patches, Correct patches]
[Figure: workflow; labels: Buggy program, Test generation (auto-generate tests), Test cases, Repair, Patched program]

Fix2Fit
char* strncpy(char* s, char* t, int n) {
  for (int i = 0; i < n; i++) // buffer overflow or data leakage
    t[i] = s[i];
  return t;
}
Copy the first n characters of s to t.
ID  Plausible patch
p1  i < n && i != 3
p2  i < 5
p3  i < n && i < strlen(s)
[Figure: the patch set {p1, p2, p3} is split into {p1, p3} and {p2} by the test s="foo", n=5; mutating the test to s="fo", n=5 splits {p1, p3} into {p1} and {p3}. Labels: correct patch, crashing patch.]
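A sketch of how generated tests can filter crashing patches in this example. It relies on a simplified memory model that is an assumption, not something the slide states: reading `s[i]` counts as a crash whenever `i >= len(s)`, and the terminating NUL byte is ignored.

```python
# The three plausible patches, as replacements for the loop condition.
PATCHES = {
    "p1": lambda i, n, s: i < n and i != 3,
    "p2": lambda i, n, s: i < 5,
    "p3": lambda i, n, s: i < n and i < len(s),
}

def crashes(cond, s, n):
    """Run the copy loop under the modeled memory: reading past len(s) crashes."""
    i = 0
    while cond(i, n, s):
        if i >= len(s):
            return True  # out-of-bounds read of s (simplified model)
        i += 1
    return False

def surviving(patches, tests):
    """Keep only the patches that are crash-free on every test seen so far."""
    alive = dict(patches)
    history = []
    for s, n in tests:
        alive = {name: c for name, c in alive.items() if not crashes(c, s, n)}
        history.append(sorted(alive))
    return history

HISTORY = surviving(PATCHES, [("foo", 5), ("fo", 5)])
print(HISTORY)
```

Under this model the first test discards p2 and the mutated test discards p1, leaving only p3.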

Fix2Fit
Integration of repair into programming environments?
Number of plausible patches that can be reduced if the tests are
empowered with more oracles

Applications of Repair
• Repair of security vulnerability
• Repair of embedded software
• Repair as feedback for programming education
– Automated grading
– Feedback to students for making progress.

Application: Security
“The C and C++ programming languages are notoriously insecure yet remain indispensable. Developers
therefore resort to a multi-pronged approach to find security issues before adversaries. These include
manual, static, and dynamic program analysis. Dynamic bug finding tools or "sanitizers" --- can find bugs
that elude other types of analysis because they observe the actual execution of a program, and can
therefore directly observe incorrect program behavior as it happens.” Song et al 2018.
Time to Fix
Number of vulnerabilities in 2019
overall number of new vulnerabilities: (20,362)

Combat Overfitting: Constraint Extraction
[Figure labels: Buggy program, Constraints, Repair, Patched program]
• Program vulnerability can be formalized as violations of constraints, e.g. buffer overflow
access(buffer) < base(buffer) + size(buffer)
char getValue(char[] arr, int index) {
  int len = size(arr);
  if (index <= len) // error location
    return arr[index];
  return 0;
}
failing input: arr={1, 2, 3}, index=3
Concrete buggy state: arr[3]
Abstracted constraint violation: index >= len
Additional specifications to fix the bug for all tests
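A sketch of why an extracted constraint generalizes beyond the single failing test. Three hypothetical candidate guards are checked twice: against the failing input only, and against the crash-free constraint `index < len` over a bounded domain (the bounded search stands in for a solver query).

```python
from itertools import product

def crash_free(index, length):
    # Crash-free constraint at `arr[index]` (upper bound only, as in the example).
    return index < length

# Hypothetical candidate guards for the `if` at the error location.
GUARDS = {
    "index <= len": lambda index, length: index <= length,  # the buggy guard
    "index < len": lambda index, length: index < length,
    "index != len": lambda index, length: index != length,
}

def passes_failing_test(guard):
    # Failing input: arr = {1, 2, 3}, index = 3, hence len = 3.
    index, length = 3, 3
    return not guard(index, length) or crash_free(index, length)

def counterexample(guard, bound=8):
    """Find (index, len) where the guard admits an access that violates the constraint."""
    for index, length in product(range(bound), repeat=2):
        if guard(index, length) and not crash_free(index, length):
            return index, length
    return None

for text, guard in GUARDS.items():
    print(text, passes_failing_test(guard), counterexample(guard))
```

The guard `index != len` avoids the crash on the one failing input but still admits out-of-bounds accesses, so only the constraint check rejects it.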

Constraint Propagation (ExtractFix)
[Figure: 𝜑’ at the fix location, program fragment {P}, 𝜑 at the crashing location]
• Propagate the crash-free constraint 𝜑 from the crash location to the fix location by calculating the weakest precondition
[e ⟼ e’]𝜑’ {P} 𝜑
• The goal of repair is to ensure 𝜑’ is satisfied at the fix location.
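A sketch of weakest-precondition propagation for a straight-line fragment P made of assignments. The fragment, the variable names (`k`, `off`, `idx`, `size`) and the constraint are hypothetical; predicates are represented semantically, as Python functions over a program state.

```python
from itertools import product

def assign(var, expr):
    """The statement `var = expr(state)` as a state transformer."""
    return lambda state: {**state, var: expr(state)}

def wp(stmts, post):
    """Weakest precondition of a sequence of assignments with respect to `post`."""
    pre = post
    for stmt in reversed(stmts):
        # wp(x = e, Q) holds in a state iff Q holds after executing the assignment.
        pre = (lambda state, stmt=stmt, nxt=pre: nxt(stmt(state)))
    return pre

# Hypothetical fragment P between the fix location and the crash location.
P = [
    assign("off", lambda s: 2 * s["k"]),
    assign("idx", lambda s: s["index"] + s["off"]),
]

# Crash-free constraint at the crash location.
phi = lambda s: s["idx"] < s["size"]

# Propagated constraint at the fix location.
phi_prime = wp(P, phi)

print(phi_prime({"index": 1, "k": 1, "size": 4}))  # 1 + 2*1 = 3 < 4
print(phi_prime({"index": 2, "k": 1, "size": 4}))  # 2 + 2*1 = 4 is not < 4
```

Here the propagated constraint is equivalent to `index + 2 * k < size`, expressed only over variables available at the fix location.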

Effectiveness
Number of fixed vulnerabilities out of 30 subjects

Applications: Embedded SW
From Tests?
From Programs?

Analyzing
Linux Busybox
[ICSE18]

Other Applications: Education
[Figure labels: Education, Productivity, Security]
Intelligent tutoring systems: automated grading and hint generation via Program Repair
Detailed study at IIT Kanpur, India [FSE17, and ongoing]

Repair in steps

Example: write a Python program which
* Given a sorted sequence seq
* Counts the number of elements smaller than x
Reference Solution
def search(x, seq):
    for i in range(len(seq)):
        if x <= seq[i]:
            return i
    return len(seq)
Incorrect Student Program
def search(e, lst):
    for j in range(len(lst)):
        if e < lst[j]:
            return j
        else:
            j = j + 1
    return len(lst) + 1
Refactored Correct Solution
def search(x, seq):
    for i in range(len(seq)):
        if x <= seq[i]:
            return i
        else:
            pass
    return len(seq)
Repair
def search(e, lst):
    for j in range(len(lst)):
        if e <= lst[j]:
            return j
        else:
            pass
    return len(lst)
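A sketch of validating the repaired student program against the reference solution. The function names are renamed here so all three versions can coexist, and the exhaustive comparison over short sorted lists is an illustrative choice, not the evaluation used in the study.

```python
from itertools import combinations_with_replacement

def search_reference(x, seq):
    for i in range(len(seq)):
        if x <= seq[i]:
            return i
    return len(seq)

def search_student(e, lst):
    for j in range(len(lst)):
        if e < lst[j]:
            return j
        else:
            j = j + 1
    return len(lst) + 1

def search_repaired(e, lst):
    for j in range(len(lst)):
        if e <= lst[j]:
            return j
        else:
            pass
    return len(lst)

def disagreements(candidate, values=range(0, 4), max_len=3):
    """Inputs (x, sorted list) on which the candidate differs from the reference."""
    found = []
    for length in range(max_len + 1):
        for seq in combinations_with_replacement(values, length):  # sorted tuples
            for x in values:
                if candidate(x, list(seq)) != search_reference(x, list(seq)):
                    found.append((x, list(seq)))
    return found

print("student:", len(disagreements(search_student)), "disagreements")
print("repaired:", len(disagreements(search_repaired)), "disagreements")
```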

Most Relevant Results
• Semantic Program Repair Using a Reference Implementation. ICSE 2018.
• Angelix: Scalable Multiline Program Patch Synthesis via Symbolic Analysis. ICSE 2016.
• DirectFix: Looking for Simple Program Repairs. ICSE 2015.
• SemFix: Program Repair via Semantic Analysis. ICSE 2013.
• Symbolic execution with second order existential constraints. ESEC-FSE 2018.
• Crash-Avoiding Program Repair. ISSTA 2019.
• A Feasibility Study of Using Automated Program Repair for Introductory Programming Assignments. ESEC-FSE 2017.
http://www.comp.nus.edu.sg/~tsunami/ https://www.comp.nus.edu.sg/~nsoe-tss/

Perspective
Automated Program Repair
C. Le Goues, M. Pradel, A. Roychoudhury
Review Article, Communications of the ACM, 2019.
abhik@comp.nus.edu.sg
https://www.comp.nus.edu.sg/~abhik
