P106 rajagopalan-read
- 1. Profile-Directed Optimization of Event-Based Programs
Mohan Rajagopalan Saumya K. Debray
Department of Computer Science
University of Arizona
Tucson, AZ 85721, USA
{mohan, debray}@cs.arizona.edu
Matti A. Hiltunen Richard D. Schlichting
AT&T Labs-Research
180 Park Avenue
Florham Park, NJ 07932, USA
{hiltunen, rick}@research.att.com
2010.05.17
ABSTRACT
Events are used as a fundamental abstraction in programs ranging from graphical user interfaces (GUIs) to systems for building customized network protocols. While providing a flexible structuring and execution paradigm, events have the potentially serious drawback of extra execution overhead due to the indirection between modules that raise events and those that handle them. This paper describes an approach to addressing this issue using static optimization techniques. This approach, which exploits the underlying predictability often exhibited by event-based programs, is based on first profiling the program to identify commonly occurring event sequences. A variety of techniques that use the resulting profile information are then applied to the program to reduce the overheads associated with such mechanisms as indirect function calls and ar-
structure user interaction code in GUI systems [8, 18], form the basis for configurability in systems to build customized distributed services and network protocols [4, 9, 16], are the paradigm used for asynchronous notification in distributed object systems [19], and are advocated as an alternative to threads in web servers and other types of system code [20, 23]. Even operating system kernels can be viewed as event-based systems, with the occurrence of interrupts and system calls being events that drive execution.
The rationale behind using events is multifaceted. Events are asynchronous, which is a natural match for the reactive execution behavior of GUIs and operating systems. Events also allow the modules raising events to be decoupled from those fielding the events, thereby improving configurability. In short, event-based programming is generally more flexible and can often be used to realize richer execution semantics than traditional procedural or
- 4. Components
bound to more than one event. An event is ignored if no handlers are bound to the event. The execution order of multiple handlers bound to the same event may be important. Bindings may be static, i.e., remain the same throughout the execution of the program, or dynamic, i.e., may change at runtime. Figure 1 illustrates bindings.
Figure 1: Event bindings (Events A to D mapped to Handlers 1 to 5)
Bindings are maintained in a registry that maps each event to a list of handlers. The registry may be implemented as a shared
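The registry idea above can be illustrated with a minimal Python sketch. The class name `EventRegistry` and the methods `bind`, `unbind`, and `raise_event` are hypothetical names chosen for illustration; they are not the API of Cactus, X, or any system described in the paper. The sketch assumes synchronous activation and runs handlers in binding order.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Handler = Callable[..., Any]


class EventRegistry:
    """Maps each event name to an ordered list of handlers (hypothetical API)."""

    def __init__(self) -> None:
        self._bindings: DefaultDict[str, List[Handler]] = defaultdict(list)

    def bind(self, event: str, handler: Handler) -> None:
        # Dynamic binding: handlers may be added at any point during execution.
        self._bindings[event].append(handler)

    def unbind(self, event: str, handler: Handler) -> None:
        # Removes the first occurrence of the handler for this event.
        self._bindings[event].remove(handler)

    def raise_event(self, event: str, *args: Any) -> int:
        # Synchronous activation: each bound handler runs to completion,
        # in binding order. An event with no handlers is simply ignored.
        handlers = list(self._bindings.get(event, ()))
        for handler in handlers:
            handler(*args)  # indirect call through the registry
        return len(handlers)


log = []
registry = EventRegistry()
registry.bind("A", lambda msg: log.append(("h1", msg)))
registry.bind("A", lambda msg: log.append(("h2", msg)))
registry.raise_event("A", "hello")    # both handlers run, in order
registry.raise_event("B", "ignored")  # no handlers bound: nothing happens
```

Every activation goes through a lookup plus one indirect call per handler, which is the overhead the paper's optimizations target.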
- 5. Examples
consists of two or more event handlers. Events in Cactus are user-defined. A typical composite protocol uses 10-20 different events consisting of a few external events caused by interactions with software outside the composite protocol and numerous internal events used to structure the internal processing of a message or service request. Each event typically has multiple event handlers. As a result, Cactus composite protocols often have long chains of events and event handlers activated by one event. Section 4 gives concrete examples of events used in a Cactus composite protocol.
The Cactus runtime system provides a variety of operations for managing events and event handlers. In particular, operations are provided for binding an event handler to a specified event (bind) and for activating an event (raise). Event handler binding is completely dynamic. Events can be raised either synchronously or asynchronously, and an event can also be raised with a specified delay to implement time-driven execution. The order of event handler execution can be specified if desired. Arguments can be passed to handlers in both the bind and raise operations. Other operations are available for unbinding handlers, creating and deleting events, halting event execution, and canceling a delayed event. Handler execution is atomic with respect to concurrency, i.e., a handler is executed to completion before any other handler is started unless it voluntarily yields the CPU. Cactus does not directly support complex events, but such events can be implemented by defining a new event and having a micro-protocol raise this event when the conditions for the complex event are satisfied.
Figure 2: Cactus composite protocol (Top API; micro-protocols DESPrivacy, KeyedMD5Integrity, RSAAuthenticity, ClientKeyDistribution, ...; events msgFromAbove, msgFromBelow, openSession, keyMiss, ...; Bottom API)
The X Window system. X is a popular GUI framework for Unix systems. The standard architecture of an X based system is shown in figure 3. The X server is a program that runs on each system supporting a graphics display and is responsible for managing device drivers. Application programs, also called X clients, may be local or remote to the display system. X servers and X clients use the X-protocol for communication. X clients are typically built on the Xlib libraries using toolkits such as Xt, GTK, or Qt. X clients are implemented as a collection of widgets, which are the basic building blocks of X applications.
Figure 3: Architecture of X Window systems (X-Client Application, Toolkit (Xt, Qt), XLib, X-Protocol, X-Server, Device Drivers, Devices)
An X event is defined as “a packet of data sent by the server to the client in response to user behavior or to window system changes resulting from interactions between windows” [18]. Examples of X events include mouse motion, focus change, and button press. These events are recognized through device drivers and relayed to the X server, which in turn conveys them to X clients. The framework specifies 33 basic events. X clients may choose to respond to any of these based on event masks that are specified at bind time. Events are also used for communication between widgets. Events can arrive in any order and are queued by the X client. Event activation in X is similar to synchronous activation in the general model.
The X architecture has three mechanisms for handling events: event handlers, callback functions, and action procedures. All three map to handlers in the general model and are used to specify different granularities of control. Event handlers, the most primitive, are simply procedures bound to event names. Callback functions and action procedures are more commonly used high-level abstractions ... procedure.
In addition to these three, X has a number of other mechanisms that can be broadly classified as event handling, namely timeouts, signal handlers, and input handlers. Each of these mechanisms allows the program to specify a procedure to be called when a given condition occurs. For all these handler types, X provides operations for registering the handlers and activating them.
... bound to the event at that time results in a correct transformation. Similarly, it is easy to see that sequences of nested synchronous activations can be readily optimized. The specific optimization techniques and their limitations are discussed below in section 3.
3. OPTIMIZATION APPROACH
Compiler optimizations are based on being able to statically predict aspects of a program’s runtime behavior using either invariants that always hold at runtime (i.e., based on dataflow analysis) or assertions that are likely to hold (i.e., based on execution profiles). Event-based systems, in contrast, are largely unpredictable in their runtime behavior due to uncertainties associated with the be-
- 7. Event Profiling
EventGraph = ∅;
prev_event = eventTrace.firstEvent;
while not (end of eventTrace)
    event = eventTrace.nextEvent;
    if (prev_event, event) not in EventGraph
        EventGraph += (prev_event, event);
        EventGraph(prev_event, event).weight = 1;
    else
        EventGraph(prev_event, event).weight++;
    prev_event = event;
Figure 4: GraphBuilder algorithm.
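As a runnable illustration of the GraphBuilder pseudocode in Figure 4, the following Python sketch builds the weighted edge set from an event trace. The function name `build_event_graph` and the representation of the graph as a dictionary keyed by `(prev_event, event)` pairs are assumptions made here, and the sample trace is invented for illustration rather than taken from the paper's profiles.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def build_event_graph(event_trace: Iterable[str]) -> Dict[Tuple[str, str], int]:
    """Return edge weights: how often `event` immediately follows `prev_event`."""
    graph: Counter = Counter()
    trace = iter(event_trace)
    try:
        prev_event = next(trace)  # eventTrace.firstEvent
    except StopIteration:
        return {}                 # empty trace: empty graph
    for event in trace:           # eventTrace.nextEvent until end of trace
        # A missing edge starts at 0, so the first increment sets weight 1,
        # matching the if/else in the pseudocode.
        graph[(prev_event, event)] += 1
        prev_event = event
    return dict(graph)


# Invented trace, using event names that appear in the video player graph.
trace = ["SegFromUser", "Seg2Net", "SegFromUser", "Seg2Net", "Adapt"]
graph = build_event_graph(trace)
```

Each edge weight is the number of times one event directly followed another in the trace, which is what lets frequently occurring sequences be identified later.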
havior of their external environment, e.g., the user’s actions. We
- 8. Sample
Figure 5: Event graph generated from video player. Nodes: ControllerFired, MsgFrmUserH, MsgFrmUserL, SendMsg, ControllerFiring, SegFromUser, SegmentSent, Adapt, SegmentTimeout, Seg2Net, ControllerClkL, ControllerClkH, SegmentAcked, Controller, AddSysInput, Open, ResizeFragment. Edges carry weights and are marked as either synchronously activated or asynchronously activated events.
- 9. Graph Optimizations(1)
Handler Merging
Figure 6: Reduced event graph (Threshold = 300). Remaining nodes: MsgFromUserL, MsgFromUserH, SegFromUser, ControllerFired, Seg2Net, ControllerFiring, Controller, Adapt, ControllerClkH, ControllerClkL.
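A minimal sketch of how a reduced event graph might be obtained from a weighted one follows. It assumes that reduction simply keeps edges whose weight is at least the threshold; the slide gives only "Threshold = 300", so the exact rule is an assumption. The function name `reduce_event_graph` is hypothetical, and the edge endpoints in the sample data are placeholders, not edges read from Figure 5.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]


def reduce_event_graph(graph: Dict[Edge, int], threshold: int) -> Dict[Edge, int]:
    """Keep only edges whose weight is at least `threshold` (assumed rule)."""
    return {edge: weight for edge, weight in graph.items() if weight >= threshold}


# Placeholder graph: node names and pairings are illustrative only.
graph = {("A", "B"): 392, ("B", "C"): 552, ("C", "D"): 42, ("D", "A"): 1}
reduced = reduce_event_graph(graph, threshold=300)
```

The surviving edges are the frequently taken transitions, which are the candidates for the merging and subsumption optimizations.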
- 10. Graph Optimizations(2)
Event Chains and Subsumption
Figure 7: Handler merging. Before merging, Event A is bound to Handler1 { H1_code }, Handler2 { H2_code }, and Handler3 { H3_code }. After merging, Event A is bound to a single Handler123 { H1_code H2_code H3_code }.
Figure 8: Subsuming events (event graph and handler graph views; handler labels shown include FEC, SFU1, SFU2, SeqSeg, SFU, TD, TDriver, S2N, PAU, WFC)
... from within a handler for SegFromUser, the latter will wait until the handling of Seg2Net has been completed, at which point control will return to the handler for SegFromUser. In this case, the handler for Seg2Net can be subsumed into that for SegFromUser, thereby eliminating the synchronous raise event between them.
... the event graph for a video player application im- ... of a configurable transport protocol called CTP
... translates into a sequence of indirect function calls. There are two
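Handler merging as pictured in Figure 7 can be sketched in Python as follows. This is an illustrative model, not the authors' implementation: the names `bindings`, `raise_generic`, `handler123`, and `raise_a_optimized` are hypothetical, and the guard that falls back to the generic path when the bindings have changed is an assumption about how a merged handler could stay correct under dynamic binding.

```python
from typing import Callable, Dict, List

Handler = Callable[[List[str]], None]


def h1(out: List[str]) -> None:
    out.append("H1_code")


def h2(out: List[str]) -> None:
    out.append("H2_code")


def h3(out: List[str]) -> None:
    out.append("H3_code")


bindings: Dict[str, List[Handler]] = {"A": [h1, h2, h3]}


def raise_generic(event: str, out: List[str]) -> None:
    # Unoptimized path: registry lookup plus one indirect call per handler.
    for handler in list(bindings.get(event, [])):
        handler(out)


# Bindings observed when the merged handler was generated (assumed snapshot).
PROFILED_A = (h1, h2, h3)


def handler123(out: List[str]) -> None:
    # Merged body: the three handler bodies inlined in binding order.
    out.append("H1_code")
    out.append("H2_code")
    out.append("H3_code")


def raise_a_optimized(out: List[str]) -> None:
    if tuple(bindings.get("A", [])) != PROFILED_A:
        raise_generic("A", out)  # bindings changed: use the original code
    else:
        handler123(out)          # bindings unchanged: use the merged code


fast: List[str] = []
raise_a_optimized(fast)   # takes the merged path

bindings["A"] = [h3, h1]  # a dynamic rebinding at runtime
slow: List[str] = []
raise_a_optimized(slow)   # falls back to the generic path
```

The merged path replaces three indirect calls with one direct call, and also exposes the combined body to ordinary compiler optimizations in a compiled setting.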
- 12. Experiment Results

Figure 10: Video player optimization results.
Frame   Total Execution Time (sec)         Event Handler Time (sec)
rate    Orig.   Opt.   Opt./Orig. (%)      Orig.   Opt.   Opt./Orig. (%)
10      43.1    41.9   97.2                2.3     0.9    39.1
15      30.9    30.3   98.0                1.6     0.6    37.5
20      24.5    22.1   90.2                1.5     0.5    33.3
25      23.9    21.3   89.1                1.5     0.5    33.3
Key: Orig: Original program; Opt: Optimized program

Figure 11: Event processing times in the video player.
Event          Event Processing Time (μsec)
               Original   Optimized   Speedup (%)
Adapt          55         11          80.0
SegFromUser    346        41          88.2
Seg2Net        137        37          73.0

... optimization on overall execution time becomes more pronounced as the frame rate increases is that when the frame rate is low, the CPU is idle a large part of the time. As a result, the unoptimized program can simply use a bit more of the idle time to keep up with the required frame rate. However, when the frame rate increases, both programs must do more work in a time unit and the idle time decreases. When the frame rate becomes high enough, the unoptimized program runs out of extra idle time and starts falling behind the optimized program. This indicates that our optimizations are especially effective for mobile systems such as handheld PDAs that tend to have less powerful processors than desktop systems.

SecComm is a configurable secure communication service that allows customization of security attributes for a communication connection, including privacy, authenticity, integrity, and non-repudiation. One of the features of SecComm is its support for implementing a security property using combinations of basic security micro-protocols. We optimized a configuration of SecComm with three micro-protocols, two of which encrypt the message body

Figure 12: Impact of optimization in SecComm
Size    Push time (μsec)                   Pop time (μsec)
        Orig.   Opt.   Opt./Orig. (%)      Orig.   Opt.   Opt./Orig. (%)
64      274     241    88.0                397     378    95.2
128     287     263    91.6                460     448    97.4
256     304     273    89.8                484     457    94.4
512     336     299    89.0                494     470    95.1
1024    430     373    86.7                608     570    93.8
2048    572     552    96.5                1016    893    87.9

... that the time for the push portion is reduced markedly in most cases, with improvements of up to 13.3%. The improvements in the pop portion are also noticeable although not as high as for the push portion, typically around 5% but going as high as 12%.

An examination of the effects of our optimizations on these two programs indicates two main sources of benefits: the reduction of argument marshaling overhead when invoking event handlers, and handler merging that leads to a reduction in the number of handler invocations. The elimination of marshaling overhead seems to have the largest effect on the overall performance improvements achieved. The main effect of handler merging is to reduce the number of function calls between handlers that are executed in sequence. Merging also creates opportunities for additional code improvements due to standard compiler optimizations.

Code in event handlers is usually a small fraction of the total program size. To measure the effects of our optimization on code size, we counted the number of instructions in the original and optimized programs using the command objdump -d program | wc -l. Our optimizations produce a code size increase of 1.3% for the video player and 1.1% for SecComm.

Figure 13: Optimization of X events
Event    Execution Time (μsec)
Type     Orig.   Opt.   Opt./Orig. (%)
Scroll   158     148    93.7
Popup    37      31     83.8

... about 6% and that of Popup by over 16%. These techniques were applied here at the level of action handlers, although it would be possible to optimize one step further by opening up callbacks in the same way.

These optimizations were performed using the Athena widget family based on Xt and Xlib provided with XFree86. The Athena toolkit is a minimal toolkit with limited configurability, and therefore provided limited scope for applying our optimizations. The event model in more recent (and popular) toolkits such as Gnome

if event binding for event B changed
    call(original code for event B);
else
    merged, inlined, and optimized code for event B;
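The percentage columns in the result tables can be cross-checked with a few lines of arithmetic. The sketch below assumes, based on the reported numbers, that the (%) columns in Figures 10, 12, and 13 give the optimized time as a percentage of the original, while the Speedup column in Figure 11 gives the percentage reduction. The helper names `ratio_percent` and `reduction_percent` are hypothetical.

```python
def ratio_percent(orig: float, opt: float) -> float:
    """Optimized time as a percentage of the original time."""
    return round(100.0 * opt / orig, 1)


def reduction_percent(orig: float, opt: float) -> float:
    """Percentage by which the optimized time is lower than the original."""
    return round(100.0 * (orig - opt) / orig, 1)


# Figure 10, frame rate 10: total execution time 43.1 s -> 41.9 s.
video_total = ratio_percent(43.1, 41.9)

# Figure 11, Adapt: 55 -> 11, reported as an 80.0% speedup.
adapt_speedup = reduction_percent(55, 11)

# Figure 12, size 1024: push time 430 -> 373, the "up to 13.3%" improvement.
push_ratio = ratio_percent(430, 373)
push_improvement = reduction_percent(430, 373)
```

Under these assumptions the computed values agree with the table entries quoted in the comments, and the two conventions are complementary: a ratio of 86.7% corresponds to a 13.3% reduction.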