Embedded Recipes 2018 - Finding sources of Latency In your system - Steven Rostedt

© 2016 VMware Inc. All rights
reserved.
Finding sources of Latency
In your system (via tracing tools)
Steven Rostedt
9/24/2018

What is Latency?
“Latency is a time interval between the stimulation and
response, or, from a more general point of view, a time
delay between the cause and the effect of some physical
change in the system being observed.” - Wikipedia

What is Latency?
● The time between
  – When something needs to be done
  – When that something actually gets done
● The time between
  – When something happens
  – When it is seen

There is ALWAYS Latency
● Nothing happens instantaneously
● There’s always going to be some “lag”
● Real Time systems avoid “Unbounded Latency”
  – All latency must be bounded by a worst-case scenario
  – Any latency that is indeterminate can cause problems
● Non Real-Time systems still care about latency

Why do we care about Latency?
● For Real Time tasks, it is critical
  – Need to know the worst-case latency
  – Guarantee that the task can achieve its objective
● Don’t want airplane flaps to not react quickly enough!

Why do we care about Latency?
● Practically everyone/everything cares about Latency
  – How long would you wait for ‘a’ to show up after pressing ‘a’?
  – A search result that takes 5 minutes to answer!
  – All systems are therefore “real time”!
● Needs to be bounded to some arbitrary number
● Latency adds up
  – A system’s latency is the combination of its sub-components

Where is that Latency?
● Latency adds up
  – A system’s latency is the combination of its sub-components
● It can be hard to find where your latency is
● What causes it
  – The application
  – The libraries
  – The operating system
  – The hardware

Where is that Latency?
[Diagram: the layers where latency can hide: Hardware, BIOS, Kernel, Library, Application]

Hardware Latency
● Can’t get better than what the hardware gives you
● Sources of HW latency
  – System Management Interrupts (SMI)
    ● The BIOS takes over the machine
  – Clock frequency
    ● The system slows down
  – Cache line bouncing
    ● Sharing the same variable among CPUs

Cyclictest
● A tool to measure latency
● Runs a simple loop (in a high priority task)
  – Sleep for a specified time
  – Get a timestamp when it wakes up
  – Compare the difference
● Best to use nanosleep
  – Can also use signals, but that has high latency!
● Favorite application of the Linux RT folks

Cyclictest
[Diagram: timeline of one cyclictest iteration across user space and the kernel]
user-space: start = gettimeofday()
user-space: Sleep 250us
kernel: Put task to sleep
kernel: Set interrupt timer for 250us
kernel: Timer Interrupt goes off (after 250us)
kernel: Wake up Task
kernel: Schedule Task
user-space: end = gettimeofday()
user-space: jitter = (end - start) - 250us

Cyclictest
[Diagram: the same timeline, marking the time beyond the 250us sleep as latency (jitter), made up of interrupt latency followed by wakeup latency]
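The measurement loop described above can be sketched in a few lines of C. This is only an illustration of the idea, not cyclictest's actual source: the function name `measure_max_latency_us` and its helpers are made up for this example, and it uses `clock_gettime()` with absolute deadlines on `CLOCK_MONOTONIC` rather than the `gettimeofday()` shown in the diagram.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Advance an absolute timespec by ns nanoseconds, keeping tv_nsec normalized. */
static void timespec_add_ns(struct timespec *ts, int64_t ns)
{
    ts->tv_sec += ns / NSEC_PER_SEC;
    ts->tv_nsec += ns % NSEC_PER_SEC;
    if (ts->tv_nsec >= NSEC_PER_SEC) {
        ts->tv_nsec -= NSEC_PER_SEC;
        ts->tv_sec++;
    }
}

/* Signed difference a - b in nanoseconds. */
static int64_t timespec_diff_ns(const struct timespec *a,
                                const struct timespec *b)
{
    return (int64_t)(a->tv_sec - b->tv_sec) * NSEC_PER_SEC +
           (a->tv_nsec - b->tv_nsec);
}

/*
 * Hypothetical helper: sleep until each absolute deadline, then measure
 * how late the wakeup was. Returns the largest lateness seen in
 * microseconds, or -1 on error.
 */
int64_t measure_max_latency_us(int64_t interval_us, int loops)
{
    struct timespec next, now;
    int64_t max_ns = 0;

    assert(interval_us > 0 && loops > 0);
    if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
        return -1;

    for (int i = 0; i < loops; i++) {
        int ret;

        /* The next deadline is one interval after the previous one. */
        timespec_add_ns(&next, interval_us * 1000);
        do {
            ret = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        } while (ret == EINTR);
        if (ret != 0)
            return -1;

        if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
            return -1;

        /* Lateness = actual wakeup time - requested wakeup time. */
        int64_t late_ns = timespec_diff_ns(&now, &next);
        if (late_ns > max_ns)
            max_ns = late_ns;
    }
    return max_ns / 1000;
}
```

Calling `measure_max_latency_us(250, 1000)` from a task running under a real-time scheduling policy would approximate one cyclictest thread with a 250 microsecond interval. Without a real-time priority the result mostly reflects ordinary scheduling delay.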

Cyclictest
● The options I am using here
  – “-p80” : set the starting priority to 80
  – “-i250” : set the starting interval to 250 microseconds
    ● Sets nanosleep to the next 250 microsecond interval
  – “-n” : use nanosleep and not POSIX timers
    ● POSIX timers will use signals, and they have horrible latency
  – “-a” : set the affinity of a task to each CPU
  – “-t” : one thread per available processor (or logical CPU)
  – “-q” : keep quiet while running
  – “-d0” : keep the interval the same for all threads (all at 250 microseconds)

The tools
● Measuring latency
  – cyclictest
● Using Tracing tools can help find what’s happening
  – Ftrace
    ● hwlat
    ● function and function graph tracer
    ● event tracer
    ● trace-cmd / KernelShark
  – Lockdep (lock contention)
  – Perf
    ● profiling
    ● statistics
  – Many others (but I’m focusing on Ftrace and a bit of Lockdep)

Hardware Latency Detector
● Part of Ftrace
  – CONFIG_HWLAT_TRACER (in kernel .config build file)
● cd /sys/kernel/tracing (tracefs directory)
  – mount -t tracefs nodev /sys/kernel/tracing
● echo hwlat > current_tracer
● cat trace

# cd /sys/kernel/tracing
# echo hwlat > current_tracer
# sleep 10
# cat trace
# tracer: hwlat
#
# entries-in-buffer/entries-written: 7/7 #P:4
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
<...>-824 [001] d... 32.550243: #1 inner/outer(us): 59/53 ts:1537661092.117956843
#

Hardware Latency Algorithm
● Runs a kernel thread
● Does a tight loop for a period of time
● Interrupts are disabled (during the loop)
● Moves around all CPUs
● Records the timestamp of when a latency happened
● Denotes if it happened inside the pair of back-to-back timestamps (inner) or outside it, between loop iterations (outer)

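[Flowchart of the sampling loop, written out as pseudocode]
last_t2 = 0
loop:
    t1 = time_get()
    t2 = time_get()
    if last_t2 != 0:
        diff = t1 - last_t2
        if diff > outer: outer = diff
    last_t2 = t2
    diff = t2 - t1
    if diff > inner: inner = diff
    if time < width: repeat loop
    else: exit loop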

# cat hwlat_detector/width
500000
# cat hwlat_detector/window
1000000
# cat tracing_thresh
10
● Spin for “width” microseconds
● Every “window” microseconds
● Record a trace if the diff is greater than “tracing_thresh” microseconds
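The sampling loop can be imitated in user space to get a feel for the inner/outer bookkeeping. The sketch below is not the kernel's hwlat code. The names `hwlat_sample`, `hwlat_spin` and `hwlat_exceeds` are invented for this example. User space cannot disable interrupts, so the gaps it sees include interrupts and preemption as well as hardware stalls.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Largest gaps seen during one spin period (hypothetical type). */
struct hwlat_sample {
    uint64_t inner_us; /* largest gap between t1 and t2 of one iteration */
    uint64_t outer_us; /* largest gap between the last t2 and the next t1 */
};

static uint64_t now_us(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/* Spin for width_us microseconds, tracking the largest inner and outer gaps. */
void hwlat_spin(uint64_t width_us, struct hwlat_sample *s)
{
    uint64_t start = now_us();
    uint64_t last_t2 = 0, t1, t2, diff;

    assert(s != NULL);
    s->inner_us = 0;
    s->outer_us = 0;

    do {
        t1 = now_us();
        t2 = now_us();

        if (last_t2 != 0) {
            /* Time lost between two loop iterations. */
            diff = t1 - last_t2;
            if (diff > s->outer_us)
                s->outer_us = diff;
        }
        last_t2 = t2;

        /* Time lost between the two back-to-back reads. */
        diff = t2 - t1;
        if (diff > s->inner_us)
            s->inner_us = diff;
    } while (t2 - start < width_us);
}

/* A sample is worth recording when either gap exceeds the threshold. */
int hwlat_exceeds(const struct hwlat_sample *s, uint64_t thresh_us)
{
    assert(s != NULL);
    return s->inner_us > thresh_us || s->outer_us > thresh_us;
}
```

With the default values shown above, a caller would run `hwlat_spin(500000, &s)` once per 1000000 microsecond window and report the sample when `hwlat_exceeds(&s, 10)` is true.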

Hardware Latency Detector (with NMIs)
● It checks if an NMI triggered
● NMIs are not hardware, but software controlled (in most cases)
● nmi-total - The total time in NMI during the loop
● nmi-count - The number of NMIs that were triggered

Hardware Latency Detector (With NMIs)
# echo hwlat > current_tracer
# sleep 10
# cat trace
# tracer: hwlat
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
<...>-3808 [002] dn.. 128.034081: #3 inner/outer(us): 15/17 ts:1537718585.047027453
<...>-3808 [000] d... 130.082010: #4 inner/outer(us): 17/16 ts:1537718587.095030248 nmi-total:3 nmi-count:1
<...>-3808 [003] d... 153.633627: #19 inner/outer(us): 0/18 ts:1537718610.647010083 nmi-total:4 nmi-count:1
#
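When many hwlat entries need to be compared, the fields can be pulled out of each trace line with a small parser. The following is a sketch that assumes the line layout shown in the output above. The type `hwlat_entry` and the function `parse_hwlat_line` are hypothetical names, not part of any tracing library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fields of one hwlat trace line (hypothetical type). */
struct hwlat_entry {
    unsigned inner;     /* inner latency, from "inner/outer(us): I/O" */
    unsigned outer;     /* outer latency */
    unsigned nmi_total; /* value after "nmi-total:", 0 if absent */
    unsigned nmi_count; /* value after "nmi-count:", 0 if absent */
};

/* Returns 0 on success, -1 if the line has no inner/outer field. */
int parse_hwlat_line(const char *line, struct hwlat_entry *e)
{
    const char *p;

    assert(line != NULL && e != NULL);
    memset(e, 0, sizeof(*e));

    p = strstr(line, "inner/outer(us):");
    if (p == NULL ||
        sscanf(p, "inner/outer(us): %u/%u", &e->inner, &e->outer) != 2)
        return -1;

    /* The NMI fields only appear when an NMI hit during the sample. */
    p = strstr(line, "nmi-total:");
    if (p != NULL)
        sscanf(p, "nmi-total:%u nmi-count:%u", &e->nmi_total, &e->nmi_count);

    return 0;
}
```

Feeding it the second entry of the trace above yields inner 17, outer 16, nmi-total 3 and nmi-count 1. Lines without NMI fields leave both NMI values at zero.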

If that’s not the problem?
● What if the hardware is fine?
● What else could it be?
● The Kernel?
● Your application?

Kernel Entry Tracing (more than strace)
# echo 1 > /sys/kernel/tracing/options/function-fork
# trace-cmd record -F -c -p function_graph --max-graph-depth 1 -e syscalls \
    ./cyclictest -p80 -i250 -n -a -t -q -d 0
plugin 'function_graph'
# /dev/cpu_dma_latency set to 0us
^C
T: 0 ( 9645) P:80 I:250 C: 45482 Min: 4 Act: 10 Avg: 6 Max: 1189
CPU0 data recorded at offset=0x60c000
10473472 bytes in size
CPU1 data recorded at offset=0x1009000
CPU2 data recorded at offset=0x19f8000
CPU3 data recorded at offset=0x23eb000

More Efficient Trace
# trace-cmd report -F 'funcgraph_entry'| cut -d'|' -f 2 | cut -d'(' -f1 | sort -u
cpus=4
do_IRQ
__do_page_fault
do_syscall_64
exit_to_usermode_loop
fsnotify
__fsnotify_parent
__f_unlock_pos
mutex_unlock
__sb_end_write
schedule_tail
smp_apic_timer_interrupt
smp_irq_work_interrupt
syscall_slow_exit_work

More Efficient Trace (I need to make this easier)
# trace-cmd record -F -c -p function_graph --max-graph-depth 1 -e syscalls \
    -l do_IRQ -l __do_page_fault -l do_syscall_64 -l exit_to_usermode_loop -l fsnotify \
    -l __fsnotify_parent -l __f_unlock_pos -l mutex_unlock -l __sb_end_write \
    -l schedule_tail -l smp_apic_timer_interrupt -l smp_irq_work_interrupt \
    -l syscall_slow_exit_work \
    ./cyclictest -p80 -i250 -n -a -t -q -d 0
plugin 'function_graph'
^C
CPU0 data recorded at offset=0x60c000
CPU1 data recorded at offset=0x1529000

More Accurate Trace
# trace-cmd report | less
cpus=4
cyclictest-10668 [003] 16938.997362: funcgraph_entry: | mutex_unlock() {
cyclictest-10668 [003] 16938.997366: funcgraph_entry: 8.785 us | smp_irq_work_interrupt();
cyclictest-10668 [003] 16938.997377: funcgraph_exit: 0.142 us | }
cyclictest-10668 [003] 16938.997378: funcgraph_entry: 0.215 us | __fsnotify_parent();
cyclictest-10668 [003] 16938.997379: funcgraph_entry: 0.218 us | fsnotify();
cyclictest-10668 [003] 16938.997380: funcgraph_entry: 0.298 us | __sb_end_write();
cyclictest-10668 [003] 16938.997382: funcgraph_entry: 0.284 us | __f_unlock_pos();
cyclictest-10668 [003] 16938.997383: funcgraph_entry: | syscall_slow_exit_work() {
cyclictest-10668 [003] 16938.997384: sys_exit_write: 0x1
cyclictest-10668 [003] 16938.997388: funcgraph_exit: 4.939 us | }
cyclictest-10668 [003] 16938.997396: funcgraph_entry: 8.209 us | __do_page_fault();
cyclictest-10668 [003] 16938.997411: funcgraph_entry: | do_syscall_64() {
cyclictest-10668 [003] 16938.997412: sys_enter_execve: filename: 0x7fff9e60de96, argv:
0x7fff9e60c028, envp: 0x7fff9e60c078
cyclictest-10668 [003] 16938.997777: sys_exit_execve: 0x0
cyclictest-10668 [003] 16938.997782: funcgraph_exit: ! 370.796 us | }
cyclictest-10668 [003] 16938.997844: sys_enter_brk: brk: 0x00000000

More Accurate Trace
cyclictest-10670 [001] 16939.002228: sys_enter_clock_nanosleep: which_clock: 0x00000001, flags:
0x00000001, rqtp: 0x7fe7d41b4920, rmtp: 0x00000000
cyclictest-10671 [002] 16939.002315: sys_exit_clock_nanosleep: 0x0

Wakeup Latency
● The time from when a task is woken up to when it is scheduled in
● “wakeup” tracer
  – Traces the wakeup time of the highest priority task (RT or not)
  – Interesting, but not very useful (hides RT task latency)
● “wakeup_rt” tracer
  – Only monitors RT tasks
● trace-cmd record -p wakeup_rt -d -e all
  – “-d” : disable function tracing
  – “-e all” : enable all events

Wakeup Latency
# trace-cmd start -p wakeup_rt -d -e all
# ./cyclictest -p80 -i250 -n -a -t -q -d 0
^C
T: 0 (12864) P:80 I:250 C:1237799 Min: 7 Act: 12 Avg: 9 Max: 433

Wakeup Latency
# trace-cmd show | less
# tracer: wakeup_rt
#
# wakeup_rt latency trace v1.1.5 on 4.19.0-rc4-test+
# --------------------------------------------------------------------
# latency: 319 us, #2072/2072, CPU#2 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
# -----------------
# | task: cyclictest-12866 (uid:0 nice:0 policy:1 rt_prio:80)
# -----------------
#
# _------=> CPU#
# / _-----=> irqs-off
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# |||| / delay
# cmd pid ||||| time | caller
# / ||||| | /
Chrome_I-6675 3...1 0us : preempt_enable: caller=__sb_start_write+0x89/0xc0 parent=__sb_start_write+0x89/0xc0
<...>-42 2dNh4 0us : 42:139:R + [002] 12866: 19:R <...>
Chrome_I-6675 3d..2 0us : irq_disable: caller=rcu_irq_exit_irqson+0x1c/0x50 parent= (null)
<...>-42 2dNh4 0us : 0
<idle>-0 0dN.1 0us : rcu_utilization: Start context switch
Chrome_I-6675 3d..2 0us : irq_enable: caller=rcu_irq_exit_irqson+0x40/0x50 parent= (null)
<...>-42 2dNh1 0us : hrtimer_expire_exit: hrtimer=00000000f472f709
<idle>-0 0dN.1 0us : rcu_utilization: End context switch
<...>-42 2dNh1 1us : write_msr: 6e0, value 2e20f26e6f4a
Chrome_I-6675 3d..2 1us : irq_disable: caller=rcu_irq_enter_irqson+0x1c/0x50 parent= (null)
<...>-42 2dNh1 1us : local_timer_exit: vector=236
Chrome_I-6675 3d..2 1us : irq_enable: caller=rcu_irq_enter_irqson+0x40/0x50 parent= (null)

Wakeup Latency
<...>-42 2dN.1 102us : irq_disable: caller=_raw_spin_lock_irq+0x15/0x40 parent= (null)
<...>-42 2dN.2 102us : irq_enable: caller=_raw_spin_unlock_irq+0x11/0x40 parent= (null)

Wakeup Latency
# trace-cmd start -p wakeup_rt -l '*spin*' -e all
# ./cyclictest -p80 -i250 -n -a -t -q -d 0
^C
T: 0 (13264) P:80 I:250 C: 756984 Min: 7 Act: 12 Avg: 9 Max: 292

Wakeup Latency
Xorg-747 2.N.2 274us : _raw_spin_lock_irqsave <-pagevec_lru_move_fn
Xorg-747 2dN.3 274us : irq_enable: caller=wakeup_tracer_call+0xcf/0xe0 parent= (null)
Xorg-747 2dN.2 275us : irq_disable: caller=_raw_spin_lock_irqsave+0x20/0x50 parent= (null)
Xorg-747 2dN.3 275us : mm_lru_activate: page=00000000d262fdc6 pfn=2466257
Xorg-747 2dN.3 276us : mm_lru_activate: page=000000008b725f88 pfn=2875792
Xorg-747 2dN.3 276us : mm_lru_activate: page=000000008e31e070 pfn=3064073
Xorg-747 2dN.3 276us : mm_lru_activate: page=00000000774be734 pfn=2917451
Xorg-747 2dN.3 276us : mm_lru_activate: page=00000000c945de9c pfn=2021174
Xorg-747 2dN.3 277us : mm_lru_activate: page=0000000062d8ec6a pfn=2744545
Xorg-747 2dN.3 277us : mm_lru_activate: page=0000000080bc2df3 pfn=1899247
Xorg-747 2dN.3 277us : mm_lru_activate: page=00000000248a1143 pfn=2886580
Xorg-747 2dN.3 277us : mm_lru_activate: page=00000000ab7567ef pfn=2708231
Xorg-747 2dN.3 278us : mm_lru_activate: page=00000000c8601ee2 pfn=2031604
Xorg-747 2dN.3 278us : mm_lru_activate: page=0000000073cddaf3 pfn=2775238
Xorg-747 2dN.3 278us : mm_lru_activate: page=00000000b558f9cf pfn=2937059
Xorg-747 2dN.3 278us : mm_lru_activate: page=00000000fedc94ed pfn=2732924
Xorg-747 2dN.3 279us : mm_lru_activate: page=00000000d304f759 pfn=2319768
Xorg-747 2dN.3 279us : mm_lru_activate: page=000000000f98cd55 pfn=2336125
Xorg-747 2dN.3 279us : _raw_spin_unlock_irqrestore <-pagevec_lru_move_fn
Xorg-747 2dN.3 279us : irq_enable: caller=_raw_spin_unlock_irqrestore+0x40/0x60 parent= (null)
Xorg-747 2dN.2 280us : irq_disable: caller=free_unref_page_list+0xc1/0x220 parent= (null)
Xorg-747 2dN.2 280us : irq_enable: caller=free_unref_page_list+0x204/0x220 parent= (null)
Xorg-747 2dN.3 283us : irq_disable: caller=wakeup_tracer_call+0x7d/0xe0 parent= (null)
Xorg-747 2.N.2 283us : _raw_spin_lock_irqsave <-pagevec_lru_move_fn
Xorg-747 2dN.3 284us : irq_enable: caller=wakeup_tracer_call+0xcf/0xe0 parent= (null)
Xorg-747 2dN.2 284us : irq_disable: caller=_raw_spin_lock_irqsave+0x20/0x50 parent= (null)
Xorg-747 2dN.3 284us : mm_lru_activate: page=0000000049c0942d pfn=2422419
Xorg-747 2dN.3 285us : mm_lru_activate: page=00000000e64c4c24 pfn=2651893
Xorg-747 2dN.3 285us : mm_lru_activate: page=000000005ca129cc pfn=2002176

IRQ and Preemption Latency
● When interrupts or preemption are disabled
  – Other tasks cannot be scheduled in
● Tracers:
  – irqsoff
    ● Interesting info
  – preemptoff
    ● Interesting info
  – preemptirqsoff
    ● Useful info
    ● The only one I use

Preempt and IRQs Off Latency
# trace-cmd start -p preemptirqsoff -d
# tracer: preemptirqsoff
#
# preemptirqsoff latency trace v1.1.5 on 4.19.0-rc4-test+
# --------------------------------------------------------------------
# -----------------
# | task: Xorg-747 (uid:0 nice:0 policy:0 rt_prio:0)
# -----------------
# => started at: __mutex_lock.isra.5
# => ended at: __mutex_lock.isra.5
#
#
# _------=> CPU#
# / _-----=> irqs-off
# |||| / delay
# / ||||| | /
Xorg-747 1...1 0us#: __mutex_lock.isra.5 <-__mutex_lock.isra.5
Xorg-747 1...1 1100us : __mutex_lock.isra.5 <-__mutex_lock.isra.5
Xorg-747 1...1 1101us+: tracer_preempt_on <-__mutex_lock.isra.5
Xorg-747 1...1 1138us : <stack trace>
=> i915_gem_fault
=> __do_fault
=> __handle_mm_fault
=> handle_mm_fault
=> __do_page_fault
=> page_fault

# echo 1 > /sys/kernel/debug/tracing/options/sym-offset
# tracer: preemptirqsoff
#
# preemptirqsoff latency trace v1.1.5 on 4.19.0-rc4-test+
# --------------------------------------------------------------------
# -----------------
# | task: Xorg-747 (uid:0 nice:0 policy:0 rt_prio:0)
# -----------------
# => started at: __mutex_lock.isra.5+0x3a/0x4e0
# => ended at: __mutex_lock.isra.5+0x1d6/0x4e0
#
#
# _------=> CPU#
# / _-----=> irqs-off
# |||| / delay
# / ||||| | /
Xorg-747 1...1 0us#: __mutex_lock.isra.5+0x3a/0x4e0 <-__mutex_lock.isra.5+0x3a/0x4e0
Xorg-747 1...1 1100us : __mutex_lock.isra.5+0x1d6/0x4e0 <-__mutex_lock.isra.5+0x1d6/0x4e0
Xorg-747 1...1 1101us+: tracer_preempt_on+0xf4/0x110 <-__mutex_lock.isra.5+0x1d6/0x4e0
Xorg-747 1...1 1138us : <stack trace>
=> i915_gem_fault+0xc7/0x550 [i915]
=> __do_fault+0x20/0xe0
=> __handle_mm_fault+0xdf7/0x15a0
=> handle_mm_fault+0x11e/0x260
=> __do_page_fault+0x283/0x580
=> page_fault+0x1e/0x30

The “trace_marker” and “tracing_on” files
● /sys/kernel/tracing/trace_marker
  – Lets user space write into the kernel ring buffer
  – An application can write into it to tag where it discovered an issue
● /sys/kernel/tracing/tracing_on
  – Can enable or disable writing to the ring buffer
  – Note: when off, it also prevents a new “max latency” from being recorded by the wakeup and preempt/irqs off tracers

Using trace_marker and tracing_on in C code
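#include <stdarg.h>
#include <stdio.h>
#include <unistd.h>

/* Both descriptors are assumed to be opened elsewhere (setup not shown) */
static int marker_fd = -1;
static int tracing_fd = -1;
static __thread char buff[BUFSIZ+1];

/* Write a printf-style message into the kernel ring buffer */
static void write_marker(const char *fmt, ...)
{
    va_list ap;
    int n;

    if (marker_fd < 0)
        return;

    va_start(ap, fmt);
    n = vsnprintf(buff, BUFSIZ, fmt, ap);
    va_end(ap);
    if (n < 0)
        return;
    if (n >= BUFSIZ)    /* the message was truncated */
        n = BUFSIZ - 1;
    write(marker_fd, buff, n);
}

/* Turn writing to the ring buffer on (1) or off (0) */
static void set_tracing(int on)
{
    char val[1];

    if (tracing_fd < 0)
        return;

    val[0] = '0' + on;
    write(tracing_fd, val, 1);
}
[..]
// On error
write_marker("Detected a latency of %d microseconds\n", latency);
set_tracing(0);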

Having cyclictest stop on max threshold
● cyclictest is aware of the tracing infrastructure
  – “-b 500” : break (stop tracing) if latency is greater than 500 microseconds
● More options that we are not using but could have (all require the -b option)
  – “-E” : enable all events
  – “-f” : enable function tracing
  – “-I” : enable irqsoff tracer
  – “-P” : enable preemptoff tracer
  – “-w” : enable wakeup tracer
  – “-W” : enable wakeup_rt tracer
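The break-on-threshold behaviour amounts to a check after every measurement: if the latency is over the limit, tag the trace and freeze the ring buffer. Here is a sketch of that check. It is not taken from cyclictest's source. The function name `break_on_latency` is hypothetical, and the message text simply mirrors the marker seen in the traces later in this talk.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 0 if the latency is within the threshold,
 * 1 if the threshold was exceeded and tracing was stopped, -1 if writing
 * to either file descriptor failed.
 *
 * marker_fd:  descriptor for the trace_marker file (or -1 to skip)
 * tracing_fd: descriptor for the tracing_on file (or -1 to skip)
 */
int break_on_latency(int marker_fd, int tracing_fd,
                     long latency_us, long thresh_us)
{
    char msg[128];
    int n;

    assert(thresh_us > 0);
    if (latency_us <= thresh_us)
        return 0;

    /* Tag the spot in the trace where the problem was noticed. */
    n = snprintf(msg, sizeof(msg), "hit latency threshold (%ld > %ld)",
                 latency_us, thresh_us);
    if (n > 0 && marker_fd >= 0 && write(marker_fd, msg, (size_t)n) < 0)
        return -1;

    /* Stop the ring buffer so the interesting events are not overwritten. */
    if (tracing_fd >= 0 && write(tracing_fd, "0", 1) != 1)
        return -1;

    return 1;
}
```

The order matters: the marker is written while tracing is still on, so it lands in the buffer as the last event before the buffer is frozen.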

# trace-cmd start -p function -e all -l '*lock*' -l '*mutex*'
# ./cyclictest -p80 -i250 -n -a -t -q -d 0 -b 500
INFO: debugfs mountpoint: /sys/kernel/debug/tracing/
# Thread Ids: 15126 15127 15128 15129
# Break thread: 15129
# Break value: 560

The issue has been recorded (we hope)
● The trace buffer has the issue (with a marker tag)
● Save the text files (just to keep them around)
# cp /sys/kernel/tracing/trace /some/path/save_trace_dir/trace
# for file in /sys/kernel/tracing/per_cpu/cpu*/trace; do
    cpu=${file%/trace}
    cp "$file" /some/path/save_trace_dir/`basename "$cpu"`
  done
# ls /some/path/save_trace_dir
cpu0 cpu1 cpu2 cpu3 trace

For full power of trace-cmd
● trace-cmd extract retrieves the kernel buffer into a trace.dat file
● Can now do all sorts of filtering
# trace-cmd extract
# trace-cmd report 2>/dev/null |grep print
cyclictest-15129 [003] 23346.433979: print: tracing_mark_write: hit latency threshold (560 > 500)

# trace-cmd report -l --cpu 3 -O parent | less
cyclicte-15129 3.... 23346.433176: hrtimer_init: hrtimer=0xffffb45f4290be70 clockid=CLOCK_MONOTONIC mode=0x0
[..]
cyclicte-15129 3d..1 23346.433178: hrtimer_start: hrtimer=0xffffb45f4290be70 function=hrtimer_wakeup/0x0
expires=23346275012026 softexpires=23346275012026
[ 23346275013 - 23346275012 = 1us (from below) ]
[.. cyclictest is in nanosleep here ..]
Xorg-747 3d.h3 23346.433403: local_timer_entry: vector=236
Xorg-747 3d.h3 23346.433403: function: _raw_spin_lock_irqsave <-- hrtimer_interrupt
Xorg-747 3d.h4 23346.433403: hrtimer_cancel: hrtimer=0xffffb45f4290be70
Xorg-747 3d.h4 23346.433403: function: _raw_spin_unlock_irqrestore <-- __hrtimer_run_queues
Xorg-747 3d.h3 23346.433403: hrtimer_expire_entry: hrtimer=0xffffb45f4290be70 now=23346275013126
function=hrtimer_wakeup/0x0
[ 23346433403 - 23346275013 = 158390 (offset) ]
Xorg-747 3d.h3 23346.433403: function: _raw_spin_lock_irqsave <-- try_to_wake_up
[..]
Xorg-747 3dNh5 23346.433403: sched_wakeup: cyclictest:15129 [19] success=1 CPU:003
Xorg-747 3dNh5 23346.433403: function: __lock_text_start <-- try_to_wake_up
Xorg-747 3dNh4 23346.433403: function: _raw_spin_unlock_irqrestore <-- try_to_wake_up
Xorg-747 3dNh3 23346.433403: hrtimer_expire_exit: hrtimer=0xffffb45f4290be70
[..]
Xorg-747 3dNh3 23346.433403: local_timer_exit: vector=236
Xorg-747 3.N.1 23346.433409: function: lock_page_memcg <-- page_remove_rmap
Xorg-747 3dN.2 23346.433956: function: __rcu_read_lock <-- cpuacct_charge
Xorg-747 3dN.2 23346.433957: function: __rcu_read_unlock <-- update_curr
Xorg-747 3dN.2 23346.433957: function: __rcu_read_lock <-- update_curr
Xorg-747 3dN.2 23346.433957: function: __rcu_read_unlock <-- update_curr
Xorg-747 3d..2 23346.433958: sched_switch: Xorg:747 [120] R ==> cyclictest:15129 [19]
[..]
cyclicte-15129 3.... 23346.433979: print: tracing_mark_write: hit latency threshold (560 > 500)
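The annotations in this trace can be turned into a small worked calculation. The sketch below takes the microsecond values that appear above (timer expiry 23346275012, clock offset 158390, timer interrupt and sched_wakeup at 23346.433403, sched_switch at 23346.433958) and splits the delay into its two parts. The type and function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* The two parts of the delay, in microseconds (hypothetical type). */
struct wake_breakdown {
    int64_t irq_latency_us;    /* timer expiry to timer interrupt */
    int64_t wakeup_latency_us; /* sched_wakeup to sched_switch */
};

/*
 * expires_us: hrtimer expiry, in the hrtimer clock
 * offset_us:  trace timestamp minus hrtimer clock, from the trace notes
 * irq_us:     trace timestamp of the timer interrupt
 * wakeup_us:  trace timestamp of sched_wakeup
 * switch_us:  trace timestamp of sched_switch into the woken task
 */
struct wake_breakdown wake_breakdown_us(int64_t expires_us, int64_t offset_us,
                                        int64_t irq_us, int64_t wakeup_us,
                                        int64_t switch_us)
{
    struct wake_breakdown b;

    assert(switch_us >= wakeup_us);
    /* Convert the expiry time into the trace clock before comparing. */
    b.irq_latency_us = irq_us - (expires_us + offset_us);
    b.wakeup_latency_us = switch_us - wakeup_us;
    return b;
}
```

With the values from the trace, `wake_breakdown_us(23346275012, 158390, 23346433403, 23346433403, 23346433958)` gives an interrupt latency of 1 microsecond and a wakeup latency of 555 microseconds. That accounts for almost all of the 560 microseconds cyclictest reported, so the time was lost between the wakeup and the context switch, while Xorg was still running.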

Lock Stats (based off of Lockdep)
● lockdep - checks correctness of locking
  – Avoids various deadlock scenarios
● lockstat - built off of lockdep
  – Records when a task waits on a lock
  – Records how much time it waited
  – Records contention between different CPUs
  – Records acquisitions too
● See Documentation/locking/lockstat.txt

The /proc/lockstat file
● The headers
  – con-bounces : How many times contended across CPUs
  – contentions : Number of lock acquisitions that had to wait
  – wait time
    ● min: Shortest (non-zero) wait time
    ● max: Longest wait time
    ● total: Total amount of time tasks had to wait
    ● avg: The average time a task waited
  – acq-bounces : Number of acquisitions that crossed CPUs
  – acquisitions: The number of times the lock was acquired
  – hold time - min: max: total: avg: Same as wait time but for the time the lock was held

How lockstats work
[Diagram: trylock(&lock); if it fails, record lock contention(&lock), then lock(&lock); once the lock is held, record lock acquisition(&lock)]
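The same try-first, record, then wait pattern can be sketched in user space. This is a toy spinlock built on C11 atomics, not the kernel's lockstat code. The names `stat_lock`, `stat_lock_acquire` and `stat_lock_release` are hypothetical. The counters are updated only while the lock is held, so the lock itself protects them.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* A spinlock that keeps simple contention statistics (hypothetical type). */
struct stat_lock {
    atomic_flag held;
    uint64_t contentions;   /* acquisitions that had to wait */
    uint64_t acquisitions;  /* all acquisitions */
    uint64_t wait_ns_total; /* total time spent waiting */
};

#define STAT_LOCK_INIT { ATOMIC_FLAG_INIT, 0, 0, 0 }

static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

void stat_lock_acquire(struct stat_lock *l)
{
    uint64_t waited_ns = 0;
    int contended = 0;

    assert(l != NULL);

    /* "trylock": test_and_set returns the old value, true means busy. */
    if (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) {
        uint64_t t0 = now_ns();

        contended = 1;
        /* "lock": spin until the holder releases it. */
        while (atomic_flag_test_and_set_explicit(&l->held,
                                                 memory_order_acquire))
            ;
        waited_ns = now_ns() - t0;
    }

    /* The lock is held here, so updating the counters is safe. */
    l->contentions += (uint64_t)contended;
    l->wait_ns_total += waited_ns;
    l->acquisitions++;
}

void stat_lock_release(struct stat_lock *l)
{
    assert(l != NULL);
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

A lock declared as `struct stat_lock l = STAT_LOCK_INIT;` and used from several threads would accumulate numbers comparable in spirit to the contentions, acquisitions and wait-time totals that lockstat reports.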

lock_stat version 0.4
-------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------
class name con-bounces contentions waittime-min waittime-max waittime-total waittime-avg acq-bounces
acquisitions holdtime-min holdtime-max holdtime-total holdtime-avg
-------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------
&(&n->list_lock)->rlock: 1332195 1334443 0.07 9686.41 2573984.58 1.93 3643104
26288751 0.00 21988.08 15069393.95 0.57
-----------------------
&(&n->list_lock)->rlock 210234 [<000000001b722b79>] get_partial_node.isra.69.part.70+0x38/0x3b0
&(&n->list_lock)->rlock 147421 [<00000000a298aeee>] deactivate_slab.isra.67+0x500/0x6e0
&(&n->list_lock)->rlock 683236 [<000000008a6cb3ab>] free_debug_processing+0x3f/0x270
&(&n->list_lock)->rlock 246105 [<000000007fc36e41>] __slab_free+0xc5/0x3f0
-----------------------
&(&n->list_lock)->rlock 1008985 [<000000008a6cb3ab>] free_debug_processing+0x3f/0x270
&(&n->list_lock)->rlock 77749 [<00000000a298aeee>] deactivate_slab.isra.67+0x500/0x6e0
&(&n->list_lock)->rlock 137428 [<000000001b722b79>] get_partial_node.isra.69.part.70+0x38/0x3b0
&(&n->list_lock)->rlock 83846 [<000000007fc36e41>] __slab_free+0xc5/0x3f0
.......................................................................................................................................................
......................................................................
kmemleak_lock-W: 340821 341229 0.06 10471.90 535946.05 1.57 1944828
8016988 0.00 4676.63 4406716.68 0.55
kmemleak_lock-R: 4624 4652 0.10 73.34 4135.42 0.89 4887
999302 0.00 681.39 1320770.44 1.32
---------------
kmemleak_lock 169288 [<00000000795f3b3d>] find_and_remove_object+0x1b/0x90
kmemleak_lock 171942 [<000000000071b79a>] create_object+0x161/0x2c0
kmemleak_lock 4453 [<00000000554658c3>] find_and_get_object+0x45/0xb0
kmemleak_lock 199 [<00000000dfc7b05b>] scan_block+0x2a/0x110
---------------
kmemleak_lock 198499 [<000000000071b79a>] create_object+0x161/0x2c0
kmemleak_lock 145759 [<00000000795f3b3d>] find_and_remove_object+0x1b/0x90
kmemleak_lock 1445 [<00000000554658c3>] find_and_get_object+0x45/0xb0
kmemleak_lock 179 [<00000000dfc7b05b>] scan_block+0x2a/0x110
.......................................................................................................................................................
......................................................................
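The average columns in this output can be cross-checked against the totals. The sketch below assumes that waittime-avg is the total wait time divided by the number of contentions, and that holdtime-avg is the total hold time divided by the number of acquisitions. The numbers in the output above agree with that assumption, but the kernel may count its samples slightly differently. The function names are made up for this example.

```c
#include <assert.h>

/* Average time per event in microseconds, 0 when nothing was counted. */
double lockstat_avg(double total_us, unsigned long count)
{
    assert(total_us >= 0.0);
    return count ? total_us / (double)count : 0.0;
}

/* Fraction of acquisitions that had to wait for the lock. */
double contention_ratio(unsigned long contentions, unsigned long acquisitions)
{
    assert(contentions <= acquisitions);
    return acquisitions ? (double)contentions / (double)acquisitions : 0.0;
}
```

For the `&(&n->list_lock)->rlock` class, `lockstat_avg(2573984.58, 1334443)` comes out at about 1.93 and `lockstat_avg(15069393.95, 26288751)` at about 0.57, matching the waittime-avg and holdtime-avg columns. `contention_ratio(1334443, 26288751)` is roughly 0.05, meaning about one acquisition in twenty had to wait.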
