Using Libtracecmd to Analyze
Latency and Performance Troubles
Steven Rostedt
Software Engineer
Steven Rostedt
Software Engineer
■ One of the original authors of the PREEMPT_RT patch set
■ Creator and maintainer of ftrace
■ Creator of “make localmodconfig”
■ Creator and maintainer of the “ktest.pl” Linux testing framework
Tracing kernel latency on your application
■ The kernel is a black box
● Your application is at the whims of the kernel scheduler
● Interrupts can cause delays in your application
● Kernel lock contention could add to the latency
■ Tracing can give you insight into the happenings of the kernel
● Monitor the scheduling decisions the kernel is making
● Record when interrupts are happening and for how long
● See how long kernel locks are held
Using cyclictest
■ cyclictest tests the latency of the system
● Calls nanosleep() and sees when it woke up to when it expected to wake up
■ The Real Time Linux developers use this to test the jitter of the system
● Loads are run while cyclictest is measuring the latency
● Requires the latency to stay below a threshold
● May run for weeks or months
■ I will use cyclictest as an example for this talk
● But this works for any application
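In essence, the measurement cyclictest performs looks like the following sketch (a simplification, not cyclictest's actual source; the interval matches the -i250 option used below, and the iteration count is purely illustrative):

#include <stdio.h>
#include <time.h>

/* Simplified sketch of the cyclictest measurement loop:
 * sleep to an absolute deadline, then compare when we actually
 * woke up against when we expected to wake up. */
int main(void)
{
    struct timespec next, now;
    long long lat_us, max_us = 0;
    const long interval_ns = 250 * 1000;    /* -i250: 250 us period */
    int i;

    clock_gettime(CLOCK_MONOTONIC, &next);
    for (i = 0; i < 100000; i++) {
        next.tv_nsec += interval_ns;
        while (next.tv_nsec >= 1000000000) {
            next.tv_nsec -= 1000000000;
            next.tv_sec++;
        }
        /* Sleep to an absolute deadline, like cyclictest's nanosleep mode */
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        clock_gettime(CLOCK_MONOTONIC, &now);
        /* Latency: actual wakeup time minus expected wakeup time */
        lat_us = (now.tv_sec - next.tv_sec) * 1000000LL +
                 (now.tv_nsec - next.tv_nsec) / 1000;
        if (lat_us > max_us)
            max_us = lat_us;
    }
    printf("Max: %lld us\n", max_us);
    return 0;
}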
A typical cyclictest run
~# cyclictest -p80 -i250 -a -t -m -d 0
# /dev/cpu_dma_latency set to 0us
policy: fifo: loadavg: 0.00 0.00 0.00 1/9997 68021
T: 0 (68015) P:80 I:250 C: 18570 Min: 3 Act: 3 Avg: 3 Max: 7
T: 1 (68016) P:80 I:250 C: 18569 Min: 3 Act: 3 Avg: 3 Max: 5
T: 2 (68017) P:80 I:250 C: 18569 Min: 3 Act: 3 Avg: 3 Max: 9
T: 3 (68018) P:80 I:250 C: 18568 Min: 3 Act: 3 Avg: 3 Max: 6
T: 4 (68019) P:80 I:250 C: 18568 Min: 3 Act: 3 Avg: 3 Max: 5
T: 5 (68020) P:80 I:250 C: 18567 Min: 3 Act: 3 Avg: 3 Max: 95
T: 6 (68021) P:80 I:250 C: 18567 Min: 3 Act: 3 Avg: 3 Max: 9
-p PRIO --priority=PRIO priority of highest prio thread
-i INTV --interval=INTV base interval of thread in us default=1000
-a [CPUSET] --affinity Run thread #N on processor #N, if possible, or if CPUSET given,
 pin threads to that set of processors in round-robin order.
-t --threads one thread per available processor
-m --mlockall lock current and future memory allocations
-d DIST --distance=DIST distance of thread intervals in us, default=500
cyclictest and tracing
■ Can break when a latency is greater than a given threshold
■ Will write into the kernel tracing buffer
-b USEC --breaktrace=USEC send “breaktrace” command when latency > USEC
--tracemark write a “tracemark” when -b latency is exceeded
Break when greater than 100 microseconds
~# cyclictest -p80 -i250 -a -t -m -d 0 -b 100 --tracemark
# /dev/cpu_dma_latency set to 0us
INFO: debugfs mountpoint: /sys/kernel/debug/tracing/
policy: fifo: loadavg: 0.00 0.00 0.00 1/9998 68073
T: 0 (68066) P:80 I:250 C: 91103 Min: 3 Act: 3 Avg: 3 Max: 6
T: 1 (68067) P:80 I:250 C: 91103 Min: 3 Act: 3 Avg: 3 Max: 7
T: 2 (68068) P:80 I:250 C: 91102 Min: 3 Act: 3 Avg: 3 Max: 10
T: 3 (68069) P:80 I:250 C: 91101 Min: 3 Act: 3 Avg: 3 Max: 8
T: 4 (68070) P:80 I:250 C: 91101 Min: 3 Act: 3 Avg: 3 Max: 9
T: 5 (68071) P:80 I:250 C: 91100 Min: 3 Act: 3 Avg: 3 Max: 11
T: 6 (68072) P:80 I:250 C: 91100 Min: 3 Act: 3 Avg: 3 Max: 47
# Thread Ids: 68066 68067 68068 68069 68070 68071 68072
# Break thread: 68066
# Break value: 561
Show the trace output with trace-cmd
~# trace-cmd show
# tracer: nop
#
# entries-in-buffer/entries-written: 1/1 #P:8
#
# _-----=> irqs-off/BH-disabled
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / _-=> migrate-disable
# |||| / delay
# TASK-PID CPU# ||||| TIMESTAMP FUNCTION
# | | | ||||| | |
<...>-68066 [000] ..... 816424.679452: tracing_mark_write: hit latency threshold (561 > 100)
561 us latency
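The marker itself is nothing magic: cyclictest writes a string into the trace_marker file, which shows up as the tracing_mark_write event above. A minimal sketch of doing the same from your own application (newer kernels also expose the file as /sys/kernel/tracing/trace_marker):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[128];
    int len, fd;

    fd = open("/sys/kernel/debug/tracing/trace_marker", O_WRONLY);
    if (fd < 0) {
        perror("trace_marker");
        return 1;
    }
    /* The written string appears in the trace as tracing_mark_write: ... */
    len = snprintf(buf, sizeof(buf), "hit latency threshold (%d > %d)", 561, 100);
    write(fd, buf, len);
    close(fd);
    return 0;
}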
Tracing cyclictest
~# trace-cmd start -e sched_switch -e sched_waking -e timer -e irq -e irq_vectors \
   cyclictest -p80 -i250 -a -t -m -d 0 -b 100 --tracemark
(sched_switch and sched_waking are the scheduling events, timer the timer events, and irq with irq_vectors the interrupt events)
# /dev/cpu_dma_latency set to 0us
INFO: debugfs mountpoint: /sys/kernel/debug/tracing/
policy: fifo: loadavg: 0.03 0.01 0.00 1/9998 68166
T: 0 (68160) P:80 I:250 C: 226450 Min: 4 Act: 5 Avg: 4 Max: 8
T: 1 (68161) P:80 I:250 C: 226449 Min: 4 Act: 5 Avg: 4 Max: 52
T: 2 (68162) P:80 I:250 C: 226448 Min: 4 Act: 5 Avg: 4 Max: 11
T: 3 (68163) P:80 I:250 C: 226448 Min: 4 Act: 5 Avg: 4 Max: 10
T: 4 (68164) P:80 I:250 C: 226448 Min: 4 Act: 5 Avg: 4 Max: 12
T: 5 (68165) P:80 I:250 C: 226447 Min: 4 Act: 5 Avg: 5 Max: 48
T: 6 (68166) P:80 I:250 C: 226446 Min: 4 Act: 4 Avg: 4 Max: 11
# Thread Ids: 68160 68161 68162 68163 68164 68165 68166
# Break thread: 68161
# Break value: 121
Tracing cyclictest
~# trace-cmd show | tail -20
<idle>-0 [006] d.h1. 818035.615492: hrtimer_expire_entry: hrtimer=00000000d57f52c5 function=hrtimer_wakeup now=818051328925800
<idle>-0 [006] d.h3. 818035.615492: sched_waking: comm=cyclictest pid=68165 prio=19 target_cpu=006
<idle>-0 [006] dNh1. 818035.615493: hrtimer_expire_exit: hrtimer=00000000d57f52c5
<idle>-0 [006] dNh1. 818035.615493: local_timer_exit: vector=236
<idle>-0 [006] d..2. 818035.615495: sched_switch: prev_comm=swapper/6 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=cyclictest
next_pid=68165 next_prio=19
cyclictest-68165 [006] ..... 818035.615496: hrtimer_init: hrtimer=00000000d57f52c5 clockid=CLOCK_MONOTONIC mode=ABS
cyclictest-68165 [006] d..1. 818035.615496: hrtimer_start: hrtimer=00000000d57f52c5 function=hrtimer_wakeup expires=818051329174969
softexpires=818051329174969 mode=ABS
cyclictest-68165 [006] d..2. 818035.615500: sched_switch: prev_comm=cyclictest prev_pid=68165 prev_prio=19 prev_state=S ==> next_comm=swapper/6
next_pid=0 next_prio=120
<idle>-0 [000] d.h1. 818035.615509: local_timer_entry: vector=236
<idle>-0 [000] d.h2. 818035.615509: hrtimer_cancel: hrtimer=00000000b12dfc3a
<idle>-0 [000] d.h1. 818035.615509: hrtimer_expire_entry: hrtimer=00000000b12dfc3a function=hrtimer_wakeup now=818051328943482
<idle>-0 [000] d.h3. 818035.615509: sched_waking: comm=cyclictest pid=68160 prio=19 target_cpu=000
<idle>-0 [000] dNh1. 818035.615511: hrtimer_expire_exit: hrtimer=00000000b12dfc3a
<idle>-0 [000] dNh1. 818035.615511: local_timer_exit: vector=236
<idle>-0 [000] d..2. 818035.615513: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=cyclictest
next_pid=68160 next_prio=19
cyclictest-68160 [000] ..... 818035.615514: hrtimer_init: hrtimer=00000000b12dfc3a clockid=CLOCK_MONOTONIC mode=ABS
cyclictest-68160 [000] d..1. 818035.615515: hrtimer_start: hrtimer=00000000b12dfc3a function=hrtimer_wakeup expires=818051329192610
softexpires=818051329192610 mode=ABS
cyclictest-68160 [000] d..2. 818035.615519: sched_switch: prev_comm=cyclictest prev_pid=68160 prev_prio=19 prev_state=S ==> next_comm=swapper/0
next_pid=0 next_prio=120
sshd-51820 [001] d..2. 818035.615536: sched_switch: prev_comm=sshd prev_pid=51820 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest
next_pid=68161 next_prio=19
cyclictest-68161 [001] ..... 818035.615546: tracing_mark_write: hit latency threshold (121 > 100)
Tracing cyclictest
~# trace-cmd export -o trace-cyclic.dat
~# trace-cmd report -l trace-cyclic.dat | tail -20
<idle>-0 6d.h1. 818035.615492: hrtimer_expire_entry: hrtimer=0xffffa2c26d5ffe70 now=818051328925800 function=hrtimer_wakeup/0x0
<idle>-0 6d.h3. 818035.615492: sched_waking: comm=cyclictest pid=68165 prio=19 target_cpu=006
<idle>-0 6dNh1. 818035.615493: hrtimer_expire_exit: hrtimer=0xffffa2c26d5ffe70
<idle>-0 6dNh1. 818035.615493: local_timer_exit: vector=236
<idle>-0 6d..2. 818035.615495: sched_switch: swapper/6:0 [120] R ==> cyclictest:68165 [19]
cyclicte-68165 6..... 818035.615496: hrtimer_init: hrtimer=0xffffa2c26d5ffe70 clockid=CLOCK_MONOTONIC mode=0x0
cyclicte-68165 6d..1. 818035.615496: hrtimer_start: hrtimer=0xffffa2c26d5ffe70 function=hrtimer_wakeup/0x0 expires=818051329174969
softexpires=818051329174969
cyclicte-68165 6d..2. 818035.615500: sched_switch: cyclictest:68165 [19] S ==> swapper/6:0 [120]
<idle>-0 0d.h1. 818035.615509: local_timer_entry: vector=236
<idle>-0 0d.h2. 818035.615509: hrtimer_cancel: hrtimer=0xffffa2c26d5efe70
<idle>-0 0d.h1. 818035.615509: hrtimer_expire_entry: hrtimer=0xffffa2c26d5efe70 now=818051328943482 function=hrtimer_wakeup/0x0
<idle>-0 0d.h3. 818035.615509: sched_waking: comm=cyclictest pid=68160 prio=19 target_cpu=000
<idle>-0 0dNh1. 818035.615511: hrtimer_expire_exit: hrtimer=0xffffa2c26d5efe70
<idle>-0 0dNh1. 818035.615511: local_timer_exit: vector=236
<idle>-0 0d..2. 818035.615513: sched_switch: swapper/0:0 [120] R ==> cyclictest:68160 [19]
cyclicte-68160 0..... 818035.615514: hrtimer_init: hrtimer=0xffffa2c26d5efe70 clockid=CLOCK_MONOTONIC mode=0x0
cyclicte-68160 0d..1. 818035.615515: hrtimer_start: hrtimer=0xffffa2c26d5efe70 function=hrtimer_wakeup/0x0 expires=818051329192610
softexpires=818051329192610
cyclicte-68160 0d..2. 818035.615519: sched_switch: cyclictest:68160 [19] S ==> swapper/0:0 [120]
sshd-51820 1d..2. 818035.615536: sched_switch: sshd:51820 [120] R ==> cyclictest:68161 [19]
cyclicte-68161 1..... 818035.615546: print: tracing_mark_write: hit latency threshold (121 > 100)
The print event at the end is the cyclictest marker, and it landed on CPU 1.
Tracing cyclictest
~# trace-cmd report -l --cpu 1 trace-cyclic.dat | tail -20
cyclicte-68161 1d..2. 818035.614923: sched_switch: cyclictest:68161 [19] S ==> swapper/1:0 [120]
<idle>-0 1d.h2. 818035.615166: local_timer_entry: vector=236
<idle>-0 1d.h3. 818035.615166: hrtimer_cancel: hrtimer=0xffffa2c26d5f7e70
<idle>-0 1d.h2. 818035.615166: hrtimer_expire_entry: hrtimer=0xffffa2c26d5f7e70 now=818051328600710 function=hrtimer_wakeup/0x0
<idle>-0 1d.h4. 818035.615167: sched_waking: comm=cyclictest pid=68161 prio=19 target_cpu=001
<idle>-0 1dNh2. 818035.615168: hrtimer_expire_exit: hrtimer=0xffffa2c26d5f7e70
<idle>-0 1dNh2. 818035.615168: local_timer_exit: vector=236
<idle>-0 1d..2. 818035.615170: sched_switch: swapper/1:0 [120] R ==> cyclictest:68161 [19]
cyclicte-68161 1..... 818035.615171: hrtimer_init: hrtimer=0xffffa2c26d5f7e70 clockid=CLOCK_MONOTONIC mode=0x0
cyclicte-68161 1d..1. 818035.615171: hrtimer_start: hrtimer=0xffffa2c26d5f7e70 function=hrtimer_wakeup/0x0 expires=818051328849935
softexpires=818051328849935
cyclicte-68161 1d..2. 818035.615173: sched_switch: cyclictest:68161 [19] S ==> swapper/1:0 [120]
<idle>-0 1d..2. 818035.615339: sched_switch: swapper/1:0 [120] R ==> sshd:51820 [120]
sshd-51820 1d.h1. 818035.615416: local_timer_entry: vector=236
sshd-51820 1d.h2. 818035.615416: hrtimer_cancel: hrtimer=0xffffa2c26d5f7e70
sshd-51820 1d.h1. 818035.615417: hrtimer_expire_entry: hrtimer=0xffffa2c26d5f7e70 now=818051328850788 function=hrtimer_wakeup/0x0
sshd-51820 1d.h3. 818035.615417: sched_waking: comm=cyclictest pid=68161 prio=19 target_cpu=001
sshd-51820 1dNh1. 818035.615419: hrtimer_expire_exit: hrtimer=0xffffa2c26d5f7e70
sshd-51820 1dNh1. 818035.615419: local_timer_exit: vector=236
sshd-51820 1d..2. 818035.615536: sched_switch: sshd:51820 [120] R ==> cyclictest:68161 [19]
cyclicte-68161 1..... 818035.615546: print: tracing_mark_write: hit latency threshold (121 > 100)
Reading CPU 1 backwards from the marker: cyclictest armed its timer (hrtimer_start, where "expires=" is the time it expected to wake up), sshd was scheduled in, the timer fired while sshd was running (hrtimer_expire_entry, where "now=" is the current time) and woke cyclictest up (sched_waking at 818035.615417), but cyclictest was not scheduled in until the sched_switch at 818035.615536.
libtracecmd to read the trace.dat file
■ Automate the latency process
■ First search from the end of the trace
● Find the marker or flag that tells you where the problem happened
■ Continue backwards looking for other events
■ Search forward to collect timings
int main (int argc, char **argv)
{
    struct tracecmd_input *handle;
    struct tep_handle *tep;
    struct data data;
    cpu_set_t *cpu_set;
    size_t cpu_size;
    int cpus;

    handle = tracecmd_open(argv[1], 0);
    tep = tracecmd_get_tep(handle);
    init_data(tep, &data);
    tracecmd_iterate_events_reverse(handle, NULL, 0, find_trace_marker, &data, false);
    printf("cpu=%d threshold=%d latency=%d\n\n", data.cpu, data.thresh, data.lat);
    /* Now we know what CPU it is on, look at just this CPU */
    cpus = tep_get_cpus(tep);
    cpu_size = CPU_ALLOC_SIZE(cpus);
    cpu_set = CPU_ALLOC(cpus);
    CPU_ZERO_S(cpu_size, cpu_set);
    CPU_SET_S(data.cpu, cpu_size, cpu_set);
    /* Find where cyclictest was scheduled in */
    tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size,
                                    find_sched, &data, true);
    /* Find where cyclictest was woken up */
    tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size,
                                    find_wakeup, &data, true);
    /* Find where cyclictest timer went off */
    tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size,
                                    find_timer_expire, &data, true);
    /* Find where the timer was added (the start of this cycle) */
    tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size,
                                    find_timer_start, &data, true);
    printf("expected expire: %lld\n", data.timer_expires_expect);
    printf("timer expired: %lld\n", data.timer_expire);
    printf(" jitter: (%lld)\n", data.timer_expire - data.timer_expires_expect);
    printf("wake up time: %lld\n", data.wakeup_time);
    printf(" from timer: (%lld)\n", data.wakeup_time - data.timer_expire);
    printf("scheduled time: %lld\n", data.sched_time);
    printf(" from wakeup: (%lld)\n", data.sched_time - data.wakeup_time);
    printf("marker time: %lld\n", data.marker_time);
    printf(" from schedule:(%lld)\n", data.marker_time - data.sched_time);
    tracecmd_close(handle);
    return 0;
}
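The slides never show struct data or init_data(); the following is a sketch of what they plausibly look like, reconstructed from how the callbacks below use the fields (the event and field names come from the trace output above, but the struct layout itself is an assumption, and the lock-analysis fields used later are elided):

#include <string.h>

struct data {
    struct tep_event *marker;           /* ftrace:print (tracing_mark_write) */
    struct tep_event *sched;            /* sched:sched_switch */
    struct tep_format_field *next_pid;
    struct tep_event *waking;           /* sched:sched_waking */
    struct tep_format_field *waking_pid;
    struct tep_event *hrtimer_expire;   /* timer:hrtimer_expire_entry */
    struct tep_format_field *hrtimer_expire_ptr;
    struct tep_format_field *timer_now;
    struct tep_event *hrtimer_start;    /* timer:hrtimer_start */
    struct tep_format_field *hrtimer_start_ptr;
    struct tep_format_field *hrtimer_start_expires;
    int cpu, pid, thresh, lat;
    unsigned long long hrtimer, timer_delta;
    unsigned long long marker_time, wakeup_time, sched_time;
    unsigned long long timer_start, timer_expire, timer_expires_expect;
    /* ... plus the function event and lock lists used by the lock analysis */
};

static void init_data(struct tep_handle *tep, struct data *d)
{
    memset(d, 0, sizeof(*d));
    d->marker = tep_find_event_by_name(tep, "ftrace", "print");
    d->sched = tep_find_event_by_name(tep, "sched", "sched_switch");
    d->next_pid = tep_find_field(d->sched, "next_pid");
    d->waking = tep_find_event_by_name(tep, "sched", "sched_waking");
    d->waking_pid = tep_find_field(d->waking, "pid");
    d->hrtimer_expire = tep_find_event_by_name(tep, "timer", "hrtimer_expire_entry");
    d->hrtimer_expire_ptr = tep_find_field(d->hrtimer_expire, "hrtimer");
    d->timer_now = tep_find_field(d->hrtimer_expire, "now");
    d->hrtimer_start = tep_find_event_by_name(tep, "timer", "hrtimer_start");
    d->hrtimer_start_ptr = tep_find_field(d->hrtimer_start, "hrtimer");
    d->hrtimer_start_expires = tep_find_field(d->hrtimer_start, "expires");
}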
static int find_trace_marker(struct tracecmd_input *handle, struct tep_record *record,
                             int cpu, void *data)
{
    struct tep_handle *tep = tracecmd_get_tep(handle);
    struct tep_event *marker;
    static struct trace_seq seq;
    struct data *d = data;
    const char *str;
    int thresh;
    int type;
    int lat;

    marker = d->marker;
    type = tep_data_type(tep, record);
    if (type != marker->id)
        return 0;
    /* Make sure that the print has the data we want */
    if (!seq.buffer)
        trace_seq_init(&seq);
    tep_print_event(tep, &seq, record, "%s", TEP_PRINT_INFO);
    trace_seq_terminate(&seq);
Skip non-marker events
    str = strstr(seq.buffer, "hit latency threshold");
    if (!str) {
        /* This is not the string you are looking for */
        trace_seq_reset(&seq);
        return 0;
    }
    sscanf(str, "hit latency threshold (%d > %d)", &lat, &thresh);
    d->cpu = record->cpu;
    d->lat = lat;
    d->thresh = thresh;
    d->pid = tep_data_pid(tep, record);
    d->marker_time = record->ts;
    trace_seq_destroy(&seq);
    seq.buffer = NULL;
    /* Stop the iterator, as we will now only search the current CPU */
    return 1;
}
Parse the tracemark and record which CPU it was on
Look for sched_switch
static int find_sched(struct tracecmd_input *handle, struct tep_record *record,
                      int cpu, void *data)
{
    struct tep_handle *tep = tracecmd_get_tep(handle);
    struct tep_event *sched;
    unsigned long long val;
    struct data *d = data;
    int type;

    sched = d->sched;
    type = tep_data_type(tep, record);
    /* Exit if not sched_switch */
    if (type != sched->id)
        return 0;
    /* Only match the switch that scheduled in the task that hit the threshold */
    tep_read_number_field(d->next_pid, record->data, &val);
    if (val != d->pid)
        return 0;
    /* Record the sched_switch time */
    d->sched_time = record->ts;
    return -1;
}
Look for sched_waking
static int find_wakeup(struct tracecmd_input *handle, struct tep_record *record,
                       int cpu, void *data)
{
    struct tep_handle *tep = tracecmd_get_tep(handle);
    struct tep_format_field *waking_pid;
    struct tep_event *waking;
    unsigned long long val;
    struct data *d = data;
    int type;

    waking = d->waking;
    waking_pid = d->waking_pid;
    type = tep_data_type(tep, record);
    if (type != waking->id)
        return 0;
    tep_read_number_field(waking_pid, record->data, &val);
    if (val != d->pid)
        return 0;
    /* Found the wakeup! */
    d->wakeup_time = record->ts;
    return -1;
}
Record wake up time
Look for timer expire event
static int find_timer_expire(struct tracecmd_input *handle, struct tep_record *record,
                             int cpu, void *data)
{
    struct tep_handle *tep = tracecmd_get_tep(handle);
    struct tep_event *timer;
    unsigned long long val;
    struct data *d = data;
    int type;

    timer = d->hrtimer_expire;
    type = tep_data_type(tep, record);
    if (type != timer->id)
        return 0;
    tep_read_number_field(d->hrtimer_expire_ptr, record->data, &val);
    d->hrtimer = val;
    tep_read_number_field(d->timer_now, record->data, &val);
    d->timer_expire = record->ts;
    d->timer_delta = val - record->ts;
    return -1;
}
Calculate the offset between the timer clock ("now") and the ring buffer timestamps
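Worked through with the CPU 1 trace above: timer_delta = now - record->ts captures the offset between the hrtimer clock and the trace clock, so the jitter computed later reduces to now - expires. From that trace: 818051328850788 - 818051328849935 = 853 ns, meaning the timer fired 853 ns late, which matches the "jitter: (853)" line in the report below.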
Look for timer start event
static int find_timer_start(struct tracecmd_input *handle, struct tep_record *record,
                            int cpu, void *data)
{
    struct tep_handle *tep = tracecmd_get_tep(handle);
    struct tep_event *timer;
    unsigned long long val;
    struct data *d = data;
    int type;

    timer = d->hrtimer_start;
    type = tep_data_type(tep, record);
    if (type != timer->id)
        return 0;
    tep_read_number_field(d->hrtimer_start_ptr, record->data, &val);
    if (val != d->hrtimer)
        return 0;
    tep_read_number_field(d->hrtimer_start_expires, record->data, &val);
    d->timer_expires_expect = val - d->timer_delta;
    d->timer_start = record->ts;
    return -1;
}
Calculate expected time vs actual time
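With all the pieces in place, the analyzer builds like any other libtracecmd program. A plausible compile line, assuming the library and its pkg-config file are installed:

~# gcc -o trace-cyclic trace-cyclic.c $(pkg-config --cflags --libs libtracecmd)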
Now analyze the trace.dat file
~# trace-cyclic trace-cyclic.dat
cpu=1 threshold=100 latency=121
expected expire: 818035615415764
timer expired: 818035615416617
jitter: (853)
wake up time: 818035615416993
from timer: (376)
scheduled time: 818035615535531
from wakeup: (118538)
marker time: 818035615546097
from schedule:(10566)
All times are in nanoseconds. The problem is right here: 118538 ns passed between the wakeup and cyclictest actually being scheduled in.
Can analyze any events you want
■ This is just a small example of what you can do
■ Let’s look at locks
■ Using function tracing, trace all locking functions
● function tracing gives you the parent function too
● Can see where a lock was called
■ Trace both the lock and unlock functions
● Can get the latency of how long they are held
Function trace all lock functions
~# trace-cmd start -e sched_switch -e sched_waking -e timer -e irq -e irq_vectors \
   -p function -l '*spin_lock*' -l '*spin_unlock*' -l '*spin_try*' -l '*read_lock*' \
   -l '*read_unlock*' -l '*read_try*' -l '*write_lock*' -l '*write_try*' -l '*write_unlock*' \
   cyclictest -p80 -i250 -a -t -m -d 0 -b 100 --tracemark
plugin 'function'
# /dev/cpu_dma_latency set to 0us
INFO: debugfs mountpoint: /sys/kernel/debug/tracing/
policy: fifo: loadavg: 0.07 0.02 0.00 1/9997 69853
T: 0 (69847) P:80 I:250 C: 27298 Min: 6 Act: 6 Avg: 6 Max: 29
T: 1 (69848) P:80 I:250 C: 27296 Min: 6 Act: 6 Avg: 6 Max: 13
T: 2 (69849) P:80 I:250 C: 27295 Min: 6 Act: 6 Avg: 6 Max: 25
T: 3 (69850) P:80 I:250 C: 27294 Min: 6 Act: 6 Avg: 6 Max: 19
T: 4 (69851) P:80 I:250 C: 27293 Min: 6 Act: 6 Avg: 6 Max: 37
T: 5 (69852) P:80 I:250 C: 27291 Min: 6 Act: 9 Avg: 6 Max: 17
T: 6 (69853) P:80 I:250 C: 27291 Min: 6 Act: 6 Avg: 6 Max: 17
# Thread Ids: 69847 69848 69849 69850 69851 69852 69853
# Break thread: 69850
# Break value: 102
int main (int argc, char **argv)
{
    [..]
    /* Find where cyclictest was woken up */
    tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size,
                                    find_wakeup, &data, true);
    /* Find where the timer was added (the start of this cycle) */
    tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size,
                                    find_timer_start, &data, true);
    /* Now that we know all the times, look for the locks (going forward now) */
    tracecmd_iterate_events(handle, cpu_set, cpu_size,
                            find_locks, &data);
    printf("expected expire: %lld\n", data.timer_expires_expect);
    printf("timer expired: %lld\n", data.timer_expire);
    printf(" jitter: (%lld)\n", data.timer_expire - data.timer_expires_expect);
    printf("wake up time: %lld\n", data.wakeup_time);
    printf(" from timer: (%lld)\n", data.wakeup_time - data.timer_expire);
    printf("scheduled time: %lld\n", data.sched_time);
    printf(" from wakeup: (%lld)\n", data.sched_time - data.wakeup_time);
    printf("marker time: %lld\n", data.marker_time);
    printf(" from schedule:(%lld)\n", data.marker_time - data.sched_time);
Go forward and find all locks
static int find_locks(struct tracecmd_input *handle, struct tep_record *record,
                      int cpu, void *data)
{
    struct tep_handle *tep = tracecmd_get_tep(handle);
    struct data *d = data;
    enum data_state state;

    if (record->ts == d->marker_time)
        return -1;
    if (record->ts < d->timer_expires_expect)
        state = STATE_START;
    else if (record->ts < d->timer_expire)
        state = STATE_EXPECT;
    else if (record->ts < d->wakeup_time)
        state = STATE_EXPIRED;
    else if (record->ts < d->sched_time)
        state = STATE_WAKEUP;
    else
        state = STATE_SCHED;
    handle_locks(tep, record, data, state);
    return 0;
}
Use saved times to know where the lock happened
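The data_state enum is never shown on the slides; from its use in find_locks() it presumably looks like this (the names come from the code above, the ordering is an assumption):

enum data_state {
    STATE_START,    /* before the expected expire time */
    STATE_EXPECT,   /* between expected expire and actual timer expire */
    STATE_EXPIRED,  /* between timer expire and the wakeup */
    STATE_WAKEUP,   /* between the wakeup and the sched_switch */
    STATE_SCHED,    /* between the sched_switch and the marker */
};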
static int handle_locks(struct tep_handle *tep, struct tep_record *record,
                        struct data *data, enum data_state state)
{
    struct lock_stack *lock;
    struct lock_list *llist;
    unsigned long long pip, ip;
    const char *func;
    bool start = true;
    int type;
    int id;

    switch (state) {
    case STATE_START:
        llist = &data->locks_start;
        break;
    case STATE_EXPECT:
        llist = &data->locks_start_expire;
        break;
    case STATE_EXPIRED:
        llist = &data->locks_timer;
        break;
    case STATE_WAKEUP:
        llist = &data->locks_wakeup;
        break;
    case STATE_SCHED:
        llist = &data->locks_sched;
        break;
    }
    type = tep_data_type(tep, record);
    if (!data->function || data->function->id != type)
        return 0;
Save lock information with associated state
    tep_read_number_field(data->function_ip, record->data, &ip);
    tep_read_number_field(data->function_pip, record->data, &pip);
    func = tep_find_function(tep, ip);
    id = find_start_lock(func);
    if (id < 0) {
        id = find_stop_lock(func);
        if (id < 0)
            return 1;
        start = false;
    }
    if (start) {
        push_lock(data, id, pip, record->ts, func);
        return 1;
    }
    /* Pop the locks until we find our id */
    lock = NULL;
    do {
        if (lock)
            free(lock);
        lock = pop_stack(data);
    } while (lock && lock->id != id);
    if (!lock)
        return 1;
    /* Add the total time of this lock */
    end_timer(&llist->list, pip, record->ts, lock->name);
    start_timer(&llist->list, pip, lock->time, NULL);
    free(lock);
    return 1;
}
Calculate how long lock was held
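The lock-stack helpers push_lock() and pop_stack() are not shown in the deck; a minimal sketch of what they could look like, with the struct layout and the lock_stack field on struct data inferred from the call sites (all names here are assumptions):

#include <stdlib.h>

struct lock_stack {
    struct lock_stack *next;
    unsigned long long pip;     /* calling function's address */
    unsigned long long time;    /* timestamp the lock function ran */
    const char *name;           /* name of the lock function */
    int id;                     /* index into the locks[] array */
};

static void push_lock(struct data *data, int id, unsigned long long pip,
                      unsigned long long ts, const char *name)
{
    struct lock_stack *lock = calloc(1, sizeof(*lock));

    if (!lock)
        return;
    lock->id = id;
    lock->pip = pip;
    lock->time = ts;
    lock->name = name;
    lock->next = data->lock_stack;  /* assumed field on struct data */
    data->lock_stack = lock;
}

static struct lock_stack *pop_stack(struct data *data)
{
    struct lock_stack *lock = data->lock_stack;

    if (lock)
        data->lock_stack = lock->next;
    return lock;
}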
All known lock types
static const char *locks[] = {
    "spin_", "raw_spin_", "_raw_spin_", "__raw_spin_",
    "read_", "raw_read_", "_raw_read_", "__raw_read_",
    "write_", "raw_write_", "_raw_write_", "__raw_write_",
};

static int find_start_lock(const char *func)
{
    const char *lock;
    int len;
    int i;

    for (i = 0; i < ARRAY_SIZE(locks); i++) {
        lock = locks[i];
        len = strlen(lock);
        if (strncmp(lock, func, len) != 0)
            continue;
        if (strncmp(func + len, "lock", 4) == 0 ||
            strncmp(func + len, "try", 3) == 0)
            return i;
        break;
    }
    return -1;
}
See if it is a lock (not unlock)
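find_stop_lock() is called from handle_locks() but never shown; presumably it mirrors find_start_lock() and matches the unlock variants instead. A sketch under that assumption:

static int find_stop_lock(const char *func)
{
    const char *lock;
    int len;
    int i;

    for (i = 0; i < ARRAY_SIZE(locks); i++) {
        lock = locks[i];
        len = strlen(lock);
        if (strncmp(lock, func, len) != 0)
            continue;
        /* The unlock counterpart of the same lock prefix */
        if (strncmp(func + len, "unlock", 6) == 0)
            return i;
        break;
    }
    return -1;
}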
int main (int argc, char **argv)
{
    [..]
    tracecmd_iterate_events(handle, cpu_set, cpu_size,
                            find_locks, &data);
    printf("expected expire: %lld\n", data.timer_expires_expect);
    printf("timer expired: %lld\n", data.timer_expire);
    printf(" jitter: (%lld)\n", data.timer_expire - data.timer_expires_expect);
    printf("wake up time: %lld\n", data.wakeup_time);
    printf(" from timer: (%lld)\n", data.wakeup_time - data.timer_expire);
    printf("scheduled time: %lld\n", data.sched_time);
    printf(" from wakeup: (%lld)\n", data.sched_time - data.wakeup_time);
    printf("marker time: %lld\n", data.marker_time);
    printf(" from schedule:(%lld)\n", data.marker_time - data.sched_time);
    print_lock_list(tep, "Locks taken and released from start to expected time:",
                    &data.locks_start);
    print_lock_list(tep, "Locks held between expected time and timer:",
                    &data.locks_start_expire);
    print_lock_list(tep, "Locks taken from timer to wakeup:", &data.locks_timer);
    print_lock_list(tep, "Locks taken from wake up to sched:", &data.locks_wakeup);
    print_lock_list(tep, "Locks taken from sched to print:", &data.locks_sched);
static void print_lock_list(struct tep_handle *tep,
                            const char *msg, struct lock_list *llist)
{
    struct time_list *list = llist->list;
    bool printed = false;

    for (; list; list = list->next) {
        if (!list->total)
            continue;
        if (!printed) {
            printf("\n%s\n", msg);
            printed = true;
        }
        print_lock(tep, list->id);
        printf("total time: %lld (%d time%s) [%s]\n",
               list->total, list->cnt,
               list->cnt != 1 ? "s" : "",
               list->name);
    }
}
Analyze the locks in the trace
~# trace-cyclic trace-cyclic.dat
cpu=4 threshold=100 latency=102
expected expire: 850947083216603
timer expired: 850947083217641
jitter: (1038)
wake up time: 850947083217999
from timer: (358)
scheduled time: 850947083316970
from wakeup: (98971)
marker time: 850947083328710
from schedule:(11740)
Locks taken and released from start to expected time:
do_nanosleep+0x5f total time: 610 (1 time) [_raw_spin_lock_irqsave]
dequeue_task_rt+0x28 total time: 180 (1 time) [_raw_spin_lock]
finish_task_switch.isra.0+0x9b total time: 1450 (2 times) [_raw_spin_lock]
get_next_timer_interrupt+0x7b total time: 155 (7 times) [_raw_spin_lock]
hrtimer_get_next_event+0x47 total time: 140 (7 times) [_raw_spin_lock_irqsave]
hrtimer_next_event_without+0x67 total time: 161 (7 times) [_raw_spin_lock_irqsave]
sched_ttwu_pending+0xed total time: 1522 (1 time) [_raw_spin_lock]
poll_freewait+0x3d total time: 182 (7 times) [_raw_spin_lock_irqsave]
[..]
Biggest difference: 98971 ns between the wakeup and being scheduled in
Analyze the locks in the trace
sigprocmask+0x85 total time: 199 (4 times) [_raw_spin_lock_irq]
n_tty_read+0x217 total time: 198 (2 times) [_raw_spin_lock_irqsave]
n_tty_read+0x5d9 total time: 233 (2 times) [_raw_spin_lock_irqsave]
try_to_wake_up+0x77 total time: 183 (1 time) [_raw_spin_lock_irqsave]
__wake_up_common_lock+0x7e total time: 415 (2 times) [_raw_spin_lock_irqsave]
tcp_poll+0x2b total time: 178 (1 time) [_raw_spin_lock_irqsave]
tcp_sendmsg+0x19 total time: 231 (1 time) [_raw_spin_lock_bh]
nf_conntrack_tcp_packet+0x8e4 total time: 447 (1 time) [_raw_spin_lock_bh]
sch_direct_xmit+0x43 total time: 168 (1 time) [_raw_spin_lock]
Locks held between expected time and timer:
__hrtimer_run_queues+0x120 total time: 456 (1 time) [_raw_spin_lock_irqsave]
Locks taken from wake up to sched:
enqueue_task_rt+0x1e8 total time: 169 (1 time) [_raw_spin_lock]
try_to_wake_up+0x246 total time: 1674 (1 time) [_raw_spin_lock]
try_to_wake_up+0x251 total time: 2346 (1 time) [_raw_spin_lock_irqsave]
hrtimer_interrupt+0x11d total time: 222 (1 time) [_raw_spin_lock_irq]
sch_direct_xmit+0x13a total time: 168948 (1 time) [_raw_spin_lock]
__dev_queue_xmit+0x7fe total time: 279 (1 time) [_raw_spin_lock]
rcu_note_context_switch+0x386 total time: 237 (1 time) [_raw_spin_lock]
Locks taken from sched to print:
finish_task_switch.isra.0+0x9b total time: 2056 (1 time) [_raw_spin_lock]
Long lock held! The lock taken at sch_direct_xmit+0x13a was held for 168948 ns
Questions?
Code is here:
https://www.trace-cmd.org
https://rostedt.org/code/trace-cyclic.c
Steven Rostedt
rostedt@goodmis.org
@rostedt@social.kernel.org
github.com/rostedt
Thank you! Let’s connect.

  • 1. HOSTED BY Using Libtracecmd to Analyze Latency and Performance Troubles Steven Rostedt Software Engineer
  • 2. Steven Rostedt Software Engineer ■ One of the original authors of the PREEMPT_RT patch set ■ Creator and maintainer of ftrace ■ Creator of “make localmodconfig” ■ Creator and maintainer of “ktest.pl” Linux testing framework
  • 3. Tracing kernel latency on your application ■ The kernel is a black box
  • 4. Tracing kernel latency on your application ■ The kernel is a black box ● Your application is at the whims of the kernel scheduler
  • 5. Tracing kernel latency on your application ■ The kernel is a black box ● Your application is at the whims of the kernel scheduler ● Interrupts can cause delays in your application
  • 6. Tracing kernel latency on your application ■ The kernel is a black box ● Your application is at the whims of the kernel scheduler ● Interrupts can cause delays in your application ● Kernel lock contention could add to the latency
  • 7. Tracing kernel latency on your application ■ The kernel is a black box ● Your application is at the whims of the kernel scheduler ● Interrupts can cause delays in your application ● Kernel lock contention could add to the latency ■ Tracing can give you insight into the happenings of the kernel ● Monitor the scheduling decisions the kernel is making ● Record when interrupts are happening and for how long ● See how long kernel locks are held
  • 8. Using cyclictest ■ cyclictest tests the latency of the system ● Calls nanosleep() and sees when it woke up to when it expected to wake up
  • 9. Using cyclictest ■ cyclictest tests the latency of the system ● Calls nanosleep() and sees when it woke up to when it expected to wake up ■ The Real Time Linux developers use this to test the jitter of the system ● Loads are run while cyclictest is measuring the latency ● Requires the latency to stay below a threshold ● May run for weeks or months
  • 10. Using cyclictest ■ cyclictest tests the latency of the system ● Calls nanosleep() and sees when it woke up to when it expected to wake up ■ The Real Time Linux developers use this to test the jitter of the system ● Loads are run while cyclictest is measuring the latency ● Requires the latency to stay below a threshold ● May run for weeks or months ■ I will use cyclictest as a example for this talk
  • 11. Using cyclictest ■ cyclictest tests the latency of the system ● Calls nanosleep() and sees when it woke up to when it expected to wake up ■ The Real Time Linux developers use this to test the jitter of the system ● Loads are run while cyclictest is measuring the latency ● Requires the latency to stay below a threshold ● May run for weeks or months ■ I will use cyclictest as a example for this talk ● But this works for any application
  • 12. A typical cyclictest run ~# cyclictest -p80 -i250 -a -t -m -d 0 # /dev/cpu_dma_latency set to 0us policy: fifo: loadavg: 0.00 0.00 0.00 1/9997 68021 T: 0 (68015) P:80 I:250 C: 18570 Min: 3 Act: 3 Avg: 3 Max: 7 T: 1 (68016) P:80 I:250 C: 18569 Min: 3 Act: 3 Avg: 3 Max: 5 T: 2 (68017) P:80 I:250 C: 18569 Min: 3 Act: 3 Avg: 3 Max: 9 T: 3 (68018) P:80 I:250 C: 18568 Min: 3 Act: 3 Avg: 3 Max: 6 T: 4 (68019) P:80 I:250 C: 18568 Min: 3 Act: 3 Avg: 3 Max: 5 T: 5 (68020) P:80 I:250 C: 18567 Min: 3 Act: 3 Avg: 3 Max: 95 T: 6 (68021) P:80 I:250 C: 18567 Min: 3 Act: 3 Avg: 3 Max: 9
  • 13. A typical cyclictest run ~# cyclictest -p80 -i250 -a -t -m -d 0 # /dev/cpu_dma_latency set to 0us policy: fifo: loadavg: 0.00 0.00 0.00 1/9997 68021 T: 0 (68015) P:80 I:250 C: 18570 Min: 3 Act: 3 Avg: 3 Max: 7 T: 1 (68016) P:80 I:250 C: 18569 Min: 3 Act: 3 Avg: 3 Max: 5 T: 2 (68017) P:80 I:250 C: 18569 Min: 3 Act: 3 Avg: 3 Max: 9 T: 3 (68018) P:80 I:250 C: 18568 Min: 3 Act: 3 Avg: 3 Max: 6 T: 4 (68019) P:80 I:250 C: 18568 Min: 3 Act: 3 Avg: 3 Max: 5 T: 5 (68020) P:80 I:250 C: 18567 Min: 3 Act: 3 Avg: 3 Max: 95 T: 6 (68021) P:80 I:250 C: 18567 Min: 3 Act: 3 Avg: 3 Max: 9 -p PRIO --priority=PRIO priority of highest prio thread -i INTV --interval=INTV base interval of thread in us default=1000 -a [CPUSET] --affinity Run thread #N on processor #N, if possible, or if CPUSET given, pin threads to that set of processors in round- robin order. -t --threads one thread per available processor -m --mlockall lock current and future memory allocations -d DIST --distance=DIST distance of thread intervals in us, default=500
  • 14. cyclictest and tracing ■ Can break break when a latency is greater than a given threshold
  • 15. cyclictest and tracing ■ Can break break when a latency is greater than a given threshold ■ Will write into the kernel tracing buffer
  • 16. cyclictest and tracing ■ Can break break when a latency is greater than a given threshold ■ Will write into the kernel tracing buffer -b USEC --breaktrace=USEC send “breaktrace” command when latency > USEC --tracemark write a “tracemark” when -b latency is exceeded
  • 17. Break when greater than 100 microseconds ~# cyclictest -p80 -i250 -a -t -m -d 0 -b 100 --tracemark # /dev/cpu_dma_latency set to 0us INFO: debugfs mountpoint: /sys/kernel/debug/tracing/ policy: fifo: loadavg: 0.00 0.00 0.00 1/9998 68073 T: 0 (68066) P:80 I:250 C: 91103 Min: 3 Act: 3 Avg: 3 Max: 6 T: 1 (68067) P:80 I:250 C: 91103 Min: 3 Act: 3 Avg: 3 Max: 7 T: 2 (68068) P:80 I:250 C: 91102 Min: 3 Act: 3 Avg: 3 Max: 10 T: 3 (68069) P:80 I:250 C: 91101 Min: 3 Act: 3 Avg: 3 Max: 8 T: 4 (68070) P:80 I:250 C: 91101 Min: 3 Act: 3 Avg: 3 Max: 9 T: 5 (68071) P:80 I:250 C: 91100 Min: 3 Act: 3 Avg: 3 Max: 11 T: 6 (68072) P:80 I:250 C: 91100 Min: 3 Act: 3 Avg: 3 Max: 47 # Thread Ids: 68066 68067 68068 68069 68070 68071 68072 # Break thread: 68066 # Break value: 561
  • 18. Show the trace output with trace-cmd ~# trace-cmd show # tracer: nop # # entries-in-buffer/entries-written: 1/1 #P:8 # # _-----=> irqs-off/BH-disabled # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / _-=> migrate-disable # |||| / delay # TASK-PID CPU# ||||| TIMESTAMP FUNCTION # | | | ||||| | | <...>-68066 [000] ..... 816424.679452: tracing_mark_write: hit latency threshold (561 > 100)
  • 19. Show the trace output with trace-cmd ~# trace-cmd show # tracer: nop # # entries-in-buffer/entries-written: 1/1 #P:8 # # _-----=> irqs-off/BH-disabled # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / _-=> migrate-disable # |||| / delay # TASK-PID CPU# ||||| TIMESTAMP FUNCTION # | | | ||||| | | <...>-68066 [000] ..... 816424.679452: tracing_mark_write: hit latency threshold (561 > 100) 561 us latency
  • 20. Tracing cyclictest ~# trace-cmd start -e sched_switch -e sched_waking -e timer -e irq -e irq_vectors cyclictest -p80 -i250 -a -t -m -d 0 -b 100 --tracemark
  • 21. Tracing cyclictest ~# trace-cmd start -e sched_switch -e sched_waking -e timer -e irq -e irq_vectors cyclictest -p80 -i250 -a -t -m -d 0 -b 100 --tracemark Scheduling events
  • 22. Tracing cyclictest ~# trace-cmd start -e sched_switch -e sched_waking -e timer -e irq -e irq_vectors cyclictest -p80 -i250 -a -t -m -d 0 -b 100 --tracemark Timer events
  • 23. Tracing cyclictest ~# trace-cmd start -e sched_switch -e sched_waking -e timer -e irq -e irq_vectors cyclictest -p80 -i250 -a -t -m -d 0 -b 100 --tracemark Interrupt events
  • 24. Tracing cyclictest ~# trace-cmd start -e sched_switch -e sched_waking -e timer -e irq -e irq_vectors cyclictest -p80 -i250 -a -t -m -d 0 -b 100 --tracemark # /dev/cpu_dma_latency set to 0us INFO: debugfs mountpoint: /sys/kernel/debug/tracing/ policy: fifo: loadavg: 0.03 0.01 0.00 1/9998 68166 T: 0 (68160) P:80 I:250 C: 226450 Min: 4 Act: 5 Avg: 4 Max: 8 T: 1 (68161) P:80 I:250 C: 226449 Min: 4 Act: 5 Avg: 4 Max: 52 T: 2 (68162) P:80 I:250 C: 226448 Min: 4 Act: 5 Avg: 4 Max: 11 T: 3 (68163) P:80 I:250 C: 226448 Min: 4 Act: 5 Avg: 4 Max: 10 T: 4 (68164) P:80 I:250 C: 226448 Min: 4 Act: 5 Avg: 4 Max: 12 T: 5 (68165) P:80 I:250 C: 226447 Min: 4 Act: 5 Avg: 5 Max: 48 T: 6 (68166) P:80 I:250 C: 226446 Min: 4 Act: 4 Avg: 4 Max: 11 # Thread Ids: 68160 68161 68162 68163 68164 68165 68166 # Break thread: 68161 # Break value: 121
  • 25. Tracing cyclictest ~# trace-cmd show | tail -20 <idle>-0 [006] d.h1. 818035.615492: hrtimer_expire_entry: hrtimer=00000000d57f52c5 function=hrtimer_wakeup now=8 18051328925800 <idle>-0 [006] d.h3. 818035.615492: sched_waking: comm=cyclictest pid=68165 prio=19 target_cpu=006 <idle>-0 [006] dNh1. 818035.615493: hrtimer_expire_exit: hrtimer=00000000d57f52c5 <idle>-0 [006] dNh1. 818035.615493: local_timer_exit: vector=236 <idle>-0 [006] d..2. 818035.615495: sched_switch: prev_comm=swapper/6 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=cyclictest next_pid=68165 next_prio=19 cyclictest-68165 [006] ..... 818035.615496: hrtimer_init: hrtimer=00000000d57f52c5 clockid=CLOCK_MONOTONIC mode=ABS cyclictest-68165 [006] d..1. 818035.615496: hrtimer_start: hrtimer=00000000d57f52c5 function=hrtimer_wakeup expires=8180 51329174969 softexpires=818051329174969 mode=ABS cyclictest-68165 [006] d..2. 818035.615500: sched_switch: prev_comm=cyclictest prev_pid=68165 prev_prio=19 prev_state=S ==> next_comm=swapper/6 next_pid=0 next_prio=120 <idle>-0 [000] d.h1. 818035.615509: local_timer_entry: vector=236 <idle>-0 [000] d.h2. 818035.615509: hrtimer_cancel: hrtimer=00000000b12dfc3a <idle>-0 [000] d.h1. 818035.615509: hrtimer_expire_entry: hrtimer=00000000b12dfc3a function=hrtimer_wakeup now=8 18051328943482 <idle>-0 [000] d.h3. 818035.615509: sched_waking: comm=cyclictest pid=68160 prio=19 target_cpu=000 <idle>-0 [000] dNh1. 818035.615511: hrtimer_expire_exit: hrtimer=00000000b12dfc3a <idle>-0 [000] dNh1. 818035.615511: local_timer_exit: vector=236 <idle>-0 [000] d..2. 818035.615513: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=cyclictest next_pid=68160 next_prio=19 cyclictest-68160 [000] ..... 818035.615514: hrtimer_init: hrtimer=00000000b12dfc3a clockid=CLOCK_MONOTONIC mode=ABS cyclictest-68160 [000] d..1. 818035.615515: hrtimer_start: hrtimer=00000000b12dfc3a function=hrtimer_wakeup expires=8180 51329192610 softexpires=818051329192610 mode=ABS cyclictest-68160 [000] d..2. 818035.615519: sched_switch: prev_comm=cyclictest prev_pid=68160 prev_prio=19 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120 sshd-51820 [001] d..2. 818035.615536: sched_switch: prev_comm=sshd prev_pid=51820 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest next_pid=68161 next_prio=19 cyclictest-68161 [001] ..... 818035.615546: tracing_mark_write: hit latency threshold (121 > 100)
  • 26. Tracing cyclictest ~# trace-cmd export -o trace-cyclic.dat
  • 27. Tracing cyclictest ~# trace-cmd export -o trace-cyclic.dat ~# trace-cmd report -l trace-cyclic.dat | tail -20 <idle>-0 6d.h1. 818035.615492: hrtimer_expire_entry: hrtimer=0xffffa2c26d5ffe70 now=818051328925800 function=hrtimer_wakeup/0x0 <idle>-0 6d.h3. 818035.615492: sched_waking: comm=cyclictest pid=68165 prio=19 target_cpu=006 <idle>-0 6dNh1. 818035.615493: hrtimer_expire_exit: hrtimer=0xffffa2c26d5ffe70 <idle>-0 6dNh1. 818035.615493: local_timer_exit: vector=236 <idle>-0 6d..2. 818035.615495: sched_switch: swapper/6:0 [120] R ==> cyclictest:68165 [19] cyclicte-68165 6..... 818035.615496: hrtimer_init: hrtimer=0xffffa2c26d5ffe70 clockid=CLOCK_MONOTONIC mode=0x0 cyclicte-68165 6d..1. 818035.615496: hrtimer_start: hrtimer=0xffffa2c26d5ffe70 function=hrtimer_wakeup/0x0 expires=818051329174969 softexpires=818051329174969 cyclicte-68165 6d..2. 818035.615500: sched_switch: cyclictest:68165 [19] S ==> swapper/6:0 [120] <idle>-0 0d.h1. 818035.615509: local_timer_entry: vector=236 <idle>-0 0d.h2. 818035.615509: hrtimer_cancel: hrtimer=0xffffa2c26d5efe70 <idle>-0 0d.h1. 818035.615509: hrtimer_expire_entry: hrtimer=0xffffa2c26d5efe70 now=818051328943482 function=hrtimer_wakeup/0x0 <idle>-0 0d.h3. 818035.615509: sched_waking: comm=cyclictest pid=68160 prio=19 target_cpu=000 <idle>-0 0dNh1. 818035.615511: hrtimer_expire_exit: hrtimer=0xffffa2c26d5efe70 <idle>-0 0dNh1. 818035.615511: local_timer_exit: vector=236 <idle>-0 0d..2. 818035.615513: sched_switch: swapper/0:0 [120] R ==> cyclictest:68160 [19] cyclicte-68160 0..... 818035.615514: hrtimer_init: hrtimer=0xffffa2c26d5efe70 clockid=CLOCK_MONOTONIC mode=0x0 cyclicte-68160 0d..1. 818035.615515: hrtimer_start: hrtimer=0xffffa2c26d5efe70 function=hrtimer_wakeup/0x0 expires=818051329192610 softexpires=818051329192610 cyclicte-68160 0d..2. 818035.615519: sched_switch: cyclictest:68160 [19] S ==> swapper/0:0 [120] sshd-51820 1d..2. 818035.615536: sched_switch: sshd:51820 [120] R ==> cyclictest:68161 [19] cyclicte-68161 1..... 818035.615546: print: tracing_mark_write: hit latency threshold (121 > 100)
  • 28. Tracing cyclictest ~# trace-cmd export -o trace-cyclic.dat ~# trace-cmd report -l trace-cyclic.dat | tail -20 <idle>-0 6d.h1. 818035.615492: hrtimer_expire_entry: hrtimer=0xffffa2c26d5ffe70 now=818051328925800 function=hrtimer_wakeup/0x0 <idle>-0 6d.h3. 818035.615492: sched_waking: comm=cyclictest pid=68165 prio=19 target_cpu=006 <idle>-0 6dNh1. 818035.615493: hrtimer_expire_exit: hrtimer=0xffffa2c26d5ffe70 <idle>-0 6dNh1. 818035.615493: local_timer_exit: vector=236 <idle>-0 6d..2. 818035.615495: sched_switch: swapper/6:0 [120] R ==> cyclictest:68165 [19] cyclicte-68165 6..... 818035.615496: hrtimer_init: hrtimer=0xffffa2c26d5ffe70 clockid=CLOCK_MONOTONIC mode=0x0 cyclicte-68165 6d..1. 818035.615496: hrtimer_start: hrtimer=0xffffa2c26d5ffe70 function=hrtimer_wakeup/0x0 expires=818051329174969 softexpires=818051329174969 cyclicte-68165 6d..2. 818035.615500: sched_switch: cyclictest:68165 [19] S ==> swapper/6:0 [120] <idle>-0 0d.h1. 818035.615509: local_timer_entry: vector=236 <idle>-0 0d.h2. 818035.615509: hrtimer_cancel: hrtimer=0xffffa2c26d5efe70 <idle>-0 0d.h1. 818035.615509: hrtimer_expire_entry: hrtimer=0xffffa2c26d5efe70 now=818051328943482 function=hrtimer_wakeup/0x0 <idle>-0 0d.h3. 818035.615509: sched_waking: comm=cyclictest pid=68160 prio=19 target_cpu=000 <idle>-0 0dNh1. 818035.615511: hrtimer_expire_exit: hrtimer=0xffffa2c26d5efe70 <idle>-0 0dNh1. 818035.615511: local_timer_exit: vector=236 <idle>-0 0d..2. 818035.615513: sched_switch: swapper/0:0 [120] R ==> cyclictest:68160 [19] cyclicte-68160 0..... 818035.615514: hrtimer_init: hrtimer=0xffffa2c26d5efe70 clockid=CLOCK_MONOTONIC mode=0x0 cyclicte-68160 0d..1. 818035.615515: hrtimer_start: hrtimer=0xffffa2c26d5efe70 function=hrtimer_wakeup/0x0 expires=818051329192610 softexpires=818051329192610 cyclicte-68160 0d..2. 818035.615519: sched_switch: cyclictest:68160 [19] S ==> swapper/0:0 [120] sshd-51820 1d..2. 818035.615536: sched_switch: sshd:51820 [120] R ==> cyclictest:68161 [19] cyclicte-68161 1..... 818035.615546: print: tracing_mark_write: hit latency threshold (121 > 100) cyclictest marker
  • 29. Tracing cyclictest ~# trace-cmd export -o trace-cyclic.dat ~# trace-cmd report -l trace-cyclic.dat | tail -20 <idle>-0 6d.h1. 818035.615492: hrtimer_expire_entry: hrtimer=0xffffa2c26d5ffe70 now=818051328925800 function=hrtimer_wakeup/0x0 <idle>-0 6d.h3. 818035.615492: sched_waking: comm=cyclictest pid=68165 prio=19 target_cpu=006 <idle>-0 6dNh1. 818035.615493: hrtimer_expire_exit: hrtimer=0xffffa2c26d5ffe70 <idle>-0 6dNh1. 818035.615493: local_timer_exit: vector=236 <idle>-0 6d..2. 818035.615495: sched_switch: swapper/6:0 [120] R ==> cyclictest:68165 [19] cyclicte-68165 6..... 818035.615496: hrtimer_init: hrtimer=0xffffa2c26d5ffe70 clockid=CLOCK_MONOTONIC mode=0x0 cyclicte-68165 6d..1. 818035.615496: hrtimer_start: hrtimer=0xffffa2c26d5ffe70 function=hrtimer_wakeup/0x0 expires=818051329174969 softexpires=818051329174969 cyclicte-68165 6d..2. 818035.615500: sched_switch: cyclictest:68165 [19] S ==> swapper/6:0 [120] <idle>-0 0d.h1. 818035.615509: local_timer_entry: vector=236 <idle>-0 0d.h2. 818035.615509: hrtimer_cancel: hrtimer=0xffffa2c26d5efe70 <idle>-0 0d.h1. 818035.615509: hrtimer_expire_entry: hrtimer=0xffffa2c26d5efe70 now=818051328943482 function=hrtimer_wakeup/0x0 <idle>-0 0d.h3. 818035.615509: sched_waking: comm=cyclictest pid=68160 prio=19 target_cpu=000 <idle>-0 0dNh1. 818035.615511: hrtimer_expire_exit: hrtimer=0xffffa2c26d5efe70 <idle>-0 0dNh1. 818035.615511: local_timer_exit: vector=236 <idle>-0 0d..2. 818035.615513: sched_switch: swapper/0:0 [120] R ==> cyclictest:68160 [19] cyclicte-68160 0..... 818035.615514: hrtimer_init: hrtimer=0xffffa2c26d5efe70 clockid=CLOCK_MONOTONIC mode=0x0 cyclicte-68160 0d..1. 818035.615515: hrtimer_start: hrtimer=0xffffa2c26d5efe70 function=hrtimer_wakeup/0x0 expires=818051329192610 softexpires=818051329192610 cyclicte-68160 0d..2. 818035.615519: sched_switch: cyclictest:68160 [19] S ==> swapper/0:0 [120] sshd-51820 1d..2. 818035.615536: sched_switch: sshd:51820 [120] R ==> cyclictest:68161 [19] cyclicte-68161 1..... 818035.615546: print: tracing_mark_write: hit latency threshold (121 > 100) CPU 1
  • 30. Tracing cyclictest ># trace-cmd report -l --cpu 1 trace-cyclic.dat | tail -20 cyclicte-68161 1d..2. 818035.614923: sched_switch: cyclictest:68161 [19] S ==> swapper/1:0 [120] <idle>-0 1d.h2. 818035.615166: local_timer_entry: vector=236 <idle>-0 1d.h3. 818035.615166: hrtimer_cancel: hrtimer=0xffffa2c26d5f7e70 <idle>-0 1d.h2. 818035.615166: hrtimer_expire_entry: hrtimer=0xffffa2c26d5f7e70 now=818051328600710 function=hrtimer_wakeup/0x0 <idle>-0 1d.h4. 818035.615167: sched_waking: comm=cyclictest pid=68161 prio=19 target_cpu=001 <idle>-0 1dNh2. 818035.615168: hrtimer_expire_exit: hrtimer=0xffffa2c26d5f7e70 <idle>-0 1dNh2. 818035.615168: local_timer_exit: vector=236 <idle>-0 1d..2. 818035.615170: sched_switch: swapper/1:0 [120] R ==> cyclictest:68161 [19] cyclicte-68161 1..... 818035.615171: hrtimer_init: hrtimer=0xffffa2c26d5f7e70 clockid=CLOCK_MONOTONIC mode=0x0 cyclicte-68161 1d..1. 818035.615171: hrtimer_start: hrtimer=0xffffa2c26d5f7e70 function=hrtimer_wakeup/0x0 expires=818051328849935 softexpires=818051328849935 cyclicte-68161 1d..2. 818035.615173: sched_switch: cyclictest:68161 [19] S ==> swapper/1:0 [120] <idle>-0 1d..2. 818035.615339: sched_switch: swapper/1:0 [120] R ==> sshd:51820 [120] sshd-51820 1d.h1. 818035.615416: local_timer_entry: vector=236 sshd-51820 1d.h2. 818035.615416: hrtimer_cancel: hrtimer=0xffffa2c26d5f7e70 sshd-51820 1d.h1. 818035.615417: hrtimer_expire_entry: hrtimer=0xffffa2c26d5f7e70 now=818051328850788 function=hrtimer_wakeup/0x0 sshd-51820 1d.h3. 818035.615417: sched_waking: comm=cyclictest pid=68161 prio=19 target_cpu=001 sshd-51820 1dNh1. 818035.615419: hrtimer_expire_exit: hrtimer=0xffffa2c26d5f7e70 sshd-51820 1dNh1. 818035.615419: local_timer_exit: vector=236 sshd-51820 1d..2. 818035.615536: sched_switch: sshd:51820 [120] R ==> cyclictest:68161 [19] cyclicte-68161 1..... 818035.615546: print: tracing_mark_write: hit latency threshold (121 > 100)
  • 31. Tracing cyclictest ># trace-cmd report -l --cpu 1 trace-cyclic.dat | tail -20 cyclicte-68161 1d..2. 818035.614923: sched_switch: cyclictest:68161 [19] S ==> swapper/1:0 [120] <idle>-0 1d.h2. 818035.615166: local_timer_entry: vector=236 <idle>-0 1d.h3. 818035.615166: hrtimer_cancel: hrtimer=0xffffa2c26d5f7e70 <idle>-0 1d.h2. 818035.615166: hrtimer_expire_entry: hrtimer=0xffffa2c26d5f7e70 now=818051328600710 function=hrtimer_wakeup/0x0 <idle>-0 1d.h4. 818035.615167: sched_waking: comm=cyclictest pid=68161 prio=19 target_cpu=001 <idle>-0 1dNh2. 818035.615168: hrtimer_expire_exit: hrtimer=0xffffa2c26d5f7e70 <idle>-0 1dNh2. 818035.615168: local_timer_exit: vector=236 <idle>-0 1d..2. 818035.615170: sched_switch: swapper/1:0 [120] R ==> cyclictest:68161 [19] cyclicte-68161 1..... 818035.615171: hrtimer_init: hrtimer=0xffffa2c26d5f7e70 clockid=CLOCK_MONOTONIC mode=0x0 cyclicte-68161 1d..1. 818035.615171: hrtimer_start: hrtimer=0xffffa2c26d5f7e70 function=hrtimer_wakeup/0x0 expires=818051328849935 softexpires=818051328849935 cyclicte-68161 1d..2. 818035.615173: sched_switch: cyclictest:68161 [19] S ==> swapper/1:0 [120] <idle>-0 1d..2. 818035.615339: sched_switch: swapper/1:0 [120] R ==> sshd:51820 [120] sshd-51820 1d.h1. 818035.615416: local_timer_entry: vector=236 sshd-51820 1d.h2. 818035.615416: hrtimer_cancel: hrtimer=0xffffa2c26d5f7e70 sshd-51820 1d.h1. 818035.615417: hrtimer_expire_entry: hrtimer=0xffffa2c26d5f7e70 now=818051328850788 function=hrtimer_wakeup/0x0 sshd-51820 1d.h3. 818035.615417: sched_waking: comm=cyclictest pid=68161 prio=19 target_cpu=001 sshd-51820 1dNh1. 818035.615419: hrtimer_expire_exit: hrtimer=0xffffa2c26d5f7e70 sshd-51820 1dNh1. 818035.615419: local_timer_exit: vector=236 sshd-51820 1d..2. 818035.615536: sched_switch: sshd:51820 [120] R ==> cyclictest:68161 [19] cyclicte-68161 1..... 818035.615546: print: tracing_mark_write: hit latency threshold (121 > 100) CPU 1
  • 32. Tracing cyclictest ># trace-cmd report -l --cpu 1 trace-cyclic.dat | tail -20 cyclicte-68161 1d..2. 818035.614923: sched_switch: cyclictest:68161 [19] S ==> swapper/1:0 [120] <idle>-0 1d.h2. 818035.615166: local_timer_entry: vector=236 <idle>-0 1d.h3. 818035.615166: hrtimer_cancel: hrtimer=0xffffa2c26d5f7e70 <idle>-0 1d.h2. 818035.615166: hrtimer_expire_entry: hrtimer=0xffffa2c26d5f7e70 now=818051328600710 function=hrtimer_wakeup/0x0 <idle>-0 1d.h4. 818035.615167: sched_waking: comm=cyclictest pid=68161 prio=19 target_cpu=001 <idle>-0 1dNh2. 818035.615168: hrtimer_expire_exit: hrtimer=0xffffa2c26d5f7e70 <idle>-0 1dNh2. 818035.615168: local_timer_exit: vector=236 <idle>-0 1d..2. 818035.615170: sched_switch: swapper/1:0 [120] R ==> cyclictest:68161 [19] cyclicte-68161 1..... 818035.615171: hrtimer_init: hrtimer=0xffffa2c26d5f7e70 clockid=CLOCK_MONOTONIC mode=0x0 cyclicte-68161 1d..1. 818035.615171: hrtimer_start: hrtimer=0xffffa2c26d5f7e70 function=hrtimer_wakeup/0x0 expires=818051328849935 softexpires=818051328849935 cyclicte-68161 1d..2. 818035.615173: sched_switch: cyclictest:68161 [19] S ==> swapper/1:0 [120] <idle>-0 1d..2. 818035.615339: sched_switch: swapper/1:0 [120] R ==> sshd:51820 [120] sshd-51820 1d.h1. 818035.615416: local_timer_entry: vector=236 sshd-51820 1d.h2. 818035.615416: hrtimer_cancel: hrtimer=0xffffa2c26d5f7e70 sshd-51820 1d.h1. 818035.615417: hrtimer_expire_entry: hrtimer=0xffffa2c26d5f7e70 now=818051328850788 function=hrtimer_wakeup/0x0 sshd-51820 1d.h3. 818035.615417: sched_waking: comm=cyclictest pid=68161 prio=19 target_cpu=001 sshd-51820 1dNh1. 818035.615419: hrtimer_expire_exit: hrtimer=0xffffa2c26d5f7e70 sshd-51820 1dNh1. 818035.615419: local_timer_exit: vector=236 sshd-51820 1d..2. 818035.615536: sched_switch: sshd:51820 [120] R ==> cyclictest:68161 [19] cyclicte-68161 1..... 818035.615546: print: tracing_mark_write: hit latency threshold (121 > 100) scheduled in
  • 33. Tracing cyclictest ># trace-cmd report -l --cpu 1 trace-cyclic.dat | tail -20 cyclicte-68161 1d..2. 818035.614923: sched_switch: cyclictest:68161 [19] S ==> swapper/1:0 [120] <idle>-0 1d.h2. 818035.615166: local_timer_entry: vector=236 <idle>-0 1d.h3. 818035.615166: hrtimer_cancel: hrtimer=0xffffa2c26d5f7e70 <idle>-0 1d.h2. 818035.615166: hrtimer_expire_entry: hrtimer=0xffffa2c26d5f7e70 now=818051328600710 function=hrtimer_wakeup/0x0 <idle>-0 1d.h4. 818035.615167: sched_waking: comm=cyclictest pid=68161 prio=19 target_cpu=001 <idle>-0 1dNh2. 818035.615168: hrtimer_expire_exit: hrtimer=0xffffa2c26d5f7e70 <idle>-0 1dNh2. 818035.615168: local_timer_exit: vector=236 <idle>-0 1d..2. 818035.615170: sched_switch: swapper/1:0 [120] R ==> cyclictest:68161 [19] cyclicte-68161 1..... 818035.615171: hrtimer_init: hrtimer=0xffffa2c26d5f7e70 clockid=CLOCK_MONOTONIC mode=0x0 cyclicte-68161 1d..1. 818035.615171: hrtimer_start: hrtimer=0xffffa2c26d5f7e70 function=hrtimer_wakeup/0x0 expires=818051328849935 softexpires=818051328849935 cyclicte-68161 1d..2. 818035.615173: sched_switch: cyclictest:68161 [19] S ==> swapper/1:0 [120] <idle>-0 1d..2. 818035.615339: sched_switch: swapper/1:0 [120] R ==> sshd:51820 [120] sshd-51820 1d.h1. 818035.615416: local_timer_entry: vector=236 sshd-51820 1d.h2. 818035.615416: hrtimer_cancel: hrtimer=0xffffa2c26d5f7e70 sshd-51820 1d.h1. 818035.615417: hrtimer_expire_entry: hrtimer=0xffffa2c26d5f7e70 now=818051328850788 function=hrtimer_wakeup/0x0 sshd-51820 1d.h3. 818035.615417: sched_waking: comm=cyclictest pid=68161 prio=19 target_cpu=001 sshd-51820 1dNh1. 818035.615419: hrtimer_expire_exit: hrtimer=0xffffa2c26d5f7e70 sshd-51820 1dNh1. 818035.615419: local_timer_exit: vector=236 sshd-51820 1d..2. 818035.615536: sched_switch: sshd:51820 [120] R ==> cyclictest:68161 [19] cyclicte-68161 1..... 818035.615546: print: tracing_mark_write: hit latency threshold (121 > 100) Woke up
  • 34. ># trace-cmd report -l --cpu 1 trace-cyclic.dat | tail -20 cyclicte-68161 1d..2. 818035.614923: sched_switch: cyclictest:68161 [19] S ==> swapper/1:0 [120] <idle>-0 1d.h2. 818035.615166: local_timer_entry: vector=236 <idle>-0 1d.h3. 818035.615166: hrtimer_cancel: hrtimer=0xffffa2c26d5f7e70 <idle>-0 1d.h2. 818035.615166: hrtimer_expire_entry: hrtimer=0xffffa2c26d5f7e70 now=818051328600710 function=hrtimer_wakeup/0x0 <idle>-0 1d.h4. 818035.615167: sched_waking: comm=cyclictest pid=68161 prio=19 target_cpu=001 <idle>-0 1dNh2. 818035.615168: hrtimer_expire_exit: hrtimer=0xffffa2c26d5f7e70 <idle>-0 1dNh2. 818035.615168: local_timer_exit: vector=236 <idle>-0 1d..2. 818035.615170: sched_switch: swapper/1:0 [120] R ==> cyclictest:68161 [19] cyclicte-68161 1..... 818035.615171: hrtimer_init: hrtimer=0xffffa2c26d5f7e70 clockid=CLOCK_MONOTONIC mode=0x0 cyclicte-68161 1d..1. 818035.615171: hrtimer_start: hrtimer=0xffffa2c26d5f7e70 function=hrtimer_wakeup/0x0 expires=818051328849935 softexpires=818051328849935 cyclicte-68161 1d..2. 818035.615173: sched_switch: cyclictest:68161 [19] S ==> swapper/1:0 [120] <idle>-0 1d..2. 818035.615339: sched_switch: swapper/1:0 [120] R ==> sshd:51820 [120] sshd-51820 1d.h1. 818035.615416: local_timer_entry: vector=236 sshd-51820 1d.h2. 818035.615416: hrtimer_cancel: hrtimer=0xffffa2c26d5f7e70 sshd-51820 1d.h1. 818035.615417: hrtimer_expire_entry: hrtimer=0xffffa2c26d5f7e70 now=818051328850788 function=hrtimer_wakeup/0x0 sshd-51820 1d.h3. 818035.615417: sched_waking: comm=cyclictest pid=68161 prio=19 target_cpu=001 sshd-51820 1dNh1. 818035.615419: hrtimer_expire_exit: hrtimer=0xffffa2c26d5f7e70 sshd-51820 1dNh1. 818035.615419: local_timer_exit: vector=236 sshd-51820 1d..2. 818035.615536: sched_switch: sshd:51820 [120] R ==> cyclictest:68161 [19] cyclicte-68161 1..... 818035.615546: print: tracing_mark_write: hit latency threshold (121 > 100) Tracing cyclictest Current time Time to wake up
  • 35. libtracecmd to read the trace.dat file ■ Automate the latency process
  • 36. libtracecmd to read the trace.dat file ■ Automate the latency process ■ First search from the end of the trace ● Find the marker or flag that tells you where the problem happened
  • 37. libtracecmd to read the trace.dat file ■ Automate the latency process ■ First search from the end of the trace ● Find the marker or flag that tells you where the problem happened ■ Continue backwards looking for other events ■ Search forward to collect timings
  • 38. int main (int argc, char **argv) { struct tracecmd_input *handle; struct tep_handle *tep; struct data data; int ret; handle = tracecmd_open(argv[1], 0); tep = tracecmd_get_tep(handle); init_data(tep, &data); tracecmd_iterate_events_reverse(handle, NULL, 0, find_trace_marker, &data, false); printf("cpu=%d threshold=%d latency=%dnn", data.cpu, data.thresh, data.lat); /* Now we know what CPU it is on, look at just this CPU */ cpus = tep_get_cpus(tep); cpu_size = CPU_ALLOC_SIZE(cpus); cpu_set = CPU_ALLOC(cpus); CPU_ZERO_S(cpu_size, cpu_set); CPU_SET_S(data.cpu, cpu_size, cpu_set); /* Find where cyclictest was scheduled in */ tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_sched, &data, true); /* Find where cyclictest timer went off */ tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_timer_expire, &data, true);
  • 39. int main (int argc, char **argv) { struct tracecmd_input *handle; struct tep_handle *tep; struct data data; int ret; handle = tracecmd_open(argv[1], 0); tep = tracecmd_get_tep(handle); init_data(tep, &data); tracecmd_iterate_events_reverse(handle, NULL, 0, find_trace_marker, &data, false); printf("cpu=%d threshold=%d latency=%dnn", data.cpu, data.thresh, data.lat); /* Now we know what CPU it is on, look at just this CPU */ cpus = tep_get_cpus(tep); cpu_size = CPU_ALLOC_SIZE(cpus); cpu_set = CPU_ALLOC(cpus); CPU_ZERO_S(cpu_size, cpu_set); CPU_SET_S(data.cpu, cpu_size, cpu_set); /* Find where cyclictest was scheduled in */ tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_sched, &data, true); /* Find where cyclictest timer went off */ tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_timer_expire, &data, true);
  • 40. int main (int argc, char **argv) { struct tracecmd_input *handle; struct tep_handle *tep; struct data data; int ret; handle = tracecmd_open(argv[1], 0); tep = tracecmd_get_tep(handle); init_data(tep, &data); tracecmd_iterate_events_reverse(handle, NULL, 0, find_trace_marker, &data, false); printf("cpu=%d threshold=%d latency=%dnn", data.cpu, data.thresh, data.lat); /* Now we know what CPU it is on, look at just this CPU */ cpus = tep_get_cpus(tep); cpu_size = CPU_ALLOC_SIZE(cpus); cpu_set = CPU_ALLOC(cpus); CPU_ZERO_S(cpu_size, cpu_set); CPU_SET_S(data.cpu, cpu_size, cpu_set); /* Find where cyclictest was scheduled in */ tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_sched, &data, true); /* Find where cyclictest timer went off */ tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_timer_expire, &data, true);
  • 41. static int find_trace_marker(struct tracecmd_input *handle, struct tep_record *record, int cpu, void *data) { struct tep_handle *tep = tracecmd_get_tep(handle); struct tep_event *marker; static struct trace_seq seq; struct data *d = data; const char *str; int thresh; int type; int lat; marker = d->marker; type = tep_data_type(tep, record); if (type != marker->id) return 0; /* Make sure that the print has the data we want */ if (!seq.buffer) trace_seq_init(&seq); tep_print_event(tep, &seq, record, "%s", TEP_PRINT_INFO); trace_seq_terminate(&seq);
  • 42. static int find_trace_marker(struct tracecmd_input *handle, struct tep_record *record, int cpu, void *data) { struct tep_handle *tep = tracecmd_get_tep(handle); struct tep_event *marker; static struct trace_seq seq; struct data *d = data; const char *str; int thresh; int type; int lat; marker = d->marker; type = tep_data_type(tep, record); if (type != marker->id) return 0; /* Make sure that the print has the data we want */ if (!seq.buffer) trace_seq_init(&seq); tep_print_event(tep, &seq, record, "%s", TEP_PRINT_INFO); trace_seq_terminate(&seq); Skip non-marker events
  • 43.     str = strstr(seq.buffer, "hit latency threshold");
    if (!str) {
        /* This is not the string you are looking for */
        trace_seq_reset(&seq);
        return 0;
    }
    sscanf(str, "hit latency threshold (%d > %d)", &lat, &thresh);

    d->cpu = record->cpu;
    d->lat = lat;
    d->thresh = thresh;
    d->pid = tep_data_pid(tep, record);
    d->marker_time = record->ts;

    trace_seq_destroy(&seq);
    seq.buffer = NULL;

    /* Stop the iterator, as we will now only search the current CPU */
    return 1;
}

Parse the trace marker
Record which CPU
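For reference, the line being parsed here is what cyclictest writes into trace_marker when run with -b and --tracemark. In the capture analyzed below it would look roughly like this (format approximated from standard trace-cmd report output, using the values printed later; the pid is left as a placeholder):

    cyclictest-<pid> [001] 818035.615546: print: tracing_mark_write: hit latency threshold (121 > 100)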
  • 45. int main (int argc, char **argv)
{
    [..]
    /* Now we know what CPU it is on, look at just this CPU */
    cpus = tep_get_cpus(tep);
    cpu_size = CPU_ALLOC_SIZE(cpus);
    cpu_set = CPU_ALLOC(cpus);
    CPU_ZERO_S(cpu_size, cpu_set);
    CPU_SET_S(data.cpu, cpu_size, cpu_set);

    /* Find where cyclictest was scheduled in */
    tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_sched, &data, true);

    /* Find where cyclictest timer went off */
    tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_timer_expire, &data, true);

Only follow 1 CPU
Look for sched_switch
  • 48. static int find_sched(struct tracecmd_input *handle,
                      struct tep_record *record, int cpu, void *data)
{
    struct tep_handle *tep = tracecmd_get_tep(handle);
    struct tep_event *sched;
    unsigned long long val;
    struct data *d = data;
    int type;

    sched = d->sched;
    type = tep_data_type(tep, record);
    if (type != sched->id)
        return 0;

    /* Make sure it was cyclictest being switched in */
    tep_read_number_field(d->next_pid, record->data, &val);
    if (val != d->pid)
        return 0;

    d->sched_time = record->ts;
    return -1;
}

Exit if not sched_switch
Record sched_switch time
  • 51. int main (int argc, char **argv)
{
    [..]
    /* Find where cyclictest was scheduled in */
    tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_sched, &data, true);

    /* Find where cyclictest was woken up */
    tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_wakeup, &data, true);

Look for sched_waking
  • 52. static int find_wakeup(struct tracecmd_input *handle,
                       struct tep_record *record, int cpu, void *data)
{
    struct tep_handle *tep = tracecmd_get_tep(handle);
    struct tep_format_field *waking_pid;
    struct tep_event *waking;
    unsigned long long val;
    struct data *d = data;
    int type;

    waking = d->waking;
    waking_pid = d->waking_pid;

    type = tep_data_type(tep, record);
    if (type != waking->id)
        return 0;

    tep_read_number_field(waking_pid, record->data, &val);
    if (val != d->pid)
        return 0;

    /* Found the wakeup! */
    d->wakeup_time = record->ts;
    return -1;
}

Record wake up time
  • 53. int main (int argc, char **argv)
{
    [..]
    /* Find where cyclictest timer went off */
    tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_timer_expire, &data, true);

    /* Find where the timer was added (the start of this cycle) */
    tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_timer_start, &data, true);

    printf("expected expire: %lld\n", data.timer_expires_expect);
    printf("timer expired:   %lld\n", data.timer_expire);
    printf("   jitter:      (%lld)\n", data.timer_expire - data.timer_expires_expect);
    printf("wake up time:    %lld\n", data.wakeup_time);
    printf("   from timer:  (%lld)\n", data.wakeup_time - data.timer_expire);
    printf("scheduled time:  %lld\n", data.sched_time);
    printf("   from wakeup: (%lld)\n", data.sched_time - data.wakeup_time);
    printf("marker time:     %lld\n", data.marker_time);
    printf("   from schedule:(%lld)\n", data.marker_time - data.sched_time);

    tracecmd_close(handle);
    return 0;
}

Look for timer expire event
  • 54. static int find_timer_expire(struct tracecmd_input *handle,
                              struct tep_record *record, int cpu, void *data)
{
    struct tep_handle *tep = tracecmd_get_tep(handle);
    struct tep_event *timer;
    unsigned long long val;
    struct data *d = data;
    int type;

    timer = d->hrtimer_expire;
    type = tep_data_type(tep, record);
    if (type != timer->id)
        return 0;

    tep_read_number_field(d->hrtimer_expire_ptr, record->data, &val);
    d->hrtimer = val;

    tep_read_number_field(d->timer_now, record->data, &val);
    d->timer_expire = record->ts;
    d->timer_delta = val - record->ts;
    return -1;
}

Calculate timer event time to ring buffer time
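A note on the arithmetic: the hrtimer event's "now" field is in the timer's own clock base, while record->ts is the trace-clock timestamp of the event, so timer_delta = now - record->ts captures the offset between the two clocks at the moment the timer fired. find_timer_start() below then converts the timer's programmed "expires" value into trace time as expires - timer_delta, which is algebraically record->ts + (expires - now). A tiny worked example, with made-up values chosen only to illustrate:

    now         = 850947083000000    (hrtimer clock base)
    record->ts  = 850947083217641    (trace clock)
    timer_delta = now - record->ts   = -217641

    expires              = 850947082999000    (hrtimer clock base)
    timer_expires_expect = expires - timer_delta = 850947083216641    (trace clock)
    jitter               = timer_expire - timer_expires_expect = 1000 ns

(The subtraction is done in unsigned long long, so a "negative" delta still works out modulo 2^64.)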
  • 56. int main (int argc, char **argv)
{
    [..]
    /* Find where cyclictest was woken up */
    tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_wakeup, &data, true);

    /* Find where the timer was added (the start of this cycle) */
    tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_timer_start, &data, true);
    [..]
}

Look for timer start event
  • 57. static int find_timer_start(struct tracecmd_input *handle,
                             struct tep_record *record, int cpu, void *data)
{
    struct tep_handle *tep = tracecmd_get_tep(handle);
    struct tep_event *timer;
    unsigned long long val;
    struct data *d = data;
    int type;

    timer = d->hrtimer_start;
    type = tep_data_type(tep, record);
    if (type != timer->id)
        return 0;

    /* Make sure it is the same hrtimer that expired */
    tep_read_number_field(d->hrtimer_start_ptr, record->data, &val);
    if (val != d->hrtimer)
        return 0;

    tep_read_number_field(d->hrtimer_start_expires, record->data, &val);
    d->timer_expires_expect = val - d->timer_delta;
    d->timer_start = record->ts;
    return -1;
}

Calculate expected time vs actual time
  • 60. Now analyze the trace.dat file

~# trace-cyclic trace-cyclic.dat
cpu=1 threshold=100 latency=121

expected expire: 818035615415764
timer expired:   818035615416617
   jitter:      (853)
wake up time:    818035615416993
   from timer:  (376)
scheduled time:  818035615535531
   from wakeup: (118538)
marker time:     818035615546097
   from schedule:(10566)

Time in nanoseconds
Problem here! The 118538 ns from wake up to being scheduled is what blew the 100us threshold.
  • 62. Can analyze any events you want
■ This is just a small example of what you can do
■ Let's look at locks
■ Using function tracing, trace all locking functions
● function tracing gives you the parent function too
● Can see where a lock was called (see the sketch after this list)
■ Trace both the lock and unlock functions
● Can get the latency of how long they are held
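As a rough illustration of what the function tracer's events provide, here is a stripped-down callback that only resolves the traced function and its caller. "ip" and "parent_ip" are the ftrace function event's fields; the talk's real code caches the event and field handles in init_data() instead of looking them up per record as this sketch does:

static int show_lock_callers(struct tracecmd_input *handle,
                             struct tep_record *record, int cpu, void *data)
{
    struct tep_handle *tep = tracecmd_get_tep(handle);
    struct tep_event *event;
    unsigned long long ip, pip;
    const char *func, *caller;

    event = tep_find_event_by_record(tep, record);
    if (!event || strcmp(event->name, "function") != 0)
        return 0;

    /* The function event records the traced function and who called it */
    tep_read_number_field(tep_find_field(event, "ip"), record->data, &ip);
    tep_read_number_field(tep_find_field(event, "parent_ip"), record->data, &pip);

    func = tep_find_function(tep, ip);
    caller = tep_find_function(tep, pip);
    printf("%s called from %s\n", func ? func : "?", caller ? caller : "?");
    return 0;
}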
  • 66. Function trace all lock functions

~# trace-cmd start -e sched_switch -e sched_waking -e timer -e irq -e irq_vectors \
      -p function -l '*spin_lock*' -l '*spin_unlock*' -l '*spin_try*' \
      -l '*read_lock*' -l '*read_unlock*' -l '*read_try*' \
      -l '*write_lock*' -l '*write_try*' -l '*write_unlock*'
  plugin 'function'
~# cyclictest -p80 -i250 -a -t -m -d 0 -b 100 --tracemark
# /dev/cpu_dma_latency set to 0us
INFO: debugfs mountpoint: /sys/kernel/debug/tracing/
policy: fifo: loadavg: 0.07 0.02 0.00 1/9997 69853
T: 0 (69847) P:80 I:250 C: 27298 Min: 6 Act: 6 Avg: 6 Max: 29
T: 1 (69848) P:80 I:250 C: 27296 Min: 6 Act: 6 Avg: 6 Max: 13
T: 2 (69849) P:80 I:250 C: 27295 Min: 6 Act: 6 Avg: 6 Max: 25
T: 3 (69850) P:80 I:250 C: 27294 Min: 6 Act: 6 Avg: 6 Max: 19
T: 4 (69851) P:80 I:250 C: 27293 Min: 6 Act: 6 Avg: 6 Max: 37
T: 5 (69852) P:80 I:250 C: 27291 Min: 6 Act: 9 Avg: 6 Max: 17
T: 6 (69853) P:80 I:250 C: 27291 Min: 6 Act: 6 Avg: 6 Max: 17
# Thread Ids: 69847 69848 69849 69850 69851 69852 69853
# Break thread: 69850
# Break value: 102
  • 67. int main (int argc, char **argv)
{
    [..]
    /* Find where the timer was added (the start of this cycle) */
    tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_timer_start, &data, true);

    /* Now that we know all the times, look for the locks (going forward now) */
    tracecmd_iterate_events(handle, cpu_set, cpu_size, find_locks, &data);
    [..]

Go forward and find all locks
  • 69. static int find_locks(struct tracecmd_input *handle,
                      struct tep_record *record, int cpu, void *data)
{
    struct tep_handle *tep = tracecmd_get_tep(handle);
    struct data *d = data;
    enum data_state state;

    /* The marker is where the search ends */
    if (record->ts == d->marker_time)
        return -1;

    if (record->ts < d->timer_expires_expect)
        state = STATE_START;
    else if (record->ts < d->timer_expire)
        state = STATE_EXPECT;
    else if (record->ts < d->wakeup_time)
        state = STATE_EXPIRED;
    else if (record->ts < d->sched_time)
        state = STATE_WAKEUP;
    else
        state = STATE_SCHED;

    handle_locks(tep, record, data, state);
    return 0;
}

Use saved times to know where the lock happened
  • 70. static int handle_locks(struct tep_handle *tep, struct tep_record *record,
                         struct data *data, enum data_state state)
{
    struct lock_stack *lock;
    struct lock_list *llist;
    unsigned long long pip, ip;
    const char *func;
    bool start = true;
    int type;
    int id;

    switch (state) {
    case STATE_START:
        llist = &data->locks_start;
        break;
    case STATE_EXPECT:
        llist = &data->locks_start_expire;
        break;
    case STATE_EXPIRED:
        llist = &data->locks_timer;
        break;
    case STATE_WAKEUP:
        llist = &data->locks_wakeup;
        break;
    case STATE_SCHED:
        llist = &data->locks_sched;
        break;
    }

    type = tep_data_type(tep, record);
    if (!data->function || data->function->id != type)
        return 0;

Save lock information with associated state
  • 71.     tep_read_number_field(data->function_ip, record->data, &ip);
    tep_read_number_field(data->function_pip, record->data, &pip);

    func = tep_find_function(tep, ip);

    id = find_start_lock(func);
    if (id < 0) {
        id = find_stop_lock(func);
        if (id < 0)
            return 1;
        start = false;
    }

    if (start) {
        push_lock(data, id, pip, record->ts, func);
        return 1;
    }

    /* Pop the locks until we find our id */
    lock = NULL;
    do {
        if (lock)
            free(lock);
        lock = pop_stack(data);
    } while (lock && lock->id != id);

    if (!lock)
        return 1;

    /* Add the total time of this lock */
    end_timer(&llist->list, pip, record->ts, lock->name);
    start_timer(&llist->list, pip, lock->time, NULL);
    free(lock);
    return 1;
}

Calculate how long lock was held
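push_lock(), pop_stack(), start_timer() and end_timer() are helpers that never made it onto the slides. A plausible minimal sketch of the first two, assuming struct data carries the top of a LIFO in a member here called stack (the member name is a guess; lock_stack's fields follow how the code above dereferences them):

struct lock_stack {
    struct lock_stack *next;
    unsigned long long pip;
    unsigned long long time;   /* Timestamp the lock was taken */
    const char *name;
    int id;
};

static void push_lock(struct data *d, int id, unsigned long long pip,
                      unsigned long long ts, const char *name)
{
    struct lock_stack *lock = malloc(sizeof(*lock));

    if (!lock)
        return;
    lock->id = id;
    lock->pip = pip;
    lock->time = ts;
    lock->name = name;
    lock->next = d->stack;
    d->stack = lock;
}

/* Returns the most recently pushed lock; the caller frees it */
static struct lock_stack *pop_stack(struct data *d)
{
    struct lock_stack *lock = d->stack;

    if (lock)
        d->stack = lock->next;
    return lock;
}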
  • 72. static const char *locks[] = {
    "spin_", "raw_spin_", "_raw_spin_", "__raw_spin_",
    "read_", "raw_read_", "_raw_read_", "__raw_read_",
    "write_", "raw_write_", "_raw_write_", "__raw_write_",
};

static int find_start_lock(const char *func)
{
    const char *lock;
    int len;
    int i;

    for (i = 0; i < ARRAY_SIZE(locks); i++) {
        lock = locks[i];
        len = strlen(lock);
        if (strncmp(lock, func, len) != 0)
            continue;
        if (strncmp(func + len, "lock", 4) == 0 ||
            strncmp(func + len, "try", 3) == 0)
            return i;
        break;
    }
    return -1;
}

All known lock types
See if it is a lock (not unlock)
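find_stop_lock() is also not shown. By symmetry with find_start_lock() it must return the same table index when the suffix is the unlock variant, so a sketch would be:

static int find_stop_lock(const char *func)
{
    const char *lock;
    int len;
    int i;

    for (i = 0; i < ARRAY_SIZE(locks); i++) {
        lock = locks[i];
        len = strlen(lock);
        if (strncmp(lock, func, len) != 0)
            continue;
        if (strncmp(func + len, "unlock", 6) == 0)
            return i;
        break;
    }
    return -1;
}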
  • 74. int main (int argc, char **argv)
{
    [..]
    tracecmd_iterate_events(handle, cpu_set, cpu_size, find_locks, &data);
    [..]
    print_lock_list(tep, "Locks taken and released from start to expected time:",
                    &data.locks_start);
    print_lock_list(tep, "Locks held between expected time and timer:",
                    &data.locks_start_expire);
    print_lock_list(tep, "Locks taken from timer to wakeup:",
                    &data.locks_timer);
    print_lock_list(tep, "Locks taken from wake up to sched:",
                    &data.locks_wakeup);
    print_lock_list(tep, "Locks taken from sched to print:",
                    &data.locks_sched);
  • 75. static void print_lock_list(struct tep_handle *tep, const char *msg,
                            struct lock_list *llist)
{
    struct time_list *list = llist->list;
    bool printed = false;

    for (; list; list = list->next) {
        if (!list->total)
            continue;
        if (!printed) {
            printf("\n%s\n", msg);
            printed = true;
        }
        print_lock(tep, list->id);
        printf("total time: %lld (%d time%s) [%s]\n",
               list->total, list->cnt,
               list->cnt != 1 ? "s" : "", list->name);
    }
}
  • 76. Analyze the locks in the trace

~# trace-cyclic trace-cyclic.dat
cpu=4 threshold=100 latency=102

expected expire: 850947083216603
timer expired:   850947083217641
   jitter:      (1038)
wake up time:    850947083217999
   from timer:  (358)
scheduled time:  850947083316970
   from wakeup: (98971)
marker time:     850947083328710
   from schedule:(11740)

Locks taken and released from start to expected time:
do_nanosleep+0x5f total time: 610 (1 time) [_raw_spin_lock_irqsave]
dequeue_task_rt+0x28 total time: 180 (1 time) [_raw_spin_lock]
finish_task_switch.isra.0+0x9b total time: 1450 (2 times) [_raw_spin_lock]
get_next_timer_interrupt+0x7b total time: 155 (7 times) [_raw_spin_lock]
hrtimer_get_next_event+0x47 total time: 140 (7 times) [_raw_spin_lock_irqsave]
hrtimer_next_event_without+0x67 total time: 161 (7 times) [_raw_spin_lock_irqsave]
sched_ttwu_pending+0xed total time: 1522 (1 time) [_raw_spin_lock]
poll_freewait+0x3d total time: 182 (7 times) [_raw_spin_lock_irqsave]
[..]

Biggest difference: the 98971 ns from wake up to being scheduled
  • 78. Analyze the locks in the trace

sigprocmask+0x85 total time: 199 (4 times) [_raw_spin_lock_irq]
n_tty_read+0x217 total time: 198 (2 times) [_raw_spin_lock_irqsave]
n_tty_read+0x5d9 total time: 233 (2 times) [_raw_spin_lock_irqsave]
try_to_wake_up+0x77 total time: 183 (1 time) [_raw_spin_lock_irqsave]
__wake_up_common_lock+0x7e total time: 415 (2 times) [_raw_spin_lock_irqsave]
tcp_poll+0x2b total time: 178 (1 time) [_raw_spin_lock_irqsave]
tcp_sendmsg+0x19 total time: 231 (1 time) [_raw_spin_lock_bh]
nf_conntrack_tcp_packet+0x8e4 total time: 447 (1 time) [_raw_spin_lock_bh]
sch_direct_xmit+0x43 total time: 168 (1 time) [_raw_spin_lock]

Locks held between expected time and timer:
__hrtimer_run_queues+0x120 total time: 456 (1 time) [_raw_spin_lock_irqsave]

Locks taken from wake up to sched:
enqueue_task_rt+0x1e8 total time: 169 (1 time) [_raw_spin_lock]
try_to_wake_up+0x246 total time: 1674 (1 time) [_raw_spin_lock]
try_to_wake_up+0x251 total time: 2346 (1 time) [_raw_spin_lock_irqsave]
hrtimer_interrupt+0x11d total time: 222 (1 time) [_raw_spin_lock_irq]
sch_direct_xmit+0x13a total time: 168948 (1 time) [_raw_spin_lock]
__dev_queue_xmit+0x7fe total time: 279 (1 time) [_raw_spin_lock]
rcu_note_context_switch+0x386 total time: 237 (1 time) [_raw_spin_lock]

Locks taken from sched to print:
finish_task_switch.isra.0+0x9b total time: 2056 (1 time) [_raw_spin_lock]

Long lock held! (sch_direct_xmit+0x13a: 168948 ns)