Debug performance issues with strace

Have you ever been asked why is this program running so slowly?  Or why does it take 15 seconds to startup?

Strace might be the tool for you.  It’s a system call tracer which let’s you see what this program is requesting from the system. For example, if you run ls to a directory some key thins will be:
getdents(3, /* 47 entries */, 32768)    = 1928  (Get entries from a directory)
write(1, ..   (write out to stdout the output of ls)

You can add timing information to get more of an idea of what’s happening, I usually add:
-tt  (print absolute timestamps in usecs, very useful for comparing to nmap output for instance)
-T  (print time in each syscall)
-ff (follow forks, if it spawns other processes this will you separate them out to different files)

So run it all `strace -tt -T -ff ls’ and now it looks like:
14:09:10.747357 getdents(3, /* 48 entries */, 32768) = 1952 <0.000056>
14:09:10.747724 write(1, .....) = 57 <0.000031

Absolute time  syscall 

When using this on anything non-trivial, I suggest exporting it to a file (-o file).  You can also try analyzing it live as the long delays should cause strace output to stop.

I’ve got the strace output, it’s huge!  Now what?

Let’s start at the first one, which will be the lowest number, output_name.10001 for example. Check the time stamp at the top and the bottom, does it cover the time frame where you are investigating?  If so, dig in and find where the jump in time was.

In this example it was:
09:13:32.329980 write(1, "[2]... ", 7)  = 7 <0.000033>
09:13:32.330041 wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 4}], 0, NULL) = 14963 <4512.975521>
10:28:45.305644 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=14963, si_status=4, si_utime=0, si_stime=0} ---
10:28:45.305907 write(1, "[1]... ", 7)  = 7 <0.000036>

Or in other words, it’s really in that child process 14963 so let’s open up output_name.14963
Scroll throw to the first jump in time and we see:
09:13:20.418886 connect(5, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("")}, 16) = -1 ETIMEDOUT (Connection timed out) <127.317319>
... about 30 lines...
09:15:28.738327 connect(5, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("")}, 16) = -1 ETIMEDOUT (Connection timed out) <127.253925>
... many more lines here ...
10:26:37.977564 connect(5, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("")}, 16) = -1 ETIMEDOUT (Connection timed out) <127.326639>

So it’s timing out trying to connect to an IP address and then trying again over 10 times.  Using strace will help you the most when the wait is caused by something on the system level (disk, networking, etc).

Leave a Reply

Your email address will not be published. Required fields are marked *