It's not super user-friendly, but I think this does what the OP wants:
callgrind_annotate --tree=both --inclusive=yes --auto=yes --show-percs=yes callgrind.out.${pid} > tree.${pid}
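This creates a text file with two sections. The first section is a summary of profiled functions, sorted by cost. Note that Callgrind counts executed instructions (the Ir event) by default rather than wall-clock time, so the numbers are a proxy for time spent. Example of one function: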
3,090,761,742,907 (22.39%) < /home/me/proj/src/dir0/file0.cpp:caller0() (79,173,695x) [/home/me/proj/build/reldbg/lib/libzero.so]
768,380,030,255 ( 5.57%) < /home/me/proj/src/dir0/file1.cpp:caller1(unsigned int) (19,814,000x) [/home/me/proj/build/reldbg/lib/libzero.so]
3,861,537,817,283 (27.97%) * /home/me/proj/src/dir0/Abc.cpp:Abc::Abc() [/home/me/proj/build/reldbg/lib/libzero.so]
2,458,035,862,297 (17.80%) > /home/me/proj/src/dir1/file2.h:callee0() (99,049,427x) [/home/me/proj/build/reldbg/lib/libone.so]
1,378,046,252,252 ( 9.98%) > /home/me/proj/src/dir0/file3.cpp:callee1() (99,049,427x) [/home/me/proj/build/reldbg/lib/libzero.so]
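The profiled function is marked with an asterisk (*), its callers with a less-than sign (<), and the functions it calls with a greater-than sign (>). The number followed by x is the call count, and the path in square brackets is the object file the function lives in.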
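If you want to post-process this section instead of reading it by eye, here is a minimal Python sketch that splits one of these lines into its parts. The line layout is only inferred from the sample above, and the names `ENTRY_RE`, `ROLES`, and `parse_entry` are made up for illustration; they are not part of callgrind_annotate.

```python
import re

# Assumed layout of one --tree=both line, inferred from the sample above:
#   COST (PCT%) MARKER FILE:FUNCTION [(CALLSx)] [OBJECT]
# This is a guess at the format, not a documented grammar.
ENTRY_RE = re.compile(
    r"^\s*(?P<cost>[\d,]+)\s+\(\s*(?P<pct>[\d.]+)%\)\s+"
    r"(?P<marker>[<*>])\s+(?P<name>.+?)"
    r"(?:\s+\((?P<calls>[\d,]+)x\))?"
    r"(?:\s+\[(?P<obj>[^\]]+)\])?\s*$"
)

# Marker meaning as described in the answer.
ROLES = {"<": "caller", "*": "function", ">": "callee"}


def parse_entry(line):
    """Return a dict for one tree line, or None if the line does not match."""
    m = ENTRY_RE.match(line)
    if m is None:
        return None
    calls = m.group("calls")
    return {
        "role": ROLES[m.group("marker")],
        "cost": int(m.group("cost").replace(",", "")),
        "percent": float(m.group("pct")),
        "name": m.group("name"),
        # The "*" line carries no call count.
        "calls": int(calls.replace(",", "")) if calls else None,
        "object": m.group("obj"),
    }


if __name__ == "__main__":
    sample = [
        "768,380,030,255 ( 5.57%) < /home/me/proj/src/dir0/file1.cpp:caller1(unsigned int) (19,814,000x) [/home/me/proj/build/reldbg/lib/libzero.so]",
        "3,861,537,817,283 (27.97%) * /home/me/proj/src/dir0/Abc.cpp:Abc::Abc() [/home/me/proj/build/reldbg/lib/libzero.so]",
    ]
    for line in sample:
        entry = parse_entry(line)
        print(entry["role"], entry["cost"], entry["name"])
```

Grouping the parsed entries by the most recent `function` entry would then give you each function together with its callers and callees.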
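The second section is annotated source code, with the cost (the Ir column) attributed to each line. Lines starting with => show the cost of a call made from the source line above them, and a dot means no cost was recorded for that line. For example: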
--------------------------------------------------------------------------------
-- Auto-annotated source: /home/me/proj/src/dir0/file1.cpp
--------------------------------------------------------------------------------
Ir
-- line 37 ----------------------------------------
7,822,093,316 ( 0.06%) for (uint32 i = 0; i < VEC_SIZE; i++)
. {
21,018,143,872 ( 0.15%) data[i].reset();
110,822,940,416 ( 0.80%) => /home/me/proj/src/dir0/abc.cpp:Abc::reset() (1,910,740,352x)
. }
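To find the expensive lines in a large annotated file, something like the following Python sketch may help. It assumes every costed line starts with `COST (PCT%)` as in the excerpt above, which depends on `--show-percs=yes`; `COST_RE` and `annotated_costs` are hypothetical names used only for this example.

```python
import re

# Assumed prefix of a costed line, inferred from the excerpt above:
#   COST (PCT%) <source text or "=> callee">
COST_RE = re.compile(r"^\s*(?P<cost>[\d,]+)\s+\(\s*(?P<pct>[\d.]+)%\)\s+(?P<rest>.*)$")


def annotated_costs(lines):
    """Return (cost, is_call, text) tuples, most expensive first.

    Lines without a cost (".", headers, separators) are skipped.
    """
    found = []
    for line in lines:
        m = COST_RE.match(line)
        if m is None:
            continue
        rest = m.group("rest").strip()
        is_call = rest.startswith("=>")
        text = rest[2:].strip() if is_call else rest
        cost = int(m.group("cost").replace(",", ""))
        found.append((cost, is_call, text))
    return sorted(found, key=lambda item: item[0], reverse=True)


if __name__ == "__main__":
    excerpt = [
        "Ir",
        "-- line 37 ----------------------------------------",
        "7,822,093,316 ( 0.06%) for (uint32 i = 0; i < VEC_SIZE; i++)",
        ". {",
        "21,018,143,872 ( 0.15%) data[i].reset();",
        "110,822,940,416 ( 0.80%) => /home/me/proj/src/dir0/abc.cpp:Abc::reset() (1,910,740,352x)",
        ". }",
    ]
    for cost, is_call, text in annotated_costs(excerpt):
        print(cost, "call" if is_call else "line", text)
```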