This page collects tips and tricks that are useful while hunting memory leaks at the C level, in the Tcl core or in extensions. To track script-level leaks (like lingering keys in global arrays, objects, channels, etc.), see the sibling page "[Leak Hunt (Tcl level)]".

** Description **

Once [Valgrind] has spotted a leak and given you a detailed C-level stack trace of the point of allocation, it's time to switch to [gdb]. Different techniques are needed for different kinds of allocations, but a rather frequent class is that of refcounted things (like Tcl_Obj's). Here gdb's watchpoint tool is very useful:

* set a breakpoint just after the allocation
* set a watchpoint on ((ProperType *)0xHEX_ADDRESS_OF_OBJECT)->refCount
* hit "cont" repeatedly and follow its Incr and Decr
* it never hits zero (it's a leak, remember). Usually ends up at 1.
* replay the scenario while taking note of where the extra refs are kept on Incr, and which are discarded on Decr
* In the end you'll know whether:
** a Decr is simply missing
** a reference is mistakenly overwritten
** a reference is kept in other stuff that leaks too (in that case the block is "indirectly lost")

** Reference Loops **

'''One thing that should never occur in Tcl''' is a cycle in the graph of references among Tcl_Objs. Indeed, since we depend on refcounting for memory management, a cycle is an absolute show stopper. Fortunately, by design, the language is strictly unable to produce such "reference loops": the copy-on-write principle prevents all in-place operations which would "close the loop". Any exception to this rule is a bug, not in the script, but in the core.

But here we're discussing core debugging, right? So these things may happen. It happened to me in issue [https://core.tcl.tk/tcl/tktview?name=3386417%|%3386417], where the newly introduced [info errorstack] (TIP [https://core.tcl.tk/tips/doc/trunk/tip/348.md%|%#348]) was somehow plugged back into a compiled scriptlet that it referred to.
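
To make the "show stopper" concrete, here is a minimal sketch of why a cycle defeats refcounting. It does not use Tcl's real API: `Node`, `NodeNew`, `NodeIncr`, `NodeDecr`, `NodeSetRef` and `LeakedAfterRelease` are hypothetical names invented for this illustration, standing in for Tcl_Obj and its Incr/Decr operations.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for a Tcl_Obj (not Tcl's API). */
typedef struct Node {
    int refCount;
    struct Node *ref;       /* one owned reference to another Node, or NULL */
} Node;

static int liveNodes = 0;   /* Nodes allocated and not yet freed */

static Node *NodeNew(void) {
    Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->refCount = 0;
    n->ref = NULL;
    liveNodes++;
    return n;
}

static void NodeIncr(Node *n) { n->refCount++; }

/* Drop one reference; free the Node (and release what it holds) at zero. */
static void NodeDecr(Node *n) {
    assert(n->refCount > 0);
    if (--n->refCount == 0) {
        if (n->ref != NULL) NodeDecr(n->ref);
        free(n);
        liveNodes--;
    }
}

/* Make holder keep an owned reference to target. */
static void NodeSetRef(Node *holder, Node *target) {
    NodeIncr(target);
    if (holder->ref != NULL) NodeDecr(holder->ref);
    holder->ref = target;
}

/* Build a -> b, optionally close the loop with b -> a, then drop the
 * caller's own references. Returns how many Nodes are still allocated. */
int LeakedAfterRelease(int closeLoop) {
    int before = liveNodes;
    Node *a = NodeNew();
    Node *b = NodeNew();
    NodeIncr(a);                       /* caller's reference to a */
    NodeIncr(b);                       /* caller's reference to b */
    NodeSetRef(a, b);                  /* a -> b */
    if (closeLoop) NodeSetRef(b, a);   /* b -> a: the loop is closed */
    NodeDecr(a);
    NodeDecr(b);
    return liveNodes - before;
}
```

Without the loop, `LeakedAfterRelease(0)` frees both nodes. With the loop, each node's refCount is stuck at 1 after the caller lets go, so neither is ever freed even though nothing outside the pair refers to them.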

The interesting generalization is as follows. If:

* you get a "definitely lost" object
* you know that references to it remain
* you know that these referrers are still alive too, though they shouldn't be
* these referrers end up in "indirectly lost" (not "still reachable", nor "definitely lost")

Then it is very likely that you have a Reference Loop. Of course you're on your own to actually track it down, but merely knowing that such loops exist may save you hours (it would have, in my case :/ ).

** When Valgrind Is Not Enough **

Sometimes, the above techniques are insufficient. For example, valgrind may pinpoint the full stack trace of the offending allocation... but if the same stack happens many times in the program's lifespan, it is hard to identify the interesting one. In that case, switching back to Tcl's own memdebug mode is helpful: revert the above advice and do

======
./configure --enable-symbols=mem
======

`[memory]` is then the tool of choice to get to individual leaked blocks. Particularly interesting is [memory onexit], since it takes a snapshot at roughly the same time as valgrind does. However, the result is slightly noisier than with [valgrind], because it occurs slightly before the end of the universe (some guts remain to allow the dump to occur).

[PYK] 2018-05-20: It can also be helpful to call `[memory%|%memory active]` right before a suspected trouble spot, once again immediately after, and then inspect a [diff] of the two results.

** Case Histories **

[https://www.mail-archive.com/fossil-users@lists.fossil-scm.org/msg20732.html%|%Re: Version 1.33 - /reports failing], 2015-06-02: Not a leak, but a tricky bug that was initially attributed to a C compiler optimisation bug. It turned out to be an array with automatic storage duration, declared in a conditional block and then passed to another function, which squirrels the pointer away in a variable with static duration.
The automatic storage then goes out of scope and is reclaimed, leaving a dangling pointer in the static variable. Compilers don't catch this situation and [valgrind] often doesn't either, leaving human alertness as nearly the only defense. The fix is [http://fossil-scm.org/index.html/info/8184f39d803f9ad6%|%here].

[AMG]: Actually, Valgrind can show this class of problem.

======c
#include <stdio.h>
int *global;
void f(void) {int local = 42; global = &local;}
void g(void) {printf("%d\n", *global);}
int main(void) {f(); g(); return 0;}
======

Build and run this, and "42" is probably displayed. Maybe. Call something else between f() and g(), and who knows what you'll get. Run using `valgrind --tool=memcheck`, and this is the result:

======none
==22929== Conditional jump or move depends on uninitialised value(s)
==22929==    at 0x4E834E0: vfprintf (in /lib64/libc-2.21.so)
==22929==    by 0x4E8A8E8: printf (in /lib64/libc-2.21.so)
==22929==    by 0x400675: g (in /home/andy/test)
==22929==    by 0x400685: main (in /home/andy/test)
==22929==
==22929== Use of uninitialised value of size 8
==22929==    at 0x4E7F4BB: _itoa_word (in /lib64/libc-2.21.so)
==22929==    by 0x4E837C8: vfprintf (in /lib64/libc-2.21.so)
==22929==    by 0x4E8A8E8: printf (in /lib64/libc-2.21.so)
==22929==    by 0x400675: g (in /home/andy/test)
==22929==    by 0x400685: main (in /home/andy/test)
==22929==
==22929== Conditional jump or move depends on uninitialised value(s)
==22929==    at 0x4E7F4C5: _itoa_word (in /lib64/libc-2.21.so)
==22929==    by 0x4E837C8: vfprintf (in /lib64/libc-2.21.so)
==22929==    by 0x4E8A8E8: printf (in /lib64/libc-2.21.so)
==22929==    by 0x400675: g (in /home/andy/test)
==22929==    by 0x400685: main (in /home/andy/test)
==22929==
==22929== Conditional jump or move depends on uninitialised value(s)
==22929==    at 0x4E83839: vfprintf (in /lib64/libc-2.21.so)
==22929==    by 0x4E8A8E8: printf (in /lib64/libc-2.21.so)
==22929==    by 0x400675: g (in /home/andy/test)
==22929==    by 0x400685: main (in /home/andy/test)
==22929==
==22929== Conditional jump or move depends on uninitialised value(s)
==22929==    at 0x4E835BA: vfprintf (in /lib64/libc-2.21.so)
==22929==    by 0x4E8A8E8: printf (in /lib64/libc-2.21.so)
==22929==    by 0x400675: g (in /home/andy/test)
==22929==    by 0x400685: main (in /home/andy/test)
==22929==
==22929== Conditional jump or move depends on uninitialised value(s)
==22929==    at 0x4E8364A: vfprintf (in /lib64/libc-2.21.so)
==22929==    by 0x4E8A8E8: printf (in /lib64/libc-2.21.so)
==22929==    by 0x400675: g (in /home/andy/test)
==22929==    by 0x400685: main (in /home/andy/test)
======

Lots of badness. But change the variable `local` to be static, and all the above warnings go away.

<<categories>> Debugging