Last modified on 06/10/11 19:05:59

gdb-heap

"gdb-heap" is an extension for the GNU debugger, gdb.

It's aimed at developers and sysadmins who wish to track down and fix excessive memory usage within programs and libraries.

There are many projects which allow you to trace dynamic memory usage within a user-space process, but these typically require you to set something up ahead of time, replacing the standard "malloc", "free", "realloc", etc. routines with instrumented versions.

gdb-heap is different in that it allows for unplanned memory usage debugging: if a process unexpectedly starts using large amounts of memory you can attach to it with gdb, and use a new heap command to figure out where the memory is going. You should also be able to use it on core dumps.

It equips gdb with knowledge of the internals of glibc's "malloc" implementation, so that it can walk over all dynamically-allocated memory blocks.
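As a rough illustration of what walking the heap involves, the sketch below steps through glibc-style chunks held in a plain byte buffer. It assumes a 64-bit little-endian layout in which every chunk starts with two 8-byte fields (the previous chunk's size, then this chunk's size) and the low three bits of the size field are flags. The buffer and the helper names (make_chunk, walk_chunks) are hypothetical; gdb-heap itself reads this data from the inferior process via gdb, and its real code may look quite different.

```python
import struct

SIZE_SZ = 8        # assumed: sizeof(size_t) on a 64-bit target
FLAG_MASK = 0x7    # low bits of the size field hold flags, not size
PREV_INUSE = 0x1   # flag: the previous chunk is in use


def make_chunk(size, prev_inuse=True):
    """Build a fake chunk of `size` bytes (header included) for the demo."""
    flags = PREV_INUSE if prev_inuse else 0
    header = struct.pack("<QQ", 0, size | flags)
    return header + b"\0" * (size - len(header))


def walk_chunks(heap):
    """Yield (offset, size) for each chunk in a contiguous heap buffer."""
    offset = 0
    while offset + 2 * SIZE_SZ <= len(heap):
        _prev_size, raw_size = struct.unpack_from("<QQ", heap, offset)
        size = raw_size & ~FLAG_MASK    # strip the flag bits
        if size < 2 * SIZE_SZ:
            break                       # implausible size: stop walking
        yield offset, size
        offset += size                  # the next chunk follows directly


heap = make_chunk(32) + make_chunk(48) + make_chunk(4128)
chunks = list(walk_chunks(heap))
```

The key point is that each chunk's size field is enough to find the next chunk, which is what makes it possible to enumerate every block without any cooperation from the program being debugged.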

It has heuristics that examine the memory in each block and categorize it, either by the type of object it represents or by other uses such as alignment wastage. So far the project has focused on Python memory usage: it is able to determine the class of each Python object in memory. C++ object support is in the works (ticket #3).
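To give a feel for what such a heuristic might look like, here is a minimal sketch that guesses whether a block holds a NUL-terminated C string. The categorize function and its rules are assumptions made for illustration only; the heuristics gdb-heap really uses are more involved and are not reproduced here.

```python
import string

# Byte values that count as "printable" for this toy heuristic.
PRINTABLE = set(string.printable.encode("ascii"))


def categorize(block):
    """Return a (domain, kind, detail) guess for a block of bytes."""
    text, nul, _rest = block.partition(b"\0")
    # Non-empty printable text followed by a NUL looks like a C string.
    if nul and text and all(byte in PRINTABLE for byte in text):
        return ("C", "string data", "")
    # Otherwise fall back to reporting only the size.
    return ("uncategorized", "", "%d bytes" % len(block))


guess_text = categorize(b"A context manager\0\0\0")
guess_binary = categorize(b"\x00\x00\x00\x43" + b"\0" * 28)
```

A heuristic like this can misfire (binary data sometimes looks like text), which is one reason to treat the categories as hints rather than certainties.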

In addition, it has some support for layered allocation schemes. Currently the only scheme it knows about is Python's, which allocates 256KB blocks using "malloc" and then carves them up for small allocations. Most memory usage tools will only tell you that a 256KB block was allocated; gdb-heap gives a detailed breakdown of exactly what each byte of the allocation is being used for.
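The idea of a layered allocator can be sketched as simple address arithmetic. The example below assumes that each 256KB block is subdivided into 4KB pools, which is how CPython's small-object allocator was laid out at the time of writing; the pool size, the locate helper, and the example addresses are assumptions for illustration, and the sketch ignores alignment padding and pool headers.

```python
ARENA_SIZE = 256 * 1024   # from the text: size of each malloc'd block
POOL_SIZE = 4 * 1024      # assumption: pool size within each block


def locate(arena_start, address):
    """Map an address inside a block to (pool index, offset within pool)."""
    offset = address - arena_start
    if not 0 <= offset < ARENA_SIZE:
        raise ValueError("address is outside this block")
    return divmod(offset, POOL_SIZE)


pools_per_arena = ARENA_SIZE // POOL_SIZE
where = locate(0x10000, 0x10000 + 3 * POOL_SIZE + 40)
```

A tool that only understands "malloc" sees one opaque 256KB allocation; knowing the inner layout is what lets a debugger attribute individual bytes to individual objects.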

Status

The project is currently an experimental prototype - bug reports are expected (patches most welcome!).

It's not yet as fast as I'd like it to be, and, ironically, it itself uses too much memory. I hope to fix both of these issues.

It works for the use case of seeing which processes show up at the top of "top" in its memory-sorted view, and then attaching to one of them with "gdb attach PID".

I gave a talk on Python's memory usage at PyCon US 2011, of which the second half is mostly about gdb-heap.

Usage

If running from a git working copy, you need to set PYTHONPATH to the directory containing the code, in the environment from which you invoke gdb, e.g. at the command line:

[david@fedora-14] $ export PYTHONPATH=path-to-gdb-heap

You can attach gdb to a running process.

By PID:

[david@fedora-14] $ gdb attach 1976

By name:

[david@fedora-14] $ gdb attach $(pidof -x name-of-program)

Or a coredump:

[david@fedora-14] $ gdb -c core.1976

Or launch the program directly from gdb:

[david@fedora-14] $ gdb --args some-program some-args

You should then "import heap" within gdb's Python bindings:

(gdb) python import heap

In the Fedora RPM package this is done for you when gdb loads glibc's debugging information (typically immediately after the program starts up).

The latest version of the code in git has a helper script to do all this for you when running from a git checkout:

$ ./run-gdb-heap NAME_OF_PROGRAM_TO_DEBUG ARGUMENTS TO THE PROGRAM

Importing the module registers the new heap command. By default it displays a breakdown of memory usage in the inferior process by type:

(gdb) heap
       Domain                        Kind              Detail   Count  Allocated size
-------------  --------------------------  ------------------  ------  --------------
       python                         str                       6,689         477,840
      cpython           PyDictEntry table                         167         456,944
      cpython           PyDictEntry table            interned       1         200,704
       python                         str            bytecode     648          92,024
uncategorized                                        32 bytes   2,866          91,712
       python                        code                         648          82,944
uncategorized                                      4128 bytes      19          78,432
       python                    function                         609          73,080
       python          wrapper_descriptor                         905          72,400
       python                        dict                         247          71,200
(snipped)

The categorization is divided into three parts:

  • "domain": high-level grouping e.g. "python", "C++", etc
  • "kind": type information, appropriate to the domain e.g. a class/type

    Domain           Meaning of 'kind'
    ---------------  ----------------------------------------------------------------
    'C'              'string data' signifies a NUL-terminated string
    'C++'            the C++ class (the heuristic for this needs to be optimized)
    'python'         the python class
    'cpython'        C structure/type (implementation detail within Python)
    'pyarena'        Python's optimized memory allocator
    'uncategorized'  (none; gdb-heap wasn't able to identify what this is used for)
    'GType'          the GLib type/GObject class, within the GTK+ stack (needs to be made robust and optimized)

  • "detail": additional detail (e.g. the size of a buffer)
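The default report is essentially a group-by over these three fields. The following sketch shows one way such a summary could be computed from a list of categorized blocks; the summarize function and the sample data are hypothetical and are not taken from gdb-heap's source.

```python
from collections import defaultdict


def summarize(blocks):
    """Group (domain, kind, detail, size) tuples; largest total size first."""
    totals = defaultdict(lambda: [0, 0])   # category -> [count, bytes]
    for domain, kind, detail, size in blocks:
        entry = totals[(domain, kind, detail)]
        entry[0] += 1
        entry[1] += size
    rows = [(cat, count, size) for cat, (count, size) in totals.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


blocks = [
    ("python", "str", "", 72),
    ("uncategorized", "", "32 bytes", 32),
    ("python", "str", "", 64),
    ("uncategorized", "", "32 bytes", 32),
]
rows = summarize(blocks)
```

Sorting by total allocated size rather than by count puts the categories that matter most for memory usage at the top, as in the sample output.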

(Packagers should ensure that the heap command is automatically available, without requiring the user to set an environment variable or manually import the code)

There are numerous subcommands. heap is integrated into gdb's tab-completion, so that you can see the available commands with the TAB key:

(gdb) heap
[TAB pressed]
all    diff   label  log    sizes  used   

The precise subcommands are still in flux.

(gdb) heap all

shows all dynamic memory areas, both used and free

Query language

gdb-heap has a heap select subcommand, which provides a simple language for querying for blocks matching criteria:

(gdb) heap select size == 1778224
             Start                 End         Domain  Kind         Detail                                                                             Hexdump
------------------  ------------------  -------------  ----  -------------  ----------------------------------------------------------------------------------
0x000000000360a810  0x00000000037bca3f  uncategorized        1778224 bytes  00 00 00 43 00 00 86 60 00 00 00 3f 00 00 00 07 00 00 80 fc |...C...`...?........|
0x00000000068596c0  0x0000000006a0b8ef  uncategorized        1778224 bytes  00 00 00 43 00 00 86 60 00 00 00 3f 00 00 00 07 00 00 80 fc |...C...`...?........|

(gdb) heap select kind="string data" and size > 512
Blocks retrieved 10000
             Start                 End  Domain         Kind  Detail                                                                             Hexdump
------------------  ------------------  ------  -----------  ------  ----------------------------------------------------------------------------------
0x0000000000624070  0x000000000062430f       C  string data          41 20 63 6f 6e 74 65 78 74 20 6d 61 6e 61 67 65 72 20 74 68 |A context manager th|
0x0000000000627b50  0x0000000000627e8f       C  string data          41 20 64 65 63 6f 72 61 74 6f 72 20 69 6e 64 69 63 61 74 69 |A decorator indicati|
0x0000000000628b90  0x0000000000628e0f       C  string data          4d 65 74 61 63 6c 61 73 73 20 66 6f 72 20 64 65 66 69 6e 69 |Metaclass for defini|
0x0000000000661320  0x000000000066170f       C  string data          20 10 65 00 00 00 00 00 01 00 00 00 00 00 00 00 20 2e 78 05 | .e............. .x.|
0x00000000006a2410  0x00000000006a27ff       C  string data          20 13 66 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | .f.................|
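Conceptually, a query is a predicate applied to every block. The sketch below mimics the two queries above using ordinary Python lambdas in place of the query language; the Block tuple, the select function, and the third sample block are hypothetical, and gdb-heap's own parser and evaluator are not shown here.

```python
from collections import namedtuple

Block = namedtuple("Block", "start size domain kind detail")


def select(blocks, predicate):
    """Return the blocks for which predicate(block) is true."""
    return [block for block in blocks if predicate(block)]


blocks = [
    Block(0x624070, 672, "C", "string data", ""),
    Block(0x360a810, 1778224, "uncategorized", "", "1778224 bytes"),
    Block(0x700000, 40, "C", "string data", ""),   # made-up small string
]

# Roughly: heap select kind="string data" and size > 512
big_strings = select(blocks, lambda b: b.kind == "string data" and b.size > 512)

# Roughly: heap select size == 1778224
exact = select(blocks, lambda b: b.size == 1778224)
```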

Hexdumps

In addition to "heap", gdb-heap also provides a hexdump command, to help you figure out what a block of data is being used for.

(gdb) hexdump 0x000000000360a810

0x000000000360a810 -> 0x000000000360a82f 00 00 00 43 00 00 86 60 00 00 00 3f 00 00 00 07 00 00 80 fc 00 00 00 10 00 00 00 64 00 00 00 08 |...C...`...?...............d....|
0x000000000360a830 -> 0x000000000360a84f 00 00 00 00 00 00 00 01 00 00 03 e8 00 00 00 06 00 00 00 02 00 00 00 01 00 00 03 e9 00 00 00 06 |................................|
0x000000000360a850 -> 0x000000000360a86f 00 00 00 10 00 00 00 01 00 00 03 ea 00 00 00 06 00 00 00 16 00 00 00 01 00 00 03 ec 00 00 00 09 |................................|
(snip)
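For reference, a row in this style can be produced with a few lines of Python. This is a minimal sketch of the output format only, with a hypothetical hexdump_line helper operating on bytes already in hand; the real command reads its bytes from the inferior process.

```python
def hexdump_line(data, address):
    """Format one row: address range, hex bytes, then printable ASCII."""
    hex_part = " ".join("%02x" % byte for byte in data)
    # Show printable ASCII as-is and everything else as a dot.
    text = "".join(chr(byte) if 32 <= byte < 127 else "." for byte in data)
    last = address + len(data) - 1      # the end address is inclusive
    return "0x%016x -> 0x%016x %s |%s|" % (address, last, hex_part, text)


row = bytes.fromhex("00000043 00008660 0000003f 00000007 000080fc")
line = hexdump_line(row, 0x360a810)
```

The right-hand column is often the quickest clue to a block's purpose: readable text suggests string data, while regular patterns of small integers suggest arrays or tables.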

History

heap provides a "history" feature, somewhat analogous to a revision-control system such as git:

(gdb) heap label

allows you to take a named snapshot of the current state of the heap.

(gdb) heap log

shows you all such named snapshots

(gdb) heap diff

allows you to compare dynamic memory allocations between two different states: either those in the log, or with the current state.

You can use this in conjunction with breakpoints and stepping through the code to get a sense of how different parts of the program affect memory usage.
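The label/log/diff workflow can be pictured as storing per-category totals under a name and subtracting them later. The sketch below is a hypothetical model of that idea using plain dictionaries; the function names mirror the subcommands but are not gdb-heap's implementation, and the sample numbers are made up.

```python
from collections import Counter

snapshots = {}   # label -> Counter of bytes per (domain, kind)


def label(name, blocks):
    """Record the bytes allocated per (domain, kind) under `name`."""
    usage = Counter()
    for domain, kind, size in blocks:
        usage[(domain, kind)] += size
    snapshots[name] = usage


def log():
    """List the names of all snapshots taken so far."""
    return sorted(snapshots)


def diff(old, new):
    """Return {category: change in bytes} between two labelled snapshots."""
    before, after = snapshots[old], snapshots[new]
    return {cat: after[cat] - before[cat]
            for cat in set(before) | set(after)
            if after[cat] != before[cat]}


label("startup", [("python", "str", 72), ("python", "dict", 280)])
label("after-load", [("python", "str", 72), ("python", "str", 64),
                     ("python", "dict", 280), ("C", "string data", 48)])
changes = diff("startup", "after-load")
```

Taking one snapshot before a breakpoint and another after stepping past the code of interest narrows the diff to the allocations made by that code.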

Compatibility with gdb versions

The Archer project is a development branch of gdb. A number of its enhancements have landed in the main FSF version of gdb, for example the python support in gdb 7.

Unfortunately, the exact capabilities of gdb's python support vary between the FSF version of gdb, the archer version of gdb, and the versions in any given distribution.

gdb-heap is currently at the "experimental prototype" stage, and makes heavy use of the python interface to gdb, which makes compatibility difficult. For that reason we're targeting the gdb version in Fedora 13, with the possibility of requiring additional hooks from archer (to land in the gdb version in Fedora 14). Patches to make it work with other versions of gdb are welcome.

Some specific RFEs for gdb have been filed in Fedora's bug tracker:

Similarly, given that we rely heavily on the implementation details of glibc's heap, we are targeting the glibc version in Fedora 13 and Fedora 14.

Getting involved

Discussion about gdb-heap (usage and development) should happen on the mailing list: https://fedorahosted.org/mailman/listinfo/gdb-heap

Getting the code

The code can be obtained from this git repository: ssh://git.fedorahosted.org/git/gdb-heap.git

FIXME: ticket #5

Anonymous access

The code is tracked in a git repository. You can obtain it using:

git clone http://git.fedorahosted.org/git/gdb-heap.git

There is also a web-based browser for the code.

Authenticated access

git clone ssh://git.fedorahosted.org/git/gdb-heap.git

For push rights, you'll need to be a member of the gitgdb-heap group within the Fedora account system.

The maintainer and original author of gdb-heap is Dave Malcolm <dmalcolm@…>