Last modified on 08/25/08 17:49:02

What is rpmgrok?

rpmgrok is a web application for browsing the payloads of a large collection of RPM software packages. It digests a set of input RPMs, analysing the metadata and payload, and storing the results in a database. There's a web UI for viewing the data, an XML-RPC interface for querying it, and a command-line tool for using the XML-RPC interface.

The idea is to give Fedora developers, testers, and other enthusiasts a new way to track various things across the entire distribution without needing a full tree installed. It's probably usable by other Linux distributions too.

rpmgrok is Free/Open Source software (licensed under the LGPLv2.1). It's at an early stage of development.

What does it track?

  • manifests of all RPMs, so that you can browse the files in packages via a web UI.
  • all symbols in binaries/libraries, and the dependencies between them, so that you can see e.g. exactly what calls a particular function. This can also be used to locate instances of static linkage.
  • all shared object names, and the dependencies between them
  • rpmlint results for all RPMs
  • all .desktop files and their fields so that you can e.g. find applications that can handle PDF files
  • SLOCCount statistics for prepped source trees (e.g. "what % of Fedora is in C/C++/Python?")
  • any other kind of thing we want to add (provided there's a sane way to gather it in a script and slurp it into the database, of course...)
    • sizes of packages; why is package foo so big?
    • report on all fonts in the distro, and what packages provide them etc
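
As an illustration of the .desktop query mentioned in the list above, here is a minimal Python sketch of how "applications that can handle PDF files" might be found. It is not rpmgrok's actual code: the function names, the sample data, and the in-memory dictionary standing in for the database are all assumptions. It relies only on the MimeType key of the Desktop Entry format, which holds a semicolon-separated list of MIME types.

```python
import configparser

# Hypothetical sample data: (package, path) -> contents of a .desktop file.
SAMPLE = {
    ("evince", "/usr/share/applications/evince.desktop"): """[Desktop Entry]
Name=Document Viewer
Exec=evince %U
MimeType=application/pdf;application/postscript;
""",
    ("gedit", "/usr/share/applications/gedit.desktop"): """[Desktop Entry]
Name=Text Editor
Exec=gedit %U
MimeType=text/plain;
""",
}


def parse_desktop(text):
    """Return (application name, list of MIME types) for one .desktop file."""
    # interpolation=None keeps '%U' in Exec lines from being treated specially.
    parser = configparser.ConfigParser(
        interpolation=None, strict=False, delimiters=("=",)
    )
    parser.optionxform = str  # keys in .desktop files are case-sensitive
    parser.read_string(text)
    if not parser.has_section("Desktop Entry"):
        return "", []
    entry = parser["Desktop Entry"]
    raw = entry.get("MimeType", fallback="")
    types = [t.strip() for t in raw.split(";") if t.strip()]
    return entry.get("Name", fallback=""), types


def handlers_for(mime_type, desktop_files):
    """Return sorted (package, application name) pairs that declare mime_type."""
    found = []
    for (package, _path), text in desktop_files.items():
        name, types = parse_desktop(text)
        if mime_type in types:
            found.append((package, name))
    return sorted(found)
```

Calling handlers_for("application/pdf", SAMPLE) returns only the evince entry. In a real deployment the parsed fields would presumably be stored in the database once and queried from there, rather than re-parsed on every lookup.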

Where is it?

A public prototype was previously viewable online, though it is currently down. I've reopened the hosting ticket, so hopefully I'll be able to put one up again soon.

Who's involved?

Dave Malcolm (dmalcolm@…) is the original author.

It doesn't yet have a dedicated mailing list; for now use to discuss it.

Implementation Notes

It's implemented using TurboGears and SQLAlchemy (specifically SQLAlchemy 0.4, since it uses the polymorphic inheritance features from that version).

It also has a somewhat general-purpose task scheduler, used to control a pool of worker hosts that do the actual analysis.
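
To give a rough feel for the worker-pool idea, the following Python sketch hands analysis tasks to a fixed number of workers. It is an illustration only, not rpmgrok's scheduler: run_pool and analyse are hypothetical names, and local threads stand in for the worker hosts.

```python
import queue
import threading


def run_pool(tasks, analyse, num_workers=3):
    """Run analyse(task) for every task on a pool of workers.

    Returns a dict mapping each task to its result. Tasks must be hashable.
    """
    pending = queue.Queue()
    for task in tasks:
        pending.put(task)

    results = {}
    lock = threading.Lock()  # protects the shared results dict

    def worker():
        while True:
            try:
                task = pending.get_nowait()
            except queue.Empty:
                return  # no work left, so this worker exits
            outcome = analyse(task)
            with lock:
                results[task] = outcome

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

For example, run_pool(["a.rpm", "bb.rpm"], len) applies len to each file name. With real worker hosts, the queue would have to be shared over the network and failed tasks rescheduled; this sketch covers neither.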

Source Code

rpmgrok uses git to store its source code.

The source code can be viewed via a web UI at

If you're interested in hacking on rpmgrok, get in touch. The README.txt file is hopefully of interest.

You can get anonymous access to the source code via:

git clone git://

For write access, you'll need to be in the gitrpmgrok group of the Fedora Accounts System to have push privileges; talk to me if you want to get involved. In that case, the invocation is

git clone ssh://


Related work

Inspiration includes

  • the OpenGrok project (though that appears to focus on source trees, whereas rpmgrok focuses on built binary packages)
  • the Debian project's lintian tool

Other stuff