I've been working on a new web-based tool for analyzing Fedora: rpmgrok.
It digests built RPMs, analyzing the metadata and payload, and stores the results in a database. There's a web UI for viewing the data, an XML-RPC interface for querying it, and a command-line tool that uses the XML-RPC interface.
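To give a rough idea of what talking to an XML-RPC interface looks like from Python, here is a minimal sketch using the standard library. The method name `get_soname_users` and the server URL passed in are purely hypothetical placeholders (the real rpmgrok method names are not listed in this post), so treat this as an illustration of the protocol rather than of the actual API.

```python
import xmlrpc.client

# Hypothetical method name: the real rpmgrok XML-RPC API is not described here.
METHOD = "get_soname_users"


def encode_request(soname):
    """Return the XML-RPC request body that a soname lookup would send."""
    return xmlrpc.client.dumps((soname,), methodname=METHOD)


def soname_users(server_url, soname):
    """Call the (hypothetical) method on a server; this needs network access."""
    proxy = xmlrpc.client.ServerProxy(server_url)
    return getattr(proxy, METHOD)(soname)


if __name__ == "__main__":
    # Show the request on the wire without contacting any server.
    print(encode_request("libpcre.so.0"))
```

`encode_request` only builds the XML payload, so it can be tried offline; `soname_users` would perform the actual call once pointed at a real endpoint and a real method name.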
I've got a prototype running on: http://publictest7.fedoraproject.org/rpmgrok
More info (e.g. source code) can be seen at https://fedorahosted.org/rpmgrok
The idea is to provide a new way for Fedora developers, testers, and other enthusiasts to track various things across the entire distribution, without having to have a full tree installed. It's probably usable by other Linux distributions.
rpmgrok is Free/Open Source software (licensed under the LGPLv2.1).
What does it track?
- all symbols in binaries/libraries, and the dependencies between them, so that you can see e.g. exactly what calls a particular function. This can also be used to locate instances of static linkage. See e.g. http://publictest7.fedoraproject.org/rpmgrok/elffile/258085 (details of /lib/libexpat.so.1.5.2 from a built RPM)
- manifests of all RPMs, so that you can browse the files in packages via a web UI. See http://publictest7.fedoraproject.org/rpmgrok/files
- all shared object names (sonames), and the dependencies between them. See e.g.
  - http://publictest7.fedoraproject.org/rpmgrok/sonames (browsable view of all sonames in the distro)
  - http://publictest7.fedoraproject.org/rpmgrok/elffile/739167 (details of /usr/lib/libxml2.so.2.6.32 from within a built libxml2 rpm)
  - Everything implementing or linking against libpcre.so.0, down to the level of individual binaries: http://publictest7.fedoraproject.org/rpmgrok/soname/libpcre.so.0
- rpmlint results for all RPMs. See http://publictest7.fedoraproject.org/rpmgrok/rpmlint_messages for a UI to browse by error message, and e.g. http://publictest7.fedoraproject.org/rpmgrok/rpmlint/dangerous-command-in-%25post for an example of all error messages of a particular kind. Some of these rpmlint errors may be worth fixing (though others look like false positives, and others are probably not worth the effort)
- all .desktop files and their fields so that you can e.g. find applications that can handle PDF files. See e.g. http://publictest7.fedoraproject.org/rpmgrok/mimetype/application/pdf for a view of all desktop files that can handle "application/pdf", and e.g. http://publictest7.fedoraproject.org/rpmgrok/desktopfile/253580 showing a specific desktop file
- SLOCcount stats for prepped source trees (e.g. "what % of Fedora is in C/C++/Python?" etc). I don't have this data ready yet.
- any other kind of thing we want to add (provided there's a sane way to gather it in a script and slurp it into the database, of course...)
- sizes of packages; why is package foo so big?
- report on all fonts in the distro, and what packages provide them etc
Note that due to my poor CSS there are lots of links that don't show up as such in the various table views. You may need to explore with the mouse to find all of the cross-referencing that the web UI has.
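As an illustration of the kind of soname cross-referencing listed above, here is a minimal sketch using an in-memory SQLite database. The table and column names are invented for this example and are not rpmgrok's actual schema, and the sample rows (apart from the libexpat path mentioned above) are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed toy schema, not the real rpmgrok one.
conn.executescript("""
CREATE TABLE elf_file (id INTEGER PRIMARY KEY, rpm TEXT, path TEXT);
CREATE TABLE soname_provided (elf_id INTEGER REFERENCES elf_file(id), soname TEXT);
CREATE TABLE soname_needed (elf_id INTEGER REFERENCES elf_file(id), soname TEXT);
""")

# Illustrative sample rows only.
conn.executemany("INSERT INTO elf_file VALUES (?, ?, ?)", [
    (1, "pcre", "/lib/libpcre.so.0.0.1"),
    (2, "grep", "/bin/grep"),
    (3, "expat", "/lib/libexpat.so.1.5.2"),
])
conn.executemany("INSERT INTO soname_provided VALUES (?, ?)", [
    (1, "libpcre.so.0"),
    (3, "libexpat.so.1"),
])
conn.execute("INSERT INTO soname_needed VALUES (2, 'libpcre.so.0')")


def providers_of(conn, soname):
    """ELF files that implement the given soname."""
    rows = conn.execute(
        "SELECT f.rpm, f.path FROM soname_provided p "
        "JOIN elf_file f ON f.id = p.elf_id "
        "WHERE p.soname = ? ORDER BY f.rpm, f.path",
        (soname,),
    )
    return rows.fetchall()


def users_of(conn, soname):
    """ELF files that link against the given soname."""
    rows = conn.execute(
        "SELECT f.rpm, f.path FROM soname_needed n "
        "JOIN elf_file f ON f.id = n.elf_id "
        "WHERE n.soname = ? ORDER BY f.rpm, f.path",
        (soname,),
    )
    return rows.fetchall()
```

With the sample rows, `users_of(conn, "libpcre.so.0")` lists the binaries that need the library and `providers_of(conn, "libpcre.so.0")` lists the files that implement it, which is the same split the soname pages in the web UI present.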
What's it currently showing?
I queued up an analysis of all of rawhide as of 2008-07-25 on i386; a little over 10000 built packages. It took about a week to process, and about 200 of these jobs failed for one reason or another. See https://fedorahosted.org/rpmgrok/ticket/9 for more info.
So the db is currently just showing a snapshot in time of rawhide two weeks ago, on one architecture (and missing 2% of the packages due to errors).
Ultimately I want to build things up so that we can show time-based trend reports, e.g. the size of a minimal install over time.
Hopefully this looks of interest to people.
I need help with coding, with sysadmin work, with making the UI better, and with things I probably haven't thought of yet. I hope this can be a useful tool for Fedora.
If you're interested in hacking on rpmgrok, get in touch. The README.txt file is hopefully of interest.
It's implemented using TurboGears and SQLAlchemy (specifically, sqlalchemy 0.4, since it uses polymorphic inheritance features from that version).
It also has a somewhat general-purpose task scheduler, used to control a pool of worker hosts that do the actual analysis. It ought to be pluggable to do other types of task.
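To make the "pluggable" idea concrete, here is a minimal sketch of a task pool in which task types are registered by name. It is not rpmgrok's scheduler: threads stand in for worker hosts, and the task names and function bodies are placeholders invented for this example.

```python
import queue
import threading

# Registry of task types; new kinds of task plug in by registering here.
TASK_TYPES = {}


def task_type(name):
    """Decorator that registers a function as the handler for a task type."""
    def register(func):
        TASK_TYPES[name] = func
        return func
    return register


@task_type("rpmlint")
def run_rpmlint(rpm):
    return "rpmlint results for " + rpm  # placeholder, no real analysis


@task_type("elf-symbols")
def scan_symbols(rpm):
    return "symbols in " + rpm  # placeholder, no real analysis


def worker(tasks, results):
    """Pull jobs until a None sentinel arrives; record failures, don't crash."""
    while True:
        job = tasks.get()
        if job is None:
            break
        kind, rpm = job
        try:
            results.put((kind, rpm, "ok", TASK_TYPES[kind](rpm)))
        except Exception as exc:
            results.put((kind, rpm, "failed", str(exc)))


def run_pool(jobs, num_workers=3):
    """Run all jobs on a pool of workers and return the sorted results."""
    tasks, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for job in jobs:
        tasks.put(job)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    collected = []
    while not results.empty():
        collected.append(results.get())
    return sorted(collected)
```

A job whose type is not registered, or whose handler raises, is recorded as "failed" instead of stopping the pool, so a long run can finish and the failures can be looked at afterwards.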
Git URLs are:
(you need to be in the gitrpmgrok group of the Fedora Account System to have git push privileges; talk to me if you want to get involved)