Welcome to dropwatch
What is Dropwatch?
Dropwatch is a project I am tinkering with to improve the visibility developers and sysadmins have into the Linux networking stack. Specifically, I am aiming to improve our ability to detect and understand packets that get dropped within the stack. I've spent some time talking with many people about what they see as shortcomings in this area, and have come away with four points:
- Consolidation: Finding dropped packets in the network stack is currently very fragmented. There are numerous statistics proc files and other utilities that need to be consulted in order to have a full view of what packets are getting dropped within the stack. Consolidating all of these utilities into one place is very helpful.
- Clarity: Understanding which statistics and utility outputs correlate to actual dropped packets requires a good deal of knowledge. Making it simpler to recognize a dropped packet is helpful.
- Disambiguation: There is a gap between the recognition of a dropped packet and its root cause. Several statistics can be incremented at multiple points in the kernel, and sometimes for multiple reasons. Being able to point out, with specificity, where and why a packet was dropped decreases the time it takes for an admin or developer to correct the problem.
- Performance: Checking the current user space utilities and stats for dropped packets is currently an exercise in polling. Its performance is sub-optimal and makes sysadmins hesitant to implement investigations on production systems due to potential performance impact. Improving performance would make admins more likely to use the tools to diagnose the problems.
How does dropwatch work?
Normally, monitoring for dropped packets requires the creation of a script that periodically polls all the aforementioned interfaces, checking for a change in various counter values. Dropwatch instead uses the kernel's dropmonitor netlink protocol to listen for dropped packets. This protocol reports the number of dropped packets and the exact location in the source code of each drop, allowing an admin to know precisely where network traffic is lost within a host system.
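For comparison, here is a rough sketch of the polling approach described above. It is not part of dropwatch; it only illustrates the kind of script an admin would otherwise write. It assumes the usual /proc/net/dev layout (two header lines, then one line per interface with the receive drop counter in the 4th column and the transmit drop counter in the 12th), and the function names are made up for this example. A real script would have to poll several other files as well.

```python
import time


def parse_net_dev(text):
    """Return {interface: (rx_drop, tx_drop)} from /proc/net/dev-style text."""
    drops = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 12:
            continue
        # Column 4 is the receive drop counter, column 12 the transmit one.
        drops[name.strip()] = (int(fields[3]), int(fields[11]))
    return drops


def drop_deltas(before, after):
    """Return {interface: (rx_delta, tx_delta)} for counters that changed."""
    deltas = {}
    for iface, (rx, tx) in after.items():
        old_rx, old_tx = before.get(iface, (0, 0))
        if rx != old_rx or tx != old_tx:
            deltas[iface] = (rx - old_rx, tx - old_tx)
    return deltas


def read_snapshot(path="/proc/net/dev"):
    """Read and parse the current counters (Linux only)."""
    with open(path) as f:
        return parse_net_dev(f.read())


def poll(interval=1.0, rounds=10):
    """Print interfaces whose drop counters moved between two polls."""
    before = read_snapshot()
    for _ in range(rounds):
        time.sleep(interval)
        after = read_snapshot()
        for iface, (rx, tx) in drop_deltas(before, after).items():
            print(f"{iface}: +{rx} rx drops, +{tx} tx drops")
        before = after
```

Calling `poll()` on a Linux host prints a line whenever an interface's drop counters change. Note what it cannot tell you: it only sees that a counter moved some time during the last interval, not where in the stack the packet was lost or why, which is the gap the netlink protocol is meant to close.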
How do I get dropwatch?
Dropwatch is built and available with kernel support in Fedora. Currently, if you want the code, you can browse it here. The git repository address is git://git.fedorahosted.org/dropwatch.git if you want to clone the tree and tinker. Official releases are here. Dropwatch is also available in the Fedora repositories as an RPM install.
Planned future enhancements for dropwatch:
- Configuration of protocol (delay hysteresis, alert bundle size, etc)
- Exporting of drop history to a file for later analysis
- Integration with debuginfo packages
- Graphing of drop histories
- Filtering locations
- Catching drops in hardware
Suggestions are of course welcome!
If you have a problem with dropwatch, or want to request a feature, you are welcome to contact me directly at nhorman@…, or open a trac bug here.
- Sep 20 2011 : I've incorporated a dropwatch perf script into the upstream tree, so you now have a choice: you can use the netlink protocol or the perf infrastructure to gather your drop stats
- Apr 7 2010 : Finally had some free time, so I managed to get the /proc/kallsyms lookup method working. No more raw program counter output (if you don't want that :) ). I'll push an update to Fedora/EPEL soon!
- Apr 18 2009 : There seems to be a significant interest in dropwatch! I've actually gotten several bug reports/feature requests. Please open trac tickets here for the things you want and want fixed!
- Apr 6 2009 : Just had a thought: I wonder if I could catch drops in hardware by checking dev->stats when a NAPI poll is scheduled, and when it's serviced
- Mar 24 2009: Dropwatch has been approved for Fedora inclusion. I'm importing it for F10, rawhide and forward, and starting to pull in the kernel bits
- Mar 19 2009: I've submitted dropwatch for a Fedora package review. Follow the review here
- Mar 14 2009: Kernel bits have been accepted and will be available starting with 2.6.30! I'm going to start working on adding some packaging code and getting the userspace bits into Fedora in time for F11. Then I'll start working on the Roadmap items
- Mar 5, 2009: A request has come in to modify the kernel component to use generic netlink, so I've updated the user code to do the same
- Mar 3, 2009: The kernel bits are proposed for upstream review. Follow the review here
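The /proc/kallsyms lookup mentioned in the Apr 7 2010 entry can be sketched roughly as follows. This is only an illustration of the idea in Python, not dropwatch's own code: it assumes each /proc/kallsyms line has the form "address type name", and it maps a raw program counter to the nearest symbol at or below it plus an offset. The function names and the addresses in the usage example are made up.

```python
import bisect


def parse_kallsyms(text):
    """Return a sorted list of (address, symbol_name) from kallsyms-style text."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue
        # Fields are: hex address, symbol type, symbol name, optional [module].
        symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def resolve(symbols, pc):
    """Map a program counter to (symbol_name, offset), or None if below all symbols."""
    addresses = [addr for addr, _ in symbols]
    # Find the last symbol whose start address is <= pc.
    index = bisect.bisect_right(addresses, pc) - 1
    if index < 0:
        return None
    addr, name = symbols[index]
    return name, pc - addr


def load_kallsyms(path="/proc/kallsyms"):
    """Read the running kernel's symbol table (Linux only)."""
    with open(path) as f:
        return parse_kallsyms(f.read())
```

With a table from `load_kallsyms()`, a drop location reported as a raw address can be printed as `name+0xoffset` instead:

```python
symbols = parse_kallsyms("ffffffff815a1000 T kfree_skb\n")  # made-up address
name, offset = resolve(symbols, 0xFFFFFFFF815A1010)
print(f"{name}+{offset:#x}")
```

On many systems unprivileged users see all-zero addresses in /proc/kallsyms, so a lookup like this generally needs to run as root to be useful.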