I attended a session by Bill Pugh (although sometimes it seemed more like a TB ward with all the coughing and sneezing going on) about using FindBugs on large code bases. FindBugs is a static analysis tool that analyses your class files without executing the program. Some people think a tool like this shouldn’t be needed, but smart programmers still make dumb mistakes, and FindBugs can catch them.
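To make "dumb mistakes" concrete, here is a small sketch of my own (not from the talk) of the kind of slip a static analyser of this sort is designed to flag: ignoring the return value of a method on an immutable object. The class and method names are made up for illustration.

```java
final class DumbMistakes {
    // Buggy: String is immutable, so trim() returns a new string.
    // The result is thrown away and the original is returned unchanged.
    static String normalize(String s) {
        s.trim();
        return s;
    }

    // Fixed: use the value that trim() returns.
    static String normalizeFixed(String s) {
        return s.trim();
    }

    public static void main(String[] args) {
        System.out.println("[" + normalize("  hi  ") + "]");
        System.out.println("[" + normalizeFixed("  hi  ") + "]");
    }
}
```

The code compiles and runs without complaint, which is exactly why this kind of mistake survives until something inspects the code for it.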
FindBugs can scale to very large code bases; Google has fixed more than 1000 issues discovered by FindBugs. Bill’s talk described ways of using FindBugs on a large project where the number of issues found can be overwhelming. For example, running FindBugs on Eclipse 3.4M2 discovered 36,000 issues. This can be made manageable by using FindBugs filters to filter out:
- Low priority issues (leaves 26,000)
- Vulnerability to malicious code (5,000)
- Issues also present in v3.3 (now down to 62 issues)
The vulnerability to malicious code issues are filtered out because they mainly matter for code that will run untrusted code, like the JVM.
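As an illustration of what a "vulnerability to malicious code" warning is about, here is a sketch of my own, assuming the typical case of a class handing out a reference to its internal mutable state. The `Permissions` class and its methods are hypothetical.

```java
import java.util.Arrays;

final class Permissions {
    private final String[] roles = {"reader"};

    // Risky: returns the internal array itself, so any caller can rewrite it.
    String[] getRolesUnsafe() {
        return roles;
    }

    // Safer: returns a copy, so callers cannot touch the internal state.
    String[] getRolesCopy() {
        return roles.clone();
    }

    public static void main(String[] args) {
        Permissions p = new Permissions();

        // Changing the copy leaves the object untouched.
        p.getRolesCopy()[0] = "admin";
        System.out.println(Arrays.toString(p.getRolesUnsafe()));

        // Changing the exposed array changes the object itself.
        p.getRolesUnsafe()[0] = "admin";
        System.out.println(Arrays.toString(p.getRolesCopy()));
    }
}
```

If every caller is your own trusted code, this is more of a design smell than a security hole, which fits with filtering the category out for most projects.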
Another key point was to integrate FindBugs into your CI. Hudson has a good plugin that can display historical results and let FindBugs issues affect the health of a build. It can also notify whoever introduced an issue.
Bill gave typical warning densities of 0.3 – 0.6 medium or high priority warnings per 1000 LOC and about 1 – 4 other potentially relevant warnings per 1000 LOC. But don’t use these numbers to judge whether your project is good or bad!
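For reference, the density figure is just warnings divided by thousands of lines of code. A minimal sketch, with made-up numbers (450 warnings in a 1,000,000 line code base) that are not from the talk:

```java
final class WarningDensity {
    // Warnings per 1000 lines of code.
    static double perThousandLines(int warnings, int linesOfCode) {
        if (linesOfCode <= 0) {
            throw new IllegalArgumentException("linesOfCode must be positive");
        }
        return warnings * 1000.0 / linesOfCode;
    }

    public static void main(String[] args) {
        // Hypothetical project: 450 medium/high warnings in 1,000,000 LOC.
        System.out.printf("%.2f warnings per 1000 LOC%n",
                perThousandLines(450, 1_000_000));
    }
}
```

With those numbers the result is 0.45, which sits inside the 0.3 – 0.6 range Bill quoted for medium and high priority warnings.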
To narrow down which issues you should be investigating, he suggested ignoring the low priority issues. The high/medium priorities are useful for ranking issues within a bug pattern but not across patterns, i.e., don’t just look at the high priority issues. Each bug also has a category, for example, correctness (code seems clearly wrong), security (XSS, SQL injection), bad practice (violates good practice), dodgy code (something weird that might be wrong), i18n, etc.
We use Hudson at work, so it’s probably worth trying the plugin to see what it turns up.