I am the author of CRM114, and I corresponded with Professor Cormack for setup assistance during his study; he did have some problems with CRM114 that he brought to my attention and which may never have been fully resolved.
I *do* run CRM114 myself; I also run SpamAssassin (regularly maintained and updated by the systems staff) on a parallel account. I find that SA catches 90+ percent of what makes it past the firewall's immediate RBL lists (which matches Prof. Cormack's Figure 8 pretty closely); CRM114 nails 99.9% or more. This week (ending June 21, 2004), my CRM114 stats are 2528 nonspam and 1114 spam messages with just 1 error (a false reject), which is 99.972% accuracy.
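The arithmetic behind that figure is straightforward; here is a quick sketch using the weekly counts above:

```python
# Weekly message counts reported above.
nonspam = 2528
spam = 1114
errors = 1  # the single false reject

total = nonspam + spam          # 3642 messages
accuracy = 100.0 * (total - errors) / total
print(f"{accuracy:.4f}%")       # prints "99.9725%"
```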
I have gotten reports from some very happy users who are seeing similar accuracies; I've also gotten sad reports similar to Prof. Cormack's that show very weak accuracy.
I can conclude from this (and other reports) that filter performance varies _greatly_ with spam mix - that is to say, Your Mileage Will Vary.
As an example, consider the report's Fig 15, which compares CRM114's accuracy on nonspam v. spam. Note that the two curves are displaced considerably: the error rates differ by a factor of roughly 3 to 5!
This is peculiar, because CRM114 is _entirely_ symmetrical; it does NOT have any predisposition toward (or against) erring on the side of caution; the only difference between nonspam and spam is the names of their statistics database files, which could be interchanged without affecting the filtering results.
Therefore, the two accuracy curves (errors per N emails) _should_ lie on top of each other; there is no difference in the processing. The fact that the nonspam v. spam error rate curves seem to differ by a factor of 3 to 5 in magnitude gives me some reason to believe that the setup issues Prof. Cormack encountered never really were completely addressed.
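To illustrate the symmetry argument, here is a toy two-class word-count classifier (this is NOT CRM114's actual algorithm, just a minimal Laplace-smoothed sketch of the same design principle): the two class "databases" are treated identically, so interchanging them can only swap the labels, never change which side of the line a message falls on.

```python
import math
from collections import Counter

def train(docs):
    """Build a word-count 'database' from a list of documents."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.split())
    return counts

def score(counts, text, vocab_size):
    """Laplace-smoothed log-likelihood of text under one database."""
    n = sum(counts.values())
    return sum(math.log((counts[w] + 1) / (n + vocab_size))
               for w in text.split())

def classify(db_a, db_b, text):
    """Return 'A' or 'B'; both databases get identical processing."""
    vocab = len(set(db_a) | set(db_b))
    return "A" if score(db_a, text, vocab) >= score(db_b, text, vocab) else "B"

# Hypothetical training snippets, purely for illustration.
good = train(["meeting agenda budget", "lunch friday report"])
junk = train(["free viagra pills", "free money click now"])

msg = "free pills now"
# Swapping the databases merely swaps the label; the decision is unchanged:
assert (classify(good, junk, msg) == "A") == (classify(junk, good, msg) == "B")
```

This is why, absent setup trouble, the nonspam and spam error-rate curves should coincide: nothing in the processing knows which database is "spam."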
In short, there are unresolved questions in my mind on both sides of the fence, and I'd be glad to find answers anywhere.