CRM114 is a system to examine incoming e-mail, system log streams, data files or other data streams, and to sort, filter, or alter the incoming files or data streams according to the user's wildest desires. Data can be categorized by a host of methods, including regexes, approximate regexes, a Hidden Markov Model, Bayesian Chain Rule Orthogonal Sparse Bigrams, Winnow, Correlation, KNN/Hyperspace, Bit Entropy, CLUMP, SVM, and Neural Networks (or by other means; it's all programmable).
Spam is the big target with CRM114, but it's not a specialized email-only tool. CRM114 has been used to sort web pages, resumes, blog entries, log files, and lots of other things. Accuracy can be as high as 99.9%. In other words, CRM114 learns, and it learns fast.
CRM114 is compatible with SpamAssassin or other spam-flagging software; it can also be pipelined in front of or behind procmail. CRM114 is also useful as a syslog or firewall log filter, to alert you to important events but ignore the ones that aren't meaningful.
People have been able to run CRM114 on Linux, BSD, Mac OS X, and MS-Windows (natively and with Cygwin), and it has even been integrated with Microsoft Outlook and Qualcomm Eudora. See the CoolThings link below for details. I can't help with any of these except Linux, though if you ask on the mailing lists, someone might be able to assist you.
Not every user gets great results with the default classifier; that's why CRM114 has several different classifiers available. It's easy to switch classifiers and run a script to see what the tradeoffs are in terms of speed, accuracy, disk space, rate of learning, etc.
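To make the idea of comparing tradeoffs concrete, here is a minimal sketch of a comparison harness in Python. It does not call CRM114 or use any of its classifiers; `KeywordClassifier`, `TokenCountClassifier`, `evaluate`, and the tiny `TRAIN`/`TEST` sets are all hypothetical stand-ins invented for illustration. The point is only the shape of the experiment: train each candidate on the same data, test it on the same held-out messages, and record accuracy and elapsed time side by side.

```python
import time
from collections import Counter

# Hypothetical toy data: (text, label) pairs. Not a real corpus.
TRAIN = [
    ("cheap pills buy now", "spam"),
    ("win money now click here", "spam"),
    ("meeting agenda for monday", "good"),
    ("lunch on friday with the team", "good"),
]
TEST = [
    ("buy cheap pills", "spam"),
    ("click here to win money", "spam"),
    ("agenda for the team meeting", "good"),
    ("friday lunch", "good"),
]


class KeywordClassifier:
    """Stand-in classifier: flags text containing a word seen only in spam."""

    def __init__(self):
        self.spam_only = set()

    def learn(self, examples):
        spam, good = set(), set()
        for text, label in examples:
            (spam if label == "spam" else good).update(text.split())
        self.spam_only = spam - good

    def classify(self, text):
        return "spam" if self.spam_only & set(text.split()) else "good"


class TokenCountClassifier:
    """Stand-in classifier: picks the label whose training tokens overlap most."""

    def __init__(self):
        self.counts = {}

    def learn(self, examples):
        for text, label in examples:
            self.counts.setdefault(label, Counter()).update(text.split())

    def classify(self, text):
        tokens = text.split()
        scores = {
            label: sum(counter[t] for t in tokens)
            for label, counter in self.counts.items()
        }
        # Sorting the labels first makes tie-breaking deterministic.
        return max(sorted(scores), key=scores.get)


def evaluate(classifier, train, test):
    """Train, then test, returning accuracy and total wall-clock seconds."""
    start = time.perf_counter()
    classifier.learn(train)
    correct = sum(classifier.classify(text) == label for text, label in test)
    elapsed = time.perf_counter() - start
    return {"accuracy": correct / len(test), "seconds": elapsed}


if __name__ == "__main__":
    candidates = [
        ("keyword", KeywordClassifier()),
        ("token-count", TokenCountClassifier()),
    ]
    for name, clf in candidates:
        result = evaluate(clf, TRAIN, TEST)
        print(f"{name}: accuracy={result['accuracy']:.2f} "
              f"time={result['seconds']:.6f}s")
```

A real comparison would use your own mail or log corpus and would also track the other tradeoffs mentioned above, such as disk space and how quickly accuracy improves as training examples are added.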
CRM114 is licensed under the GPL V2, so it's free and open for everyone.
NEW: The C-callable library, LIBCRM114, is available for experimental purposes.
It's callable from ANSI C, does everything in memory (hence no slow file I/O), and is portable across 32-bit and 64-bit systems and across Linux/Mac/Windows. Also, it's LGPLed, so you can link it right into your app. Grab it from the download page. The only downside: it's experimental, so the API may change, and not all the classifiers are supported.