July 2010 - Release of the new EXPERIMENTAL libcrm114. This is a pure ANSI C API to get access to the classifiers without having to use the .crm language. It is FAST, as accurate or better, and not hard to use. But remember, there's a certain amount of MENTAL in any software labeled EXPERIMENTAL.

August 10, 2007 - 20070810-BlameTheSegfault (562532 bytes) - bugfixes.

June 22, 2007 - 20070731-BlameTheInterns released - new classifiers: CLUMP and SVM - and bugfixes again (of course).

April 28, 2007 - 20070428-BlameSpamConf release is out. Latest release in src/ and CVS - bugfixes (of course) and new feature: INSERT [:var:foo.crm] now does var-expansion.

October 13, 2006: The MORE IMPROVED BIG CRM114 BOOK is HERE. (includes the entropic and hyperspace classifiers, and an even better index)! You can download a proofreader's copy at:

This is 285 pages of how-to-build the filter of your dreams with CRM114.

Current statistics:

  • January 1, 2007 to April 1, 2007: 20,000 messages (~12K good, ~8K spam). Zero hard errors, one unsure on the wrong side. –wsy
  • On August 20, 2006, I reinitialized my learning files and started retraining from scratch. On September 1, I started counting errors. As of 13-October-2006 I have had ONE. That's all: over 10,000 messages and 1 error. –wsy
  • In the month of April 2005, I received over 10,000 emails. About 60% were spam. I had ZERO classification errors. ZERO. –wsy
  • Feb 1 through March 1, 2004: 8738 messages (4240 spam, 4498 nonspam), and my total error count was ONE. That translates to better than 99.984% accuracy, which is over ten times more accurate than a human. –wsy
  • Comparison with a Human
    For comparison, I once measured my human accuracy to be around 99.84% by classifying the same set of about 3000 messages twice over a period of about a week. I read each message from the top until I felt “confident” of its status (one message per screen, unless I wanted more than one screen to decide on a message), and did the classification in small batches with plenty of breaks and other office tasks to avoid fatigue. Then I diff()ed the two passes to generate a result. Assuming I never duplicate the same mistake, I, as an unassisted human under nearly optimal conditions, am 99.84% accurate. CRM114 was more than ten times better. –wsy
  • Current filtering speed is about 120 kbyte/sec for a moderate (P-iii 1.4 GHz) mailserver.
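
The "more than ten times better" comparison above can be checked with a few lines of arithmetic. This is only a sketch using the figures quoted in the list (8738 messages with one error in February 2004, and a measured human accuracy of 99.84%); the function and variable names are made up for illustration and are not part of CRM114.

```python
def accuracy_percent(errors, total):
    """Accuracy as a percentage, given an error count and a message count."""
    return 100.0 * (1.0 - errors / total)


# Figures quoted in the statistics above.
FILTER_ERRORS = 1          # Feb 1 through March 1, 2004
FILTER_MESSAGES = 8738
HUMAN_ACCURACY = 99.84     # percent, from the two-pass human test

filter_accuracy = accuracy_percent(FILTER_ERRORS, FILTER_MESSAGES)

# Compare error rates rather than accuracies: near 100%, the accuracies
# look almost identical even when the error rates differ by a large factor.
filter_error_rate = FILTER_ERRORS / FILTER_MESSAGES
human_error_rate = 1.0 - HUMAN_ACCURACY / 100.0
improvement = human_error_rate / filter_error_rate

print(f"filter accuracy: {filter_accuracy:.3f}%")
print(f"human errors per filter error: {improvement:.1f}")
```

With these inputs the filter accuracy comes out at roughly 99.989%, and the human error rate is about 14 times the filter's, which is consistent with "better than 99.984%" and "over ten times" as stated.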
