This news is lazily (un)maintained; please check the README in the sources.
Check the current version here. See Download for how to get the latest version.
July 2010 - Release of the new EXPERIMENTAL libcrm114. This is a pure ANSI C API to get access to the classifiers without having to use the .crm language. It is FAST, as accurate or better, and not hard to use. But remember, there's a certain amount of MENTAL in any software labeled EXPERIMENTAL.
August 10, 2007 - 20070810-BlameTheSegfault (562532 bytes) - bugfixes.
June 22, 2007 - 20070731-BlameTheInterns released - new classifiers: CLUMP and SVM - and bugfixes again (of course).
April 28, 2007 - 20070428-BlameSpamConf release is out. Latest release in src/ and CVS - bugfixes (of course) and a new feature:
INSERT [:var:foo.crm] now does var-expansion.
October 13, 2006: The MORE IMPROVED BIG CRM114 BOOK is HERE. (includes the entropic and hyperspace classifiers, and an even better index)!
You can download a proofreader's copy at:
This is 285 pages of how-to-build the filter of your dreams with CRM114.
January 1, 2007 to April 1, 2007 : 20,000 messages (~12K good, ~8K spam). Zero hard errors, one unsure on the wrong side. –wsy
On August 20, 2006, I reinitialized my learning files and started retraining from scratch. On September 1, I started counting errors. As of 13-October-2006 I have had ONE. That's all: over 10,000 messages and 1 error. –wsy
In the month of April, 2005, I received over 10,000 emails. About 60% were spam. I had ZERO classification errors. ZERO. –wsy
Feb 1 through March 1, 2004, 8738 messages (4240 spam, 4498 nonspam), and my total error count was ONE. That translates to better than 99.984% accuracy, which is over ten times more accurate than human accuracy. –wsy
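The arithmetic behind that claim can be checked with a few lines of Python. This is only a sketch using the figures quoted on this page (8738 messages, 1 error, and the 99.84% human baseline); the variable names are made up for illustration.

```python
# Figures quoted on this page for Feb 1 through March 1, 2004.
total_messages = 8738
errors = 1

error_rate = errors / total_messages
accuracy_pct = 100.0 * (1.0 - error_rate)

# Human baseline quoted on this page: 99.84% accuracy.
human_error_rate = 1.0 - 0.9984

# How many times more errors the human makes than the filter.
error_ratio = human_error_rate / error_rate

print(f"filter accuracy: {accuracy_pct:.4f}%")
print(f"human/filter error ratio: {error_ratio:.1f}")
```

Note that "ten times more accurate" here means ten times fewer errors per message, not ten times the accuracy percentage.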
Comparison with a Human
For comparison, I once measured my human accuracy to be around 99.84%. I classified the same set of about 3000 messages twice over a period of about a week, reading each message from the top until I felt “confident” of the message status (one message per screen, unless I wanted more than one screen to decide on a message), and doing the classification in small batches with plenty of breaks and other office tasks to avoid fatigue. Then I diff()ed the two passes to generate a result. Assuming I never duplicate the same mistake, I, as an unassisted human under nearly optimal conditions, am 99.84% accurate. CRM114 was more than ten times better. –wsy
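The two-pass measurement can be sketched roughly as follows. This is one plausible reading, not necessarily the exact calculation used: it assumes every disagreement between the passes is a single error in exactly one pass (the "never duplicate the same mistake" assumption) and that errors split evenly between the two passes. The function name and the toy labels are hypothetical.

```python
def estimate_single_pass_accuracy(pass_a, pass_b):
    """Estimate per-pass accuracy from two labelings of the same messages.

    Assumes each disagreement is one mistake in exactly one pass, and
    that mistakes are split evenly between the two passes.
    """
    if len(pass_a) != len(pass_b):
        raise ValueError("both passes must label the same messages")
    if not pass_a:
        raise ValueError("need at least one message")
    # The "diff" step: count messages labeled differently in the two passes.
    disagreements = sum(a != b for a, b in zip(pass_a, pass_b))
    errors_per_pass = disagreements / 2
    return 1.0 - errors_per_pass / len(pass_a)


# Toy data for illustration only, not the real 3000-message set.
first_pass = ["spam", "good", "spam", "good", "good"]
second_pass = ["spam", "good", "good", "good", "good"]
estimate = estimate_single_pass_accuracy(first_pass, second_pass)
print(f"estimated accuracy: {estimate:.2%}")
```

A mistake made identically in both passes never shows up in the diff, so an estimate of this kind is an upper bound on the true accuracy.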
Current filtering speed is about 120 kbyte/sec for a moderate (P-iii 1.4 GHz