CRM114 - the Controllable Regex Mutilator


Nice logo, eh? It was created by Liz Manicatide, a very nice artist friend, as a commissioned work; she's a hired-gun artist. You can hire her for artistic, web, and user-interface work as well:
lizm at emphasiscreative dot com.

CRM114 is a system to examine incoming e-mail, system log streams, data files or other data streams, and to sort, filter, or alter the incoming files or data streams according to the user's wildest desires. Criteria for categorization of data can be via a host of methods, including regexes, approximate regexes, a Hidden Markov Model, Orthogonal Sparse Bigrams, WINNOW, Correlation, KNN/Hyperspace, or Bit Entropy (or by other means; it's all programmable).
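To give a feel for one of the feature schemes named above, here is a minimal Python sketch of Orthogonal Sparse Bigram feature generation. It is an illustration only, not CRM114's own code: the function name `osb_features`, the whitespace tokenizer, the `<skipN>` marker format, and the default window of 5 tokens are all assumptions made for this example. The idea is that each token is paired with each of the few tokens before it, and the pair is tagged with how many tokens were skipped in between.

```python
import re


def osb_features(text, window=5):
    """Return sparse-bigram features for `text`.

    Each token is paired with every earlier token that lies inside the
    sliding window, and the pair records how many tokens sit between them.
    (Hypothetical helper for illustration; not part of CRM114.)
    """
    tokens = re.findall(r"\S+", text)  # assumed tokenizer: split on whitespace
    features = []
    for i, right in enumerate(tokens):
        for dist in range(1, window):
            j = i - dist
            if j < 0:
                break  # ran off the start of the text
            skipped = dist - 1  # tokens between the two members of the pair
            features.append(f"{tokens[j]} <skip{skipped}> {right}")
    return features


if __name__ == "__main__":
    for feature in osb_features("buy cheap pills now"):
        print(feature)
```

A classifier would then count how often each feature shows up in each category of training text; that scoring step is left out here.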

Accuracy in excess of 99.9 per cent has been observed. In other words, CRM114 learns, and it learns fast.

CRM114 is compatible with SpamAssassin or other spam-flagging software; it can also be pipelined in front of or behind procmail. CRM114 is also useful as a syslog or firewall log filter, to alert you to important events but ignore the ones that aren't meaningful.
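The log-filtering idea can be sketched in a few lines of Python. This is not CRM114 syntax and not how CRM114 is configured; it only shows the plain-regex end of the spectrum, where fixed patterns decide what to flag and what to drop. The pattern lists, the function name `triage`, and the sample log lines are all hypothetical.

```python
import re

# Hypothetical patterns, chosen only for this example.
ALERT_PATTERNS = [re.compile(p) for p in (
    r"Failed password",
    r"\bDROP\b.*\bDPT=22\b",
)]
IGNORE_PATTERNS = [re.compile(p) for p in (
    r"CRON\[\d+\]",
    r"session (opened|closed)",
)]


def triage(line):
    """Label a log line 'alert', 'ignore', or 'unknown'.

    Alert patterns are checked first so that an important event is never
    hidden by a broad ignore pattern.
    """
    if any(p.search(line) for p in ALERT_PATTERNS):
        return "alert"
    if any(p.search(line) for p in IGNORE_PATTERNS):
        return "ignore"
    return "unknown"


if __name__ == "__main__":
    samples = [
        "sshd[812]: Failed password for root from 10.0.0.5 port 4022 ssh2",
        "CRON[4321]: pam_unix(cron:session): session opened for user root",
        "kernel: eth0: link up",
    ]
    for line in samples:
        print(triage(line), "|", line)
```

The lines that land in "unknown" are where a trained classifier earns its keep, since fixed patterns have nothing to say about them.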

People have been able to run CRM114 on Linux, BSD, Mac OS X, and Windows (natively and with Cygwin), and it has even been integrated with Microsoft Outlook and QUALCOMM Eudora. See the "Cool Things" link below for details. I can't help with any of these except Linux, though if you ask on the mailing list, someone might be able to assist you.

Not every user gets great results with the default classifier; that's why CRM114 has several different classifiers available. It's easy to switch classifiers and run a script to see what the tradeoffs are in terms of speed, accuracy, disk space, rate of learning, etc.

You can get at all of these exciting interconnects (including the Outlook macros) in Cool Things in the wiki.

CRM114 is licensed under the GPLv2; it is WITHOUT WARRANTY of ANY KIND, and although it is now in production on many sites, it will remain in perpetual BETA because the primary mission (antispam) is chasing a moving, actively evading target.

Use at your own risk, and send me bug reports! Or even better, send me improvements! If your code is substantial, I prefer to dual-license the code (i.e. we both get full rights to it, including the right to reuse and relicense under other licenses).

>>> Documentation, links, examples etc. now are all maintained in the wiki <<<