CRM114 Newest Slightly Unstable Mainline (online updates)
Use wget to pull down the (possibly) slightly unstable latest mainline version by typing
wget -m -np http://crm114.sourceforge.net/src/
in the directory where you want the unstable mainline to be downloaded. Note that this is a smart download and will only pull down new or modified files.
CAUTION - for this wget pulldown, you *must* download TRE (Ville Laurikari's enhanced regex library) and install it (using the ./configure --enable-static option). You can get a known-and-tested version of TRE right here in .bzip format or in .gz format, or the most recent version from http://www.laurikari.net/tre
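As a rough sketch of the TRE install step described in the caution above, the build might look like the following. The helper names run and build_tre are hypothetical, and the sketch assumes a tarball named like tre-0.7.5.tar.bz2 (the name used in the md5sums list) that unpacks into a directory of the same base name. Installing into the default prefix may need root privileges on your system. Setting DRY_RUN=1 only prints the steps.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: unpack a TRE .tar.bz2 tarball, then configure it with
# static libraries enabled, build it, and install it.
# Assumes the tarball unpacks into a directory named after the tarball.
# The body runs in a subshell, so the caller's working directory is unchanged.
build_tre() (
    tarball="$1"
    srcdir="$(basename "$tarball" .tar.bz2)"
    run tar xjf "$tarball" &&
    run cd "$srcdir" &&
    run ./configure --enable-static &&
    run make &&
    run make install   # may need root, depending on the install prefix
)
```

For example, DRY_RUN=1 followed by build_tre tre-0.7.5.tar.bz2 lists the commands without touching anything, which is a cheap way to check the plan before running it for real.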
CRM114 C-callable Library
This is the callable library version of CRM114. It has most of the same classifiers as the standalone language, with some significant improvements (one alpha tester says they saw a 10x speedup in their application). This version is LGPLed (Library GPL), so you can link it with your own code, whether open-source or proprietary. You still need TRE (on Fedora, “yum install tre-devel”). Note that with improvements come costs: libcrm114 classifiers are NOT compatible with standalone CRM114 class files (necessary, because libcrm114 classifiers can work even on systems that don't have filesystems, like embedded processors). The code is now pretty stable and the API is solidly entrenched by use in several real products, so the API is unlikely to change in unpleasant ways.
Advantages of libcrm114: It's much faster; everything is in-memory. You can call everything directly from ANSI C. Because everything is in memory, it's good for embedded systems where you don't _have_ a unix-style file system to talk to. No arcane language to learn, it's all just ANSI C. You can export classifiers as ASCII “CSV-like” format so trained classifiers are 32/64-bit portable and cross-platform Linux/Mac/Windows portable (the internal binary classifier format is still tied to a particular architecture, but that's never exported any more).
Disadvantages of libcrm114: Not all classifiers are currently supported (in particular, Neural Net, Correlator, OSBF, and Winnow are NOT yet supported). There's no crazy language, so you need to get your data into memory on your own. You still need TRE. You do pay a (not horrible) startup cost loading a classifier from an ASCII CSV-like file, but since you can then reuse the classifier for as many documents as you want, in the long term this cost is amortized down to nearly zero and you get significant speedup.
LIBCRM114 Source tarballs
CRM114 Red Hot Bleeding Edge Version
If you already have a recent version, you may want to check for, grab, and apply the diffs for a quicker, smaller download.
CAUTION - for all of these source tarballs, you *must* download TRE (Ville Laurikari's enhanced regex library) and install it (using the ./configure --enable-static option). You can get a known-and-tested version of TRE right here in .bzip format or in .gz format.
Binary progs only
See the changelog for details.
md5sums for the Properly Paranoid
f68682ed4c821235c9de7f931a4d1529 crm114-20090807-BlameThorstenAndJenny.i386.tar.gz
e3bccda2a497aa1bc78999229b427650 crm114-20090807-BlameThorstenAndJenny.src.tar.gz
8bcce8bb0d4e3659adbfbc16b7bc2332 crm114-200904023-BlameSteveJobs.i386.tar.gz
60c3bb5f0408341ce02c0571308259f3 crm114-200904023-BlameSteveJobs.src.tar.gz
15a6cb56ff19fea89d2376ecc745a704 20070731-BlameTheInterns_C99-noC99.diff.gz
600323e5fb8eeded15e8568f4e9d5e60 crm114-20070731-BlameTheInterns_libc-2.2.5-shared-s_bin.tar.bz2
563e4367ed4fd20d440f46b6eb8d1b97 crm114-20070731-BlameTheInterns_libc-2.2.5-static-s_bin.tar.bz2
eaa6ba4cbe38bfd35a89c63861d71d42 crm114-20070217-BlameBaltar_libc-2.2.5-shared-s_bin.tar.bz2
fcaf47e329da7228ec0d9c51d33347c3 crm114-20070320-BlameBaltar.i386.tar.gz
c6ca19cce9bc40c6d10f7911686ea72d crm114-20070320-BlameBaltar.src.tar.gz
ba545e4a536b5cf8dceb4a571579bfd4 crm114-20070329-BlameSpamConf.i386.tar.gz
b1ab2a4fe1c573538e1f5bb9151133f1 crm114-20070428-BlameSpamConf.src.tar.gz
13a146583eca6ed079a72db69e67a947 crm114-20070731-BlameTheInterns.src.tar.gz
50419b5d563da414c5c0c0f256236fbc crm114-20070810-BlameTheSegfault.src.tar.gz
e72e5c94008865cf720992a0b25d6e89 ../coolthings/tre-0.7.5.tar.bz2
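
One way to check a downloaded file against the digests listed above is sketched below. The helper name check_md5 is hypothetical, and the sketch assumes the GNU coreutils md5sum command is available (some systems ship a differently named tool instead).

```shell
# Hypothetical helper: check_md5 EXPECTED FILE
# Succeeds only if the MD5 digest of FILE equals EXPECTED.
check_md5() {
    expected="$1"
    file="$2"
    # md5sum prints "<digest>  <filename>"; keep only the digest field.
    actual="$(md5sum "$file" | cut -d' ' -f1)"
    [ -n "$actual" ] && [ "$actual" = "$expected" ]
}
```

For example, check_md5 e72e5c94008865cf720992a0b25d6e89 tre-0.7.5.tar.bz2 && echo OK would print OK only when the downloaded TRE tarball matches the digest in the list; a missing or altered file makes the check fail.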