CRM114 Newest Slightly Unstable Mainline (online updates)

  • You can use wget to pull down the latest (possibly slightly unstable) mainline version by typing
     wget -m -np

    in the directory where you want the unstable mainline to be downloaded. Note that this is a smart download: it only pulls down new or modified files. CAUTION - for this wget pulldown, you *must* download TRE (Ville Laurikari's enhanced regex library) and install it (using the ./configure --enable-static option). You can get a known-and-tested version of TRE right here in .bzip format or in .gz format.

or the most recent version from

CRM114 C-callable Library

This is the callable library version of CRM114. It has most of the same classifiers as the standalone language, with some significant improvements (one alpha tester reports a 10x speedup in their application). This version is LGPLed (Library GPL), so you can link it with your own code, whether open-source or proprietary. You still need TRE (on Fedora, “yum install tre-devel”). Note that with improvements come costs: libcrm114 classifiers are NOT compatible with standalone CRM114 class files. This is necessary because libcrm114 classifiers have to work even on systems that don't have filesystems, such as embedded processors. The code is now pretty stable, and the API is solidly entrenched by use in several real products, so it is unlikely to change in unpleasant ways.

Advantages of libcrm114: It's much faster; everything is in-memory. You can call everything directly from ANSI C. Because everything is in memory, it's good for embedded systems where you don't _have_ a unix-style file system to talk to. There's no arcane language to learn; it's all just ANSI C. You can export classifiers in an ASCII “CSV-like” format, so trained classifiers are portable between 32-bit and 64-bit systems and across Linux, Mac, and Windows (the internal binary classifier format is still tied to a particular architecture, but that's never exported any more).

Disadvantages of libcrm114: Not all classifiers are currently supported (in particular, Neural Net, Correlator, OSBF, and Winnow are NOT yet supported). There's no crazy language, so you need to get your data into memory on your own. You still need TRE. You do pay a (not horrible) startup cost when loading a classifier from an ASCII CSV-like file, but since you can then reuse the classifier for as many documents as you want, in the long term this cost is amortized down to zero and you get a significant speedup.

LIBCRM114 Sources tarballs

CRM114 Red Hot Bleeding Edge Version

If you already have a recent version, you may prefer to check for, download, and apply the diffs instead, which makes for a quicker and smaller download.

Sources tarballs

CAUTION - for all of these source tarballs, you *must* download TRE (Ville Laurikari's enhanced regex library) and install it (using the ./configure --enable-static option). You can get a known-and-tested version of TRE right here in .bzip format or in .gz format.
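
The TRE install step above might look roughly like the following sketch. It assumes the tarball is the tre-0.7.5.tar.bz2 file named in the md5sum list below and that it unpacks into a directory of the same base name. The build_tre and run helpers and the DRY_RUN variable are hypothetical names used only for illustration; the one option taken from this page is --enable-static.

```shell
#!/bin/sh
# Sketch only: unpack, configure, build, and install TRE with static libraries.
# build_tre, run, and DRY_RUN are illustrative names, not part of TRE or CRM114.

run() {
    # Echo the command, then execute it unless DRY_RUN=1 is set.
    echo "+ $*"
    [ "${DRY_RUN:-0}" = "1" ] || "$@"
}

build_tre() {
    tarball=$1                                # e.g. tre-0.7.5.tar.bz2
    # Assumption: the tarball unpacks into a directory named after itself.
    srcdir=$(basename "$tarball" .tar.bz2)
    run tar xjf "$tarball" &&
    run cd "$srcdir" &&
    run ./configure --enable-static &&        # the option this page requires
    run make &&
    run make install                          # typically needs root privileges
}

# Example (commented out so that sourcing this file has no side effects):
# build_tre tre-0.7.5.tar.bz2
```

Setting DRY_RUN=1 before the call prints the commands without running them, which is a cheap way to check the sequence before touching the system.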

Binary progs only

See the changelog for details.

md5sums for the Properly Paranoid

f68682ed4c821235c9de7f931a4d1529  crm114-20090807-BlameThorstenAndJenny.i386.tar.gz
e3bccda2a497aa1bc78999229b427650  crm114-20090807-BlameThorstenAndJenny.src.tar.gz
8bcce8bb0d4e3659adbfbc16b7bc2332  crm114-200904023-BlameSteveJobs.i386.tar.gz
60c3bb5f0408341ce02c0571308259f3  crm114-200904023-BlameSteveJobs.src.tar.gz
15a6cb56ff19fea89d2376ecc745a704  20070731-BlameTheInterns_C99-noC99.diff.gz
600323e5fb8eeded15e8568f4e9d5e60  crm114-20070731-BlameTheInterns_libc-2.2.5-shared-s_bin.tar.bz2
563e4367ed4fd20d440f46b6eb8d1b97  crm114-20070731-BlameTheInterns_libc-2.2.5-static-s_bin.tar.bz2
eaa6ba4cbe38bfd35a89c63861d71d42  crm114-20070217-BlameBaltar_libc-2.2.5-shared-s_bin.tar.bz2
fcaf47e329da7228ec0d9c51d33347c3  crm114-20070320-BlameBaltar.i386.tar.gz
c6ca19cce9bc40c6d10f7911686ea72d  crm114-20070320-BlameBaltar.src.tar.gz
ba545e4a536b5cf8dceb4a571579bfd4  crm114-20070329-BlameSpamConf.i386.tar.gz
b1ab2a4fe1c573538e1f5bb9151133f1  crm114-20070428-BlameSpamConf.src.tar.gz
13a146583eca6ed079a72db69e67a947  crm114-20070731-BlameTheInterns.src.tar.gz
50419b5d563da414c5c0c0f256236fbc  crm114-20070810-BlameTheSegfault.src.tar.gz
e72e5c94008865cf720992a0b25d6e89  ../coolthings/tre-0.7.5.tar.bz2
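
One way to check a download against the list above is sketched below. It assumes the md5sum utility from GNU coreutils is available (some systems ship a differently named tool instead), and check_md5 is a hypothetical helper name, not something distributed with CRM114.

```shell
#!/bin/sh
# Sketch only: compare a file's MD5 digest with an expected value.
# check_md5 is an illustrative helper; it assumes GNU coreutils' md5sum.

check_md5() {
    expected=$1
    file=$2
    # md5sum prints "<digest>  <filename>"; keep only the digest field.
    actual=$(md5sum "$file" | cut -d' ' -f1)
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file" >&2
        return 1
    fi
}

# Example (commented out; assumes the tarball sits in the current directory):
# check_md5 e72e5c94008865cf720992a0b25d6e89 tre-0.7.5.tar.bz2
```

Since the list is already in md5sum's own output format (digest, two spaces, filename), another option is to save the relevant lines to a file and pass that file to md5sum -c, provided the listed paths match where the files sit locally.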
