Entropic Classification Basics
  This is an implementation of an entropic bitwise lattice Markov
  classifier.  This is very roughly based on some ideas in the DMC
  (Dynamic Markov Compressor) data compression technique of Gordon
  Cormack et al and inspired by Andrej Bratko's use of it in spam
  filtering ( Gordon originally proposed DMC for compression and
  then Andrej used it for spam filtering.  Matthew Young-Lai
  then proposed a technique for merging nodes which is limited
  to re-merging the "cloned" nodes in Gordon's work.  Our merging
  method here has no such limitation.)  

  This particular implementation is entirely separate, similar in
  tht it uses a Markov chain, but does not use cloning.  Rather it
  uses cross- linking, in a minimum-entropy format, and supports 
  "open-ended" Markov chains, rather than the toroidal lattices 
  required in Cormack's DMC.
  For those of you into references as "closest prior", see:
  1) Cormack, Gordon V. et al, Data Compression using Dynamic Markov 
     Modeling (Computer Journal #6, 1987
  2) Bratko, Andrej, Markov Modeling for Spam Filtering (?) TREC
  3) Matthew Young-Lai,  Adding state merging to the DMC data 
     compression algorithm, Information Processing Letters 70 
     (1999) 223�228.
  The basic concept is to construct a Markov chain that describes
  the sequential bit pattern of a message, and then use that chain
  to compress the incoming unknown text; the smaller the entropy
  the better the quality of match.  Yes, this sounds like
  a horrible idea, but Andrej Bratko at TREC 2005 has shown that
  it does work.  One advantage is that it can work on files that
  have no obvious tokenizations, like JPGs.
  Like every other Markov chain, there's a start node, and then
  for each observed 1 or 0 there's an observed freqency of 1 or 0,
  and the associated next node for a 1 or 0.  The quality of match
  is just the sums of the entropy (*) of each path taken.  
  More correctly, entropy is the weighted sum for an ensemble of
  signals, and is -Psig * log2 (Psig), and the sum over an
  ensemble of possible texts yields the entropy which is the
  average bit rate.  Whichever Markov chain shows less entropy
  is the one that matches best.  However, in this case we're 
  only given one example, which is the text in consideration,
  and so the probability of each transition is 1.0000 exactly
  (as we have only a single sample of the unknown).  Thus, we 
  actually sum the bits required; that is, log2 (Psig).
  Nota Bene 1: in order to make this code a little bit more
  general, we have each node accept an alphabet of N different
  symbols, where N defaults to 2 (that is, bitwise) but we'll
  write the code to be more general than that; it's a compile-time
  change rather than a hack-the-code change.  States *are* always
  numbered starting at zero, though, so consider the alphabet to
  always be an ENUM.
  Nota Bene 2: If you happen to be familiar with the other CRM114
  classifiers, note that this classifier does NOT use the tokenizer
  regex system, nor the SBPH or the OSB feature set.  Instead,
  all that state information is captured by where you are in the
  Markov chain, and the concept of "token" is replaced by the 
  concept of "alphabet".  By default, right now the "alphabet"
  is single bits.
  Nota Bene 3: As currently designed, the learning algorithm
  defaults to initialize with a toroidal perfect shuffle- thus the
  default model always takes constant memory and you can't run off
  the "edge" of the model.  We still need to fix it so that
  "crosslink" will allow a jump to another more appropriate (lower
  entropy) part of the braid; that's not there yet.
  Nota Bene 4: As the toroidal shuffle didn't give extremely good
  results, there's also a "unique state" classifier, set by the
  flag < unique >.  This generates a *loopless* chain where each
  text learned produces a single linear chain; this means we
  follow whatever model has already had the same bit pattern, but
  whenever we diverge, we start allocating new bit nodes
  immediately.  This works decently but uses up a lot of nodespace
  quickly.  It also means that you can run off the "edge" of the
  model because it's not closed like the toroidal shuffle.  By
  default, running off the edge causes a restart at the START node
  (node 1, as it turns out; node 0 is a
  bookkeeping/anchor/sentinel node).  This works, but doesn't give
  very high accuracy as the reSTARTs are not time-aligned with the
  model; they might even occur in the middle of a byte.
  Nota Bene 5: In the quest for better accuracy, instead of doing
  a reSTART when we're classifying and fall off the edge of a
  'unique' model, we branch to a node that has a very similar
  prior history.  How this is done is seen below but it boils down
  to finding a node that already exists that has had a very
  similar prior history bit pattern.  This can actually be done in
  near-constant (and pretty small!) time; the magic to do this is
  all described below.
  Nota Bene 6: In the quest for smaller (and faster!) models and
  the ability to not run out of nodes, we have the <crosslink>
  flag, which allows the LEARNing process to do the same branching
  that is done when we run off the end of the model chain during
  classification.  More specifically, before allocating a new
  node, we look for the closest prior history in already-allocated
  nodes, and if it's "close enough" (as set by a threshold), we
  make that the next node in our chain.  This produces Markov
  models with both acyclic joins and repeating loops, and
  effectively prevents the model from growing without bound, as
  the model reuses nodes quite effectively (sometimes more than a
  300:1 reuse!).  It also produces a model that is quite accurate
  when trained with SSTTT.  Although we can argue that ROCAC isn't
  the best possible figure of merit, SSTTT can yield very good
  ROCAC values (0.03 per cent); the terminal error rate on the
  TREC06 public corpus can be as low as 3 errors per 10,000 (that
  is, three errors in the final ten thousand messages, of the
  92Kmsgs corpus.)
       Algorithm for Classification:
  Follow the Markov chain from START with total entropy of 0 to
  begin, then walk the Markov chain taking each branch as
  instructed by the successive bits of incoming text.  Along the
  path, accumulate the compressed size as defined by - log2 (Pt)
  for the current transition.  Here, we define probability for a
  transition Pt = Nt + c / (Nt + N~t + 2c).  Note that for c = 0
  this can yield probabilities == 0.00 exactly, which will yield
  an infinite entropy (log of zero is... not a good thing).
  Gordon Cormack's paper indicates a "small" value of c is good
  for shorter texts; unfortunately there's no mention of what
  "small" means.
  If you run off the end of the chain (some learning algorithms
  generate terminating chains, others always generate loops) then
  do one of : (1) stop accumulating entropy; you can't go any
  further; (2) go back to the START node and keep accumulating, or
  (3) jump to the node with the closest prior history and keep
  Do this on each bitwise markov file; the file which accumulates
  the lowest compressed size is the best matching file.
        Algorithm for Learning - Overview
  Before you learn in the first text, initialize the data
  structure with a START node with NULL exit nodes for each of the
  possible exits.  If you want, you can put something bigger or
  more complex in, but it's not necessary.  If you're running with
  constant storage, allocate the rest of the storage, preformat
  them as a linked list of free nodes, and off you go.
  Follow the chain, incrementing "seen" counts as you go.  If
  you're learning a file, and you end up at a node exit that's
  a NULL (meaning no node is yet linked in for the current bit
  pattern) then grab a node off the free list (or malloc one).
  See below under "advanced methods" for how to get more 
  nodes by merging very similar nodes together.

Whenever a node is seen with both a sufficiently large number of

  seens, and a sufficiently balanced number of differing symbols,
  "clone" the node so that the Markov chain branches on the
  previous node.  This is to split the chain where the current
  state is insufficient to model the underlying process.  This
  only can work if the Markov model somehow contains a merge;
  otherwise each node can only be reached from a single other node
  (and so on, all the way back to the START node).
  In Cormack's version, when you run out of free nodes, you must
  reinitialize the storage and rebuild the model (using only the
  last part of the corpus data), but there are ways to merge
  similar nodes (see "advanced methods" below) to get more free
  nodes without throwing away data.
  Note - there is a "classical" method for building a hidden
  Markov model, but it's O(n^2) in the number of nodes in the
  output model!  It basically uses hillclimbing techniques to form
  the convex hull and then climbing to the local top of the hull
  repeatedly.  That's fine for small HMMs but here, when we may
  have a million nodes, that's way too slow.

Method 1 - for both initial-loop or terminated Markov chains

      Follow the Markov chain from START, incrementing the seen-0
      and seen-1 bits of the Markov chain.  When you hit a 0 or 1
      choice pointing to NULL, grab an unused node off of the free
      list.  If the free list is empty, use an advanced method to
      get more.
     Advanced Method - Cloning a Node
  If a node has more than one infeeder, there is always the
  possibility that the node really should be two nodes.  This
  could happen either because of a prior merge, a prior clone, or
  because the initial node setup had a loop in it (which it may or
  may not; it can work both ways.)
  A node should be cloned when there is a sufficient number of
  incoming events to be statistically significant, and the
  outgoing events are sufficiently balanced (i.e. the current node
  is of sufficiently low predictive value - that is, it's high
  entropy no matter what you do ) to lend credence to the idea
  that this one node really represents two states and splitting
  the node is a good idea.
  Of course, there's no reason (or way!) to clone a node that has
  only one infeeder even if the node state itself has low
  predictive power; in this case what's really happening is that
  the real corpus' hidden Markov model is diverging at a prior
  node.  There simply isn't any predictive state available.
  Topic for further work - given only a single infeeder, maybe we
  should back propagate the "split" up the chain of single priors
  until a state is found with at least two infeeders, and clone
  *that* node and all successor nodes down to the current node.
  Note that if we hit the START node we don't do this as there's
  no way to clone START.  
  The simple and obvious way to clone a node of low predictive
  power is to keep track of the previous node while doing a
  training traverse, and only clone off the path from the previous
  node.  This means creating a new node with the same successor
  nodes as the current node (or not; it depends on whether the
  current node has a successor or is "end of the line"), and has
  only a single infeeder, which we know is at best a single line
  of prior states of arbitrary length, and at worst a series of
  nodes that all looked similar at some time in the past.
         Advanced methods - getting free nodes by merging nodes
NOTA BENE: the code to implement merging is not debugged because the
 open-graph training with crosslinking works so well.  Consider
  this stuff more as "design notes for the future".
  merging nodes - in order to represent loops in the markov model,
  it's sometimes necessary to loop back into prior Markov states.
  Finding the appropriate prior state can be a hassle.  Cormack and
  Lai both postulate a "way" to find the appropriate prior state,
  but don't describe an efficient way to do so.
  Here's a shortcut, using arithmetic to get an infinite impulse
  response out of a floating point number; the already established
  node with the closest floating point number to the current
  generated floating point number is the "closest node".  This is
  a different version of "arithmetical coding", but it can be
  lossy (in a controlled way).  See the Guazzo coding method in
  Cormack's paper - note that we do NOT use Guazzo coding for 
  generating output but rather just for quickly finding the node
  whth the closest prior state for us to jump to when we run
  off the edge of a model.
  Clearly, one could weight the prior bit by a factor of 1/2
  (really 1/(2^1)), and weight the prior float by 1 - (1/2) in
  which case the prior float's mantissa is just the last N bits,
  in reverse order.  Unfortunately, this also gaurantees that
  prior history utterly disappears as soon as it rolls out of the
  number of bits in the matissa of the local floating point
  More interesting is larger values of the exponent; for example a
  value of 2 yields a half-bit fractional representation, so an
  error of 1 bit seen N steps in the past is overrideable by two
  correct bits at N+1 and N+2 bits in the past.  For example, the
  with an exponent of 2, the most recent bit has significance
  1/4 = 0.250, while the second-most-recent is 3/4 * 1/4 = 3/16 =
  .1875, and the bit before that is 3/4 * 3/16 = 9/64 = .04687,
  and so on.
  Higher exponents just stretch out this series even further.
  There is no need to assume only an integer exponent.  Indeed,
  it's only because of the finite precision of the computer that
  this a finite impulse response (FIR) filter; infinite precision
  would give infinite impulse response and "total recall" of prior
  states in the Markov process - but also the ability for some
  long-ago series of perfectly correct steps to blow away the more
  recent entropy.  For this reason, we usually use an exponent of
  1, which yields a prior bit weight of 1/2 (whch is just the
  floating-point mantissa as described above).  (NB: as of 20070101,
  the "floating point" has been replaced by a "long long" with 64 bits
  of precision.  This uses less space) 
  Values of the exponent less than one yield a situation where the
  range of possible outcome FIR numbers becomes discontinuous;
  that is, significant chunks of the range become unreachable by
  any possible prior bit series.  Consider the FIR value 0.5000
  with exponent of 0.5.  The most recent bit then has weighting of
  0.70711, with only .29289 of weight available for all following
  bits.  The maximum value achieveable for a pattern 0111111.. is
  then .29289, while the minimum value achieveable for pattern
  100000... is 0.70711, so the range of possible FIR numbers does
  not contain any value between .29289 and .70711.  Recursively,
  similar subranges cease to be accessible as well.
  The exponent for "exact ambivalence" between an error in state N
  versus two errors in state N-1 and N-2 is around 1.385; the
  exponent for ambivalence between an error in state N versus
  errors in states N-1, N-2, and N-3 is about 1.13 .  Note that both of
  these numbers yield rather short "memories" in the FIR filter
  before the Nth prior bit underflows and is lost completely.
  By using the FIR coding of prior state, we need only keep a
  table of the FIR path floating point number generated on the
  path that first generated this node (which can be sorted,
  yielding a _very_ fast node merging capability; just make one
  pass through the sorted table looking for the two closest
  floating point numbers.)  Those close pairings of nodes are good
  merge candidates.
  It's probably worth checking these merge candidates to see if
  they have similar local probabilities - that is, if they have similar
  0-counts and 1-counts.  If one node goes 0 mostly and the other
  goes 1 mostly, then there's a good chance that this isn't a good
  merge.  How to weight FIR prior history match versus the local 
  outbound probability is a good question.
  Another merge candidate is to look at nodes which have identical
  0 or 1 destinations.  However, finding this requires either an
  extensible list of back pointers or brute-force search (albeit
  over a very small subset of the entire Marov Chain).  
  If we're running in <unique> mode (that is, nodes are allocated
  on each unique path, and no node has ancestors with any
  possibility of significantly deviating FIR values, We can also
  back-calculate the approximate "feeder" node FIR values that
  could lead to a particular node, given the decreasing exponent.
  For example, if the current node's FIR is .3700000, then it can
  only have been reached by a 0-branch from a node whose prior was
  (2 * .3700000 = .7400000 ) plus or minus any crosslink-error
  threshold (which is typically under 1 in a million or less).
  So, we can just use our FIR lookaside table to find all possible
  ancestors to this node.
  When merging two nodes, we (speaking in the 0/1 tense; larger
  alphabets are an obvious extension) we perform the following steps:
  • update the to-be-kept node with the sums of the outbound 0-seen

and 1-seen counts. Without any loss, we can choose the

    to-be-kept node is the node with the lower index number.
  * we keep whichever output chain had the highest counts of each
    of the 0-seen and 1-seen categories
  * we set the FIR prior history to the weighted average of the 
    two prior weighted averages
  * for the two output nodes which were _not_ kept (a 0-chain and
    a 1-chain) mark these chains for eligibility for garbage 
    if we choose active collection;
  * put the merged-away node on the free list.
  * make an update pass thorugh the nodeset; any node outlet pointing to
    the merged-away node is reset to point to the to-be-kept node.
  We may choose to merge more than one node when we encounter
  an empty free list.  Each merge gives us one node for sure, plus
  the possibility of many others.  
  At some point, a garbage collection is necessary to reclaim nodes
  that no longer have a prior node.  We can do this either by:
  * reference counts - each node can keep an accurate reference count
    of nodes that can transfer to this node.  Since there are only 
    three events that can cause the reference count to change (those
    being "allocate from free list == refcount = 1", "clone node
    refcount = split incomings, and  "merge with another node 
    == refcount1 + refcount2" then the refcounts can be maintained at 
    relatively low cost.
  * we can run a "classic" mark and sweep.  However, there's a 
    better way...
  * we could add back pointers in each node.  Unfortunately, that
    means the size of a node could grow rather large.  There is
    still a better way....
  * We could add chaining pointers on the _output_ of each seen
    slot.  The "to" node gets a single slot added, to point to
    the first node that can transfer in; each such node points
    to the next node that can also transfer to the target node.  
    Preferably, the insertions are done in order so the list 
    remains sorted.
  When learning a new text, we also compute the running FIR prior
  floating point value, and on each node we encounter in our path,
  we update the local prior state FIR value according to the
  weighted average of the new text (with weight 1) and the prior
  texts (with weight = total times this node has been visited in
  the past = sum of the 0-seen and 1-seen output values ).  For
  this reason, node FIR prior encodings may change so it's necessary
  for the sorted FIR vector to contain not just the FIR values, but
  also backpointers to the FIR nodes that held them.
   Improvement: rather than a separate sorted list of FIR values, 
   embed the FIR value in the cell, along with the "next less"
   and "next greater" indices.  Thus, each cell contains within it
   pointers to it's most likely merge candidates.
   Secondary improvement - when allocating a new node from the
   free list, how does one efficiently link the new node into the
   next-less and next-greater lists?  Sequential search clearly
   works, but is inefficient.  Clearly a binary (or near-binary)
   tree would assist this but it's a programming hassle to
   maintain the two data structures in synchrony.  
   Alternatively, a fixed-size table of nodes at or close to
   various "thresholds" can be maintained; when a node changes
   it's FIR value, it only takes constant time to round the FIR to
   a table entry value and check to see if the new value is closer
   to the "perfect" value of FIR for that table entry versus the
   node that is currently in that slot.  If it's better (not just
   as good, "better", to minimize thrashing) then the new node
   index is written into the table; otherwise the old node stays.

Note that nothing prevents a node from being in the table more

   than once, nor is there any correctness "need" for the table to
   be exactly right.  Instead, the lookaside table is just a
   speedup that helps a lot most of the time.  When a node changes
   value, it will move (some) so incorrectness is part of the 
   game.  If you really want, you can move the node to a new 
   lookaside table slot, and then look at the previous next-less
   and next-greater nodes, and choose one of them to go into
   the table, replacing the (now moved) node.  (NB: in implementation,
   it turns out that the FIR lookaside table is *critical* for
   reasonable performance; any bug in that code that causes
   weak inital guesses will slow the classifier down by a factor
   of 10 or more.
  Futher Hacks and Improvements: resuming a CLASSIFY past NULL.
  If the classifier runs off the end of the known Markov chains,
  what do we do?  Just drop the following text?  Possibly - but
  there is an alternative.
  We can use the FIR arithmetical coding to resume the Markov
  chain at a near-optimal position.  The difference between where 
  we are and where we resume should be as small as possible;
  fortunately the FIR-next-least and FIR-next-greater indices in the
  current node give the two nearest side-steps at the current Markov
  node; we can then explore up and down the FIR chain to find a 
  preexisting node that actually has a further non-NULL chain.
  The entropy increase due to this side-step outside the Markov 
  chain is not entirely computable from the FIR values, because
  the prior information that's already rolled off the least significant
  bit of the FIR calculation.
  (note: this is currently the default action)
  Data storage format for bitwise Markov classifiers.  Note that
  we always use indices rather than pointers as we want the data
  structure to be stored and reloaded from disk via mmap.
  We have several "global static" values, which are stored in
  the meta-start node which is always node 0:
  • the index of the START node (should always be 1) - stored

in alphabet-zero slot.

  • the index of the node with the lowest FIR value - node 0 is
  • always “lower than the lowest”
  • the index of the node with the highest FIR value - node 0 is
  • also always the node “higher than the highest”.
  If we have a FIR lookaside speedup table, that also needs to
  have space allocated. 
  Each normal node contains a header containing:
  • the floating-point FIR value that codes the prior states
  • the index to the node with the next larger FIR value
  • the index to the node with the next smallest FIR value
  These FIR prior values are updated whenever a node is
  visited during learning.  This may mean minor motion up and 
  down the FIR chain.
  Each node also contains BITMARKOV_ALPHABET_SIZE slots; each slot
  contains the following (so there's one such slot for each
  member of the alphabet A).  There are two kinds of data stored in
  each slot; they are related but not interdependent.  The forward
  markov slot data is used during classification; the backward slots
  are used only during maintenance.
   Forward Markov slots:
  • how many times this alphabet member was seen as the input at

this node.

  • when it's seen, what the index of the next node to transition to

is (index, not pointer, for position-independent code)

   Backward slots for maintenance (clone and merge)   
  • index of the first incoming node that can transfer to this

node when that prior node saw this member of the alphabet A

      (note that this is NOT the alphabet member that the current
      node is seeing.  This just gives an array of
      backlinks for each member of the alphabet).  NULL if there's
      no known encounters of an incoming with the most recent prior
      for this alphabet member.
  • index iof the _next_ incoming node that can transfer to the

same outgoing node as this node. In combination with the

      prior index above, this forms a linked list of all of the
      nodes that can reach a particular node.  As we don't need
      to traverse this list in any particular order, it's strictly
      a "stack" (newest added at the front).
  Using 64-bit storage for the floating-point FIR values and 
  for the visit counts, and 32-bit storage for all indices, then
  assuming we're using a binary alphabet, we have:
  FIR prior float = 8 bytes
  FIR up and down links - 2 * 4 bytes = 8 bytes
  Slot within cell
  forward count - 8 bytes * 2  = 16 bytes
  forward index - 4 bytes * 2  = 8 bytes
  backward first - 4 bytes * 2 = 8 bytes
  backward next - 4 bytes * 2 = 8 bytes
  Total - 56 bytes storage per nodem, 18K nodes per megabyte.
  Note that the absolute minimum size is 24 bytes/node with no
  advanced features, so we're about a factor of two worse in 
  space, and for that we get near-linear in size for both learning
  and classification.  
  If we add in a 0.1% overhead size for the FIR prior lookaside
  accelleration table (containing 1 index per 4-byte word) then
  for every 4 bytes (1 index) of FIR prior lookaside there are
  4Kbytes (about 73) FIR value cells, so with the FIR prior
  lookaside and an assumption of roughly uniform FIR prior values,
  we need to traverse an average of 1/4 of 73 or about 18 markov
  chain cells.  With a 1.0% overhead (1 byte per 100 bytes, or 1
  index slot per 400 bytes) then there is 1 index slot per 7
  markov chain cells, and (again assuming a uniform distribution
  of FIR priors) an average of just 1.8 markov chain cells visited
  per FIR lookup.
  If the distribution of FIR values is too nonuniform, an adaptive
  grid method may be used to smooth it out.  But we'll climb that
  mountain when we come to it.
  Note to self: don't use the mathlib LOG function!  
  It's far too slow.  We'll custom craft our own that uses 
  linear interpolation over a lookup table.
   An interesting information-theoretic philosophical point - if
   we only have a million nodes in the database, then at most we
   encode 20 bits worth of state.  A 500-megabyte file like Andrej
   uses would have at most 62 million nodes (12 bytes/node not
   with short counts), or 41 million nodes = 27 bits to define
   state.  Since the FIR-prior value encodes at least 64 bits, can
   go to 96 bits (GCC 4.0 "long double") and possibly 128 (if your
   compiler supports "-mlong-double-128") shouldn't we be sticking
   with the FIR-prior and simply *ignoring* the actual links, as 
   they (and the state nodes) encode far less information?
   Submitted as an "ultimate" pare-down - the memory contains
   ONLY the times-previously-seen information.  It's a big array,
   indexed by the most significant N bits of the prior, with the
   high-order bit being the transition now being executed.  All
   that's there in the array is the number of times this transition
   has been executed in LEARN mode; everything else is embedded
   in the state (which _is_ the index).  
   That optimization would concieveably allow one to write an
   embedded form of bit-entropic classification in as little as
   two to four megabytes of space per class; in short, something
   that would fit on an embedded system.
     File Structure and Layout for Bit Entropy
 The file starts with a header block, of size 4 kybtes; the first
 four longwords give the indexes (in longwords, for 32-bit alignment)
 of the start and length of the FIR lookaside table and the node list.
      Per-node memory layout
  We have three pointers in the node slot; each of these is replicated
  for each member of the alphabet in the Markov graph.  
  *  The .nextcell says what the next cell is when we see this alph
  *  The .backfirst cell says what the first cell is that could 
        get us here when it saw an alph
  *  The .samenext cell says what the next cell that also gets you
        you to the same nextcell when you see the same alph.

Yes, this is slightly redundant, but it's very fast. (but note the

  approximation above that would save us all this memory!)
  A diagram here helps to understand the slotting structure
   .nextcell   >------------------->--> node
   .backfirst <---\     />------->/        .nextcell >---------->
   .samenext -|    \---------------------< .backfirst
              |       /
   node     <-|      /
     .nextcell  >---/
   Note- "backfirst" and "samenext" are redundant given the FIRlat
   system; we can always find referrers to the current node by
   looking at the possible origins - a node of FIR value 0.260 can
   only come from the alpha link of a node of FIR value 0.520 ; a
   node of value 0.779 can only have come from the 1-link of a FIR
   of .558 so we can just search the FIR around there.
bit_entropy.txt · Last modified: Y/m/d H:i:s O (T) by oopla
Recent changes RSS feed Creative Commons License Donate Powered by PHP Valid XHTML 1.0 Valid CSS Driven by DokuWiki