Frédéric Kayser sent me the link to this FastAC page with a bunch of papers and source code by Amir Said. It's pretty good stuff, I thought I'd give a synopsis of what's in there :
The coder is almost a vanilla Michael Schindler style "range coder". There are a few little tweaks in there that differ from the standard implementation.
1. Instead of a "virtual queue" for the bytes that might carry, he goes ahead and just writes them out, and then if a carry is detected he goes back through the output bytes and increments them while they're 0xFF. This violates one of the rules of standard arithmetic coder design, which is that you never touch output bytes after you stream them out, so that they can be transmitted or written to disk or whatever. Now of course if you're just outputting to a big buffer in memory his way is totally fine, and it appears to be faster than bothering with all the variables to account for the virtual queue.
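That write-then-fixup carry can be sketched like so (hypothetical code of mine, not the actual FastAC source; it assumes output goes to an in-memory byte buffer, which is exactly the case where touching already-written bytes is fine) :

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch: a carry was detected after out_pos bytes have already been
   written to buf. Walk backwards, incrementing bytes; any 0xFF wraps
   to 0x00 and the carry keeps propagating until a byte absorbs it.
   (A correct coder guarantees the carry can't run off the front.) */
static void propagate_carry(uint8_t *buf, size_t out_pos)
{
    while (out_pos > 0) {
        if (++buf[--out_pos] != 0)  /* uint8_t: 0xFF + 1 wraps to 0x00 */
            break;                  /* this byte absorbed the carry */
    }
}
```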
2. He actually uses the full 32 bits for range instead of 31 bits like in the standard Schindler. We use 31 bits so that we can see the carry in the top bit. Said has a neat little trick for detecting the carry :
    U32 old_low = range_low;
    range = range_high - range_low;
    range_high = range_low + symhigh * (range / symtot);
    range_low += symlow * (range / symtot);
    if ( range_low < old_low ) // carry

(This could maybe be even faster if there were an elegant way to get at the carry flag directly from C.) Using the full 32 bits is nice because it keeps you byte aligned and eliminates the ugly bit fiddling with shifts by 7 and 1 that you otherwise need to get that last bit right; avoiding all that shift and mask work is a big win.
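Wrapped up as a compilable function, the trick looks something like this (a sketch under names of my choosing, not the actual FastAC code; the carry shows up as unsigned wraparound of the 32-bit low bound) :

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the full-32-bit interval update with Said's carry test.
   Returns 1 if adding symlow * (range / symtot) wrapped the 32-bit
   low bound, i.e. a carry happened. */
static int update_interval(uint32_t *range_low, uint32_t *range_high,
                           uint32_t symlow, uint32_t symhigh, uint32_t symtot)
{
    uint32_t old_low = *range_low;
    uint32_t range = *range_high - *range_low;  /* wraps correctly, unsigned */
    uint32_t r = range / symtot;
    *range_high = *range_low + symhigh * r;
    *range_low += symlow * r;
    return *range_low < old_low;  /* unsigned wraparound == the carry */
}
```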
Now, as for the models, his "Adaptive_Data_Model" is 100% exactly my Deferred Summation model from DCC96. You can read about the original DefSum in my papers section or get the ancient code in the statistical coders section. From what I can tell there's nothing new or interesting in the modeling : it's just deferred summation to get a power of two CumProbMax so you can turn the division into a shift, he uses my fast decode table, etc.
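For reference, the deferred-summation idea can be roughly sketched like this (hypothetical code, my names, not DefSum or FastAC verbatim : raw counts accumulate freely, and only at periodic update time do they get renormalized so the coder's frequencies sum to an exact power of two, which is what makes the coder's divide a shift; assume at least one nonzero count) :

```c
#include <assert.h>
#include <stdint.h>

#define NSYMS     4
#define TOT_SHIFT 12   /* CumProbMax = 1 << 12 ; value is an assumption */

typedef struct {
    uint32_t count[NSYMS];  /* raw symbol counts since last update */
    uint32_t freq[NSYMS];   /* coder frequencies, sum == 1 << TOT_SHIFT */
} DefSumModel;

static void defsum_update(DefSumModel *m)
{
    uint32_t total = 0, sum = 0;
    for (int i = 0; i < NSYMS; i++) total += m->count[i];
    for (int i = 0; i < NSYMS; i++) {
        m->freq[i] = (uint32_t)(((uint64_t)m->count[i] << TOT_SHIFT) / total);
        if (m->freq[i] == 0) m->freq[i] = 1;  /* no zero frequencies */
        sum += m->freq[i];
    }
    /* dump the rounding remainder on the last symbol to keep the sum exact */
    m->freq[NSYMS - 1] += (1u << TOT_SHIFT) - sum;
}
```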
Anyway, the implementation is pretty good and the source code is pretty clean, so it's a decent place to start. However, I don't recommend actually using it as is. The models are braindead and the API is very bad.
The reason why "Deferred Summation" never amounted to much is that arithmetic coding a big hunk of symbols with order-0 modeling is a silly thing to do. You just never actually want to do that. If you're doing arithmetic coding, you're also doing higher order modeling or LZ coding or something, so you don't just have static statistics. Providing API's that are locked to these silly models is lame. The correct API interfaces for an arithmetic coder are the ones I have in "arithc", just :
    Encode( symlow, symfreq, symtot );
    EncodeTotIsPow2( symlow, symfreq, symtotshift );
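The point of that interface is that the coder stays model-agnostic : any model, whatever its order, just has to produce a (symlow, symfreq, symtot) triple per symbol and hand it to Encode. A hypothetical order-0 lookup feeding that interface might look like (names and struct are mine, for illustration only) :

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t low, freq, tot; } CodeRange;

/* Sketch: turn a symbol plus a frequency table into the triple that an
   Encode( symlow, symfreq, symtot ) style interface wants. The coder
   never needs to know anything about the model behind this. */
static CodeRange model_get_range(const uint32_t *freq, int nsyms, int sym)
{
    CodeRange r = {0, freq[sym], 0};
    for (int i = 0; i < sym; i++) r.low += freq[i];   /* cumulative low */
    for (int i = 0; i < nsyms; i++) r.tot += freq[i]; /* total count   */
    return r;
}
```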
The binary models are also silly. He does deferred summation for the bit model as well, which is not necessary. You can either do the implicitly rescaling fixed-tot bit model that I just wrote about recently, or you can use a rung-ladder model like I have in crblib, which uses a table to find power of 2 probabilities from a state.
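A fixed-tot adaptive bit model can be sketched like this (an assumption on my part about the shift-based update, not necessarily the exact one from that post : the probability of a zero bit lives out of a fixed power-of-two total, and each update moves it a fraction of the way toward the bit just seen, so no counts and no explicit rescale are ever needed) :

```c
#include <assert.h>
#include <stdint.h>

#define BIT_TOT_SHIFT 12  /* fixed total = 1 << 12 ; value is an assumption */
#define BIT_RATE      5   /* adaptation speed = 1/32 ; also an assumption   */

/* Sketch: p0 is P(bit == 0) scaled by 1 << BIT_TOT_SHIFT. Each update
   moves p0 by 1/2^BIT_RATE of the remaining distance toward the observed
   bit, which implicitly decays old statistics with no rescale step. */
static void bitmodel_update(uint32_t *p0, int bit)
{
    if (bit == 0)
        *p0 += ((1u << BIT_TOT_SHIFT) - *p0) >> BIT_RATE;  /* toward total */
    else
        *p0 -= *p0 >> BIT_RATE;                            /* toward zero  */
}
```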
Said's codec is in fact very fast. It's almost 10% faster than any arithmetic coder I've ever seen. That's pretty damn good. I'm going to modify it a bit to provide what I think are reasonable API's and post it.
Okay here it is : cb_Arithmetic_Codec.zip ; a tiny .h with a modified version of FastAC. This is pretty rough, I tried to modify the original FastAC code as little as possible.