Calculating an MD5 checksum of a file transmitted over a network is important for both security and integrity, but computing an MD5 digest can be time and CPU intensive.
One way to ease the time constraint is to compute the MD5 as the file is received over the network. This works great until you decide the transfer should be resumable. In fact, making the transfer resumable makes the checksum even more valuable for integrity. Ruby provides a fairly straightforward API for computing an MD5 incrementally, a few bytes at a time. The only missing piece is the ability to serialize Ruby’s Digest::MD5 class.
Looking at the internal implementation, I decided it was easiest to just provide an alternative class. The main reason for doing this instead of extending the existing Ruby class is that Digest::MD5 switches between two different backends, either md5.c or OpenSSL’s. Both are of comparable speed, so I decided to take the md5.c implementation because serializing its MD5_CTX structure is relatively easy. Here’s the structure:
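To illustrate the incremental API mentioned above, here is a small sketch that uses only the standard Digest::MD5 class. The helper name incremental_md5 and the chunk sizes are made up for the example; it just shows that feeding the data in pieces gives the same digest as hashing it in one go.

```ruby
require 'digest/md5'
require 'stringio'

# Feed an IO to Digest::MD5 in fixed-size chunks, the way a receiver
# would while data arrives over the network.
# incremental_md5 and chunk_size are illustrative names, not part of any library.
def incremental_md5(io, chunk_size = 4096)
  hasher = Digest::MD5.new
  # IO#read(n) returns nil at end of stream, which ends the loop.
  while (chunk = io.read(chunk_size))
    hasher.update(chunk)
  end
  hasher.hexdigest
end

data = "hello world" * 1000
chunked = incremental_md5(StringIO.new(data), 128)
puts chunked == Digest::MD5.hexdigest(data) # prints true
```

Note that the hasher object here lives only in memory, which is exactly the limitation a resumable transfer runs into.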
typedef struct md5_state_s {
    uint32_t count[2];   /* message length in bits, lsw first */
    uint32_t state[4];   /* digest buffer */
    uint8_t  buffer[64]; /* accumulate block */
} MD5_CTX;
The only downside is the duplicated code. I am fairly confident the md5.c code will not change much over time, and it’s easy enough to keep up with any changes.
I added two serialization methods, save and restore. These allow an implementation to receive, compute, pause and resume file transfers while still doing only a small amount of checksum work per chunk of file received. This means the load on the server stays small whether a user is transmitting a large or small file. Here’s the gist of how it works:
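To give a feel for why that structure is easy to serialize, here is a rough Ruby sketch that flattens the same three fields into 88 bytes and reads them back. Everything in it is an assumption for illustration: Md5State, dump_state and load_state are hypothetical names, and the little-endian layout is my choice here, not necessarily the format the gem's save method writes.

```ruby
# Hypothetical Ruby mirror of the MD5_CTX fields; not the gem's real format.
Md5State = Struct.new(:count, :state, :buffer)

STATE_SIZE = (2 * 4) + (4 * 4) + 64 # 88 bytes in total

# Pack six uint32 values (assumed little-endian), then the block buffer
# zero-padded to its full 64 bytes.
def dump_state(ctx)
  (ctx.count + ctx.state).pack('V6') + ctx.buffer.b.ljust(64, "\0".b)
end

# Reverse of dump_state: split the bytes back into the three fields.
def load_state(bytes)
  raise ArgumentError, "expected #{STATE_SIZE} bytes" unless bytes.bytesize == STATE_SIZE
  words = bytes.unpack('V6')
  Md5State.new(words[0, 2], words[2, 4], bytes.byteslice(24, 64))
end

# The standard MD5 initial state, with 3 bytes ("abc", 24 bits) pending.
ctx = Md5State.new([24, 0],
                   [0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476],
                   "abc")
restored = load_state(dump_state(ctx))
puts restored.state == ctx.state
```

Because the whole context is a fixed-size blob of integers and bytes, it can be written to a file or a database column between requests.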
hasher = Digest::MD5Partial.new
buf_size = 1024 # chunk size in bytes (assumed value; the original snippet never defines it)
offset = 0
total = File.size(__FILE__)
until offset >= total do
  buf = nil
  # read the next chunk starting at the current offset
  File.open(__FILE__, 'rb') do |io|
    io.seek(offset, IO::SEEK_SET)
    buf = io.readpartial(buf_size)
    hasher.update(buf)
  end
  # save the partial
  File.open("partial", "wb") do |io|
    str = hasher.save
    io << str
  end
  # restore the partial
  hasher.restore(File.read("partial"))
  # advance the offset
  offset += buf.size
end
# the digest built from partials should match a one-shot digest
from_partial = hasher.hexdigest
directly = Digest::MD5.hexdigest(File.read(__FILE__))
assert_equal directly, from_partial
Check it out: http://github.com/taf2/md5-partial/tree/master
Software
Curb provides Ruby bindings for libcurl. Around this time last year, I started hacking on curb by adding support for libcurl’s multi interface. At the time, I wanted an interface as similar to curl-multi as possible, but with the added benefit of being able to initialize requests using the features of Curl::Easy. The upside of this approach is that any easy handle can be configured and dispatched through a Multi handle in parallel. The downside is that the interface can be complicated for simple use cases. To simplify things, I added a more direct method for sending multiple concurrent requests. Here’s how the new interface works:
Curl::Multi.get('http://www.google.com/',
                'http://www.yahoo.com/',
                'http://www.msn.com/') do |easy|
  puts easy.header_str
end
Now that’s great: you can issue multiple GET requests using a default easy handle configuration. As each request completes, the method yields, passing the easy handle to the block.
The get method can also be passed two additional Hash arguments that configure each easy handle and the multi handle, respectively. For example:
Curl::Multi.get('http://www.google.com/',
                'http://www.yahoo.com/',
                'http://www.msn.com/',
                {:follow_location => true},
                {:pipeline => true}) do |easy|
  puts easy.header_str
end
You can try it out using my github branch, or stay tuned: I should have a new release ready for rubyforge soon.
Also, I thought it would be nice to reflect on many of the changes collected over the course of the year…
All of that and we have a new release: 0.4.2 on rubyforge.org.
gem install curb
Software
Just noticed the new icons in Firefox 3.5 are slightly more saturated, and the icon bundle includes a very large version. Check out the difference:
Firefox 3.0.x
Firefox 3.5

And this is on top of an already amazing job on the updated browser.
Software
We’ve been busy, and now we have an updated Virtual Conference Center. It includes all the video footage from the World Congress 2009 SAE conference and the Hybrid SAE 2009 conference. As more SAE conferences take place we’ll be publishing the new video footage, so stay tuned! If you’re interested in the automotive industry, how it works, and the different aspects of innovation going on in the field, these videos are definitely worth the money. There are preview videos for most of the sections, hybrid being the one exception.
We integrated the purchase flow into SAE’s store – so be prepared to either sign up as an SAE member or register for access to their content.
The coverflow really makes it easy to navigate the videos. Jon did an awesome job of integrating it into the site.

You can visit the Virtual Conference Center at http://vcc-sae.org/ .
Software Anerian, Consulting, Video
bsdiff and bspatch are great little tools for creating patches of binary files. I used them for the updater in SimoHealth, and I believe Firefox and Chromium use them to deliver application updates. I’m thinking they may be very useful for backups and archiving. I extracted the bsdiff and bspatch binaries into an easy-to-use Ruby interface. For now the Ruby interface is exactly the same as its command-line counterparts, meaning all patching and diffing is done via files. E.g.
bsdiff oldfile newfile patchfile
in ruby would be:
BSDiff.diff('oldfilepath', 'newfilepath', 'patchfilepath')
and patching would be:
bspatch oldfile newfile patchfile
in ruby would be:
BSDiff.patch('oldfilepath', 'newfilepath', 'patchfilepath')
Software bsdiff, bspatch, extensions, Ruby
Just released a new version of the rbtagger gem. It’s much easier to use, as I now include the Brown Corpus and Lexicon in the gem. This means no arguments are required to create the tagger with the default corpus.
tagger = Brill::Tagger.new
tagger.tag("some body of text")
To install:
gem install rbtagger
Software Gem, Ruby, Tagging
In a Rails environment the following works:
"àáâãäå".mb_chars.normalize(:kd).gsub(/[^\x00-\x7F]/n,'').downcase.to_s
Outside of Rails you might need to wrap your strings explicitly in an ActiveSupport::Multibyte::Chars object. For example:
ActiveSupport::Multibyte::Chars.new("àáâãäå").normalize(:kd).gsub(/[^\x00-\x7F]/n,'').downcase.to_s
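If ActiveSupport isn't available at all, the same idea can be sketched with plain Ruby, assuming Ruby 2.2 or newer where String#unicode_normalize exists. This swaps in the standard library for ActiveSupport's normalize; the helper name asciify is made up for the example.

```ruby
# Decompose accented characters (NFKD) so each becomes a base letter plus
# combining marks, then drop everything outside the ASCII range.
# asciify is a hypothetical helper name, not part of Ruby or ActiveSupport.
def asciify(str)
  str.unicode_normalize(:nfkd).gsub(/[^\x00-\x7F]/, '').downcase
end

puts asciify("àáâãäå") # prints aaaaaa
```

Characters with no ASCII base letter in their decomposition are simply removed, so this is lossy by design.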
Software ActiveSupport, Ruby, Text Encoding