MD5 Partial Calculations – save and restore the calculation of large files

June 28th, 2009

Calculating an MD5 checksum of files transmitted over a network is a common way to verify their integrity. However, calculating an MD5 digest of a large file can be time- and CPU-intensive.

One way to spread out that cost is to compute the MD5 incrementally as the file is received over the network. This works well until you decide the transfer should be resumable. In fact, making the transfer resumable makes the checksum even more valuable for integrity, since the pieces of the file may arrive in separate sessions. Ruby provides a fairly straightforward API for calculating an MD5 incrementally, one chunk at a time. The only missing piece is the ability to serialize Ruby’s Digest::MD5 object: its partial state lives only in memory, so it is lost when a transfer is paused and later resumed in a different request or process.
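
For reference, here is a minimal sketch of that incremental API using the standard `digest/md5` library: feeding the data to `Digest::MD5#update` in chunks yields the same digest as hashing it in one call. The sample data and the 1024-byte chunk size are arbitrary choices for illustration.

```ruby
require 'digest/md5'

# Sample data; any byte string works.
data = "The quick brown fox jumps over the lazy dog" * 1000

# One-shot digest over the whole string.
one_shot = Digest::MD5.hexdigest(data)

# Incremental digest: feed the same bytes in 1024-byte chunks.
# The running state lives inside a C-level object; Digest::MD5 has no
# built-in method to write that state out and read it back later.
md5 = Digest::MD5.new
(0...data.bytesize).step(1024) do |offset|
  md5.update(data.byteslice(offset, 1024))
end
incremental = md5.hexdigest

puts incremental == one_shot  # prints true
```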

Looking at the internal implementation, I decided it was easiest to provide an alternative class rather than extend the existing one. The main reason is that Ruby’s Digest::MD5 can be built on either of two backends, the bundled md5.c or OpenSSL’s MD5. Both are of comparable speed, so I chose the md5.c implementation, because serializing its MD5_CTX structure is relatively easy. Here’s the structure:

      typedef struct md5_state_s {
        uint32_t count[2];  /* message length in bits, lsw first */
        uint32_t state[4];  /* digest buffer */
        uint8_t buffer[64]; /* accumulate block */
      } MD5_CTX;
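
Because every field has a fixed size, the context can be written out as a flat 88-byte string (8 + 16 + 64 bytes). The sketch below shows one way to do that in Ruby with `Array#pack` and `String#unpack`. The `Md5Context` class and the little-endian layout are assumptions for illustration, not necessarily the format md5-partial’s `save` produces.

```ruby
# Hypothetical Ruby mirror of md5.c's MD5_CTX (not the md5-partial gem's code).
class Md5Context < Struct.new(:count, :state, :buffer)
  # count: 2 x uint32 (bit length, lsw first), state: 4 x uint32, buffer: 64 raw bytes.
  # "V" = 32-bit unsigned little-endian, "a64" = 64 bytes, null-padded.
  LAYOUT = "V2V4a64"

  def to_blob
    [*count, *state, buffer].pack(LAYOUT)
  end

  def self.from_blob(blob)
    words = blob.unpack(LAYOUT)
    new(words[0, 2], words[2, 4], words[6])
  end
end

# Context after hashing "abc": 24 bits counted, state still at the MD5
# initial values (no 64-byte block completed yet), "abc" waiting in the buffer.
ctx = Md5Context.new([24, 0],
                     [0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476],
                     "abc".b.ljust(64, "\0"))
blob = ctx.to_blob
puts blob.bytesize                      # prints 88
puts Md5Context.from_blob(blob) == ctx  # prints true
```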

The only downside is the duplicated code. I am fairly confident md5.c will not change much over time, and it’s easy enough to keep up with any changes that do happen.

I added two serialization methods, save and restore. These allow an implementation to receive, compute, pause and resume file transfers, updating the checksum as each chunk of the file arrives. Because each request only hashes the chunk it received, the load on the server stays small whether a user is transmitting a large or a small file. Here’s the gist of how it works:

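      require 'digest/md5'

      # Runs inside a test method; Digest::MD5Partial comes from the md5-partial extension.
      buf_size = 4096  # bytes processed per simulated request (arbitrary)
      hasher = Digest::MD5Partial.new
      offset = 0
      total = File.size(__FILE__)

      until offset >= total do
        buf = nil
        # read the next chunk, as if it had just arrived over the network
        File.open(__FILE__, 'rb') do |io|
          io.seek(offset, IO::SEEK_SET)
          buf = io.readpartial(buf_size)
          hasher.update(buf)
        end

        # save the partial digest state to disk ("pause")
        File.open("partial", "wb") do |io|
          io << hasher.save
        end

        # restore the partial state into a fresh object ("resume")
        hasher = Digest::MD5Partial.new
        hasher.restore(File.binread("partial"))

        # advance the offset
        offset += buf.size
      end

      from_partial = hasher.hexdigest
      directly = Digest::MD5.hexdigest(File.binread(__FILE__))
      assert_equal directly, from_partial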
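
To see the whole technique without the C extension, here is a self-contained sketch: a pure-Ruby MD5 (the hypothetical `ResumableMD5` class, far slower than md5.c and meant only for illustration) whose `save` writes the same count/state/buffer fields as `MD5_CTX` and whose `restore` reads them back. The loop at the end pauses and resumes after every 1000-byte chunk and checks the result against `Digest::MD5`.

```ruby
require 'digest/md5'

# Hypothetical pure-Ruby MD5 (RFC 1321) whose context mirrors md5.c's MD5_CTX:
# count[2] (message length in bits), state[4] and buffer[64].
class ResumableMD5
  MASK = 0xffffffff
  # Per-step left-rotation amounts.
  SHIFTS = [[7, 12, 17, 22], [5, 9, 14, 20], [4, 11, 16, 23], [6, 10, 15, 21]].flat_map { |row| row * 4 }
  # Sine-derived constants: floor(|sin(i + 1)| * 2**32).
  K = (0...64).map { |i| (Math.sin(i + 1).abs * 2**32).floor }

  def initialize
    @state = [0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476]
    @bytes = 0       # total bytes hashed so far
    @buffer = "".b   # trailing bytes that do not yet fill a 64-byte block
  end

  def update(data)
    @bytes += data.bytesize
    @buffer << data.b
    while @buffer.bytesize >= 64
      transform(@buffer.byteslice(0, 64))
      @buffer = @buffer.byteslice(64..-1)
    end
    self
  end

  # Serialize as 88 bytes: count[2], state[4] (little-endian words), buffer[64].
  def save
    bits = @bytes * 8
    [bits & MASK, (bits >> 32) & MASK, *@state].pack("V6") + @buffer.ljust(64, "\0".b)
  end

  # Load a context written by #save, replacing this object's state.
  def restore(blob)
    lo, hi, *state = blob.unpack("V6")
    @state = state
    @bytes = ((hi << 32) | lo) / 8
    @buffer = blob.byteslice(24, @bytes % 64).b
    self
  end

  # Finalize a copy, so this object can keep accepting data afterwards.
  def hexdigest
    ResumableMD5.new.restore(save).finish.pack("V4").unpack("H*").first
  end

  protected

  # Append the 0x80 marker, zero padding and 64-bit bit length; return the state words.
  def finish
    bits = @bytes * 8
    update([0x80].pack("C") + "\0".b * ((55 - @bytes) % 64))
    update([bits & MASK, (bits >> 32) & MASK].pack("V2"))
    @state
  end

  private

  # Process one 64-byte block (the MD5 compression function).
  def transform(block)
    m = block.unpack("V16")
    a, b, c, d = @state
    64.times do |i|
      case i / 16
      when 0 then f = (b & c) | (~b & d); g = i
      when 1 then f = (d & b) | (~d & c); g = (5 * i + 1) % 16
      when 2 then f = b ^ c ^ d;          g = (3 * i + 5) % 16
      else        f = c ^ (b | ~d);       g = (7 * i) % 16
      end
      f = (f + a + K[i] + m[g]) & MASK
      a, d, c = d, c, b
      b = (b + ((f << SHIFTS[i]) | (f >> (32 - SHIFTS[i])))) & MASK
    end
    @state = @state.zip([a, b, c, d]).map { |x, y| (x + y) & MASK }
  end
end

data = Random.new(42).bytes(10_000)
blob = ResumableMD5.new.save                     # context before any data
(0...data.bytesize).step(1000) do |offset|
  hasher = ResumableMD5.new.restore(blob)        # "resume" from the saved context
  hasher.update(data.byteslice(offset, 1000))    # hash the newly received chunk
  blob = hasher.save                             # "pause": persist 88 bytes
end
puts ResumableMD5.new.restore(blob).hexdigest == Digest::MD5.hexdigest(data)
```

Because `hexdigest` finalizes a restored copy, the original object can keep accepting data after you peek at the digest, which is convenient when a client asks for a progress checksum mid-transfer.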

Check it out: http://github.com/taf2/md5-partial/tree/master
