I did a fair bit of work on mongrel-esi over the holiday. As part of that work I reworked the parser in C using Ragel. I always try to run my code through valgrind to help catch memory leaks and errors in my pointer arithmetic early.
The Ragel parser is callback-driven and can accept a variable-sized segment of the document. Being able to read in variable-sized chunks was very important, because it means the server can be implemented using asynchronous or multiplexed I/O. The advantage of asynchronous or multiplexed I/O in this case is that while the kernel is waiting on the network, the user app can be busy processing markup and even queuing up more requests. This is really nice, because it means the server is doing multiple tasks at once without creating full threads or processes. Getting the parser built to support this variable-sized input was tricky, so I first focused on just the parser component. Ragel really saved me a lot of time once I started to understand how to use it.
Here’s the ESI C Parser API I came up with:
/* create a new Edge Side Include Parser */
ESIParser *esi_parser_new();
void esi_parser_free( ESIParser *parser );
/* initialize the parser */
int esi_parser_init( ESIParser *parser );
/*
* send a chunk of data to the parser, the internal parser state is returned
*/
int esi_parser_execute( ESIParser *parser, const char *data, size_t length );
/*
* let the parser know that it has reached the end and should flush any remaining data to the desired output device
*/
int esi_parser_finish( ESIParser *parser );
/*
* setup a callback to execute when a new esi: start tag is encountered
* this will fire for all block tags and also for
* inline tags
*/
void esi_parser_start_tag_handler( ESIParser *parser, start_tag_cb callback );
void esi_parser_end_tag_handler( ESIParser *parser, end_tag_cb callback );
/* setup a callback to receive data ready for output */
void esi_parser_output_handler( ESIParser *parser, output_cb output_handler );
I developed a fairly simple set of tests to verify the accuracy of the implementation. Using valgrind with the --leak-check=full option, I was able to measure the number of memory allocations and verify that no memory was lost.
valgrind --leak-check=full ./testit
Once I was satisfied that the parser core was working, I started to implement the Ruby binding. I started by following this tutorial as well as referring to many other documents and sources.
One of the first things I decided to verify with my Ruby binding was whether, in gluing my C implementation to the Ruby runtime, I was leaking any memory. As with the pure C implementation, I decided to run my extension through valgrind.
valgrind --leak-check=full ruby test1.rb
My initial test was this:
require 'esi'
output = ""
p = ESI::CParser.new
p.start_tag_handler do |tag_name, attrs|
  puts "Start: #{tag_name} #{attrs.inspect}"
end
p.end_tag_handler do |tag_name|
  puts "End: #{tag_name}"
end
p.output_handler do |data|
  output << data
end
p.process "<html><head><body><esi:include timeout='1' max-age='600+600' src=\"hello\"/>some more input"
p.process "some input<esi:include \nsrc='hello'/>some more input\nsome input<esi:include src=\"hello\"/>some more input"
p.process "some input<esi:inline src='hello'/>some more input\nsome input<esi:comment text='hello'/>some more input"
p.process "<p>some input</p><esi:include src='hello'/>some more input\nsome input<esi:include src='hello'/>some more input"
p.process "</body></html>"
p.finish
expected = %Q(<html><head><body>some more inputsome inputsome more input
some inputsome more inputsome inputsome more input
some inputsome more input<p>some input</p>some more input
some inputsome more input</body></html>)
if expected != output
  puts "Failed output was different from the expected"
  puts "Expected: #{expected}"
  puts "\n"
  puts "Actual: #{output}"
  exit(1)
end
GC.start
This is really a pretty simple test that just ensures the callbacks are all working and that the data emitted by the parser excludes any ESI tags.
The results I got from running this through valgrind, however, were very disturbing. Not only does valgrind report leaked memory at the end, it also reports nearly 4211 errors along the way.
The majority of these errors are “Use of uninitialised value of size 4” and “Conditional jump or move depends on uninitialised value(s)”.
I finally decided to figure out what was causing this. The first step was to get Ruby built with debugging symbols enabled. I downloaded the latest stable CVS snapshot, feeling optimistic that I might spot something and be able to send in a patch.
CFLAGS=-g ./configure --prefix=$HOME/project/ruby-stable && make && make install
Rerunning my ruby script through valgrind:
valgrind --leak-check=full --num-callers=24 ~/project/ruby-stable/bin/ruby test1.rb
Now the first error I see reported from valgrind looks like this:
==9911== Conditional jump or move depends on uninitialised value(s)
==9911== at 0x807305F: is_pointer_to_heap (gc.c:609)
==9911== by 0x8073023: mark_locations_array (gc.c:629)
==9911== by 0x80743B7: garbage_collect (gc.c:1367)
==9911== by 0x8074467: rb_gc (gc.c:1423)
==9911== by 0x8074479: rb_gc_start (gc.c:1440)
==9911== by 0x805F95B: call_cfunc (eval.c:5704)
==9911== by 0x805EEAF: rb_call0 (eval.c:5857)
==9911== by 0x8060461: rb_call (eval.c:6104)
==9911== by 0x8059278: rb_eval (eval.c:3482)
==9911== by 0x805467F: eval_node (eval.c:1434)
==9911== by 0x8054C61: ruby_exec_internal (eval.c:1640)
==9911== by 0x8054CA5: ruby_exec (eval.c:1660)
==9911== by 0x8054CC7: ruby_run (eval.c:1670)
==9911== by 0x8052B2D: main (main.c:48)
This takes me to the function is_pointer_to_heap in gc.c.
static inline int
is_pointer_to_heap(ptr)
    void *ptr;
{
    register RVALUE *p = RANY(ptr);
    register RVALUE *heap_org;
    register long i;

    if (p < lomem || p > himem) return Qfalse;
    if ((VALUE)p % sizeof(RVALUE) != 0) return Qfalse;

    /* check if p looks like a pointer */
    for (i=0; i < heaps_used; i++) {
        heap_org = heaps[i].slot;
        if (heap_org <= p && p < heap_org + heaps[i].limit)
            return Qtrue;
    }
    return Qfalse;
}
It should be pretty obvious from looking at that code why valgrind would report “Conditional jump or move depends on uninitialised value(s)”. The range check at the top of the function, p < lomem || p > himem, makes sure the memory is really within the heap allocated by Ruby by comparing the address of p to the lower and upper heap addresses. I am not certain, but fairly sure, that lomem and himem are the lower and upper bounds of a preallocated block of memory that Ruby allocates. This would mean it’s safe to test p in this context. I still have the question, and the concern, of why p would be uninitialized in the first place.
There are more errors being reported besides this and I hope to follow up with those next.