Archive

Posts Tagged ‘C++’

C++ namespaces and extern global variables

November 20th, 2008

I ran across an interesting segmentation fault caused by dereferencing a pointer that was NULL when I thought it should not be. Here’s what I had in my main source file:


Siphon::Logger *glogger = NULL;

...

int main( int argc, char **argv )
{
 ...
 glogger = new Siphon::Logger(); // initialize the global

}

Now in another source file:


namespace Siphon {

extern Siphon::Logger *glogger = NULL;
...

}

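Later I realized this caused a warning, but initially I just ran my code, only to get a segmentation fault when using glogger in the second source file. Because the line sits inside the namespace block, it declares Siphon::glogger, a different variable from the global ::glogger that main initializes. The initializer also turns the extern declaration into a definition (hence the warning), so everything compiled and linked, and code inside the namespace saw a pointer that was still NULL. To correct this, I moved the declaration out of the namespace and dropped the initializer: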


extern Siphon::Logger *glogger;

namespace Siphon {

...

}
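
The rule at work is that a name declared inside a namespace block belongs to that namespace, even when a global with the same spelling exists. Here is a minimal single-file sketch of that lookup behaviour. The empty `Logger` struct and the two helper functions are hypothetical stand-ins for illustration, not the real Siphon code.

```cpp
// Hypothetical stand-in for Siphon::Logger; only its address matters here.
namespace Siphon {
struct Logger {};
}  // namespace Siphon

// The global that main() initializes.
Siphon::Logger *glogger = nullptr;

namespace Siphon {

// Same spelling, different variable: this one is Siphon::glogger.
Logger *glogger = nullptr;

// Unqualified lookup searches the enclosing namespace first,
// so this returns Siphon::glogger, which nobody ever assigned.
Logger *logger_seen_in_namespace() { return glogger; }

// Qualifying with :: reaches the global that main() set.
Logger *logger_seen_globally() { return ::glogger; }

}  // namespace Siphon
```

After `::glogger` has been pointed at a real logger, `logger_seen_in_namespace()` still returns a null pointer, while `logger_seen_globally()` returns the object.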

Software

ruby extension memory leak tracking

May 14th, 2008

Using valgrind and a nice patch for ruby 1.8.6, I now have rb-brill-tagger leak free.

Here’s the process I went through. First off, you need a Linux environment to run valgrind. If you don’t already have one set up, I recommend Fedora Core. It’s super easy to run and has a pretty good track record with hardware. Also, yum makes it super easy to install new software.

First, check out the ruby 1.8.6 branch:

svn co http://svn.ruby-lang.org/repos/ruby/branches/ruby_1_8_6

Then get the patch:

wget http://fauna.rubyforge.org/svn/bleak_house/trunk/ruby/valgrind.patch

Update: the patch is now available here.

Apply the patch:

patch -p0 < valgrind.patch

Build ruby:

autoconf && ./configure --prefix=$HOME/work/ruby-valgrind && make && make install

Set up the new ruby environment:

 export PATH=$HOME/work/ruby-valgrind/bin:$PATH

Verify you have the correct ruby:

which ruby

Install rubygems:

wget http://rubyforge.org/frs/download.php/35283/rubygems-1.1.1.tgz
tar -zxf rubygems-1.1.1.tgz
cd rubygems-1.1.1
ruby setup.rb install

Verify the rubygems install:

which gem

Install Rake:

gem install rake

Checking out rb-brill-tagger:

git clone git://github.com/taf2/rb-brill-tagger.git
cd rb-brill-tagger
rake

Running valgrind:

valgrind --leak-check=full ruby test/tagger_test.rb

Under valgrind the tests take much longer to run, but once the process has finished you should get output similar to this:

rb-brill-tagger> valgrind --leak-check=full ruby test/tagger_test.rb
==17160== Memcheck, a memory error detector.
==17160== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
==17160== Using LibVEX rev 1804, a library for dynamic binary translation.
==17160== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
==17160== Using valgrind-3.3.0, a dynamic binary instrumentation framework.
==17160== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
==17160== For more details, rerun with: -v
==17160==
loading tagger...
tagger loaded!
Loaded suite test/tagger_test
Started
time: 75.484142 sec 0.132478156802789 docs/sec
..
Finished in 76.943314 seconds.

2 tests, 1 assertions, 0 failures, 0 errors
==17160==
==17160== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 25 from 1)
==17160== malloc/free: in use at exit: 9,666,639 bytes in 176,968 blocks.
==17160== malloc/free: 1,897,877 allocs, 1,720,909 frees, 48,816,043 bytes allocated.
==17160== For counts of detected errors, rerun with: -v
==17160== searching for pointers to 176,968 not-freed blocks.
==17160== checked 7,364,900 bytes.
==17160==
==17160== 16 bytes in 1 blocks are definitely lost in loss record 1 of 17
==17160==    at 0x4022828: malloc (vg_replace_malloc.c:207)
==17160==    by 0x8070851: ruby_xmalloc (gc.c:114)
==17160==    by 0x808C95B: local_append (parse.y:5640)
==17160==    by 0x808CC8F: special_local_set (parse.y:6228)
==17160==    by 0x80A4E58: rb_reg_search (re.c:946)
==17160==    by 0x80B8B22: rb_str_split_m (string.c:3559)
==17160==    by 0x8055885: call_cfunc (eval.c:5700)
==17160==    by 0x805E0D1: rb_call0 (eval.c:5856)
==17160==    by 0x805ECC0: rb_call (eval.c:6103)
==17160==    by 0x805C733: rb_eval (eval.c:3479)
==17160==    by 0x805C65E: rb_eval (eval.c:3473)
==17160==    by 0x805AE4C: rb_eval (eval.c:3689)
==17160==
==17160==
==17160== 26 bytes in 9 blocks are definitely lost in loss record 2 of 17
==17160==    at 0x4022828: malloc (vg_replace_malloc.c:207)
==17160==    by 0x40FA1BF: strdup (in /lib/libc-2.6.so)
==17160==    by 0x402BD41: tagger_context_add_to_lexicon (tagger.c:71)
==17160==    by 0x402BBC8: BrillTagger_add_to_lexicon (rbtagger.c:26)
==17160==    by 0x8055859: call_cfunc (eval.c:5709)
==17160==    by 0x805E0D1: rb_call0 (eval.c:5856)
==17160==    by 0x805ECC0: rb_call (eval.c:6103)
==17160==    by 0x805C733: rb_eval (eval.c:3479)
==17160==    by 0x805C8E6: rb_eval (eval.c:3133)
==17160==    by 0x805E8BE: rb_call0 (eval.c:6007)
==17160==    by 0x805ECC0: rb_call (eval.c:6103)
==17160==    by 0x805C733: rb_eval (eval.c:3479)
==17160==
==17160==
==17160== 896 bytes in 30 blocks are possibly lost in loss record 13 of 17
==17160==    at 0x4022828: malloc (vg_replace_malloc.c:207)
==17160==    by 0x8070851: ruby_xmalloc (gc.c:114)
==17160==    by 0x8054725: scope_dup (eval.c:8211)
==17160==    by 0x805A07F: rb_yield_0 (eval.c:5078)
==17160==    by 0x806353E: proc_invoke (eval.c:8622)
==17160==    by 0x805E0D1: rb_call0 (eval.c:5856)
==17160==    by 0x805ECC0: rb_call (eval.c:6103)
==17160==    by 0x805C733: rb_eval (eval.c:3479)
==17160==    by 0x8059E97: rb_yield_0 (eval.c:5027)
==17160==    by 0x805A720: rb_yield (eval.c:5111)
==17160==    by 0x80C6674: rb_ary_each (array.c:1138)
==17160==    by 0x805E0D1: rb_call0 (eval.c:5856)
==17160==
==17160== LEAK SUMMARY:
==17160==    definitely lost: 42 bytes in 10 blocks.
==17160==      possibly lost: 896 bytes in 30 blocks.
==17160==    still reachable: 9,665,701 bytes in 176,928 blocks.
==17160==         suppressed: 0 bytes in 0 blocks.
==17160== Reachable blocks (those to which a pointer was found) are not shown.
==17160== To see them, rerun with: --leak-check=full --show-reachable=yes

Software

evdispatch 0.2.6

April 21st, 2008

Here it is: version 0.2.6.

This version fixes a bug when sending an HTTP POST on Mac OS X (Darwin).

curl_easy_setopt( m_handle, CURLOPT_POST, 1L );
// set the buffer size to copy (libcurl expects a long for this option)
curl_easy_setopt( m_handle, CURLOPT_POSTFIELDSIZE, static_cast<long>( value.length() ) );
curl_easy_setopt( m_handle, CURLOPT_POSTFIELDS, value.c_str() );
// copy the buffer
curl_easy_setopt( m_handle, CURLOPT_COPYPOSTFIELDS, value.c_str() );

I had to set CURLOPT_POSTFIELDS before setting CURLOPT_COPYPOSTFIELDS.

In my next release I hope to have support for setting arbitrary HTTP headers. I’ll also be working on a streaming response interface. In my C++ library I already have a working example that lets the response from the event loop be written directly to a file descriptor. To expose this in Ruby I would make it so any IO object can be passed into the request method via a :stream => io parameter. This poses only one interesting issue: a Ruby StringIO would be a valid object to pass, but my implementation would not be able to pull a file descriptor from it. Perhaps, when it’s an IO object without a file descriptor, my implementation could create a pipe fd on behalf of the caller?
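
The pipe idea can be sketched with plain POSIX calls. This is only an illustration under stated assumptions: `PipeEnds`, `open_pipe`, `write_all` and `read_all` are hypothetical names, not part of evdispatch, and the sketch does not retry on `EINTR`.

```cpp
#include <cstddef>
#include <string>
#include <unistd.h>

// Both ends of a POSIX pipe: the event loop would write to write_fd,
// and a shim for fd-less IO objects would read from read_fd.
struct PipeEnds {
  int read_fd;
  int write_fd;
};

// Create the pipe; returns false if pipe(2) fails.
bool open_pipe(PipeEnds &ends) {
  int fds[2];
  if (pipe(fds) != 0) return false;
  ends.read_fd = fds[0];
  ends.write_fd = fds[1];
  return true;
}

// Write the whole buffer, continuing after short writes.
bool write_all(int fd, const std::string &data) {
  std::size_t done = 0;
  while (done < data.size()) {
    ssize_t n = write(fd, data.data() + done, data.size() - done);
    if (n <= 0) return false;
    done += static_cast<std::size_t>(n);
  }
  return true;
}

// Read until end of file, i.e. until every write end has been closed.
std::string read_all(int fd) {
  std::string out;
  char buf[4096];
  ssize_t n;
  while ((n = read(fd, buf, sizeof(buf))) > 0) {
    out.append(buf, static_cast<std::size_t>(n));
  }
  return out;
}
```

A small body fits in the kernel's pipe buffer, so one thread can write it, close the write end, and then read it back. A larger response needs the reader to drain the pipe while the event loop is still writing, otherwise the write blocks.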

Software

evdispatch

April 11th, 2008

If you have ever had a web site that needs to make a large number of service requests to satisfy a single user request, then you might find evdispatch useful. evdispatch makes it very easy to send off multiple HTTP requests and check back later for their status. Let’s say, for example, that your site aggregates a number of different feeds. You’ll definitely want to cache the results, but when the pages are uncached you’ll want to make sure they get built quickly. The services respond very fast, so they’re not the bottleneck. The bottleneck is that from Ruby it’s very difficult to run multiple concurrent requests efficiently. evdispatch might be exactly what you need. The way it works is that you send off all your requests ahead of time. Then, after doing some other processing, you ask for each response when you need it. That call blocks your Ruby process until the response has arrived, unless it has already arrived, in which case it returns immediately.

This will probably be easier to follow using an example:
Creating a feed aggregator using Google News feeds. (Note: Google will rate limit you, so if you plan to do this, make sure you cache.)

First install the evdispatch gem:

sudo gem install evdispatch

Set up your rails app initializer:

require 'evdispatch'

$dispatcher = Evdispatch::Loop.new

# startup a dispatcher for this rails app
$dispatcher.start

Add hpricot to your config/environment.rb:

require 'hpricot'

Create a new action on your controller:

class DashController < ApplicationController

  def index
    @timer = Time.now
    @top_news_id = $dispatcher.request_http("http://news.google.com/news?ned=us&topic=h&output=rss")
    @world_id = $dispatcher.request_http("http://news.google.com/news?ned=us&topic=w&output=rss")
    @us_id = $dispatcher.request_http("http://news.google.com/news?ned=us&topic=n&output=rss")
    @health_id = $dispatcher.request_http("http://news.google.com/news?ned=us&topic=m&output=rss")
    @sports_id = $dispatcher.request_http("http://news.google.com/news?ned=us&topic=s&output=rss")
  end
end

Add a helper method to your dash_helper.rb:

module DashHelper
  def display_feed(id)
    res = $dispatcher.response(id)
    doc = Hpricot.XML(res[:body])
    items = []
    titles = []
    # only the first <title>, the feed's own, is needed
    (doc/'title').each do|t|
      titles << t.inner_html
      break
    end
    (doc/'item').each do|item|
      items << "<a href=\"#{(item/'link').inner_html}\">#{(item/'title').inner_html}</a>"
    end
    # the wrapper tags here are a guess; adjust them to your layout
    "<h2>#{titles.first}</h2>#{items.join}<p>response time: #{res[:response_time]} seconds</p>"
  end
end

Create the view:

<h1>All the latest</h1>
<ul>
  <li><%= display_feed(@top_news_id) %></li>
  <li><%= display_feed(@world_id) %></li>
  <li><%= display_feed(@us_id) %></li>
  <li><%= display_feed(@health_id) %></li>
  <li><%= display_feed(@sports_id) %></li>
</ul>
<p>Page render time <%= Time.now - @timer %> seconds</p>
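
The same dispatch-then-collect calling pattern can be sketched in C++. evdispatch itself runs an event loop in the background. This sketch swaps in `std::async` futures purely to show the shape of the API, and the `Dispatcher` class with its `request` and `response` methods is a hypothetical miniature, not the library's real interface.

```cpp
#include <functional>
#include <future>
#include <map>
#include <string>
#include <utility>

// Hypothetical miniature of the dispatch-then-collect pattern.
class Dispatcher {
 public:
  using Job = std::function<std::string()>;

  // std::launch::async runs each job on its own thread;
  // std::launch::deferred runs it lazily, inside response().
  explicit Dispatcher(std::launch policy = std::launch::async)
      : policy_(policy) {}

  // Start the job and hand back an id to collect the result with later.
  int request(Job job) {
    int id = next_id_++;
    pending_.emplace(id, std::async(policy_, std::move(job)));
    return id;
  }

  // Block until the job for this id is done; returns at once if it has
  // already finished. Each id can be collected only once, and an
  // unknown id yields an empty string.
  std::string response(int id) {
    auto it = pending_.find(id);
    if (it == pending_.end()) return std::string();
    std::string body = it->second.get();
    pending_.erase(it);
    return body;
  }

 private:
  std::launch policy_;
  int next_id_ = 0;
  std::map<int, std::future<std::string>> pending_;
};
```

As in the controller above, all requests are issued up front and each `response(id)` call waits only for the one result it names, so the slowest request bounds the total wait rather than the sum of all of them.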

Software

ruby 1.9 and valgrind support

January 6th, 2008

This evening I decided to stay in and take a look at valgrind with ruby 1.9. It turns out there is now a configure option to build ruby 1.9 to be valgrind friendly, using the macros defined in valgrind/memcheck.h:

./configure --with-valgrind --prefix=/home/taf2/project/mongrel-esi/trunk/ruby19-test/ && make

I think I might work on patching ruby 1.8.6 with these macros later to get a better sense for the memory usage with the mongrel-esi parser.

Update:

Evan has the patch here, as well as a great tutorial on how to use it.

Software

[Mongrel ESI] Ragel Parser, and more!

January 4th, 2008

I’ve been busy this New Year. I was bitten by the bug to improve mongrel esi. First, I sat down to finally master ragel. I initially implemented the ragel parser in Ruby. Then, after some performance tests, I discovered that while it had made performance more stable, it had actually reduced average performance. Once I had the parser written and working in Ruby, it really wasn’t too difficult to convert it to C, which today I can finally say is complete, and all tests are passing again. With the new C ragel implementation I am seeing about a 2x improvement in raw performance. My performance measurements have been largely based on ab (ApacheBench).

Today I spent time really understanding how mongrel_rails works, and in doing so was able to rework the server’s configuration so that it can take a simple Ruby script or YAML file, although by default all configuration options are passed via the command line. Here’s how the configuration works now:

ESI::Config.define(listeners) do|config|

  # define the caching rules globally for all routes, defaults to ruby
  config.cache do|c|
    #c.memcached do|mc|
    #  mc.servers = ['localhost:11211']
    #  mc.debug = false
    #  mc.namespace = 'mesi'
    #  mc.readonly = false
    #end
    c.ttl = 600
  end

  # define rules for when to enable esi processing globally for all routes
  config.esi do|c|
    c.allowed_content_types = ['text/plain', 'text/html']
    #c.enable_for_surrogate_only = true # default is false
  end

  # define request path routing rules
  config.routes do|s|
    #s.match( /content/ ) do|r|
    #  r.servers = ['127.0.0.1:4000']
    #end
    s.default do|r|
      r.servers = ['127.0.0.1:3000']
    end
  end

end

I’ve been posting new gems to http://mongrel-esi.googlecode.com/files/mongrel_esi-0.4.0.gem

Software

valgrind and ruby: developing a ruby c extension

January 2nd, 2008

I did a fair bit of work over the holiday on mongrel-esi. As part of that work I reworked the parser in C using ragel. I always try to run my code through valgrind to help catch memory leaks and errors in my pointer arithmetic early.

The ragel parser is callback driven and can accept a variable-sized segment of the document. Being able to read in variable-sized chunks was very important, because it means the server can be implemented using asynchronous I/O. The advantage of asynchronous or multiplexed I/O in this case is that while the kernel is waiting on the network, the user app can be busy processing markup and even queuing up more requests. This is really nice, because it means the server is doing multiple tasks simultaneously without creating full threads or processes. Getting the parser built to support this variable-sized input was tricky, so I first focused on just the parser component. Ragel really saved me a lot of time once I started to understand how to use it.

Here’s the ESI C Parser API I came up with:

/* create a new Edge Side Include Parser */
ESIParser *esi_parser_new();
void esi_parser_free( ESIParser *parser );
/* initialize the parser */
int esi_parser_init( ESIParser *parser );
/*
 * send a chunk of data to the parser, the internal parser state is returned
 */
int esi_parser_execute( ESIParser *parser, const char *data, size_t length );
/*
 * let the parser know that it has reached the end and it should flush any remaining data to the desired output device
 */
int esi_parser_finish( ESIParser *parser );
/*
 * set up a callback to execute when a new esi: start tag is encountered
 * this will fire for all block tags and also for inline tags
 */
void esi_parser_start_tag_handler( ESIParser *parser, start_tag_cb callback );
void esi_parser_end_tag_handler( ESIParser *parser, end_tag_cb callback );
/* set up a callback to receive data ready for output */
void esi_parser_output_handler( ESIParser *parser, output_cb output_handler );
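
The tricky part of chunked input is that a tag can be split across two calls to execute. Here is a much simplified C++ sketch of that idea, not the ragel-generated parser itself. The class name `ChunkStripper` is hypothetical, and the sketch assumes every `<esi:` tag ends at the next `>`, ignoring closing tags, start and end tag callbacks, and `>` characters inside attribute values.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

namespace {
const char kMarker[] = "<esi:";
const std::size_t kMarkerLen = sizeof(kMarker) - 1;
}  // namespace

// Hypothetical chunk-fed filter: drops <esi:...> tags, passes the rest on.
class ChunkStripper {
 public:
  using OutputCb = std::function<void(const std::string &)>;

  explicit ChunkStripper(OutputCb cb) : output_(std::move(cb)) {}

  // Feed one chunk of any size; state is kept between calls.
  void execute(const char *data, std::size_t length) {
    std::string text;
    for (std::size_t i = 0; i < length; ++i) {
      const char c = data[i];
      if (in_tag_) {  // inside an esi tag: skip up to '>'
        if (c == '>') in_tag_ = false;
        continue;
      }
      if (c == kMarker[pending_.size()]) {  // partial marker grows
        pending_ += c;
        if (pending_.size() == kMarkerLen) {
          pending_.clear();
          in_tag_ = true;
        }
        continue;
      }
      // Mismatch: whatever was held back was ordinary text after all.
      text += pending_;
      pending_.clear();
      if (c == kMarker[0]) {
        pending_ += c;  // this character may start a new marker
      } else {
        text += c;
      }
    }
    if (!text.empty()) output_(text);
  }

  // End of document: flush anything still held back.
  void finish() {
    if (!pending_.empty()) {
      output_(pending_);
      pending_.clear();
    }
  }

 private:
  OutputCb output_;
  std::string pending_;  // possible start of "<esi:" seen so far
  bool in_tag_ = false;
};
```

Holding back a possible marker prefix in `pending_` is what lets a chunk end in the middle of `<esi:` without the fragment leaking into the output.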

I developed a fairly simple set of tests to verify the accuracy of the implementation. Using valgrind with the --leak-check=full option I was able to measure the number of memory allocations and verify no memory would be lost.

  valgrind --leak-check=full ./testit

Once I was satisfied that the parser core was working, I started to implement the Ruby binding. I started by following this tutorial as well as referring to many other documents and sources.
One of the first things I decided to verify with my Ruby binding was whether, in gluing my C implementation to the Ruby runtime, I was leaking any memory. As with the pure C implementation, I decided to run my extension through valgrind.

  valgrind --leak-check=full ruby test1.rb

My initial test was this:

require 'esi'

output = ""
p = ESI::CParser.new

p.start_tag_handler do|tag_name, attrs|
  puts "Start: #{tag_name} #{attrs.inspect}"
end

p.end_tag_handler do|tag_name|
  puts "End: #{tag_name}"
end

p.output_handler do|data|
  output << data
end

p.process "<html><head><body><esi:include timeout='1' max-age='600+600' src=\"hello\"/>some more input"
p.process "some input<esi:include \nsrc='hello'/>some more input\nsome input<esi:include src=\"hello\"/>some more input"
p.process "some input<esi:inline src='hello'/>some more input\nsome input<esi:comment text='hello'/>some more input"
p.process "<p>some input</p><esi:include src='hello'/>some more input\nsome input<esi:include src='hello'/>some more input"
p.process "</body></html>"
p.finish

expected = %Q(<html><head><body>some more inputsome inputsome more input
some inputsome more inputsome inputsome more input
some inputsome more input<p>some input</p>some more input
some inputsome more input</body></html>)

if( expected !=  output )
  puts "Failed output was different from the expected"
  puts "Expected: #{expected}"
  puts "\n"
  puts "Actual: #{output}"
  exit(1)
end
GC.start

This is really a pretty simple test that just ensures the callbacks are all working and that the data emitted by the parser excludes any esi tags.

The results I got from running this through valgrind, however, were very disturbing. Not only does valgrind report leaked memory at the end, it also reports roughly 4211 errors along the way.
The majority of these errors are “Use of uninitialised value of size 4” and “Conditional jump or move depends on uninitialised value(s)”.

I finally decided to figure out what was causing this. First I needed to get ruby built with debugging symbols enabled. I downloaded the latest stable CVS snapshot, feeling optimistic in case I spot something and can send in a patch.

CFLAGS=-g ./configure --prefix=$HOME/project/ruby-stable && make && make install

Rerunning my ruby script through valgrind:

 valgrind --leak-check=full --num-callers=24 ~/project/ruby-stable/bin/ruby test1.rb

Now the first error I see reported from valgrind looks like this:

==9911== Conditional jump or move depends on uninitialised value(s)
==9911==    at 0x807305F: is_pointer_to_heap (gc.c:609)
==9911==    by 0x8073023: mark_locations_array (gc.c:629)
==9911==    by 0x80743B7: garbage_collect (gc.c:1367)
==9911==    by 0x8074467: rb_gc (gc.c:1423)
==9911==    by 0x8074479: rb_gc_start (gc.c:1440)
==9911==    by 0x805F95B: call_cfunc (eval.c:5704)
==9911==    by 0x805EEAF: rb_call0 (eval.c:5857)
==9911==    by 0x8060461: rb_call (eval.c:6104)
==9911==    by 0x8059278: rb_eval (eval.c:3482)
==9911==    by 0x805467F: eval_node (eval.c:1434)
==9911==    by 0x8054C61: ruby_exec_internal (eval.c:1640)
==9911==    by 0x8054CA5: ruby_exec (eval.c:1660)
==9911==    by 0x8054CC7: ruby_run (eval.c:1670)
==9911==    by 0x8052B2D: main (main.c:48)

This takes me to the function is_pointer_to_heap in gc.c.

static inline int
is_pointer_to_heap(ptr)
    void *ptr;
{
    register RVALUE *p = RANY(ptr);
    register RVALUE *heap_org;
    register long i;

    if (p < lomem || p > himem) return Qfalse;
    if ((VALUE)p % sizeof(RVALUE) != 0) return Qfalse;

    /* check if p looks like a pointer */
    for (i=0; i < heaps_used; i++) {
      heap_org = heaps[i].slot;
      if (heap_org <= p && p < heap_org + heaps[i].limit)
        return Qtrue;
    }
    return Qfalse;
}

It should be pretty obvious from looking at that code why valgrind would report “Conditional jump or move depends on uninitialised value(s)”. The first condition, p < lomem || p > himem, tests that the address really lies within the heap allocated by ruby by comparing p to the lower and upper heap addresses. I am not certain, but fairly sure, that lomem and himem are the lower and upper bounds of the memory ruby has allocated for its heaps. This would mean it’s safe to test p in this context. I still have the question, and the concern, of why p would be uninitialized in the first place….
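
A plausible reading of the trace is that the collector is conservative. mark_locations_array walks a range of raw machine words and asks is_pointer_to_heap about each one, whether or not the program ever wrote to that word, which would be why valgrind sees p as uninitialised. The check can be sketched in C++ as follows. `Slot`, `Heap` and `count_heap_pointers` are hypothetical stand-ins, and unlike gc.c this sketch tests alignment relative to the start of each heap rather than on the absolute address.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for Ruby's RVALUE: a fixed-size object slot.
struct Slot {
  std::uintptr_t words[5];
};

// One contiguous run of slots, like an entry in Ruby's heaps[] table.
struct Heap {
  const Slot *begin;
  std::size_t count;
};

// Treat `candidate` as a possible object address. It only counts as a
// pointer into the heap if it falls inside one of the runs and lands
// exactly on a slot boundary.
bool is_pointer_to_heap(const std::vector<Heap> &heaps,
                        std::uintptr_t candidate) {
  for (const Heap &heap : heaps) {
    const std::uintptr_t lo = reinterpret_cast<std::uintptr_t>(heap.begin);
    const std::uintptr_t hi = lo + heap.count * sizeof(Slot);
    if (candidate < lo || candidate >= hi) continue;
    if ((candidate - lo) % sizeof(Slot) == 0) return true;
  }
  return false;
}

// Conservative scan: count how many raw words look like heap pointers.
std::size_t count_heap_pointers(const std::vector<Heap> &heaps,
                                const std::uintptr_t *words,
                                std::size_t n) {
  std::size_t hits = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (is_pointer_to_heap(heaps, words[i])) ++hits;
  }
  return hits;
}
```

Because the test is pure integer comparison, feeding it a garbage word is harmless: the worst case is a false positive that keeps a dead object alive for one more collection.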

There are more errors being reported besides this and I hope to follow up with those next.

Software