<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Idle Hacking &#187; libxml</title>
	<atom:link href="http://www.idle-hacking.com/tag/libxml/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.idle-hacking.com</link>
	<description>Ruby, XUL/Javascript, C/C++, and more...</description>
	<lastBuildDate>Tue, 11 May 2010 02:15:22 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>a little ferret&#8217;ing</title>
		<link>http://www.idle-hacking.com/2006/09/a-little-ferreting/</link>
		<comments>http://www.idle-hacking.com/2006/09/a-little-ferreting/#comments</comments>
		<pubDate>Thu, 28 Sep 2006 05:32:00 +0000</pubDate>
		<dc:creator>taf2</dc:creator>
				<category><![CDATA[Software]]></category>
		<category><![CDATA[Ferret]]></category>
		<category><![CDATA[libxml]]></category>
		<category><![CDATA[Ruby]]></category>

		<guid isPermaLink="false">http://www.idle-hacking.com/2006/09/a-little-ferreting/</guid>
		<description><![CDATA[I was trying to find some information tonight in an RFC and realized I didn&#8217;t have any good way of searching the RFC documents. I still don&#8217;t have a very good solution, but I had a little fun trying to make one&#8230; First thing I did was grab all the RFC&#8217;s (275M) wget --passive-ftp -r [...]]]></description>
			<content:encoded><![CDATA[<p>I was trying to find some information tonight in an <a href="http://www.ietf.org/iesg/1rfc_index.txt">RFC</a> and realized I didn&#8217;t have any good way of searching the RFC documents.  I still don&#8217;t have a very good solution, but I had a little fun trying to make one&#8230;</p>
<p>The first thing I did was grab all the RFCs (275M):</p>
<pre lang="bash">wget --passive-ftp -r -l 1 ftp://ftp.isi.edu/in-notes/</pre>
<p>Meanwhile, as that downloaded, I started reading up on a great little <a href="http://ferret.davebalmain.com/trac">ruby project Ferret</a>.  It&#8217;s a really impressive indexer, with an easy-to-use interface and, from the looks of it, pretty extensible.  Anyway&#8230; looking around the RFC documents I noticed rfc-index.xml.  Using <a href="http://libxml.rubyforge.org/">ruby-libxml</a> and its SaxParser, I was able to easily extract the fields that describe each RFC: title, author, date, etc.</p>
<p>Finally, once the index was built, a few lines of Ruby were enough to search it, and searching is lightning fast and the results are not too bad either&#8230;</p>
<p>Now to provide a web interface and some sensible ordering&#8230;</p>
<p>Anyway, for those interested, here are the source files.</p>
<p>The best part is probably that the indexer is 109 lines and the search is 11!</p>
<p>index.rb:</p>
<pre lang="ruby">#!/usr/bin/env ruby

require 'rubygems'
require 'xml/libxml'
require 'ferret'

include Ferret

RFC_PATH="ftp.isi.edu/in-notes"

class RfcEntry
  attr_accessor :doc_id, :title, :authors, :month, :year, :file

  def initialize
    self.authors = []
  end

  def update( index )
    # add this entry to the index, but only if its text file was downloaded
    file = doc_id.gsub(/RFC0*/,"rfc") + ".txt" # e.g. RFC0791 becomes rfc791.txt
    path = "#{RFC_PATH}/#{file}"
    if( File.exist?( path ) )
      puts "record: #{doc_id}, '#{title}' by #{authors.join(', ')} - #{month}, #{year} =&gt; #{file}"
      index &lt;&lt; {:id =&gt; doc_id, :title =&gt; title, :content =&gt; File.read(path),
                :authors =&gt; authors, :month =&gt; month, :year =&gt; year }
    end
  end
end

class RfcIndexParser
  # tag table
  TAGS = { "rfc-entry" =&gt; { :start =&gt; :start_entry, :end =&gt; :add_entry },
           "doc-id" =&gt; { :start =&gt; :collect, :end =&gt; :store_doc_id },
           "title" =&gt; { :start =&gt; :collect, :end =&gt; :store_title },
           "name" =&gt; { :start =&gt; :collect, :end =&gt; :store_author },
           "month" =&gt; { :start =&gt; :collect, :end =&gt; :store_month },
           "year" =&gt; { :start =&gt; :collect, :end =&gt; :store_year } }

  # @entry stays nil until an rfc-entry start tag opens one
  def initialize( index ) # pass in the index
    @entry = nil
    @buffer = ""
    @index = index
  end

  def parse( rfc_index )
    parser = XML::SaxParser.new

    parser.filename = rfc_index
    parser.on_start_element {|name,attrs| self.on_start(name,attrs) }

    parser.on_end_element {|name| self.on_end(name) }

    parser.on_characters {|chars| self.on_chars(chars) }

    parser.parse
  end

  # when we find a new start tag check the table,
  # if it's in the table call the start method
  def on_start( tag, attrs )

    action = TAGS[tag]
    self.send( action[:start], tag, attrs ) if( action )

  end

  # when we find a new end tag check the table,
  # if it's in the table call the end method
  def on_end( tag )

    action = TAGS[tag]
    self.send( action[:end], tag ) if( action )

  end

  def on_chars( char )
    @buffer &lt;&lt; char

  end

  def start_entry( tag, attrs )
    @entry = RfcEntry.new

  end

  def add_entry( tag )
    @entry.update( @index ) if @entry

    @entry = nil
  end

  def collect( tag, attrs )

    @buffer = "" # reset the buffer
  end

  def store_doc_id( tag )

    @entry.doc_id = @buffer.squeeze(" ") if @entry and @entry.doc_id.nil?

  end

  def store_author( tag )
    @entry.authors &lt;&lt; @buffer.squeeze(" ") if @entry

  end

  def store_title( tag )
    @entry.title = @buffer.squeeze(" ") if @entry

  end

  def store_month( tag )
    @entry.month = @buffer.squeeze(" ") if @entry

  end

  def store_year( tag )
    @entry.year = @buffer.squeeze(" ") if @entry

  end

end

# parse and create the index
RfcIndexParser.new(Index::Index.new(:path =&gt; 'index')).parse( "#{RFC_PATH}/rfc-index.xml" )</pre>
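<p>For anyone who wants to try the tag-table idea without installing libxml or Ferret, here is a minimal sketch that swaps in REXML&#8217;s stream parser, which ships with Ruby. The element names come from the TAGS table in index.rb; the sample XML, the EntryListener class and the parse_entries helper are made up for illustration, and the layout of the sample is only an assumption about what rfc-index.xml looks like.</p>

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hand-written sample; the real rfc-index.xml is assumed to look roughly like this.
SAMPLE = <<~XML
  <rfc-index>
    <rfc-entry>
      <doc-id>RFC0001</doc-id>
      <title>Example Protocol One</title>
      <author><name>A. Author</name></author>
      <date><month>April</month><year>1969</year></date>
    </rfc-entry>
    <rfc-entry>
      <doc-id>RFC0002</doc-id>
      <title>Example Protocol Two</title>
      <author><name>B. Writer</name></author>
      <author><name>C. Editor</name></author>
      <date><month>May</month><year>1969</year></date>
    </rfc-entry>
  </rfc-index>
XML

# Hypothetical listener: collects one hash per rfc-entry element.
class EntryListener
  include REXML::StreamListener

  # Element name mapped to the entry field it fills.
  FIELDS = { "doc-id" => :doc_id, "title" => :title, "name" => :authors,
             "month" => :month, "year" => :year }

  attr_reader :entries

  def initialize
    @entries = []
    @entry = nil
    @buffer = String.new
  end

  def tag_start(name, _attrs)
    if name == "rfc-entry"
      @entry = { authors: [] }
    elsif FIELDS.key?(name)
      @buffer = String.new # reset the buffer for this field
    end
  end

  def text(chars)
    @buffer << chars
  end

  def tag_end(name)
    if name == "rfc-entry"
      @entries << @entry if @entry
      @entry = nil
    elsif @entry && (field = FIELDS[name])
      value = @buffer.strip
      if field == :authors
        @entry[:authors] << value
      else
        @entry[field] ||= value # keep the first value seen for the field
      end
    end
  end
end

def parse_entries(xml)
  listener = EntryListener.new
  REXML::Document.parse_stream(xml, listener)
  listener.entries
end

parse_entries(SAMPLE).each do |e|
  puts "#{e[:doc_id]} '#{e[:title]}' by #{e[:authors].join(', ')} - #{e[:month]} #{e[:year]}"
end
```

<p>Passing the contents of the real rfc-index.xml to parse_entries instead of the sample should work the same way, provided the file really follows this layout.</p>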
<p>search.rb:</p>
<pre lang="ruby">#!/usr/bin/env ruby
require 'rubygems'
require 'ferret'
include Ferret

index = Index::Index.new(:path =&gt; 'index')

index.search_each('title|content:"URL"') do |id, score|
  doc = index[id]
  puts "#{doc[:id]} '#{doc[:title]}' #{score}"
end</pre>
]]></content:encoded>
			<wfw:commentRss>http://www.idle-hacking.com/2006/09/a-little-ferreting/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>svn merge hell!</title>
		<link>http://www.idle-hacking.com/2006/08/svn-merge-hell/</link>
		<comments>http://www.idle-hacking.com/2006/08/svn-merge-hell/#comments</comments>
		<pubDate>Wed, 30 Aug 2006 18:48:00 +0000</pubDate>
		<dc:creator>taf2</dc:creator>
				<category><![CDATA[Software]]></category>
		<category><![CDATA[libxml]]></category>
		<category><![CDATA[Ruby]]></category>
		<category><![CDATA[Subversion]]></category>

		<guid isPermaLink="false">http://www.idle-hacking.com/2006/08/svn-merge-hell/</guid>
		<description><![CDATA[svn: REPORT request failed on '/svn/rhg/!svn/vcc/default' svn: Working copy path 'path/to/afile/in/your/project/a_file_that_is_broken' does not exist in repository If you&#8217;ve ever seen this error you&#8217;ve probably resorted to &#8216;rm -rf&#8217; Thanks to some of the great minds of revolution we have a simple fix that involves editing the .svn/entries file and locating an incorrect attribute revision="0" And [...]]]></description>
			<content:encoded><![CDATA[<pre lang="bash">
svn: REPORT request failed on '/svn/rhg/!svn/vcc/default'
svn: Working copy path 'path/to/afile/in/your/project/a_file_that_is_broken' does not exist in repository
</pre>
<p>If you&#8217;ve ever seen this error, you&#8217;ve probably resorted to &#8216;rm -rf&#8217;.</p>
<p>Thanks to some of the great minds of <a href="http://www.revolutionhealth.com">Revolution</a>, we have a simple fix that involves editing the .svn/entries file and locating an incorrect revision="0" attribute.</p>
<p>To automate this, I wrote a little Ruby script.  It uses <a href="http://libxml.rubyforge.org/">libxml-ruby</a> because I wanted to get familiar with the API, which thankfully is very similar to the <a href="http://xmlsoft.org/">C API</a>.</p>
<p><strong>Note:</strong> This only applies to the <a href="http://subversion.tigris.org/">Subversion</a> 1.3 client; the newer 1.4 client does not generate XML property files.</p>
<pre lang="ruby">
#!/usr/bin/env ruby
require 'find'
require 'pathname'
require 'rubygems'

require 'xml/libxml'
# Search through all the folders in the current project
# and locate every .svn/entries file.  Parse each file looking for
# bad entries.
# A bad entry is defined as
# any entry with revision="0"
# that does not have schedule="add"

def start_doc
end

def start_element(name,attrs, entry_path)

 if( name == "entry" &amp;&amp; attrs["revision"] == "0" &amp;&amp; attrs["schedule"] != "add" )

   puts "Potential Error in #{entry_path}"
 end
end

def end_element(name)

end

def chars
end
def comments
end

subversion_folder = /\.svn$/i

root_path = Pathname.new(".").realpath
Find.find(root_path) do |file_name|

 if subversion_folder.match(file_name)
   Find.find(file_name) do |sub_file|

     entry_file = File.basename(sub_file)
     if entry_file == "entries"

       entry_path = sub_file # use the path Find actually visited
       parser = XML::SaxParser.new

       parser.on_start_element {|name,attrs| start_element( name, attrs, entry_path ) }

       parser.on_end_element {|name| end_element(name) }
       parser.filename = entry_path

       parser.parse
       break
     end
   end
 end

end
</pre>
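<p>To try the bad-entry check without a broken working copy, here is a minimal sketch that runs the same test over an in-memory string. It uses REXML&#8217;s stream parser from Ruby&#8217;s standard distribution in place of libxml-ruby. The sample entries file, the bad_entry? and suspect_entries helpers and the EntriesListener class are all made up for illustration, and the sample only assumes what a 1.3-era entries file looks like.</p>

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Made-up entries file; the exact layout written by the 1.3 client is an assumption.
SAMPLE_ENTRIES = <<~XML
  <?xml version="1.0" encoding="utf-8"?>
  <wc-entries xmlns="svn:">
    <entry name="" kind="dir" revision="120"/>
    <entry name="new_file.rb" kind="file" revision="0" schedule="add"/>
    <entry name="broken_file.rb" kind="file" revision="0"/>
  </wc-entries>
XML

# Same condition as start_element in the script:
# revision="0" is only expected on entries scheduled for addition.
def bad_entry?(name, attrs)
  name == "entry" && attrs["revision"] == "0" && attrs["schedule"] != "add"
end

# Hypothetical listener that remembers the name of every suspicious entry.
class EntriesListener
  include REXML::StreamListener

  attr_reader :suspects

  def initialize
    @suspects = []
  end

  def tag_start(name, attrs)
    attrs = attrs.to_h # accept a hash or a list of name/value pairs
    @suspects << attrs["name"] if bad_entry?(name, attrs)
  end
end

def suspect_entries(xml)
  listener = EntriesListener.new
  REXML::Document.parse_stream(xml, listener)
  listener.suspects
end

puts suspect_entries(SAMPLE_ENTRIES).inspect
```

<p>Only the entry that has revision="0" and is not scheduled for addition should be reported; the directory entry and the newly added file are left alone.</p>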
]]></content:encoded>
			<wfw:commentRss>http://www.idle-hacking.com/2006/08/svn-merge-hell/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
