Anemone is a free, multi-threaded Ruby web spider framework by Chris Kite that is useful for collecting information about websites. With Anemone, you can write tasks that generate interesting statistics about a site just by giving it the URL.
Anemone's only dependency is Nokogiri (an HTML and XML parser). Beyond that, you just need to install the gem to get started. Anemone's simple syntax lets you, among other things, tell it which pages to include (based on regular expressions) and define callbacks.
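The post doesn't show these filtering and callback methods in detail, so here is a hedged, self-contained sketch of the two ideas in plain Ruby. It is not Anemone's implementation. The `TinySpider` class, its in-memory `site` hash (standing in for real HTTP fetches), and the `skip_links_like` / `on_pages_like` methods are illustrative assumptions that only echo Anemone's naming style. Unlike Anemone, where callbacks receive a page object, these callbacks receive a plain URL string. Here the regular expressions decide which links to follow by skipping the ones that match.

```ruby
# Hypothetical mini-spider: regex-based link skipping plus per-page callbacks.
# NOT Anemone's code; it crawls an in-memory Hash instead of the web.
class TinySpider
  def initialize(site)
    @site = site                # Hash: url => array of linked urls
    @skip_patterns = []         # links matching any of these are not followed
    @every_page = []            # callbacks run for every crawled page
    @pattern_callbacks = []     # [regex, callback] pairs
  end

  def skip_links_like(*patterns)
    @skip_patterns.concat(patterns)
    self
  end

  def on_every_page(&block)
    @every_page << block
    self
  end

  def on_pages_like(pattern, &block)
    @pattern_callbacks << [pattern, block]
    self
  end

  # Breadth-first crawl from +root+; returns URLs in the order they were queued.
  def crawl(root)
    seen  = [root]
    queue = [root]
    until queue.empty?
      url = queue.shift
      @every_page.each { |cb| cb.call(url) }
      @pattern_callbacks.each { |re, cb| cb.call(url) if re.match?(url) }
      @site.fetch(url, []).each do |link|
        next if seen.include?(link)
        next if @skip_patterns.any? { |re| re.match?(link) }
        seen << link
        queue << link
      end
    end
    seen
  end
end

# Usage: skip the /admin area and collect blog posts separately.
site = {
  "/"            => ["/blog", "/admin", "/about"],
  "/blog"        => ["/blog/1", "/admin/login"],
  "/blog/1"      => [],
  "/about"       => [],
  "/admin"       => [],
  "/admin/login" => []
}
visited = []
posts   = []
spider = TinySpider.new(site)
spider.skip_links_like(%r{\A/admin})
spider.on_every_page { |url| visited << url }
spider.on_pages_like(%r{\A/blog/\d+}) { |url| posts << url }
result = spider.crawl("/")
p visited # prints ["/", "/blog", "/about", "/blog/1"]
```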
This example, taken from Anemone's homepage, prints out the URL of every page on a site:
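    require 'anemone'

    # Start crawling the site from the given URL
    Anemone.crawl("http://www.example.com/") do |anemone|
      # This block runs once for each page fetched during the crawl
      anemone.on_every_page do |page|
        puts page.url
      end
    end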
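Anemone is described as multi-threaded, but its threading code isn't covered here. The following is only a rough sketch of the general idea, not Anemone's implementation: several worker threads pull URLs from a shared `Queue`, and a `Mutex` guards the set of visited URLs. The `SITE` hash (standing in for real HTTP fetches and HTML parsing) and the `threaded_crawl` method are hypothetical.

```ruby
require 'set'

# Hypothetical in-memory "site": each URL maps to the URLs it links to.
# A real spider would fetch each page over HTTP and parse out its links.
SITE = {
  "/"       => ["/about", "/blog"],
  "/about"  => ["/"],
  "/blog"   => ["/blog/1", "/blog/2"],
  "/blog/1" => ["/blog"],
  "/blog/2" => ["/blog", "/about"]
}.freeze

# Crawl +site+ from +root+ with +workers+ threads sharing one work queue.
# Returns the Set of every URL reached.
def threaded_crawl(root, site, workers: 4)
  queue   = Queue.new          # thread-safe FIFO of URLs to process
  visited = Set[root]          # URLs already queued (guarded by lock)
  lock    = Mutex.new
  pending = 1                  # URLs queued but not yet fully processed
  queue << root

  threads = Array.new(workers) do
    Thread.new do
      loop do
        url = queue.pop        # blocks until a URL or :done is available
        break if url == :done

        site.fetch(url, []).each do |link|
          lock.synchronize do
            # Set#add? returns nil when the link was already seen
            if visited.add?(link)
              pending += 1
              queue << link
            end
          end
        end

        lock.synchronize do
          pending -= 1
          # No queued or in-flight URLs remain: tell every worker to stop
          workers.times { queue << :done } if pending.zero?
        end
      end
    end
  end

  threads.each(&:join)
  visited
end

p threaded_crawl("/", SITE).sort # prints ["/", "/about", "/blog", "/blog/1", "/blog/2"]
```

The `pending` counter is incremented for each newly queued link before its parent page is marked finished, so it only reaches zero once no URL is waiting or being processed. At that point every worker receives a `:done` token and exits.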
The project's bin folder contains some more in-depth examples, including tasks that count the unique pages on a site, count the pages at a given depth, or list the URLs encountered. There's also a combined task that wraps up several of these and is intended to run as a daily cron job.
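As a hedged illustration of what those statistics involve (not the code in Anemone's bin folder), this plain-Ruby sketch walks a hypothetical in-memory link graph breadth-first, records each page's shortest link depth from the root, and derives the unique-page count, pages per depth, and the URL list. `SITE_LINKS` and `page_depths` are made-up names, and `Array#tally` needs Ruby 2.7 or later.

```ruby
# Hypothetical link graph standing in for a crawled site.
SITE_LINKS = {
  "/"    => ["/a", "/b"],
  "/a"   => ["/", "/a/1", "/a/2"],
  "/b"   => ["/a/2", "/b/1"],
  "/a/1" => [],
  "/a/2" => ["/b/1"],
  "/b/1" => ["/"]
}.freeze

# Breadth-first walk from +root+: the first time a page is reached
# is via a shortest path, so that depth is recorded and kept.
def page_depths(site, root)
  depths = { root => 0 }
  queue  = [root]
  until queue.empty?
    url = queue.shift
    site.fetch(url, []).each do |link|
      next if depths.key?(link)
      depths[link] = depths[url] + 1
      queue << link
    end
  end
  depths
end

depths         = page_depths(SITE_LINKS, "/")
unique_count   = depths.size                    # number of unique pages
pages_by_depth = depths.values.tally.sort.to_h  # depth => number of pages
url_list       = depths.keys.sort               # every URL encountered

p unique_count   # prints 6
p pages_by_depth # prints {0=>1, 1=>2, 2=>3}
```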
You can install Anemone as a gem or get the source from GitHub, and fairly comprehensive RDoc documentation is available in the source or online.