Focused crawling aims to index the web according to a specific theme,
and thus to support domain-specific search engines and thematic web
portals. Reinforcement learning is a well-suited approach to training
focused crawlers, because a crawler receives only partial feedback,
and only at the end of a successful crawl.
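To illustrate the delayed-feedback point, here is a minimal sketch of how a reward earned only at a relevant page can be propagated back to the links that lead there. The link graph, the page names, the discount factor and the function `learn_link_values` are all hypothetical and chosen for illustration; the update is a simple deterministic Q-value sweep, not necessarily the method used in the publications listed on this page.

```python
# Toy link graph (hypothetical): each page maps to the pages it links to.
LINKS = {
    "home": ["sports", "science"],
    "sports": ["football"],
    "science": ["ml", "physics"],
    "ml": ["focused-crawling"],
    "football": [],
    "physics": [],
    "focused-crawling": [],
}
RELEVANT = {"focused-crawling"}  # assumed set of on-topic pages
GAMMA = 0.5                      # assumed discount factor


def reward(page):
    """Reward is given only when the crawler lands on a relevant page."""
    return 1.0 if page in RELEVANT else 0.0


def learn_link_values(links, sweeps=10, gamma=GAMMA):
    """Estimate a value for following each link (src -> dst).

    Each sweep applies Q(src, dst) = reward(dst) + gamma * max Q(dst, next),
    so the reward found at the end of a path flows back one link per sweep.
    """
    q = {(src, dst): 0.0 for src, outs in links.items() for dst in outs}
    for _ in range(sweeps):
        for (src, dst) in q:
            future = max((q[(dst, nxt)] for nxt in links[dst]), default=0.0)
            q[(src, dst)] = reward(dst) + gamma * future
    return q


q = learn_link_values(LINKS)
for link, value in sorted(q.items()):
    print(link, value)
```

In this toy setting the link from "home" to "science" ends up with a higher value than the link to "sports", even though neither target page is itself relevant. A crawler that follows the highest-valued link would therefore head toward the on-topic page.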
You can find a collection of publications
in the field of focused crawling here
(last update: May 6, 2008; 20 papers). The list is, of course,
incomplete. For suggestions or additions, or if you have a paper in this
field, please contact Ioannis Partalas.
Scripts for downloading web pages from dmoz:
you can find datasets created from Web
the datasets, in Weka format
The description of the datasets can be found in .
 I. Partalas, G. Paliouras, I.
Learning with Classifier
Selection for Focused Crawling, 18th European
Conference on Artificial Intelligence, 2008 (accepted for presentation).