Focused Crawling
Introduction
Focused crawling aims to index the web according to a specific theme,
and thus to support domain-specific search engines and thematic web
portals. Reinforcement learning is well suited to training focused
crawlers, because a crawler receives only partial feedback, and only at
the end of a successful crawl.
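To make the idea of theme-driven crawling concrete, here is a minimal sketch of a best-first focused crawler. It is a simple heuristic baseline, not the reinforcement learning approach of [1]. The toy link graph (TOY_WEB), the topic terms and the relevance function are all hypothetical and stand in for real page fetching and a real topic classifier.

```python
import heapq

# Hypothetical toy web: page id -> (page text, outgoing links). Not real data.
TOY_WEB = {
    "seed": ("sports news portal", ["a", "b"]),
    "a": ("football league results", ["c"]),
    "b": ("cooking recipes", ["d"]),
    "c": ("football transfer rumours", []),
    "d": ("pasta recipes", []),
}

# Hypothetical description of the crawl theme.
TOPIC_TERMS = {"football", "league", "sports"}


def relevance(text):
    """Fraction of words in the page text that are topic terms."""
    words = text.split()
    return sum(w in TOPIC_TERMS for w in words) / len(words) if words else 0.0


def focused_crawl(seed, budget):
    """Visit up to `budget` pages, expanding the most promising link first."""
    frontier = [(-1.0, seed)]  # heapq is a min-heap, so priorities are negated
    seen = {seed}
    visited = []
    while frontier and len(visited) < budget:
        _, page = heapq.heappop(frontier)
        text, links = TOY_WEB[page]
        score = relevance(text)
        visited.append((page, score))
        for link in links:
            if link not in seen:
                seen.add(link)
                # A link inherits the relevance of the page it was found on.
                heapq.heappush(frontier, (-score, link))
    return visited


if __name__ == "__main__":
    for page, score in focused_crawl("seed", budget=3):
        print(page, round(score, 2))
```

With a budget of three pages the crawler follows the football branch and never fetches the cooking pages, which is the behaviour a focused crawler is meant to show. A learned crawler would replace the inherited relevance score with an estimate of the long-term value of following each link.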
Bibliography
You can find a collection of publications in the field of Focused
Crawling here (last update: May 6, 2008 - 20 papers). The list is, of
course, incomplete. For suggestions or additions, or if you have a paper
in this field, please contact Ioannis Partalas
(partalas[at]csd.auth.gr).
Source code
Scripts for downloading web pages from dmoz:
Datasets
Here you can find datasets created from Web pages.
Download the datasets, in Weka format (.arff), here.
The description of the datasets can be found in [1].
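For readers who want to inspect an .arff file outside Weka, the following is a minimal sketch of a reader that uses only the Python standard library. It assumes dense rows and attribute names without spaces. Whether the published datasets meet those assumptions is not stated here, so the sketch raises an error on sparse rows rather than guessing. The sample content and its attribute names are hypothetical and are not taken from the published datasets.

```python
import csv
import io


def read_arff(stream):
    """Parse a dense ARFF stream into (attribute_names, rows of strings)."""
    names, rows, in_data = [], [], False
    for raw in stream:
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if not in_data:
            lower = line.lower()
            if lower.startswith("@attribute"):
                # The second whitespace-separated token is the attribute name
                # (assumes names contain no spaces).
                names.append(line.split(None, 2)[1])
            elif lower.startswith("@data"):
                in_data = True
        else:
            if line.startswith("{"):
                raise ValueError("sparse ARFF rows are not handled in this sketch")
            # Values are comma-separated; single quotes may wrap a value.
            rows.append(next(csv.reader([line], quotechar="'",
                                        skipinitialspace=True)))
    return names, rows


# Hypothetical sample, not one of the published datasets.
SAMPLE = """% toy example
@relation pages
@attribute term_football numeric
@attribute term_recipe numeric
@attribute class {relevant,irrelevant}
@data
3,0,relevant
0,5,irrelevant
"""

if __name__ == "__main__":
    names, rows = read_arff(io.StringIO(SAMPLE))
    print(names)
    print(rows)
```

To read a downloaded file, pass an open file object instead of the in-memory sample, for example read_arff(open(path)) with path pointing at the .arff file. All values are returned as strings, so numeric attributes need to be converted by the caller.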
Single files:
Publications
[1] I. Partalas, G. Paliouras, I. Vlahavas, Reinforcement Learning with
Classifier Selection for Focused Crawling, 18th European Conference on
Artificial Intelligence, 2008 (accepted for presentation)