Introduction
Recent advances in sensor, storage, processing and
communication technologies have enabled the automated recording of data, leading
to fast and continuous flows of information, referred to as data streams.
Examples of data streams include web logs and click streams recorded by
web servers, transactions such as credit card usage, data from network monitoring
and sensor networks, video streams such as images from surveillance cameras,
and news articles in an RSS reader. The dynamic nature of data streams requires
continuous or at least periodic updates of the current knowledge in order to
ensure that it always includes the information content of the latest batch of
data. This is important in applications where the concept of a target class
and/or the data distribution changes over time. This phenomenon is commonly
known as concept drift.
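To make the need for continuous updates concrete, the following is a minimal sketch, not a method taken from the publications listed on this page. It assumes a toy stream of single-token messages in which the label of the token "offer" flips abruptly from "ham" to "spam", and it compares a learner that never forgets with one that keeps only a sliding window of recent examples. The class name, the window size of 10 and the synthetic stream are all illustrative assumptions.

```python
from collections import Counter, deque


class WindowedTokenClassifier:
    """Predicts the majority label seen for a token among the stored examples.

    Hypothetical toy learner for illustration only.
    """

    def __init__(self, window=None):
        # window=None keeps every example (no forgetting);
        # an integer keeps only the most recent `window` examples.
        self.history = deque(maxlen=window)

    def predict(self, token, default="ham"):
        votes = Counter(label for tok, label in self.history if tok == token)
        return votes.most_common(1)[0][0] if votes else default

    def update(self, token, label):
        self.history.append((token, label))


def prequential_accuracy(model, stream):
    """Test-then-train: predict each example first, then learn from it."""
    correct = 0
    for token, label in stream:
        correct += model.predict(token) == label
        model.update(token, label)
    return correct / len(stream)


# Synthetic stream (assumption): the concept behind "offer" changes halfway.
stream = [("offer", "ham")] * 50 + [("offer", "spam")] * 50

no_forgetting = prequential_accuracy(WindowedTokenClassifier(window=None), stream)
sliding = prequential_accuracy(WindowedTokenClassifier(window=10), stream)

print(f"no forgetting: {no_forgetting:.2f}, sliding window: {sliding:.2f}")
```

In this toy setting the learner without forgetting misclassifies every example after the drift, because the old examples still outvote the new ones, while the sliding-window learner recovers after a few examples. A fixed window is only one simple way of forgetting; the window size trades stability on a stationary concept against speed of adaptation.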
Bibliography
A collection of publications in the field of Concept Drift
Classification is available here
(last update: May 5, 2008). The list is, of course,
incomplete. For suggestions or additions, or if you have a paper in this
field, please contact Ioannis Katakis.
Datasets 1:
Here you can find datasets from the text domain that include concept drift.
The datasets are in Weka (.arff) format.
Spam Assassin Corpus (Gradual Concept Drift)
Mailing Lists (Instant Concept Drift)
You can find the description of both datasets in [3].
Datasets 2:
Usenet1
Usenet2
You can find the description of the above datasets in [2].
Datasets 3:
Email Data
Spam Data
You can find the description of the above datasets in [1].
Publications
- I. Katakis, G. Tsoumakas, I. Vlahavas, “Tracking Recurring Contexts using Ensemble Classifiers: An Application to Email Filtering”, Knowledge and Information Systems, Springer, 2009.
- I. Katakis, G. Tsoumakas, I. Vlahavas, “An Ensemble of Classifiers for coping with Recurring Contexts in Data Streams”, 18th European Conference on Artificial Intelligence, IOS Press, Patras, Greece, 2008.
- I. Katakis, G. Tsoumakas, E. Banos, N. Bassiliades, I. Vlahavas, “An Adaptive Personalized News Dissemination System”, Journal of Intelligent Information Systems (accepted for publication), Springer, 2008.
- I. Katakis, G. Tsoumakas, I. Vlahavas, “Dynamic Feature Space and Incremental Feature Selection for the Classification of Textual Data Streams”, ECML/PKDD-2006 International Workshop on Knowledge Discovery from Data Streams, pp. 107-116, Berlin, Germany, 2006.
- I. Katakis, G. Tsoumakas, I. Vlahavas, “On the Utility of Incremental Feature Selection for the Classification of Textual Data Streams”, 10th Panhellenic Conference on Informatics (PCI 2005), P. Bozanis and E.N. Houstis (Eds.), Springer-Verlag, LNCS 3746, pp. 338-348, Volos, Greece, 11-13 November, 2005.