MLKD - Classification of Concept Drifting Data Streams

Machine Learning & Knowledge Discovery Group


Concept Drift

Introduction

Recent advances in sensor, storage, processing and communication technologies have enabled the automated recording of data, leading to fast and continuous flows of information known as data streams. Examples include the web logs and click streams recorded by web servers, transactions such as credit card usage, data from network monitoring and sensor networks, video streams such as images from surveillance cameras, and news articles in an RSS reader. Because data streams are dynamic, the knowledge extracted from them must be updated continuously, or at least periodically, so that it always reflects the information content of the latest batch of data. This is especially important in applications where the concept of a target class and/or the data distribution changes over time, a phenomenon commonly known as concept drift.
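
To make the effect of concept drift concrete, the following minimal Python sketch simulates a stream whose labelling rule flips halfway through, and compares a learner that keeps all history with one that keeps only a sliding window of recent examples. The stream generator, the WindowedMajority learner, the window size of 100 and the test-then-train evaluation are all illustrative assumptions; they are not the methods used in the publications listed on this page.

```python
import random
from collections import deque


def make_stream(n=2000, drift_at=1000, seed=0):
    """Yield (x, y) pairs; the labelling rule flips at step `drift_at`."""
    rng = random.Random(seed)
    for t in range(n):
        x = rng.random()
        # Before the drift the positive class is x > 0.5; afterwards it is x <= 0.5.
        y = int(x > 0.5) if t < drift_at else int(x <= 0.5)
        yield x, y


class WindowedMajority:
    """Hypothetical learner: majority label among stored examples on the same side of 0.5."""

    def __init__(self, window=None):
        # window=None keeps the full history; an integer keeps only the latest examples.
        self.buf = deque(maxlen=window)

    def predict(self, x):
        side = x > 0.5
        votes = [y for (xi, y) in self.buf if (xi > 0.5) == side]
        if not votes:
            return 0  # arbitrary default before any evidence is available
        return int(2 * sum(votes) >= len(votes))

    def learn(self, x, y):
        self.buf.append((x, y))


def prequential_accuracy(model, stream, start=0):
    """Test-then-train: predict each example first, then learn from it.

    Accuracy is measured only on examples from step `start` onwards.
    """
    correct = total = 0
    for t, (x, y) in enumerate(stream):
        if t >= start:
            correct += int(model.predict(x) == y)
            total += 1
        model.learn(x, y)
    return correct / total


if __name__ == "__main__":
    full = prequential_accuracy(WindowedMajority(window=None), make_stream(), start=1000)
    recent = prequential_accuracy(WindowedMajority(window=100), make_stream(), start=1000)
    print(f"post-drift accuracy, full history:   {full:.2f}")
    print(f"post-drift accuracy, sliding window: {recent:.2f}")
```

Under these assumptions the windowed learner recovers shortly after the change, while the full-history learner keeps predicting the outdated concept because the old examples outvote the new ones.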

Bibliography

You can find a collection of publications in the field of Concept Drift Classification here (last update: May 5, 2008). The list is, of course, incomplete. For suggestions or additions, or if you have a paper in this field, please contact Ioannis Katakis.

Datasets 1:
Here you can find datasets from the text domain that include concept drift. The datasets are in Weka (.arff) format.
Spam Assassin Corpus (Gradual Concept Drift)
Mailing Lists (Instant Concept Drift)
You can find the description of both datasets in [3].
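
Since the datasets are distributed as ARFF files, a small reader can be sketched with the Python standard library alone. The sketch below assumes the dense ARFF layout (one comma-separated row per instance after @data) and attribute names without spaces. The sample header and rows are hypothetical and do not reproduce the contents of the datasets above, which may use a different layout such as sparse ARFF.

```python
import csv
import io


def read_arff(text):
    """Parse a dense ARFF document into (attribute_names, rows).

    Assumptions: no sparse rows ({index value, ...}) and no spaces
    inside attribute names. Values are returned as strings.
    """
    names, rows, in_data = [], [], False
    for raw in io.StringIO(text):
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        if in_data:
            # ARFF quotes nominal values with single quotes.
            rows.append(next(csv.reader([line], skipinitialspace=True, quotechar="'")))
        elif line.lower().startswith("@attribute"):
            names.append(line.split(None, 2)[1].strip("'\""))
        elif line.lower().startswith("@data"):
            in_data = True
    return names, rows


# Hypothetical two-attribute sample, for illustration only.
SAMPLE = """@relation demo
@attribute word_free numeric
@attribute class {spam,legitimate}
@data
1,spam
0,legitimate
"""

if __name__ == "__main__":
    names, rows = read_arff(SAMPLE)
    print(names)
    print(rows)
```

To read a downloaded file, pass its contents to the same function, for example read_arff(open(path).read()), where path is wherever the file was saved.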

Datasets 2:
Usenet1
Usenet2
You can find the description of the above datasets in [2].

Datasets 3:
Email Data
Spam Data
You can find the description of the above datasets in [1].

Publications

[1] I. Katakis, G. Tsoumakas, I. Vlahavas, “Tracking Recurring Contexts using Ensemble Classifiers: An Application to Email Filtering”, Knowledge and Information Systems, Springer, 2009.
[2] I. Katakis, G. Tsoumakas, I. Vlahavas, “An Ensemble of Classifiers for coping with Recurring Contexts in Data Streams”, 18th European Conference on Artificial Intelligence, IOS Press, Patras, Greece, 2008.
[3] I. Katakis, G. Tsoumakas, E. Banos, N. Bassiliades, I. Vlahavas, “An Adaptive Personalized News Dissemination System”, Journal of Intelligent Information Systems (accepted for publication), Springer, 2008.
[4] I. Katakis, G. Tsoumakas, I. Vlahavas, “Dynamic Feature Space and Incremental Feature Selection for the Classification of Textual Data Streams”, ECML/PKDD-2006 International Workshop on Knowledge Discovery from Data Streams, pp. 107-116, Berlin, Germany, 2006.
[5] I. Katakis, G. Tsoumakas, I. Vlahavas, “On the Utility of Incremental Feature Selection for the Classification of Textual Data Streams”, 10th Panhellenic Conference on Informatics (PCI 2005), P. Bozanis and E.N. Houstis (Eds.), Springer-Verlag, LNCS 3746, pp. 338-348, Volos, Greece, 11-13 November, 2005.