Author(s): I. Katakis, G. Tsoumakas, I. Vlahavas.
Title: “On the Utility of Incremental Feature Selection for the Classification of Textual Data Streams”.
Keywords: Text Mining, Text Classification, Feature Based Classifiers, Dynamic Feature Space, Dynamic Feature Selection, Data Streams, Concept Drift.
10th Panhellenic Conference on Informatics (PCI 2005), P. Bozanis and E.N. Houstis (Eds.), Springer-Verlag, LNCS 3746, pp. 338-348, Volos, Greece, 11-13 November, 2005.
Abstract: In this paper we argue that incrementally updating the features that a text classification algorithm considers is very important for real-world textual data streams, because in most applications the distribution of data and the description of the classification concept change over time. We propose the coupling of an incremental feature ranking method and an incremental learning algorithm that can consider different subsets of the feature vector during prediction (what we call a feature based classifier), in order to deal with the above problem. Experimental results with a longitudinal database of real spam and legitimate emails show that our approach can adapt to the changing nature of streaming data and works much better than classical incremental learning algo-