Learning from Multi-Label Data

Introduction

Traditional classification is concerned with learning from a set of instances, each associated with a single label from a set of disjoint labels L, |L| > 1. In multi-label learning, each instance is associated with a subset of L. Learning from multi-label data has recently received increased attention from researchers in machine learning and data mining for two main reasons. The first is the ubiquitous presence of multi-label data in application domains ranging from multimedia information retrieval to tag recommendation, query categorization, gene function prediction, medical diagnosis, drug discovery and marketing. The second is the number of challenging research problems involved in multi-label learning, such as dealing with label rarity, scaling to a large number of labels and exploiting label relationships (e.g. hierarchies), the most prominent being the explicit modelling of label dependencies.
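As a minimal sketch of the setting described above, the following Python snippet represents each instance's labels as a subset of L, encodes that subset as a 0/1 indicator vector, and computes two simple statistics: the average number of labels per instance and the per-label frequency, which makes rare labels visible. The label names, instances and helper functions are invented for illustration and are not taken from any dataset or library mentioned on this page.

```python
# Toy multi-label dataset: every instance carries a subset of the label set L.
# The labels and instances below are hypothetical, for illustration only.
LABELS = ["beach", "sunset", "people", "urban"]

instances = {
    "img1": {"beach", "sunset"},
    "img2": {"people"},
    "img3": {"beach", "people", "urban"},
    "img4": set(),  # an instance may also have no relevant label
}


def to_indicator(label_set, labels=LABELS):
    """Encode a label subset as a 0/1 vector aligned with `labels`."""
    return [1 if name in label_set else 0 for name in labels]


def label_cardinality(label_sets):
    """Average number of labels per instance."""
    label_sets = list(label_sets)
    return sum(len(s) for s in label_sets) / len(label_sets)


def label_frequencies(label_sets, labels=LABELS):
    """Fraction of instances carrying each label (low values mean rare labels)."""
    label_sets = list(label_sets)
    n = len(label_sets)
    return {name: sum(name in s for s in label_sets) / n for name in labels}


# Indicator matrix: one row per instance, one column per label.
Y = {key: to_indicator(label_set) for key, label_set in instances.items()}
```

In single-label classification every row of such a matrix would contain exactly one 1; in the multi-label case a row may contain any number of them, including none.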

Our contribution

We have developed a number of ensemble methods that focus on (a) random subsets of the target variables, (b) random subsets of the target variables, input variables and training examples, (c) nodes of an artificial hierarchy that we construct through constrained clustering of the target variables, and (d) different clusters constructed in a pre-processing clustering step over the training examples. Our HOMER algorithm is considered state-of-the-art according to a recent large-scale empirical study of multi-label learning algorithms. We have also studied the pruning of models at the second level of a Stacking process in order to reduce computational complexity. Furthermore, we have studied instance-based learning approaches, how to transform a vector of label scores into a bipartition of the labels, and how to sample multi-label data effectively. We have written an extensive review of this area, which is among the most downloaded chapters of the 2nd edition of Springer's Data Mining and Knowledge Discovery Handbook. Finally, we have developed and are constantly enriching an open source Java library for multi-label learning, called Mulan.
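To make the idea of turning a vector of label scores into a bipartition concrete, here is a small Python sketch of two generic strategies: a single global threshold, and keeping the k highest-scoring labels. These are common textbook rules shown under simplifying assumptions; they are not necessarily the strategies evaluated in our work, and the function names and scores are hypothetical.

```python
def bipartition_by_threshold(scores, threshold=0.5):
    """Labels scoring at or above `threshold` are predicted relevant."""
    relevant = {label for label, score in scores.items() if score >= threshold}
    return relevant, set(scores) - relevant


def bipartition_top_k(scores, k):
    """The k highest-scoring labels are predicted relevant.

    Ties are broken alphabetically by label name so the result is deterministic.
    """
    ranked = sorted(scores, key=lambda label: (-scores[label], label))
    relevant = set(ranked[:k])
    return relevant, set(scores) - relevant


# Hypothetical scores produced by some multi-label model for one instance.
scores = {"beach": 0.81, "sunset": 0.64, "people": 0.30, "urban": 0.07}

relevant, irrelevant = bipartition_by_threshold(scores, threshold=0.5)
```

Each function returns a pair (relevant labels, irrelevant labels), so the two sets always partition the label set. The threshold or k would normally be tuned on held-out data.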

Datasets

See the datasets page of Mulan.

Publications

G. Tsoumakas, I. Katakis, I. Vlahavas, "A Review of Multi-Label Classification Methods", in: Proceedings of the 2nd ADBIS Workshop on Data Mining and Knowledge Discovery (ADMKD 2006), pp. 99-109, September 2006, Thessaloniki, Greece.
G. Tsoumakas, I. Katakis, "Multi-Label Classification: An Overview", International Journal of Data Warehousing and Mining, 3(3):1-13, 2007.
G. Tsoumakas, I. Vlahavas, "Random k-Labelsets: An Ensemble Method for Multilabel Classification", Proc. 18th European Conference on Machine Learning (ECML 2007), pp. 406-417, Warsaw, Poland, 17-21 September 2007.
K. Trohidis, G. Tsoumakas, G. Kalliris, I. Vlahavas. "Multilabel Classification of Music into Emotions". Proc. 9th International Conference on Music Information Retrieval (ISMIR 2008), pp. 325-330, Philadelphia, PA, USA, 2008.
E. Spyromitros, G. Tsoumakas, I. Vlahavas, "An Empirical Study of Lazy Multilabel Classification Algorithms", Proc. 5th Hellenic Conference on Artificial Intelligence (SETN 2008), Springer, Syros, Greece, 2008.
G. Tsoumakas, I. Katakis, I. Vlahavas, "Effective and Efficient Multilabel Classification in Domains with Large Number of Labels", Proc. ECML/PKDD 2008 Workshop on Mining Multidimensional Data (MMD'08), Antwerp, Belgium, 2008.
I. Katakis, G. Tsoumakas, I. Vlahavas, "Multilabel Text Classification for Automated Tag Suggestion", Proceedings of the ECML/PKDD 2008 Discovery Challenge, Antwerp, Belgium, 2008.
A. Dimou, G. Tsoumakas, V. Mezaris, I. Kompatsiaris, I. Vlahavas, "An Empirical Study of Multi-Label Learning Methods for Video Annotation", 7th International Workshop on Content-Based Multimedia Indexing, IEEE, Chania, Crete, 2009.
- [cbmi09-bow.rar] [cbmi09-mpeg.rar]
G. Nasierding, G. Tsoumakas, A. Kouzani, "Clustering Based Multi-Label Classification for Image Annotation and Retrieval", 2009 IEEE International Conference on Systems, Man, and Cybernetics, IEEE, 2009.
G. Tsoumakas, A. Dimou, E. Spyromitros, V. Mezaris, I. Kompatsiaris, I. Vlahavas, "Correlation-Based Pruning of Stacked Binary Relevance Models for Multi-Label Learning", Proceedings of the 1st International Workshop on Learning from Multi-Label Data (MLD'09), G. Tsoumakas, Min-Ling Zhang, Zhi-Hua Zhou (Eds.), pp. 101-116, Bled, Slovenia, 2009.
G. Tsoumakas, E. Loza Mencía, I. Katakis, S. Park, J. Fürnkranz, "On the combination of two decompositive multi-label classification methods", Workshop on Preference Learning, ECML PKDD 09, Eyke Hüllermeier, Johannes Fürnkranz (Eds.), pp. 114-133, Bled, Slovenia, 2009.
G. Tsoumakas, I. Katakis, I. Vlahavas, "Mining Multi-label Data", Data Mining and Knowledge Discovery Handbook, O. Maimon, L. Rokach (Eds.), Springer, 2nd edition, 2010.
G. Nasierding, A. Kouzani, G. Tsoumakas, "A Triple-Random Ensemble Classification Method for Mining Multi-label Data", Proc. 2010 IEEE International Conference on Data Mining Workshops, pp. 49-56, 2010.
M. Ioannou, G. Sakkas, G. Tsoumakas, I. Vlahavas, "Obtaining Bipartitions from Score Vectors for Multi-Label Classification", 22nd International Conference on Tools with Artificial Intelligence, IEEE, Arras, France, 27-29 October 2010.
E. Spyromitros-Xioufis, G. Tsoumakas, I. Vlahavas, "Multi-label Learning Approaches for Music Instrument Recognition", Proc. 9th International Symposium on Methodologies for Intelligent Systems (ISMIS 2011), Warsaw, Poland, 2011.
G. Tsoumakas, I. Katakis, I. Vlahavas, "Random k-Labelsets for Multi-Label Classification", IEEE Transactions on Knowledge and Data Engineering, IEEE, 23(7), pp. 1079-1089, 2011.
E. Spyromitros-Xioufis, M. Spiliopoulou, G. Tsoumakas, I. Vlahavas, "Dealing with Concept Drift and Class Imbalance in Multi-label Stream Classification", Proc. 22nd International Joint Conference on Artificial Intelligence (IJCAI 2011), AAAI Press, Barcelona, Spain, 2011.
G. Tsoumakas, E. Spyromitros-Xioufis, J. Vilcek, I. Vlahavas, "Mulan: A Java Library for Multi-Label Learning", Journal of Machine Learning Research, 12, pp. 2411-2414, 2011.
K. Sechidis, G. Tsoumakas, I. Vlahavas, "On the Stratification of Multi-Label Data", Proceedings of ECML PKDD 2011, Athens, Greece, 2011.
E. Spyromitros-Xioufis, K. Sechidis, G. Tsoumakas, I. Vlahavas, "MLKD's Participation at the CLEF 2011 Photo Annotation and Concept-Based Retrieval Tasks", ImageCLEF Lab of CLEF 2011 Conference on Multilingual and Multimodal Information Access Evaluation, Amsterdam, Netherlands, 2011.
K. Trohidis, G. Tsoumakas, G. Kalliris, I. Vlahavas, "Multi-label classification of music by emotion", EURASIP Journal on Audio, Speech, and Music Processing, 2011:4, 2011.

Bibliography

Have a look at our new online multi-label learning bibliography at CiteULike (100 papers as of September 2009). It is convenient to work with: you can grab BibTeX and RIS records, subscribe to the corresponding RSS feed, follow links to the full-text PDFs of the papers (may require access to digital libraries) and export the complete bibliography for use with BibTeX or EndNote (requires a CiteULike account).