Machine Learning & Knowledge Discovery Group

Ensemble Pruning

Introduction

Ensemble Pruning, also known as ensemble selection, selective ensemble and ensemble thinning, deals with reducing the size of an ensemble prior to combining its members. It is important for two reasons: a) efficiency: having a very large number of models in an ensemble adds a lot of computational overhead, and b) predictive performance: an ensemble may consist not only of high-performance models, but also of models with lower predictive performance. Pruning the low-performing models while maintaining a good diversity of the ensemble is typically considered a proper recipe for a successful ensemble.
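
To make the idea concrete, here is a minimal, purely illustrative Java sketch of the simplest form of pruning: ranking the ensemble members by accuracy on a held-out validation set and keeping only the top k. The Model interface is a hypothetical stand-in for any trained classifier; this is not part of our software.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;

    // Hypothetical interface standing in for any trained classifier.
    interface Model {
        int predict(double[] instance); // predicted class label index
    }

    class TopKPruning {

        // Fraction of validation instances that the model classifies correctly.
        static double accuracy(Model m, double[][] x, int[] y) {
            int correct = 0;
            for (int i = 0; i < x.length; i++) {
                if (m.predict(x[i]) == y[i]) correct++;
            }
            return (double) correct / x.length;
        }

        // Keep only the k most accurate members of the ensemble.
        static List<Model> prune(List<Model> pool, final double[][] valX,
                                 final int[] valY, int k) {
            List<Model> sorted = new ArrayList<Model>(pool);
            Collections.sort(sorted, new Comparator<Model>() {
                public int compare(Model a, Model b) {
                    return Double.compare(accuracy(b, valX, valY),
                                          accuracy(a, valX, valY));
                }
            });
            return sorted.subList(0, Math.min(k, sorted.size()));
        }
    }

Ranking models individually ignores how they complement each other; the methods discussed below evaluate candidate sub-ensembles as a whole.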

Our contribution

We have developed a number of approaches for ensemble pruning. Our early work involved methods that use statistical tests to select a subset of models whose accuracy differs in a statistically significant way from that of the remaining models [1, 2]. In addition, we modeled ensemble pruning as a reinforcement learning problem and used Q-learning to solve it [3, 4]. We have also looked at applications of ensemble pruning to water quality prediction [5, 6]. Furthermore, we have proposed heuristics for greedy exploration of the space of sub-ensembles using directed hill-climbing [7, 10, 12]. We have also proposed a technique for dynamic, also called instance-based, ensemble pruning based on multi-label learning [11]. Finally, we have contributed a taxonomy of ensemble pruning techniques [8, 9].
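
As an illustration of the greedy, directed hill-climbing family of methods, the following hypothetical Java sketch performs plain forward selection: starting from an empty sub-ensemble, it repeatedly adds the model whose inclusion most improves majority-vote accuracy on a separate selection set. It reuses the Model interface from the sketch above and does not implement the specific diversity- and uncertainty-aware measures of [7, 10, 12].

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Greedy forward selection over sub-ensembles (plain hill climbing).
    class GreedyForwardSelection {

        // Majority vote of the current sub-ensemble on a single instance.
        static int vote(List<Model> subEnsemble, double[] instance) {
            Map<Integer, Integer> counts = new HashMap<Integer, Integer>();
            for (Model m : subEnsemble) {
                int label = m.predict(instance);
                Integer c = counts.get(label);
                counts.put(label, c == null ? 1 : c + 1);
            }
            int bestLabel = -1, bestCount = -1;
            for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
                if (e.getValue() > bestCount) {
                    bestLabel = e.getKey();
                    bestCount = e.getValue();
                }
            }
            return bestLabel;
        }

        // Accuracy of the sub-ensemble's majority vote on the selection set.
        static double voteAccuracy(List<Model> subEnsemble, double[][] x, int[] y) {
            int correct = 0;
            for (int i = 0; i < x.length; i++) {
                if (vote(subEnsemble, x[i]) == y[i]) correct++;
            }
            return (double) correct / x.length;
        }

        // Grow the sub-ensemble one model at a time, always adding the model
        // whose inclusion yields the highest majority-vote accuracy.
        static List<Model> select(List<Model> pool, double[][] valX, int[] valY,
                                  int targetSize) {
            List<Model> selected = new ArrayList<Model>();
            List<Model> remaining = new ArrayList<Model>(pool);
            while (selected.size() < targetSize && !remaining.isEmpty()) {
                Model bestModel = null;
                double bestAcc = -1.0;
                for (Model candidate : remaining) {
                    selected.add(candidate);
                    double acc = voteAccuracy(selected, valX, valY);
                    selected.remove(selected.size() - 1);
                    if (acc > bestAcc) {
                        bestAcc = acc;
                        bestModel = candidate;
                    }
                }
                selected.add(bestModel);
                remaining.remove(bestModel);
            }
            return selected;
        }
    }

Evaluating every candidate against the full selection set at each step makes this quadratic in the ensemble size, which is one reason heuristic evaluation measures are attractive in practice.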

Bibliography

Have a look at our online ensemble pruning bibliography at CiteULike. You can grab BibTeX and RIS records, subscribe to the corresponding RSS feed, follow links to the full PDFs of the papers (which may require access to digital libraries) and export the complete bibliography for use with BibTeX or EndNote (requires a CiteULike account).

Source Code

Here you can find the source code for performing ensemble pruning. We have implemented several algorithms from the recent bibliography, built under a common framework. Documentation will be available soon. We also intend to build a UI to help users experiment with ensemble pruning methods.

Additionally, we implemented a package for performing several statistical tests (Nemenyi, Wilcoxon).
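
For reference, the following small Java sketch of the Nemenyi post-hoc test is purely illustrative and is not the interface of our package. Classifiers are ranked on each dataset, and two classifiers are considered significantly different if their average ranks differ by at least the critical difference CD = q_alpha * sqrt(k(k+1)/(6N)), where k is the number of classifiers, N the number of datasets, and q_alpha a critical value of the studentized range statistic taken from published tables (e.g., in Demsar's 2006 survey of classifier comparison tests).

    // Average ranks and critical difference for the Nemenyi post-hoc test.
    class NemenyiTest {

        // scores[i][j] = performance of classifier j on dataset i (higher is better).
        static double[] averageRanks(double[][] scores) {
            int n = scores.length;
            int k = scores[0].length;
            double[] avgRank = new double[k];
            for (double[] row : scores) {
                for (int j = 0; j < k; j++) {
                    double rank = 1.0;
                    for (int l = 0; l < k; l++) {
                        if (row[l] > row[j]) rank += 1.0;                 // strictly better score
                        else if (l != j && row[l] == row[j]) rank += 0.5; // ties share ranks
                    }
                    avgRank[j] += rank / n;
                }
            }
            return avgRank;
        }

        // CD = q_alpha * sqrt(k * (k + 1) / (6 * N)); two classifiers differ
        // significantly if their average ranks differ by at least CD.
        static double criticalDifference(double qAlpha, int k, int n) {
            return qAlpha * Math.sqrt(k * (k + 1) / (6.0 * n));
        }
    }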

The software is distributed under the GNU GPL licence. It requires Java v1.5 or later and Weka v3.5.5. Please contact Ioannis Partalas for bug reports, comments, suggestions or requests for help with the source code.

Source code developers: Ioannis Partalas, Grigorios Tsoumakas.

Publications

  1. Tsoumakas, G.; Katakis, I.; Vlahavas, I. (2004) Effective Voting of Heterogeneous Classifiers, Proc. 15th European Conference on Machine Learning (ECML 2004), Springer-Verlag, LNAI 3201, pp. 465-476, Pisa, Italy, September 2004.
  2. Tsoumakas, G.; Angelis, L.; Vlahavas, I. (2005) Selective Fusion of Heterogeneous Classifiers, Intelligent Data Analysis, 9(6), pp. 511-525, IOS Press.
  3. Partalas, I.; Tsoumakas, G.; Katakis, I.; Vlahavas, I. (2006) Ensemble Pruning using Reinforcement Learning, Proc. 4th Hellenic Conference on Artificial Intelligence (SETN 2006), LNAI 3955, pp. 301-310, Heraklion, Greece, May 18-20, 2006.
  4. Partalas, I.; Tsoumakas, G.; Vlahavas, I. (2009) Pruning an Ensemble of Classifiers via Reinforcement Learning, Neurocomputing, Elsevier, 72(7-9), pp. 1900-1909.
  5. Partalas, I.; Tsoumakas, G.; Hatzikos, E.; Vlahavas, I. (2007) Ensemble Selection for Water Quality Prediction, Proc. 10th International Conference on Engineering Applications of Neural Networks (EANN 2007), pp. 428-435, Thessaloniki, Greece, August 29-31, 2007.
  6. Partalas, I.; Tsoumakas, G.; Hatzikos, E.; Vlahavas, I. (2008) Greedy Regression Ensemble Selection: Theory and an Application to Water Quality Prediction, Information Sciences, 178(20), pp. 3867-3879.
  7. Partalas, I.; Tsoumakas, G.; Vlahavas, I. (2008) Focused Ensemble Selection: A Diversity-Based Method for Greedy Ensemble Selection, Proc. 18th European Conference on Artificial Intelligence (ECAI 2008) (accepted for presentation).
  8. Tsoumakas, G.; Partalas, I.; Vlahavas, I. (2008) A Taxonomy and Short Review of Ensemble Selection, ECAI 2008 Workshop on Supervised and Unsupervised Ensemble Methods and Their Applications (SUEMA 2008), pp. 41-46.
  9. Tsoumakas, G.; Partalas, I.; Vlahavas, I. (2009) An Ensemble Pruning Primer, in O. Okun and G. Valentini (eds), Supervised and Unsupervised Methods and their Applications to Ensemble Methods (SUEMA 2008), Springer-Verlag.
  10. Partalas, I.; Tsoumakas, G.; Vlahavas, I. (2010) An Ensemble Uncertainty Aware Measure for Directed Hill Climbing Ensemble Pruning, Machine Learning, Springer.
  11. Markatopoulou, F.; Tsoumakas, G.; Vlahavas, I. (2010) Instance-Based Ensemble Pruning via Multi-Label Classification, Proc. 22nd IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2010), Arras, France, October 27-29, 2010.
  12. Partalas, I.; Tsoumakas, G.; Vlahavas, I. (2012) A Study on Greedy Algorithms for Ensemble Pruning, Technical Report TR-LPIS-360-12, LPIS, Dept. of Informatics, Aristotle University of Thessaloniki, Greece.