Ensemble pruning, also known as ensemble selection, selective
ensemble and ensemble thinning, deals with reducing the size of an
ensemble before its members are combined. It matters for two reasons:
a) efficiency: a very large number of models in an ensemble adds
considerable computational overhead; and b) predictive performance: an
ensemble may contain not only high-performing models but also models
with lower predictive performance. Pruning the low-performing models
while maintaining good diversity in the ensemble is generally considered
a sound recipe for a successful ensemble.
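To make the idea concrete, the following minimal Java sketch prunes an ensemble and then combines the surviving members by plurality voting. It is a hypothetical illustration, not one of our published methods: it assumes that each member's class predictions on a held-out validation set are already available as an integer matrix, the class and method names are invented for this example, and members are pruned purely by an accuracy threshold, so diversity is not measured at all.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration: drop members whose validation accuracy falls
 * below a threshold, then combine the survivors by plurality voting.
 * predictions[m][i] is the class (0..numClasses-1) that model m assigns to
 * validation instance i; truth[i] is the true class of instance i.
 */
public class PruneThenVote {

    /** Fraction of instances that a single model classifies correctly. */
    public static double accuracy(int[] preds, int[] truth) {
        int correct = 0;
        for (int i = 0; i < truth.length; i++) {
            if (preds[i] == truth[i]) correct++;
        }
        return (double) correct / truth.length;
    }

    /** Indices of the models whose accuracy is at least minAccuracy. */
    public static List<Integer> prune(int[][] predictions, int[] truth, double minAccuracy) {
        List<Integer> kept = new ArrayList<>();
        for (int m = 0; m < predictions.length; m++) {
            if (accuracy(predictions[m], truth) >= minAccuracy) kept.add(m);
        }
        return kept;
    }

    /** Accuracy of the plurality vote of the given members; ties go to the lowest class label. */
    public static double voteAccuracy(int[][] predictions, List<Integer> members,
                                      int[] truth, int numClasses) {
        int correct = 0;
        for (int i = 0; i < truth.length; i++) {
            int[] counts = new int[numClasses];
            for (int m : members) counts[predictions[m][i]]++;
            int winner = 0;
            for (int c = 1; c < numClasses; c++) {
                if (counts[c] > counts[winner]) winner = c;
            }
            if (winner == truth[i]) correct++;
        }
        return (double) correct / truth.length;
    }

    public static void main(String[] args) {
        int[] truth = {0, 1, 0, 1, 0, 1};
        int[][] predictions = {
            {1, 0, 0, 1, 0, 1}, // wrong on instances 0 and 1
            {0, 1, 1, 0, 0, 1}, // wrong on instances 2 and 3
            {0, 1, 0, 1, 1, 0}, // wrong on instances 4 and 5
            {1, 0, 1, 1, 0, 1}  // wrong on instances 0, 1 and 2
        };
        List<Integer> all = new ArrayList<>();
        for (int m = 0; m < predictions.length; m++) all.add(m);

        List<Integer> kept = prune(predictions, truth, 0.6);
        System.out.println("Kept models: " + kept); // Kept models: [0, 1, 2]
        System.out.println("Full ensemble accuracy:   " + voteAccuracy(predictions, all, truth, 2));
        System.out.println("Pruned ensemble accuracy: " + voteAccuracy(predictions, kept, truth, 2));
    }
}
```

On this toy data the full four-member vote gets 5 of the 6 instances right, while the three survivors, whose errors fall on different instances, get all six right. In practice the pruned ensemble should be assessed on data that was not used for pruning, since scoring it on the same validation set is optimistic.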
We have developed a number of approaches to ensemble pruning.
Our early work involved methods that use statistical tests to
select a subset of models whose accuracy differs significantly from
that of the remaining models [1, 2]. In addition, we modeled
ensemble pruning as a reinforcement learning task and used
Q-learning to solve it [3, 4]. We have also studied
applications of ensemble pruning to water quality prediction [5, 6].
Furthermore, we have proposed heuristics for the greedy exploration of
the space of sub-ensembles using directed hill-climbing [7, 10, 12].
We have also proposed a technique for dynamic (also called
instance-based) ensemble pruning based on multi-label learning [11].
Finally, we have contributed a taxonomy of ensemble pruning
techniques [8, 9].
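The directed hill-climbing idea can be illustrated with a plain forward-selection sketch in Java. Starting from an empty sub-ensemble, it repeatedly adds the member whose inclusion scores best and returns the best sub-ensemble seen along the path. As an assumption made only for illustration, the evaluation measure is the plurality-vote accuracy on a validation set; the diversity-based and uncertainty-aware measures of [7, 10] are not reproduced, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical forward-selection (directed hill-climbing) sketch.
 * predictions[m][i] is the class that model m predicts for validation instance i.
 * Evaluation measure (an assumption): accuracy of the plurality vote.
 */
public class ForwardSelection {

    /** Accuracy of the plurality vote of the given members; ties go to the lowest class label. */
    public static double voteAccuracy(int[][] predictions, List<Integer> members,
                                      int[] truth, int numClasses) {
        int correct = 0;
        for (int i = 0; i < truth.length; i++) {
            int[] counts = new int[numClasses];
            for (int m : members) counts[predictions[m][i]]++;
            int winner = 0;
            for (int c = 1; c < numClasses; c++) {
                if (counts[c] > counts[winner]) winner = c;
            }
            if (winner == truth[i]) correct++;
        }
        return (double) correct / truth.length;
    }

    /** Greedily grows a sub-ensemble and returns the best one found along the path. */
    public static List<Integer> select(int[][] predictions, int[] truth, int numClasses) {
        List<Integer> current = new ArrayList<>();
        List<Integer> remaining = new ArrayList<>();
        for (int m = 0; m < predictions.length; m++) remaining.add(m);

        List<Integer> best = new ArrayList<>();
        double bestScore = Double.NEGATIVE_INFINITY;
        while (!remaining.isEmpty()) {
            // Try each remaining model and keep the one whose addition scores highest.
            int chosen = -1;
            double chosenScore = Double.NEGATIVE_INFINITY;
            for (int m : remaining) {
                current.add(m);
                double score = voteAccuracy(predictions, current, truth, numClasses);
                current.remove(current.size() - 1); // undo the tentative addition
                if (score > chosenScore) {
                    chosenScore = score;
                    chosen = m;
                }
            }
            current.add(chosen);
            remaining.remove(Integer.valueOf(chosen));
            // Strict ">" keeps the smaller sub-ensemble when scores tie.
            if (chosenScore > bestScore) {
                bestScore = chosenScore;
                best = new ArrayList<>(current);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] truth = {0, 1, 0, 1, 0, 1};
        int[][] predictions = {
            {1, 0, 0, 1, 0, 1},
            {0, 1, 1, 0, 0, 1},
            {0, 1, 0, 1, 1, 0},
            {1, 0, 1, 1, 0, 1}
        };
        System.out.println("Selected sub-ensemble: " + select(predictions, truth, 2));
        // Selected sub-ensemble: [0, 1, 2]
    }
}
```

Because a member, once added, is never removed, the search needs at most M(M+1)/2 evaluations for M models but can miss better subsets; backward elimination, which starts from the full ensemble and drops members, is the mirror-image variant.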
Have a look at our online ensemble pruning bibliography at CiteULike. There you can download BibTeX and RIS records, subscribe to the corresponding RSS feed, follow links to the full-text PDFs of the papers (this may require access to digital libraries) and export the complete bibliography for use with BibTeX or EndNote (this requires a CiteULike account).
Here you can find the source code for
performing ensemble pruning. We have implemented several algorithms from
the recent literature within a common framework.
Documentation will be available soon. We also intend to build a UI
to help users experiment with ensemble pruning.
Additionally, we have implemented a package for
performing several statistical tests (Nemenyi, Wilcoxon).
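As an illustration of the kind of test such a package provides, the following Java sketch computes the Wilcoxon signed-rank statistic for two methods (for example, two pruning algorithms) compared over the same datasets. It is not the code of our package: the class name is hypothetical, zero differences are discarded (one common convention), tied absolute differences receive average ranks, and the resulting statistic still has to be compared against a table of critical values or a normal approximation, which is omitted here.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the Wilcoxon signed-rank statistic.
 * a[i] and b[i] are the paired scores (e.g. accuracy in %) of two methods on dataset i.
 */
public class WilcoxonSignedRank {

    /** Returns {R+, R-}: the rank sums of the positive and negative differences a[i] - b[i]. */
    public static double[] rankSums(double[] a, double[] b) {
        List<Double> diffs = new ArrayList<>();
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            if (d != 0.0) diffs.add(d); // zero differences are discarded
        }
        // Sort by absolute difference so that position k (0-based) has rank k + 1.
        diffs.sort((x, y) -> Double.compare(Math.abs(x), Math.abs(y)));

        double rPlus = 0.0, rMinus = 0.0;
        int n = diffs.size();
        int start = 0;
        while (start < n) {
            // Find the run [start, end] of equal absolute differences.
            int end = start;
            double abs = Math.abs(diffs.get(start));
            while (end + 1 < n && Math.abs(diffs.get(end + 1)) == abs) end++;
            double avgRank = (start + end + 2) / 2.0; // mean of ranks start+1 .. end+1
            for (int k = start; k <= end; k++) {
                if (diffs.get(k) > 0) rPlus += avgRank; else rMinus += avgRank;
            }
            start = end + 1;
        }
        return new double[] {rPlus, rMinus};
    }

    /** The test statistic T = min(R+, R-). */
    public static double statistic(double[] a, double[] b) {
        double[] r = rankSums(a, b);
        return Math.min(r[0], r[1]);
    }

    public static void main(String[] args) {
        double[] methodA = {85, 78, 92, 70, 88, 81};
        double[] methodB = {80, 80, 90, 70, 84, 76};
        double[] r = rankSums(methodA, methodB);
        System.out.println("R+ = " + r[0] + ", R- = " + r[1]); // R+ = 13.5, R- = 1.5
        System.out.println("T = " + statistic(methodA, methodB)); // T = 1.5
    }
}
```

In this example the two methods tie on one dataset, which is dropped, leaving n = 5 paired differences; the check R+ + R- = n(n+1)/2 = 15 holds.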
The software is distributed under the GNU
GPL licence. It requires Java v1.5 or later and Weka v3.5.5. Please
contact Ioannis Partalas with bug reports, comments, suggestions or requests
for help with the source code.
Source code developers: Ioannis Partalas,
- [1] Tsoumakas, G.; Katakis, I.; Vlahavas, I. (2004) Effective
Voting of Heterogeneous Classifiers, Proc. 15th European
Conference on Machine Learning (ECML 2004), Springer-Verlag, LNAI 3201,
pp. 465-476, Pisa, Italy, September 2004.
- [2] Tsoumakas, G.; Angelis, L.; Vlahavas, I.
Fusion of Heterogeneous Classifiers, Intelligent Data
Analysis, 9(6), pp. 511-525, IOS Press.
- [3] Partalas, I.; Tsoumakas, G.; Katakis, I.; Vlahavas, I. (2006)
Pruning using Reinforcement Learning, Proc. 4th Hellenic
Conference on Artificial Intelligence (SETN 2006), LNAI 3955,
pp. 301-310, Heraklion, Greece, May 18-20, 2006.
- [4] Partalas, I.; Tsoumakas, G.; Vlahavas, I. (2009)
Pruning an Ensemble of Classifiers via Reinforcement Learning,
Neurocomputing, Elsevier, 72(7-9), pp. 1900-1909.
- [5] Partalas, I.; Tsoumakas, G.; Hatzikos, E.; Vlahavas, I. (2007)
Selection for Water Quality Prediction, Proc. 10th
International Conference on Engineering Applications of Neural Networks
(EANN 2007), pp. 428-435, Thessaloniki, Greece, August 29-31, 2007.
- [6] Partalas, I.; Tsoumakas, G.; Hatzikos, E.; Vlahavas, I.
Selection: Theory and an Application to Water Quality Prediction,
Information Sciences, 178(20), pp. 3867-3879.
- [7] Partalas, I.; Tsoumakas, G.; Vlahavas, I. (2008) Focused
Ensemble Selection: A Diversity-Based Method for Greedy Ensemble
Selection, Proc. 18th European Conference on Artificial
Intelligence (accepted for presentation).
- [8] Tsoumakas, G.; Partalas, I.; Vlahavas, I. (2008) A
Taxonomy and Short Review of Ensemble Selection, ECAI Workshop on Supervised and Unsupervised Ensemble Methods and
Their Applications (SUEMA-2008), pp. 41-46.
- [9] Tsoumakas, G.; Partalas, I.; Vlahavas, I. (2009)
An Ensemble Pruning Primer, in Oleg Okun and Valentini (eds), Supervised and Unsupervised Methods
and their Applications to Ensemble Methods (SUEMA 2008), Springer-Verlag.
- [10] Partalas, I.; Tsoumakas, G.; Vlahavas, I. (2010) An Ensemble Uncertainty Aware Measure for Directed Hill Climbing Ensemble Pruning,
Machine Learning, Springer.
- [11] Markatopoulou, F.; Tsoumakas, G.; Vlahavas, I. (2010) Instance-Based Ensemble Pruning via Multi-Label Classification,
22nd International Conference on Tools with Artificial Intelligence, IEEE, Arras, France, October 27-29, 2010.
- [12] Partalas, I.; Tsoumakas, G.; Vlahavas, I. (2012) A Study on Greedy Algorithms for Ensemble Pruning, Technical Report TR-LPIS-360-12, LPIS, Dept. of Informatics, Aristotle University of Thessaloniki, Greece.