AutoML at ICML 2018

Here at SigOpt, we provide a tuning platform for practitioners to develop machine learning models efficiently. To stay at the cutting edge, we regularly attend conferences such as the International Conference on Machine Learning (ICML). This year, ICML is being held in Stockholm during the second week of July, in conjunction with IJCAI, AAMAS, and ICCBR (acronyms are very popular in this community).

Each year, leaders in the community present amazing results over the course of tutorials, main conference proceedings, and workshops. But this year, one workshop holds special interest for the SigOpt research team: the Automatic Machine Learning workshop. The workshop has been held in various forms for several years, and we have always found it to be an exciting exchange of ideas in a friendly community.

What makes this year so special? The results of the PAKDD 2018 data competition will be presented and, spoiler alert, not one but TWO alumni of our SigOpt research internship program successfully competed. Below we introduce the competition and give the competitors, Katharina Eggensperger and Jungtaek Kim, a chance to explain their submissions and AutoML strategies.

The PAKDD 2018 data competition

The main competition website explains the details, but the fundamental concept is to practice machine learning without any field-specific expertise. On one hand, the competition is designed to identify best practices for building a machine learning model in a limited amount of time without domain knowledge. On the other hand, participants and observers can learn about the limitations of machine learning when expert knowledge is unavailable, compute capacity is limited, or both.

In the competition, there are ten datasets containing either discrete or continuous features, and each entry has a binary label. Missing values are permitted, and some of the datasets have a sequential structure (the order of appearance in the file is not random). The goal is to create the best classifier, as judged by normalized AUC, in no more than 20 minutes per dataset. Of the ten datasets, five are reserved for designing the AutoML strategy, and participants receive feedback on their performance on those datasets during the design process. The other five are evaluated only after the final submission, and success in the competition is defined on this so-called "blind test" portion.
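To make the scoring metric concrete: we assume the normalization used in the ChaLearn AutoML challenges, 2·AUC − 1, which maps random guessing to 0, a perfect ranking to 1, and worse-than-random classifiers below 0 (which is how a negative score can appear in the results). The dependency-free sketch below computes ROC AUC through the rank-sum (Mann-Whitney) formulation; the function names are hypothetical, not part of the competition kit.

```python
def auc(labels, scores):
    """ROC AUC via the Mann-Whitney rank-sum statistic.

    labels: sequence of 0/1 ground-truth values
    scores: sequence of classifier scores (higher means "more positive")
    Tied scores receive the average of the ranks they span.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


def normalized_auc(labels, scores):
    """Map AUC to [-1, 1]: 0 for random guessing, 1 for a perfect ranking."""
    return 2 * auc(labels, scores) - 1


y_true = [0, 0, 1, 1]
print(normalized_auc(y_true, [0.1, 0.4, 0.35, 0.8]))  # → 0.5
print(normalized_auc(y_true, [0.9, 0.8, 0.2, 0.1]))  # → -1.0
```

A classifier whose ranking is exactly inverted scores -1, so a normalized AUC slightly below zero means "a bit worse than coin flipping".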

At midnight on March 31, 2018, the blind test was run on each competitor's final submission. Competitors were ranked on each blind test dataset, and the winners were determined by mean rank. The winners received monetary prizes as well as travel funding to attend the Pacific-Asia Knowledge Discovery and Data Mining conference (PAKDD) in Melbourne. The tables below show statistics of the datasets (to convey the complexity of the task) and the performance (score and rank) of the top five competitors.

Feedback datasets

Name        # Samples   # Features
Gina             3153          970
Ada              4147           48
Arcene            100        10000
Guillermo       20000         4296
Rl              31406           22

Blind test datasets

Name        # Samples   # Features
Pm              29964           89
Rh              31498           76
Ri              30562          113
Riccardo        20000         4296
Rm              28278           89
Normalized AUC (rank) on each blind test dataset

Place     Team           Pm         Rh         Ri         Riccardo    Rm         Mean rank
1         aad_freiburg   .55 (3)    .28 (4)    .39 (1)    .263 (1)    .677 (5)   2.8
2         mlg.postech    .54 (5)    .29 (2)    .37 (2)    .20 (9)     .69 (1)    3.8
3 (tie)   wlWangl        .57 (2)    .49 (1)    .28 (5)    -.09 (16)   .684 (3)   5.4
3 (tie)   thanhdng       .513 (6)   .226 (8)   .26 (7)    .260 (2)    .678 (4)   5.4
3 (tie)   Malik          .509 (3)   .230 (4)   .27 (1)    .24 (1)     .685 (5)   5.4
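The overall standing is simply the average of each team's per-dataset ranks. As a quick arithmetic check, this sketch recomputes the mean ranks for the first four rows of the table above; the dictionary is transcribed by hand and is purely illustrative.

```python
from statistics import mean

# Per-dataset ranks (Pm, Rh, Ri, Riccardo, Rm) transcribed from the table above.
ranks = {
    "aad_freiburg": [3, 4, 1, 1, 5],
    "mlg.postech": [5, 2, 2, 9, 1],
    "wlWangl": [2, 1, 5, 16, 3],
    "thanhdng": [6, 8, 7, 2, 4],
}

# The overall standing is the mean rank across the five blind test datasets;
# lower is better, and equal means produce a tie.
mean_ranks = {team: mean(r) for team, r in ranks.items()}
for team, score in sorted(mean_ranks.items(), key=lambda kv: kv[1]):
    print(f"{team:<14} {score:.1f}")
```

Averaging ranks rather than raw scores keeps one dataset with an unusually wide score range from dominating, but it also means a single poor rank (such as 16th on Riccardo) can pull an otherwise strong entry down into a tie.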

As part of the AutoML workshop, competitors may submit articles detailing their participation strategy. Both Katharina and Jungtaek chose to do so, and they summarize their results below.

Practical Automated Machine Learning for the AutoML Challenge 2018

aad_freiburg

First of all, I would like to acknowledge our whole ML team at the University of Freiburg, since winning the challenge was truly a team effort: Matthias Feurer, Stefan Falkner, Marius Lindauer and Frank Hutter. After winning the previous AutoML competition, it was great to apply the experience we gathered to build our new system, dubbed PoSH Auto-sklearn. This section discusses some details of our solution and the results; a more complete treatment can be found in this post.

The size of the datasets in this challenge (in both number of entries and number of features), combined with the extremely tight 20-minute limit for each dataset, proved to be a significant challenge. We adapted our original Auto-sklearn strategy in two key ways: we built a set of static, complementary machine learning pipelines (an ML portfolio) to warmstart Bayesian optimization, and we used successive halving during Bayesian optimization to accelerate hyperparameter selection. These two additions give the system its prefix, PoSH (Portfolio Successive Halving).
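Successive halving evaluates many candidate configurations on a small budget (for example, a data subsample or a few training iterations), keeps only the best fraction, and re-evaluates the survivors with a larger budget until one remains. The sketch below shows only this generic loop in plain Python; it is not PoSH Auto-sklearn's code, which interleaves successive halving with Bayesian optimization, and the function name and toy loss are hypothetical.

```python
def successive_halving(configs, evaluate, min_budget, max_budget, eta=3):
    """Return the best configuration found by successive halving.

    configs:  candidate configurations (hashable, e.g. tuples or floats)
    evaluate: callable(config, budget) -> validation loss, lower is better
    Each round keeps the best 1/eta of the survivors and multiplies the
    per-configuration budget by eta, capped at max_budget.
    """
    survivors = list(configs)
    budget = min_budget
    while True:
        losses = {c: evaluate(c, budget) for c in survivors}
        survivors.sort(key=losses.__getitem__)
        if len(survivors) == 1 or budget >= max_budget:
            return survivors[0]
        # Promote the top 1/eta (at least one) to the next, larger budget.
        survivors = survivors[: max(1, len(survivors) // eta)]
        budget = min(budget * eta, max_budget)


# Toy usage with a hypothetical loss: 27 values of a single hyperparameter,
# best near 0.3; the 1 / budget term mimics the extra loss of cheap runs.
def toy_loss(c, budget):
    return (c - 0.3) ** 2 + 1.0 / budget


print(successive_halving([i / 27 for i in range(27)], toy_loss, min_budget=1, max_budget=27))
```

With eta=3, each round discards two-thirds of the candidates, so 27 cheap evaluations shrink to a single full-budget one and most of the time budget goes to configurations that looked promising early on.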

Our goal in constructing a portfolio of ML pipelines was to obtain a small set of robust pipelines such that, for any dataset, at least one pipeline in the portfolio would likely perform well. After analyzing 421 binary classification datasets from OpenML, we identified 16 ML pipelines (using an automated process based on SMAC3). We then supplemented these pipelines once the actual dataset of interest was known.
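One simple way to pick a small set of complementary pipelines is greedy selection: repeatedly add the pipeline that most lowers the average, across meta-training datasets, of the best loss achieved by any portfolio member. The sketch assumes a precomputed table of validation losses (pipeline by dataset); it illustrates the idea rather than reproducing the team's SMAC3-based procedure, and the pipeline names and numbers are made up.

```python
def build_portfolio(losses, size):
    """Greedily pick up to `size` complementary pipelines.

    losses: dict mapping pipeline name -> list of validation losses, one per
            meta-training dataset (lower is better)
    A portfolio's loss on a dataset is the loss of its best member; each step
    adds the pipeline that most reduces the mean of that loss over datasets.
    """
    n_datasets = len(next(iter(losses.values())))
    best_so_far = [float("inf")] * n_datasets
    portfolio = []

    def mean_loss_if_added(name):
        # Reads the current best_so_far each time it is called.
        return sum(min(b, c) for b, c in zip(best_so_far, losses[name])) / n_datasets

    for _ in range(min(size, len(losses))):
        candidates = [p for p in losses if p not in portfolio]
        choice = min(candidates, key=mean_loss_if_added)
        portfolio.append(choice)
        best_so_far = [min(b, c) for b, c in zip(best_so_far, losses[choice])]
    return portfolio


# Hypothetical validation losses of three pipelines on three meta-datasets.
toy_losses = {
    "rf": [0.10, 0.40, 0.30],
    "svm": [0.50, 0.05, 0.35],
    "xgb": [0.12, 0.38, 0.20],
}
print(build_portfolio(toy_losses, size=2))  # → ['xgb', 'svm']
```

In this toy table the second pick is svm, the worst pipeline on its own, because it is the one that covers xgb's weak second dataset; that complementarity is the point of a portfolio.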

Other details regarding our design decisions include:

  • During hyperparameter tuning, we used two-thirds of the available data for training and the remaining third for validation.
  • Datasets with more than 500 features underwent feature selection down to 500 features, computed on a random subsample of 1000 data points.
  • We limited the classifiers used to define the pipelines to four standard tools: random forests, support vector machines, XGBoost, and scikit-learn's SGDClassifier.
  • The configuration space contained 37 free parameters in total (listed in our article), which could be tuned to create additional classifiers to supplement the portfolio.
  • In contrast to standard hyperparameter selection, where a single set of hyperparameter values is chosen, we saved all trained models and combined them into an ensemble to guard against overfitting and improve generalization.
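Auto-sklearn builds its ensembles with greedy ensemble selection in the style of Caruana et al.: starting from an empty ensemble, it repeatedly adds (with replacement) whichever saved model most improves the ensemble's validation loss. Below is a minimal NumPy sketch of that technique, assuming binary labels, positive-class probabilities from each saved model on the validation split, and the Brier score as a stand-in loss; it is not Auto-sklearn's implementation, and the predictions are made up.

```python
import numpy as np


def ensemble_selection(val_probs, y_val, loss, n_rounds=20):
    """Greedy ensemble selection with replacement (Caruana-style).

    val_probs: array (n_models, n_samples) of each saved model's predicted
               positive-class probability on the validation split
    y_val:     array (n_samples,) of 0/1 labels
    loss:      callable(y_true, y_prob) -> float, lower is better
    Returns one weight per model: how often it was picked, divided by n_rounds.
    """
    n_models, n_samples = val_probs.shape
    counts = np.zeros(n_models, dtype=int)
    running_sum = np.zeros(n_samples)
    for step in range(1, n_rounds + 1):
        # Score the ensemble average obtained by adding each model once more.
        trial = [loss(y_val, (running_sum + val_probs[m]) / step) for m in range(n_models)]
        best = int(np.argmin(trial))
        counts[best] += 1
        running_sum += val_probs[best]
    return counts / n_rounds


def brier(y_true, y_prob):
    """Mean squared error between 0/1 labels and predicted probabilities."""
    return float(np.mean((y_true - y_prob) ** 2))


# Hypothetical validation predictions from three saved models.
y_val = np.array([0, 1, 1, 0])
val_probs = np.array([
    [0.1, 0.9, 0.4, 0.2],
    [0.3, 0.6, 0.9, 0.6],
    [0.5, 0.5, 0.5, 0.5],
])
print(ensemble_selection(val_probs, y_val, brier, n_rounds=10))
```

Because models can be picked more than once, the selection counts act as integer-valued ensemble weights, and models that never help end up with weight zero.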

An interesting finding from our analysis is that, although many classifiers were available, both in the ML portfolio and in the configuration space tuned on each dataset, our final ensembles consisted only of XGBoost classifiers. Further investigation may reveal whether XGBoost classifiers perform best when tuning time is limited.

All in all, we were really happy with how well all the components of our submission worked together. We were especially satisfied that the addition of successive halving helped pick well-performing machine learning pipelines. In the future, we plan to further automate the construction of AutoML systems that automatically adapt to the task at hand.

Automated Machine Learning for Soft Voting in an Ensemble of Tree-based Classifiers

mlg.postech

Our automated machine learning goal for this competition was to develop a model that performs well without human intervention. A significant challenge was the aggressive time constraint: all steps, including algorithm selection, hyperparameter optimization and model parameter learning, had to take place within the 1200-second time limit for each dataset. To deal with this, we limited our models to strategies that can be executed quickly.

We built a soft majority voting model that combines three tree-based classifiers: a gradient boosting classifier, an extra-trees classifier, and a random forest classifier. In soft voting, each member of the ensemble reports its predicted probability for each class; these probabilities are averaged using weights that measure our trust in each classifier, and the class with the highest weighted average probability is chosen. All of the PAKDD competition datasets were binary classification problems.
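The weighted soft-voting rule fits in a few lines of NumPy. This sketch assumes each classifier has already produced class probabilities for the same samples; the probability arrays and weights are made-up numbers, not values from mlg.postech's models.

```python
import numpy as np


def soft_vote(probas, weights):
    """Weighted soft voting.

    probas:  list of arrays, each (n_samples, n_classes), one per classifier
    weights: one non-negative trust weight per classifier
    Returns (predicted class indices, weighted-average class probabilities).
    """
    stacked = np.stack(probas)  # (n_classifiers, n_samples, n_classes)
    avg = np.average(stacked, axis=0, weights=weights)
    return avg.argmax(axis=1), avg


# Hypothetical predict_proba outputs for two samples (columns: class 0, class 1).
gb = np.array([[0.2, 0.8], [0.6, 0.4]])
et = np.array([[0.4, 0.6], [0.3, 0.7]])
rf = np.array([[0.7, 0.3], [0.6, 0.4]])
labels, avg = soft_vote([gb, et, rf], weights=[0.5, 0.3, 0.2])
print(labels)  # → [1 0]
print(avg)
```

On the second sample the extra-trees classifier favors class 1, but the combined weight of the other two tips the average to class 0. scikit-learn's VotingClassifier with voting="soft" and a weights argument applies the same weighted-average rule to fitted estimators.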


Figure 1: Graphic description of possible weights used in the soft majority voting model for mlg.postech.


The hyperparameters of each model were tuned automatically by Bayesian optimization. Our Bayesian optimization strategy used Gaussian process regression as the surrogate model and GP-UCB as the acquisition function. Bayesian optimization is meant to efficiently choose the hyperparameters that maximize validation accuracy, measured on a 60/40 split of the provided training data into training and validation sets. We chose 6 hyperparameters to tune in mlg.postech:

  • the relative weights of the classifiers in the soft majority voting (2 hyperparameters),
  • the number of estimators in each of the component classifiers (3 hyperparameters),
  • the maximum depth of the gradient boosting classifier.
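GP-UCB picks the next point to evaluate by maximizing the upper confidence bound μ(x) + √β·σ(x), where μ and σ are the Gaussian process posterior mean and standard deviation: a high mean rewards exploitation and a high standard deviation rewards exploration. Here is a self-contained 1-D NumPy sketch, assuming a zero-mean GP with a fixed-length-scale RBF kernel, a fixed β, and a grid search over the acquisition function; the team's kernel, β schedule, and six-dimensional search space are not reproduced here, and all function names are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel matrix between 1-D point arrays a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_query, jitter=1e-6):
    """Posterior mean and standard deviation of a zero-mean, unit-variance GP."""
    k = rbf_kernel(x_train, x_train) + jitter * np.eye(len(x_train))
    k_star = rbf_kernel(x_train, x_query)
    chol = np.linalg.cholesky(k)  # k = chol @ chol.T
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_star.T @ alpha
    v = np.linalg.solve(chol, k_star)
    var = np.clip(1.0 - np.sum(v**2, axis=0), 0.0, None)  # prior variance is 1
    return mean, np.sqrt(var)


def gp_ucb_maximize(objective, bounds, n_init=3, n_iter=10, beta=2.0, seed=0):
    """Maximize a 1-D objective with GP-UCB, searching the acquisition on a grid."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(bounds[0], bounds[1], 500)
    x = rng.uniform(bounds[0], bounds[1], n_init)
    y = np.array([objective(v) for v in x])
    for _ in range(n_iter):
        # Standardize observations so the zero-mean, unit-variance prior fits them.
        y_scaled = (y - y.mean()) / (y.std() + 1e-12)
        mu, sigma = gp_posterior(x, y_scaled, grid)
        ucb = mu + np.sqrt(beta) * sigma  # exploit high mean, explore high uncertainty
        x_next = grid[np.argmax(ucb)]
        x = np.append(x, x_next)
        y = np.append(y, objective(x_next))
    best = int(np.argmax(y))
    return x[best], y[best]


# Toy usage: a hypothetical "validation score" peaking at 0.7.
best_x, best_y = gp_ucb_maximize(lambda v: -(v - 0.7) ** 2, bounds=(0.0, 1.0))
print(best_x, best_y)
```

Larger β values favor exploring uncertain regions, while smaller values concentrate evaluations near the current best; with only a handful of evaluations fitting in the time budget, that trade-off matters.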

There are many other choices to make for these models, but we decided to limit the free parameters to those we felt would have the most impact. In our experience, these parameters matter most when tuning within the limited time available to train the model. The parameter bounds were adjusted depending on properties of the dataset; for further detail on these changes, refer to our submission to the AutoML workshop.

We performed well, placing second behind the team from Freiburg. We are proud to have created an effective classifier with low complexity, especially one built entirely from tree-based components. The weighted voting strategy allowed each individual model to have only 1 or 2 free parameters while still producing an effective classifier when combined. Bayesian optimization was an effective tool for searching this 6-parameter space, which was important given that 1200 seconds is a very limited budget.

We would like to thank the creators of this competition for organizing it and providing an opportunity to present our results. We look forward to the next competition!

Conclusion

Here at SigOpt, our mission is to empower the world's experts with solutions that automate the most tedious, challenging tasks of model development, such as hyperparameter optimization. We are always thinking about the most effective ways to give our customers and academic users the tools they need to make optimal decisions that accelerate and amplify the impact of their machine learning models.

Competitions like this help explore the complexities of developing effective ML pipelines under severe time restrictions and without expert intuition regarding the desired or expected output. The community benefits greatly from the participants' experimentation and rigorous documentation, and the research team at SigOpt is using these results to refine and augment our own services. We are incredibly proud of the continued excellence of our intern alumni and wish them the best in their future work.

Katharina is currently pursuing her PhD at the University of Freiburg. She is interested in practical hyperparameter tuning, automated machine learning and exploring methods to model the performance of algorithms. Katharina worked as an intern at SigOpt in Fall 2017.
Mike McCourt, PhD
Jungtaek is a PhD student in computer science at POSTECH in South Korea. He is studying machine learning and Bayesian optimization. Jungtaek worked as an intern at SigOpt in Spring 2018.
Mike studies mathematical and statistical tools for interpolation and prediction. Prior to joining SigOpt, he spent time in the math and computer science division at Argonne National Laboratory and was a visiting assistant professor at the University of Colorado-Denver where he co-wrote a text on kernel-based approximation. Mike holds a PhD and MS in Applied Mathematics from Cornell and a BS in Applied Mathematics from Illinois Institute of Technology.