
Discovering the Science behind Hyperparameter Tuning

Companies hire large teams of data scientists to manually tune the hyperparameter configurations of deep learning models. These hyperparameters control the learning process, and tuning them is extremely tedious and time-consuming, since evaluating each configuration requires training the model to measure its resulting performance.

For Bryan Low, an associate professor at the National University of Singapore’s Department of Computer Science, the burning question is: “Can we transform this process of optimising the hyperparameters of a machine learning (ML) model into a rigorous ‘science’?”

Prof Low is intrigued by this possibility, which would free up data scientists to work on results analysis and other more meaningful tasks. It also dovetails with his wider research vision, which is to enable “learning with less data”.

The quest for answers led him to delve deeper into the area of automated machine learning (AutoML), specifically Bayesian optimisation algorithms, which simplify and speed up the search for optimal settings by identifying which parameters depend on one another.

Tackling the fundamental questions

“Traditionally, it is considered an ‘art’ to tune the hyperparameter configurations of deep learning and ML models such as learning rate, number of layers and number of hidden units, so as to optimise their predictive performance,” explained Prof Low.
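
To make this concrete, here is a minimal sketch of a Bayesian optimisation loop over a single hyperparameter (the log learning rate). The objective validation_loss is a hypothetical stand-in for training a model and measuring its validation error, and the Gaussian-process surrogate with an expected-improvement acquisition is a standard textbook recipe, not Prof Low’s specific algorithm.

```python
# A toy Bayesian optimisation loop. Assumes only numpy, scipy and
# scikit-learn; validation_loss is a synthetic stand-in for a real
# "train the model, return validation error" objective.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def validation_loss(log_lr):
    # Hypothetical loss surface over log10(learning rate).
    return (log_lr + 3.0) ** 2 + 0.1 * np.sin(5 * log_lr)

bounds = (-6.0, 0.0)                            # search over log10(lr)
X = np.random.uniform(*bounds, size=(3, 1))     # a few random initial trials
y = np.array([validation_loss(x[0]) for x in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6, normalize_y=True)
for _ in range(20):
    gp.fit(X, y)                                # surrogate of the loss surface
    cand = np.linspace(*bounds, 500).reshape(-1, 1)
    mu, sigma = gp.predict(cand, return_std=True)
    best = y.min()
    # Expected improvement: prefer points that look good or are uncertain.
    z = (best - mu) / np.maximum(sigma, 1e-9)
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    x_next = cand[np.argmax(ei)]                # next configuration to try
    X = np.vstack([X, x_next])
    y = np.append(y, validation_loss(x_next[0]))

print(f"best log10(lr) found: {X[np.argmin(y)][0]:.3f}")
```

Each iteration fits the surrogate to the trials so far and proposes the next configuration to evaluate, so far fewer full training runs are needed than with grid or random search.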

To transform this into a science, several fundamental questions had to be tackled. For example: How can Bayesian optimisation be scaled to handle a large number of hyperparameters and large batches of hyperparameter queries? How can auxiliary information be exploited to boost its performance? How can Bayesian optimisation be performed under privacy settings?

In seeking answers to these questions, one of the interesting things that Prof Low uncovered was that AutoML/Bayesian optimisation tools can have many applications beyond the hyperparameter optimisation of ML models.

“There are many complex ‘black-box’ problems to which Bayesian optimisation can be applied, to reduce the number of costly trials/experiments needed to find an optimal solution,” he noted. Examples include optimising properties in material or battery design, optimising environmental conditions to maximise crop yield, improving the performance of adversarial ML, and single- and multi-agent reinforcement learning.

Multi-party machine learning

More recently, Prof Low has embarked on another line of research to achieve his vision of “learning with less data”. He is working on multi-party machine learning, where a party holding some data tries to improve its ML model by collaborating with other parties that also hold data.

There are two key challenges involved. The first is combining heterogeneous black-box models, without any knowledge of their internal architectures or local data, into a single predictive model that is more accurate than any of its constituent models.

One way to address this is to find a common language to unite the disparate models. This paves the way for the creation of a surrogate model from the different machine learning models, and has the potential to elevate machine learning to another level by combining multiple models to harness their collective intelligence.
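
One simple instance of such a “common language” is the models’ predictions on a shared pool of unlabelled data. The sketch below distils several opaque predictors into a single surrogate this way; it is an illustrative fusion recipe under that assumption, not the method from Prof Low’s papers.

```python
# A minimal sketch of fusing heterogeneous black-box models through a
# shared surrogate. Each party exposes only a predict_proba-style function;
# their architectures and local data stay hidden.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fuse_black_boxes(black_boxes, unlabeled_pool):
    """Distil several opaque predictors into one surrogate model."""
    # Average the class probabilities from every party: a crude but
    # serviceable form of "collective intelligence".
    probs = np.mean([m(unlabeled_pool) for m in black_boxes], axis=0)
    pseudo_labels = probs.argmax(axis=1)
    # Train the surrogate on the pooled pseudo-labels.
    surrogate = LogisticRegression(max_iter=1000)
    surrogate.fit(unlabeled_pool, pseudo_labels)
    return surrogate
```

In practice each party’s predictions would be weighted by its confidence or estimated accuracy; uniform averaging keeps the sketch minimal.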

The second challenge lies in trusted data sharing and data valuation, where Prof Low and his research team ask questions such as: “How can multiple parties be incentivised to share their data? How do we value their data?”

In this pioneering work, Prof Low has introduced a novel and intuitive perspective: a party that contributes more valuable data will receive a more valuable model in return (instead of a monetary reward). To achieve this, formally defined incentives such as fairness and stability have been adapted from cooperative game theory to encourage collaboration in machine learning.
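
A classic tool from cooperative game theory for this kind of data valuation is the Shapley value, which pays each party its average marginal contribution over all possible coalitions. The sketch below computes it by direct enumeration; the coalition-value function v is a hypothetical stand-in for, say, the validation accuracy of a model trained on a coalition’s pooled data.

```python
# Shapley-value data valuation by direct enumeration (exponential in the
# number of parties, so only practical for a handful of collaborators).
from itertools import combinations
from math import factorial

def shapley_values(parties, v):
    n = len(parties)
    values = {}
    for p in parties:
        others = [q for q in parties if q != p]
        total = 0.0
        for r in range(n):
            for S in combinations(others, r):
                # Weight = |S|! * (n - |S| - 1)! / n!
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                total += weight * (v(set(S) | {p}) - v(set(S)))
        values[p] = total
    return values

# Toy usage: v(S) = accuracy of a model trained on the pooled data of S.
acc = {frozenset(): 0.5, frozenset({"A"}): 0.7,
       frozenset({"B"}): 0.6, frozenset({"A", "B"}): 0.8}
print(shapley_values(["A", "B"], lambda S: acc[frozenset(S)]))
# -> A ~ 0.2, B ~ 0.1: party A's data is worth more, so under the
#    model-reward scheme A would receive the more valuable model.
```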

His research journey

For Prof Low, research can be described as a hobby, one that he has been pursuing for nearly two decades. During his final year as an undergraduate, it even replaced gaming as something that he would “naturally indulge in”, and he has not looked back since.

The field of AI/ML has likewise powered on. Prof Low remembers that when he first presented at the AAAI (Association for the Advancement of Artificial Intelligence) conference back in 2004, there were only 453 papers submitted for review. This year, there were 7,737.

Indeed, as his passion for research continues to burn, his chosen field of AI/ML has gone “from cold to scorching hot”.

Author

  • Abigail leads the marcom efforts for AISG.
