AutoML: A Revolution in Machine Learning

3 December, 2022

Contributors

Prathik Shetty

@prathikkshetty15766

Want to hear a joke? AutoML a day keeps data scientists away! This might not sound funny, but let's see if it is really possible 🤔

What is AutoML?

So you might wonder what all the fuss is about. Isn't machine learning itself a hot topic that is vast to explore? How can it be made any better? AutoML stands for Automated Machine Learning. It is a subfield of data science and is closely associated with machine learning. AutoML is the process of automating the end-to-end work of building machine learning models and applying them to real-world problems.

It is basically a user-friendly approach to machine learning: in a nutshell, a tool that eases the process of applying machine learning to data. So does it do all the tasks by itself? Let's take a look at its architecture and break down how it works in the following sections.

How does AutoML ease the process?

Most AutoML frameworks offer a no-code or low-code approach to building machine learning models. Most users apply AutoML to ease model selection and optimization, and data scientists also use it to automate the data preparation and ingestion steps. Let's take a look at the features AutoML provides.

Stages of ML automated by AutoML

• data preparation
• data preprocessing
• exploratory data analysis
• feature engineering
• feature selection
• hyperparameter tuning
• ensembling
• model selection

Is AutoML taking over the jobs of data scientists?

My answer - Absolutely NO! It only eases the life of a data scientist. How?

It frees data scientists from the burden of repetitive and time-consuming tasks such as:

• data cleaning
• model selection
• building model intuition

By automating these time-consuming tasks, they can focus on what matters most in a machine learning project, i.e. the accuracy of the models and the end results they generate. So at the end of the day, AutoML is not going to end the need for data scientists.

Architecture of AutoML software

There are multiple stages involved in truly putting a machine learning model to use. The two main stages are training and inference, i.e. putting the model into production. The training part takes a lot of time, so many models need a long while before they reach the inference stage.

This training phase includes a lot of tedious tasks such as:

Data gathering and integration from various sources

Data cleaning to ensure the model gets proper data

Exploratory data analysis to know about the data

Feature engineering, which consists of various sub-stages such as:

• Feature selection - selecting only the useful features and eliminating the rest
• Feature scaling - scaling the existing features so that they fit the model's requirements
• Feature extraction - creating new variables from the raw data

Model selection according to the target and features

Model optimization by tuning hyperparameters

Finally, model evaluation takes place to test the utility of the model
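The three feature engineering sub-stages listed above can be sketched in plain Python. This is only a minimal illustration, not how any particular AutoML tool implements them: the toy rows, the column names (`height_cm`, `weight_kg`, `sensor_id`), the derived `bmi` feature, and the variance threshold are all assumptions made up for the example.

```python
from statistics import pvariance

# Hypothetical raw data: "sensor_id" is constant, so it carries no information.
raw_rows = [
    {"height_cm": 150.0, "weight_kg": 50.0, "sensor_id": 1.0},
    {"height_cm": 160.0, "weight_kg": 64.0, "sensor_id": 1.0},
    {"height_cm": 170.0, "weight_kg": 72.0, "sensor_id": 1.0},
    {"height_cm": 200.0, "weight_kg": 80.0, "sensor_id": 1.0},
]


def extract_features(rows):
    """Feature extraction: derive a new variable (BMI) from the raw columns."""
    extracted = []
    for row in rows:
        new_row = dict(row)
        height_m = row["height_cm"] / 100.0
        new_row["bmi"] = row["weight_kg"] / (height_m ** 2)
        extracted.append(new_row)
    return extracted


def select_features(rows, min_variance=1e-12):
    """Feature selection: keep only the columns whose variance exceeds a threshold."""
    names = list(rows[0].keys())
    return [
        name for name in names
        if pvariance([row[name] for row in rows]) > min_variance
    ]


def scale_features(rows, names):
    """Feature scaling: min-max scale each kept column into the range [0, 1]."""
    scaled = [{} for _ in rows]
    for name in names:
        values = [row[name] for row in rows]
        low, high = min(values), max(values)
        for target, value in zip(scaled, values):
            target[name] = (value - low) / (high - low)
    return scaled


rows = extract_features(raw_rows)
kept = select_features(rows)
prepared = scale_features(rows, kept)
print(kept)
print(prepared[0])
```

An AutoML tool would typically try several such transformations automatically and keep the combination that scores best, rather than relying on one hand-picked recipe like this.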

A flowchart demonstrating the ML lifecycle handling done by an AutoML tool

The AutoML approach helps data scientists avoid the hassle of training with different parameters and repeating the time-consuming process of validating their models and selecting a suitable one.

The AutoML tool begins with the data preparation stage and does all the cleaning, integration, and analysis. It then carries out the feature engineering part. The main part is selecting, training, and ensembling suitable models. For this, an automated flow is set up in which candidate models are trained with different parameters and values, and the best result is stored. Evaluation is mainly done by scoring the results on metrics such as F1 score, recall, and precision. This saves developers from spending time on extensive model training and evaluation, and helps them focus on putting their models into production for real-world use cases.
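To make the evaluation step a little more concrete, here is a small from-scratch sketch of how precision, recall, and F1 score are computed for a binary classifier. The labels and predictions are made-up values for illustration, and the function name `precision_recall_f1` is just a name chosen for this example.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 score for one positive class."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_pos = sum(1 for t, p in pairs if t != positive and p == positive)
    false_neg = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or never present.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Made-up ground truth and predictions: 2 true positives, 1 false positive, 2 false negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

An AutoML tool would compute a score like this for every candidate model and use it to rank them.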

Thus AutoML is widely helpful for applying ML models in the real world. Many companies can carry out their predictive modeling quickly with its help.

How do AutoML tools do predictive modeling?

One driving force behind some AutoML methods, notably neural architecture search, is a combination of reinforcement learning (RL) and recurrent neural networks (RNNs). First, an RNN controller proposes a set of hyperparameters, such as nodes per layer and layer count, and a model is built from them. Then RL assigns a reward or punishment based on the model's accuracy score: higher accuracy earns a reward, while lower accuracy is punished. Over many rounds, the controller learns to propose models with higher accuracy, and the best-scoring model is selected. Other tools rely on different search strategies, such as Bayesian optimization or genetic programming.
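The reward-and-punishment loop can be illustrated with a heavily simplified sketch. Note the assumptions: instead of an RNN controller, it uses a plain table of preferences over a single hyperparameter (layer count), updated with a gradient-bandit style policy-gradient rule, and `mock_accuracy` is a hypothetical stand-in for actually training a model and measuring its accuracy. It is meant to show the idea only, not how a real AutoML system is built.

```python
import math
import random


def softmax(preferences):
    """Turn raw preferences into sampling probabilities."""
    peak = max(preferences)
    exps = [math.exp(p - peak) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]


def mock_accuracy(layer_count):
    """Hypothetical stand-in for 'build the model and measure its accuracy'."""
    return {1: 0.70, 2: 0.85, 3: 0.95, 4: 0.80}[layer_count]


def train_controller(choices, reward_fn, steps=3000, learning_rate=0.3, seed=0):
    """Learn which choice to propose, using accuracy as the reward signal."""
    rng = random.Random(seed)
    preferences = [0.0] * len(choices)
    baseline = 0.0  # running average reward, used to judge "better or worse than usual"
    for step in range(1, steps + 1):
        probs = softmax(preferences)
        index = rng.choices(range(len(choices)), weights=probs)[0]
        reward = reward_fn(choices[index])
        advantage = reward - baseline  # positive = reward, negative = punishment
        for i in range(len(choices)):
            indicator = 1.0 if i == index else 0.0
            preferences[i] += learning_rate * advantage * (indicator - probs[i])
        baseline += (reward - baseline) / step
    return dict(zip(choices, softmax(preferences)))


probabilities = train_controller([1, 2, 3, 4], mock_accuracy)
best_choice = max(probabilities, key=probabilities.get)
print(best_choice, probabilities)
```

Choices that score above the running average become more likely to be proposed again, and choices that score below it become less likely, which is the reward/punishment behaviour described above.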

Demonstration of an AutoML library TPOT

TPOT (Tree-based Pipeline Optimization Tool) is an AutoML library that uses genetic programming to search for the best pipeline, that is, the most suitable model together with its hyperparameters.

The model in this demonstration is for the digit classification problem. The TPOT classifier starts from randomly generated candidates and then evaluates the accuracy of each generation, 5 in this case. It settled on a K-nearest neighbors classifier with K=1 and Euclidean distance to classify the MNIST dataset. The best-performing candidate is selected at the end. The best result can be seen below.

You can check the code here:

TPOT - digit classification model

https://gist.github.com/prathikshetty2002/80e3bd0449c695027c3daf75995c49ed

TPOT classifier generates a fine-tuned model for digit classification
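TPOT's own documentation describes its search as genetic programming: a population of candidate pipelines is evolved over several generations. The toy sketch below mimics that idea in plain Python so it can run without installing anything. It is not TPOT's actual implementation: the synthetic two-cluster data, the tiny k-nearest-neighbors classifier, the search space (`K_CHOICES`, `METRICS`), and the mutation rule are all assumptions made up for illustration.

```python
import random

K_CHOICES = [1, 3, 5, 7, 9]
METRICS = ["euclidean", "manhattan"]


def make_blobs(rng, n_per_class):
    """Hypothetical toy data: two noisy 2-D clusters labelled 0 and 1."""
    data = []
    for label, center in enumerate([(0.0, 0.0), (3.0, 3.0)]):
        for _ in range(n_per_class):
            point = (rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0))
            data.append((point, label))
    rng.shuffle(data)
    return data


def distance(a, b, metric):
    if metric == "euclidean":
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train, point, k, metric):
    """Majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda item: distance(item[0], point, metric))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


def fitness(config, train, valid):
    """Validation accuracy of one (k, metric) configuration."""
    k, metric = config
    hits = sum(knn_predict(train, p, k, metric) == label for p, label in valid)
    return hits / len(valid)


def mutate(config, rng):
    """Randomly change either k or the distance metric."""
    k, metric = config
    if rng.random() < 0.5:
        k = rng.choice(K_CHOICES)
    else:
        metric = rng.choice(METRICS)
    return (k, metric)


def evolve(generations=5, population_size=6, seed=42):
    rng = random.Random(seed)
    data = make_blobs(rng, 40)
    train, valid = data[:60], data[60:]
    population = [
        (rng.choice(K_CHOICES), rng.choice(METRICS)) for _ in range(population_size)
    ]
    for _ in range(generations):
        ranked = sorted(population, key=lambda c: fitness(c, train, valid), reverse=True)
        survivors = ranked[: population_size // 2]  # keep the fitter half
        children = [
            mutate(rng.choice(survivors), rng)
            for _ in range(population_size - len(survivors))
        ]
        population = survivors + children
    best = max(population, key=lambda c: fitness(c, train, valid))
    return best, fitness(best, train, valid)


best_config, best_accuracy = evolve()
print(best_config, best_accuracy)
```

With the real library, the equivalent step is to create a `TPOTClassifier` with a number of generations and a population size and call `fit` on the training data. Check the TPOT documentation for the exact parameters supported by your installed version.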

Why use AutoML?

Most people don't have deep knowledge of machine learning and aren't well-versed in model training and optimization. AutoML addresses this by offering low-code solutions where people can upload their data, click a button, and have their model trained and deployed to production. Many cloud providers offer AutoML services to their users. This saves a business time, provides data insights, and makes it clearer whether a project is worth investing in. There is less need to hire specialists, although in cases that call for a more hands-on approach, AutoML can't be the only thing you rely on.

AutoML removes much of the need to understand the internal structure of the model, so anybody can apply a machine learning model without much prior knowledge. Companies that don't want to spend much on ML experts can save time and money here. If the machine learning required isn't extensive, AutoML is likely the best solution.

Pros

• Efficiency - it speeds up and simplifies the machine learning process and reduces the time spent training machine learning models.
• Cost-effective - a company can save money by using this fast and efficient approach rather than going for traditional ML and spending more resources.
• Easy to adopt - since it is simpler and even people with less knowledge of ML can learn and apply it, the need for extensive hiring and training is reduced.
• Performance - automated ML models can match or even outperform hand-coded models on some tasks.

Cons

• Cutting edge - since it is a relatively new field, many tools are still immature. Hence, it can't yet be used in all fields of application.
• Over-reliance - these tools are developed to ease the life of developers, not to substitute for their work. So, despite automation in processes like monitoring, modeling, and optimization, the model still needs to be monitored by developers. Otherwise, there is a high chance of failures or false predictions.

Some widely used AutoML libraries

Top AutoML libraries widely used around the world

Open-source libraries

• auto-sklearn
• AutoKeras
• TPOT
• MLBox
• FLAML

You can find more about them here:

Top AutoML open source libraries

https://moez-62905.medium.com/top-automl-python-libraries-in-2022-2d306cf7acf0

Commercial services

• Google Cloud AutoML
• Microsoft Azure AutoML
• Amazon SageMaker

You can find more about them here:

Top commercial AutoML services

https://www.akkio.com/post/benchmarking-the-top-automl-platforms-an-in-depth-analysis

Some AutoML project ideas

Customer retention analysis & prediction

Crop quality testing and assessment

Image recognition for recommendation systems

Spam analysis and classification

Insights in the healthcare sector & predictive analysis

If you enjoyed reading do leave a like & comment 💯🚀 Thanks :)
