Telco Customer Churn Prediction

25 August, 2020

Contributors

Mia (Yuhsin) Hou

@miayuxin

Data Science

Pandas

NumPy

Languages

Python

Machine Learning

scikit-learn

Charting Libraries

Matplotlib

Introduction

Problem Definition

• Which factors influence churn?

• How can we predict churn and customer lifetime value?

Dataset

This IBM sample dataset contains information about telco customers and whether they left the company within the last month (churn).

Each row represents a unique customer, and the columns describe the services each customer uses. The dataset contains 7,032 customers, of whom 1,869 left within the last month. With a churn rate that high (26.58%), the company risks losing a large share of its customer base in the coming months if no action is taken.
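As a quick check, the churn rate can be recomputed from the two counts given above. This is a minimal sketch; the variable names are chosen here for illustration only.

```python
# Counts as reported in the dataset description.
total_customers = 7032
churned_customers = 1869

# Churn rate = customers who left / all customers.
churn_rate = churned_customers / total_customers
print(f"Churn rate: {churn_rate:.2%}")
```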

Data Preprocessing

Feature engineering for customer churn prediction

First, I used “get_dummies” to convert all categorical variables into 0/1 indicator columns. Most features take the values “yes” or “no”. Some have a third value, “No phone service”, which means the same as “no”. I therefore dropped the indicator columns for “no” and “No phone service”, because the “yes” column already encodes both cases as 1 and 0. Before fitting each machine learning model, I standardized each feature (column) to zero mean and unit variance so that features measured on different scales are comparable, with the aim of improving model performance.
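The encoding and scaling steps might look roughly like the sketch below. The small DataFrame is made-up sample data whose column names mimic the Telco dataset, and the use of StandardScaler is an assumption about how the scaling was done, not the project's actual code.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Tiny hypothetical sample; values are invented for illustration.
df = pd.DataFrame({
    "tenure": [1, 34, 2, 45],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30],
    "PhoneService": ["No", "Yes", "Yes", "No"],
    "MultipleLines": ["No phone service", "No", "Yes", "No phone service"],
    "Churn": ["No", "No", "Yes", "No"],
})

# One-hot encode every categorical column into 0/1 indicator columns.
encoded = pd.get_dummies(df, dtype=int)

# Drop the "_No" and "_No phone service" indicators: the "_Yes" column
# already separates "yes" (1) from everything else (0).
redundant = [c for c in encoded.columns
             if c.endswith(("_No", "_No phone service"))]
encoded = encoded.drop(columns=redundant)

# Standardize the features (not the target) to zero mean, unit variance.
features = encoded.drop(columns=["Churn_Yes"])
scaled = pd.DataFrame(
    StandardScaler().fit_transform(features),
    columns=features.columns,
    index=features.index,
)
print(scaled.round(2))
```

In a real workflow the scaler would normally be fitted on the training split only and then applied to the test split.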

Feature engineering for customer lifetime value

To predict customer lifetime value (that is, to estimate the cost of a false negative), the training data consists only of customers who have already left the company, since the final tenure is known only for those customers.

I added a new column named "life_time_value" ("tenure" * "MonthlyCharges") as the target. Note that "Churn_Yes" and "tenure" were dropped before building the model. "Churn_Yes" was dropped because the model must be able to score any customer (Churn_Yes = 0 or Churn_Yes = 1), yet the training data contains only rows where "Churn_Yes" equals 1; a model trained with that constant column could not make meaningful predictions for rows where "Churn_Yes" equals 0. "tenure" was dropped because the final tenure of a customer who is still with the company is unknown at prediction time: the recorded tenure is only the time spent with the company so far.
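The construction of the lifetime value target could be sketched as follows. The rows are invented; only the column names "tenure", "MonthlyCharges", "Churn_Yes", and "life_time_value" come from the text.

```python
import pandas as pd

# Hypothetical encoded rows for illustration.
df = pd.DataFrame({
    "tenure": [2, 10, 24, 5],
    "MonthlyCharges": [50.0, 80.0, 20.0, 100.0],
    "Contract_Month-to-month": [1, 1, 0, 1],
    "Churn_Yes": [1, 0, 1, 1],
})

# Keep only customers who already left: their tenure is final.
churned = df[df["Churn_Yes"] == 1].copy()

# Target: tenure (months) times the monthly charge.
churned["life_time_value"] = churned["tenure"] * churned["MonthlyCharges"]

# Drop the constant "Churn_Yes" column and "tenure", which is not
# final for customers who are still with the company.
X = churned.drop(columns=["Churn_Yes", "tenure", "life_time_value"])
y = churned["life_time_value"]

print(X.columns.tolist())
print(y.tolist())
```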

Models Development

• Built and compared classification models for predicting potential customer churn: Logistic Regression, Decision Tree, KNN Classifier, and Random Forest.

• Used GridSearch with 5-fold cross-validation to tune the hyperparameters of all models and to reduce overfitting.
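A grid search with 5-fold cross-validation can be sketched with scikit-learn's GridSearchCV. The data below is synthetic, and the parameter grid and the "roc_auc" scoring choice are assumptions made for illustration; the grids actually used in the project are not given in the text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the preprocessed churn features.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical grid; the real search space is not stated in the text.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,               # 5-fold cross-validation
    scoring="roc_auc",  # assumed metric, matching the AUC comparison
)
search.fit(X_train, y_train)

print(search.best_params_)
# score() applies the chosen scoring (AUC) to the held-out split.
print(round(search.score(X_test, y_test), 3))
```

The same pattern applies to the other classifiers by swapping the estimator and its grid.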

Conclusion

1. Customer churn prediction

Random Forest performed best, with the highest AUC score (86%) among all models.
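Comparing the four classifiers by AUC might look like the sketch below. It runs on synthetic data with mostly default hyperparameters, so its scores will not match the 86% reported for the Telco data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed churn data.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hyperparameters here are illustrative, not the tuned values.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "KNN Classifier": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(random_state=0),
}

auc_scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # AUC is computed from predicted probabilities of the positive class.
    proba = model.predict_proba(X_test)[:, 1]
    auc_scores[name] = roc_auc_score(y_test, proba)

# Print the models from highest to lowest AUC.
for name, auc in sorted(auc_scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {auc:.3f}")
```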

Partial dependence plot for Random Forest

The partial dependence plot shows that "Contract_Month-to-month", "InternetService_DSL", and "PaymentMethod_Electronic check" have a strong positive linear relationship with churn, while "tenure" has a negative linear relationship with churn.
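Partial dependence values such as those behind the plot can be computed with scikit-learn's partial_dependence function. The sketch below uses synthetic data and an arbitrary feature index, so it only illustrates the mechanics, not the Telco results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import partial_dependence

# Synthetic stand-in for the churn features.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Average predicted probability of the positive class as feature 0
# is varied over a grid while the other features keep their values.
result = partial_dependence(
    model, X, features=[0], grid_resolution=10, kind="average"
)
averaged = result["average"][0]

print(averaged.round(3))
```

A curve that rises along the grid suggests a positive association with the predicted class, and a falling curve suggests a negative one.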

Confusion Matrix

A confusion matrix is a breakdown of predictions into a table showing the correct predictions (the diagonal) and the types of incorrect predictions made (which classes the incorrect predictions were assigned to).

There are 108 false positives and 155 false negatives. In total, type I and type II errors account for 263 of the 1,407 samples.
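The sketch below shows how false positives and false negatives are read from a scikit-learn confusion matrix, using a small made-up label set, and then recomputes the error totals from the counts reported above. The toy labels are not from the project.

```python
from sklearn.metrics import confusion_matrix

# Toy labels for illustration: 1 = churned, 0 = stayed.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1, 0, 0, 1]

# Rows are actual classes, columns are predicted classes.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)

# Counts reported in the text for the churn model.
false_positives = 108
false_negatives = 155
n_samples = 1407

total_errors = false_positives + false_negatives
error_rate = total_errors / n_samples
print(total_errors, f"{error_rate:.1%}")
```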

For false positives, the model predicted that customers would leave the company, but they actually stayed. The cost of a false positive is that the company may spend campaign money on customers who were not going to leave, which is wasted.

For false negatives, the model predicted that customers would stay, but they actually left. The cost of a false negative is a lost customer. Which error to minimize depends on business strategy; for example, we can compare the cost of a campaign with the cost of losing a customer.

Since the dataset does not include campaign fees, we do not analyze that part. However, we can predict customer lifetime value to estimate how much each customer is worth to the company, based on tenure and monthly charges.

2. Customer lifetime value prediction

The results show that Ridge Regression has the lowest Mean Squared Error (3,378), so it is the best model for predicting customer lifetime value.
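Fitting a Ridge Regression and measuring its Mean Squared Error might look like the sketch below. The data is synthetic and the alpha value is an arbitrary assumption, so the resulting error is unrelated to the 3,378 reported above.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the lifetime value features and target.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# alpha=1.0 is an illustrative value, not the tuned hyperparameter.
model = Ridge(alpha=1.0).fit(X_train, y_train)

# MSE on held-out data: mean of squared prediction errors.
mse = mean_squared_error(y_test, model.predict(X_test))
print(f"MSE: {mse:.1f}")
```

Other regressors can be compared by computing the same metric on the same held-out split.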

python

machine learning