TelCo Churn Analysis: Exploratory Data Analysis


PART 1: Utilizing Python to conduct EDA -- understanding who the customers are and why they churned


Introduction

Understanding why people churn is a normal response; simply accepting it is not.

Competition in today’s telecommunications industry is tougher than ever, with the rise of new and improved products and services that can easily sway customers from one provider to another. In such a landscape, even minor lapses in customer satisfaction can lead to significant churn—where customers switch to a competitor, seeking better value, service, or experience.

Churn isn’t just a metric to be monitored; it’s a crucial indicator of underlying issues that, if left unaddressed, can erode a company’s market share and profitability. This learning oriented project focuses on two things:

  1. Exploring intricate patterns of customer behavior aiming to identify the key drivers of churn.
  2. Proposing strategic solutions that leverage the insights gained from data analysis to reduce churn, improve customer retention, and ultimately, enhance profitability.

Data Profiling

The data was sourced from Kaggle, specifically from IBM’s Sample Data Sets. Although fictional, this dataset serves as an excellent sandbox for showcasing exploratory data analysis (EDA) skills and machine learning modeling. The schema is detailed below:

ColumnData TypeValues
customerIDobject7590-VHVEG (sample)
genderobject[Female, Male]
SeniorCitizenint64[0, 1]
Partnerobject[Yes, No]
Dependentsobject[No, Yes]
tenureint641 (sample)
PhoneServiceobject[No, Yes]
MultipleLinesobject[No phone service, No, Yes]
InternetServiceobject[DSL, Fiber optic, No]
OnlineSecurityobject[No, Yes, No internet service]
OnlineBackupobject[Yes, No, No internet service]
DeviceProtectionobject[No, Yes, No internet service]
TechSupportobject[No, Yes, No internet service]
StreamingTVobject[No, Yes, No internet service]
StreamingMoviesobject[No, Yes, No internet service]
Contractobject[Month-to-month, One year, Two year]
PaperlessBillingobject[Yes, No]
PaymentMethodobject[Electronic check, Mailed check, Bank transfer]
MonthlyChargesfloat6429.85 (sample)
TotalChargesobject29.85 (sample)
Churnobject[No, Yes]

Data Quality Assessment

Prior to data transformations, it is critical to check the health of our source data. For this, we are to follow the standard Data Quality Dimensions

DimensionCodeAssessment
Completenessdf.count()- There are 7043 entries and there are no signs of NULL values
Uniquenessdf.drop_duplicates()
df.count()
- There are no instances of duplicate entries
Consistency1)
id_pattern = r"^\d{4}-[A-Z]{4}$
match = df["customerID"].str.match(id_pattern)
match.count()

2)
df[<col>].unique()
- IDs: All ID instance follows a consistent pattern: 4 digits, dash, 4 characters
- Other Categorical Variables: The categories are distinct (i.e., no overlaps) and consistent
Validitydf.dtypes- There is a slight hiccup on the data type of monthly charges. A conversion to float is necessary

Data Transformations

Full changes are shown in my notebook: Link to Code

Surface-level EDA: The Basic Facts

This initial exploration is dedicated to gaining a foundational understanding of the customer base. By examining demographic characteristics and fundamental preferences, we aim to know who the customers are, their behavior, and what drives their engagement with the service. This analysis sets the stage for deeper dives into more complex relationships and trends within the data.

Plausible Reasons for Churning

As shown in the last slide, the churn rate among customers is approaching 30%, a figure that signals potential issues within the company’s service or customer retention strategies. While the scope of the data covers only the churning from the previous month, it can be highly problematic if not addressed.

Now that we have some idea on the basic facts about who the customers are, let’s focus on the possible reasons for their churn… Why did they churn?

Older customers struggle with adapting to new services 
Older customers exhibit a higher churn rate of 41% compared to younger customers.

The older customer segment is financially sensitive, relying on fixed income or pensions, which makes them hesitant to invest in new services. Additionally, adaptability to new technology can be a barrier due to limited tech familiarity.
 Independence leads to greater flexibility in service usage
Customers who are independent tend to churn at a rate of 33%, which is higher compared to those with family responsibilities.

Dependency refers to the reliance on a particular service for daily, personal needs or business operations, creating a need for stability and consistency. Customers without such dependencies are more flexible in choosing services and can more easily perceive and adapt to subtle differences between options.
Combined cost may burden finances, causing dissatisfaction 
Thirty-three percent (33%) of customers who used both internet and phone services have churned.

Sometimes, bundling may not meet the needs or financial capabilities of all customers. In this case, the average monthly cost for customers who have churned is $82.25, regardless of the additional services they may have had. This can be a potential financial burden for those who are economically struggling or living independently.
 Monthly billing can make costs seem higher psychologically
Customers who are in monthly contracts with the service tend to churn more at a rate of 43%

- Regular payments may lead customers to reassess the value of the service relative to the cost. Over time, the cumulative effect might amplify the perception of financial burden, even if the total amount paid annually is comparable to or less than other billing models.
Dissatisfaction with the quality of electronic transactions 
Customers who pay online will more likely churn at a rate of 45% compared to those who use other payment methods.

The lack of support during downtimes or technical difficulties often leads to frustration. Additionally, customers who are more tech-savvy and use online payment methods typically have higher expectations for high-quality digital interactions. When support does not meet these expectations, it can exacerbate dissatisfaction and contribute to customer churn.
 Fiber optic users experience high churn due to cost expectations
Fiber-optic users tend to churn more at a rate of 42%.

It is surprising to see that fiber optic users tend to churn more than DSL users, given that fiber optics offer more network advantages. This suggests potential issues with both pricing and service quality. The premium cost of fiber optic services can be a barrier for many customers, especially if they perceive the value to be insufficient.
With internet add-ons, streaming services can drive customers to churn 
Generally, among all the possible internet service add-ons, streaming has the highest churn rates. Analysis shows that customers who only stream movies have a churn rate of 65%, while those who only stream TV have a rate of 61%, and those who stream both have a rate of 60%.

High churn rates in streaming services, whether for movies or TV content, often stem from the selection of media. Customers may find themselves dissatisfied if the content library does not meet their expectations or if it lacks variety and freshness. This can lead to a perception of limited value, prompting users to switch to other services that better align with their viewing preferences.

ON THE NEXT:

  • PART 2 WILL COVER THE DEVELOPMENT OF A CLASSIFICATION MODEL, AND
  • PART 3 WILL DISCUSS STRATEGIES ON HOW THE COMPANY SHOULD RESPOND TO THE CHURN