PART 1: Utilizing Python to conduct EDA -- understanding who the customers are and why they churned
Understanding why people churn is a normal response; simply accepting it is not.
Competition in today’s telecommunications industry is tougher than ever, with the rise of new and improved products and services that can easily sway customers from one provider to another. In such a landscape, even minor lapses in customer satisfaction can lead to significant churn—where customers switch to a competitor, seeking better value, service, or experience.
Churn isn’t just a metric to be monitored; it’s a crucial indicator of underlying issues that, if left unaddressed, can erode a company’s market share and profitability. This learning-oriented project focuses on two things:
The data was sourced from Kaggle, specifically from IBM’s Sample Data Sets. Although fictional, this dataset serves as an excellent sandbox for showcasing exploratory data analysis (EDA) skills and machine learning modeling. The schema is detailed below:
| Column | Data Type | Values |
|---|---|---|
| customerID | object | 7590-VHVEG (sample) |
| gender | object | [Female, Male] |
| SeniorCitizen | int64 | [0, 1] |
| Partner | object | [Yes, No] |
| Dependents | object | [No, Yes] |
| tenure | int64 | 1 (sample) |
| PhoneService | object | [No, Yes] |
| MultipleLines | object | [No phone service, No, Yes] |
| InternetService | object | [DSL, Fiber optic, No] |
| OnlineSecurity | object | [No, Yes, No internet service] |
| OnlineBackup | object | [Yes, No, No internet service] |
| DeviceProtection | object | [No, Yes, No internet service] |
| TechSupport | object | [No, Yes, No internet service] |
| StreamingTV | object | [No, Yes, No internet service] |
| StreamingMovies | object | [No, Yes, No internet service] |
| Contract | object | [Month-to-month, One year, Two year] |
| PaperlessBilling | object | [Yes, No] |
| PaymentMethod | object | [Electronic check, Mailed check, Bank transfer (automatic), Credit card (automatic)] |
| MonthlyCharges | float64 | 29.85 (sample) |
| TotalCharges | object | 29.85 (sample) |
| Churn | object | [No, Yes] |
Prior to any data transformations, it is critical to check the health of the source data. For this, we follow the standard data quality dimensions:
| Dimension | Code | Assessment |
|---|---|---|
| Completeness | `df.count()` | There are 7043 entries, and no column shows NULL values |
| Uniqueness | `df.drop_duplicates().count()` | There are no duplicate entries |
| Consistency | 1) `id_pattern = r"^\d{4}-[A-Z]{4}$"`; `match = df["customerID"].str.match(id_pattern)`; `match.sum()` 2) `df[<col>].unique()` | 1) IDs: every ID follows a consistent pattern of 4 digits, a dash, and 4 uppercase letters. 2) Other categorical variables: the categories are distinct (i.e., no overlaps) and consistent |
| Validity | `df.dtypes` | There is a slight hiccup in the data type of TotalCharges, which is stored as object. A conversion to float is necessary |
Full changes are shown in my notebook: Link to Code
This initial exploration is dedicated to gaining a foundational understanding of the customer base. By examining demographic characteristics and fundamental preferences, we aim to learn who the customers are, how they behave, and what drives their engagement with the service. This analysis sets the stage for deeper dives into more complex relationships and trends within the data.
As shown in the last slide, the churn rate among customers is approaching 30%, a figure that signals potential issues within the company’s service or customer retention strategies. Although the data covers only churn from the previous month, a rate this high can become highly problematic if not addressed.
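
For reference, an overall churn rate like the one quoted above can be computed from the `Churn` column with `value_counts(normalize=True)`. The sketch below uses a made-up eight-row sample, so its result is illustrative and does not reproduce the dataset's actual figure.

```python
import pandas as pd

# Made-up sample of the Churn column; the real column has 7043 entries.
churn = pd.Series(["No", "No", "Yes", "No", "Yes", "No", "No", "No"], name="Churn")

# normalize=True returns the share of each label instead of raw counts.
churn_share = churn.value_counts(normalize=True)
churn_rate = churn_share["Yes"]

print(f"Churn rate: {churn_rate:.1%}")
```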
Now that we have some idea of the basic facts about who the customers are, let’s focus on the possible reasons for their churn. Why did they churn?
| Older customers struggle with adapting to new services | |
|---|---|
| ![]() | Older customers churn at a rate of 41%, higher than younger customers. This segment may be more financially sensitive, often relying on a fixed income or pension, which can make them hesitant to invest in new services. Limited familiarity with new technology can also be a barrier to adoption. |
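
Segment-level rates such as the one above come down to a grouped mean of a churn flag. The helper below, `churn_rate_by`, is a hypothetical name introduced for illustration, and the six-row frame is made-up data; the same function could be pointed at any categorical column in the schema, such as `Contract` or `PaymentMethod`.

```python
import pandas as pd

def churn_rate_by(df: pd.DataFrame, column: str) -> pd.Series:
    """Return the share of churned customers within each value of `column`."""
    churned = df["Churn"].eq("Yes")           # boolean flag per customer
    return churned.groupby(df[column]).mean()  # mean of a boolean = proportion of True

# Made-up sample; SeniorCitizen is coded 0/1 as in the schema.
df = pd.DataFrame({
    "SeniorCitizen": [0, 0, 0, 0, 1, 1],
    "Churn": ["No", "No", "No", "Yes", "Yes", "No"],
})

rates = churn_rate_by(df, "SeniorCitizen")
print(rates)
```
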
| Independence leads to greater flexibility in service usage | |
|---|---|
| Independent customers churn at a rate of 33%, higher than those with family responsibilities. Here, dependency refers to reliance on a particular service for daily personal needs or business operations, which creates a need for stability and consistency. Customers without such dependencies are more flexible in choosing services and can more readily notice and act on subtle differences between options. | ![]() |
| Combined cost may burden finances, causing dissatisfaction | |
|---|---|
| ![]() | Thirty-three percent (33%) of customers who used both internet and phone services have churned. Bundling may not suit the needs or budget of every customer. In this case, customers who churned paid an average of $82.25 per month, regardless of any additional services they had. This can be a financial burden for those who are struggling economically or living independently. |
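
A bundle-level figure like this can be sketched by combining two boolean masks. The example assumes that "both services" means `PhoneService == "Yes"` together with any `InternetService` other than `"No"`, and that the average is taken over bundled customers who churned; the text does not spell out either definition, and the rows below are made up.

```python
import pandas as pd

# Made-up sample rows using the schema's column names.
df = pd.DataFrame({
    "PhoneService": ["Yes", "Yes", "No", "Yes", "Yes"],
    "InternetService": ["Fiber optic", "DSL", "DSL", "No", "Fiber optic"],
    "MonthlyCharges": [90.0, 60.0, 30.0, 20.0, 80.0],
    "Churn": ["Yes", "No", "No", "No", "Yes"],
})

# Assumed definition of a bundled customer: phone plus any internet service.
has_both = df["PhoneService"].eq("Yes") & df["InternetService"].ne("No")
churned = df["Churn"].eq("Yes")

# Churn rate among bundled customers.
bundle_churn_rate = churned[has_both].mean()

# Average monthly charge among bundled customers who churned.
avg_charge_churned = df.loc[has_both & churned, "MonthlyCharges"].mean()

print(f"{bundle_churn_rate:.1%}", avg_charge_churned)
```
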
| Monthly billing can make costs seem higher psychologically | |
|---|---|
| Customers on month-to-month contracts churn at a rate of 43%. Regular payments may lead customers to reassess the value of the service relative to its cost. Over time, the cumulative effect might amplify the perception of financial burden, even if the total amount paid annually is comparable to or less than that of other billing models. | ![]() |
| Dissatisfaction with the quality of electronic transactions | |
|---|---|
| ![]() | Customers who pay online churn at a rate of 45%, higher than those who use other payment methods. A lack of support during downtime or technical difficulties often leads to frustration. Additionally, customers who are more tech-savvy and use online payment methods typically expect high-quality digital interactions. When support does not meet these expectations, it can deepen dissatisfaction and contribute to churn. |
| Fiber optic users experience high churn due to cost expectations | |
|---|---|
| Fiber optic users churn at a rate of 42%. This is surprising, since fiber optic offers more network advantages than DSL, yet its users churn more. This suggests potential issues with both pricing and service quality. The premium cost of fiber optic service can be a barrier for many customers, especially if they perceive the value to be insufficient. | ![]() |
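
One way to look at the pricing side of this argument is to place churn rate and average monthly charge side by side for each internet service type. The sketch below uses pandas named aggregation on made-up rows; the helper column `churned` is introduced here for illustration, and the numbers are not the dataset's.

```python
import pandas as pd

# Made-up sample rows using the schema's column names.
df = pd.DataFrame({
    "InternetService": ["Fiber optic", "Fiber optic", "DSL", "DSL", "No", "Fiber optic"],
    "MonthlyCharges": [95.0, 85.0, 55.0, 45.0, 20.0, 90.0],
    "Churn": ["Yes", "No", "No", "Yes", "No", "Yes"],
})

summary = (
    df.assign(churned=df["Churn"].eq("Yes"))  # boolean churn flag
      .groupby("InternetService")
      .agg(
          churn_rate=("churned", "mean"),                # share of churned customers
          avg_monthly_charge=("MonthlyCharges", "mean"),  # mean bill per service type
      )
)
print(summary)
```
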
| With internet add-ons, streaming services can drive customers to churn | |
|---|---|
| ![]() | Among all the internet service add-ons, streaming has the highest churn rates. Customers who stream only movies have a churn rate of 65%, those who stream only TV have a rate of 61%, and those who stream both have a rate of 60%. High churn in streaming services, whether for movies or TV content, often stems from the selection of media. Customers may be dissatisfied if the content library does not meet their expectations or lacks variety and freshness. This can create a perception of limited value, prompting users to switch to services that better match their viewing preferences. |
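
The "movies only", "TV only", and "both" groups can be derived from the two streaming columns before computing a rate for each. The following is a minimal sketch on made-up rows; the label strings and the `profile` variable are hypothetical, and how the original analysis filtered its population is not stated, so this will not necessarily reproduce the percentages above.

```python
import pandas as pd

# Made-up sample rows using the schema's column names.
df = pd.DataFrame({
    "StreamingTV": ["Yes", "No", "Yes", "No", "Yes", "No internet service"],
    "StreamingMovies": ["No", "Yes", "Yes", "No", "Yes", "No internet service"],
    "Churn": ["Yes", "No", "Yes", "No", "No", "No"],
})

tv = df["StreamingTV"].eq("Yes")
movies = df["StreamingMovies"].eq("Yes")

# Start everyone as "No streaming", then overwrite by combination.
profile = pd.Series("No streaming", index=df.index)
profile[tv & ~movies] = "TV only"
profile[~tv & movies] = "Movies only"
profile[tv & movies] = "Both"

# Share of churned customers within each streaming profile.
rates = df["Churn"].eq("Yes").groupby(profile).mean()
print(rates)
```
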
UP NEXT:
- PART 2 WILL COVER THE DEVELOPMENT OF A CLASSIFICATION MODEL, AND
- PART 3 WILL DISCUSS STRATEGIES ON HOW THE COMPANY SHOULD RESPOND TO THE CHURN