Thursday, February 19, 2009

Two Step Cluster - Customer Segmentation in Telecom

I love Cluster Analysis because unlike a lot of other techniques, I don’t have to make any assumptions about the underlying distribution of the data. Though there are a few assumptions for best performance, it’s perfectly okay to cluster data that may not meet these assumptions. Only the business requirements/goals can determine whether the clusters/segments are useful or the solution is satisfactory.

Customer Segmentation is the process of splitting a customer database into distinct, meaningful, and homogeneous groups based on specific parameters or attributes. At a macro level, the main objectives of customer segmentation are to understand the customer base, to monitor and understand changes over time, and to support critical strategies and functions such as CRM, loyalty programs, and product development.

At a micro level, the goals are to support specific campaigns, commercial policies, and cross-selling & up-selling activities, and to analyze and manage churn & loyalty.

SPSS has three different procedures that can be used to cluster data: hierarchical cluster analysis, k-means cluster, and two-step cluster. The two-step cluster is appropriate for large datasets or datasets that have a mixture of continuous and categorical variables. It requires only one pass of data (which is important for very large data files).

The first step - Formation of Preclusters
Preclusters are just clusters of the original cases that are used in place of the raw data to reduce the size of the matrix that contains distances between all possible pairs of cases. When preclustering is complete, all cases in the same precluster are treated as a single entity. The size of the distance matrix is no longer dependent on the number of cases but on the number of preclusters. These preclusters are then used in hierarchical clustering.


The second step - Hierarchical Clustering of Preclusters
In the second step, the standard hierarchical clustering algorithm is used on the preclusters.
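
To make the two steps concrete, here is a minimal Python sketch of the idea, not SPSS's actual implementation. It rests on several assumptions of mine: all variables are continuous, distance is plain Euclidean, a case joins the nearest precluster only if it lies within a fixed `threshold`, and the final number of clusters `k` is given up front. The function names `centroid`, `precluster`, and `merge_preclusters` are hypothetical, as is the toy data.

```python
import math

def centroid(c):
    """Mean point of a (pre)cluster stored as a count plus per-dimension sums."""
    return [s / c["n"] for s in c["sum"]]

def precluster(points, threshold):
    """Step 1: a single pass over the cases. Each case joins the nearest
    precluster within `threshold` (an assumed rule), else starts a new one."""
    pre = []
    for p in points:
        best, best_d = None, threshold
        for c in pre:
            d = math.dist(p, centroid(c))
            if d <= best_d:
                best, best_d = c, d
        if best is None:
            pre.append({"n": 1, "sum": list(p)})
        else:
            best["n"] += 1
            best["sum"] = [s + x for s, x in zip(best["sum"], p)]
    return pre

def merge_preclusters(pre, k):
    """Step 2: agglomerative clustering on the preclusters only. The two
    closest centroids are merged repeatedly until k clusters remain."""
    clusters = [{"n": c["n"], "sum": list(c["sum"])} for c in pre]
    while len(clusters) > k:
        _, i, j = min(
            (math.dist(centroid(clusters[i]), centroid(clusters[j])), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        a, b = clusters[i], clusters[j]
        merged = {"n": a["n"] + b["n"],
                  "sum": [x + y for x, y in zip(a["sum"], b["sum"])]}
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters

# Hypothetical toy data: three well-separated groups of nine points each.
points = [(cx + dx, cy + dy)
          for cx, cy in [(0, 0), (10, 10), (20, 0)]
          for dx in (-0.5, 0.0, 0.5)
          for dy in (-0.5, 0.0, 0.5)]

pre = precluster(points, threshold=0.6)
final = merge_preclusters(pre, k=3)
for c in final:
    print(c["n"], centroid(c))
```

The point of the sketch is that the pairwise distance search in the second step runs over the preclusters rather than over the raw cases, which is what keeps the distance matrix small.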


The dataset I am going to use has information on 75 attributes for more than 70,000 customers. Product/service usage variables for all customers in the dataset are averages calculated over a period of four months.

In SPSS Clementine, the Data Audit available under the Output nodes palette gives the basic/descriptive statistics (mean, min, max...) and the quality (outliers, missing values...) of the variables.
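
Outside Clementine, the same kind of audit can be sketched in a few lines of Python. This is only an illustration: the function name `audit` is hypothetical, `None` stands in for a missing value, and flagging values more than three standard deviations from the mean as outliers is my assumption, not necessarily the rule the Data Audit node applies.

```python
import statistics

def audit(values, sd_limit=3.0):
    """Summarise one numeric column. None marks a missing value; a value is
    counted as an outlier if it lies more than `sd_limit` population standard
    deviations from the mean (an assumed rule)."""
    present = [v for v in values if v is not None]
    mean = statistics.mean(present)
    sd = statistics.pstdev(present)
    outliers = [v for v in present if sd > 0 and abs(v - mean) > sd_limit * sd]
    return {
        "n": len(values),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": mean,
        "outliers": len(outliers),
    }

# Hypothetical column: twenty ordinary values, one extreme value, one missing.
column = [10.0] * 10 + [12.0] * 10 + [500.0, None]
print(audit(column))
```

Running this per variable gives a quick basis for deciding which variables are clean enough to keep.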


Out of the 75 variables in the dataset, I used about 15 original variables and 3 new derived variables after considering their quality and business relevance. These selected variables were a combination of demographic, billing, and usage information.


The two-step cluster analysis produced 3 clusters. A very interesting difference was observed between Clusters 1 and 2.


Customers in Cluster 2 display the following characteristics:
- few of them are married
- few of them have children
- few of them have a credit card
- they own the most expensive mobile sets

- maximum # of incoming & outgoing calls
- maximum # of roaming calls
- maximum MOU (minutes of usage)
- maximum # of active subscriptions
- maximum recurring charge (that is, they subscribe to the most expensive calling plan)
- maximum revenue

- maximum # of calls to customer care
- the largest proportion of customers with a low credit rating


Customers in Cluster 1 display characteristics that are exactly the opposite in ALMOST all of the areas mentioned above. These customers are married with children, possess a credit card, own a cheap mobile set, subscribe to the least expensive calling plan, make the minimum # of calls (incoming, outgoing, roaming & customer care), and have the highest credit rating.

Customers in Cluster 3 follow the middle path (in almost all the attributes) and offered no interesting or meaningful insights.

So what can be the business application of this exercise?
To put it simply, cluster analysis has thrown up two very distinct groups of customers: highly profitable but high-risk customers in Cluster 2, and less profitable but low-risk customers in Cluster 1.


For the highly profitable but high-risk customers, one or more of the following actions can be implemented:
- Enhance credit risk monitoring
- Establish stringent usage thresholds
- Educate customers about alternative payment options, or make a credit card the mandatory payment method
- Migrate to pre-paid plans


For the less profitable, low-risk customers, usage stimulation campaigns can be attempted with or without further segmentation.

This is one of the most basic examples of customer segmentation. If we consider traffic analysis information by taking ratios of certain call/service usage parameters, we can identify customer groups who have increased or decreased their usage. If we consider customer tenure, we can have an understanding of customer loyalty. Accordingly, specific actions can be taken for these groups.
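
As a rough illustration of the ratio idea, the sketch below compares a customer's latest month of MOU with the average of the earlier months. The function name `usage_trend`, the monthly figures, and the 20% tolerance band are all assumptions made for the example. The dataset described above holds four-month averages, so month-level values like these would have to come from a more detailed extract.

```python
def usage_trend(monthly_mou, band=0.2):
    """Label a customer by the ratio of the latest month's usage to the
    average of the earlier months. `band` is an assumed tolerance: ratios
    within 1 +/- band count as stable."""
    *earlier, latest = monthly_mou
    baseline = sum(earlier) / len(earlier)
    if baseline == 0:
        # No earlier usage to compare against.
        return "increased" if latest > 0 else "stable"
    ratio = latest / baseline
    if ratio > 1 + band:
        return "increased"
    if ratio < 1 - band:
        return "decreased"
    return "stable"

# Hypothetical customers with four months of MOU each.
customers = {
    "A": [100, 110, 90, 150],
    "B": [100, 100, 100, 60],
    "C": [100, 100, 100, 105],
}
for name, mou in customers.items():
    print(name, usage_trend(mou))
```

The resulting labels can be used as a segmentation variable in their own right or as an input to a further cluster analysis.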

Tuesday, February 3, 2009

The Stakeholders

According to the Encarta dictionary, a stakeholder is a person or group with a direct interest, involvement, or investment in something.

The most important task faced by a Data Miner is to understand the client’s business background and arrive at the business and data mining objectives by asking the right questions of the right people. The right people here are the so-called stakeholders, and identifying them is half the job!

According to Dorian Pyle, these stakeholders can be divided into five groups:

1. Need Stakeholders – People who actually experience the business problem regularly in their work. In most situations, they have developed intuitive ideas about what is causing the problem, what the solution is, and how it should be applied. They often express their needs as an expected/desired solution, and not as a description of the problem.

2. Money Stakeholders – People who will commit the resources that allow the project to move forward. The business case document written to support modeling/the data mining project is mainly addressed to these people. It is usually not possible for this stakeholder to say “yes” to a project – that is the prerogative of the decision stakeholder - but they can easily say “no” if the numbers aren’t convincing.

3. Decision Stakeholders – People who make the decision of whether to execute the project. The decision stakeholder is very important but difficult to identify, because this person is not directly involved with the data miner and relies instead on input from people who have interacted with the data miner.

4. Beneficiary Stakeholders – People who will get the benefit of the results of the data mining project/model; people who will be directly affected. They usually have the ability to promote the success or bring about the failure of many data mining projects.

5. Kudos Stakeholders – People who have sold the project internally. Credit for the project’s success will accrue to them, as will the negative impact of a less than successful project. It is very important to understand from these people what determines success and how the project result will be evaluated.