Thursday, March 19, 2009

Software Dependence & Model Accuracy

I work a lot with the Data Mining/Analytics business development team at my current company. My primary role is to be there during client presentations/conferences and answer the client’s queries on modeling techniques and on the USP of our approach in terms of model performance and/or business benefits.

During one of these interactions, we found out that a particular client is using THREE Data Mining software packages. Not statistical packages or base versions, but complete, very expensive Data Mining suites: SAS EM, SPSS Clementine and KXEN.

I was like, “Wow!!! But do you really need 3 Data Mining packages???” Our initial questions and the client’s answers confirmed that inconsistent data formats were not the reason, as the client already has a BI/DW system. Their reason? Well, they are of the opinion that some algorithms/techniques in one DM package are much better and more accurate than the same algorithms/techniques in another.

I was, and still am, not convinced. Unless a particular DM package has a totally new and different algorithm, for which you obviously can’t make a comparison, I haven’t come across or heard of any stark differences in model performance or results for the same algorithm across the reputed DM packages. Besides, Data Mining solutions and the subsequent business benefits are not driven solely by model accuracy; a lot also depends on how you interpret and apply the model’s results.

What’s your opinion on this?


On a slightly different but related note, I learned of an interesting case from Rob Mattison’s webcast on Telco Churn Management, available on the SAS website. He mentioned an incident where a client’s existing churn model was showing an impressive “above 90%” accuracy. Sensing that something was amiss, he talked with the Marketing people and found out that they were sending the customers identified by the model as the most likely churners the same communication that is sent at the time of acquisition.

The result? The already dissatisfied customers who were thinking of switching received an inappropriate message/treatment, got further irritated and eventually left. In other words, every customer the model identified as a likely churner was encouraged to leave, which pushed the model’s measured accuracy up!!!
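To see how this feedback loop can inflate the numbers, here is a rough Python sketch. Everything in it is made up for illustration: the customer propensities, the noisy model score, the `irritation_boost` parameter and the `flagged_precision` function are my own assumptions, not figures or methods from the webcast. I am also reading "accuracy" loosely as the share of flagged customers who actually churn.

```python
import random


def flagged_precision(irritation_boost, n_customers=10_000, top_fraction=0.1, seed=42):
    """Share of model-flagged customers who actually churn.

    irritation_boost is a hypothetical amount added to the churn probability
    of every flagged customer, mimicking an inappropriate message.
    """
    rng = random.Random(seed)

    # Hypothetical true churn propensity per customer (most customers are low risk).
    propensity = [rng.random() ** 2 for _ in range(n_customers)]

    # Hypothetical model score: the true propensity plus some noise.
    scores = [p + rng.gauss(0, 0.2) for p in propensity]

    # Flag the top-scoring customers as the most likely churners.
    n_flagged = int(n_customers * top_fraction)
    flagged = sorted(range(n_customers), key=lambda i: scores[i], reverse=True)[:n_flagged]

    # Simulate who actually leaves; the treatment only affects flagged customers.
    churned = 0
    for i in flagged:
        p = min(1.0, propensity[i] + irritation_boost)
        if rng.random() < p:
            churned += 1
    return churned / n_flagged


no_contact = flagged_precision(irritation_boost=0.0)
bad_message = flagged_precision(irritation_boost=0.3)
print(f"Flagged customers who churn, no contact:  {no_contact:.1%}")
print(f"Flagged customers who churn, bad message: {bad_message:.1%}")
```

The model and its scores are identical in both runs; only the treatment of the flagged customers changes. The second figure comes out higher simply because the campaign itself pushes flagged customers out the door, not because the model got any better.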

If you have come across such cases, please share them with me in the comments :-)