### So you thought...?

Sometimes you get the feeling that everyone around you is either confused about, or simply unaware of, things that are basic and essential in Analytics. Below is a list of the most common terms that most people think they know but don't.

1. **Linear/Pearson Correlation**: The most misunderstood term, as far as I know. Before doing anything else, check whether the two variables share a linear relation. A correlation value without a linear pattern is meaningless. Also be aware that in many software packages (including MS Excel) the default is Pearson correlation, for which a linear relation between the two variables is a requirement.
2. **Significance Test**: Many, many people *into* Analytics (?) will never understand this, or will never try to. Just because you see two groups doesn't mean that you can run a significance test. Learn something (or everything) about sampling and study design before talking about significance tests.
3. **Lift and Cumulative Gains Charts**: They are different, period. Don't confuse one with the other.
   - Lift: Without a model, we get 30% of the responders by contacting 30% of the customers. Using a model, we get 60% of the responders from that same 30% of customers. The lift is 60/30 = 2 times.
   - Cumulative Gains: Using the model, if we contact 30% of the customers, we get 60% of all responders.
4. **Clustering/Segmentation and Profiling**: Let's make this simple.
   - Clustering/Segmenting will answer: Can my customer base be broken up into distinct groups based on certain attributes/characteristics? Customers within a group will be very similar to one another, while customers across groups will be different.
   - Profiling will answer: Who are my best customers? What do they purchase? How often? What is their ethnicity, their household size and income, etc.? In many cases, profiling follows clustering/segmentation: Who are the customers in Group 1?

Signing off with:

"There must be some kind of way out of here,"
Said the joker to the thief
"There's too much confusion,
I can get no relief"
-- All Along the Watchtower *by* Jimi Hendrix
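
To put numbers on points 1 and 3 above, here is a minimal Python sketch (standard library only), not a definitive implementation. The helper names `pearson`, `cumulative_gain` and `lift`, and the customer counts (1,000 customers, 100 responders), are assumptions made up for illustration; only the 30% and 60% figures come from the post. The first part shows how a perfect but non-linear relation can produce a Pearson correlation of zero, and the second keeps the cumulative gain (a share of all responders) separate from the lift (a ratio against the no-model baseline).

```python
import math


def pearson(xs, ys):
    """Pearson (linear) correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Point 1: y is fully determined by x, but the relation is quadratic, not linear.
xs = list(range(-5, 6))
r_quadratic = pearson(xs, [x ** 2 for x in xs])     # symmetric parabola
r_linear = pearson(xs, [2 * x + 1 for x in xs])     # straight line


def cumulative_gain(responders_reached, total_responders):
    """Share of ALL responders captured among the customers contacted."""
    return responders_reached / total_responders


def lift(gain, share_contacted):
    """How many times better the model does than contacting at random."""
    return gain / share_contacted


# Point 3, with hypothetical counts chosen to match the post's percentages.
total_customers = 1000     # assumption, for illustration only
total_responders = 100     # assumption, for illustration only
contacted = 300            # the top 30% of customers as ranked by the model
reached = 60               # responders found among those 300

share_contacted = contacted / total_customers
gain = cumulative_gain(reached, total_responders)
model_lift = lift(gain, share_contacted)

print(f"Pearson r, quadratic relation: {r_quadratic:.3f}")
print(f"Pearson r, linear relation:    {r_linear:.3f}")
print(f"Cumulative gain at {share_contacted:.0%} contacted: {gain:.0%}")
print(f"Lift at {share_contacted:.0%} contacted: {model_lift:.1f}")
```

The parabola gives a Pearson value of zero even though y is completely determined by x, which is why looking at the shape of the relation has to come before reading the number. In the second part, the gain and the lift are computed from the same counts but answer different questions: the gain says how much of the responder pool was reached, the lift says how that compares with random contact.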

## 6 comments:

"All Along the Watchtower" was written by Bob Dylan . . . If you want to beat people up for their mistakes regarding terms like "correlation," you need to get it ALL right!

thanks john, but i know that. i was listening to the hendrix version when i wrote this, i like it better and mentioned that.

and something that might interest you. Dylan said: "I liked Jimi Hendrix's record of this and ever since he died I've been doing it that way...Strange how when I sing it, I always feel it's a tribute to him in some kind of way."

was browsing your blog. you are into software/technology, what about analytics or data mining?

Saying only linear relationships matter, or only linear correlations matter, is just flat out wrong. Pearson correlation is linear, yes, but Spearman rank correlation is not, for example. A quadratic relationship between two variables is very useful, as is a cubic, an exponential, a logistic, etc.

I was talking about linear/Pearson correlation. Lots of people just use proc corr (default is Pearson) and/or proc reg in SAS, or the correlation function in MS Excel, and then infer anything they like from the correlation value. They rarely or never check the relation between the 2 variables before using these functions.

Thanks for pointing that out Brian. I have made a few changes to make it more clear.

Thanks for these very interesting points. It's always good to get such a reminder!

Correlation and dependence are any of a broad class of statistical relationships between two or more random variables or observed data values.

Pearson correlation
