Sometimes you get the feeling that everyone around you is confused about, or simply doesn't know, things that are basic and essential in Analytics. Below is a list of the most common terms that most people think they know but don't.
1. Linear/Pearson Correlation: The most misunderstood term, as far as I know. Before doing anything else, check whether the 2 variables share a linear relation. A Pearson correlation value without a linear pattern is meaningless. Also be aware that in many software packages (including MS Excel), the default is Pearson correlation, for which a linear relation between the two variables is a requirement.
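To make the point concrete, here is a minimal sketch in Python. The data is made up for illustration and `pearson_r` is a hand-written helper, not a function from any particular package. It computes Pearson correlation for a perfectly linear relation and for a perfectly quadratic one. The quadratic case has a strong, exact relationship, yet the coefficient says nothing about it.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations (the n's cancel in the ratio).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Illustrative data: x runs symmetrically around zero.
xs = list(range(-5, 6))
linear = [2 * x + 1 for x in xs]    # y depends linearly on x
quadratic = [x * x for x in xs]     # y depends on x, but not linearly

r_linear = pearson_r(xs, linear)
r_quadratic = pearson_r(xs, quadratic)

print("linear relation:   ", r_linear)
print("quadratic relation:", r_quadratic)
```

The first value comes out at 1 and the second at 0, even though y is completely determined by x in both cases. That is why a scatter plot should come before the correlation function.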
2. Significance Test: Many people in Analytics (?) will never understand this, or will never try to. Just because you see 2 groups doesn't mean that you can do a significance test; whether a test is valid depends on how the groups were sampled or assigned. Learn something, or everything, about sampling and designs before talking about significance tests.
3. Lift and Cumulative Gains Charts: They are different, period. Don't confuse one with the other.
Lift - Without a model, we get 30% of the responders by contacting 30% of the customers. Using a model to pick that same 30% of customers, we get 60% of the responders. The lift is 60/30 = 2 times.
Cumulative Gains - Using the model, if we contact 30% of the customers we get 60% of all responders.
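The two numbers above can be reproduced with a small sketch. The ten customers and their responses below are hypothetical, and the list is assumed to be already sorted by model score, best first.

```python
def cumulative_gain(responded, fraction):
    """Share of all responders found in the top `fraction` of customers.

    `responded` holds 1 (responded) or 0 (did not), ordered by model
    score from highest to lowest.
    """
    n_contacted = round(len(responded) * fraction)
    captured = sum(responded[:n_contacted])
    return captured / sum(responded)

def lift(responded, fraction):
    """Cumulative gain relative to random contact.

    Contacting a random `fraction` of customers is expected to capture
    the same `fraction` of responders, so that is the baseline.
    """
    return cumulative_gain(responded, fraction) / fraction

# Hypothetical campaign: 10 customers, 5 responders, sorted by score.
responded = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]

gain_30 = cumulative_gain(responded, 0.3)
lift_30 = lift(responded, 0.3)

print("cumulative gain at 30%:", gain_30)
print("lift at 30%:", lift_30)
```

Cumulative gain is a share of responders (here 60%), while lift is a ratio against the no-model baseline (here 2). Same data, two different charts.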
4. Clustering/Segmentation and Profiling: Let's make this simple. Clustering/Segmenting will answer - Can my customer base be broken up into distinct groups based on certain attributes/characteristics? Customers within a group will be very similar to one another while customers across groups will be different.
Profiling will answer - Who are my best customers? What do they purchase? How often? What is their ethnicity, their household size and income, etc.? In many cases, profiling follows clustering/segmentation: once the groups exist, who are the customers in Group 1?
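As a rough sketch of the profiling step, assume a clustering run has already put a segment label on every customer. The records, the field names (`segment`, `annual_spend`, `orders_per_year`) and the helper `profile_segments` are all hypothetical. Profiling then just summarizes each group.

```python
from collections import defaultdict

# Hypothetical customers; "segment" is assumed to come from an
# earlier clustering/segmentation step that is not shown here.
customers = [
    {"segment": 1, "annual_spend": 1200, "orders_per_year": 12},
    {"segment": 1, "annual_spend": 1000, "orders_per_year": 10},
    {"segment": 2, "annual_spend": 200, "orders_per_year": 2},
    {"segment": 2, "annual_spend": 300, "orders_per_year": 3},
    {"segment": 2, "annual_spend": 100, "orders_per_year": 1},
]

def profile_segments(rows, fields):
    """Return the size and the per-field average of each segment."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["segment"]].append(row)

    profile = {}
    for segment, members in grouped.items():
        summary = {"size": len(members)}
        for field in fields:
            summary[field] = sum(m[field] for m in members) / len(members)
        profile[segment] = summary
    return profile

profile = profile_segments(customers, ["annual_spend", "orders_per_year"])
for segment in sorted(profile):
    print("Group", segment, profile[segment])
```

Clustering decides which group a customer belongs to; profiling describes what each group looks like afterwards.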
Signing off with:
"There must be some kind of way out of here,"
Said the joker to the thief
"There's too much confusion,
I can get no relief"
-- All along the watchtower by Jimi Hendrix
6 comments:
"All Along the Watchtower" was written by Bob Dylan . . . If you want to beat people up for their mistakes regarding terms like "correlation," you need to get it ALL right!
Thanks John, but I know that. I was listening to the Hendrix version when I wrote this; I like it better, so I mentioned that.
And something that might interest you. Dylan said: "I liked Jimi Hendrix's record of this and ever since he died I've been doing it that way...Strange how when I sing it, I always feel it's a tribute to him in some kind of way."
I was browsing your blog. You are into software/technology; what about analytics or data mining?
Saying only linear relationships matter - or only linear correlations matter - is just flat out wrong. Pearson Correlation is linear, yes, but Spearman Rank is not, for example. A quadratic relationship between two variables is very useful - as is a cubic, an exponential, a logistic etc.
I was talking about linear/Pearson correlation. Lots of people just use proc corr (default is Pearson) and/or proc reg in SAS, or the correlation function in MS Excel, and then just infer anything from the correlation value. They rarely or never check the relation between the 2 variables before using these functions.
Thanks for pointing that out, Brian. I have made a few changes to make it clearer.
Thanks for these very interesting points. It's always good to get such a reminder!
Correlation and dependence are any of a broad class of statistical relationships between two or more random variables or observed data values.
Pearson correlation