Thursday, October 16, 2008

Market Basket Analysis

Market Basket Analysis (MBA) is the process of analyzing transactional level data to determine the likelihood that a set of items/products will be bought together.

Retailers use the results/observations from an MBA to understand the purchase behaviour of customers for cross-selling, store design, discount plans and promotions. MBA can, and should be done across different branches/stores as the customer demographics/profiles and their purchase behavior usually varies across regions.

The most common technique used in MBA is Association Rules. The three measures of Association Rules are - Support, Confidence, and Lift

A --> B = if a customer buys A, then B is also purchased
Condition --> Result
Antecedent --> Consequent

Support: Ratio of the # of transactions that includes both A & B to the total number of all transactions

Confidence: Ratio of the # of transactions with all items in the rule (A + B) to the # of transactions with items in the condition (A )

Lift: Indicates how much better the rule is at predicting the “result” or “consequent” as compared to having no rule at all, or how much better the rule does rather than just guessing

Lift = Confidence/P(result) = [P (A+B)/P(A)]/P(B)

If a customer buys milk, what is the likelihood of orange juice being purchased?

Milk --> Orange Juice
Customer Base: 1000
600 customers buy milk
400 customers buy orange juice
300 customers buy milk & orange juice

Support = P(milk & orange juice)/1000 = 300/1000 = 0.3

Confidence = P(milk & orange juice)/P(milk) = (300/1000)/(600/1000) = 0.5

Lift = Confidence/P(result) = 0.5/(400/1000) = 1.25

Interpretation: A customer who purchases milk is 1.25 times likely to purchase orange juice, than a randomly chosen customer.

Actionable: rules that can be justified and lead to actionable information
Trivial: rules that are obvious or already known (because of past/existing promotions, mandatory/required purchase of a stabilizer with an air conditioner…)
Inexplicable: rules that have no explanation and no course of action

Transactional data characterized by multiple rows per customer or order is the norm for MBA.

1. Choose the right set of items/level of detail – items, product category, brands…?
2. Generate rules - one-way rules (2 items, A-->B), 2-way rules (3 items, A & B --> C)…?
3. Limit the orders/items in the analysis by
- considering only orders having at least as many items as are in the rule
- requiring a minimum support for the rule
- removing the largest orders having multiple items/products

MBA doesn’t refer to a single technique but a set of business problems related to understanding of POS transaction data. The most popular of these techniques happens to be Association Rules.


Anonymous said...


Ricardo Guerrero González said...

Thanks for the info provided.
One question: Which software do you think is best to do MBA and which one as freeware?

Romakanta said...

I have used SPSS's Clementine for MBA, and it's awesome, especially the visualizations.

As far as the support, confidence, and lift calculations are concerned, you can do that using SQL.

Ricardo Guerrero González said...

Thank you for you response.
MBA requires to have a binary sets of fields (yes / no) were fields are each product and cases are the shopping transactions. My problem now is the following:
I have the cases as each transaction but the fields i have are categorical variables in an SPSS dataset were the categories are all the types of bought products. How ca I convert those fields to a binary?

Romakanta said...

Have u decided on the depth you want? Item level, brand level, category level?

I am not aware of a single node or function for doing what u require in Clementine. But u can do this by recoding.

If you take a single field that has, say, 5 product categories, you will need 5 columns/fields with 0/1 or Yes/No to indicate the presence/absence of that particular product category. So the code will be something like:

if old_field=product1, then new_field1=Yes
else new_field1=No

if old_field=product2, then new_field2=Yes
else new_field2=No'

..and so on.

Ricardo Guerrero González said...

Dear Romakata:
Thank you very much for your suggestion. You are right, I am doing a recode in SPSS and then i'll work on the Clementine.

First i'm am doing a category analysis(61 items), since at a product level there are around 11.000 items.

Helen F said...

Hey Guys,
have you considered using the 'Set to Flag' node in Clementine? It's a simpler way of achieving the recode you discussed above.
Helen F.