Mining market basket data to discover localized associations
The study of cross-category effects has recently been gaining attention in marketing research. Several statistical models have been proposed (Manchanda, Ansari et al. 1999; Russell and Petersen 2000) that attempt to measure these effects in market basket data. Despite their usefulness, the computational cost of these models limits their applicability to very large databases (VLDB). A very large database has not only a large number of data records but also a large number of variables. Exploratory data mining techniques such as association rule mining can be used to discover product/category associations in VLDB. In association mining, the association between two products, A and B, is assessed by three measures: support, confidence, and lift (Agrawal, Imielinski et al. 1993; Brin, Motwani et al. 1997). The support is the co-occurrence frequency of A and B, usually expressed as a proportion of all transactions, P(AB). The confidence is the conditional probability of B given A, P(B|A). The lift is P(AB)/[P(A)P(B)], which measures the strength of the association relative to what would be expected if A and B were purchased independently. An association rule is a statement such as "if a customer buys product A, he will also buy product B with probability c%." Such rules can be used to coordinate marketing activities across products/categories in the form of cross-selling recommendations. These recommendations are typically based on the overall data set, which is composed of transactions from different customer segments. Product/category associations, however, may differ across such attributes as customer demographics and psychographics. When the aggregate data set is divided into multiple subsets based on different customer segments, the support, confidence, and lift may vary significantly across these subsets. As such, the retailer may benefit from targeting cross-selling promotions between specific product pairs to specific customer segments. This paper proposes an approach to identify such paired product associations in customer segments.
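To make the three measures concrete, the following is a minimal Python sketch, not the method proposed in this paper. It computes support, confidence, and lift for one product pair on the aggregate data and then again within each customer segment. The function names, segment labels, product names, and transactions are all hypothetical and invented for illustration. Confidence is computed here as P(B|A) for the rule "A implies B", which is an assumption about the intended direction of the rule.

```python
from collections import defaultdict


def association_measures(baskets, a, b):
    """Return (support, confidence, lift) for the rule a -> b.

    baskets is a list of sets of product names (hypothetical format).
    """
    n = len(baskets)
    n_a = sum(1 for items in baskets if a in items)
    n_b = sum(1 for items in baskets if b in items)
    n_ab = sum(1 for items in baskets if a in items and b in items)
    if n == 0 or n_a == 0 or n_b == 0:
        # Measures are undefined here; returning zeros is a simplification.
        return 0.0, 0.0, 0.0
    support = n_ab / n                         # P(AB)
    confidence = n_ab / n_a                    # P(B|A), assumed direction
    lift = support / ((n_a / n) * (n_b / n))   # P(AB) / [P(A) P(B)]
    return support, confidence, lift


def measures_by_segment(transactions, a, b):
    """Compute the same measures separately for each customer segment.

    transactions is a list of (segment_label, set_of_products) pairs.
    """
    groups = defaultdict(list)
    for segment, items in transactions:
        groups[segment].append(items)
    return {seg: association_measures(bs, a, b) for seg, bs in groups.items()}


# Invented toy data: two segments with different purchasing patterns.
transactions = [
    ("young", {"chips", "salsa"}),
    ("young", {"chips", "salsa", "soda"}),
    ("young", {"chips"}),
    ("young", {"soda"}),
    ("senior", {"chips"}),
    ("senior", {"salsa"}),
    ("senior", {"chips", "tea"}),
    ("senior", {"tea"}),
]

overall = association_measures(
    [items for _, items in transactions], "chips", "salsa"
)
by_segment = measures_by_segment(transactions, "chips", "salsa")

print("overall:", overall)
for segment, measures in by_segment.items():
    print(segment, measures)
```

In this toy data the aggregate lift is only slightly above 1, while the pair is clearly associated in one segment and never co-occurs in the other, which is the kind of localized difference that aggregate rules can hide.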