Personalization strategies often build on a large set of customer-specific and contextual features to select the best among the available marketing actions. Contextual multi-armed bandit algorithms can help marketers adaptively select optimal personalized actions. However, conventional contextual bandit algorithms are not well suited to a large number of features, a common characteristic of many real-world problems. Exploration helps identify relevant features, yet with high-dimensional features, learning the impact of each feature can lead to over-exploration and thus inefficiency. Addressing this challenge requires an adaptive modeling approach that supports the exploration process and effectively resolves the uncertainty in feature importance. We propose a new approach that uses variable-selection techniques to learn both the optimal model specification and the action-selection strategy. We further enhance model interpretability via feature decomposition, effectively identifying both relevant and irrelevant features. Among relevant features, we distinguish between two types: common features, which have the same influence on consumer behavior for all actions and hence do not affect the personalized policy, and action-specific features, whose impact differs across the possible actions and hence do affect the policy. Our method allows firms to run cost-efficient and interpretable bandit algorithms with high-dimensional contextual data.
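The abstract does not spell out the estimation procedure, but the feature-decomposition idea can be sketched as follows: fit a sparse (Lasso) reward model separately for each action on logged exploration data, then label each feature as irrelevant (zero in every arm), common (nonzero and equal across arms), or action-specific (differing across arms). Everything below, including the ISTA solver, the synthetic data, and the thresholds, is an illustrative assumption rather than the authors' algorithm:

```python
import numpy as np

def lasso_ista(X, y, lam, n_iter=500):
    """Lasso via proximal gradient descent (ISTA); returns the coefficient vector."""
    n, d = X.shape
    beta = np.zeros(d)
    L = np.linalg.norm(X, 2) ** 2 / n  # Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        z = beta - grad / L
        beta = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return beta

rng = np.random.default_rng(0)
n, d, n_arms = 2000, 20, 2
X = rng.normal(size=(n, d))
# Ground truth: feature 0 is "common" (same effect under every action),
# feature 1 is "action-specific", features 2..d-1 are irrelevant.
beta_true = np.zeros((n_arms, d))
beta_true[:, 0] = 1.0                       # common effect
beta_true[0, 1], beta_true[1, 1] = 1.0, -1.0  # action-specific effect
arms = rng.integers(n_arms, size=n)         # e.g. uniform exploration logs
y = np.einsum('ij,ij->i', X, beta_true[arms]) + 0.1 * rng.normal(size=n)

# Fit one sparse reward model per action on that action's data
coefs = np.stack([lasso_ista(X[arms == a], y[arms == a], lam=0.05)
                  for a in range(n_arms)])

# Decompose features: irrelevant / common / action-specific
labels = []
for j in range(d):
    c = coefs[:, j]
    if np.all(np.abs(c) < 0.1):
        labels.append('irrelevant')         # no arm uses the feature
    elif np.ptp(c) < 0.1:
        labels.append('common')             # nearly identical across arms
    else:
        labels.append('action-specific')    # impact differs across arms
```

Only the action-specific features matter for the personalized policy; common features shift the predicted reward of every action equally, so dropping them from the action-selection step reduces exploration cost without changing which action wins.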