Personalization strategies often build on a large set of customer-specific and contextual features to select the best among the available marketing actions. Contextual multi-armed bandit algorithms can help marketers adaptively select optimal personalized actions. However, conventional contextual bandit algorithms are not well suited to a large number of features, a common characteristic of many real-world problems. Exploration helps identify relevant features, yet with high-dimensional features, learning the impact of each feature can lead to over-exploration and thus inefficiency. Addressing this challenge requires an adaptive modeling approach that supports the exploration process and effectively resolves the uncertainty in feature importance. We propose a new approach that uses variable-selection techniques to learn both the optimal model specification and the action-selection strategy. We further enhance model interpretability via feature decomposition, effectively identifying both relevant and irrelevant features. Among relevant features, we distinguish between two types: common features, which have the same influence on consumer behavior for all actions and hence do not affect the personalized policy, and action-specific features, whose impact differs across the possible actions and hence do affect the policy. Our method allows firms to run cost-efficient and interpretable bandit algorithms with high-dimensional contextual data.
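The abstract does not spell out the estimation procedure, but the feature-decomposition idea can be sketched as follows: fit a sparse (Lasso) reward model separately for each action on logged exploration data, then label each feature as irrelevant (zero in every arm), common (nonzero and equal across arms), or action-specific (differing across arms). Everything below, including the ISTA solver, the synthetic data, and the thresholds, is an illustrative assumption rather than the authors' algorithm:

```python
import numpy as np

def lasso_ista(X, y, lam, n_iter=500):
    """Lasso via proximal gradient descent (ISTA); returns the coefficient vector."""
    n, d = X.shape
    beta = np.zeros(d)
    L = np.linalg.norm(X, 2) ** 2 / n  # Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        z = beta - grad / L
        beta = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return beta

rng = np.random.default_rng(0)
n, d, n_arms = 2000, 20, 2
X = rng.normal(size=(n, d))
# Ground truth: feature 0 is "common" (same effect under every action),
# feature 1 is "action-specific", features 2..d-1 are irrelevant.
beta_true = np.zeros((n_arms, d))
beta_true[:, 0] = 1.0                       # common effect
beta_true[0, 1], beta_true[1, 1] = 1.0, -1.0  # action-specific effect
arms = rng.integers(n_arms, size=n)         # e.g. uniform exploration logs
y = np.einsum('ij,ij->i', X, beta_true[arms]) + 0.1 * rng.normal(size=n)

# Fit one sparse reward model per action on that action's data
coefs = np.stack([lasso_ista(X[arms == a], y[arms == a], lam=0.05)
                  for a in range(n_arms)])

# Decompose features: irrelevant / common / action-specific
labels = []
for j in range(d):
    c = coefs[:, j]
    if np.all(np.abs(c) < 0.1):
        labels.append('irrelevant')         # no arm uses the feature
    elif np.ptp(c) < 0.1:
        labels.append('common')             # nearly identical across arms
    else:
        labels.append('action-specific')    # impact differs across arms
```

Only the action-specific features matter for the personalized policy; common features shift the predicted reward of every action equally, so dropping them from the action-selection step reduces exploration cost without changing which action wins.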