Robust Estimation and Inference for Categorical Data
Speaker(s): Max Welz (University of Zurich, Switzerland)
Field: Econometrics, Data Science and Econometrics
Location: Erasmus University Rotterdam, Campus Woudestein, ET-14, Rotterdam
Date and time: February 04, 2025, 11:30 - 12:30
Abstract:
While there is a rich literature on robust methodologies for contamination in continuously distributed data, contamination in categorical data is largely overlooked. This gap in the statistics literature is unfortunate because many datasets contain categorical variables that can suffer from contamination just like continuous variables. Examples include inattentive responding and bot responses in questionnaires, data entry errors, or zero-inflated count data. We propose a novel class of contamination-robust estimators of models for categorical data, termed C-estimators ("C" for categorical). C-estimators generalize maximum likelihood estimation and are shown to be consistent, asymptotically Gaussian, and fully efficient in the absence of contamination, where the latter property contrasts with classic robustness theory for continuous data. In addition, we propose a general notion of outlyingness for categorical data and a measure thereof. We verify the attractive statistical properties of the proposed methodology in simulation studies. Furthermore, we demonstrate its practical usefulness in an empirical application on correlation estimation in questionnaire responses, where we find evidence for inattentive responding. Moreover, we provide a free open-source R package implementing C-estimators and offering rich methods for printing and plotting.