Seminar

Robust Estimation and Inference for Categorical Data


  • Location
    Erasmus University Rotterdam, Campus Woudestein, ET-14
    Rotterdam
  • Date and time

    February 04, 2025
    11:30 - 12:30

Abstract:

While there is a rich literature on robust methods for handling contamination in continuously distributed data, contamination in categorical data has been largely overlooked. This gap in the statistics literature is unfortunate, because many datasets contain categorical variables that can suffer from contamination just as continuous variables do. Examples include inattentive responding and bot responses in questionnaires, data entry errors, and zero-inflated count data. We propose a novel class of contamination-robust estimators of models for categorical data, termed C-estimators ("C" for categorical). C-estimators generalize maximum likelihood estimation and are shown to be consistent, asymptotically Gaussian, and fully efficient in the absence of contamination. This full efficiency contrasts with classic robustness theory for continuous data, where robustness typically comes at the cost of some efficiency. In addition, we propose a general notion of outlyingness for categorical data, together with a measure of it. Simulation studies confirm the attractive statistical properties of the proposed methodology. Furthermore, we demonstrate its practical usefulness in an empirical application on correlation estimation in questionnaire responses, where we find evidence of inattentive responding. Finally, we provide a free, open-source R package that implements C-estimators and offers rich methods for printing and plotting.