Behavioral Validity Checks for ML‑Based Coding
Social scientists increasingly use supervised machine learning (ML) models and large language models (LLMs) to scale content analysis, classifying texts for protest, policy issues, frames, sentiment, and more. The central question is construct validity: does the coder respond to the evidence that defines the concept while ignoring evidence that is irrelevant to it? Standard evaluations (held‑