Against Proxy Variables
Lacking direct measures of the theoretical variable of interest, some researchers rely on “proxy variables.” For instance, some have used years of education as a proxy for cognitive ability. However, proxy variables can be problematic for two reasons: (1) they may not track the theoretical variable of interest very well, and (2) they may also track confounding variables outside the theoretical variable of interest. With years of education as a proxy for cognitive ability, these concerns play out as follows:
- Cognitive ability both shapes and is shaped by which courses you take and which school you attend, in addition to, of course, how many years of education you complete. The General Social Survey (GSS), for instance, contains more granular measures of education, such as whether the respondent took a science course in college. Such variables almost always prove significant when predicting knowledge and similar outcomes, which suggests that years of education capture only part of the picture. This problem is somewhat surmountable because it can be treated as measurement error.
- More problematically, years of education may also pick up confounding variables: diligence, parental education, economic status, and so on. Education endows people with more than cognitive ability; it also fosters potential confounders such as civic engagement and knowledge.
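To make the two concerns concrete, here is a minimal simulation sketch. Everything in it is hypothetical: the variable names, the independent standard-normal traits, and the linear data-generating process are assumptions chosen for illustration, not estimates from the GSS or any other dataset. In the first scenario, years of education measure ability with noise. In the second, years of education mix ability with diligence, while knowledge depends on diligence alone.

```python
import random


def ols_slope(x, y):
    """Slope from a bivariate least-squares regression of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n=100_000, seed=0):
    rng = random.Random(seed)
    # Hypothetical latent traits: independent, standard normal.
    ability = [rng.gauss(0, 1) for _ in range(n)]
    diligence = [rng.gauss(0, 1) for _ in range(n)]

    # Concern (1): the proxy tracks ability only noisily.
    # Knowledge depends on ability with a true effect of 1.0.
    educ_noisy = [a + rng.gauss(0, 1) for a in ability]
    know_1 = [a + rng.gauss(0, 1) for a in ability]
    attenuated = ols_slope(educ_noisy, know_1)

    # Concern (2): the proxy also tracks a confounder.
    # Knowledge depends only on diligence; ability's true effect is 0.
    educ_mixed = [a + d for a, d in zip(ability, diligence)]
    know_2 = [d + rng.gauss(0, 1) for d in diligence]
    contaminated = ols_slope(educ_mixed, know_2)

    return attenuated, contaminated


attenuated, contaminated = simulate()
print(f"true effect 1.0, proxy estimate {attenuated:.2f}")
print(f"true effect 0.0, proxy estimate {contaminated:.2f}")
```

Under these assumptions, both slopes come out close to 0.5. The first understates a true effect of 1.0. This is the familiar attenuation caused by measurement error, which is why that problem is at least partly surmountable. The second credits "ability" with an effect that belongs entirely to diligence. Measuring the proxy more precisely cannot remove this bias; only measuring the confounders themselves can.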
Conservatively, then, we can attribute an estimated effect only to the measured variable itself, not to the theoretical construct it is meant to stand in for. That is, we only have the variables we enter. Anyone who does rely on proxy variables may want to address the two concerns above.