Getting a Measure of Measures of Selective Exposure
1. Ideal Estimand
For person $i$ with ideological location $\theta_i$, each consumed item $k$ has ideology score $x_{ik}$ on a common scale.
Ideal data: Time-stamped draws $\{x_{ik}\}_{k=1}^{n_i}$
Distance measure: $$d_{ik} = x_{ik} - \theta_i$$
This yields the full distribution $F_i(d)$ with moments:
- Mean: $\bar{d}_i = \frac{1}{n_i}\sum_k d_{ik}$
- Variance: $\sigma^2_{d_i} = \frac{1}{n_i}\sum_k (d_{ik} - \bar{d}_i)^2$
- Skewness: $\gamma_{d_i} = \frac{1}{n_i\sigma^3_{d_i}}\sum_k (d_{ik} - \bar{d}_i)^3$
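The three moments can be computed directly from item-level scores whenever those are available. The following is a minimal sketch, assuming hypothetical scores on a common scale; the function name `distance_moments` and the sample values are illustrative and not part of the original notes.

```python
def distance_moments(x, theta):
    """Return (mean, variance, skewness) of d_k = x_k - theta.

    Uses the 1/n (population) versions of the moments, matching the
    formulas in the list above.
    """
    d = [xk - theta for xk in x]
    n = len(d)
    mean = sum(d) / n
    var = sum((dk - mean) ** 2 for dk in d) / n
    if var == 0:
        # Skewness is undefined when every distance is identical.
        return mean, 0.0, float("nan")
    skew = sum((dk - mean) ** 3 for dk in d) / (n * var ** 1.5)
    return mean, var, skew


# Hypothetical item scores for one person located at theta = 0.5.
x = [0.2, 0.4, 0.6, 0.8]
mean, var, skew = distance_moments(x, theta=0.5)
```

In this symmetric example the distances are centered on the person's own location, so the mean and skewness are zero (up to floating-point error) and only the variance is informative.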
2. Feasible Estimand
Item-level ideology rarely observed → collapse to binary side match.
Binary indicator: $$g_{ik} = \mathbf{1}\{\text{sgn}(x_{ik} - \theta_c) = \text{sgn}(\theta_i - \theta_c)\}$$
where $\theta_c$ is the ideological center/cutpoint.
Congenial share: For political items set $K_i$: $$S_i^{\text{con}} = \frac{1}{|K_i|} \sum_{k \in K_i} g_{ik}$$
Uncongenial share: $$S_i^{\text{un}} = 1 - S_i^{\text{con}}$$
Normalized difference index: $$\Delta_i = S_i^{\text{con}} - S_i^{\text{un}} = 2S_i^{\text{con}} - 1 \in [-1, 1]$$
Properties:
- $\Delta_i = 1$: Perfect echo chamber
- $\Delta_i = 0$: Balanced exposure
- $\Delta_i = -1$: Complete cross-cutting
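The side-match indicator and the resulting index can be sketched as follows. This is an illustration under stated assumptions: the cutpoint defaults to 0, the item scores are hypothetical, and an item lying exactly on the cutpoint is treated as not congenial to a user who is off the cutpoint (the notes do not specify how ties are handled).

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def congenial_index(x, theta_i, theta_c=0.0):
    """Return (S_con, S_un, Delta) for one user's political items.

    x       : item ideology scores (hypothetical values)
    theta_i : the user's ideological location
    theta_c : the cutpoint separating the two sides (assumed 0 here)
    """
    g = [1 if sign(xk - theta_c) == sign(theta_i - theta_c) else 0 for xk in x]
    s_con = sum(g) / len(g)
    s_un = 1 - s_con
    delta = 2 * s_con - 1
    return s_con, s_un, delta


# A right-of-center user with three same-side items and one opposite-side item.
s_con, s_un, delta = congenial_index([0.5, 0.3, -0.2, 0.7], theta_i=0.4)
```

Here three of four items fall on the user's side, so $S_i^{\text{con}} = 0.75$ and $\Delta_i = 0.5$.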
3. Measurement Biases
A. Denominator Definition
Political items set $K_i$ requires explicit specification:
$$K_i = \{k : f(c_k) = \text{"political"}\}$$
where $f(\cdot)$ is the classification function and $c_k$ is item content.
Problem: $|K_i|$ is the denominator of $S_i^{\text{con}}$, so changing the classification rule $f$ → first-order effects on $S_i^{\text{con}}$
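To see how much the denominator can matter, the sketch below computes the congenial share for the same user under two classification rules. Everything here is hypothetical: the item records, the `topic` labels, and the two rules `narrow` and `broad` are invented for illustration.

```python
def congenial_share(items, is_political):
    """Share of congenial items among those the rule classifies as political."""
    political = [it for it in items if is_political(it)]
    if not political:
        return None  # share is undefined with an empty denominator
    return sum(it["congenial"] for it in political) / len(political)


# Hypothetical browsing record: 10 hard-news items (8 congenial)
# and 10 soft-news items (5 congenial).
items = (
    [{"topic": "hard_news", "congenial": 1}] * 8
    + [{"topic": "hard_news", "congenial": 0}] * 2
    + [{"topic": "soft_news", "congenial": 1}] * 5
    + [{"topic": "soft_news", "congenial": 0}] * 5
)

# Two candidate definitions of "political" (both are assumptions).
narrow = lambda it: it["topic"] == "hard_news"
broad = lambda it: it["topic"] in {"hard_news", "soft_news"}

share_narrow = congenial_share(items, narrow)  # 8 / 10
share_broad = congenial_share(items, broad)    # 13 / 20
```

With identical behavior, the narrow rule gives a congenial share of 0.80 and the broad rule gives 0.65, a gap driven entirely by the choice of $f$.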
B. Source-Level vs. Item-Level Coding
Source-level assumption: $$x_{ik} = \bar{x}_{\text{source}(k)} \quad \forall k \in \text{source}$$
Reality with within-source selection: $$x_{ik} \sim F_{\text{source}}(x | \theta_i)$$
where selection depends on user ideology.
Bias Formalization
Let $p_{is}(\theta_i)$ be the probability that user $i$ selects an item from source $s$ given their ideology.
Source-level estimate: $$\hat{S}_i^{\text{con,source}} = \sum_s \frac{n_{is}}{n_i} \cdot \mathbf{1}\{\bar{x}_s \text{ congenial to } \theta_i\}$$
Item-level truth: $$S_i^{\text{con,item}} = \sum_s \sum_{k \in s} \frac{1}{n_i} \cdot \mathbf{1}\{x_{ik} \text{ congenial to } \theta_i\}$$
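Bias: $$\text{Bias} = \mathbb{E}[\hat{S}_i^{\text{con,source}} - S_i^{\text{con,item}}] < 0$$
The difference equals (uncongenial items taken from congenial sources minus congenial items taken from uncongenial sources) divided by $n_i$, so the sign is negative whenever the second count exceeds the first.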
C. Toy Example
Setup:
- User views: 10 Fox, 10 CNN
- Fox items: 7R, 3D (user selects 7R, 3D)
- CNN items: 5R, 5D (user selects 5R, 5D)
Source-level coding (all Fox items coded right, all CNN items coded left): $$\hat{S}^{\text{right}} = \frac{10}{20} = 0.50$$
Item-level reality: $$S^{\text{right}} = \frac{7 + 5}{20} = 0.60$$
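Bias magnitude: $0.50 - 0.60 = -0.10$ (an understatement of about 17% relative to the item-level value, $0.10/0.60$, or 20% relative to the source-level estimate, $0.10/0.50$)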
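The toy example's arithmetic can be reproduced in a few lines. This sketch assumes that source-level coding labels every Fox item as right and every CNN item as left, which is what the $10/20$ figure implies; the variable names are illustrative.

```python
# Items the user actually viewed, by source and item-level slant.
views = {
    "Fox": {"R": 7, "D": 3},
    "CNN": {"R": 5, "D": 5},
}

# Assumption: source-level coding assigns one label to every item in a source.
source_label = {"Fox": "R", "CNN": "D"}

total = sum(sum(counts.values()) for counts in views.values())

# Source-level share of right-leaning exposure: all items from right-coded sources.
source_right = sum(
    sum(counts.values()) for s, counts in views.items() if source_label[s] == "R"
) / total

# Item-level share: count right-leaning items wherever they appear.
item_right = sum(counts["R"] for counts in views.values()) / total

bias = source_right - item_right
rel_to_item = bias / item_right      # relative to the item-level value
rel_to_source = bias / source_right  # relative to the source-level estimate
```

The two relative figures differ only in the base used: the same 0.10 gap is about 17% of the item-level share and 20% of the source-level share.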
4. Survey Measurement Issues
Restricted Choice Sets
Experiment offers set $\mathcal{C} = \{c_1, \dots, c_m\}$ where $m \ll$ universe of options.
Measured: $$\Pr(c_j | \mathcal{C}, \theta_i)$$
Needed: $$\Pr(c_j | \mathcal{U}, \theta_i)$$
where $\mathcal{U}$ is the full universe. Generally: $\Pr(c_j | \mathcal{C}) \neq \Pr(c_j | \mathcal{U})$
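One way to illustrate the gap between the two probabilities is a multinomial logit choice model. This is an assumption introduced purely for illustration, not a model the notes commit to; the utilities and option names below are hypothetical.

```python
import math


def choice_probs(utilities, choice_set):
    """Multinomial logit choice probabilities over the offered options only."""
    z = sum(math.exp(utilities[c]) for c in choice_set)
    return {c: math.exp(utilities[c]) / z for c in choice_set}


# Hypothetical utilities for one user; non-political options are equally
# attractive to the congenial political source.
utilities = {"fox": 1.0, "cnn": 0.0, "sports": 1.0, "entertainment": 1.0}

# Experiment: only the two political sources are offered.
restricted = choice_probs(utilities, ["fox", "cnn"])

# Full universe: every option is available.
universe = choice_probs(utilities, list(utilities))

p_fox_restricted = restricted["fox"]
p_fox_universe = universe["fox"]
```

Under these made-up utilities the congenial source is chosen far more often in the restricted menu than in the full universe, because the experiment removes the non-political alternatives that would otherwise absorb most choices.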
Expressive Responding
Self-reported consumption $\tilde{x}_i$ conflates: $$\tilde{x}_i = \alpha \cdot x_i^{\text{true}} + (1-\alpha) \cdot x_i^{\text{identity}}$$
where $\alpha \in [0,1]$ and $x_i^{\text{identity}}$ is the ideologically expressive response.
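The mixing equation can be made concrete with a short sketch. The numeric values are hypothetical, and treating $x$ as a share on $[0, 1]$ is an assumption made only for this example.

```python
def reported_consumption(x_true, x_identity, alpha):
    """Self-report as a weighted blend of true and identity-expressive values."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * x_true + (1 - alpha) * x_identity


# Hypothetical user: true congenial share 0.2, identity-expressive answer 0.8.
honest = reported_consumption(0.2, 0.8, alpha=1.0)      # report equals the truth
blended = reported_consumption(0.2, 0.8, alpha=0.5)     # halfway between the two
expressive = reported_consumption(0.2, 0.8, alpha=0.0)  # report equals identity
```

Because a single self-report $\tilde{x}_i$ is consistent with many $(\alpha, x_i^{\text{true}})$ pairs, the true value cannot be recovered from the report alone without outside information on $\alpha$ or $x_i^{\text{identity}}$.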