By Gaurav in Social Science — 24 Jul 2015

Getting a Measure of Measures of Selective Exposure

1. Ideal Estimand

For person $i$ with ideological location $\theta_i$, each consumed item $k$ has ideology score $x_{ik}$ on a common scale.

Ideal data: Time-stamped draws $\{x_{ik}\}_{k=1}^{n_i}$

Distance measure: $$d_{ik} = x_{ik} - \theta_i$$

This yields the full distribution $F_i(d)$ with moments:

Mean: $\bar{d}_i = \frac{1}{n_i}\sum_k d_{ik}$
Variance: $\sigma^2_{d_i} = \frac{1}{n_i}\sum_k (d_{ik} - \bar{d}_i)^2$
Skewness: $\gamma_{d_i} = \frac{1}{n_i\sigma^3_{d_i}}\sum_k (d_{ik} - \bar{d}_i)^3$
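
As a rough illustration of the three moments above, here is a minimal Python sketch. The function name `distance_moments` and the item scores are made up for the example; it uses the same $1/n_i$ (population) forms as the formulas and assumes NumPy is available.

```python
import numpy as np

def distance_moments(x, theta):
    """Mean, variance, and skewness of d_ik = x_ik - theta_i (1/n forms)."""
    d = np.asarray(x, dtype=float) - theta
    mean = d.mean()
    var = ((d - mean) ** 2).mean()
    # Skewness is undefined when every distance is identical.
    skew = ((d - mean) ** 3).mean() / var ** 1.5 if var > 0 else float("nan")
    return mean, var, skew

# Hypothetical item scores for one person located at theta = 0.5.
mean, var, skew = distance_moments([0.2, 0.4, 0.6, 0.8, 1.0], theta=0.5)
print(mean, var, skew)
```

With these symmetric, evenly spaced scores the mean distance is about 0.1, the variance about 0.08, and the skewness is zero up to floating-point error.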

2. Feasible Estimand

Item-level ideology is rarely observed, so in practice the measure collapses to a binary side match.

Binary indicator: $$g_{ik} = \mathbf{1}\{\text{sgn}(x_{ik} - \theta_c) = \text{sgn}(\theta_i - \theta_c)\}$$

where $\theta_c$ is the ideological center/cutpoint.

Congenial share: For political items set $K_i$: $$S_i^{\text{con}} = \frac{1}{|K_i|} \sum_{k \in K_i} g_{ik}$$

Uncongenial share: $$S_i^{\text{un}} = 1 - S_i^{\text{con}}$$

Normalized difference index: $$\Delta_i = S_i^{\text{con}} - S_i^{\text{un}} = 2S_i^{\text{con}} - 1 \in [-1, 1]$$

Properties:

$\Delta_i = 1$: Perfect echo chamber
$\Delta_i = 0$: Balanced exposure
$\Delta_i = -1$: Complete cross-cutting
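
A small sketch of the congenial share and the normalized difference index, under some assumptions that are mine rather than part of the definitions: the cutpoint defaults to 0, items sitting exactly on the cutpoint count as non-matching, and the scores are hypothetical.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def congenial_share(item_scores, theta_i, theta_c=0.0):
    """Share of items on the same side of theta_c as the person."""
    side = sign(theta_i - theta_c)
    # Assumption: an item exactly at the cutpoint has sign 0 and does not
    # match a person who sits strictly to one side.
    matches = [sign(x - theta_c) == side for x in item_scores]
    return sum(matches) / len(matches)

def normalized_difference(s_con):
    """Delta = S_con - S_un = 2 * S_con - 1."""
    return 2 * s_con - 1

scores = [0.8, 0.3, -0.2, 0.5, -0.6]           # hypothetical item scores
s_con = congenial_share(scores, theta_i=0.4)   # 3 of 5 items match
delta = normalized_difference(s_con)
print(s_con, delta)
```

Here three of five items fall on the person's side, so the congenial share is 0.6 and the index is roughly 0.2.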

3. Measurement Biases

A. Denominator Definition

Political items set $K_i$ requires explicit specification:

$$K_i = \{k : f(c_k) = \text{"political"}\}$$

where $f(\cdot)$ is the classification function and $c_k$ is item content.

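Problem: $S_i^{\text{con}}$ is sensitive to how $K_i$ is defined. Because $|K_i|$ is the denominator, changing the classification rule $f(\cdot)$ has first-order effects on $S_i^{\text{con}}$.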
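
To make the denominator problem concrete, here is a hedged sketch with invented data. The item list, the `topic` labels, and the two classifier rules are all hypothetical stand-ins for $f(\cdot)$; the only point is that the same browsing history gives different congenial shares under different definitions of "political".

```python
def congenial_share(items, is_political):
    """S_con computed over K = items that the classifier marks political."""
    K = [it for it in items if is_political(it)]
    return sum(it["congenial"] for it in K) / len(K)

# Hypothetical browsing history for one person.
items = [
    {"topic": "election", "congenial": True},
    {"topic": "election", "congenial": True},
    {"topic": "election", "congenial": False},
    {"topic": "economy", "congenial": False},
    {"topic": "economy", "congenial": False},
    {"topic": "sports", "congenial": False},
]

# Two candidate classification rules (both are assumptions for illustration).
narrow = congenial_share(items, lambda it: it["topic"] == "election")
broad = congenial_share(items, lambda it: it["topic"] in {"election", "economy"})
print(narrow, broad)
```

Under the narrow rule the share is 2/3; widening the rule to include economy stories drops it to 2/5, with no change in behavior.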

B. Source-Level vs. Item-Level Coding

Source-level assumption: $$x_{ik} = \bar{x}_{\text{source}(k)} \quad \forall k \in \text{source}$$

Reality with within-source selection: $$x_{ik} \sim F_{\text{source}}(x | \theta_i)$$

where selection depends on user ideology.

Bias Formalization

Let $p_{is}(\theta_i)$ be the probability that user $i$ selects an item from source $s$ given their ideology.

Source-level estimate: $$\hat{S}_i^{\text{con,source}} = \sum_s \frac{n_{is}}{n_i} \cdot \mathbf{1}\{\bar{x}_s \text{ congenial to } \theta_i\}$$

Item-level truth: $$S_i^{\text{con,item}} = \sum_s \sum_{k \in s} \frac{1}{n_i} \cdot \mathbf{1}\{x_{ik} \text{ congenial to } \theta_i\}$$

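Bias (when within-source selection favors congenial items, so that congenial picks from uncongenial sources outweigh uncongenial picks from congenial sources): $$\text{Bias} = \mathbb{E}[\hat{S}_i^{\text{con,source}} - S_i^{\text{con,item}}] < 0$$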

C. Toy Example

Setup:

User views: 10 Fox, 10 CNN
Fox items: 7R, 3D (user selects 7R, 3D)
CNN items: 5R, 5D (user selects 5R, 5D)

Source-level coding: $$\hat{S}^{\text{right}} = \frac{10}{20} = 0.50$$

Item-level reality: $$S^{\text{right}} = \frac{7 + 5}{20} = 0.60$$

Bias magnitude: $0.50 - 0.60 = -0.10$ (a 20% shortfall relative to the source-level estimate, or about 17% relative to the item-level value)
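
The toy example can be reproduced in a few lines. This is only a sketch: it assumes source-level coding labels every Fox item R and every CNN item D, which is how the 10/20 figure arises.

```python
# Items the user viewed, as (source, item_lean) pairs from the toy example.
views = (
    [("Fox", "R")] * 7 + [("Fox", "D")] * 3
    + [("CNN", "R")] * 5 + [("CNN", "D")] * 5
)

# Assumption: source-level coding assigns one lean to everything from a source.
source_lean = {"Fox": "R", "CNN": "D"}

n = len(views)
source_share_right = sum(source_lean[s] == "R" for s, _ in views) / n
item_share_right = sum(lean == "R" for _, lean in views) / n

bias = source_share_right - item_share_right
rel_to_item = bias / item_share_right      # relative to the item-level value
rel_to_source = bias / source_share_right  # relative to the source-level estimate
print(source_share_right, item_share_right, bias, rel_to_item, rel_to_source)
```

The relative size of the gap depends on the denominator: roughly 17% of the item-level share, or 20% of the source-level share.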

4. Survey Measurement Issues

Restricted Choice Sets

Experiment offers set $\mathcal{C} = \{c_1, \ldots, c_m\}$ where $m \ll$ universe of options.

Measured: $$\Pr(c_j | \mathcal{C}, \theta_i)$$

Needed: $$\Pr(c_j | \mathcal{U}, \theta_i)$$

where $\mathcal{U}$ is the full universe. Generally: $\Pr(c_j | \mathcal{C}) \neq \Pr(c_j | \mathcal{U})$
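
One way to see the gap is a sketch that assumes choice probabilities are proportional to fixed preference weights within whatever set is offered (a Luce-style rule, which is my simplifying assumption, not something the setup above requires). The option names and weights are invented.

```python
def choice_probs(weights, offered):
    """Pr(c | offered set) under the assumed proportional-weights rule."""
    total = sum(weights[c] for c in offered)
    return {c: weights[c] / total for c in offered}

# Hypothetical preference weights for one person over the full universe U.
weights = {
    "congenial_news": 2.0,
    "uncongenial_news": 1.0,
    "sports": 4.0,
    "entertainment": 3.0,
}

# The experiment offers only the two news options (the restricted set C).
restricted = choice_probs(weights, ["congenial_news", "uncongenial_news"])
universe = choice_probs(weights, list(weights))
print(restricted["congenial_news"], universe["congenial_news"])
```

Under these made-up weights, congenial news is chosen two thirds of the time in the restricted set but only one fifth of the time when sports and entertainment are also on offer.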

Expressive Responding

Self-reported consumption $\tilde{x}_i$ conflates: $$\tilde{x}_i = \alpha \cdot x_i^{\text{true}} + (1-\alpha) \cdot x_i^{\text{identity}}$$

where $\alpha \in [0,1]$ and $x_i^{\text{identity}}$ is the ideologically expressive response.
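
A short sketch of the mixture, with hypothetical numbers. The inversion in `implied_true` is just algebra on the equation above and only helps if $\alpha$ and $x_i^{\text{identity}}$ were somehow known, which in practice they generally are not.

```python
def reported(x_true, x_identity, alpha):
    """Self-report as a mix of true consumption and the expressive response."""
    return alpha * x_true + (1 - alpha) * x_identity

def implied_true(x_reported, x_identity, alpha):
    """Back out true consumption; requires alpha > 0 (an assumption)."""
    if alpha <= 0:
        raise ValueError("alpha must be positive to recover x_true")
    return (x_reported - (1 - alpha) * x_identity) / alpha

# Hypothetical values: true congenial share 0.6, expressive answer 0.9.
x_rep = reported(x_true=0.6, x_identity=0.9, alpha=0.7)
x_back = implied_true(x_rep, x_identity=0.9, alpha=0.7)
print(x_rep, x_back)
```

With $\alpha = 0.7$ the self-report comes out near 0.69, overstating the true 0.6.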
