3. Methods

library(rankingQ)
data(identity)

library(dplyr)
library(ggplot2)
library(tidyr)

Estimating the Proportion of Random Responses

In the identity dataset, given anc_correct_identity, the raw proportion of those who have correctly or incorrectly answered the anchor question is as follows:

round(prop.table(table(identity$anc_correct_identity)) * 100, digits = 1)

## 
##    0    1 
## 30.3 69.7

The 69.7% seen here, however, is likely an upwardly biased estimate of the percentage of non-random responses, because we must account for respondents accidentally answering the question correctly. For an unbiased estimate of random responses, we use unbiased_correct_prop.

identity$random_identity <- case_when(
  identity$anc_correct_identity == 1 ~ 0,
  TRUE ~ 1
)

unbiased_correct_prop(
  sum(identity$random_identity == 0) / sum(!is.na(identity$random_identity)),
  J = 4
)

## [1] 0.6836776

The revised estimate of non-random responses is 68.4%. That is to say, roughly 31.6% of the respondents are randomly responding.

Direct Bias Correction via `imprr_direct`

rankingQ has two primary functions to perform bias correction. First, imprr_direct improves ranking data by applying direct bias correction to several classes of quantities of interest.

To apply the bias correction, we specify our dataset (data), the number of items (J), the columns holding the marginal ranks for the target ranking question (main_q), and the indicator for answering the anchor ranking question correctly (anc_correct). When survey weights are available, they can be included by specifying weight in the function.

## party, religion, gender, and race hold the marginal rank of each item

# Perform bias correction
out_direct <- imprr_direct(
  data = identity,
  ## Not strictly necessary: when `main_q` lists the ranking columns,
  ## J is inferred from their number
  J = 4,
  main_q = c("party", "religion", "gender", "race"),
  anc_correct = "anc_correct_identity",
  # setting to 10 only for our vignette
  n_bootstrap = 10
)

## No weight column supplied; using equal weights for all observations.

By default, imprr_direct assumes that the target population is a set of non-random respondents. When researchers wish to study the entire population as a target group, additional arguments must be specified, including population and assumption. For example, the uniform preference assumption can be specified as follows:

# Bias correction for the entire population with the uniform assumption

out_direct_uniform <- imprr_direct(
  data = identity,
  J = 4,
  main_q = c("party", "religion", "gender", "race"),
  anc_correct = "anc_correct_identity",
  population = "all",
  assumption = "uniform",
  n_bootstrap = 10
)

## population = 'all' with assumption = 'uniform' implies no correction; ignoring anc_correct and p_random.

## No weight column supplied; using equal weights for all observations.

Similarly, the contaminated sampling assumption can be specified as follows:

# Bias correction for the entire population with contaminated sampling

out_direct_contaminated <- imprr_direct(
  data = identity,
  J = 4,
  main_q = c("party", "religion", "gender", "race"),
  anc_correct = "anc_correct_identity",
  population = "all",
  assumption = "contaminated",
  n_bootstrap = 10
)

## No weight column supplied; using equal weights for all observations.

Results: Estimated Proportion of Random Responses

The first output of imprr_direct is the estimated proportion of random responses. The vector est_p_random returns the estimated proportion along with the lower and upper ends of its corresponding 95% confidence interval.

# Estimated proportion of random responses with a 95% CI
out_direct$est_p_random

##        mean     lower     upper
## 1 0.3158402 0.2875352 0.3451338

Results: Estimated Quantities of Interest

The other output is the bias-corrected estimates of four classes of ranking-based quantities, including

average ranks
pairwise ranking probabilities
top-k ranking probabilities
marginal ranking probabilities

The output tibble qoi stores the estimated quantities and their corresponding 95% CIs.

# View the results based on the quantity of interest
out_direct$results %>%
  filter(qoi == "average rank")

## # A tibble: 4 × 6
##   item     qoi          outcome        mean lower upper
##   <chr>    <chr>        <chr>         <dbl> <dbl> <dbl>
## 1 gender   average rank Avg: gender    1.66  1.61  1.74
## 2 party    average rank Avg: party     3.27  3.21  3.30
## 3 race     average rank Avg: race      2.48  2.42  2.53
## 4 religion average rank Avg: religion  2.60  2.49  2.66

# View the results based on the item
out_direct$results %>%
  filter(item == "party")

## # A tibble: 11 × 6
##    item  qoi              outcome       mean  lower  upper
##    <chr> <chr>            <chr>        <dbl>  <dbl>  <dbl>
##  1 party average rank     Avg: party  3.27   3.21   3.30  
##  2 party marginal ranking Ranked 1    0.0427 0.0309 0.0638
##  3 party marginal ranking Ranked 2    0.151  0.133  0.171 
##  4 party marginal ranking Ranked 3    0.304  0.289  0.336 
##  5 party marginal ranking Ranked 4    0.503  0.488  0.520 
##  6 party pairwise ranking v. gender   0.109  0.0807 0.143 
##  7 party pairwise ranking v. race     0.266  0.257  0.287 
##  8 party pairwise ranking v. religion 0.359  0.335  0.382 
##  9 party top-k ranking    Top-1       0.0427 0.0309 0.0638
## 10 party top-k ranking    Top-2       0.194  0.172  0.213 
## 11 party top-k ranking    Top-3       0.497  0.480  0.512

For example, one can visualize the result for average ranks as follows:

# Plot the result
out_direct$results %>%
  mutate(
    item = factor(
      item,
      levels = c("party", "religion", "gender", "race")
    )
  ) %>%
  plot_avg_ranking()

Weighting-Based Bias Correction via `imprr_weights`

The alternative method for bias correction is based on the idea of inverse-probability weighting (IPW). imprr_weights improves ranking data by computing bias correction weights, which can be used to correct for the bias in the IPW framework. The same arguments previously used can be used as follows:

Because imprr_weights enumerates the full permutation space, its computational cost grows quickly with J!. In practice, this method is best suited to small or moderate ranking questions; for larger J, imprr_direct or imprr_direct_rcpp will usually be much more practical.

# Perform bias correction
out_weights <- imprr_weights(
  data = identity,
  J = 4,
  main_q = c("party", "religion", "gender", "race"),
  anc_correct = "anc_correct_identity"
)

## No weight column supplied; using equal weights for all observations.

By default, imprr_weights assumes that the target population is a set of non-random respondents. When researchers wish to study the entire population as a target group, additional arguments must be specified, including population and assumption. For example, the uniform preference assumption can be specified as follows:

# Perform bias correction with the uniform preference assumption
out_weights_uniform <- imprr_weights(
  data = identity,
  J = 4,
  main_q = c("party", "religion", "gender", "race"),
  anc_correct = "anc_correct_identity",
  population = "all",
  assumption = "uniform"
)

## population = 'all' with assumption = 'uniform' implies no correction; ignoring anc_correct and p_random.

## No weight column supplied; using equal weights for all observations.

Similarly, the contaminated sampling assumption can be specified as follows:

# Perform bias correction with the uniform preference assumption
out_weights_contaminated <- imprr_weights(
  data = identity,
  J = 4,
  main_q = c("party", "religion", "gender", "race"),
  anc_correct = "anc_correct_identity",
  population = "all",
  assumption = "contaminated"
)

## No weight column supplied; using equal weights for all observations.

Results: Estimated Weights

The output of imprr_weights contains the set of weights for all possible ranking profiles with J items. For example, when J = 4, the set has {1234, 1243, ..., 4321} and each profile now has an estimated weight.

# View the estimated weights
out_weights$rankings %>%
  select(ranking, weights)

##    ranking   weights
## 1     1234 0.0000000
## 2     1243 0.0000000
## 3     1324 0.0000000
## 4     1342 0.0000000
## 5     1423 1.0158812
## 6     1432 0.4078355
## 7     2134 0.8582397
## 8     2143 0.8070574
## 9     2314 0.7456387
## 10    2341 0.0000000
## 11    2413 1.1316994
## 12    2431 0.5767371
## 13    3124 1.0238295
## 14    3142 0.5400194
## 15    3214 0.8251218
## 16    3241 0.0000000
## 17    3412 1.2733020
## 18    3421 1.0314721
## 19    4123 1.2628998
## 20    4132 1.1045545
## 21    4213 1.0388263
## 22    4231 0.4999637
## 23    4312 1.2711103
## 24    4321 1.0593130

Results: Estimated PMF with Bias Corrected Data

imprr_weights also returns the estimated probability mass function of all ranking profiles before and after bias correction.

# View the estimated corrected PMF
out_weights$rankings %>%
  select(ranking, prop_obs, prop_bc)

##    ranking    prop_obs     prop_bc
## 1     1234 0.012939002 0.000000000
## 2     1243 0.010166359 0.000000000
## 3     1324 0.012939002 0.000000000
## 4     1342 0.006469501 0.000000000
## 5     1423 0.046210721 0.046944603
## 6     1432 0.018484288 0.007538549
## 7     2134 0.033271719 0.028555111
## 8     2143 0.030499076 0.024614506
## 9     2314 0.027726433 0.020673901
## 10    2341 0.005545287 0.000000000
## 11    2413 0.064695009 0.073215306
## 12    2431 0.022181146 0.012792690
## 13    3124 0.047134935 0.048258138
## 14    3142 0.021256932 0.011479155
## 15    3214 0.031423290 0.025928041
## 16    3241 0.011090573 0.000000000
## 17    3412 0.126617375 0.161222159
## 18    3421 0.048059150 0.049571673
## 19    4123 0.118299445 0.149400343
## 20    4132 0.059149723 0.065334095
## 21    4213 0.048983364 0.050885208
## 22    4231 0.020332717 0.010165620
## 23    4312 0.124768946 0.158595089
## 24    4321 0.051756007 0.054825814

Estimated Weights with Original Data

identity_w <- out_weights$results
head(identity_w)

## # A tibble: 6 × 19
##   weights s_weight app_identity party religion gender  race anc_identity
##     <dbl>    <dbl> <chr>        <dbl>    <dbl>  <dbl> <dbl> <chr>       
## 1    1.02    0.844 1423             1        4      2     3 1234        
## 2    1.02    0.886 1423             1        4      2     3 1234        
## 3    1.27    2.96  3412             3        4      1     2 1234        
## 4    1.02    0.987 1423             1        4      2     3 1234        
## 5    1.10    1.76  4132             4        1      3     2 1324        
## 6    1.02    0.469 3124             3        1      2     4 1234        
## # ℹ 11 more variables: household <dbl>, neighborhood <dbl>, city <dbl>,
## #   state <dbl>, anc_correct_identity <dbl>, app_identity_recorded <chr>,
## #   anc_identity_recorded <chr>, app_identity_row_rnd <chr>,
## #   anc_identity_row_rnd <chr>, random_identity <dbl>, ranking <chr>

# save(identity_w, file = "data/identity_w.rda")

Yuki Atsusaka and Seo-young Silvia Kim