
Polish Household Budget Data with Simulated Nonignorable Nonresponse
Source: `R/data_docs_polish_households.R`, `polish_households.Rd`

This dataset is derived from the `h05` dataset (Polish household budgets for 2005) found in the `RClas` package. The original data was cleaned to remove all rows with missing values.
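The cleaning step amounts to complete-case filtering. Below is a minimal sketch on a small made-up data frame. The values are hypothetical (the real input is `h05` from `RClas`), and it assumes that base R's `complete.cases()` matches the filtering the authors actually applied:

```r
# Hypothetical toy input standing in for h05; all values are made up
raw <- data.frame(
  income      = c(2100, NA, 1850, 3200),
  expenditure = c(1900, 1500, NA, 2800),
  noper       = c(3L, 2L, 4L, 1L)
)

# Keep only rows with no missing value in any column (complete cases)
clean <- raw[stats::complete.cases(raw), , drop = FALSE]
nrow(clean)  # rows 1 and 4 survive
```

`na.omit(raw)` keeps the same rows and additionally records the dropped row indices in an `na.action` attribute.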
Format
A data frame with 19,330 rows and 17 columns:
- class
TODO
- voi
TODO
- bio
TODO
- type
TODO
- d345
TODO
- d347
TODO
- d348
TODO
- d36
TODO
- d38
TODO
- d61
TODO
- noper
TODO
- income
TODO
- expenditure
Total expenditure before scaling. `y_exp` is this column divided by its mean.
- y_exp
Numeric. The **true** scaled expenditure (`expenditure / mean(expenditure)`). This is the complete study variable without missingness.
- resp
Numeric. The true (simulated) response probability, `plogis(1 - 0.6 * y_exp)`; see Details.
- R
Integer. The simulated response indicator (`1` = responded, `0` = did not respond).
- y_exp_miss
Numeric. The **observed** scaled expenditure: equal to `y_exp` where `R = 1` and `NA` where `R = 0` (7,778 `NA` values, about 40% of the rows). Use this variable as the NMAR-affected outcome.
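The documented relationships between the derived columns can be checked mechanically. Below is a minimal sketch, assuming the column names listed above and assuming `mean(expenditure)` is taken over the cleaned rows. `check_nmar_columns()` is a hypothetical helper, not part of the package, and it runs here on a three-row toy frame rather than the real data:

```r
# Hypothetical helper (not part of the package): checks the documented
# relationships between the derived columns of a polish_households-like frame.
check_nmar_columns <- function(df, tol = 1e-8) {
  # y_exp should be expenditure scaled by its mean
  y_ok <- isTRUE(all.equal(df$y_exp, df$expenditure / mean(df$expenditure),
                           tolerance = tol))
  # resp should follow the logistic response model
  resp_ok <- isTRUE(all.equal(df$resp, plogis(1 - 0.6 * df$y_exp),
                              tolerance = tol))
  # y_exp_miss must be NA exactly where R == 0 ...
  na_ok <- identical(is.na(df$y_exp_miss), df$R == 0L)
  # ... and equal to y_exp wherever the unit responded
  obs <- df$R == 1L
  obs_ok <- isTRUE(all.equal(df$y_exp_miss[obs], df$y_exp[obs]))
  c(y_exp = y_ok, resp = resp_ok, missingness = na_ok, observed = obs_ok)
}

# Toy frame built to satisfy the documented rules (values are made up)
toy <- data.frame(expenditure = c(1200, 2400, 3600))
toy$y_exp      <- toy$expenditure / mean(toy$expenditure)  # 0.5, 1.0, 1.5
toy$resp       <- plogis(1 - 0.6 * toy$y_exp)
toy$R          <- c(1L, 0L, 1L)
toy$y_exp_miss <- ifelse(toy$R == 1L, toy$y_exp, NA_real_)

check_nmar_columns(toy)  # all four checks TRUE
```

Returning a named logical vector rather than stopping on the first failure makes it easy to see which of the documented rules a modified copy of the data breaks.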
Details
To create a realistic test case for nonignorable nonresponse (not missing at random, NMAR), a nonresponse mechanism was simulated and applied to the scaled expenditure variable (`y_exp`).
The key simulation steps were:

1. `y_exp` (the true study variable) was created by scaling total expenditure.
2. A true response probability (`resp`) was created using the logistic model `plogis(1 - 0.6 * y_exp)`.
3. A response indicator (`R`) was simulated based on this probability.
4. The final variable `y_exp_miss` was generated by setting `y_exp` to `NA` wherever `R` was 0.
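The four steps can be sketched in base R as follows. This is not the package's own script. Only the steps and the logistic model come from the text above. The lognormal stand-in for expenditure, the seed, and the use of `rbinom()` as the Bernoulli draw in step 3 are all assumptions:

```r
set.seed(2005)  # hypothetical seed; the seed used for the packaged data is not documented

# Hypothetical stand-in for total expenditure (the real values come from h05)
n <- 19330
expenditure <- rlnorm(n, meanlog = 7.5, sdlog = 0.5)

# 1. Scale expenditure so that its mean is 1
y_exp <- expenditure / mean(expenditure)

# 2. True response probability from the logistic model
resp <- plogis(1 - 0.6 * y_exp)

# 3. Response indicator: one Bernoulli draw per household (assumed mechanism)
R <- rbinom(n, size = 1, prob = resp)

# 4. Observed outcome: y_exp masked wherever the household did not respond
y_exp_miss <- ifelse(R == 1L, y_exp, NA_real_)

sim <- data.frame(expenditure, y_exp, resp, R, y_exp_miss)
mean(sim$R == 0L)  # simulated nonresponse rate
```

Because the coefficient on `y_exp` is negative, households with higher expenditure get a lower response probability.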
The response is **nonignorable** because the probability of missingness depends directly on the value of the expenditure variable itself.
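One practical consequence is that respondents tend to have lower expenditure, so a naive respondents-only mean of `y_exp_miss` understates the true mean. Below is a minimal sketch on synthetic data. The lognormal distribution, sample size, and seed are assumptions; only the logistic response model comes from the text above:

```r
set.seed(1)  # hypothetical seed, for reproducibility only
n <- 1e5

# Hypothetical right-skewed expenditure, scaled to mean 1 as in the dataset
y_exp <- rlnorm(n, meanlog = 0, sdlog = 0.5)
y_exp <- y_exp / mean(y_exp)

# Response probability falls as expenditure rises (same logistic model)
R <- rbinom(n, size = 1, prob = plogis(1 - 0.6 * y_exp))
y_exp_miss <- ifelse(R == 1L, y_exp, NA_real_)

true_mean <- mean(y_exp)                     # 1 (up to rounding) by construction
cc_mean   <- mean(y_exp_miss, na.rm = TRUE)  # respondents-only (complete-case) mean
c(true = true_mean, complete_case = cc_mean)
```

The complete-case mean comes out below 1 because the units most likely to be missing are those with the largest values of the study variable.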