Polish Household Budget Data with Simulated Nonignorable Nonresponse

This dataset is derived from the `h05` dataset (Polish household budgets for 2005) found in the `RClas` package. The original data was cleaned to remove all rows with missing values.

Usage

polish_households

Format

A data frame with 19,330 rows and 17 columns. The variables are:

class: TODO
voi: TODO
bio: TODO
type: TODO
d345: TODO
d347: TODO
d348: TODO
d36: TODO
d38: TODO
d61: TODO
noper: TODO
income: TODO
expenditure: TODO
y_exp: Numeric. The **true** scaled expenditure (`expenditure / mean(expenditure)`). This is the complete study variable without missingness.
resp: TODO
R: Integer. The simulated response indicator (1=responded, 0=nonresponse).
y_exp_miss: Numeric. The **observed** scaled expenditure, containing 7,778 `NA` values where `R = 0`. This is the variable to be used as the NMAR-affected outcome.

Source

TODO

Details

To create a realistic test case for nonignorable nonresponse (NMAR), a nonresponse mechanism was simulated and applied to the scaled expenditure variable (`y_exp`).

The key simulation steps were:

1. `y_exp` (the true study variable) was created by scaling total expenditure by its mean.
2. A true response probability (`resp`) was computed from the logistic model `plogis(1 - 0.6 * y_exp)`.
3. A response indicator (`R`) was simulated from this probability.
4. The final variable `y_exp_miss` was generated by setting `y_exp` to `NA` wherever `R` was 0.
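The simulation steps can be sketched in R as follows. This is a minimal illustration, not the script that produced the shipped data: the `expenditure` vector is a hypothetical log-normal stand-in for the real `h05` values, the seed is arbitrary, and drawing `R` as an independent Bernoulli variable via `rbinom()` is an assumption about how "simulated based on this probability" was carried out.

```r
set.seed(123)  # arbitrary seed; the seed behind the shipped data is not documented

# Hypothetical stand-in for total household expenditure (NOT the real h05 values)
expenditure <- rlnorm(1000, meanlog = 7.5, sdlog = 0.6)

# Step 1: scale expenditure by its mean to get the true study variable
y_exp <- expenditure / mean(expenditure)

# Step 2: true response probability from the logistic model in the text
resp <- plogis(1 - 0.6 * y_exp)

# Step 3: simulate the response indicator (assumed: independent Bernoulli draws)
R <- rbinom(length(y_exp), size = 1, prob = resp)

# Step 4: blank out the study variable for nonrespondents
y_exp_miss <- ifelse(R == 1, y_exp, NA_real_)

sim <- data.frame(expenditure, y_exp, resp, R, y_exp_miss)
head(sim)
```

Because the coefficient on `y_exp` is negative, households with higher expenditure are less likely to respond in this sketch.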

The nonresponse is **nonignorable** because the probability of responding depends directly on `y_exp`, the very variable that goes missing.
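A practical consequence is that a respondent-only summary of the outcome is biased. The sketch below illustrates this on hypothetical simulated values rather than the packaged data; the log-normal outcome `y`, the sample size, and the seed are assumptions chosen only for illustration, while the response model is the one stated above.

```r
set.seed(2024)  # arbitrary seed for reproducibility of this illustration

# Hypothetical scaled outcome with mean 1 (stand-in for y_exp)
y <- rlnorm(10000, meanlog = 0, sdlog = 0.6)
y <- y / mean(y)

# Same response model as in the text: higher y means lower response probability
p <- plogis(1 - 0.6 * y)
r <- rbinom(length(y), size = 1, prob = p)

# Observed outcome with NA for nonrespondents
y_obs <- ifelse(r == 1, y, NA_real_)

true_mean <- mean(y)                      # full-sample mean (equals 1 by construction)
naive_mean <- mean(y_obs, na.rm = TRUE)   # respondent-only (complete-case) mean

c(true = true_mean, naive = naive_mean)
```

The respondent-only mean falls below the full-sample mean because high-expenditure units are under-represented among respondents. With the packaged data, the analogous comparison would be `mean(polish_households$y_exp)` against `mean(polish_households$y_exp_miss, na.rm = TRUE)`.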
