This dataset is derived from the `h05` dataset (Polish household budgets for 2005) found in the `RClas` package. The original data was cleaned to remove all rows with missing values.

Usage

polish_households

Format

A data frame with 19,330 rows and 17 columns. The key variables are:

class

TODO

voi

TODO

bio

TODO

type

TODO

d345

TODO

d347

TODO

d348

TODO

d36

TODO

d38

TODO

d61

TODO

noper

TODO

income

TODO

expenditure

TODO

y_exp

Numeric. The **true** scaled expenditure (`expenditure / mean(expenditure)`). This is the complete study variable without missingness.

resp

TODO

R

Integer. The simulated response indicator (1 = response, 0 = nonresponse).

y_exp_miss

Numeric. The **observed** scaled expenditure, containing 7,778 `NA` values where `R = 0`. This is the variable to be used as the NMAR-affected outcome.

Source

TODO

Details

To create a realistic test case for nonignorable nonresponse (NMAR), a nonresponse mechanism was simulated and applied to the scaled expenditure variable (`y_exp`).

The key simulation steps were:

1. `y_exp` (true study variable) was created by scaling total expenditure.
2. A true response probability (`resp`) was created using the logistic model: `plogis(1 - 0.6 * y_exp)`.
3. A response indicator (`R`) was simulated based on this probability.
4. The final variable `y_exp_miss` was generated by setting `y_exp` to `NA` wherever `R` was 0.
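The simulation steps can be sketched in R as follows. This is a minimal illustration on synthetic data, not the script used to build `polish_households`: the seed, the sample size, the log-normal stand-in for `expenditure`, and the use of `rbinom()` to draw `R` are all assumptions made so that the example runs on its own.

```r
set.seed(123)  # arbitrary seed, assumed only for this sketch

# Synthetic stand-in for the household expenditure column (assumption)
n <- 1000
expenditure <- rlnorm(n, meanlog = 7, sdlog = 0.5)

# 1. Scale expenditure by its mean to obtain the true study variable
y_exp <- expenditure / mean(expenditure)

# 2. True response probability from the logistic model in the text
resp <- plogis(1 - 0.6 * y_exp)

# 3. Draw the response indicator (1 = response, 0 = nonresponse);
#    a Bernoulli draw via rbinom() is assumed here
R <- rbinom(n, size = 1, prob = resp)

# 4. Set the outcome to NA for nonrespondents
y_exp_miss <- ifelse(R == 1, y_exp, NA_real_)

sim <- data.frame(y_exp, resp, R, y_exp_miss)
```

By construction `y_exp` has mean 1, and `y_exp_miss` is `NA` exactly where `R` is 0, mirroring the relationship between the corresponding columns of the dataset.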

The response is **nonignorable** because the probability of missingness depends directly on the value of the expenditure variable itself.
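The practical consequence can be seen in a short sketch. Because the response probability `plogis(1 - 0.6 * y_exp)` decreases as `y_exp` grows, households with high expenditure are observed less often, so the respondent mean tends to understate the true mean. The code below uses synthetic data (the seed, sample size, and log-normal distribution are assumptions), not the packaged dataset.

```r
set.seed(2024)  # arbitrary seed, assumed only for this sketch

# Synthetic scaled outcome with mean 1 (assumption, not the real data)
n <- 10000
y_raw <- rlnorm(n, meanlog = 0, sdlog = 0.5)
y_exp <- y_raw / mean(y_raw)

# Response mechanism from the text: higher y_exp -> lower response probability
resp <- plogis(1 - 0.6 * y_exp)
R <- rbinom(n, size = 1, prob = resp)
y_exp_miss <- ifelse(R == 1, y_exp, NA_real_)

# Compare the full-sample mean with the naive respondent-only mean
true_mean <- mean(y_exp)
naive_mean <- mean(y_exp_miss, na.rm = TRUE)
bias <- naive_mean - true_mean  # expected to be negative under this mechanism

print(c(true = true_mean, naive = naive_mean, bias = bias))
```

A complete-case analysis of `y_exp_miss` is therefore biased downward in this setup, which is the behaviour an NMAR-aware estimator is meant to correct.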

See also

`riddles_case1`, `riddles_case2`, `riddles_case3`, `riddles_case4`