nonprobsvy 0.2.3
CRAN release: 2025-08-20
- changes to the documentation to meet the JSS requirements
- documentation polishing
nonprobsvy 0.2.2
CRAN release: 2025-05-24
- new hex sticker by Oliwia Awuku
- minor changes to the code, e.g. `control_out(eps = 1e-8)`
- fixing a bug in the bootstrap variance estimator for `method_nn` and `method_pmm`
- fixing bootstrap for doubly robust estimators
- more unit-tests for doubly robust estimators and other methods
- more informative vignette for `method_glm`
nonprobsvy 0.2.1
CRAN release: 2025-04-22
- titles corrected
- new S3 method `extract` added, which extracts results from the `nonprob` object
- new S3 method `coef` added, which returns the coefficients of the underlying models (if possible)
- fixed CRAN notes (unit tests for the IPW estimator with `cloglog`)
- removed the `sampling` package from suggested packages
- added a simple `plot` method
- improvements in the linear algebra
- corrected the `check_balance` error (closes #75)
- code cleaning
nonprobsvy 0.2.0
CRAN release: 2025-03-27
Breaking changes
- functions `pop.size`, `controlSel`, `controlOut` and `controlInf` were renamed to `pop_size`, `control_sel`, `control_out` and `control_inf` respectively.
- function `genSimData` removed completely as it is not used anywhere in the package.
- argument `maxLik_method` renamed to `maxlik_method` in the `control_sel` function.
- `control_out` function:
  - `predictive_match` renamed to `pmm_match_type` to align with the PMM (Predictive Mean Matching) estimator naming convention, where all related parameters start with `pmm_`.
- `control_sel` function:
  - argument `method` removed as it was not used.
  - argument `est_method_sel` renamed to `est_method`.
  - argument `h` renamed to `gee_h_fun` to make it more readable to the user.
  - `start_type` now accepts only `zero` and `mle` (for `gee` models only).
- `control_inf` function:
  - `bias_inf` renamed to `vars_combine` and its type changed to logical: `TRUE` if variables (their levels) should be combined after the variable selection algorithm for the doubly robust approach.
  - `pi_ij`: argument removed as it is not used.
- `nonprobsvy` class renamed to `nonprob` and all related methods adjusted to this change.
- functions `logit_model_nonprobsvy`, `probit_model_nonprobsvy` and `cloglog_model_nonprobsvy` removed in favour of the more readable `method_ps` function that specifies the propensity score model.
- new option `control_inference = control_inf(vars_combine = TRUE)`, which allows the doubly robust estimator to combine variables prior to estimation, i.e. if `selection = ~ x1 + x2` and `y ~ x1 + x3`, then the following models are fitted: `selection = ~ x1 + x2 + x3` and `y ~ x1 + x2 + x3`. By default, `control_inference = control_inf(vars_combine = FALSE)`. Note that this behaviour applies independently of variable selection.
- argument `nonprob(weights = NULL)` replaced with `nonprob(case_weights = NULL)` to stress that it refers to case weights, not sampling or other weights in the non-probability sample.
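The variable-combining rule for `vars_combine = TRUE` can be illustrated in base R. This is a minimal sketch using a hypothetical helper `combine_vars()`; it is not the code used inside `nonprobsvy`, only an illustration of which formulas end up being fitted for the example above.

```r
# Hypothetical helper (not part of nonprobsvy): returns the union of the
# auxiliary variables used in the selection and outcome formulas.
combine_vars <- function(selection, outcome) {
  sel_vars <- all.vars(selection)
  # outcome[[3]] is the right-hand side, so the response y is dropped
  out_vars <- all.vars(outcome[[3]])
  union(sel_vars, out_vars)
}

selection <- ~ x1 + x2
outcome <- y ~ x1 + x3

vars <- combine_vars(selection, outcome)
vars
#> [1] "x1" "x2" "x3"

# rebuild both model formulas on the combined set of variables
selection_combined <- reformulate(vars)
outcome_combined <- reformulate(vars, response = "y")
```

Taking the union (rather than the intersection) keeps every variable that either model originally used, so neither the propensity score model nor the outcome model loses information.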
Features
- two additional datasets have been included: `jvs` (Job Vacancy Survey; a probability sample survey) and `admin` (Central Job Offers Database; a non-probability sample survey). The units and auxiliary variables have been aligned in a way that allows the data to be integrated using the methods implemented in this package.
- a `check_balance` function was added to check the balance of the weighted totals of the variables between the non-probability and probability samples.
- citation file added.
- argument `na_action` added, with default `na.omit`
- new generic methods added:
  - `weights`: returns IPW weights
  - `update`: allows updating the `nonprob` class object
- new functions added and exported:
  - `method_ps`: for modelling the propensity score
  - `method_glm`: for modelling y using the `glm` function
  - `method_nn`: for the NN method
  - `method_pmm`: for the PMM method
  - `method_npar`: for the non-parametric method
- new `print.nonprob`, `summary.nonprob` and `print.nonprob_summary` methods
```
> result_mi
A nonprob object
- estimator type: mass imputation
- method: glm (gaussian)
- auxiliary variables source: survey
- vars selection: false
- variance estimator: analytic
- population size fixed: false
- naive (uncorrected) estimators:
- variable y1: 3.1817
- variable y2: 1.8087
- selected estimators:
- variable y1: 2.9498 (se=0.0420, ci=(2.8674, 3.0322))
- variable y2: 1.5760 (se=0.0326, ci=(1.5122, 1.6399))
```

The number of digits can be changed using `print(x, digits)`, as shown below:

```
> print(result_mi,2)
A nonprob object
- estimator type: mass imputation
- method: glm (gaussian)
- auxiliary variables source: survey
- vars selection: false
- variance estimator: analytic
- population size fixed: false
- naive (uncorrected) estimators:
- variable y1: 3.18
- variable y2: 1.81
- selected estimators:
- variable y1: 2.95 (se=0.04, ci=(2.87, 3.03))
- variable y2: 1.58 (se=0.03, ci=(1.51, 1.64))
> summary(result_mi) |> print(digits=2)
A nonprob_summary object
- call: nonprob(data = subset(population, flag_bd1 == 1), outcome = y1 +
y2 ~ x1 + x2, svydesign = sample_prob)
- estimator type: mass imputation
- nonprob sample size: 693011 (69.3%)
- prob sample size: 1000 (0.1%)
- population size: 1000000 (fixed: false)
- detailed information about models are stored in list element(s): "outcome"
----------------------------------------------------------------
- distribution of outcome residuals:
- y1: min: -4.79; mean: 0.00; median: 0.00; max: 4.54
- y2: min: -4.96; mean: -0.00; median: -0.07; max: 12.25
- distribution of outcome predictions (nonprob sample):
- y1: min: -2.72; mean: 3.18; median: 3.04; max: 16.28
- y2: min: -1.55; mean: 1.81; median: 1.58; max: 13.92
- distribution of outcome predictions (prob sample):
- y1: min: -0.46; mean: 2.95; median: 2.84; max: 10.31
- y2: min: -0.58; mean: 1.58; median: 1.39; max: 7.87
----------------------------------------------------------------
```

Bugfixes
- basic methods and functions related to variance estimation, weights and probability linking methods have been rewritten in a more efficient and readable way.
Other
- more informative error messages added.
- documentation improved.
- switching completely to snake_case.
- extensive cleaning of the code.
- more unit-tests added.
- new dependencies: `formula.tools`
Replication materials
- to verify the quality of the software please refer to the replication materials available here: https://github.com/ncn-foreigners/software-tutorials
nonprobsvy 0.1.1
CRAN release: 2024-11-14
Bugfixes
- fixed a bug occurring when estimation was based on a single auxiliary variable, which led to the data being reduced from a data frame to a vector.
- fixed a bug related to not passing the `maxit` argument from the `controlSel` function to the internally used `nleqslv` function.
- fixed a bug related to storing a `vector` in `model_frame` when predicting `y_hat` in the mass imputation `glm` model when X is based on one auxiliary variable only; fixed by converting it to a `data.frame` object.
Features
- added information to `summary` about the quality of estimation based on the difference between estimated and known total values of auxiliary variables.
- added estimation of the exact standard error for the k-nearest neighbor estimator.
- added a breaking change to the `controlOut` function by switching the values of the `predictive_match` argument. From now on, `predictive_match = 1` means $\hat{y}-\hat{y}$ matching in predictive mean matching imputation and `predictive_match = 2` corresponds to $\hat{y}-y$ matching.
- implemented the `div` option for variable selection in doubly robust estimation (more in the documentation).
- added more insights to the `nonprob` output, such as the gradient, Hessian and Jacobian derived from IPW estimation for the `mle` and `gee` methods when the `IPW` or `DR` model is executed.
- added estimated inclusion probabilities and their derivatives for the probability and non-probability samples to the `nonprob` output when the `IPW` or `DR` model is executed.
- added the `model_frame` matrix data from the probability sample used for mass imputation to `nonprob` when the `MI` or `DR` model is executed.
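The two predictive mean matching variants differ only in which donor quantity is compared with the recipient's prediction $\hat{y}$. The base R sketch below uses 1-nearest-neighbour matching; the function `pmm_impute()` and its `match_type` argument are hypothetical, mirror the numbering described above and are not the package's internal code.

```r
# Hypothetical helper (not part of nonprobsvy): 1-NN predictive mean matching.
# y_hat_prob:    predictions for probability-sample units (recipients)
# y_nonprob:     observed outcomes of non-probability units (donors)
# y_hat_nonprob: predictions for the donors
pmm_impute <- function(y_hat_prob, y_nonprob, y_hat_nonprob, match_type = 1) {
  stopifnot(match_type %in% c(1, 2))
  # match_type = 1: compare y_hat (recipient) with y_hat (donor)
  # match_type = 2: compare y_hat (recipient) with y (donor)
  donor_key <- if (match_type == 1) y_hat_nonprob else y_nonprob
  vapply(y_hat_prob, function(p) {
    # impute the observed y of the closest donor
    y_nonprob[which.min(abs(donor_key - p))]
  }, numeric(1))
}

y_nonprob     <- c(1, 2, 5)
y_hat_nonprob <- c(1.5, 4.0, 4.5)
y_hat_prob    <- c(1.2, 3.9)

pmm_impute(y_hat_prob, y_nonprob, y_hat_nonprob, match_type = 1)
#> [1] 1 2
pmm_impute(y_hat_prob, y_nonprob, y_hat_nonprob, match_type = 2)
#> [1] 1 5
```

In both variants the imputed value is an observed donor outcome; only the distance used to pick the donor changes, which is why the second recipient receives a different value.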
nonprobsvy 0.1.0
CRAN release: 2024-04-04
Features
- implemented population mean estimation using doubly robust, inverse probability weighting and mass imputation methods
- implemented inverse probability weighting models with Maximum Likelihood Estimation and Generalized Estimating Equations methods with logit, complementary log-log and probit link functions.
- implemented generalized linear models, nearest neighbours and predictive mean matching methods for Mass Imputation
- implemented bias correction estimators for the doubly robust approach
- implemented estimation methods when vector of population means/totals is available
- implemented variable selection with SCAD, LASSO and MCP penalties
- implemented `analytic` and `bootstrap` (with parallel computation via the `doSNOW` package) variance estimation for the described estimators
- added control parameters for models
- added S3 methods for objects of the `nonprob` class, such as:
  - `nobs` for sample size
  - `pop.size` for population size estimation
  - `residuals` for residuals of the inverse probability weighting model
  - `cooks.distance` for identifying influential observations that have a significant impact on the parameter estimates
  - `hatvalues` for measuring the leverage of individual observations
  - `logLik` for computing the log-likelihood of the model
  - `AIC` (Akaike Information Criterion) for evaluating the model based on the trade-off between goodness of fit and complexity, helping in model selection
  - `BIC` (Bayesian Information Criterion) for a similar purpose as AIC but with a stronger penalty for model complexity
  - `confint` for calculating confidence intervals around parameter estimates
  - `vcov` for obtaining the variance-covariance matrix of the parameter estimates
  - `deviance` for assessing the goodness of fit of the model
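To illustrate what the mass imputation estimator with a generalized linear model computes, the base R sketch below fits a linear model on a simulated non-probability sample, predicts the outcome for a simulated simple random sample and takes the design-weighted mean of the predictions, $\hat{\mu}_{MI} = \sum_{i} d_i \hat{y}_i / \sum_{i} d_i$. It does not call `nonprobsvy`; the data-generating process, sample sizes and object names are assumptions made for illustration only.

```r
# Minimal sketch of the mass imputation (MI) estimator with a linear model.
# Simulated data only; nonprobsvy is not used here.
set.seed(2024)

# hypothetical finite population
N <- 10000
x <- rnorm(N)
y <- 1 + 2 * x + rnorm(N)
pop <- data.frame(x = x, y = y)

# non-probability sample: self-selection depends on x, so its mean of y is biased
in_nonprob <- runif(N) < plogis(-1 + x)
nonprob_sample <- pop[in_nonprob, ]

# probability sample: simple random sample without replacement,
# only x is observed, design weights d = N / n
n <- 1000
prob_sample <- pop[sample(N, n), "x", drop = FALSE]
d <- rep(N / n, n)

# 1) fit the outcome model on the non-probability sample
fit <- lm(y ~ x, data = nonprob_sample)

# 2) impute (predict) y for every unit in the probability sample
y_hat <- predict(fit, newdata = prob_sample)

# 3) design-weighted mean of the imputed values;
#    the population size is estimated by sum(d)
mu_mi <- sum(d * y_hat) / sum(d)

# naive estimator: plain mean of y in the non-probability sample
mu_naive <- mean(nonprob_sample$y)

round(c(naive = mu_naive, mi = mu_mi, true = mean(pop$y)), 3)
```

Under a correctly specified outcome model the MI estimate should be close to the population mean, whereas the naive mean is biased upwards here because selection favours units with large `x`.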
