Control Parameters for Outcome Model

control_out() constructs a list with all control parameters needed for the outcome model.

Usage

control_out(
  epsilon = 1e-08,
  maxit = 100,
  trace = FALSE,
  k = 5,
  penalty = c("SCAD", "lasso", "MCP"),
  a_SCAD = 3.7,
  a_MCP = 3,
  lambda = -1,
  lambda_min = 0.001,
  nlambda = 50,
  nfolds = 10,
  treetype = c("kd", "bd"),
  searchtype = c("standard", "priority"),
  pmm_match_type = 1,
  pmm_weights = c("none", "dist"),
  pmm_k_choice = c("none", "min_var"),
  pmm_k_max = NULL,
  pmm_reg_engine = c("glm", "loess"),
  npar_loess = stats::loess.control(surface = "direct", trace.hat = "approximate")
)

Arguments

epsilon: Tolerance for fitting algorithms. Default is 1e-8.
maxit: Maximum number of iterations. Default is 100.
trace: Logical value. If TRUE, the steps of the fitting algorithms are traced. Default is FALSE.
k: The k parameter in the RANN::nn2() function. Default is 5.
penalty: Penalty used for variable selection. Default is "SCAD".
a_SCAD: The tuning parameter of the SCAD penalty for the outcome model. Default is 3.7.
a_MCP: The tuning parameter of the MCP penalty for the outcome model. Default is 3.
lambda: A user-specified \(\lambda\) value used when fitting the variable selection model. The default value -1 selects \(\lambda\) by cross-validation.
lambda_min: The smallest value for lambda, as a fraction of lambda.max. Default is 0.001.
nlambda: The number of lambda values. Default is 50.
nfolds: The number of folds used in cross-validation for the variable selection model. Default is 10.
treetype: Type of tree used for nearest neighbour imputation (for the NN and PMM estimators); passed to the RANN::nn2() function.
searchtype: Type of search used for nearest neighbour imputation (for the NN and PMM estimators); passed to the RANN::nn2() function.
pmm_match_type: (Only for the PMM estimator) Indicates how the 'closest' unit from the non-probability sample is selected for each unit in the probability sample. Either 1 (default) or 2. With 1, matching minimizes the distance between \(\hat{y}_{j}\) for \(j \in S_{\mathrm{P}}\) and \(\hat{y}_{i}\) for \(i \in S_{\mathrm{NP}}\) (predicted to predicted). With 2, matching minimizes the distance between \(\hat{y}_{j}\) for \(j \in S_{\mathrm{P}}\) and \(y_{i}\) for \(i \in S_{\mathrm{NP}}\) (predicted to observed).
pmm_weights: (Only for the PMM estimator) Indicates how the k nearest neighbours in \(S_{\mathrm{NP}}\) are weighted to create the imputed value for each unit in \(S_{\mathrm{P}}\). The default value "none" uses the mean of the k nearest \(y\)'s from \(S_{\mathrm{NP}}\), whereas "dist" uses a weighted mean of these k values, with weights inversely proportional to the distance between matched values.
pmm_k_choice: (Only for the PMM estimator) Character value indicating how the hyper-parameter k is chosen. The default "none" uses the k provided in the control_outcome argument. The only other option for now, "min_var", chooses k by a full search over 1:n_NP (or 1:min(n_NP, pmm_k_max) when pmm_k_max is set), where \(n_{\mathrm{NP}}\) is the non-probability sample size, minimizing the estimated variance of the mean estimator. The k value supplied in this control list is then replaced by the selected value. This search refits the full PMM stack for every candidate k, so its cost scales as \(O(n_{\mathrm{NP}} \times l)\), with \(l\) the number of outcome variables, and can be substantial for large non-probability samples. When \(n_{\mathrm{NP}}\) is large, cap the search with pmm_k_max or supply k directly.
pmm_k_max: (Only for the PMM estimator) Positive integer upper bound for the pmm_k_choice = "min_var" search grid. The default NULL searches the full 1:n_NP grid. For example, pmm_k_max = 30 caps the search at 1:min(n_NP, 30) to bound its cost.
pmm_reg_engine: (Only for the PMM estimator) Whether to use a parametric ("glm") or non-parametric ("loess") regression model for the outcome. The default is "glm".
npar_loess: Control parameters for stats::loess(), created with the stats::loess.control() function.
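
The effect of pmm_weights and pmm_k_max can be illustrated with a small base R sketch. It is not the package's implementation: impute_from_neighbours() and k_search_grid() are hypothetical helpers written only for this illustration, and averaging exact matches (distance 0) is an assumption made here to avoid infinite weights.

```r
# Hypothetical helper (not part of the package): imputed value from k neighbours.
impute_from_neighbours <- function(y_donors, dists, weights = c("none", "dist")) {
  weights <- match.arg(weights)
  if (weights == "none") {
    # Plain mean of the k nearest outcome values
    return(mean(y_donors))
  }
  # Assumption: exact matches would get infinite weight, so average them instead
  if (any(dists == 0)) {
    return(mean(y_donors[dists == 0]))
  }
  w <- 1 / dists                      # weights inversely proportional to distance
  sum(w * y_donors) / sum(w)
}

# Hypothetical helper: candidate k values for a "min_var"-style search
k_search_grid <- function(n_np, pmm_k_max = NULL) {
  upper <- if (is.null(pmm_k_max)) n_np else min(n_np, pmm_k_max)
  seq_len(upper)
}

y_donors <- c(10, 12, 20)  # outcomes of the k = 3 nearest neighbours
dists    <- c(1, 2, 4)     # distances between matched values

impute_from_neighbours(y_donors, dists, "none")  # 14
impute_from_neighbours(y_donors, dists, "dist")  # 12
length(k_search_grid(1000))                      # 1000 candidates without a cap
length(k_search_grid(1000, pmm_k_max = 30))      # 30 candidates with a cap
```

With "dist", the closest neighbour dominates, so the imputed value moves from 14 towards 10. The cap shortens the grid from 1000 to 30 candidates, each of which would otherwise require a refit.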

Value

A list with the selected control parameters.
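
A minimal usage sketch, assuming the package that exports control_out() is already attached and that the installed version supports the arguments shown in Usage above (pmm_k_max in particular). Arguments not listed keep their defaults.

```r
# Assumes the package providing control_out() has been attached with library().
# Control list for a PMM estimator: distance-weighted neighbours and a
# variance-minimizing search for k, capped at 30 candidates.
ctrl <- control_out(
  k = 5,
  pmm_match_type = 1,
  pmm_weights = "dist",
  pmm_k_choice = "min_var",
  pmm_k_max = 30
)

# Inspect the returned list
str(ctrl)
```

The resulting list is intended to be supplied through the control_outcome argument mentioned under pmm_k_choice.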
