
Developing a New NMAR Estimator in the NMAR Package
Source: vignettes/developing_new_nmar_estimator.Rmd
This guide explains how to add and integrate a new Not Missing at Random (NMAR) statistical estimator into NMAR, following the architecture used by the Empirical Likelihood (EL) engine and emphasizing a clean separation of concerns:
- Shared, reusable utilities live in src_dev/shared/ (scaling, families, bootstrap, numerics, weights, verbosity).
- Estimator-specific math and code live in src_dev/engines/<method>/.

Repository conventions
- Always edit source under src_dev/. Do not edit generated artifacts in R/, man/, or NAMESPACE by hand.
- Before running tests, vignettes, or pkgdown, refresh generated sources with Rscript build_r_folder.R (or source("build_r_folder.R") from an R session).
- In development, prefer devtools::load_all() over library(NMAR) so you always load the current working tree.
- If you change roxygen comments, run devtools::document() after refreshing R/ so Rd/NAMESPACE are rebuilt from current sources.
- Use testthat v3 under tests/testthat/ and run devtools::test() before opening a PR.
- Recommended before publishing: R CMD check . --no-manual --as-cran.
- Optional repo tooling: pre-commit run --all-files (styling, README render, roxygenize, and other guards).
- Use roxygen2 for documentation; do not edit NAMESPACE directly.
0) General guidelines
- Prefer small, pure functions with clear inputs/outputs; fail loudly with informative stop() messages.
- Prefer type-stable iteration (vapply(), lapply(), explicit loops) over sapply().
- Avoid adding new dependencies without prior discussion; prefer base R and existing shared infrastructure.
- Reuse src_dev/shared/ for method-agnostic components (scaling, families, bootstrap, numerics).
- Use S3 with explicit constructors and validators for engine and result objects.
- Keep a single, predictable execution path; avoid unnecessary nesting and multi-modal behavior (KISS/YAGNI).
0.1) Minimal developer checklist
- Add an engine constructor <method>_engine() with @export and @keywords engine.
- Add run_engine.nmar_engine_<method>() with @exportS3Method run_engine nmar_engine_<method>.
- Implement a method generic <method>() and class methods <method>.data.frame() and <method>.survey.design() (as needed).
- Construct results with new_nmar_result() and return class nmar_result_<method>.
- Add tests under tests/testthat/ (at least: smoke test, argument validation, and one regression test); a sketch of a smoke test follows this list.
- Refresh generated files and validate: Rscript build_r_folder.R, then devtools::document(), then devtools::test().
- If you add a developer article or vignette, register it in _pkgdown.yml. Note that this guide is intentionally excluded from the package tarball via .Rbuildignore but can still be built into the pkgdown site from source.
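For orientation, a minimal smoke test might look like the following; mymethod, mymethod_engine(), the toy data, and the formula shape are hypothetical placeholders for your estimator:

# tests/testthat/test-mymethod-smoke.R (illustrative; adapt names to your engine)
test_that("mymethod returns a valid nmar_result", {
  set.seed(1)
  df <- data.frame(y = rnorm(100), x = rnorm(100))
  df$y[sample(100, 30)] <- NA  # outcome subject to nonresponse

  fit <- nmar(y ~ x, data = df, engine = mymethod_engine(variance_method = "none"))

  expect_s3_class(fit, "nmar_result_mymethod")
  expect_s3_class(fit, "nmar_result")
  expect_true(is.finite(fit$y_hat))
})

test_that("mymethod_engine validates its arguments", {
  expect_error(mymethod_engine(standardize = "yes"), "standardize")
})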
1) Overview of the package flow
User workflow:
- User creates an engine configuration (e.g., el_engine(...)).
- User calls nmar(formula = ..., data = ..., engine = <engine>).
- nmar() validates inputs and dispatches to run_engine(engine, formula, data, trace_level) by S3.
- The engine’s run_engine.nmar_engine_<method>() builds arguments from the engine configuration and calls an internal method-specific generic (e.g., el()), which dispatches on data type:
  - *.data.frame(...) for IID data
  - *.survey.design(...) for complex surveys
- The method computes the estimator and returns a result object of class c("nmar_result_<method>", "nmar_result") containing the primary estimand and supporting data for S3 methods.
2) What to implement for a new estimator
Assume your engine will be named <method> (e.g.,
exptilt2, abc, etc.).
Create the following files under
src_dev/engines/<method>/:
1) engine.R - engine constructor
method_engine <- function(
  standardize = TRUE,
  variance_method = c("bootstrap", "none"),
  bootstrap_reps = 500,
  auxiliary_means = NULL,
  ...
) {
  # Validate scalar arguments early; fail loudly with informative errors.
  validator_assert_scalar_logical(standardize, name = "standardize")
  variance_method <- match.arg(variance_method)
  engine <- list(
    standardize = standardize,
    variance_method = variance_method,
    bootstrap_reps = bootstrap_reps,
    auxiliary_means = auxiliary_means,
    ...  # forward any extra engine-specific options
  )
  class(engine) <- c("nmar_engine_<method>", "nmar_engine")
  engine
}

Important: add @keywords engine in roxygen.
2) run_engine.R - run_engine method
#' @exportS3Method run_engine nmar_engine_<method>
run_engine.nmar_engine_<method> <- function(engine, formula, data, trace_level = 0) {
  # Translate the engine configuration into arguments for the internal generic.
  args <- list(
    data = data,
    formula = formula,
    standardize = engine$standardize,
    auxiliary_means = engine$auxiliary_means,
    variance_method = engine$variance_method,
    bootstrap_reps = engine$bootstrap_reps,
    trace_level = trace_level
  )
  fit <- do.call(method, args)
  if (!inherits(fit, "nmar_result_<method>")) stop("Expected nmar_result_<method>.", call. = FALSE)
  fit
}

3) impl/<method>.R - define the method generic
#' @param data A data.frame or survey.design.
#' @keywords internal
method <- function(data, ...) UseMethod("method")

4) impl/dataframe.R - method for IID data
#' @keywords internal
method.data.frame <- function(data, formula, standardize = TRUE, auxiliary_means = NULL, ...) {
  # Prepare inputs. You may adapt from EL's el_prepare_inputs() pattern.
  parsed <- build_method_inputs(data, formula)
  estimation_data <- parsed$data
  obs_idx <- which(parsed$respondent_mask)

  # Build design matrices supplied by the engine-specific parser
  Z_un <- parsed$missingness_design
  X_un <- parsed$auxiliary_matrix
  has_aux <- !is.null(X_un) && ncol(X_un) > 0

  # Resolve auxiliary means before scaling.
  # If your estimator uses auxiliary moment constraints, you must define what
  # "population means" mean in your workflow:
  # - If auxiliaries are observed for all sampled units, a common default is
  #   colMeans(X_un) (iid) or design-weighted means (survey).
  # - If only respondents are observed, you typically require user input.
  mu_x <- if (has_aux) auxiliary_means else numeric(0)
  if (has_aux && is.null(mu_x)) {
    if (anyNA(X_un)) {
      stop("Cannot compute auxiliary means when auxiliaries contain NA; supply auxiliary_means instead.", call. = FALSE)
    }
    mu_x <- colMeans(X_un)
  }
  if (has_aux && (is.null(names(mu_x)) || !setequal(names(mu_x), colnames(X_un)))) {
    stop("auxiliary_means must be a named numeric vector matching auxiliary columns.", call. = FALSE)
  }

  # Shared scaling
  sc <- validate_and_apply_nmar_scaling(
    standardize = standardize,
    has_aux = has_aux,
    response_model_matrix_unscaled = Z_un,
    aux_matrix_unscaled = if (is.null(X_un)) matrix(nrow = nrow(Z_un), ncol = 0) else X_un,
    mu_x_unscaled = mu_x
  )

  # Solve on the scaled space (method-specific code)
  fit <- method_solve_core(
    full_data = estimation_data,
    respondent_data = estimation_data[obs_idx, ],
    response_model_matrix_scaled = sc$response_model_matrix_scaled,
    auxiliary_matrix_scaled = sc$auxiliary_matrix_scaled,
    mu_x_scaled = sc$mu_x_scaled,
    ...
  )

  # Unscale coefficients/vcov if you report missingness-model parameters.
  # beta_unscaled <- unscale_coefficients(fit$beta_scaled, fit$vcov_scaled, sc$recipe)

  # Wrap into a standard result object (see Section 5)
  new_nmar_result_method(
    y_hat = fit$y_hat,
    se = fit$se,
    coefficients = fit$model$coefficients,
    vcov = fit$model$vcov,
    diagnostics = fit$diagnostics
  )
}

5) impl/survey.R - method for survey.design
Follow EL’s el.survey.design pattern:
- Construct full-sample and respondent-only designs using subset().
- Get analysis weights via weights(design) and set N_pop <- sum(weights(design)) on the analysis scale.
- For design-based degrees of freedom, consider survey::degf(design) (when available).
- For design-based variance built from score totals, you may compute survey::svytotal(~ scores, design) and use vcov().
A sketch of this pattern follows.
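A minimal sketch of the survey method, reusing the same hypothetical helpers as the data.frame method (build_method_inputs(), method_solve_core(), new_nmar_result_method()); the solver arguments and use of the design's variables slot are illustrative:

#' @keywords internal
method.survey.design <- function(data, formula, standardize = TRUE, auxiliary_means = NULL, ...) {
  design <- data
  parsed <- build_method_inputs(design$variables, formula)

  # Full-sample and respondent-only designs via subset().
  respondent_design <- subset(design, parsed$respondent_mask)

  # Analysis weights and population size on the analysis scale.
  w <- weights(design)
  N_pop <- sum(w)

  fit <- method_solve_core(
    full_data = design$variables,
    respondent_data = respondent_design$variables,
    weights = w,
    ...
  )

  # Design-based degrees of freedom, when available.
  df <- tryCatch(survey::degf(design), error = function(e) NA_real_)

  new_nmar_result_method(
    y_hat = fit$y_hat,
    se = fit$se,
    sample = list(n_total = N_pop,
                  n_respondents = nrow(respondent_design$variables),
                  is_survey = TRUE, design = design),
    inference = list(variance_method = "none", df = df, message = NA_character_)
  )
}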
6) impl/equations.R / impl/jacobian.R (optional)
If your estimator uses a system of estimating equations and an analytic Jacobian (like EL), define and document them here.
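For orientation only, such a module often exposes a function shaped roughly like the following; the parameter layout and the interpretation of the family's score_eta are illustrative assumptions, not the EL engine's actual signatures:

#' @keywords internal
method_equations <- function(theta, Z, X, mu_x, family) {
  # Illustrative shape: stack response-model score equations with
  # auxiliary moment constraints; your estimator's equations will differ.
  eta <- drop(Z %*% theta)
  s <- family$score_eta(eta)  # link-agnostic score in eta (assumed semantics)
  g_score <- colMeans(Z * s)
  g_aux <- if (ncol(X) > 0) colMeans(X) - mu_x else numeric(0)
  c(g_score, g_aux)
}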
7) impl/variance.R (optional)
Add an option to use the shared bootstrap variance helper (bootstrap_variance()) for IID or survey designs.
8) impl/constructors.R - results
Call new_nmar_result() inside your constructor so the object carries class c("nmar_result_<method>", "nmar_result") and shares the standard layout:
- Scalars: y_hat (primary estimate), estimate_name, se, converged.
- Lists: model (coefficients/vcov), weights_info, sample, inference, diagnostics, meta, extra.
Populate only fields in the shared schema; put engine-specific objects under extra. Call new_nmar_result(estimate = ...) rather than assigning y_hat directly. The helper attaches classes, adds the parent "nmar_result" class, and routes the object through validate_nmar_result() so missing pieces are filled with safe defaults.
Minimal constructor example (inside src_dev/engines/<method>/impl/constructors.R):

#' @keywords internal
new_nmar_result_method <- function(y_hat, se,
                                   estimate_name = NA_character_,
                                   coefficients = NULL, vcov = NULL,
                                   weights = NULL,
                                   sample = list(),
                                   inference = list(),
                                   diagnostics = list(),
                                   meta = list(engine_name = "<method>", call = NULL, formula = NULL),
                                   extra = list(),
                                   converged = TRUE) {
  new_nmar_result(
    estimate = y_hat,
    estimate_name = estimate_name,
    se = se,
    converged = converged,
    model = list(coefficients = coefficients, vcov = vcov),
    weights_info = list(values = weights, trimmed_fraction = NA_real_),
    sample = sample,
    inference = inference,
    diagnostics = diagnostics,
    meta = meta,
    extra = extra,
    class = "nmar_result_<method>"
  )
}

9) s3.R - S3 methods for your result class
Implement (or rely on parent defaults):
- print.nmar_result_<method>() and summary.nmar_result_<method>() (optional). When overriding, call NextMethod() first to reuse the parent output and then append estimator-specific details; see the sketch after this list.
- vcov(), confint(), tidy(), glance(): these work out of the box as long as you populate the shared schema.
- Optional accessors: weights(), fitted(), coef(), etc., if your engine exposes them.
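For example, a minimal print override might look like this (the diagnostics$iterations field is a hypothetical engine-specific detail):

#' @export
print.nmar_result_<method> <- function(x, ...) {
  NextMethod()  # reuse the parent layout first
  # Then append estimator-specific details.
  if (!is.null(x$diagnostics$iterations)) {
    cat("Solver iterations:", x$diagnostics$iterations, "\n")
  }
  invisible(x)
}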
3) Reusing shared infrastructure
Use these shared modules in src_dev/shared/:
Scaling
- validate_and_apply_nmar_scaling() to standardize Z/X and auxiliary means before solving, returning an nmar_scaling_recipe for unscaling.
- unscale_coefficients() to map scaled coefficients and vcov back to the original scale for reporting.
- The intercept is never scaled; constant columns get sd = 1.
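Together, the two helpers bracket the solver; beta_scaled, vcov_scaled, and the recipe element name below are assumptions based on the commented example in Section 2:

# Scale inputs once and keep the recipe for unscaling.
sc <- validate_and_apply_nmar_scaling(
  standardize = TRUE,
  has_aux = TRUE,
  response_model_matrix_unscaled = Z_un,
  aux_matrix_unscaled = X_un,
  mu_x_unscaled = mu_x
)

# ... solve on the scaled space, obtaining beta_scaled and vcov_scaled ...

# Report on the original scale.
unscaled <- unscale_coefficients(beta_scaled, vcov_scaled, sc$recipe)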
Families (response model)
- logit_family() / probit_family() expose linkinv, mu.eta, d2mu.deta2, and score_eta for link-agnostic estimating equations and Jacobians.
- Probit uses a tail-stable log-ratio for phi/Phi.
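A small sketch of link-agnostic code built on these accessors; Z and theta are illustrative inputs:

fam <- logit_family()

eta <- drop(Z %*% theta)
p <- fam$linkinv(eta)   # response probabilities
dp <- fam$mu.eta(eta)   # d p / d eta, useful in Jacobians

# Swapping the link changes only the family object, not the estimating code.
fam <- probit_family()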
Bootstrap variance
- bootstrap_variance() is an internal helper that handles data.frame and survey.design inputs (it is not a formal exported S3 generic).
- IID: resample rows and rerun the estimator.
- Survey: convert to bootstrap replicate weights (svrep::as_bootstrap_design()), then pass replicate estimates and the design’s scale/rscales/mse directly to survey::svrVar().
- Engines may pass a resample_guard(indices, data) in the IID case to reject structurally invalid resamples (e.g., those with zero respondents); a sketch follows this list.
- For survey designs, survey_na_policy controls how failed replicates are handled:
  - "strict" (default): any non-finite replicate estimate is an error.
  - "omit": failed replicates are dropped and the corresponding rscales entries are removed before calling svrVar().
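For instance, an IID resample guard might look like this, assuming the guard returns TRUE for acceptable resamples and that nonresponse is carried by NA in the outcome column y (both assumptions; check bootstrap_variance() in src_dev/shared/ for the exact contract):

# Reject resamples with zero respondents, where the estimator is undefined.
guard_has_respondents <- function(indices, data) {
  any(!is.na(data$y[indices]))
}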
4) Engine boundaries and naming
- Do not call helpers from other engines’ impl/ folders. Engines should depend only on src_dev/shared/ and their own src_dev/engines/<method>/impl/.
- Prefix engine-specific helpers with <method>_ (e.g., el_... for empirical likelihood) to signal ownership and avoid accidental reuse.
- The parent S3 class nmar_result provides default methods (vcov, confint, tidy, glance, coef, fitted, weights, formula, se). Implement child print/summary only if you need extra presentation.
Developer notes: nmar_result schema and conventions
All NMAR engines should construct result objects via new_nmar_result() (or a thin engine wrapper such as new_nmar_result_el()). The shared parent class has the following schema:
- Scalars:
  - y_hat: numeric scalar primary estimand (e.g., adjusted mean).
  - estimate_name: label for the estimand (usually the outcome variable name).
  - se: numeric scalar standard error (may be NA_real_ if unavailable).
  - converged: logical flag indicating solver success.
- Lists:
  - model: list with
    - coefficients: missingness-model coefficients (or NULL),
    - vcov: covariance matrix for coefficients (or NULL),
    - engine-specific extras such as nuisance.
  - weights_info: list with
    - values: unnormalized analysis weights or masses on the engine’s scale,
    - trimmed_fraction: fraction of mass trimmed (numeric, NA_real_ when not used).
  - sample: list with
    - n_total: population size on the analysis scale,
    - n_respondents: number of respondents,
    - is_survey: logical; TRUE for survey designs,
    - design: underlying survey.design object when applicable.
  - inference: list with
    - variance_method: label such as "none" or "bootstrap",
    - df: degrees of freedom for t-based inference (numeric, may be NA_real_),
    - message: variance-related notes (character, may be NA_character_).
  - diagnostics: list for engine-specific diagnostics (Jacobian condition numbers, equation residuals, denominator summaries, and so on).
  - meta: list with
    - engine_name: short engine label (for printing),
    - call: outer nmar() call,
    - formula: estimation formula.
  - extra: list for any additional engine-specific objects (for example nmar_scaling_recipe, fitted probabilities, or cached raw structures).
validate_nmar_result() is the single authority on this
schema; it normalises and type-checks objects so that parent S3 methods
can rely on the presence and shape of these fields. Engine constructors
supply whatever information they have and let the validator back-fill
missing components with safe defaults.
Two conventions are important for engines that want to reuse the parent methods:

Weights: if you want weights() to return analysis weights, store unnormalised masses in weights_info$values and the population size in sample$n_total. The shared weights.nmar_result() method then returns probability weights (scale = "probability") and population weights (scale = "population") that satisfy the documented sum relationships (up to numerical precision), even under trimming. Engines that use a different weighting concept can either map into this convention or provide an engine-specific weights.nmar_result_<method>().

Inference: parent methods (confint(), tidy(), glance(), coef(summary()), confint(summary())) use inference$df together with sample$is_survey to decide between t- and normal-based quantiles. Engines that provide design-based degrees of freedom should populate df; those that do not can leave it as NA_real_, in which case normal-based inference is used.
As long as a new engine adheres to these conventions and calls
new_nmar_result(), it automatically inherits a consistent
interface for vcov(), confint(),
tidy(), glance(), coef(),
fitted(), weights(), formula(),
se(), summary(), and print().