Overview
This vignette explains the matrix-vectorized implementation of the exponential tilting (EXPTILT) estimator used in this package. The exposition focuses on the concepts and linear-algebraic operations used in the implementation (functions such as `generate_conditional_density_matrix`, `generate_C_matrix`, `generate_Odds`, and the vectorized `s_function`) and mirrors the statistical equations that underlie the algorithm (see Riddles et al. (2016) for the theoretical background).
The method operates on a dataset split into respondents and non-respondents. Let $n = n_R + n_M$, where $n_R$ is the number of respondents (observed $y$) and $n_M$ is the number of non-respondents (missing $y$). The algorithm always treats these two groups separately: the conditional-density model is fit on respondents and used to impute support for non-respondents; the missingness model is fit using covariates that include the candidate $y$ values.
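For concreteness, a minimal sketch of this split, assuming a data frame `df` whose outcome column `y` uses `NA` to mark non-response:

```r
# Split units by the response indicator; NA in y marks non-response (assumed convention).
resp <- !is.na(df$y)
n_R  <- sum(resp)    # respondents: y observed
n_M  <- sum(!resp)   # non-respondents: y missing
```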
We emphasize two disjoint sets of covariates throughout (they are required to have empty intersection):
- `covariates_for_outcome` (denoted $\mathbf{x}^{out}$): variables used to model the conditional density $f(y \mid \mathbf{x}^{out})$ on respondents;
- `covariates_for_missingness` (denoted $\mathbf{z}$): variables used in the response probability (together with the candidate $y$ values when evaluating the missing-data expectation).
Distinguishing these sets clearly in notation and code is crucial: the conditional-density model is fit using `covariates_for_outcome` on respondents only, while the missingness model uses `covariates_for_missingness` and (importantly) takes the candidate outcome value $y$ as an additional predictor when forming $\pi(\phi; \mathbf{z}, y)$.
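For example, using the partitioned-formula interface described later, the two sets might be specified as follows (variable names hypothetical):

```r
# Left of "|": covariates_for_outcome; right of "|": covariates_for_missingness.
# The candidate y value is appended to the missingness predictors internally.
formula <- y ~ x1 + x2 | z1 + z2
```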
Notation and main objects
Let:
- $n$ be the total number of units; split them into $n_R$ respondents (observed $y$) and $n_M$ non-respondents (missing $y$). Denote indices with $i$ for non-respondents and $j$ (or $k$) for respondent outcomes used as support.
- $\mathbf{y}^{obs} = (y_1, \dots, y_{n_R})$ be the vector of observed outcome values (respondents).
- $X^{out}_R$ denote the matrix of `covariates_for_outcome` for respondents.
- $X^{out}_M$ denote the matrix of `covariates_for_outcome` for non-respondents.
- $X^{mis}_R$ and $X^{mis}_M$ denote the `covariates_for_missingness` for respondents and non-respondents, respectively (we shorten rows of these to $\mathbf{z}_i$ and $\mathbf{z}_j$ when needed).
- $\pi(\phi; \mathbf{z}, y)$ be the response probability (no explicit link notation here): for a given covariate row $\mathbf{z}$ used in the missingness model and a candidate $y$, $\pi(\phi; \mathbf{z}, y)$ denotes the probability of response under parameter $\phi$.
We build three central matrices:
- Conditional-density matrix (denoted in code as `f_matrix_nieobs`):
  - Size: $n_M \times n_R$, where $n_M$ is the number of non-respondents and $n_R$ is the number of distinct respondent values used as support.
  - Entries: $F_{ij} = f(y_j \mid \mathbf{x}^{out}_i; \hat{\gamma})$, where $\hat{\gamma}$ is the (estimated) parameter vector of the conditional-density model fit on respondents. This corresponds to the empirical approximation in equation (12) of the notes, with the respondent outcomes $y_1, \dots, y_{n_R}$ serving as the discrete support.
- Column-normalizer vector (denoted in code as `C_matrix_nieobs`):
  - Size: $n_R \times 1$ (a column vector).
  - Entries: the column-sums of the conditional densities evaluated at respondent covariates, $C_j = \sum_{k=1}^{n_R} f(y_j \mid \mathbf{x}^{out}_k; \hat{\gamma})$. Conceptually this is the denominator that appears when fractional weights are formed (see below).
- Odds matrix (constructed by `generate_Odds`):
  - Size: $n_M \times n_R$.
  - Entries (for non-respondent $i$ and candidate $y_j$): $O_{ij} = \{1 - \pi(\phi; \mathbf{z}_i, y_j)\} / \pi(\phi; \mathbf{z}_i, y_j)$. The implementation exploits the separability of the linear predictor in the parameters, $\eta_{ij} = \phi_0 + \boldsymbol{\phi}_z^\top \mathbf{z}_i + \phi_y\, y_j$, and uses `outer()` to form the matrix efficiently (see the sketch below).
Using these objects we form a non-normalized weight matrix
$$\tilde{W}_{ij} = \frac{F_{ij}\, O_{ij}}{C_j},$$
and then a normalized fractional-weight matrix
$$W_{ij} = \frac{\tilde{W}_{ij}}{\sum_{j'=1}^{n_R} \tilde{W}_{ij'}}.$$
These match the weights appearing in the theoretical mean-score approximation (equation (15) in the notes): $W_{ij}$ is the fractional contribution of each imputed pair $(\mathbf{z}_i, y_j)$ to the conditional expectation for unit $i$.
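The following R sketch assembles all four objects for one concrete choice of models: a normal conditional density and a logit response model. The helper name mirrors the package vocabulary, but the body is an illustrative assumption, not the package source.

```r
# Illustrative sketch (assumed normal density, logit link); not the package source.
# y_obs: n_R observed outcomes; x_out_R / x_out_M: outcome covariate (respondents /
# non-respondents); z_M: missingness covariate for non-respondents.
build_exptilt_matrices <- function(y_obs, x_out_R, x_out_M, z_M, gamma_hat, phi) {
  mu_M <- drop(cbind(1, x_out_M) %*% gamma_hat$beta)   # conditional means, non-respondents
  mu_R <- drop(cbind(1, x_out_R) %*% gamma_hat$beta)   # conditional means, respondents
  # F: n_M x n_R, F[i, j] = f(y_j | x_i; gamma_hat)
  F_mat <- outer(mu_M, y_obs,
                 function(m, y) dnorm(y, mean = m, sd = gamma_hat$sigma))
  # C: length-n_R column normalizer, C[j] = sum_k f(y_j | x_k; gamma_hat)
  C_vec <- colSums(outer(mu_R, y_obs,
                         function(m, y) dnorm(y, mean = m, sd = gamma_hat$sigma)))
  # Odds: separable linear predictor eta[i, j] = phi0 + phi_z * z_i + phi_y * y_j
  eta   <- phi[1] + outer(phi[2] * z_M, phi[3] * y_obs, `+`)
  O_mat <- exp(-eta)                                   # (1 - pi) / pi under a logit link
  W_tilde <- sweep(F_mat * O_mat, 2, C_vec, `/`)       # non-normalized weights
  W <- W_tilde / rowSums(W_tilde)                      # fractional weights, rows sum to 1
  list(F = F_mat, C = C_vec, O = O_mat, W = W_tilde, W_norm = W)
}
```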
The vectorized score S_2 and matrix algebra
Recall the mean-score approximation that leads to the estimating equation used to solve for $\phi$ (cf. equation (13) in the notes):
$$\bar{S}(\phi) = \sum_{i \in R} s(\phi; \mathbf{z}_i, y_i) + \sum_{i \in M} E\left\{ s(\phi; \mathbf{z}_i, Y) \mid \mathbf{z}_i, \delta_i = 0 \right\} = 0.$$
It is convenient to decompose this as the sum of observed and unobserved contributions, $\bar{S}(\phi) = S_1(\phi) + S_2(\phi)$, where $S_1(\phi) = \sum_{i \in R} s(\phi; \mathbf{z}_i, y_i)$ is the score contribution from respondents, and the missing-unit contribution is approximated by the discrete-support expectation
$$S_2(\phi) = \sum_{i \in M} \sum_{j=1}^{n_R} W_{ij}(\phi)\, s(\phi; \mathbf{z}_i, y_j).$$
The remainder of this section explains how the double sum in $S_2$ is computed via matrix operations.
The crucial observation for vectorization is that the inner conditional expectation for a non-respondent $i$ is approximated by a weighted finite sum over the respondent support $\{y_1, \dots, y_{n_R}\}$:
$$E\{ s(\phi; \mathbf{z}_i, Y) \mid \mathbf{z}_i, \delta_i = 0 \} \approx \sum_{j=1}^{n_R} W_{ij}(\phi)\, s(\phi; \mathbf{z}_i, y_j).$$
Stacking these expectations across all non-respondents gives a single matrix operation. For each pair $(i, j)$, evaluate the vector-valued score $s$ at $(\phi; \mathbf{z}_i, y_j)$. These can be collected quickly by exploiting the algebraic factorization of the score (see the `s_function` implementation): many parameter-specific components are separable in $i$ and $j$, which allows low-memory representations. Denote by $s(\phi; \mathbf{z}_i, y_j) \in \mathbb{R}^p$ the score vector at $(i, j)$, and organize these values so that each parameter index $r = 1, \dots, p$ has an $n_M \times n_R$ matrix $S^{(r)}$ (a parameter-wise map). Then the $r$-th component of the non-respondent contribution is
$$[S_2(\phi)]_r = \sum_{i \in M} \sum_{j=1}^{n_R} W_{ij}\, S^{(r)}_{ij},$$
where the double sum over the elementwise product $W \odot S^{(r)}$ is implemented by a row-sum or column-sum collapse, depending on the parameter's factorization. In concrete computational terms:
- For parameter components that factor per row (i.e. depend on $\mathbf{z}_i$ times a factor that is a function of $y_j$ only), we compute elementwise products between $W$ and a factor matrix, row-sum across $j$ to get an $n_M \times 1$ contribution per non-respondent, and then sum across $i$.
- For intercept-like or column-wise components, a column-sum followed by weighted multiplication suffices.

In the implementation this reduces to a sequence of dense matrix operations and row/column sums rather than explicit loops over the expanded index set of length $n_M \times n_R$. This yields large speed and memory benefits for real datasets.
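To make this concrete, here is a hedged sketch for a logit response model, whose score factorizes as $(\delta - \pi) \cdot (1, \mathbf{z}, y)^\top$; the function and argument names are illustrative, not the package's `s_function` itself.

```r
# Sketch: S_2 for a logit response model with predictors (1, z, y); illustrative only.
# W: n_M x n_R fractional weights; pi_mat: n_M x n_R matrix of pi(phi; z_i, y_j);
# z_M: length-n_M vector; y_obs: length-n_R vector.
score_S2_logit <- function(W, pi_mat, z_M, y_obs) {
  # For delta = 0 the logit score is (0 - pi) * (1, z_i, y_j) componentwise.
  E <- W * (-pi_mat)            # elementwise combination, n_M x n_R
  s_intercept <- sum(E)         # intercept: plain double sum
  s_z <- sum(rowSums(E) * z_M)  # z-component: factor depends on i only (row-sum first)
  s_y <- sum(colSums(E) * y_obs) # y-component: factor depends on j only (column-sum first)
  c(s_intercept, s_z, s_y)
}
```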
Concise vectorized $S_2$ recipe (conceptual):
- Build $F$ (size $n_M \times n_R$) via `generate_conditional_density_matrix(model)`.
- Build $C$ (size $n_R \times 1$) via `generate_C_matrix(model)` by summing the conditional densities over respondents.
- For a candidate $\phi$, compute $O(\phi)$ (size $n_M \times n_R$) via `generate_Odds(model, phi)`.
- Form $\tilde{W} = (F \odot O)\,\mathrm{diag}(C)^{-1}$ (i.e. divide each column of $F \odot O$ by the corresponding scalar $C_j$); row-normalization then gives $W$ as above.
- Compute the observed-score sum $S_1(\phi)$ (this is small: one score vector per respondent).
- Compute the non-respondent expected score: use $W$ and the parameter-wise factor matrices derived from `s_function` to compute $S_2(\phi)$, implemented via matrix multiplications and row/column sums.
- Return the total score $\bar{S}(\phi) = S_1(\phi) + S_2(\phi)$.
The root-finder then searches for $\hat{\phi}$ such that $\bar{S}(\hat{\phi}) = 0$. In the package this is implemented by repeatedly forming $O(\phi)$ for the current candidate $\phi$, computing $W(\phi)$ and $\bar{S}(\phi)$, and letting `nleqslv` perform the iteration.
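A minimal sketch of that outer loop, assuming hypothetical helpers `generate_Odds()` and `score_total()` that implement the recipe above (the package's actual wiring may differ):

```r
# Sketch of the solve step; score_total() is a placeholder for the matrix
# recipe above (S_1 + S_2), not the package's exact internals.
library(nleqslv)

solve_phi <- function(model, phi_start) {
  target <- function(phi) {
    O <- generate_Odds(model, phi)  # odds matrix for the current candidate
    score_total(model, O, phi)      # S_1(phi) + S_2(phi)
  }
  nleqslv(phi_start, target)        # root-finding on the estimating equations
}
```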
Mean estimation and survey weights
After the response-model parameters $\hat{\phi}$ have been obtained, the package reports a mean estimate of the target outcome using an inverse-probability reweighting form. Let respondents be indexed by $i \in R$; denote by $\hat{\pi}_i = \pi(\hat{\phi}; \mathbf{z}_i, y_i)$ the fitted response probabilities evaluated at each respondent (here $\mathbf{z}_i$ denotes the covariates used in the missingness model evaluated at respondent $i$). With design weights $d_i$ (by default $d_i = 1$ for non-survey data) the point estimator computed in the implementation is
$$\hat{\mu} = \frac{\sum_{i \in R} d_i\, y_i / \hat{\pi}_i}{\sum_{i \in R} d_i / \hat{\pi}_i}. \qquad \text{(mean-est)}$$
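In R this weighted ratio is a two-line computation; the sketch below assumes respondent-level vectors and default unit weights.

```r
# IPW-type ratio mean over respondents (sketch); d defaults to unit weights.
ipw_mean <- function(y_R, pi_hat, d = rep(1, length(y_R))) {
  sum(d * y_R / pi_hat) / sum(d / pi_hat)
}
```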
Notes on survey weights and fitting:
- The conditional-density fit used to build $F$ can incorporate respondent sampling weights when provided (the implementation passes respondent weights to the density-fitting routine). Thus $\hat{\gamma}$ may be a weighted fit when `design_weights` or a `survey_design` is supplied.
- The mean estimate (mean-est) also uses the design weights $d_i$ in both numerator and denominator, as shown above. In code these are provided via `model$design_weights` and default to 1 for non-survey use.
In short: survey weights enter in two places: (i) the conditional-density estimation (if requested) and (ii) the final mean calculation through the weighted IPW-type ratio in (mean-est).
Arguments passed to exptilt (summary)
The `exptilt` / `exptilt.data.frame` user-facing function accepts a number of arguments that control model specification and computation; the most important are:
- `data`: a `data.frame` with outcome and covariates.
- `formula`: a partitioned formula of the form `y ~ aux1 + aux2 | miss1 + miss2`, where the left part (before `|`) lists `covariates_for_outcome` and the right part lists `covariates_for_missingness` (the package helper splits these automatically).
- `auxiliary_means`: (optional) population or target means for auxiliary variables used for scaling.
- `standardize`: logical; whether to standardize features before fitting.
- `prob_model_type`: character, either `"logit"` or `"probit"`, to select the response-probability family.
- `y_dens`: choice of conditional-density family; `"auto"`, `"normal"`, `"lognormal"`, or `"exponential"`.
- `variance_method`: `"delta"` or `"bootstrap"` for variance estimation.
- `bootstrap_reps`: number of bootstrap replications when the bootstrap is used.
- `control`: list of control parameters forwarded to the nonlinear solver (`nleqslv`).
- `stopping_threshold`: numeric; the sup-norm threshold for early stopping of the score (see above).
- `on_failure`: behavior on failure (`"return"` or `"error"`).
- `supress_warnings`: logical to silence certain warnings.
- `design_weights`: optional vector of respondent design weights (or full-sample weights, which are subset internally).
- `survey_design`: optional `survey.design` object; when provided, some internal logic uses the survey path.
- `trace_level`: integer controlling verbosity.
- `sample_size`: integer for stratified subsampling used when data are large.
- `outcome_label`, `user_formula`: utility arguments used for presentation and bookkeeping.
These arguments appear in the `exptilt.data.frame` function signature and control how the matrices $F$, $C$, and $O$ are built and how the solver is run.
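A hedged usage example assembled only from the arguments listed above (the data and variable names `df`, `y`, `x1`, `x2`, `z1`, `z2` are placeholders):

```r
# Example call; variable names are placeholders, argument values illustrative.
fit <- exptilt(
  data               = df,
  formula            = y ~ x1 + x2 | z1 + z2,  # outcome covariates | missingness covariates
  prob_model_type    = "logit",
  y_dens             = "normal",
  variance_method    = "bootstrap",
  bootstrap_reps     = 500,
  stopping_threshold = 1e-8
)
```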
Connection to the EM viewpoint
The EM-like update displayed in the theoretical notes (equation (14)) is exactly what the implementation achieves: for a fixed conditional-density estimate $\hat{\gamma}$ and a current iterate $\phi^{(t)}$, the expectations for the missing units are approximated by the discrete support on the observed $y$ values, and the resulting equation for $\phi^{(t+1)}$ is solved via root-finding. The heavy lifting is performed by the matrix calculus described earlier: constructing $F$, $C$, and $O(\phi)$, then computing $W$ and multiplying by the parameter-wise score factors.
Stopping criterion (maximum-norm)
Practical optimization requires a convergence criterion. The implementation uses the maximum absolute component of the vector-valued score as a stopping rule. Concretely, if the solver produces a score vector $\bar{S}(\phi)$ with $\|\bar{S}(\phi)\|_\infty = \max_r |\bar{S}_r(\phi)| < \varepsilon$ for a user-specified `stopping_threshold` $\varepsilon$, the algorithm treats this as converged. In the code this is used as an early exit inside the target function passed to the nonlinear solver: when the score's sup-norm is below the threshold, a zero vector is returned to signal convergence and avoid further unnecessary computation.
This choice matches the intuition that the root-finder should stop when the estimating equations are (componentwise) negligibly small.
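Conceptually, the early exit can be expressed as a wrapper around any score function; this sketch is illustrative rather than the package's exact code.

```r
# Wrap a score function with the sup-norm early exit (illustrative).
make_target <- function(score_fn, stopping_threshold = 1e-8) {
  function(phi) {
    S <- score_fn(phi)
    if (max(abs(S)) < stopping_threshold) {
      return(rep(0, length(S)))  # zero vector signals convergence to the solver
    }
    S
  }
}
```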
Practical tutorial: from raw data to matrix operations (conceptual steps)
1. Fit the conditional density using respondents and `covariates_for_outcome`. This gives a function that evaluates $f(y \mid \mathbf{x}; \hat{\gamma})$ for any $y$ and any covariate row (used for both respondents and non-respondents).
2. Evaluate the conditional density at the Cartesian product of non-respondent covariates and observed respondent values to form $F$ (done by `generate_conditional_density_matrix`). This is the empirical imputation support: think of rows as target non-respondents and columns as candidate respondent outcomes.
3. Evaluate the same conditional density at respondent covariates to form the column-normalizer $C$ (done by `generate_C_matrix`); this is simply the column-sum of densities over respondents.
4. For each trial value $\phi$ of the response-model parameters, compute the odds matrix $O(\phi)$ using the separable linear predictor and link function (implemented efficiently via `outer()` in code). Combine with $F$ and normalize columns by $C$ to obtain $W$.
5. Use the vectorized `s_function` to obtain parameter-specific factor matrices for the non-respondent imputed scores; multiply elementwise by $W$ and reduce (row/column sums) to compute the non-respondent contribution $S_2(\phi)$.
6. Add the observed-respondent score $S_1(\phi)$ and use a root-finder (e.g. `nleqslv`) to find $\hat{\phi}$ with $\bar{S}(\hat{\phi}) = 0$. The solver may use the maximum-norm stopping threshold described above to exit early.
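Putting the pieces together, under the same illustrative assumptions as the earlier sketches (`build_exptilt_matrices()` and `score_S2_logit()` from above; columns `y`, `x`, `z`, with `NA` in `y` marking non-response):

```r
# End-to-end sketch reusing the illustrative helpers defined earlier; not package code.
exptilt_sketch <- function(df, phi_start = c(0, 0, 0)) {
  resp <- !is.na(df$y)
  fit <- lm(y ~ x, data = df[resp, ])                        # step 1: density fit on respondents
  gamma_hat <- list(beta = coef(fit), sigma = summary(fit)$sigma)
  y_R <- df$y[resp]; z_R <- df$z[resp]; z_M <- df$z[!resp]

  target <- function(phi) {
    M <- build_exptilt_matrices(y_R, df$x[resp], df$x[!resp],
                                z_M, gamma_hat, phi)         # steps 2-4: F, C, O, W
    pi_R <- plogis(phi[1] + phi[2] * z_R + phi[3] * y_R)     # response probs at respondents
    S1 <- colSums((1 - pi_R) * cbind(1, z_R, y_R))           # respondent (delta = 1) score
    S2 <- score_S2_logit(M$W_norm, 1 / (1 + M$O), z_M, y_R)  # step 5: weighted missing score
    S1 + S2                                                  # step 6: estimating equations
  }
  nleqslv::nleqslv(phi_start, target)
}
```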
Why the vectorization matters (practical remarks)
- Memory: the naive expansion to an explicit dataset of size $n_M \times n_R$ would store duplicated covariate rows and blow up memory. The implementation exploits separability of the linear predictor (intercept $+\,\boldsymbol{\phi}_z^\top \mathbf{z}_i + \phi_y\, y_j$) and vectorized R primitives (`outer`, matrix multiplications, column/row sums) to avoid large temporary allocations.
- Speed: elementwise operations on dense matrices plus BLAS-accelerated matrix multiplication are much faster than interpreted loops in R for typical dataset sizes.
- Clarity: organizing the logic as the three matrices $F$, $C$, and $O$, followed by elementwise combination and reductions, makes the relationship between the statistical approximation and the implementation transparent and easier to reason about.
Closing notes and references
This vignette mapped the implementation functions to the math in the theoretical notes and showed how the EM-like mean-score step reduces to a small set of matrix operations. The implementation follows the ideas described in Riddles et al. (2016) for exponential tilting under NMAR: fit the conditional density on respondents, approximate the missing-data expectation by a finite sum over observed values, and solve the resulting estimating equations for the missingness-model parameters.
