Overview

This vignette explains the matrix-vectorized implementation of the exponential tilting (EXPTILT) estimator used in this package. The exposition focuses on concepts and linear-algebraic operations used in the implementation (functions such as generate_conditional_density_matrix, generate_C_matrix, generate_Odds, and the vectorized s_function), and mirrors the statistical equations that underlie the algorithm (see Riddles et al. (2016) for the theoretical background).

The method operates on a dataset split into respondents and non-respondents. Let $n = n_0 + n_1$, where $n_1$ is the number of respondents (observed $Y$) and $n_0$ is the number of non-respondents (missing $Y$). The algorithm always treats these two groups separately: the conditional-density model is fit on respondents and used to impute support for non-respondents; the missingness model is fit using covariates that include the candidate $y$ values.

We emphasize two disjoint sets of covariates throughout and require that $\mathcal{X}_{\text{outcome}} \cap \mathcal{X}_{\text{missingness}} = \varnothing$:

  • covariates_for_outcome (denote $X_{\text{out}}$): variables used to model the conditional density $f_1(y \mid X_{\text{out}})$ on respondents;
  • covariates_for_missingness (denote $X_{\text{miss}}$): variables used in the response probability $\pi(X_{\text{miss}}, y; \phi)$ (this set includes the candidate $y$ values when evaluating the missing-data expectation).

Distinguishing these sets clearly in notation and code is crucial: the conditional density model is fit using covariates_for_outcome on respondents only, while the missingness model uses covariates_for_missingness and (importantly) takes $y$ as an additional predictor when forming $\pi(\cdot; \phi)$.

Notation and main objects

Let:

  • $n$ be the total number of units; split them into respondents (observed $Y$) and non-respondents (missing $Y$). Denote indices with $i$ for non-respondents and $k$ (or $j$) for respondent outcomes used as support.
  • $\{y_j\}_{j=1}^{n_1}$ be the vector of observed outcome values (respondents).
  • $X^{\text{out}}$ denote the matrix of covariates_for_outcome for respondents.
  • $X^{\text{un}}$ denote the matrix of covariates_for_outcome for non-respondents.
  • $X^{\text{miss,obs}}$ and $X^{\text{miss,un}}$ denote the covariates_for_missingness for respondents and non-respondents, respectively (shortened to $X_{\text{miss}}^{\text{obs}}$ and $X_{\text{miss}}^{\text{un}}$ when needed).
  • $\pi(x_{\text{miss}}, y; \phi)$ be the response probability (no explicit link notation here): for a given covariate row used in the missingness model and a candidate $y$, $\pi(\cdot; \phi)$ denotes the probability of response under parameter $\phi$.

We build three central matrices:

  1. Conditional-density matrix $F$ (denoted in code as f_matrix_nieobs):

    • Size: $n_0 \times n_1$, where $n_0$ is the number of non-respondents and $n_1$ is the number of distinct respondent $y$ values used as support.
    • Entries: $F_{ij} = f_1(y_j \mid x^{\text{un}}_i; \hat\gamma)$, where $\hat\gamma$ is the estimated parameter vector of the conditional-density model fit on respondents. This corresponds to the empirical approximation in equation (12): $\hat f_1(y_j) \propto \sum_{k:\,\delta_k=1} f_1(y_j \mid x_k; \hat\gamma)$.
  2. Column-normalizer vector $C$ (denoted in code as C_matrix_nieobs):

    • Size: $n_1 \times 1$ (a column vector).
    • Entries: the column sums of the conditional densities evaluated at respondent covariates: $C_j = C(y_j; \hat\gamma) = \sum_{k:\,\delta_k=1} f_1(y_j \mid x^{\text{obs}}_k; \hat\gamma)$. Conceptually this is the denominator that appears when fractional weights are formed (see below).
  3. Odds matrix $O(\phi)$ (constructed by generate_Odds):

    • Size: $n_0 \times n_1$.
    • Entries (for non-respondent $i$ and candidate $y_j$): $O_{ij}(\phi) = \dfrac{1 - \pi(x^{\text{miss,un}}_i, y_j; \phi)}{\pi(x^{\text{miss,un}}_i, y_j; \phi)}$. The implementation exploits the separability of the linear predictor in the parameters, $\eta_{ij} = \beta_0 + X^{\text{miss,un}}_i \beta_{\text{miss}} + \beta_y y_j$, and uses outer() to form the $n_0 \times n_1$ matrix efficiently (a minimal sketch follows this list).
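To make the separability concrete, here is a minimal sketch of the odds construction for a logistic link, where $(1 - \pi)/\pi = \exp(-\eta)$. The function and argument names (generate_odds_sketch, beta0, beta_miss, beta_y, X_miss_un, y_obs) are illustrative, not the package's internals:

```r
## Minimal sketch: odds matrix for a logistic link, (1 - pi)/pi = exp(-eta)
generate_odds_sketch <- function(beta0, beta_miss, beta_y, X_miss_un, y_obs) {
  # Row part of the linear predictor: one value per non-respondent
  eta_row <- beta0 + as.vector(X_miss_un %*% beta_miss)   # length n0
  # Column part: one value per candidate y_j
  eta_col <- beta_y * y_obs                               # length n1
  # Separability: eta_ij = eta_row[i] + eta_col[j], built with outer()
  eta <- outer(eta_row, eta_col, "+")                     # n0 x n1
  exp(-eta)                                               # O_ij = (1 - pi_ij)/pi_ij
}
```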

Using these objects we form a non-normalized weight matrix $U_{ij}(\phi) = O_{ij}(\phi)\, F_{ij}$ and then a normalized fractional-weight matrix
$$W_{ij}(\phi) = \frac{U_{ij}(\phi)}{C_j} = \frac{O_{ij}(\phi)\, f_1(y_j \mid x^{\text{un}}_i; \hat\gamma)}{\sum_{k:\,\delta_k=1} f_1(y_j \mid x^{\text{obs}}_k; \hat\gamma)}.$$
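In R the column normalization is a one-liner; a minimal sketch, assuming O_mat, F_mat, and C_vec hold $O(\phi)$, $F$, and $C$ (F is avoided as a variable name because it aliases FALSE in R):

```r
# Divide each column j of the elementwise product by the scalar C_j
W <- sweep(O_mat * F_mat, MARGIN = 2, STATS = C_vec, FUN = "/")  # n0 x n1 weights
```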

These $W_{ij}$ match the weights appearing in the theoretical mean-score approximation (equation (15) in the notes): they are the fractional contribution of each imputed $(i,j)$ pair to the conditional expectation for unit $i$.

The vectorized score S_2 and matrix algebra

Recall the mean-score approximation that leads to the estimating equation used to solve for $\phi$ (cf. equation (13) in the notes):
$$S_2(\phi; \hat\phi^{(t)}, \hat\gamma) = \sum_{r=1}^n \Big[ \delta_r\, s(\phi; \delta_r, x^{\text{miss}}_r, x^{\text{out}}_r, y_r) + (1 - \delta_r)\, \widetilde{E}_0\big\{ s(\phi; \delta, x^{\text{miss}}_r, x^{\text{out}}_r, Y) \mid x^{\text{out}}_r; \hat\phi^{(t)}, \hat\gamma \big\} \Big] = 0.$$

It is convenient to decompose this as the sum of observed and unobserved contributions:
$$S_2(\phi) = S_{\text{obs}}(\phi) + S_{\text{un}}(\phi),$$
where
$$S_{\text{obs}}(\phi) = \sum_{r:\,\delta_r=1} s(\phi; \delta=1, x^{\text{miss}}_r, x^{\text{out}}_r, y_r)$$
is the score contribution from respondents, and the missing-unit contribution is approximated by the discrete-support expectation
$$S_{\text{un}}(\phi) \approx \sum_{i:\,\delta_i=0} \sum_{j=1}^{n_1} W_{ij}(\phi)\, s(\phi; \delta=0, x^{\text{miss}}_i, x^{\text{out}}_i, y_j).$$
The remainder of this section explains how the double sum in $S_{\text{un}}(\phi)$ is computed via matrix operations.

The crucial observation for vectorization is that the inner conditional expectation for a non-respondent $i$ is approximated by a weighted finite sum over the respondent support $\{y_j\}$:
$$\widetilde{E}_0\big\{ s(\phi; \delta, x^{\text{miss}}_i, x^{\text{out}}_i, Y) \mid x^{\text{out}}_i \big\} \approx \sum_{j=1}^{n_1} W_{ij}(\phi)\, s(\phi; \delta=0, x^{\text{miss}}_i, x^{\text{out}}_i, y_j).$$

Stacking the non-respondent expectations across all non-respondents gives a single matrix operation:

  • For each pair $(i,j)$ evaluate the vector-valued score $s$ at $(\delta=0, x^{\text{miss}}_i, x^{\text{out}}_i, y_j)$. Collect these efficiently by exploiting the algebraic factorization of the score (see the s_function implementation): many parameter-specific components are separable in $i$ and $j$, which allows creation of low-memory representations.

  • Denote by $S^{(0)}_{ij}$ the $p$-dimensional score vector at $(i,j)$. Organize these so that for each parameter index $m$ we have an $n_0 \times n_1$ matrix of values $[S^{(0)}]_m$ (parameter-wise maps). Then the non-respondent contribution to the overall score vector is
$$\sum_{i=1}^{n_0} \sum_{j=1}^{n_1} W_{ij}(\phi)\, S^{(0)}_{ij} = \Big[\, \mathrm{vec}\big( W \circ [S^{(0)}]_1 \big),\ \ldots,\ \mathrm{vec}\big( W \circ [S^{(0)}]_p \big) \,\Big]^\top,$$
where $\circ$ denotes elementwise multiplication and vec followed by an appropriate collapse (row-sum or column-sum) implements the inner summation, depending on the parameter's factorization. In concrete computational terms:

  • For parameter components that multiply per-row (i.e. depend only on $x_i$ times a factor that is a function of $y_j$) we compute elementwise products between $W$ and a factor matrix, row-sum across $j$ to get an $n_0 \times 1$ contribution per non-respondent, and then sum across $i$.

  • For intercept-like or column-wise components, a column-sum followed by weighted multiplication suffices.

In the implementation this reduces to a sequence of dense matrix operations and row/column sums rather than explicit loops over the expanded index set of size $n_0 \times n_1$. This yields large speed and memory benefits for real datasets.
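As an illustration of these reductions, consider one parameter component $m$. A minimal sketch under assumed names (S0_m for the matrix $[S^{(0)}]_m$; a_vec, b_vec for a hypothetical row/column factorization):

```r
# Full reduction: sum_i sum_j W_ij * (S0_m)_ij collapses to one elementwise product
s_un_m <- sum(W * S0_m)

# If the component factorizes as a_i * b_j, the matrix S0_m is never formed:
# sum_ij W_ij * a_i * b_j = a' (W b), a single matrix-vector product
s_un_m_sep <- as.numeric(crossprod(a_vec, W %*% b_vec))
```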

Concise vectorized S_2 recipe (conceptual):

  1. Build $F$ (size $n_0 \times n_1$) via generate_conditional_density_matrix(model).
  2. Build $C$ (length $n_1$) via generate_C_matrix(model) by summing the conditional densities over respondents.
  3. For a candidate $\phi$ compute $O(\phi)$ (size $n_0 \times n_1$) via generate_Odds(model, $\phi$).
  4. Form $W(\phi) = \dfrac{O(\phi) \circ F}{\mathbf{1}_{n_0} C^\top}$ (i.e. divide each column of $O \circ F$ by the corresponding scalar $C_j$).
  5. Compute the observed-score sum $S_{\text{obs}}(\phi) = \sum_{r:\,\delta_r=1} s(\phi; \delta=1, x^{\text{miss}}_r, x^{\text{out}}_r, y_r)$ (this is small: one score vector per respondent).
  6. Compute the non-respondent expected score $S_{\text{un}}(\phi) = \sum_{i=1}^{n_0} \sum_{j=1}^{n_1} W_{ij}(\phi)\, s(\phi; \delta=0, x^{\text{miss}}_i, x^{\text{out}}_i, y_j)$, implemented via matrix multiplications and row/column sums.
  7. Return the total score $S_2(\phi) = S_{\text{obs}}(\phi) + S_{\text{un}}(\phi)$.

The root-finder then searches for $\phi$ such that $S_2(\phi) = 0$. In the package this is implemented by repeatedly forming $O(\phi)$ for the current candidate $\phi$, computing $W(\phi)$ and $S_2(\phi)$, and letting nleqslv perform the iteration.
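Putting the recipe together, a conceptual sketch of the score function (not the package's exact code). Here generate_odds_sketch is the illustrative helper from above, and s_obs_fun and s_factors_fun are assumed callbacks returning, at the current $\phi$, the summed respondent score vector and the list of parameter-wise $n_0 \times n_1$ score matrices:

```r
s2_score <- function(phi, F_mat, C_vec, X_miss_un, y_obs, s_obs_fun, s_factors_fun) {
  p <- length(phi)
  O_mat <- generate_odds_sketch(phi[1], phi[2:(p - 1)], phi[p],
                                X_miss_un, y_obs)                       # step 3
  W <- sweep(O_mat * F_mat, 2, C_vec, "/")                              # step 4
  S0 <- s_factors_fun(phi)            # list of p matrices [S^(0)]_m
  s_un <- vapply(S0, function(S0_m) sum(W * S0_m), numeric(1))          # step 6
  s_obs_fun(phi) + s_un                                                 # steps 5 + 7
}

# Root-finding: nleqslv() forwards the extra named arguments to s2_score
# sol <- nleqslv::nleqslv(phi_start, s2_score, F_mat = F_mat, C_vec = C_vec,
#                         X_miss_un = X_miss_un, y_obs = y_obs,
#                         s_obs_fun = s_obs_fun, s_factors_fun = s_factors_fun)
```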

Mean estimation and survey weights

After the response-model parameters $\hat\phi$ are obtained the package reports a mean estimate of the target outcome using an inverse-probability reweighting form. Let respondents be indexed by $j = 1, \dots, n_1$; denote by $\pi_j = \pi(x^{\text{miss}}_j, y_j; \hat\phi)$ the fitted response probabilities evaluated at each respondent (here we write $x^{\text{miss}}_j$ for the covariates used in the missingness model evaluated at respondent $j$). With design weights $w_j$ (by default $w_j \equiv 1$ for non-survey data) the point estimator computed in the implementation is
$$\hat\mu = \frac{\sum_{j=1}^{n_1} w_j\, y_j / \pi_j}{\sum_{j=1}^{n_1} w_j / \pi_j}. \tag{mean-est}$$
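In code this ratio is two lines; a minimal sketch with respondent-level vectors y_obs, pi_hat, and w (names illustrative):

```r
w      <- rep(1, length(y_obs))                      # design weights, default 1
mu_hat <- sum(w * y_obs / pi_hat) / sum(w / pi_hat)  # (mean-est)
```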

Notes on survey weights and $f_1$ fitting:

  • The conditional-density fit used to build $F$ can incorporate respondent sampling weights when provided (the implementation passes respondent weights to the density-fitting routine). Thus $f_1(\cdot \mid x; \gamma)$ may be a weighted fit when design_weights or a survey design is supplied.
  • The mean estimate (mean-est) also uses design weights $w_j$ in both numerator and denominator as shown above. In code these are provided via model$design_weights and default to 1 for non-survey use.

In short: survey weights enter in two places: (i) the conditional-density estimation (if requested) and (ii) the final mean calculation through the weighted IPW-type ratio in (mean-est).

Arguments passed to exptilt (summary)

The exptilt / exptilt.data.frame user-facing function accepts a number of arguments that control model specification and computation; the most important are:

  • data: a data.frame with outcome and covariates.
  • formula: a partitioned formula of the form y ~ aux1 + aux2 | miss1 + miss2 where the left part (before |) lists covariates_for_outcome and the right part lists covariates_for_missingness (the package helper splits these automatically).
  • auxiliary_means: (optional) population or target means for auxiliary variables used for scaling.
  • standardize: logical, whether to standardize features before fitting.
  • prob_model_type: character, either "logit" or "probit" to select the response probability family.
  • y_dens: choice of conditional-density family; "auto", "normal", "lognormal", or "exponential".
  • variance_method: "delta" or "bootstrap" for variance estimation.
  • bootstrap_reps: number of bootstrap replications when bootstrap is used.
  • control: list of control parameters forwarded to the nonlinear solver (nleqslv).
  • stopping_threshold: numeric; the sup-norm threshold for early stopping of the score (see the stopping criterion below).
  • on_failure: behavior on failure ("return" or "error").
  • supress_warnings: logical to silence certain warnings.
  • design_weights: optional vector of respondent design weights (or full-sample weights which are subset internally).
  • survey_design: optional survey.design object; when provided some internal logic uses the survey path.
  • trace_level: integer controlling verbosity.
  • sample_size: integer for stratified subsampling used when data are large.
  • outcome_label, user_formula: utility arguments used for presentation and bookkeeping.

These arguments appear in the exptilt.data.frame function signature and control how the matrices $F$, $C$, and $O$ are built and how the solver is run.
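For orientation, a hypothetical call wiring these arguments together (the data and variable names are invented for exposition; only the argument names and the partitioned-formula syntax come from the summary above, and the two covariate sets are kept disjoint as required):

```r
fit <- exptilt(
  data            = df,
  formula         = income ~ age + educ | region + urban,  # outcome | missingness
  prob_model_type = "logit",
  y_dens          = "normal",
  variance_method = "bootstrap",
  bootstrap_reps  = 500
)
```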

Connection to the EM viewpoint

The EM-like update displayed in the theoretical notes (equation (14)),
$$\hat\phi^{(t+1)} \leftarrow \text{solve } S_2(\phi \mid \hat\phi^{(t)}, \hat\gamma) = 0,$$
is exactly what the implementation achieves: for a fixed conditional-density estimate $\hat\gamma$ and a current $\hat\phi^{(t)}$, the expectations for the missing units are approximated by the discrete support on observed $y_j$ and the resulting equation for $\phi$ is solved (via root-finding). The heavy lifting is performed by the matrix calculus described earlier: constructing $F$, $C$, $O$ and then computing $W$ and multiplying by the parameter-wise score factors.

Stopping criterion (maximum-norm)

Practical optimization requires a convergence criterion. The implementation uses the maximum absolute component of the vector-valued score as a stopping rule. Concretely, if the solver produces a score vector $S_2(\phi)$ with $\max_{m=1,\dots,p} |S_{2,m}(\phi)| < \varepsilon$ for a user-specified stopping_threshold $\varepsilon$, the algorithm treats this as converged. In the code this is used as an early exit inside the target function passed to the nonlinear solver: when the score's sup-norm is below the threshold, a zero vector is returned to signal convergence and avoid further unnecessary computation.

This choice matches the intuition that the root-finder should stop when the estimating equations are (componentwise) negligibly small.
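A sketch of how such an early exit can be wired into the target function handed to the solver (conceptual; s2_score is the illustrative function from earlier):

```r
target <- function(phi, stopping_threshold, ...) {
  score <- s2_score(phi, ...)
  if (max(abs(score)) < stopping_threshold) {
    return(rep(0, length(score)))  # sup-norm negligible: signal convergence
  }
  score
}
```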

Practical tutorial: from raw data to matrix operations (conceptual steps)

  1. Fit the conditional density $f_1(y \mid x; \gamma)$ using respondents and covariates_for_outcome. This gives a function that evaluates $f_1(y \mid x)$ for any $y$ and any covariate row $x$ (used for both respondents and non-respondents).

  2. Evaluate the conditional density at the Cartesian product of non-respondent covariates and observed respondent $y$ values to form $F$ (done by generate_conditional_density_matrix). This is the empirical imputation support: think of rows as target non-respondents and columns as candidate respondent outcomes.

  3. Evaluate the same conditional density at respondent covariates to form the column-normalizer $C$ (done by generate_C_matrix): this is simply the column sum of densities over respondents. A minimal sketch of steps 1–3 appears after this list.

  4. For each trial value $\phi$ of the response-model parameters, compute the odds matrix $O(\phi)$ using the separable linear predictor and link function (implemented efficiently via outer() in code). Combine $O$ with $F$ and normalize columns by $C$ to obtain $W(\phi)$.

  5. Use the vectorized s_function to obtain parameter-specific factor matrices for the non-respondent-imputed scores; multiply (elementwise) by $W(\phi)$ and reduce (row/column sums) to compute the non-respondent contribution to $S_2(\phi)$.

  6. Add the observed-respondent score and use a root-finder (e.g. nleqslv) to find $\phi$ with $S_2(\phi) = 0$. The solver may use the maximum-norm stopping threshold described above to exit early.
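The promised sketch of steps 1–3, for a normal conditional density fit by least squares on respondents. This is illustrative only: resp and nonresp are assumed data frames of respondents and non-respondents, covariates_for_outcome an assumed character vector of column names, and the package's actual fitting routine may differ:

```r
# Step 1: fit f1(y | x) on respondents as a normal linear model
fit   <- lm(y ~ ., data = resp[, c("y", covariates_for_outcome)])
sigma <- summary(fit)$sigma

# Step 2: F, rows = non-respondents, columns = observed y values (support)
mu_un <- predict(fit, newdata = nonresp)                   # length n0
F_mat <- outer(mu_un, resp$y,
               function(m, y) dnorm(y, mean = m, sd = sigma))

# Step 3: C, column sums of the same densities at respondent covariates
mu_ob <- predict(fit, newdata = resp)                      # length n1
C_vec <- colSums(outer(mu_ob, resp$y,
                       function(m, y) dnorm(y, mean = m, sd = sigma)))
```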

Why the vectorization matters (practical remarks)

  • Memory: the naive expansion to an explicit dataset of size $n_0 \times n_1$ would store duplicated covariate rows and blow up memory. The implementation exploits separability (intercept $+\, X^{\text{miss}}_i \beta_{\text{miss}} + \beta_y y_j$) and vectorized R primitives (outer, matrix multiplications, column/row sums) to avoid large temporary allocations.
  • Speed: elementwise operations on dense matrices plus BLAS-accelerated matrix multiplication are much faster than interpreted loops in R for typical dataset sizes.
  • Clarity: organizing the logic as the three matrices $F$, $C$, and $O$, followed by elementwise combination and reductions, makes the relationship between the statistical approximation and the implementation transparent and easier to reason about.

Closing notes and references

This vignette mapped the implementation functions to the math in the theoretical notes and showed how the EM-like mean-score step reduces to a small set of matrix operations. The implementation follows the ideas described in Riddles et al. (2016) for exponential tilting under NMAR: fit the conditional density on respondents, approximate the missing-data expectation by a finite sum over observed $y$ values, and solve the resulting estimating equations for the missingness-model parameters.