
This vignette summarizes the key mathematical objects and estimating-equation derivations behind the empirical likelihood (EL) estimator implemented in the nmar package, and maps them to code. It covers both data-frame (IID) and survey design use cases, allows arbitrary numbers of response-model and auxiliary covariates, and supports both logit and probit response families.

Notation

Units

  • $i = 1, \ldots, n$ indexes respondents (those with observed $Y$)
  • $R_i \in \{0, 1\}$ is the response indicator; we work on the observed subset $R_i = 1$

Data

  • Outcome: $Y_i$ (observed when $R_i = 1$; missing otherwise)
  • Response covariates: row vector $Z_i \in \mathbb{R}^K$, the $i$th row of the missingness-model design matrix built from the formula as an intercept, the LHS outcome expression (evaluated in the model frame), and any additional missingness predictors on the RHS after |
  • Auxiliary covariates: row vector $X_i \in \mathbb{R}^L$ (possibly $L = 0$), from the auxiliary RHS (no intercept)
  • Population auxiliary means: $\mu_x \in \mathbb{R}^L$, known; names match the columns of $X$

Mapping to code:

  • response_model_matrix (the missingness_design in el_prepare_inputs()) corresponds to $Z$ and has columns: (Intercept), the evaluated LHS outcome expression, and any RHS2 predictors (after the | in the formula)
  • auxiliary_matrix corresponds to $X$ (no intercept); we center it in code as $X - \mu_x$

Response Model (Family functions)

  • Linear predictor: $\eta_i = Z_i \beta$
  • Response probability: $w_i \equiv g(\eta_i) = \mathrm{linkinv}(\eta_i)$
  • First derivative: $\dfrac{dw}{d\eta}(\eta_i) = \mu_{\eta,i} = \mathrm{mu.eta}(\eta_i)$
  • Second derivative: $\dfrac{d^2 w}{d\eta^2}(\eta_i) = \mathrm{d2mu.deta2}(\eta_i)$

Here linkinv, mu.eta, and d2mu.deta2 refer to the chosen response family (logit or probit). We follow the paper’s $w_i$ notation for the response probability and reserve $p_i^{\text{EL}}$ for empirical-likelihood weights. A minimal sketch of such a family object follows.
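For concreteness, the following sketch shows one way to write a family object with these three slots. It is only an illustration of the notation; el_family_sketch and its layout are hypothetical and not the package’s internal family constructor.

```r
# Hypothetical sketch of logit/probit family helpers exposing the three
# slots used in this vignette: linkinv, mu.eta, and d2mu.deta2.
el_family_sketch <- function(link = c("logit", "probit")) {
  link <- match.arg(link)
  if (link == "logit") {
    list(
      linkinv    = function(eta) plogis(eta),                      # w = g(eta)
      mu.eta     = function(eta) plogis(eta) * (1 - plogis(eta)),  # dw/deta
      d2mu.deta2 = function(eta) {                                 # d2w/deta2
        w <- plogis(eta)
        w * (1 - w) * (1 - 2 * w)
      }
    )
  } else {
    list(
      linkinv    = function(eta) pnorm(eta),        # w = Phi(eta)
      mu.eta     = function(eta) dnorm(eta),        # dw/deta = phi(eta)
      d2mu.deta2 = function(eta) -eta * dnorm(eta)  # d2w/deta2 = -eta * phi(eta)
    )
  }
}
```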

Weight Re-parameterization

  • $W \in (0, 1)$ is a nuisance scalar; we parameterize via $z = \operatorname{logit}(W)$ for stability and set $W = \operatorname{plogis}(z)$
  • $\lambda_W \in \mathbb{R}$ and $\lambda_x \in \mathbb{R}^L$ are EL Lagrange multipliers for the $W$-constraint and the auxiliary constraints

EL Weights

  • Denominator: $D_i = 1 + \lambda_W (w_i - W) + (X_i - \mu_x)^\top \lambda_x$
  • Base sampling weights: $a_i = 1$ (IID) or $a_i =$ the survey base weight for respondent $i$
  • EL weights for respondents: $p_i^{\text{EL}} \propto a_i / D_i$ (proportionality normalized by totals below)

Estimator

  • $\hat{Y} = \sum p_i^{\text{EL}} Y_i / \sum p_i^{\text{EL}}$ (a short R sketch of this algebra follows)
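The standalone sketch below puts these definitions together: it computes $D_i$, the EL weights, and the ratio estimator for given parameter values. The function name and interface are illustrative, not the package API.

```r
# Illustrative sketch of the EL weight and estimator algebra (not package code).
# Inputs: y (outcomes), w (response probabilities), W, lambda_W, lambda_x,
# X (auxiliary matrix), mu_x (population means), a (base weights).
el_point_estimate_sketch <- function(y, w, W, lambda_W, lambda_x, X, mu_x, a) {
  Xc <- sweep(X, 2, mu_x, "-")                        # X_i - mu_x
  D  <- 1 + lambda_W * (w - W) + drop(Xc %*% lambda_x)
  m  <- a / pmax(D, 1e-8)                             # unnormalized EL masses a_i / D_i
  p  <- m / sum(m)                                    # probability-scale EL weights
  sum(p * y) / sum(p)                                 # ratio estimator Y_hat
}
```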

Notation at a Glance

| Symbol | Meaning |
|---|---|
| $i$ | Respondent index (rows with observed $Y$) |
| $Y_i$ | Outcome for unit $i$ (observed if $R_i = 1$) |
| $Z_i$ | Row of response design matrix (includes intercept) |
| $X_i$ | Row of auxiliary design (no intercept) |
| $\mu_x$ | Known population means of auxiliaries (vector) |
| $\beta$ | Response-model coefficients |
| $\eta_i = Z_i \beta$ | Linear predictor for response model |
| $w_i$ | $\mathrm{linkinv}(\eta_i)$ (logit: $\mathrm{plogis}$; probit: $\Phi$) |
| $\mu_{\eta,i}$ | $dw_i / d\eta_i$ |
| $\lambda_W$ | Multiplier for the $W$-constraint $\sum (w_i - W)/D_i = 0$ |
| $\lambda_x$ | Multipliers for the auxiliary constraints $\sum (X_i - \mu_x)/D_i = 0$ |
| $D_i$ | $1 + \lambda_W (w_i - W) + (X_i - \mu_x)^\top \lambda_x$ |
| $a_i$ | Base weight (IID: 1; survey: design weight) |
| $p_i^{\mathrm{EL}}$ | Empirical-likelihood weight $\propto a_i / D_i$ |
| $\hat{Y}$ | $\sum p_i^{\mathrm{EL}} Y_i / \sum p_i^{\mathrm{EL}}$ |
| $F(\theta)$ | Stacked estimating system ($\beta$, $W$, constraints) |
| $A$ | Jacobian $\partial F / \partial \theta$ |

Note: QLS use $\theta$ for the response-model parameter; in this vignette that parameter is $\beta$. We use $\theta$ for the full stacked unknown vector solved by the EL engine.

Engines

  • Family: “logit” (default) or “probit”. For respondents ($R_i = 1$), the score with respect to $\eta$ is $s_i = \partial \log w_i / \partial \eta_i = \mu_{\eta,i} / w_i$ (equal to $1 - w_i$ for logit and $\phi(\eta_i)/\Phi(\eta_i)$ for probit). In code we compute these using stable log-domain formulas for probit and clip probabilities away from 0 and 1 when they appear in ratios.
  • Scaling: optional standardization of design matrices and $\mu_x$ via nmar_scaling_recipe

Data and Interface Constraints

Before applying the EL equations, the implementation enforces several constraints on the formula and data (el_prepare_inputs() in src_dev/engines/el/impl/input.R and the entry points in src_dev/engines/el/impl/dataframe.R and src_dev/engines/el/impl/survey.R):

  • Single outcome source: The LHS expression must reference exactly one outcome source variable in data (for example Y_miss). Any transformation is applied to this variable in the model frame, and the transformed values must be finite for all respondent rows (no new NA/NaN introduced among $R_i = 1$).
  • Outcome only via LHS in the response model: The raw outcome variable and the LHS expression are not allowed on RHS1 (auxiliaries) or RHS2 (missingness predictors), either explicitly or via . expansion. The response model uses the evaluated LHS outcome column as a dedicated predictor in missingness_design, together with an intercept and any additional RHS2 predictors.
  • Auxiliaries among respondents: Auxiliary variables (RHS1) must be fully observed and non-constant among respondents. If auxiliary_means are not supplied, auxiliaries must be fully observed in the full data so that population means can be estimated from the sample.
  • Missingness predictors among respondents: Missingness predictors (RHS2) must be fully observed among respondents. Zero-variance predictors are allowed but generate a warning; their columns still enter the response model design matrix.
  • Respondents-only data: When the outcome has no missing values (respondents-only data), the EL engines require n_total to be supplied so that $N_{\text{pop}}$ can be set on the analysis scale. If auxiliaries are requested in this setting, auxiliary_means must also be supplied; otherwise the engines error with a descriptive message.

From Paper to Implementation: Core Ideas

The paper (Qin-Leung-Shao, JASA 2002) sets up EL under nonignorable nonresponse using:

  • Empirical likelihood weights for respondents that satisfy:
    • Zero-sum residual: $\sum p_i^{\text{EL}} (w_i - W) = 0$
    • Auxiliary moments: $\sum p_i^{\text{EL}} (X_i - \mu_x) = 0$
  • A response-model probability $w_i = g(\eta_i)$ with $\eta_i = Z_i \beta$

In our code, we adopt the same EL structure and estimating equations, extended to arbitrary $Z$ and $X$ and to survey designs. For uncertainty, we provide bootstrap variance (IID resampling and survey replicate-weight bootstrap). Because $\hat{Y}$ is a ratio-of-weights estimator, any common normalization of $p_i^{\text{EL}} \propto a_i / D_i$ cancels in $\hat{Y}$; only relative weights matter (the KKT multipliers $\lambda$ enforce the constraints; normalization affects only a common scale that vanishes in the ratio).

Closed-form $\lambda_W$ (QLS Eq. 10, IID path)

QLS derive $\lambda_W = (N/n - 1)/(1 - W)$ when $a_i \equiv 1$ and $n$ counts respondents. In our IID (data-frame) path we reuse this relation with base weights fixed at $a_i \equiv 1$:

  • Let $n_{\text{resp\_weighted}} = \sum_{i: R_i = 1} a_i$ be the respondent-weighted total.
  • Let $N_{\text{pop}}$ be the analysis-scale population total (by default nrow(data), or a user-supplied n_total).

In the data.frame path el_prepare_inputs() is called with weights = NULL, so respondent_weights are identically 1 and $n_{\text{resp\_weighted}} = n$. We then set

$$ \lambda_W = \frac{N_{\text{pop}}/n_{\text{resp\_weighted}} - 1}{1 - W}, $$

which reduces exactly to the original QLS formula when $N_{\text{pop}}$ is the total number of sampled units. This closed form is used only in the IID path to profile out $\lambda_W$; for complex survey designs we instead treat $\lambda_W$ as a free parameter and solve for it jointly with $(\beta, W, \lambda_x)$ via the design-weighted system described in the survey extension section.
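As a one-line translation of this closed form (the helper name below is illustrative; the actual internal helper is the el_lambda_W() mentioned in the code cross-reference tables later in this vignette):

```r
# Closed-form QLS multiplier lambda_W in the IID path (illustrative helper).
# N_pop: analysis-scale population total; n_resp_weighted: sum of base weights
# over respondents (equal to n when all a_i = 1); W: unconditional response rate.
el_lambda_W_closed_form <- function(N_pop, n_resp_weighted, W) {
  C_const <- N_pop / n_resp_weighted - 1
  C_const / (1 - W)
}
```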

Guarding and Numerical Stability

We solve the stacked system with a consistent guarding policy across equations, Jacobian, and post-solution:

  • Cap $\eta$: $\eta \leftarrow \max\{\min(\eta, \eta_{\max}), -\eta_{\max}\}$
  • Compute $w = \mathrm{linkinv}(\eta)$; clip $w$ to $[10^{-12}, 1 - 10^{-12}]$ when used in ratios
  • Guard denominators: $D_i \leftarrow \max\{D_i, \delta\}$ with a small $\delta > 0$
  • In the Jacobian, multiply terms involving $\partial(1/D_i)/\partial\cdot$ by $\mathbb{1}\{D_i^{\text{raw}} > \delta\}$ so the analytic Jacobian matches the piecewise-smooth equations being solved

For the probit link, $s_i(\eta) = \partial \log w/\partial \eta = \phi(\eta)/\Phi(\eta)$ (the inverse Mills ratio) is computed in the log domain for stability; its derivative is $\dfrac{d s_i}{d\eta} = -\eta\, s_i - s_i^2$.
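A minimal sketch of that log-domain computation (illustrative only, not the package’s internal code):

```r
# Stable computation of s(eta) = phi(eta) / Phi(eta) for the probit link,
# done in the log domain, together with ds/deta = -eta * s - s^2.
probit_score_sketch <- function(eta) {
  s <- exp(dnorm(eta, log = TRUE) - pnorm(eta, log.p = TRUE))
  list(s = s, ds_deta = -eta * s - s^2)
}
```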

Equation Crosswalk (QLS 2002 -> This Vignette/Code)

  • QLS (5): discrete mass form for $p_i$ with two multipliers -> our $D_i = 1 + \lambda_W (w_i - W) + (X_i - \mu_x)^\top \lambda_x$ and $p_i^{\text{EL}} \propto a_i / D_i$.
  • QLS (7): $\sum \dfrac{x_i - \mu_x}{1 + \cdots} = 0$ (or with $\mu_x$ replaced by $\bar{X}$ when auxiliary variables are observed for all sampled units) -> our auxiliary constraints $\sum a_i (X_i - \mu_x)/D_i = 0$, where $\mu_x$ is taken from auxiliary_means if supplied, otherwise estimated from the full input (unweighted for IID, design-weighted for surveys).
  • QLS (8): $\sum \dfrac{w_i - W}{1 + \cdots} = 0$ -> our $W$-equation $\sum a_i (w_i - W)/D_i = 0$.
  • QLS (10): $\hat{\lambda}_2 = (N/n - 1)/(1 - W)$ -> in the IID path we set $\lambda_W = ((N_{\text{pop}}/n_{\text{resp\_weighted}}) - 1)/(1 - W)$; in the survey path $\lambda_W$ is solved from the additional linkage equation $g_W^{(1)}$.
  • Estimator $\hat{Y}$ in QLS -> our ratio $\hat{Y} = \sum p_i^{\mathrm{EL}} Y_i / \sum p_i^{\mathrm{EL}}$ using $p_i^{\mathrm{EL}} \propto a_i / D_i$.

Likelihood and Profiling (sketch)

QLS start from the factorized semiparametric likelihood (their Eq. (2)):

$$ \mathcal{L}(\beta, W, F) = \left\{ \prod_{i=1}^{n} \frac{w(Y_i, X_i; \beta)\, dF(Y_i, X_i)}{W} \right\} W^{n} (1 - W)^{N - n}, $$

where $W = \iint w(y, x; \beta)\, dF(y, x)$ is the unconditional response rate. The $W^{-n}$ factor in the conditional likelihood cancels the $W^{n}$ in the binomial term, so the overall likelihood is equivalently proportional to

$$ \left\{ \prod_{i=1}^{n} w(Y_i, X_i; \beta)\, dF(Y_i, X_i) \right\} (1 - W)^{N - n}. $$

Maximization is subject to (i) $\int dF = 1$, (ii) $\int X\, dF = \mu_x$ (or $\bar{X}$ when applicable), and (iii) $\int w(Y, X; \beta)\, dF = W$. Discretizing $F$ at observed respondents by assigning unknown masses $p_i$ and introducing multipliers $\lambda$, the KKT conditions yield the familiar EL weight form with denominator

$$ D_i \;=\; 1 + \lambda_W (w_i - W) + (X_i - \mu_x)^\top \lambda_x, $$

and, with base weights $a_i$, the working masses are proportional to $a_i / D_i$.

Remark on conditioning: QLS’s Eq. (2) writes the first product as $\prod_i [\, w(y_i, x_i; \beta)\, dF(y_i, x_i)/W\,]$ so that it explicitly represents the likelihood of $(Y_i, X_i)$ conditional on $R_i = 1$. Multiplying by the binomial term $W^{n}(1 - W)^{N - n}$ yields the same overall likelihood as above because the $W^{-n}$ in the first factor cancels the $W^{n}$ in the second. Both factorizations lead to the same estimating equations and the same profiled log-likelihood form used subsequently in QLS after introducing the multipliers.

KKT and Denominator (details)

There are two closely related objects in the EL construction:

  • The unknown conditional masses $p_i$ on respondent support points (these are the $p_i$ in QLS).
  • The probability mass weights actually used to form expectations under the discretized law. For surveys this mass is proportional to $a_i p_i$ (because $a_i$ represents how many population units respondent $i$ stands for).

In a survey-weighted setting (with base weights $a_i$ acting as multiplicities), we can write the discretized empirical distribution as $F_{\text{EL}}(A) = \sum_i a_i p_i\, \mathbf{1}\{(y_i, x_i) \in A\}$, with constraints $\sum_i a_i p_i = 1$, $\sum_i a_i p_i (X_i - \mu_x) = 0$, and $\sum_i a_i p_i (w_i - W) = 0$.

Introducing Lagrange multipliers $(\lambda_0, \lambda_x, \lambda_W)$ for these constraints and profiling the $p_i$’s gives the KKT stationarity conditions

$$ \frac{\partial}{\partial p_i} \Big[ \sum_j a_j \log p_j - \lambda_0 \Big( \sum_j a_j p_j - 1 \Big) - \lambda_x^\top \sum_j a_j p_j (X_j - \mu_x) - \lambda_W \sum_j a_j p_j (w_j - W) \Big] = 0, $$

which solve to

$$ p_i \;\propto\; \frac{1}{1 + \lambda_x^\top (X_i - \mu_x) + \lambda_W (w_i - W)} \;\equiv\; \frac{1}{D_i}. $$

Normalizing to enforce $\sum_i a_i p_i = 1$ yields $p_i = D_i^{-1} / \sum_j a_j D_j^{-1}$. The probability mass weight placed on respondent $i$ under $F_{\text{EL}}$ is then $a_i p_i = a_i D_i^{-1} / \sum_j a_j D_j^{-1} \propto a_i / D_i$. In the implementation we store unnormalized EL masses $m_i = a_i / D_i$ and use probability-scale weights $p_i^{\text{EL}} = m_i / \sum_j m_j$ for expectations.

$$ m_i \;\propto\; \frac{a_i}{D_i} \quad \text{with} \quad D_i = 1 + \lambda_W (w_i - W) + (X_i - \mu_x)^\top \lambda_x. $$

The EL weights $p_i^{\text{EL}}$ are then used to build the mean estimator

$$ \hat{Y} \;=\; \frac{\sum_i p_i^{\text{EL}} Y_i}{\sum_i p_i^{\text{EL}}}. $$

The remaining unknowns $(\beta, W, \lambda_x)$ (and $\lambda_W$ in the survey system) are determined by the estimating equations below.

Clarification: Relationship Between $W$ and $\lambda_W$

In the IID (data-frame) path, the EL multiplier for the response-rate constraint is expressed as

$$ \lambda_W = \frac{C}{1 - W}, \quad \text{with } C = \frac{N_{\text{pop}}}{n_{\text{resp\_weighted}}} - 1 \text{ and } W = \operatorname{plogis}(z). $$

Intuition: In the EL KKT system, the constraint $\sum p_i^{\text{EL}} (w_i - W) = 0$ sits alongside the normalization and (optionally) the auxiliary constraints. Incorporating base weights $a_i$ and the ratio between population and respondent totals induces a scaling of the multiplier linked to the mass constraint. Writing $\lambda_W$ in this scaled form keeps the parameter on a numerically stable scale and lets the derivative structure (with respect to $z$ via $W$) be handled cleanly. This is consistent with the EL structure when the baseline mass is $n_{\text{resp\_weighted}}$ and the “full population” target is $N_{\text{pop}}$, and it is exactly what the IID code path uses to match the normalization implied by base weights.

Derivation sketch (KKT, IID case): The discretized semiparametric likelihood (QLS, 2002) maximizes, over the unknown masses $\{p_i\}$ at observed points and over $(\beta, W)$,

$$ \ell(\beta, W, \lambda_x, \lambda_W) \;=\; \sum_{i=1}^{n} \log w_i(\beta) \;+\; (N_{\text{pop}} - n_{\text{resp\_weighted}}) \log(1 - W) \;-\; \sum_{i=1}^{n} \log\!\Big( 1 + (X_i - \mu_x)^\top \lambda_x + \lambda_W (w_i - W) \Big), $$

subject to the normalization and moment constraints that generate the EL denominator. In the IID QLS case ($a_i \equiv 1$), profiling the $p_i$’s under $\sum_i p_i = 1$ gives $p_i = 1/(n D_i)$ and therefore $\sum_i D_i^{-1} = n$. Combining this identity with the first-order condition for $W$ yields the closed form

$$ \lambda_W \;=\; \frac{N_{\text{pop}}/n_{\text{resp\_weighted}} - 1}{1 - W} \;=\; \frac{C}{1 - W}, $$

which coincides with QLS (10) when $a_i \equiv 1$. This closed-form relationship is used in the IID EL implementation to profile out $\lambda_W$. In the survey-design path, by contrast, $\lambda_W$ is treated as an explicit unknown and the linkage between $W$ and $\lambda_W$ is enforced through the additional equation

$$ g_W^{(1)}(\beta, W, \lambda_W, \lambda_x) \;=\; \frac{T_0}{1 - W} - \lambda_W \sum_{i \in R} \frac{d_i}{D_i} = 0, $$

where $T_0 = N_{\mathrm{pop}} - \sum_{i \in R} d_i$ and the $d_i$ are the design weights. At the QLS simple-random-sampling limit (equal weights, no auxiliaries) this system reduces to the same closed-form relation.
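As a sketch, the survey linkage residual $g_W^{(1)}$ can be evaluated as follows (illustrative names, not the package’s internal function):

```r
# Survey linkage residual g_W^(1): T0 / (1 - W) - lambda_W * sum(d_i / D_i).
# d: respondent design weights; D: guarded EL denominators; N_pop: population
# total on the analysis scale.
g_W_link_sketch <- function(W, lambda_W, d, D, N_pop) {
  T0 <- N_pop - sum(d)
  T0 / (1 - W) - lambda_W * sum(d / D)
}
```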

Estimating Equations

Unknown parameters: $\beta \in \mathbb{R}^K$, $z \in \mathbb{R}$ (for $W = \operatorname{plogis}(z)$), $\lambda_x \in \mathbb{R}^L$; define $\theta = (\beta, z, \lambda_x)$.

Define $w_i = \mathrm{linkinv}(\eta_i)$ and $\mu_{\eta,i} = \frac{dw}{d\eta}(\eta_i)$ (denoted mu.eta(eta_i) in code).

In the IID (data-frame) path all base weights are $a_i \equiv 1$, so we can use the closed-form Qin-Leung-Shao (QLS) relation between $W$ and the EL multiplier for the response constraint. Writing $C = N_{\text{pop}}/n_{\text{resp\_weighted}} - 1$ with $n_{\text{resp\_weighted}} = \sum_i a_i$, QLS show that $\lambda_W = C/(1 - W) = (N_{\text{pop}}/n_{\text{resp\_weighted}} - 1)/(1 - W)$. Our IID implementation follows this and profiles out $\lambda_W$: the unknowns for the Newton solver are $(\beta, z, \lambda_x)$.

In the survey path, base weights are general design weights $a_i = d_i$ and the corresponding QLS-style relation no longer has a simple closed form. In that case we treat $\lambda_W$ as an additional free parameter and include a separate equation linking $\lambda_W$ and $W$ (see the “Survey extension” section below).

Denominator: $D_i = 1 + \lambda_W (w_i - W) + (X_i - \mu_x)^\top \lambda_x$, with $D_i \geq \epsilon$ enforced numerically.

Define the score term $s_i = \mu_{\eta,i}/w_i$ (the unit-level contribution to the log-likelihood score with respect to $\eta$). For logit, $s_i = 1 - w_i$; for probit, $s_i$ behaves like $\phi(\eta_i)/\Phi(\eta_i)$ when $w_i$ is bounded away from 0 via clipping (as implemented).

Intuition (why this score appears): for each respondent we observe $R_i = 1$, so the Bernoulli log-likelihood contribution of the response model is $\log w_i(\eta_i)$. Differentiating with respect to the linear predictor gives

$$ \frac{\partial}{\partial \eta_i} \log w_i(\eta_i) \,=\, \frac{1}{w_i}\, \frac{dw_i}{d\eta_i} \,=\, \frac{\mu_{\eta,i}}{w_i} \;\equiv\; s_i. $$

Thus $s_i$ measures the local sensitivity of the observed-response likelihood to $\eta_i$. In the logit family, $\mu_{\eta,i} = w_i(1 - w_i)$, so $s_i = 1 - w_i$, the familiar residual-like term; in the probit family, $s_i = \phi(\eta_i)/\Phi(\eta_i)$, the inverse Mills ratio. The EL $\beta$-equations balance this likelihood score against the EL penalty term $\lambda_W\, \mu_{\eta,i}/D_i$, enforcing the calibration constraints while fitting the response model.

The System of Estimating Equations $F(\theta) = 0$

$\beta$-equations ($K$ equations): $\sum a_i Z_i \,[\, s_i - \lambda_W \mu_{\eta,i}/D_i \,] = 0$

W-equation (1 equation): $\sum a_i (w_i - W)/D_i = 0$

Auxiliary constraints ($L$ equations): $\sum a_i (X_i - \mu_x)/D_i = 0$

This is exactly how el_build_equation_system() constructs the equation function in code (src_dev/engines/el/impl/equations.R).

Intuition: the $\beta$-equations equate the score of the respondent log-likelihood with the EL penalty term $\lambda_W \mu_{\eta,i}/D_i$; the $W$-equation centers the modeled response probabilities around the unconditional mean $W$ under the EL weights; the auxiliary equations calibrate the centered auxiliaries to zero mean under the EL weights.
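To make the stacking concrete, here is a self-contained sketch of the IID system $F(\theta)$ mirroring the three blocks above. It reuses el_family_sketch() from earlier and is not the package’s el_build_equation_system(), whose interface differs.

```r
# Illustrative stacked IID estimating equations F(theta).
# Z: response design matrix (with intercept); X: auxiliary matrix;
# mu_x: population auxiliary means; N_pop: population total; fam: a list
# with linkinv() and mu.eta() as in el_family_sketch() above.
el_F_iid_sketch <- function(theta, Z, X, mu_x, N_pop, fam) {
  K <- ncol(Z); L <- ncol(X); n <- nrow(Z)
  beta     <- theta[seq_len(K)]
  z        <- theta[K + 1]
  lambda_x <- theta[K + 1 + seq_len(L)]
  W        <- plogis(z)
  lambda_W <- (N_pop / n - 1) / (1 - W)            # closed-form QLS relation (a_i = 1)
  eta    <- drop(Z %*% beta)
  w      <- fam$linkinv(eta)
  mu_eta <- fam$mu.eta(eta)
  s      <- mu_eta / pmax(w, 1e-12)                # score d log w / d eta
  Xc <- sweep(X, 2, mu_x, "-")
  D  <- pmax(1 + lambda_W * (w - W) + drop(Xc %*% lambda_x), 1e-8)
  c(
    drop(crossprod(Z, s - lambda_W * mu_eta / D)), # beta-equations
    sum((w - W) / D),                              # W-equation
    drop(crossprod(Xc, 1 / D))                     # auxiliary constraints
  )
}
```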

Code cross-reference (equations and Jacobian)

The tables below map the theory blocks to the exact builders and argument/variable names in src_dev/engines/el/impl/equations.R and src_dev/engines/el/impl/jacobian.R.

Estimating-equation builders (equations.R)

| Theory block | IID (data.frame) implementation | Survey (survey.design) implementation | Code identifiers used inside the closure |
|---|---|---|---|
| Unknown vector $\theta$ | el_build_equation_system(...)(params) with params = c(beta, z, lambda_x) | el_build_equation_system_survey(...)(params) with params = c(beta, z, lambda_W, lambda_x) | beta_vec, z, W <- plogis(z), lambda_x, (survey) lambda_W |
| Denominator $D_i$ | el_denominator(lambda_W, W_bounded, Xc_lambda, w_i, denom_floor) | el_denominator(lambda_W, W_bounded, Xc_lambda, w_i, denom_floor) | dpack$denom (guarded), inv_denominator <- dpack$inv, active <- dpack$active |
| $w_i$ and derivatives | el_core_eta_state(family, eta_raw, ETA_CAP) | el_core_eta_state(family, eta_raw, ETA_CAP) | eta_raw, w_i, mu_eta_i, s_eta_i |
| $\beta$ equations $g_\beta$ | eq_betas <- shared_weighted_Xty(missingness_model_matrix, respondent_weights, beta_eq_term) | same | missingness_model_matrix, respondent_weights, beta_eq_term <- s_eta_i - lambda_W * mu_eta_i * inv_denominator |
| $W$ constraint $g_W^{(2)}$ | eq_W <- crossprod(respondent_weights * inv_denominator, (w_i - W_bounded)) | eq_W_constraint <- crossprod(respondent_weights * inv_denominator, (w_i - W_bounded)) | w_i, W_bounded, inv_denominator |
| Auxiliary constraints $g_x$ | eq_constraints <- shared_weighted_Xty(X_centered, respondent_weights, inv_denominator) | same | X_centered <- sweep(auxiliary_matrix, 2, mu_x_scaled, "-") |
| $\lambda_W$ profiling / linkage | IID: lambda_W <- el_lambda_W(C_const, W_bounded) with C_const <- (N_pop/n_resp_weighted) - 1 | Survey: eq_W_link <- (T0/(1-W_bounded)) - lambda_W * sum_d_over_D | IID: C_const, n_resp_weighted; Survey: T0 <- N_pop - n_resp_weighted, sum_d_over_D <- crossprod(respondent_weights, inv_denominator) |

Analytic Jacobian builders (jacobian.R)

| Object | IID (data.frame) builder | Survey (survey.design) builder | Notes on block ordering / names |
|---|---|---|---|
| $A(\theta) = \partial F/\partial\theta$ | el_build_jacobian(...)(params) | el_build_jacobian_survey(...)(params) | Both return a square matrix full_mat with parameter ordering matching the corresponding equations.R closure. |
| Parameter ordering | params = c(beta, z, lambda_x) | params = c(beta, z, lambda_W, lambda_x) | Indices in code: IID uses idx_beta, idx_W; survey uses idx_beta, idx_z, idx_lambdaW, idx_lambda_x. |
| Equation ordering (rows) | c(beta eqs, W eq, aux eqs) | c(beta eqs, W constraint, aux eqs, W link) | Survey row indices are annotated in el_build_jacobian_survey() as idx_eq_beta, idx_eq_W, idx_eq_aux, idx_eq_link. |
| Guard consistency | el_denominator(...); active <- dpack$active | same | Terms involving derivatives of 1/D_i are multiplied by active to match the denominator floor in the equations. |

Remarks

  • For logit and probit, $s_i$ is the log-likelihood score $\partial \log w_i/\partial \eta_i = \mu_{\eta,i}/w_i$ (equal to $1 - w_i$ for logit; behaves like $\phi/\Phi$ for probit when $w_i$ is clipped away from 0). This follows the paper’s MLE derivation; the EL constraints supply the nonparametric part.

Survey extension: design-weighted QLS system

The original QLS paper derives these equations under simple random sampling, where each respondent has equal weight. In practice we often work with complex survey designs and design weights $d_i \approx 1/\pi_i$, where $\pi_i$ is the inclusion probability for unit $i$. In our implementation we extend the QLS system using a design-weighted empirical likelihood:

  • For respondents $i \in R$ we use base weights $a_i = d_i$.
  • We approximate the unknown distribution of $(Y, X)$ by a discrete measure $F_{\text{EL}}(A) = \sum_{i \in R} d_i p_i\, \mathbf{1}\{(y_i, x_i) \in A\}$, where $p_i \ge 0$ and $\sum_i d_i p_i = 1$.
  • Expectations under $F$ are represented by design-weighted sums $\sum_i d_i p_i(\cdot)$.

We impose the following constraints, which are the design-weighted analogues of QLS (3):

  • Normalization: $\sum_{i \in R} d_i p_i = 1$.
  • Response-rate constraint: $\sum_{i \in R} d_i p_i \bigl( w_i(\beta) - W \bigr) = 0$.
  • Auxiliary constraints (vector case): $\sum_{i \in R} d_i p_i (X_i - \mu_x) = 0$.

Maximizing the design-weighted pseudo-likelihood under these constraints yields EL weights of the same tilted form as in QLS: $p_i(\beta, W, \lambda_x, \lambda_W) \propto 1/D_i$ with $D_i = 1 + \lambda_W \bigl( w_i(\beta) - W \bigr) + (X_i - \mu_x)^\top \lambda_x$, and with the proportionality constant chosen such that $\sum_i d_i p_i = 1$. In our implementation the unnormalized EL masses are $m_i = d_i/D_i$, and the probability-scale weights are $p_i^{\mathrm{EL}} = m_i/\sum_j m_j$.

The corresponding design-weighted QLS estimating system in $(\beta, W, \lambda_W, \lambda_x)$ can be written as:

  • Auxiliary block: $g_x(\beta, W, \lambda_W, \lambda_x) = \sum_{i \in R} d_i \dfrac{X_i - \mu_x}{D_i} = 0$.
  • Response-rate constraint: $g_W^{(2)}(\beta, W, \lambda_W, \lambda_x) = \sum_{i \in R} d_i \dfrac{w_i(\beta) - W}{D_i} = 0$.
  • Score equations for $\beta$: $g_\beta(\beta, W, \lambda_W, \lambda_x) = \sum_{i \in R} d_i \left[ \dfrac{\partial \log w_i(\beta)}{\partial \beta} - \lambda_W \dfrac{1}{D_i} \dfrac{\partial w_i(\beta)}{\partial \beta} \right] = 0$.
  • Linkage between $\lambda_W$ and the nonrespondent total: $g_W^{(1)}(\beta, W, \lambda_W, \lambda_x) = \dfrac{T_0}{1 - W} - \lambda_W \sum_{i \in R} \dfrac{d_i}{D_i} = 0$, where $T_0 = N_{\mathrm{pop}} - \sum_{i \in R} d_i$ on the analysis scale.

In code this system is implemented by el_build_equation_system_survey() in src_dev/engines/el/impl/equations.R. The parameter vector is $\theta_{\text{survey}} = (\beta, z, \lambda_W, \lambda_x)$ with $z = \operatorname{logit}(W)$, and the solver treats $\lambda_W$ as an explicit unknown. When all design weights are equal and $N_{\text{pop}}$ and the respondent count match the simple-random-sampling setup, this system reduces exactly to the original QLS equations (6)-(10).

For survey designs we build an analytic Jacobian for this design-weighted system whenever the response family supplies a second derivative d2mu.deta2 (logit and probit). The Jacobian structure mirrors the IID case but with the expanded parameter vector $(\beta, z, \lambda_W, \lambda_x)$ and the additional blocks for $g_W^{(1)}$ and $g_W^{(2)}$. When analytic derivatives are not available, nleqslv falls back to numeric/Broyden Jacobians.

Wu-style strata augmentation (survey designs)

For some stratified designs, especially when the NMAR mechanism varies strongly across strata, it is important that the EL weights preserve the stratum composition implied by the survey design. Following ideas from Wu-style calibration, we augment the auxiliary vector with stratum indicators when a survey.design object is provided:

  • Recover a strata factor from the design (prefer design$strata; fall back to the original strata= call when needed).
  • Build dummy variables for strata (dropping one reference level).
  • Compute stratum totals $N_h$ on the analysis scale from the design weights and convert them to stratum shares $W_h = N_h / N_{\mathrm{pop}}$.
  • Append these stratum dummies to the auxiliary matrix $X$ and their targets $W_h$ to the auxiliary means.

The EL constraints then include additional terms of the form $\sum_{i \in R} d_i p_i \bigl( \mathbf{1}\{H_i = h\} - W_h \bigr) = 0$ for each nonreference stratum $h$. This forces the EL weights to reproduce the design-implied stratum shares while still adjusting within strata for NMAR. In the implementation this augmentation is performed in the survey entry point (src_dev/engines/el/impl/survey.R) before auxiliary means are resolved, and the resulting augmented auxiliaries flow through to el_build_equation_system() or el_build_equation_system_survey() depending on the data type. The behavior is controlled by the logical strata_augmentation argument of el_engine() (default TRUE); it has an effect only when data is a survey.design with defined strata.

Implementation detail: when the user does not supply auxiliary_means, the targets for the augmented stratum indicators are obtained automatically as the design-weighted means of those dummy columns in the full sample (via el_resolve_auxiliaries()), which equal the design-implied stratum shares on the analysis scale. When the user does supply auxiliary_means, the augmentation appends the implied $W_h$ targets to that vector.
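A rough sketch of this augmentation step is shown below. It assumes a particular layout of the strata inside the design object (design$strata as a data frame), which may not hold for every survey.design; the package’s survey entry point handles these details.

```r
# Illustrative Wu-style strata augmentation. Assumptions: design$strata holds
# the strata variable(s); weights(design) returns the design weights; X is the
# auxiliary matrix for the same units as the design.
augment_strata_sketch <- function(X, design) {
  h  <- factor(design$strata[, 1])                 # strata factor (assumed layout)
  d  <- weights(design)                            # design weights
  Zh <- model.matrix(~ h)[, -1, drop = FALSE]      # stratum dummies, drop reference level
  N_pop <- sum(d)
  W_h   <- colSums(d * Zh) / N_pop                 # design-implied stratum shares
  list(X_aug = cbind(X, Zh), target_shares = W_h)
}
```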

Our survey EL implementation should be viewed as a design-weighted analogue of QLS, informed by the pseudo empirical likelihood literature (Chen and Sitter 1999; Wu 2005), rather than a verbatim implementation of any single paper.

Analytic Jacobian ($A$ Matrix, IID path)

For the IID (data-frame) path we differentiate $F(\theta) = 0$ with respect to $\theta = (\beta, z, \lambda_x)$. Let:

  • $\eta_i = Z_i \beta$, $w_i = \mathrm{linkinv}(\eta_i)$, $\mu_{\eta,i} = \dfrac{dw}{d\eta}(\eta_i)$, $\mu''_i = \dfrac{d^2 w}{d\eta^2}(\eta_i)$
  • $W = \operatorname{plogis}(z)$, $\dfrac{dW}{dz} = W(1 - W)$
  • $\lambda_W = \dfrac{C}{1 - W}$, so $\dfrac{d\lambda_W}{dW} = \dfrac{C}{(1 - W)^2}$ and $\dfrac{d\lambda_W}{dz} = \dfrac{d\lambda_W}{dW} \cdot \dfrac{dW}{dz}$
  • $X_{\text{centered},i} = X_i - \mu_x$

Intermediate Derivatives

  • $s_i = \mu_{\eta,i}/w_i \;\Rightarrow\; \dfrac{ds_i}{d\eta_i} = (\mu'_{\eta,i} w_i - \mu_{\eta,i}^2)/w_i^2$ with $\mu'_{\eta,i} = \dfrac{d\mu_{\eta,i}}{d\eta_i} = \dfrac{d^2 w}{d\eta_i^2} \equiv \mu''_i$ (this is d2mu.deta2(eta_i) in code)
  • $D_i = 1 + \lambda_W (w_i - W) + X_{\text{centered},i}^\top \lambda_x$
    • $\dfrac{\partial D_i}{\partial \eta_i} = \lambda_W \mu_{\eta,i}$
    • $\dfrac{\partial D_i}{\partial z} = \dfrac{\partial \lambda_W}{\partial z} (w_i - W) - \lambda_W \dfrac{dW}{dz}$
    • $\dfrac{\partial D_i}{\partial \lambda_x} = X_{\text{centered},i}$

Define $\text{inv}_i = 1/D_i$ and the scalar term driving the $\beta$-equations:

$$ T_i = s_i - \lambda_W \mu_{\eta,i}\, \text{inv}_i, \qquad s_i = \frac{\mu_{\eta,i}}{w_i}. $$

For the logit and probit families we use simpler closed-form derivatives of $s_i$ in code: for logit, $ds_i/d\eta_i = -\mu_{\eta,i}$ (because $s_i = 1 - w_i$); for probit, $ds_i/d\eta_i = -(\eta_i r_i + r_i^2)$ with $r_i = \phi(\eta_i)/\Phi(\eta_i)$ the inverse Mills ratio. The expression above is kept as the generic fallback for other families.

Compute Its Derivatives

Using $\mu'_{\eta,i} = d\mu_{\eta,i}/d\eta_i = \mathrm{d2mu.deta2}(\eta_i)$ and $dw_i/d\eta_i = \mu_{\eta,i}$,

$$ \frac{\partial s_i}{\partial \eta_i} = \frac{\mu'_{\eta,i} w_i - \mu_{\eta,i}^2}{w_i^2}. $$

Also $\dfrac{\partial\, \text{inv}_i}{\partial \eta_i} = -\text{inv}_i^2 \cdot \dfrac{\partial D_i}{\partial \eta_i} = -\text{inv}_i^2\, (\lambda_W \mu_{\eta,i})$. Therefore

$$ \frac{\partial T_i}{\partial \eta_i} = \frac{\mu'_{\eta,i} w_i - \mu_{\eta,i}^2}{w_i^2} - \lambda_W \mu'_{\eta,i}\, \text{inv}_i + \lambda_W^2\, \mu_{\eta,i}^2\, \text{inv}_i^2. $$

$$ \frac{\partial T_i}{\partial z} = -\frac{\partial \lambda_W}{\partial z}\, \mu_{\eta,i}\, \text{inv}_i + \lambda_W \mu_{\eta,i}\, \text{inv}_i^2\, \frac{\partial D_i}{\partial z} $$

$$ \frac{\partial T_i}{\partial \lambda_x} = \lambda_W \mu_{\eta,i}\, \text{inv}_i^2\, X_{\text{centered},i} $$

Assemble Jacobian Blocks (with $a_i$ weights)

$J_{\beta\beta}$ ($K \times K$): $J_{11} = \sum a_i Z_i^\top \left[ \dfrac{\partial T_i}{\partial \eta_i} \right] Z_i$

$J_{\beta z}$ ($K \times 1$): $J_{12} = \sum a_i Z_i^\top \left[ \dfrac{\partial T_i}{\partial z} \right]$

$J_{\beta\lambda}$ ($K \times L$): $J_{13} = \sum a_i Z_i^\top \left[ \dfrac{\partial T_i}{\partial \lambda_x} \right]$

$J_{z\beta}$ ($1 \times K$): derivative of the W-equation with respect to $\beta$

Equation: $G_W = \sum a_i (w_i - W)\, \text{inv}_i$

$$ \frac{\partial G_W}{\partial \eta_i} = a_i \left[ \mu_{\eta,i}\, \text{inv}_i - (w_i - W)\, \text{inv}_i^2 \left( \frac{\partial D_i}{\partial \eta_i} \right) \right] = a_i \left[ \mu_{\eta,i}\, \text{inv}_i - (w_i - W)\, \text{inv}_i^2\, (\lambda_W \mu_{\eta,i}) \right] $$

Then: $J_{21} = \sum \dfrac{\partial G_W}{\partial \eta_i} \cdot Z_i$

$J_{zz}$ ($1 \times 1$): $\dfrac{\partial G_W}{\partial z} = \sum a_i \left[ -\dfrac{dW}{dz}\, \text{inv}_i - (w_i - W)\, \text{inv}_i^2\, \dfrac{\partial D_i}{\partial z} \right]$

$J_{z\lambda}$ ($1 \times L$): $\dfrac{\partial G_W}{\partial \lambda_x} = \sum a_i \left[ -(w_i - W)\, \text{inv}_i^2\, X_{\text{centered},i} \right]$

$J_{\lambda\beta}$ ($L \times K$): constraints $H(\lambda): \sum a_i\, \text{inv}_i\, X_{\text{centered},i} = 0$

$$ \frac{\partial H}{\partial \eta_i} = -a_i\, \text{inv}_i^2\, \frac{\partial D_i}{\partial \eta_i}\, X_{\text{centered},i} = -a_i\, \text{inv}_i^2\, (\lambda_W \mu_{\eta,i})\, X_{\text{centered},i} $$

Thus, component-wise, $J_{31} = \sum_i a_i\, (-\lambda_W \mu_{\eta,i}\, \text{inv}_i^2)\, X_{\text{centered},i}^\top Z_i$. In compact matrix form:

$$ J_{31} = X_{\text{centered}}^\top \operatorname{diag}\!\big( -a_i\, \lambda_W\, \mu_{\eta,i}\, \text{inv}_i^2 \big)\, Z. $$

$J_{\lambda z}$ ($L \times 1$): $\dfrac{\partial H}{\partial z} = -\sum a_i\, \text{inv}_i^2 \left( \dfrac{\partial D_i}{\partial z} \right) X_{\text{centered},i}$

$J_{\lambda\lambda}$ ($L \times L$): $\dfrac{\partial H}{\partial \lambda_x} = -X_{\text{centered}}^\top \operatorname{diag}(a_i\, \text{inv}_i^2)\, X_{\text{centered}}$

These expressions match the unguarded analytic derivatives; in the code (src_dev/engines/el/impl/jacobian.R), any terms involving derivatives of $1/D_i$ are additionally multiplied by the active mask $\mathbb{1}\{D_i^{\text{raw}} > \delta\}$ to respect the denominator floor used for numerical stability.
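As an illustration of how two of these guarded blocks can be assembled with diagonal weighting, here is a sketch of $J_{31}$ and $J_{\lambda\lambda}$ (illustrative only, not the package’s el_build_jacobian()):

```r
# Sketch of the J_31 and J_lambda-lambda blocks with the active mask applied.
# Z: response design; Xc: centered auxiliaries; a: base weights; mu_eta,
# inv_D, active: vectors as defined in the text (inv_D = 1/D_i).
J31_sketch <- function(Z, Xc, a, lambda_W, mu_eta, inv_D, active) {
  d <- -a * lambda_W * mu_eta * inv_D^2 * active   # guarded d(1/D_i)/d eta_i factor
  crossprod(Xc, d * Z)                             # X_centered^T diag(d) Z
}
J33_sketch <- function(Xc, a, inv_D, active) {
  crossprod(Xc, (-a * inv_D^2 * active) * Xc)      # -X_centered^T diag(a inv^2) X_centered
}
```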

Why Analytic A Helps

  • Newton-Raphson (as used in our outer solve) linearizes $F(\theta)$ near the current iterate: $F(\theta + \Delta) \approx F(\theta) + A(\theta)\, \Delta$. The update $\Delta$ solves $A\, \Delta = -F$, so a high-quality $A$ is critical for fast, stable convergence.

Solving Strategy and Initialization

  • In the IID path the unknowns are $\theta = (\beta, z, \lambda_x)$ with $W = \operatorname{plogis}(z)$. In the survey path the unknowns are $(\beta, z, \lambda_W, \lambda_x)$, with $\lambda_W$ treated as a free parameter. In both cases we solve the full stacked system $F(\theta) = 0$ via Newton with the analytic Jacobian $A = \partial F/\partial\theta$ using nleqslv.
  • Globalization and scaling: we rely on nleqslv’s globalization (default global = "qline", xscalm = "auto") and enforce denominator positivity ($\min_i D_i \ge \varepsilon$) within equation evaluations. Optional standardization of design matrices improves conditioning.
  • Initialization: by default $\beta$ starts at zeros in the scaled space (unless the user supplies start$beta), and $z$ is seeded at $\operatorname{logit}$(observed response rate). An internal last-chance Broyden retry may be used if Newton fails to converge; this is not a user-facing mode. A minimal call sketch follows this list.
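A minimal call sketch, reusing el_F_iid_sketch() and el_family_sketch() from earlier (names and defaults are illustrative; the package wraps nleqslv with its own retries and diagnostics):

```r
library(nleqslv)

# Illustrative outer solve of F(theta) = 0 for the IID path.
solve_el_iid_sketch <- function(Z, X, mu_x, N_pop, family = "logit") {
  fam <- el_family_sketch(family)
  K <- ncol(Z); L <- ncol(X); n <- nrow(Z)
  theta0 <- c(rep(0, K), qlogis(n / N_pop), rep(0, L))  # beta = 0, z = logit(response rate)
  nleqslv(
    x      = theta0,
    fn     = el_F_iid_sketch,
    method = "Newton",
    global = "qline",
    xscalm = "auto",
    Z = Z, X = X, mu_x = mu_x, N_pop = N_pop, fam = fam
  )
}
```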

Practical Identifiability and Diagnostics

The EL system balances the parametric response-model score against calibration constraints. Identifiability can weaken in the following situations:

  • Weak or nearly collinear auxiliaries: if the $X_i - \mu_x$ have little variation or are nearly collinear with the response-score direction, the constraint block in $A = \partial F/\partial\theta$ becomes ill-conditioned.
  • Inconsistent auxiliary means: if the supplied $\mu_x$ are far from what the respondent sample can support (under the response model), the denominators $D_i$ cluster near 0 and $\kappa(A)$ inflates.
  • Heavy nonresponse or near-boundary $W$: when $W$ approaches 0 or 1, $\lambda_W = C/(1 - W)$ can spike and amplify sensitivity.

Diagnostics exposed by the implementation help assess these issues:

  • jacobian_condition_number ($\kappa(A)$), max_equation_residual, denominator summaries (min, lower quantiles, median), weight concentration (max share, top-5 share, ESS), and the trimming fraction.

Mitigations include standardizing predictors, trimming extreme weights (trim_cap), adding informative response-model predictors, and preferring bootstrap variance when diagnostics indicate fragility.

Survey Design Details

We extend QLS’s methodology to complex surveys in the following complementary ways:

  • Estimating equations with base weights: All sums include the base weight $a_i$; set $a_i$ to the survey design weight for respondents. The totals $N_{\text{pop}}$ and $n_{\text{resp\_weighted}} = \sum a_i$ are computed from the design weights and used throughout the design-weighted system.

  • Nonrespondent total $T_0$ in the linkage equation: In the survey-specific system we form $T_0 = N_{\text{pop}} - n_{\text{resp\_weighted}}$ and enforce the linkage between $W$ and $\lambda_W$ through the equation $T_0/(1 - W) - \lambda_W \sum d_i/D_i = 0$ rather than using the closed form $\lambda_W = ((N_{\text{pop}}/n_{\text{resp\_weighted}}) - 1)/(1 - W)$.

  • Bootstrap variance via replicate weights: For standard errors, we use bootstrap replicate-weight designs created with svrep::as_bootstrap_design. For each replicate, the estimator is re-fit on a reconstructed design using that replicate’s weights, and survey::svrVar is used to compute the variance of replicate estimates with appropriate scaling.

Weight scale note

The survey system is defined on an analysis scale through $N_{\text{pop}}$ and the design weights $d_i$. By default we set $N_{\text{pop}} = \sum_i d_i$ using weights(design). If the design weights have been rescaled (for example, to sum to the sample size for numerical reasons), you should supply n_total on the intended population-total scale so that $T_0 = N_{\text{pop}} - \sum_{i \in R} d_i$ is computed consistently with your analysis.

This matches the paper’s guidance to adapt the likelihood/estimating framework to stratification or unequal-probability sampling while relying on standard survey resampling for uncertainty. Analytic variance has not been implemented yet.

Degrees-of-freedom: For confidence intervals, we use survey degrees-of-freedom (t-quantiles) when a survey.design is supplied; otherwise, we use normal quantiles.
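A small sketch of that quantile choice (illustrative; this is not a package function):

```r
# Quantile used to form a two-sided confidence interval at the given level:
# a t quantile with survey degrees of freedom when a design is supplied,
# otherwise a normal quantile.
ci_quantile_sketch <- function(level, design = NULL) {
  alpha <- 1 - level
  if (!is.null(design)) {
    qt(1 - alpha / 2, df = survey::degf(design))
  } else {
    qnorm(1 - alpha / 2)
  }
}
```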

Scaling and Unscaling

Scaling (optional; standardize=TRUE)

  • Compute a nmar_scaling_recipe: for each column $j$ in $Z$ and $X$ (excluding the intercept), using (if present) the same base weights $a_i$ that enter the estimating equations:
    • $\text{mean}_j$, $\text{sd}_j$; if $\text{sd}_j \approx 0$, set $\text{sd}_j = 1$ to avoid blow-ups.
  • Transform:
    • $Z_{\text{scaled}}[, j] = (Z_{\text{un}}[, j] - \text{mean}_j)/\text{sd}_j$
    • $X_{\text{scaled}}[, j] = (X_{\text{un}}[, j] - \text{mean}_j)/\text{sd}_j$
    • $\mu_{x,\text{scaled}}[j] = (\mu_{x,\text{un}}[j] - \text{mean}_j)/\text{sd}_j$

Unscaling $\beta$ and vcov

  • Construct a linear map $D$ of size $K \times K$:
    • For columns $j \neq$ intercept: $D[j, j] = 1/\text{sd}_j$
    • For the intercept row, adjust to absorb the centering: $D[\text{intercept}, j] = -\text{mean}_j/\text{sd}_j$
  • Transform: $\beta_{\text{unscaled}} = D\, \beta_{\text{scaled}}$; if a covariance matrix is available, $\text{vcov}_{\text{unscaled}} = D\, \text{vcov}_{\text{scaled}}\, D^\top$. A small sketch of this map follows this list.
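A small sketch of the unscaling map for $\beta$ (illustrative helper; the package itself uses unscale_coefficients(), as noted below). It assumes the first coefficient is the intercept and that means and sds are ordered like the remaining columns.

```r
# Build the K x K map D and apply it to the scaled coefficients.
unscale_beta_sketch <- function(beta_scaled, means, sds) {
  K <- length(beta_scaled)
  D <- diag(K)                                # D[1, 1] = 1 for the intercept
  for (j in seq_len(K)[-1]) {
    D[j, j] <- 1 / sds[j - 1]                 # rescale slope j
    D[1, j] <- -means[j - 1] / sds[j - 1]     # intercept absorbs the centering
  }
  drop(D %*% beta_scaled)
}
```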

Code: centralized in src_dev/shared/scaling.R; engines call validate_and_apply_nmar_scaling() and unscale_coefficients(). For the EL engine, only β\beta is currently unscaled because no analytic coefficient covariance is computed.

Bootstrap Variance

  • IID:
    • Resample rows with replacement ($n$ out of $n$), re-run the estimator, and compute the variance of the bootstrap $\hat{Y}$ values; warn if many replicates fail; return the square root of that variance (see the sketch after this list).
  • Survey:
    • Convert to bootstrap replicate-weight design via svrep::as_bootstrap_design.
    • For each replicate, re-construct a temporary design and run estimator; use survey::svrVar to compute variance of replicate estimates (with scale/rscales).
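A minimal sketch of the IID bootstrap loop (illustrative; the packaged version adds failure handling and reporting, and the survey path uses replicate weights as described above):

```r
# IID bootstrap standard error for a point-estimator function fit_fun(),
# which is assumed to return the EL mean for a data.frame.
bootstrap_se_sketch <- function(data, fit_fun, B = 200) {
  n <- nrow(data)
  est <- replicate(B, {
    idx <- sample.int(n, n, replace = TRUE)
    tryCatch(fit_fun(data[idx, , drop = FALSE]), error = function(e) NA_real_)
  })
  if (mean(is.na(est)) > 0.1) warning("many bootstrap replicates failed")
  sqrt(var(est, na.rm = TRUE))
}
```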

Code mapping:

  • Engine: el_engine(..., family, standardize, trim_cap, variance_method, ...) in src_dev/engines/el/engine.R
  • Dispatch: run_engine.nmar_engine_el(...) in src_dev/engines/el/run_engine.R adapts the formula and forwards arguments to internal el() methods.
  • EL Core: el_estimator_core(...) in src_dev/engines/el/impl/core.R runs:
    • Construct $F(\theta)$ via el_build_equation_system() (src_dev/engines/el/impl/equations.R).
    • Solve $F(\theta) = 0$ via nleqslv (Newton with analytic Jacobian when available, Broyden fallback).
    • Build EL weights, mean, and diagnostics.
  • Jacobian: el_build_jacobian(...) in src_dev/engines/el/impl/jacobian.R returns the analytic $A$ whenever the family supplies d2mu.deta2 (logit, probit).
  • Variance: Bootstrap variance is implemented in src_dev/shared/bootstrap.R.
  • S3 result: src_dev/engines/el/s3.R defines EL-specific print and summary methods (print.nmar_result_el, summary.nmar_result_el). Generic methods such as tidy(), glance(), weights(), and coef() are defined for the parent nmar_result class in src_dev/S3/nmar_result_methods.R.

Practical Notes

  • Denominator guard: $D_i \ge \varepsilon$ (default $10^{-8}$) across all steps; diagnostics report extreme fractions.
  • Eta cap option: you can adjust the $\eta$ cap via options(nmar.eta_cap = 60) (default is 50) to suit your data scale and link.

Algorithm

We solve the full stacked system $F(\theta) = 0$ with Newton using the analytic Jacobian $A = \partial F/\partial\theta$ and globalization via nleqslv. Denominator positivity ($\min_i D_i \ge \varepsilon$), predictor standardization, and a capped $\eta$ ensure numerical stability. For the IID path the estimating equations are those of Qin, Leung, and Shao (2002), up to the small numeric guards on $\eta$, $w_i$, and $D_i$; for survey designs we use a consistent design-weighted analogue.

Input: Z (response design), X (auxiliary design), mu_x (population means),
       a (base weights), family (logit/probit), trim_cap, tolerances.
Initialize: beta = 0 in scaled space (or user-supplied start),
            z = logit(observed response rate), lambda_x = 0
            (and lambda_W = 0 for survey designs).
Repeat until convergence of F(theta) = 0:
  1) Compute eta = Z beta, w = linkinv(eta), W = plogis(z).
     - IID (data.frame): set lambda_W = ((N_pop/n_resp_weighted) - 1)/(1 - W).
     - survey.design: use the current lambda_W component of theta.
  2) Evaluate full stacked equations using guarded denominators
     D_i = 1 + lambda_W (w_i - W) + (X_i - mu_x)^T lambda_x.
  3) Compute analytic Jacobian A = dF/dtheta (if available; else numeric/Broyden).
  4) Newton step: solve A * step = -F with globalization; enforce min D_i >= eps.
  5) Update theta <- theta + step.
Return: p_i proportional to a_i / D_i and Y_hat = sum(p_i * Y_i) / sum(p_i).

References

  • Qin, J., Leung, D., and Shao, J. (2002). Estimation with survey data under nonignorable nonresponse or informative sampling. Journal of the American Statistical Association, 97(457), 193-200.
  • Chen, J., and Sitter, R. R. (1999). A pseudo empirical likelihood approach to the effective use of auxiliary information in complex surveys. Statistica Sinica, 9, 385-406.
  • Wu, C. (2005). Algorithms and R codes for the pseudo empirical likelihood method in survey sampling. Survey Methodology, 31(2), 239-243.