Skip to contents

Computes the estimated total unauthorized population \(\hat{\xi} = \sum_{i} N_i^{\hat{\alpha}}\) for each group defined by the covariates in alpha. Includes bias correction via second-order Taylor expansion and confidence intervals via monotone transformation of the Wald interval on the link scale.

Usage

popsize(object, ...)

# S3 method for class 'uncounted'
popsize(
  object,
  by = NULL,
  level = 0.95,
  bias_correction = TRUE,
  total = FALSE,
  ...
)

Arguments

object

An "uncounted" object.

...

Additional arguments (ignored).

by

Optional formula specifying grouping variables for stratified population size estimation (e.g., ~ year, ~ country, ~ year + sex). The variables must exist in the data used to fit the model. When NULL (default), groups are defined by the alpha covariate pattern from cov_alpha. When provided, the same fitted alpha values are used but summed over the by-groups instead. This is analogous to singleRcapture::stratifyPopsize().

level

Confidence level for intervals (default 0.95).

bias_correction

Logical; apply Taylor-expansion bias correction? Default TRUE. Uses model-based variance (not HC-robust) to avoid overcorrection from inflated leverage-driven standard errors.

total

Logical; if TRUE and multiple groups exist, compute a delta-method total with SE and CI, stored in attr(result, "total"). Default FALSE. Warning: for panel data where groups are defined by time periods (e.g., by = ~ year), the total sums population estimates across years. This is generally not meaningful because the same individuals may appear in multiple years. The total is only appropriate when groups represent non-overlapping subpopulations (e.g., by = ~ sex within a single year).

Value

A data frame with columns:

group

Group label derived from alpha covariates, or "(all)" when no alpha covariates are specified.

estimate

Plug-in estimate \(\hat{\xi}_g = \sum N_i^{\hat{\alpha}_g}\).

estimate_bc

Bias-corrected estimate \(\hat{\xi}^{BC}_g\); NA when bias_correction = FALSE.

lower

Lower bound of the (bias-corrected) confidence interval.

upper

Upper bound of the (bias-corrected) confidence interval.

share_pct

Group share as percentage of total \(\hat{\xi}\).

When multiple groups exist, attr(result, "total") contains the total estimate, bias-corrected estimate, standard error, and CI.

Details

Point estimate. For a group \(g\) with reference populations \(N_1, \ldots, N_n\) and estimated exponent \(\hat{\alpha}_g\), the plug-in population size is $$\hat{\xi}_g = \sum_{i=1}^{n} N_i^{\hat{\alpha}_g}.$$

Bias correction via Taylor expansion. Because \(\xi\) is a nonlinear function of \(\hat{\alpha}\), the plug-in estimate is biased upward (by Jensen's inequality, \(E[N^{\hat{\alpha}}] \geq N^{E[\hat{\alpha}]}\) when \(N > 1\)). A second-order Taylor expansion of \(h(\alpha) = \sum_i N_i^{\alpha}\) around \(\alpha_0\) gives the approximate bias $$ \mathrm{Bias}(\hat{\xi}_g) \approx \frac{1}{2} \sum_{i=1}^{n} N_i^{\alpha_g} (\log N_i)^2 \; \mathbf{x}_g' \mathbf{V} \mathbf{x}_g, $$ where \(\mathbf{x}_g\) is the design vector for group \(g\) and \(\mathbf{V}\) is the variance-covariance matrix of \(\hat{\alpha}\). For unconstrained models, the exact multiplicative correction is used: $$ \hat{\xi}^{BC}_g = \sum_{i=1}^{n} N_i^{\hat{\alpha}_g} \exp\!\left(-\frac{1}{2} (\log N_i)^2 \mathbf{x}_g' \mathbf{V} \mathbf{x}_g\right), $$ which is exact under normality of \(\hat{\alpha}\) and always positive. For constrained models the subtractive Taylor correction is used instead (the logistic-normal integral has no closed form), and the bias can be positive or negative depending on \(\alpha\). Model-based variance is used for bias correction rather than HC-robust variance. For iOLS/GPML, the model-based variance is \((\mathbf{Z}'\mathbf{Z})^{-1}\) (Gamma Fisher information, no dispersion scaling).

When constrained = TRUE, the delta method accounts for the logit link: \(\mathrm{Var}(\alpha) = \mathrm{Var}(\eta) \cdot [\sigma'(\eta)]^2\) where \(\sigma'(\eta) = \alpha(1-\alpha)\).

Confidence intervals via monotone transformation. A Wald interval is first constructed on the link scale for the linear predictor: $$\hat{\eta}_g \pm z_{\alpha/2} \cdot \mathrm{se}(\hat{\eta}_g),$$ where the standard error uses the HC-robust variance. The interval endpoints are then mapped through the monotone transformation \(g(\alpha) = \sum_i N_i^{\alpha}\) (increasing for \(N_i \geq 1\)) to obtain \([\hat{\xi}_L, \hat{\xi}_U]\). When constrained = TRUE, the logit link is applied before exponentiation. Bias correction is also applied to the CI bounds at their respective alpha values.

Total across groups. When multiple groups exist, the total \(\hat{\xi} = \sum_g \hat{\xi}_g\) has its own delta-method CI computed via the gradient \(\nabla_\alpha \xi\) and a log-normal approximation for positivity. This is stored in attr(result, "total").

Examples

# Simulate synthetic data for 5 countries, 3 years each
set.seed(42)
n_obs <- 15
sim_data <- data.frame(
  country = rep(paste0("C", 1:5), each = 3),
  year    = rep(2018:2020, 5),
  N       = rpois(n_obs, lambda = 500000),
  n       = rpois(n_obs, lambda = 1000),
  m       = rpois(n_obs, lambda = 50)
)

# Fit a Poisson model
fit <- estimate_hidden_pop(
  data = sim_data, observed = ~m, auxiliary = ~n,
  reference_pop = ~N, method = "poisson"
)
#> Warning: Some alpha values < 0 (min = -0.467). Consider using constrained = TRUE.

# Population size with bias correction and 95% CI
popsize(fit)
#>   group observed   estimate estimate_bc lower upper share_pct
#> 1 (all)      752 0.03262174           0     0     0       100

# Without bias correction
popsize(fit, bias_correction = FALSE)
#>   group observed   estimate estimate_bc         lower         upper share_pct
#> 1 (all)      752 0.03262174          NA 3.458853e-219 3.076679e+215       100

# 90% confidence interval
popsize(fit, level = 0.90)
#>   group observed   estimate estimate_bc lower upper share_pct
#> 1 (all)      752 0.03262174           0     0     0       100