| Title: | Approximate the Variance of the Horvitz-Thompson Total Estimator |
|---|---|
| Description: | Variance approximations for the Horvitz-Thompson total estimator in Unequal Probability Sampling using only first-order inclusion probabilities. See Matei and Tillé (2005) and Haziza, Mecatti and Rao (2008) for details. |
| Authors: | Roberto Sichera [aut, cre] |
| Maintainer: | Roberto Sichera <[email protected]> |
| License: | GPL-3 |
| Version: | 0.1.4 |
| Built: | 2026-05-20 07:47:32 UTC |
| Source: | https://github.com/rhobis/upsvarapprox |
The package provides the function Var_approx, which approximates the
Horvitz-Thompson variance, and the function approx_var_est, which computes
approximate variance estimators.
Both functions implement several methods;
see their documentation for details.
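For orientation, the quantity whose variance both functions target is the Horvitz-Thompson estimator of a population total, which weights each sampled value by the inverse of its first-order inclusion probability. The base-R sketch below is illustrative only: `ht_total` is a hypothetical helper, not a function exported by the package, and the toy data are made up.

```r
# Horvitz-Thompson estimator of a population total (hypothetical helper,
# not part of the package): sum of y_i / pi_i over the sampled units.
ht_total <- function(ys, piks) {
  stopifnot(length(ys) == length(piks), all(piks > 0), all(piks <= 1))
  sum(ys / piks)
}

# Toy population (made up for illustration): N = 6 units, sample size n = 3,
# inclusion probabilities proportional to an auxiliary size variable x.
x   <- c(10, 20, 30, 40, 50, 60)
y   <- c(12, 18, 33, 41, 47, 65)
n   <- 3
pik <- n * x / sum(x)   # sums to n; every value is below 1 for these data
s   <- c(2, 4, 6)       # indices of the sampled units, chosen by hand

t_hat <- ht_total(y[s], pik[s])
t_hat                   # estimate of the population total sum(y)
```

The exact variance of this estimator depends on second-order (joint) inclusion probabilities, which is why approximations based on first-order probabilities alone are useful.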
Matei, A.; Tillé, Y., 2005. Evaluation of variance approximations and estimators in maximum entropy sampling with unequal probability and fixed sample size. Journal of Official Statistics 21 (4), 543-570.
Haziza, D.; Mecatti, F.; Rao, J.N.K. 2008. Evaluation of some approximate variance estimators under the Rao-Sampford unequal probability sampling design. Metron LXVI (1), 91-108.
Useful links:
Report bugs at https://github.com/rhobis/UPSvarApprox/issues
Approximated variance estimators which use only first-order inclusion probabilities
approx_var_est(y, pik, method, sample = NULL, ...)
| y | numeric vector of sample observations |
|---|---|
| pik | numeric vector of first-order inclusion probabilities of length N, the population size, or n, the sample size, depending on the chosen method (see Details for more information) |
| method | string indicating the desired approximate variance estimator. One of "Deville1", "Deville2", "Deville3", "Hajek", "Rosen", "FixedPoint", "Brewer1", "HartleyRao", "Berger", "Tille", "MateiTille1", "MateiTille2", "MateiTille3", "MateiTille4", "MateiTille5", "Brewer2", "Brewer3", "Brewer4". |
| sample | Either a numeric vector of length equal to the sample size with the indices of sample units, or a boolean vector of the same length of |
| ... | two optional parameters can be modified to control the iterative procedures in methods |
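The sample argument can be supplied either as unit indices or as a boolean vector. The sketch below shows how the two representations relate in base R. It assumes the boolean vector has one entry per population unit, which the description above does not state in full; all object names are illustrative.

```r
# Two equivalent ways to describe the same sample of n = 3 units out of N = 10.
# Assumption: the logical form has one entry per population unit.
N <- 10
s_index   <- c(2, 5, 9)                  # sample as unit indices
s_logical <- seq_len(N) %in% s_index     # sample as a length-N logical vector

# Either form selects the same inclusion probabilities from a population vector.
pik <- 3 * (1:N) / sum(1:N)              # toy probabilities summing to n = 3
pik[s_index]
pik[s_logical]

# Converting back from the logical form to indices:
which(s_logical)
```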
The estimator is selected through the argument method;
the available methods and their respective equations are listed below.
Matei and Tillé (2005) divide the approximate variance estimators into three classes, depending on the quantities they require:
1. First and second-order inclusion probabilities.
This class is composed of the Horvitz-Thompson estimator (Horvitz and Thompson 1952)
and the Sen-Yates-Grundy estimator (Yates and Grundy 1953; Sen 1953),
which are available through function varHT in package sampling;
2. Only first-order inclusion probabilities and only for sample units;
3. Only first-order inclusion probabilities, for the entire population.
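To make concrete what the first class requires, here is a base-R sketch of the Sen-Yates-Grundy estimator, which needs the joint inclusion probabilities of every pair of sampled units. It is an illustration only: `syg_var_est` is a hypothetical helper and does not reproduce varHT from package sampling.

```r
# Sen-Yates-Grundy variance estimator for a fixed-size design (sketch):
#   0.5 * sum_{i != j in s} (pi_i pi_j - pi_ij) / pi_ij * (y_i/pi_i - y_j/pi_j)^2
# ys:     sample values
# piks:   first-order inclusion probabilities of the sampled units
# pikl_s: n x n matrix of joint inclusion probabilities of the sampled units
syg_var_est <- function(ys, piks, pikl_s) {
  n <- length(ys)
  stopifnot(length(piks) == n, all(dim(pikl_s) == c(n, n)))
  z <- ys / piks
  d <- outer(z, z, "-")^2                      # squared pairwise differences
  w <- (outer(piks, piks) - pikl_s) / pikl_s   # pairwise weights
  diag(w) <- 0                                 # drop the i == j terms
  0.5 * sum(w * d)
}

# Check on simple random sampling without replacement, where the joint
# probabilities are known in closed form: n(n-1) / (N(N-1)).
N <- 10; n <- 4
ys     <- c(3, 7, 1, 9)
piks   <- rep(n / N, n)
pikl_s <- matrix(n * (n - 1) / (N * (N - 1)), n, n)
diag(pikl_s) <- piks

v_syg <- syg_var_est(ys, piks, pikl_s)
# Under this design the result equals N^2 * (1 - n/N) * var(ys) / n.
v_syg
```

Estimators in classes 2 and 3 avoid the matrix of joint probabilities altogether, which is often unavailable or expensive to compute.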
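Haziza, Mecatti and Rao (2008) provide a common form to express most of the estimators in classes 2 and 3:
$$\widetilde{var}(\hat{t}_{HT}) = \sum_{i \in s} c_i e_i^2$$
where $e_i = \frac{y_i}{\pi_i} - \hat{B}$, with
$$\hat{B} = \frac{\sum_{i \in s} a_i (y_i / \pi_i)}{\sum_{i \in s} a_i}$$
and $a_i$ and $c_i$ are parameters that define the different
estimators: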
method="Hajek" [Class 2]
method="Deville2" [Class 2]
method="Deville3" [Class 2]
method="Rosen" [Class 2]
method="Brewer1" [Class 2]
method="Brewer2" [Class 3]
method="Brewer3" [Class 3]
method="Brewer4" [Class 3]
method="Berger" [Class 3]
method="HartleyRao" [Class 3]
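A base-R sketch of how an estimator of this family can be evaluated is given below. It assumes the common form is the weighted sum of squared residuals sum(c_i * e_i^2), with e_i = y_i/pi_i - B and B a weighted mean of the y_i/pi_i with weights a_i, and it uses Hájek's well-known choice c_i = a_i = n/(n-1) * (1 - pi_i) as the worked case. The function names are hypothetical and the code is written from the literature, not taken from the package source, so results from approx_var_est may differ in detail.

```r
# Generic evaluator for estimators of the form sum_i c_i * e_i^2 (sketch).
common_form_var_est <- function(ys, piks, c_i, a_i) {
  z     <- ys / piks                  # expanded values y_i / pi_i
  B_hat <- sum(a_i * z) / sum(a_i)    # weighted mean of the expanded values
  e     <- z - B_hat                  # residuals
  sum(c_i * e^2)
}

# Hajek-type estimator (class 2: only sample-level first-order probabilities).
hajek_var_est <- function(ys, piks) {
  n   <- length(ys)
  c_i <- n / (n - 1) * (1 - piks)
  common_form_var_est(ys, piks, c_i = c_i, a_i = c_i)
}

# Toy check with equal probabilities (n = 4 out of N = 10).
N <- 10; n <- 4
ys   <- c(3, 7, 1, 9)
piks <- rep(n / N, n)
v_hajek <- hajek_var_est(ys, piks)
# With equal probabilities this reduces to N^2 * (1 - n/N) * var(ys) / n.
v_hajek
```

Other members of the family would plug different c_i and a_i vectors into the same evaluator; class 3 choices additionally need quantities computed over the whole population.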
Some additional estimators are defined in Matei and Tillé (2005):
method="Deville1" [Class 2]
where
and
method="Tille" [Class 3]
where ,
and
The coefficients are computed iteratively through the
following procedure:
with
method="MateiTille1" [Class 3]
where
and the coefficients are computed iteratively by the algorithm:
A necessary condition for convergence is checked; if it is not satisfied, the function returns an alternative solution that uses only one iteration:
method="MateiTille2" [Class 3]
where
method="MateiTille3" [Class 3]
where is defined as in method="MateiTille2".
method="MateiTille4" [Class 3]
where
and
method="MateiTille5" [Class 3]
This estimator is defined as in method="MateiTille4", and the
values are defined as in method="MateiTille1"
a scalar, the estimated variance
Matei, A.; Tillé, Y., 2005. Evaluation of variance approximations and estimators in maximum entropy sampling with unequal probability and fixed sample size. Journal of Official Statistics 21 (4), 543-570.
Haziza, D.; Mecatti, F.; Rao, J.N.K. 2008. Evaluation of some approximate variance estimators under the Rao-Sampford unequal probability sampling design. Metron LXVI (1), 91-108.
    library(UPSvarApprox)

    ### Generate population data ---
    N <- 500; n <- 50

    set.seed(0)
    x <- rgamma(N, scale=10, shape=5)
    y <- abs( 2*x + 3.7*sqrt(x) * rnorm(N) )

    pik <- n * x/sum(x)     # first-order inclusion probabilities, whole population
    s   <- sample(N, n)     # indices of the sampled units

    ys   <- y[s]
    piks <- pik[s]

    ### Estimators of class 2 ---
    approx_var_est(ys, piks, method="Deville1")
    approx_var_est(ys, piks, method="Deville2")
    approx_var_est(ys, piks, method="Deville3")
    approx_var_est(ys, piks, method="Hajek")
    approx_var_est(ys, piks, method="Rosen")
    approx_var_est(ys, piks, method="FixedPoint")
    approx_var_est(ys, piks, method="Brewer1")

    ### Estimators of class 3 ---
    approx_var_est(ys, pik, method="HartleyRao", sample=s)
    approx_var_est(ys, pik, method="Berger", sample=s)
    approx_var_est(ys, pik, method="Tille", sample=s)
    approx_var_est(ys, pik, method="MateiTille1", sample=s)
    approx_var_est(ys, pik, method="MateiTille2", sample=s)
    approx_var_est(ys, pik, method="MateiTille3", sample=s)
    approx_var_est(ys, pik, method="MateiTille4", sample=s)
    approx_var_est(ys, pik, method="MateiTille5", sample=s)
    approx_var_est(ys, pik, method="Brewer2", sample=s)
    approx_var_est(ys, pik, method="Brewer3", sample=s)
    approx_var_est(ys, pik, method="Brewer4", sample=s)
Approximations of the Horvitz-Thompson variance for High-Entropy sampling designs. Such methods use only first-order inclusion probabilities.
Var_approx(y, pik, n, method, ...)
| y | numeric vector containing the values of the variable of interest for all population units |
|---|---|
| pik | numeric vector of first-order inclusion probabilities, of length equal to population size |
| n | a scalar indicating the sample size |
| method | string indicating the approximation that should be used. One of "Hajek1", "Hajek2", "HartleyRao1", "HartleyRao2", "FixedPoint". |
| ... | two optional parameters can be modified to control the iterative procedure in |
The variance approximations available in this function are described below; the notation follows Matei and Tillé (2005).
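Hájek variance approximation (method="Hajek1"):
$$\widetilde{Var}(\hat{t}_{HT}) = \sum_{i \in U} \frac{b_i}{\pi_i^2} (y_i - y_i^*)^2$$
where
$$y_i^* = \pi_i \frac{\sum_{j \in U} b_j y_j / \pi_j}{\sum_{j \in U} b_j}$$
and
$$b_i = \frac{N \pi_i (1 - \pi_i)}{N - 1}$$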
Starting from Hájek (1964), Brewer (2002) defined the following estimator
(method="Hajek2"):
where
and
Hartley and Rao (1962) variance approximation (method="HartleyRao1"):
Hartley and Rao (1962) provide a simplified version of the
variance above (method="HartleyRao2"):
method="FixedPoint" computes the Fixed-Point variance approximation
proposed by Deville and Tillé (2005).
The variance can be expressed in the same form as in method="Hajek1",
and the coefficients are computed iteratively by the algorithm:
A necessary condition for convergence is checked; if it is not satisfied, the function returns an alternative solution that uses only one iteration:
a scalar, the approximated variance.
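The Hájek-type form and the fixed-point coefficients described above can be sketched in base R as follows. Several things here are assumptions rather than facts taken from this page: the variance form sum(b_i / pi_i^2 * (y_i - y_i*)^2), the Hájek coefficients b_i = N * pi_i * (1 - pi_i) / (N - 1), the recursion b <- b^2 / sum(b) + pi * (1 - pi), and the necessary condition pi_i * (1 - pi_i) / sum(pi * (1 - pi)) < 1/2 are written as recalled from the literature cited above. The function names and the controls `max_iter` and `tol` are hypothetical; the package's own control parameters are not named in the text above, and Var_approx may differ in detail.

```r
# Variance approximation of Hajek type for given coefficients b (sketch).
hajek_form_var <- function(y, pik, b) {
  y_star <- pik * sum(b * y / pik) / sum(b)
  sum(b / pik^2 * (y - y_star)^2)
}

# Hajek-type coefficients (assumed form, see lead-in).
b_hajek1 <- function(pik) {
  N <- length(pik)
  pik * (1 - pik) * N / (N - 1)
}

# Fixed-point coefficients: iterate b <- b^2 / sum(b) + pik * (1 - pik).
# max_iter and tol are hypothetical control names.
b_fixed_point <- function(pik, max_iter = 1000, tol = 1e-8) {
  d <- pik * (1 - pik)
  b <- b_hajek1(pik)                   # starting values
  if (any(d / sum(d) >= 0.5)) {
    # Necessary condition violated: return the one-iteration solution.
    return(b^2 / sum(b) + d)
  }
  for (i in seq_len(max_iter)) {
    b_new <- b^2 / sum(b) + d
    if (max(abs(b_new - b)) < tol) return(b_new)
    b <- b_new
  }
  warning("fixed-point iteration did not converge within max_iter")
  b
}

# Toy population: N = 20 units, probabilities summing to n = 5.
pik <- rep(c(0.1, 0.2, 0.3, 0.4), 5)
y   <- 10 * pik + rep(c(1, -1), 10)    # made-up values of the study variable

b_fp <- b_fixed_point(pik)
hajek_form_var(y, pik, b_hajek1(pik))  # Hajek-type approximation
hajek_form_var(y, pik, b_fp)           # fixed-point approximation
```

Separating the coefficient computation from the variance form mirrors the statement above that the fixed-point approximation shares the form used by method="Hajek1" and differs only in its coefficients.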
Matei, A.; Tillé, Y., 2005. Evaluation of variance approximations and estimators in maximum entropy sampling with unequal probability and fixed sample size. Journal of Official Statistics 21 (4), 543-570.
    library(UPSvarApprox)

    N <- 500; n <- 50

    set.seed(0)
    x <- rgamma(n=N, scale=10, shape=5)
    y <- abs( 2*x + 3.7*sqrt(x) * rnorm(N) )

    pik  <- n * x/sum(x)
    # products pik_i * pik_j with pik on the diagonal (not used in the calls below)
    pikl <- outer(pik, pik, '*'); diag(pikl) <- pik

    ### Variance approximations ---
    Var_approx(y, pik, n, method = "Hajek1")
    Var_approx(y, pik, n, method = "Hajek2")
    Var_approx(y, pik, n, method = "HartleyRao1")
    Var_approx(y, pik, n, method = "HartleyRao2")
    Var_approx(y, pik, n, method = "FixedPoint")