Title: | Approximate the Variance of the Horvitz-Thompson Total Estimator |
---|---|
Description: | Variance approximations for the Horvitz-Thompson total estimator in Unequal Probability Sampling using only first-order inclusion probabilities. See Matei and Tillé (2005) and Haziza, Mecatti and Rao (2008) for details. |
Authors: | Roberto Sichera [aut, cre] |
Maintainer: | Roberto Sichera <[email protected]> |
License: | GPL-3 |
Version: | 0.1.4 |
Built: | 2025-02-16 04:30:23 UTC |
Source: | https://github.com/rhobis/upsvarapprox |
The package provides function Var_approx for the approximation of the Horvitz-Thompson variance, and function approx_var_est for the computation of approximate variance estimators.
For both functions, different estimators are implemented; see their documentation for details.
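For context, the quantity whose variance both functions target is the Horvitz-Thompson estimator of a population total, which weights each sampled value by the inverse of its first-order inclusion probability. A minimal base-R sketch follows; `ht_total` is a hypothetical helper written for illustration and is not a function exported by the package.

```r
# Horvitz-Thompson estimator of a population total.
# `ht_total` is a hypothetical helper, not part of the package.
ht_total <- function(ys, piks) {
  stopifnot(length(ys) == length(piks), all(piks > 0), all(piks <= 1))
  sum(ys / piks)
}

# Toy sample of two units with inclusion probabilities 0.5 and 0.25
ht_total(ys = c(2, 4), piks = c(0.5, 0.25))
```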
Matei, A.; Tillé, Y., 2005. Evaluation of variance approximations and estimators in maximum entropy sampling with unequal probability and fixed sample size. Journal of Official Statistics 21 (4), 543-570.
Haziza, D.; Mecatti, F.; Rao, J.N.K. 2008. Evaluation of some approximate variance estimators under the Rao-Sampford unequal probability sampling design. Metron LXVI (1), 91-108.
Useful links:
Report bugs at https://github.com/rhobis/UPSvarApprox/issues
Approximated variance estimators which use only first-order inclusion probabilities
approx_var_est(y, pik, method, sample = NULL, ...)
y | numeric vector of sample observations |
pik | numeric vector of first-order inclusion probabilities, of length N (the population size) or n (the sample size), depending on the chosen method (see Details for more information) |
method | string indicating the desired approximate variance estimator. One of "Deville1", "Deville2", "Deville3", "Hajek", "Rosen", "FixedPoint", "Brewer1", "HartleyRao", "Berger", "Tille", "MateiTille1", "MateiTille2", "MateiTille3", "MateiTille4", "MateiTille5", "Brewer2", "Brewer3", "Brewer4". |
sample | Either a numeric vector of length equal to the sample size with the indices of sample units, or a boolean vector of the same length of |
... | two optional parameters can be modified to control the iterative procedures in methods |
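The two ways of identifying sampled units can be converted into one another with base R. The sketch below assumes, because the argument description does not state it in full, that the boolean form has one entry per population unit; all variable names are illustrative.

```r
# Two equivalent ways to flag the sampled units (illustrative names).
N <- 10                  # population size
s_idx <- c(2, 5, 9)      # numeric form: indices of the sampled units

# Boolean form. ASSUMPTION: one entry per population unit.
s_lgl <- seq_len(N) %in% s_idx

# Converting back recovers the indices in increasing order
which(s_lgl)
```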
The estimator is chosen through the argument method; the list of methods and their respective equations is presented below.
Matei and Tillé (2005) divide the approximated variance estimators into three classes, depending on the quantities they require:
1. First and second-order inclusion probabilities: the first class is composed of the Horvitz-Thompson estimator (Horvitz and Thompson 1952) and the Sen-Yates-Grundy estimator (Yates and Grundy 1953; Sen 1953), which are available through function varHT in package sampling;
2. Only first-order inclusion probabilities and only for sample units;
3. Only first-order inclusion probabilities, for the entire population.
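Haziza, Mecatti and Rao (2008) provide a common form to express most of the estimators in classes 2 and 3:
$$\widehat{var}(\hat{t}_{HT}) = \sum_{i \in s} c_i e_i^2$$
where $e_i = \frac{y_i}{\pi_i} - \hat{B}$, with
$$\hat{B} = \frac{\sum_{i \in s} a_i (y_i/\pi_i)}{\sum_{i \in s} a_i}$$
and $a_i$ and $c_i$ are parameters that define the different estimators: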
method="Hajek" [Class 2]
method="Deville2" [Class 2]
method="Deville3" [Class 2]
method="Rosen" [Class 2]
method="Brewer1" [Class 2]
method="Brewer2" [Class 3]
method="Brewer3" [Class 3]
method="Brewer4" [Class 3]
method="Berger" [Class 3]
method="HartleyRao" [Class 3]
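To make the common form concrete, here is a base-R sketch. It assumes the form is a weighted sum of squared residuals, sum(c_i * e_i^2), where e_i = y_i/pi_i - B and B is the a-weighted mean of y_i/pi_i over the sample. It also assumes the Hájek-type choice c_i = a_i = n/(n - 1) * (1 - pi_i). Both assumptions should be checked against Haziza, Mecatti and Rao (2008). The function names are hypothetical and are not part of the package.

```r
# Common form (ASSUMED, see the text above): sum over the sample of c_i * e_i^2.
common_form_var <- function(ys, piks, a_i, c_i) {
  stopifnot(length(ys) == length(piks),
            length(a_i) == length(ys), length(c_i) == length(ys))
  z <- ys / piks                   # expanded values y_i / pi_i
  B <- sum(a_i * z) / sum(a_i)     # a-weighted mean of the expanded values
  e <- z - B                       # residuals
  sum(c_i * e^2)
}

# Hajek-type parameters (ASSUMPTION): c_i = a_i = n/(n-1) * (1 - pi_i)
hajek_params <- function(piks) {
  n <- length(piks)
  n / (n - 1) * (1 - piks)
}

ys   <- c(1, 2, 3)
piks <- rep(0.5, 3)
w    <- hajek_params(piks)
common_form_var(ys, piks, a_i = w, c_i = w)
```

Keeping a_i and c_i as separate arguments mirrors the idea that each estimator in the list differs only in how these two sets of parameters are chosen.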
Some additional estimators are defined in Matei and Tillé (2005):
method="Deville1" [Class 2]
where
and
method="Tille" [Class 3]
where ,
and
The coefficients are computed iteratively through the
following procedure:
with
method="MateiTille1" [Class 3]
where
and the coefficients are computed iteratively by the algorithm:
A necessary condition for convergence is checked and, if it is not satisfied, the function returns an alternative solution that uses only one iteration:
method="MateiTille2" [Class 3]
where
method="MateiTille3" [Class 3]
where is defined as in method="MateiTille2".
method="MateiTille4" [Class 3]
where
and
method="MateiTille5" [Class 3]
This estimator is defined as in method="MateiTille4", and the values are defined as in method="MateiTille1".
a scalar, the estimated variance
    library(UPSvarApprox)

    ### Generate population data ---
    N <- 500; n <- 50

    set.seed(0)
    x <- rgamma(N, scale=10, shape=5)
    y <- abs( 2*x + 3.7*sqrt(x) * rnorm(N) )

    pik <- n * x/sum(x)
    s <- sample(N, n)

    ys <- y[s]
    piks <- pik[s]

    ### Estimators of class 2 ---
    approx_var_est(ys, piks, method="Deville1")
    approx_var_est(ys, piks, method="Deville2")
    approx_var_est(ys, piks, method="Deville3")
    approx_var_est(ys, piks, method="Hajek")
    approx_var_est(ys, piks, method="Rosen")
    approx_var_est(ys, piks, method="FixedPoint")
    approx_var_est(ys, piks, method="Brewer1")

    ### Estimators of class 3 ---
    approx_var_est(ys, pik, method="HartleyRao", sample=s)
    approx_var_est(ys, pik, method="Berger", sample=s)
    approx_var_est(ys, pik, method="Tille", sample=s)
    approx_var_est(ys, pik, method="MateiTille1", sample=s)
    approx_var_est(ys, pik, method="MateiTille2", sample=s)
    approx_var_est(ys, pik, method="MateiTille3", sample=s)
    approx_var_est(ys, pik, method="MateiTille4", sample=s)
    approx_var_est(ys, pik, method="MateiTille5", sample=s)
    approx_var_est(ys, pik, method="Brewer2", sample=s)
    approx_var_est(ys, pik, method="Brewer3", sample=s)
    approx_var_est(ys, pik, method="Brewer4", sample=s)
Approximations of the Horvitz-Thompson variance for High-Entropy sampling designs. Such methods use only first-order inclusion probabilities.
Var_approx(y, pik, n, method, ...)
y | numeric vector containing the values of the variable of interest for all population units |
pik | numeric vector of first-order inclusion probabilities, of length equal to population size |
n | a scalar indicating the sample size |
method | string indicating the approximation that should be used. One of "Hajek1", "Hajek2", "HartleyRao1", "HartleyRao2", "FixedPoint". |
... | two optional parameters can be modified to control the iterative procedure in |
The variance approximations available in this function are described below; the notation used is that of Matei and Tillé (2005).
Hájek variance approximation (method="Hajek1"):
where
and
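As an illustration of a Hájek-type approximation, the base-R sketch below assumes the form sum over the population of b_i / pi_i^2 * (y_i - y_i*)^2, with y_i* = pi_i * sum(b_j * y_j / pi_j) / sum(b_j) and b_i = pi_i * (1 - pi_i) * N / (N - 1). These expressions are an assumption based on the notation of Matei and Tillé (2005) and should be checked against the paper. `hajek1_sketch` is a hypothetical name; it is not the package's Var_approx.

```r
# Hajek-type approximation of the Horvitz-Thompson variance (sketch).
# ASSUMED form: sum_U b_i / pi_i^2 * (y_i - y_star_i)^2, with
#   y_star_i = pi_i * sum(b * y / pi) / sum(b)
#   b_i      = pi_i * (1 - pi_i) * N / (N - 1)
hajek1_sketch <- function(y, pik) {
  stopifnot(length(y) == length(pik), all(pik > 0), all(pik < 1))
  N <- length(pik)
  b <- pik * (1 - pik) * N / (N - 1)
  y_star <- pik * sum(b * y / pik) / sum(b)
  sum(b / pik^2 * (y - y_star)^2)
}

# With equal probabilities n/N the sketch reduces to the variance of the
# expanded total under simple random sampling without replacement,
# N^2 * (1 - n/N) * S^2 / n.
y <- c(1, 2, 3, 4); N <- 4; n <- 2
approx_v <- hajek1_sketch(y, rep(n / N, N))
srs_v    <- N^2 * (1 - n / N) * var(y) / n
all.equal(approx_v, srs_v)
```

The equal-probability check is a convenient sanity test: any approximation of this kind should agree with the known simple random sampling result in that special case.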
Starting from Hájek (1964), Brewer (2002) defined the following estimator (method="Hajek2"):
where
and
Hartley and Rao (1962) variance approximation (method="HartleyRao1"):
Hartley and Rao (1962) provide a simplified version of the variance above (method="HartleyRao2"):
method="FixedPoint" computes the Fixed-Point variance approximation proposed by Deville and Tillé (2005).
The variance can be expressed in the same form as in method="Hajek1", and the coefficients are computed iteratively by the algorithm:
A necessary condition for convergence is checked and, if it is not satisfied, the function returns an alternative solution that uses only one iteration:
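The iterative computation of the coefficients can be sketched in base R as follows. The recursion b_new = b^2 / sum(b) + pi * (1 - pi), the starting value pi * (1 - pi) * N / (N - 1), the convergence condition and the one-iteration fallback are all assumptions based on the fixed-point equation of Deville and Tillé (2005) as reported by Matei and Tillé (2005); they are not taken from the package source. The function name and the control arguments `max_iter` and `tol` are hypothetical.

```r
# Fixed-point coefficients b_i (sketch; names and defaults are hypothetical).
# ASSUMED recursion: b_new = b^2 / sum(b) + pik * (1 - pik),
# started from b0 = pik * (1 - pik) * N / (N - 1).
fixed_point_b <- function(pik, max_iter = 1000, tol = 1e-10) {
  N  <- length(pik)
  p  <- pik * (1 - pik)
  b0 <- p * N / (N - 1)
  # ASSUMED necessary condition for convergence: p_i / sum(p) < 1/2 for all i
  if (any(p / sum(p) >= 0.5)) {
    # One-iteration fallback: apply the recursion once to the starting value
    return(b0^2 / sum(b0) + p)
  }
  b <- b0
  for (iter in seq_len(max_iter)) {
    b_new <- b^2 / sum(b) + p
    converged <- max(abs(b_new - b)) < tol
    b <- b_new
    if (converged) break
  }
  b
}

pik <- c(0.2, 0.3, 0.5, 0.4, 0.6)
b <- fixed_point_b(pik)
# At the fixed point, b - b^2 / sum(b) should equal pik * (1 - pik)
max(abs(b - b^2 / sum(b) - pik * (1 - pik)))
```

Under these assumptions, the resulting coefficients would then take the place of the closed-form coefficients in the Hajek1-type expression.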
a scalar, the approximated variance.
    library(UPSvarApprox)

    N <- 500; n <- 50

    set.seed(0)
    x <- rgamma(n=N, scale=10, shape=5)
    y <- abs( 2*x + 3.7*sqrt(x) * rnorm(N) )

    pik <- n * x/sum(x)
    pikl <- outer(pik, pik, '*'); diag(pikl) <- pik

    ### Variance approximations ---
    Var_approx(y, pik, n, method = "Hajek1")
    Var_approx(y, pik, n, method = "Hajek2")
    Var_approx(y, pik, n, method = "HartleyRao1")
    Var_approx(y, pik, n, method = "HartleyRao2")
    Var_approx(y, pik, n, method = "FixedPoint")