Package 'UPSvarApprox'

Title: Approximate the Variance of the Horvitz-Thompson Total Estimator
Description: Variance approximations for the Horvitz-Thompson total estimator in Unequal Probability Sampling using only first-order inclusion probabilities. See Matei and Tillé (2005) and Haziza, Mecatti and Rao (2008) for details.
Authors: Roberto Sichera [aut, cre]
Maintainer: Roberto Sichera <[email protected]>
License: GPL-3
Version: 0.1.4
Built: 2024-08-28 02:38:00 UTC
Source: https://github.com/rhobis/upsvarapprox

Help Index


UPSvarApprox: Approximate the variance of the Horvitz-Thompson estimator

Description

Variance approximations for the Horvitz-Thompson total estimator in Unequal Probability Sampling using only first-order inclusion probabilities. See Matei and Tillé (2005) and Haziza, Mecatti and Rao (2008) for details.

Variance approximation

The package provides the function Var_approx for the approximation of the Horvitz-Thompson variance, and the function approx_var_est for the computation of approximate variance estimators. Both functions implement several different estimators; see their documentation for details.

Author(s)

Maintainer: Roberto Sichera [email protected]

References

Matei, A.; Tillé, Y., 2005. Evaluation of variance approximations and estimators in maximum entropy sampling with unequal probability and fixed sample size. Journal of Official Statistics 21 (4), 543-570.

Haziza, D.; Mecatti, F.; Rao, J.N.K. 2008. Evaluation of some approximate variance estimators under the Rao-Sampford unequal probability sampling design. Metron LXVI (1), 91-108.

See Also

Useful links:

  • https://github.com/rhobis/upsvarapprox

Approximated Variance Estimators

Description

Approximated variance estimators which use only first-order inclusion probabilities

Usage

approx_var_est(y, pik, method, sample = NULL, ...)

Arguments

y

numeric vector of sample observations

pik

numeric vector of first-order inclusion probabilities, of length N (the population size) or n (the sample size), depending on the chosen method (see Details for more information)

method

string indicating the desired approximate variance estimator. One of "Deville1", "Deville2", "Deville3", "Hajek", "Rosen", "FixedPoint", "Brewer1", "HartleyRao", "Berger", "Tille", "MateiTille1", "MateiTille2", "MateiTille3", "MateiTille4", "MateiTille5", "Brewer2", "Brewer3", "Brewer4".

sample

Either a numeric vector of length equal to the sample size containing the indices of sample units, or a boolean vector of the same length as pik, indicating which units belong to the sample (TRUE if the unit is in the sample, FALSE otherwise). Only used with estimators of the third class (see Details for more information).

...

two optional parameters can be passed to control the iterative procedures in methods "MateiTille5", "Tille" and "FixedPoint": maxIter sets the maximum number of iterations and eps sets the convergence tolerance

Details

The choice of the estimator to be used is made through the argument method; the list of methods and their respective equations is presented below.

Matei and Tillé (2005) divide the approximated variance estimators into three classes, depending on the quantities they require:

  1. First and second-order inclusion probabilities: The first class is composed of the Horvitz-Thompson estimator (Horvitz and Thompson 1952) and the Sen-Yates-Grundy estimator (Yates and Grundy 1953; Sen 1953), which are available through function varHT in package sampling;

  2. Only first-order inclusion probabilities and only for sample units;

  3. Only first-order inclusion probabilities, for the entire population.

Haziza, Mecatti and Rao (2008) provide a common form that expresses most of the estimators in classes 2 and 3:

\widehat{var}(\hat{t}_{HT}) = \sum_{i \in s} c_i e_i^2

where e_i = \frac{y_i}{\pi_i} - \hat{B}, with

\hat{B} = \frac{\sum_{i\in s} a_i (y_i/\pi_i)}{\sum_{i\in s} a_i}

and a_i and c_i are parameters that define the different estimators:

  • method="Hajek" [Class 2]

    c_i = \frac{n}{n-1}(1-\pi_i); \quad a_i = c_i

  • method="Deville2" [Class 2]

    c_i = (1-\pi_i)\Biggl\{ 1 - \sum_{j\in s}\Bigl[ \frac{1-\pi_j}{\sum_{k\in s} (1-\pi_k)} \Bigr]^2 \Biggr\}^{-1}; \quad a_i = c_i

  • method="Deville3" [Class 2]

    c_i = (1-\pi_i)\Biggl\{ 1 - \sum_{j\in s}\Bigl[ \frac{1-\pi_j}{\sum_{k\in s} (1-\pi_k)} \Bigr]^2 \Biggr\}^{-1}; \quad a_i = 1

  • method="Rosen" [Class 2]

    c_i = \frac{n}{n-1}(1-\pi_i); \quad a_i = (1-\pi_i)\log(1-\pi_i)/\pi_i

  • method="Brewer1" [Class 2]

    c_i = \frac{n}{n-1}(1-\pi_i); \quad a_i = 1

  • method="Brewer2" [Class 3]

    c_i = \frac{n}{n-1} \Bigl( 1-\pi_i + \frac{\pi_i}{n} - n^{-2}\sum_{j \in U} \pi_j^2 \Bigr); \quad a_i = 1

  • method="Brewer3" [Class 3]

    c_i = \frac{n}{n-1} \Bigl( 1-\pi_i - \frac{\pi_i}{n} - n^{-2}\sum_{j \in U} \pi_j^2 \Bigr); \quad a_i = 1

  • method="Brewer4" [Class 3]

    c_i = \frac{n}{n-1} \Bigl( 1-\pi_i - \frac{\pi_i}{n-1} + n^{-1}(n-1)^{-1}\sum_{j \in U} \pi_j^2 \Bigr); \quad a_i = 1

  • method="Berger" [Class 3]

    c_i = \frac{n}{n-1}(1-\pi_i) \Biggl[ \frac{\sum_{j\in s} (1-\pi_j)}{\sum_{j\in U} (1-\pi_j)} \Biggr]; \quad a_i = c_i

  • method="HartleyRao" [Class 3]

    c_i = \frac{n}{n-1} \Bigl( 1-\pi_i - n^{-1}\sum_{j \in s}\pi_j + n^{-1}\sum_{j\in U} \pi_j^2 \Bigr); \quad a_i = 1
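To make the common form concrete, the following minimal sketch evaluates it for method="Hajek" on sample data. This is illustrative code, not the package's internal implementation; hajek_var_est is a name chosen here, and ys and piks denote sample observations and their inclusion probabilities, as in the Examples below.

```r
# Sketch: evaluate the common form sum(c_i * e_i^2) with the Hajek
# coefficients c_i = n/(n-1) * (1 - pi_i) and a_i = c_i.
hajek_var_est <- function(ys, piks) {
  n  <- length(ys)
  ci <- n / (n - 1) * (1 - piks)       # c_i
  ai <- ci                             # a_i = c_i for "Hajek"
  B  <- sum(ai * ys / piks) / sum(ai)  # B-hat
  ei <- ys / piks - B                  # e_i = y_i/pi_i - B-hat
  sum(ci * ei^2)
}

# Toy sample data
set.seed(1)
piks <- runif(30, 0.05, 0.5)
ys   <- piks * 100 + rnorm(30)
hajek_var_est(ys, piks)
```

The same skeleton yields the other class-2 estimators by swapping in the c_i and a_i listed above.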

Some additional estimators are defined in Matei and Tillé (2005):

  • method="Deville1" [Class 2]

    \widehat{var}(\hat{t}_{HT}) = \sum_{i \in s} \frac{c_i}{\pi_i^2} (y_i - y_i^*)^2

    where

    y_i^* = \pi_i \frac{\sum_{j \in s} c_j y_j/\pi_j}{\sum_{j \in s} c_j}

    and c_i = (1-\pi_i)\frac{n}{n-1}

  • method="Tille" [Class 3]

    \widehat{var}(\hat{t}_{HT}) = \biggl( \sum_{i \in s} \omega_i \biggr) \sum_{i\in s} \omega_i (\tilde{y}_i - \bar{\tilde{y}}_\omega)^2 - n \sum_{i\in s} \biggl( \tilde{y}_i - \frac{\hat{t}_{HT}}{n} \biggr)^2

    where \tilde{y}_i = y_i/\pi_i, \omega_i = \pi_i/\beta_i and \bar{\tilde{y}}_\omega = \bigl( \sum_{i \in s} \omega_i \bigr)^{-1} \sum_{i \in s} \omega_i \tilde{y}_i

    The coefficients βi\beta_i are computed iteratively through the following procedure:

    1. \beta_i^{(0)} = \pi_i, \,\, \forall i \in U

    2. \beta_i^{(2k-1)} = \frac{(n-1)\pi_i}{\beta^{(2k-2)} - \beta_i^{(2k-2)}}

    3. \beta_i^{(2k)} = \beta_i^{(2k-1)} \Biggl( \frac{n(n-1)}{(\beta^{(2k-1)})^2 - \sum_{i\in U} (\beta_i^{(2k-1)})^2} \Biggr)^{1/2}

    with \beta^{(k)} = \sum_{i\in U} \beta_i^{(k)}, \,\, k = 1, 2, 3, \dots
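The β iteration above can be sketched in a few lines of R. This is an illustrative reading of the algorithm, not the package's implementation; tille_beta, maxIter and eps are names chosen here.

```r
# Sketch of the iterative computation of the beta coefficients for
# method = "Tille"; alternates the odd (step 2) and even (step 3) updates.
tille_beta <- function(pik, maxIter = 1000, eps = 1e-8) {
  n    <- sum(pik)                # fixed sample size: n = sum of the pi_i
  beta <- pik                     # step 1: beta^(0) = pi
  for (k in seq_len(maxIter)) {
    old  <- beta
    # step 2: beta_i <- (n-1) pi_i / (beta - beta_i), with beta = sum(beta_i)
    beta <- (n - 1) * pik / (sum(beta) - beta)
    # step 3: rescale by sqrt( n(n-1) / (beta^2 - sum(beta_i^2)) )
    beta <- beta * sqrt(n * (n - 1) / (sum(beta)^2 - sum(beta^2)))
    if (max(abs(beta - old)) < eps) break
  }
  beta
}

pik <- rep(0.1, 50)               # equal inclusion probabilities, n = 5
tille_beta(pik)[1]
```

With equal inclusion probabilities the even step makes the betas constant after one pass, so the iteration converges immediately.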

  • method="MateiTille1" [Class 3]

    \widehat{var}(\hat{t}_{HT}) = \frac{n(N-1)}{N(n-1)} \sum_{i\in s} \frac{b_i}{\pi_i^3} (y_i - \hat{y}_i^*)^2

    where

    \hat{y}_i^* = \pi_i \frac{\sum_{j\in s} b_j y_j/\pi_j^2}{\sum_{j\in s} b_j/\pi_j}

    and the coefficients bib_i are computed iteratively by the algorithm:

    1. b_i^{(0)} = \pi_i(1-\pi_i)\frac{N}{N-1}, \,\, \forall i \in U

    2. b_i^{(k)} = \frac{(b_i^{(k-1)})^2}{\sum_{j\in U} b_j^{(k-1)}} + \pi_i(1-\pi_i)

    A necessary condition for convergence is checked; if it is not satisfied, the function returns an alternative solution obtained with a single iteration:

    b_i = \pi_i(1-\pi_i) \Biggl( \frac{N\pi_i(1-\pi_i)}{(N-1)\sum_{j\in U}\pi_j(1-\pi_j)} + 1 \Biggr)
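A sketch of this fixed-point iteration with the one-iteration fallback follows. It is illustrative only: fp_coef, maxIter and eps are names introduced here, and the package's actual convergence check may differ (this sketch simply falls back when the iteration cap is reached).

```r
# Sketch: fixed-point iteration for the b_i of "MateiTille1"/"MateiTille5",
# falling back to the one-iteration solution if convergence is not reached.
fp_coef <- function(pik, maxIter = 1000, eps = 1e-8) {
  N <- length(pik)
  b <- pik * (1 - pik) * N / (N - 1)            # step 1: b^(0)
  for (k in seq_len(maxIter)) {
    old <- b
    b   <- old^2 / sum(old) + pik * (1 - pik)   # step 2
    if (max(abs(b - old)) < eps) return(b)
  }
  # one-iteration alternative solution
  pik * (1 - pik) *
    (N * pik * (1 - pik) / ((N - 1) * sum(pik * (1 - pik))) + 1)
}

set.seed(2)
pik <- runif(100, 0.02, 0.4)   # population-level inclusion probabilities
b   <- fp_coef(pik)
```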

  • method="MateiTille2" [Class 3]

    \widehat{var}(\hat{t}_{HT}) = \frac{1}{1 - \sum_{i\in U} d_i^2/\pi_i} \sum_{i\in s} (1-\pi_i) \Biggl( \frac{y_i}{\pi_i} - \frac{\hat{t}_{HT}}{n} \Biggr)^2

    where

    d_i = \frac{\pi_i(1-\pi_i)}{\sum_{j\in U} \pi_j(1-\pi_j)}

  • method="MateiTille3" [Class 3]

    \widehat{var}(\hat{t}_{HT}) = \frac{1}{1 - \sum_{i\in U} d_i^2/\pi_i} \sum_{i\in s} (1-\pi_i) \Biggl( \frac{y_i}{\pi_i} - \frac{\sum_{j\in s} (1-\pi_j) y_j/\pi_j}{\sum_{j\in s} (1-\pi_j)} \Biggr)^2

    where did_i is defined as in method="MateiTille2".

  • method="MateiTille4" [Class 3]

    \widehat{var}(\hat{t}_{HT}) = \frac{1}{1 - \sum_{i\in U} b_i/n^2} \sum_{i\in s} \frac{b_i}{\pi_i^3} (y_i - y_i^*)^2

    where

    y_i^* = \pi_i \frac{\sum_{j\in s} b_j y_j/\pi_j^2}{\sum_{j\in s} b_j/\pi_j}

    and

    b_i = \frac{\pi_i(1-\pi_i)N}{N-1}

  • method="MateiTille5" [Class 3] This estimator is defined as in method="MateiTille4", and the b_i values are defined as in method="MateiTille1".

Value

a scalar, the estimated variance

References

Matei, A.; Tillé, Y., 2005. Evaluation of variance approximations and estimators in maximum entropy sampling with unequal probability and fixed sample size. Journal of Official Statistics 21 (4), 543-570.

Haziza, D.; Mecatti, F.; Rao, J.N.K. 2008. Evaluation of some approximate variance estimators under the Rao-Sampford unequal probability sampling design. Metron LXVI (1), 91-108.

Examples

### Generate population data ---
N <- 500; n <- 50

set.seed(0)
x <- rgamma(n=N, scale=10, shape=5)
y <- abs( 2*x + 3.7*sqrt(x) * rnorm(N) )

pik <- n * x/sum(x)
s   <- sample(N, n)

ys <- y[s]
piks <- pik[s]

### Estimators of class 2 ---
approx_var_est(ys, piks, method="Deville1")
approx_var_est(ys, piks, method="Deville2")
approx_var_est(ys, piks, method="Deville3")
approx_var_est(ys, piks, method="Hajek")
approx_var_est(ys, piks, method="Rosen")
approx_var_est(ys, piks, method="FixedPoint")
approx_var_est(ys, piks, method="Brewer1")

### Estimators of class 3 ---
approx_var_est(ys, pik, method="HartleyRao", sample=s)
approx_var_est(ys, pik, method="Berger", sample=s)
approx_var_est(ys, pik, method="Tille", sample=s)
approx_var_est(ys, pik, method="MateiTille1", sample=s)
approx_var_est(ys, pik, method="MateiTille2", sample=s)
approx_var_est(ys, pik, method="MateiTille3", sample=s)
approx_var_est(ys, pik, method="MateiTille4", sample=s)
approx_var_est(ys, pik, method="MateiTille5", sample=s)
approx_var_est(ys, pik, method="Brewer2", sample=s)
approx_var_est(ys, pik, method="Brewer3", sample=s)
approx_var_est(ys, pik, method="Brewer4", sample=s)

Approximate the Variance of the Horvitz-Thompson estimator

Description

Approximations of the Horvitz-Thompson variance for High-Entropy sampling designs. Such methods use only first-order inclusion probabilities.

Usage

Var_approx(y, pik, n, method, ...)

Arguments

y

numeric vector containing the values of the variable of interest for all population units

pik

numeric vector of first-order inclusion probabilities, of length equal to population size

n

a scalar indicating the sample size

method

string indicating the approximation that should be used. One of "Hajek1", "Hajek2", "HartleyRao1", "HartleyRao2", "FixedPoint".

...

two optional parameters can be passed to control the iterative procedure in method="FixedPoint": maxIter sets the maximum number of iterations and eps sets the convergence tolerance

Details

The variance approximations available in this function are described below; the notation follows Matei and Tillé (2005).

  • Hájek variance approximation (method="Hajek1"):

    \tilde{Var} = \sum_{i \in U} \frac{b_i}{\pi_i^2}(y_i - y_i^*)^2

    where

    y_i^* = \pi_i \frac{\sum_{j\in U} b_j y_j/\pi_j}{\sum_{j \in U} b_j}

    and

    b_i = \frac{\pi_i(1-\pi_i)N}{N-1}
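A direct translation of these formulas into R reads as follows. This is a sketch, not the package code: hajek1_var is a name chosen here, and Var_approx with method="Hajek1" is the packaged equivalent.

```r
# Sketch: Hajek variance approximation computed from the formulas above;
# y and pik are population-level vectors of equal length.
hajek1_var <- function(y, pik) {
  N     <- length(pik)
  b     <- pik * (1 - pik) * N / (N - 1)
  ystar <- pik * sum(b * y / pik) / sum(b)   # y_i^*
  sum(b / pik^2 * (y - ystar)^2)
}

set.seed(3)
pik <- runif(200, 0.05, 0.5)
y   <- 10 * pik + rnorm(200, sd = 0.2)   # y roughly proportional to pik
hajek1_var(y, pik)
```

Since y is nearly proportional to pik here, the approximated variance is small, as expected for a well-chosen size variable.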

  • Starting from Hájek (1964), Brewer (2002) defined the following estimator (method="Hajek2"):

    \tilde{Var} = \sum_{i \in U} \pi_i(1-\pi_i) \Bigl( \frac{y_i}{\pi_i} - \frac{\tilde{Y}}{n} \Bigr)^2

    where \tilde{Y} = \sum_{i \in U} a_i y_i and a_i = n(1-\pi_i)/\sum_{j \in U} \pi_j(1-\pi_j)
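The corresponding sketch for this approximation (hajek2_var is an illustrative name introduced here; pik is assumed to sum to the sample size n):

```r
# Sketch: Brewer's (2002) Hajek-type approximation, method = "Hajek2".
hajek2_var <- function(y, pik) {
  n  <- sum(pik)                              # fixed sample size
  a  <- n * (1 - pik) / sum(pik * (1 - pik))  # a_i
  Yt <- sum(a * y)                            # Y-tilde
  sum(pik * (1 - pik) * (y / pik - Yt / n)^2)
}

set.seed(4)
pik <- runif(200, 0.05, 0.5)
pik <- pik / sum(pik) * 20                    # rescale so sum(pik) = n = 20
y   <- 10 * pik + rnorm(200, sd = 0.2)
hajek2_var(y, pik)
```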

  • Hartley and Rao (1962) variance approximation (method="HartleyRao1"):

    \tilde{Var} = \sum_{i \in U} \pi_i \Bigl( 1 - \frac{n-1}{n}\pi_i \Bigr) \Bigl( \frac{y_i}{\pi_i} - \frac{Y}{n} \Bigr)^2

    \qquad - \frac{n-1}{n^2} \sum_{i \in U} \Biggl( 2\pi_i^3 - \frac{\pi_i^2}{2} \sum_{j \in U} \pi_j^2 \Biggr) \Bigl( \frac{y_i}{\pi_i} - \frac{Y}{n} \Bigr)^2

    \qquad + \frac{2(n-1)}{n^3} \Biggl( \sum_{i \in U} \pi_i y_i - \frac{Y}{n}\sum_{i\in U} \pi_i^2 \Biggr)^2

  • Hartley and Rao (1962) provide a simplified version of the variance above (method="HartleyRao2"):

    \tilde{Var} = \sum_{i \in U} \pi_i \Bigl( 1 - \frac{n-1}{n}\pi_i \Bigr) \Bigl( \frac{y_i}{\pi_i} - \frac{Y}{n} \Bigr)^2

  • method="FixedPoint" computes the Fixed-Point variance approximation proposed by Deville and Tillé (2005). The variance can be expressed in the same form as in method="Hajek1", and the coefficients bib_i are computed iteratively by the algorithm:

    1. b_i^{(0)} = \pi_i(1-\pi_i)\frac{N}{N-1}, \,\, \forall i \in U

    2. b_i^{(k)} = \frac{(b_i^{(k-1)})^2}{\sum_{j\in U} b_j^{(k-1)}} + \pi_i(1-\pi_i)

    A necessary condition for convergence is checked; if it is not satisfied, the function returns an alternative solution obtained with a single iteration:

    b_i = \pi_i(1-\pi_i) \Biggl( \frac{N\pi_i(1-\pi_i)}{(N-1)\sum_{j\in U}\pi_j(1-\pi_j)} + 1 \Biggr)

Value

a scalar, the approximated variance.

References

Matei, A.; Tillé, Y., 2005. Evaluation of variance approximations and estimators in maximum entropy sampling with unequal probability and fixed sample size. Journal of Official Statistics 21 (4), 543-570.

Examples

N <- 500; n <- 50

set.seed(0)
x <- rgamma(n=N, scale=10, shape=5)
y <- abs( 2*x + 3.7*sqrt(x) * rnorm(N) )

pik  <- n * x/sum(x)

### Variance approximations ---
Var_approx(y, pik, n, method = "Hajek1")
Var_approx(y, pik, n, method = "Hajek2")
Var_approx(y, pik, n, method = "HartleyRao1")
Var_approx(y, pik, n, method = "HartleyRao2")
Var_approx(y, pik, n, method = "FixedPoint")