From Wikipedia, the free encyclopedia
The ratio estimator is a statistical estimator for the ratio of means of two random variables. Ratio estimates are biased, and corrections must be made when they are used in experimental or survey work. Ratio estimates are also asymmetrically distributed, so symmetrical tests such as the t test should not be used to generate confidence intervals.
The bias is of the order O(1/n) (see big O notation) so as the sample size (n) increases, the bias will asymptotically approach 0. Therefore, the estimator is approximately unbiased for large sample sizes.
Assume there are two characteristics, x and y, that can be observed for each sampled element in the data set. The ratio R of the population means μy and μx is
$$R = \frac{\mu_y}{\mu_x}$$
The ratio estimate of a value of the y variate (θy) is
$$\theta_y = R\,\theta_x$$
where θx is the corresponding value of the x variate. θy is known to be asymptotically normally distributed.[1]
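The sample ratio (r) is estimated from the sample
$$r = \frac{\bar{y}}{\bar{x}} = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i}$$
That the ratio is biased can be shown with Jensen's inequality as follows (assuming independence between $\bar{x}$ and $\bar{y}$, with both positive):
$$E\left[\frac{\bar{y}}{\bar{x}}\right] = E[\bar{y}]\,E\left[\frac{1}{\bar{x}}\right] \geq \frac{E[\bar{y}]}{E[\bar{x}]} = \frac{\mu_y}{\mu_x}$$
where $\mu_x$ is the mean of the variate x and $\mu_y$ is the mean of the variate y.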
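The bias can be checked exactly on a small population by enumerating every simple random sample without replacement. The sketch below does this with exact fractions; the population values and the function name `expected_sample_ratio` are illustrative assumptions, not taken from the cited sources. Because x and y are correlated in this toy population, the sign of the bias need not match the independent case treated above.

```python
from fractions import Fraction
from itertools import combinations

def expected_sample_ratio(x, y, n):
    """Exact mean of r = sum(y)/sum(x) over all simple random samples of size n."""
    samples = list(combinations(range(len(x)), n))
    total = sum(
        Fraction(sum(y[i] for i in s), sum(x[i] for i in s)) for s in samples
    )
    return total / len(samples)

# Hypothetical population of N = 4 units (illustrative values only).
x = [1, 2, 3, 4]
y = [2, 3, 7, 8]
R = Fraction(sum(y), sum(x))  # population ratio, here exactly 2

for n in range(1, len(x) + 1):
    bias = expected_sample_ratio(x, y, n) - R
    print(n, bias)  # the bias shrinks in magnitude as n grows
```

For this population the bias is -1/56 at n = 2 and vanishes when the whole population is sampled (n = N).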
Under simple random sampling the bias is of the order O(1/n). An upper bound on the relative bias of the estimate is provided by the coefficient of variation (the ratio of the standard deviation to the mean).[2] Under simple random sampling the relative bias is O(1/√n).
The correction methods differ in their efficiency depending on the distributions of the x and y variates, making it difficult to recommend an overall best method. Because the estimates of r are biased, a corrected version should be used in all subsequent calculations.
A correction of the bias accurate to the first order is[citation needed]
where mx is the mean of the variate x and sxy is the covariance between x and y.
To simplify the notation sxy will be used subsequently to denote the covariance between the variates x and y.
Another estimator based on the Taylor expansion is[3]
where n is the sample size, N is the population size, mx is the mean of the x variate and sx² and sy² are the sample variances of the x and y variates respectively.
A computationally simpler but slightly less accurate version of this estimator is
where N is the population size, n is the sample size, mx is the mean of the x variate and sx² and sy² are the sample variances of the x and y variates respectively. The two versions differ only in the factor (N - 1) in the denominator. For large N the difference is negligible.
If x and y are unitless counts with Poisson distribution a second-order correction is[4]
Other methods of bias correction have also been proposed. To simplify the notation the following variables will be used
Pascual's estimator:[5]
Beale's estimator:[6]
Tin's estimator:[7]
Sahoo's estimator:[8]
Sahoo has also proposed a number of additional estimators:[9]
If x and y are unitless counts with Poisson distribution and mx and my are both greater than 10, then the following approximation is correct to order O(1/n³).[4]
An asymptotically correct estimator is[3]
A jackknife estimate of the ratio is less biased than the naive form. A jackknife estimator of the ratio is
where n is the size of the sample and the ri are estimated with the omission of one pair of variates at a time.[10]
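As a minimal sketch, the delete-one jackknife can be computed as follows. It assumes the standard bias-corrected form r_J = n·r − ((n − 1)/n)·Σ r_i, where r_i is the ratio with the ith pair omitted; the function name `jackknife_ratio` is illustrative.

```python
def jackknife_ratio(x, y):
    """Delete-one jackknife estimate of sum(y)/sum(x).

    Assumes the standard form r_J = n*r - (n - 1)/n * sum(r_i),
    where r_i is the sample ratio with the i-th (x, y) pair left out.
    """
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need at least two paired observations")
    sum_x, sum_y = sum(x), sum(y)
    r = sum_y / sum_x  # naive sample ratio
    # Ratio recomputed with each pair omitted in turn.
    leave_one_out = [(sum_y - yi) / (sum_x - xi) for xi, yi in zip(x, y)]
    return n * r - (n - 1) * sum(leave_one_out) / n

# Illustrative data: the naive ratio is 20/10 = 2.
print(jackknife_ratio([1, 2, 3, 4], [2, 3, 7, 8]))
```

When y is exactly proportional to x, every leave-one-out ratio equals r and the jackknife estimate reduces to r.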
An alternative method is to divide the sample into g groups each of size p with n = pg.[11] Let ri be the estimate of the ith group. Then the estimator
where r̄ is the mean of the ratios ri of the g groups, has a bias of at most O(1/n²).
Other estimators based on the division of the sample into g groups are:[12]
where r̄ is the mean of the ratios ri of the g groups and
where ri' is the value of the sample ratio with the ith group omitted.
Other methods of estimating a ratio estimator include maximum likelihood and bootstrapping.[10]
The estimated total of the y variate (τy) is
$$\tau_y = r\,\tau_x$$
where τx is the total of the x variate.
The variance of the sample ratio is approximately:
where sx² and sy² are the variances of the x and y variates respectively, mx and my are the means of the x and y variates respectively and sxy is the covariance of x and y.
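A minimal sketch of such an approximation is given below. It assumes the standard first-order Taylor (delta-method) form var(r) ≈ (r²/n)·(sx²/mx² + sy²/my² − 2·sxy/(mx·my)), written in the algebraically equivalent form (sy² − 2·r·sxy + r²·sx²)/(n·mx²) so that it also works when my is zero. It ignores any finite population correction, and the function name `ratio_variance_delta` is illustrative.

```python
def ratio_variance_delta(x, y):
    """First-order (delta-method) approximation to the variance of r = m_y / m_x.

    Assumption: no finite population correction is applied.
    """
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need at least two paired observations")
    m_x = sum(x) / n
    m_y = sum(y) / n
    r = m_y / m_x
    # Sample variances and covariance with the usual n - 1 divisor.
    s_xx = sum((xi - m_x) ** 2 for xi in x) / (n - 1)
    s_yy = sum((yi - m_y) ** 2 for yi in y) / (n - 1)
    s_xy = sum((xi - m_x) * (yi - m_y) for xi, yi in zip(x, y)) / (n - 1)
    return (s_yy - 2 * r * s_xy + r * r * s_xx) / (n * m_x ** 2)

print(ratio_variance_delta([1, 2, 3, 4], [2, 3, 7, 8]))
```

The approximation is zero when y is exactly proportional to x, since the sample ratio then does not vary from sample to sample.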
Although the approximate variance estimator of the ratio given below is biased, if the sample size is large, the bias in this estimator is negligible.
where N is the population size, n is the sample size and mx is the mean of the x variate.
Another estimator of the variance based on the Taylor expansion is
where n is the sample size and N is the population size and sxy is the covariance of x and y.
An estimate accurate to O(1/n²) is[3]
If the probability distribution is Poissonian, an estimator accurate to O(1/n³) is[4]
A jackknife estimator of the variance is
where ri is the ratio with the ith pair of variates omitted and rJ is the jackknife estimate of the ratio.[10]
The variance of the estimated total is
The variance of the estimated mean of the y variate is
where mx is the mean of the x variate, sx² and sy² are the sample variances of the x and y variates respectively and sxy is the covariance of x and y.
The skewness and the kurtosis of the ratio depend on the distributions of the x and y variates. Estimates have been made of these parameters for normally distributed x and y variates, but for other distributions no expressions have yet been derived. It has been found that in general ratio variables are skewed to the right and leptokurtic, and that their nonnormality increases with the magnitude of the denominator's coefficient of variation.
For normally distributed x and y variates the skewness of the ratio is approximately[7]
where
Because the ratio estimate is generally skewed, confidence intervals created with the variance and symmetrical tests such as the t test are incorrect.[10] These confidence intervals tend to overestimate the width of the left side of the interval and underestimate the width of the right side.
If the ratio estimator is unimodal (which is frequently the case) then a conservative estimate of the 95% confidence intervals can be made with the Vysochanskiï–Petunin inequality.
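The Vysochanskiï–Petunin inequality states that for a unimodal distribution with finite variance, P(|X − μ| ≥ λσ) ≤ 4/(9λ²) whenever λ > √(8/3). Setting the bound equal to α gives the multiplier λ = √(4/(9α)). The sketch below applies this to a ratio estimate; the function names and the example standard error are illustrative assumptions, and the standard error must be supplied from one of the variance estimators.

```python
import math

def vp_multiplier(alpha=0.05):
    """Multiplier lam with 4 / (9 * lam**2) == alpha.

    The inequality requires lam > sqrt(8/3), which is equivalent to alpha < 1/6.
    """
    if not 0 < alpha < 1 / 6:
        raise ValueError("alpha must lie strictly between 0 and 1/6")
    return math.sqrt(4 / (9 * alpha))

def vp_interval(r, std_error, alpha=0.05):
    """Conservative (1 - alpha) interval for a unimodal estimator."""
    k = vp_multiplier(alpha)
    return r - k * std_error, r + k * std_error

# Hypothetical ratio estimate 2.0 with standard error 0.1.
print(vp_multiplier(0.05))
print(vp_interval(2.0, 0.1))
```

At the 95% level the multiplier is about 2.98, compared with 1.96 for a normal approximation, which is why the resulting interval is conservative.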
An alternative method of reducing or eliminating the bias in the ratio estimator is to alter the method of sampling. The variance of the ratio using these methods differs from the estimates given previously. Note that while many applications, such as those discussed in Lohr,[13] are restricted to positive integers only (for example, sizes of sample groups), the Midzuno-Sen method works for any sequence of positive numbers, integral or not. Lahiri's method, by contrast, returns a biased result.
The first of these sampling schemes is a double use of a sampling method introduced by Lahiri in 1951.[14] The algorithm here is based upon the description by Lohr.[13]
The same procedure for the same desired sample size is carried out with the y variate.
Lahiri's scheme as described by Lohr is biased high and, so, is interesting only for historical reasons. The Midzuno-Sen technique described below is recommended instead.
In 1952 Midzuno and Sen independently described a sampling scheme that provides an unbiased estimator of the ratio.[15][16]
The first sample is chosen with probability proportional to the size of the x variate. The remaining n - 1 samples are chosen at random without replacement from the remaining N - 1 members in the population. The probability of selection under this scheme is
$$P = \frac{\sum_{i=1}^{n} x_i}{\binom{N-1}{n-1}\,X}$$
where X is the sum of the N x variates and the xi are the n members of the sample. Then the ratio of the sum of the y variates and the sum of the x variates chosen in this fashion is an unbiased estimate of the population ratio.
In symbols we have
$$r = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i}$$
where xi and yi are chosen according to the scheme described above.
The ratio estimator given by this scheme is unbiased.
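The scheme and its unbiasedness can be sketched as follows. The sampler follows the two steps described above. The check enumerates every possible sample of a small population, weighting each by a selection probability proportional to its x total (which is what the two-step draw implies), and uses exact fractions. Function names and population values are illustrative assumptions.

```python
import random
from fractions import Fraction
from itertools import combinations
from math import comb

def midzuno_sen_sample(x, n, rng=random):
    """Draw n distinct indices: the first with probability proportional to x,
    the remaining n - 1 by simple random sampling without replacement."""
    N = len(x)
    first = rng.choices(range(N), weights=x, k=1)[0]
    rest = [i for i in range(N) if i != first]
    return [first] + rng.sample(rest, n - 1)

def expected_ratio_midzuno_sen(x, y, n):
    """Exact expectation of sum(y)/sum(x) under the scheme (integer data)."""
    N, X = len(x), sum(x)
    total = Fraction(0)
    for s in combinations(range(N), n):
        sx = sum(x[i] for i in s)
        sy = sum(y[i] for i in s)
        prob = Fraction(sx, X * comb(N - 1, n - 1))  # P(sample s is selected)
        total += prob * Fraction(sy, sx)
    return total

# Hypothetical population (illustrative values only).
x = [1, 2, 3, 4]
y = [2, 3, 7, 8]
print(expected_ratio_midzuno_sen(x, y, 2), Fraction(sum(y), sum(x)))
print(midzuno_sen_sample(x, 2, random.Random(0)))
```

The expectation equals the population ratio for every sample size, because the selection probability cancels the denominator of the sample ratio.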
Särndal, Swensson, and Wretman credit Lahiri, Midzuno and Sen for the insights leading to this method[17] but Lahiri's technique is biased high.
Tin (1965)[18] described and compared ratio estimators proposed by Beale (1962)[19] and Quenouille (1956)[20] and proposed a modified approach (now referred to as Tin's method). These ratio estimators are commonly used to calculate pollutant loads from sampling of waterways, particularly where flow is measured more frequently than water quality. For an example, see Quilbe et al. (2006).[21]
If a linear relationship between the x and y variates exists and the regression equation passes through the origin then the estimated variance of the regression equation is always less than that of the ratio estimator[citation needed]. The precise relationship between the variances depends on the linearity of the relationship between the x and y variates: when the relationship is other than linear the ratio estimate may have a lower variance than that estimated by regression.
Although the ratio estimator may be of use in a number of settings it is of particular use in two cases:
The first known use of the ratio estimator was by John Graunt in England who in 1662 was the first to estimate the ratio y/x where y represented the total population and x the known total number of registered births in the same areas during the preceding year.
Later Messance (~1765) and Moheau (1778) published very carefully prepared estimates for France based on enumeration of population in certain districts and on the count of births, deaths and marriages as reported for the whole country. The districts from which the ratio of inhabitants to births was determined constituted only a sample.
In 1802, Laplace wished to estimate the population of France. No population census had been carried out and Laplace lacked the resources to count every individual. Instead he sampled 30 parishes whose total number of inhabitants was 2,037,615. The parish baptismal registrations were considered to be reliable estimates of the number of live births so he used the total number of births over a three-year period. The sample estimate was 71,866.333 baptisms per year over this period giving a ratio of one registered baptism for every 28.35 persons. The total number of baptismal registrations for France was also available to him and he assumed that the ratio of live births to population was constant. He then used the ratio from his sample to estimate the population of France.
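The arithmetic of Laplace's estimate can be reproduced in a few lines. The sample figures are those quoted above; the national baptism count is not given here, so the value used below is a purely hypothetical placeholder, and the function name `ratio_estimate_total` is illustrative.

```python
def ratio_estimate_total(sample_y_total, sample_x_total, population_x_total):
    """Ratio estimate of the y total: (sample y / sample x) * known x total."""
    r = sample_y_total / sample_x_total
    return r * population_x_total

inhabitants = 2_037_615          # total inhabitants of the 30 sampled parishes
baptisms_per_year = 71_866.333   # mean annual baptisms in the same parishes

persons_per_baptism = inhabitants / baptisms_per_year
print(round(persons_per_baptism, 2))  # about 28.35 persons per baptism

# HYPOTHETICAL national figure, for illustration only (not from the source).
national_baptisms = 1_000_000
print(ratio_estimate_total(inhabitants, baptisms_per_year, national_baptisms))
```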
Karl Pearson said in 1897 that the ratio estimates are biased and cautioned against their use.[22]