In my [Mood, Graybill, Boes] the quantile is defined in a completely different way: roughly, the q-th quantile ξq of a random variable X is the smallest value ξ such that F(ξ) ≥ q, for any q in (0, 1).
I find that definition simpler to explain and understand, and more general. Quantiles are continuous, not necessarily chopped into 4, 5, 10, 100 or whatever. Otherwise it would not be possible to express an irrational quantile (e.g. 1/sqrt(2)). And let's face it, what would the world be without irrational quantiles? :-)
Also this article, as it stands, does not clearly recognise that quantiles are properties of the distribution (be it continuous or discrete), and not of a sample. From a sample, we can only estimate (and not calculate, by the way) these attributes.--PizzaMargherita 20:18, 4 October 2005 (UTC)
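A minimal sketch of that continuous definition (Python with SciPy assumed; the standard normal is an arbitrary choice of distribution):

import math
from scipy.stats import norm

# The p-quantile as the inverse CDF, evaluated at any p in (0, 1) --
# including an irrational p such as 1/sqrt(2).
p = 1 / math.sqrt(2)
x = norm.ppf(p)           # smallest x with F(x) >= p for a continuous CDF
print(p, x, norm.cdf(x))  # norm.cdf(x) recovers p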
It would be nice to have some non-mathematical discussions of these terms - you guys make it too hard!
I think there might be a slight mistake in the equivalent characterization of the p- and q-quantile. Should the second line not be Pr[X ≥ x] ≥ 1 − k/q or something similar?
Two images from Yrithinnd's list of Commons images might be of interest here?
--DO11.10 00:19, 5 May 2007 (UTC)
I came across the word tertile, but found no article for it, so I made a stub. The word is infrequent in use, but I ask that someone with a better command of the statistical lingo than mine refine the article. --Shingra 08:29, 23 July 2007 (UTC)
I came across this http://mathforum.org/library/drmath/view/60969.html which I believe has a much simpler explanation.
Should we put it in the external links at least? —Preceding unsigned comment added by 189.33.225.219 (talk) 05:41, 25 December 2007 (UTC)
I had added the "Quantiles of a sample" section, distinct from the "Quantiles of a population" section, because I have used the ideas of the former for several years, and had assumed that they were commonplace. However, since the time of adding that section, I have not seen any published works that are similar to my approach. Thus, I must conclude that my approach is actually original research — and I have removed it. Quantling (talk) 20:39, 10 August 2009 (UTC)
Quantiles of a sample, revisited: Can we really compute a 1 percentile or a 99 percentile if we have, say, only two points drawn from some distribution? The expected (average) percentile of the smaller of two points is 33 1/3 and the expected percentile of the larger of two points is 66 2/3. (See order statistics.) I have no problem computing percentiles between 33 1/3 and 66 2/3 by interpolating the two sampled values, but I am tempted to say that, for percentiles below 33 1/3 or above 66 2/3, we don't have enough information to estimate the value. Does that make sense? If so, is it appropriate to touch on these ideas in the article? Quantling (talk) 18:39, 2 February 2010 (UTC)
The distinction manifests itself in interpolation as well. If I had to choose one of three points to represent the 35 percentile, I would choose the smallest value (with an expected percentile of 25, which is only 10 away from 35) rather than the middle value (50 percentile). However, the article tells me that the smallest value represents the percentile range up to at most 33 1/3 and that, among the three points, the middle value is the appropriate choice. Quantling (talk) 18:53, 2 February 2010 (UTC)
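A minimal simulation sketch of these order-statistics claims (Python with NumPy assumed; Uniform(0,1) is used so that a value's percentile is just 100 times the value itself):

import numpy as np

# Expected percentiles of the smaller and larger of two draws from
# Uniform(0,1), for which F(x) = x.
rng = np.random.default_rng(0)
pairs = np.sort(rng.random((100_000, 2)), axis=1)
print(pairs[:, 0].mean() * 100)  # ~33.3: expected percentile of the smaller
print(pairs[:, 1].mean() * 100)  # ~66.7: expected percentile of the larger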
The current "Estimating the quantiles of a population" section now addresses these issues, by listing all R and SAS approaches for estimating quantiles. Quantling (talk) 15:28, 22 March 2010 (UTC)
I added a section / stub on approximate quantiles from a stream because these methods are becoming very popular. The section should be expanded a bit to summarize the method and explain the pros and cons, but at least the issue is now visible. — Preceding unsigned comment added by Jdfekete (talk • contribs) 07:24, 20 June 2020 (UTC)
I have removed the following {{cite journal}} reference, as it is a letter in a medical journal and it looks as if the writers are expressing personal opinions, having looked at a particular case, rather than a widely held view among statisticians.--Rumping (talk) 22:59, 13 December 2011 (UTC)
I've added Quartiles to the list of Specialized quantiles. Is there a reason it wasn't there in the first place, while less common quantiles like duo-deciles are present? --Adam Matan (talk) 11:35, 8 September 2013 (UTC)
Dear Leegrc,
The recent edit on estimating sample quantiles was undone by you. According to Wikipedia rules, fully referenced new information should not be reverted, especially not on the basis of unreferenced arguments.
Your argument that the quantile at the kth value depends on "the context and purpose" suggests that the matter is somehow subjective. But this is mathematics. The quantiles are defined by the CDF, and the CDF is defined by the non-exceedance probabilities. The probability itself is not subjective, of course, but is defined by Kolmogorov's axioms. It follows that the quantile point values for the order ranks cannot depend on anything related to their use.
On the contrary, they are known exactly, please read the reference http://dx.doi.org/10.1155/2014/326579
Best regards,
RATLAM (talk) 08:33, 12 September 2015 (UTC)
Thank you for your response. I can see your point that the method of interpolation should be chosen according to the CDF when it is known, or can be deduced from the data. This is analogous to transforming the probability axis according to the assumed CDF in order to obtain a linear relationship, and then interpolating linearly.
However, my edit was not about the interpolation method. It was about the points between which interpolation is made. These points represent the probability that a random observation selected from the population does not exceed the hth value in the sample. They are known exactly, and since we are discussing order statistics here, they are independent of the CDF. These probabilities are h/(N+1). For example, a new random observation does not exceed the highest value of two observations in a sample with the probability of 2/3 (regardless of the CDF!). Since the quantile function is defined as the inverse of the CDF, only R-6 gives the exact and correct value of ph. It follows that all the other formulas given in the table are in error.
This issue is discussed in the reference Makkonen and Pajari (2014) in detail. The paper has been published in a peer reviewed journal. It has not been cited yet because it was published only nine months ago.
The point that I tried to make in my edit to the page “Empirical distribution function” was quite the same, i.e., that the traditional definition of EDF could be improved, because it gives erroneous point values and steps. That the steps are actually 1/(N+1), not 1/N, is outlined in Makkonen and Pajari (2014). However, this particular fact has also been proven and published, with illustrations, in the following reference: Makkonen, L. (2008) Bringing closure to the plotting position controversy. Communications in Statistics – Theory and Methods, 37, 460-467. This paper is highly cited (52 citations in Google Scholar so far) and could be used as the reference. A hint towards this finding was also given as a figure on p. 149 in the textbook Madsen et al (2006) “Methods of Structural Safety” ISBN-10: 0486445976. The first edition of this book was published in 1986 and the book has 2545 citations in Google Scholar. RATLAM (talk) 16:43, 18 September 2015 (UTC)
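The distribution-independence of these rank probabilities is easy to check by simulation (a sketch in Python with NumPy; any continuous distribution can be substituted):

import numpy as np

# P(new draw <= max of a sample of N=2) should be 2/3 for any continuous CDF.
rng = np.random.default_rng(1)
for draw in (rng.standard_normal, rng.standard_exponential):
    s = draw((200_000, 3))              # columns 0-1: the sample; column 2: new draw
    hits = s[:, 2] <= s[:, :2].max(axis=1)
    print(hits.mean())                  # ~0.667 in both cases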
To both of your questions, my answer is no. Instead of using the midmost of the observed values, it is preferable to associate a probability ph to each of the order-ranked values xh and find a curve which fits to these points (xh,ph). This curve is an estimate for the CDF, and the inverse of this estimate is used to determine the quantiles, e.g. the median. This is how methods R-4...R-9 are deduced. In all of them, a broken line connects point (xh,ph) to point (xh+1,ph+1). However, a better fit is often obtained when the curve is not forced to go via the points (xh,ph). In such a case, the estimated median and the midmost xh seldom coincide.
All methods R-4...R-9 in your counterexamples give the same median. Therefore, they are not helpful in clarifying why only R-6 is adequate. My point is that methods R-4, R-5 and R-7...R-9 should be abandoned because they associate a wrong non-exceedance probability ph to xh, as shown in Makkonen, L. (2008) Bringing closure to the plotting position controversy. Communications in Statistics – Theory and Methods 37, 460-467, and Makkonen, L., Pajari, M. (2014) Defining Sample Quantiles by the True Rank Probability. Journal of Probability and Statistics, Article ID 326579, http://dx.doi.org/10.1155/2014/326579. RATLAM (talk) 17:30, 22 September 2015 (UTC)
The answer is yes to your first three questions. ph = h/(n+1) is the probability of a random observation not exceeding the h-th smallest value in the sample. This is always correct. Also the other formulas (except R-4) give the correct probability for the middle observation, but not for any other rank. Consider an example with N = 3 and h = 3, i.e. the probability of not exceeding the largest value in a sample of three. For this case, the formulas in the table give the following ph: 1 (R-4), 5/6 (R-5), 3/4 (R-6), 1 (R-7), 4/5 (R-8), 21/26 (R-9).
Only one of these is the correct probability for this random process, and that is 3/4.
RATLAM (talk) 20:27, 22 September 2015 (UTC)
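Those ph values follow from the standard plotting-position formulas tabulated by Hyndman and Fan; a small Python sketch computing them for N = 3, h = 3:

# Plotting positions p_h for rank h in a sample of N, methods R-4 ... R-9.
N, h = 3, 3
positions = {
    "R-4": h / N,
    "R-5 (Hazen)": (h - 0.5) / N,
    "R-6 (Weibull)": h / (N + 1),
    "R-7": (h - 1) / (N - 1),
    "R-8": (h - 1/3) / (N + 1/3),
    "R-9 (Blom)": (h - 3/8) / (N + 1/4),
}
for name, p in positions.items():
    print(f"{name}: {p:.4f}")   # only R-6 gives 0.75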
I was discussing continuous distributions. The estimation is done by fitting a distribution by using the points (xh,ph), and the loss function depends on how well this is done. Obviously, any such estimation should use all the available information. This includes the exact non-exceedance probabilities of the ranked data ph. So, why not use them? Regardless of how "best" is measured, it makes no sense to use, instead, some other incorrect probabilities as the basis of the fitting.
Pearson derived the beta distribution, and Emil Gumbel utilized it in 1958 to show that ph = h/(n+1) is the expectation of the CDF associated with the rank h. That this actually equals the probability of not exceeding the h-th value is a more recent finding, the application of which to quantile estimation is the subject of the paper by Makkonen and Pajari (2014). RATLAM (talk) 19:40, 23 September 2015 (UTC)
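Since F(x(h)) for a sample of size n follows Beta(h, n+1−h), the expectation statement can be checked directly (a sketch assuming SciPy):

from scipy.stats import beta

# Mean of F(x_(h)) ~ Beta(h, n+1-h) equals h/(n+1) for every rank h.
n = 5
for h in range(1, n + 1):
    print(h, beta.mean(h, n + 1 - h), h / (n + 1))  # the two columns agree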
That the loss function can be best minimized by some quantile formula does not prove anything about the correctness of that formula. This is because the loss function has no direct connection with the probability, and is thus biased in the probabilistic sense. What your example demonstrates is merely that it is possible to obtain a useful result by combining a biased method with suitably adjusted incorrect quantiles.
The probabilistically correct way to compare the quantile formulas is the following. Draw a line (estimated CDF) through the points (x1, p1) and (x2, p2) of a sample of two and determine, by this line, the values X and Y that correspond to the p-values 1/3 and 2/3. Then, take from the original distribution a new random observation z, and see to which bin 1: [0, X), 2: [X, Y), 3: [Y, 1] it belongs. Repeat this and count the relative frequency of observations falling into each bin. A result where the relative frequencies in the bins approach the same value shows that the method is good (in the same way as an equal frequency of the results 1-6 shows that a die is fair). The Hazen formula (R-5) and the Weibull formula (R-6) provide the following result when 50 000 random observations are taken.
This figure is for the uniform distribution, discussed above, but the same result is obtained for any other continuous distribution. Only by using R-6 is a uniform asymptotic relative frequency obtained, showing that all other quantile formulas are incorrect. RATLAM (talk) 11:27, 30 September 2015 (UTC)
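A sketch of that experiment in Python (NumPy assumed; a sample of two from Uniform(0,1), with the plotting positions p1, p2 supplied by each formula):

import numpy as np

def bin_frequencies(p1, p2, trials=50_000, seed=2):
    # Fit a line (estimated CDF) through (x1, p1) and (x2, p2), invert it
    # at p = 1/3 and 2/3, and record which bin a fresh draw z falls into.
    rng = np.random.default_rng(seed)
    counts = np.zeros(3)
    for _ in range(trials):
        x1, x2 = np.sort(rng.random(2))
        z = rng.random()
        X = x1 + (1/3 - p1) * (x2 - x1) / (p2 - p1)
        Y = x1 + (2/3 - p1) * (x2 - x1) / (p2 - p1)
        counts[0 if z < X else (1 if z < Y else 2)] += 1
    return counts / trials

print(bin_frequencies(0.25, 0.75))  # Hazen (R-5): non-uniform, ~[0.39, 0.22, 0.39]
print(bin_frequencies(1/3, 2/3))    # Weibull (R-6): ~[1/3, 1/3, 1/3]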
Quantiles are not defined by their ability to minimize some function, but in the probabilistic way explained at the beginning of the "Quantile" page. Their correctness should be viewed against the definition, of course. It is not inappropriate to use a loss function, but your example already shows that it can be misleading in the probabilistic sense.
The second paragraph of your reply nails it. Yes! People indeed think that they are using quantiles, when they are not. This is why these practices require sharpening and why, in my opinion, the "Quantile" page of Wikipedia should be edited further. RATLAM (talk) 13:53, 1 October 2015 (UTC)
Whether we are in agreement or not depends on the meaning of "legitimate". Any quantile formula other than R-6 will result in a poor estimate of the CDF, because incorrect point values of the probability are then used as the basis of the estimation. A good empirical estimate of the CDF is crucial in determining extreme values for structural design, as an example. Hence, in this connection, formulas other than R-6 are certainly not legitimate, and typically result in underestimating the risk of extreme events.
If one is, instead, interested in estimating something else than the CDF, then the situation may be different. For example, commonly, people estimate a parameter of a CDF in the sense that if one would take more samples, their mean would asymptotically approach the parameter of the underlying CDF. In this case, the result may be improved by using points other than those of R-6. This is because the parameters of a non-linear CDF are non-additive (p is additive by definition), which means that the mean of a distribution parameter actually does not approach the true parameter value. The error arising from this can be compensated by making another error: distorting the quantiles from their true values.
In this light, the incorrect probabilities should not be called anything that implies a probabilistic interpretation, such as quantiles. Neither is there any point in calling them estimates, since the exact probabilities are known and there is no need to estimate them - only to correct for an inadequate method. Perhaps "distorted quantiles" would be appropriate. RATLAM (talk) 13:27, 2 October 2015 (UTC)
Pr[y ≤ xk] = k/(N+1) is the probability of a new observation not exceeding the k-th smallest value in a sample. This is order statistics, so Pr does not depend on the actual values in the sample. Thus, also in the experiment described in the second paragraph of your reply, Pr[y ≤ xk] = k/(N+1). Counting frequencies of y ≤ xk for a given value of xk cannot be done, since we have no given values of x, as we do not know the CDF. We wish to estimate the CDF and the quantiles, and for that we have estimates of xk (the observations in the sample) and their known rank probabilities Prk. RATLAM (talk) 18:10, 4 October 2015 (UTC)
Perhaps, the discussion is drifting away from "Estimating quantiles from a sample" which is the title of this section of the "Quantile" page. So, I would like to summarize. The quantile function is defined as the inverse of the CDF. The CDF is defined by the probability that a random observation, selected from the population, does not exceed some value x. For the h-th smallest value in a sample of N observations, this probability is h/(N+1). Therefore, R-6 in the table of the "Quantile" page gives the correct quantiles for the points. The other formulas provide erroneous quantiles. They may provide "best unbiased estimates" only for parameters that have no definite connection with the probability. The use of such parameters may be misleading, and it is important to be aware of this. The "Quantile" page of Wikipedia could be considerably improved by pointing out these issues. RATLAM (talk) 10:17, 6 October 2015 (UTC)
The value of xh has nothing to do with order statistics. Example: Our randomly chosen sample consists of three men in a room, and we ask what is the probability Pr that a randomly selected man entering the room next is not taller than the h-th shortest man in this sample. The answer is Pr = h/(N+1). For example, the probability of the incomer not being taller than anyone in the sample is 3/4. This is easy to understand by noting that, once the newcomer has entered the room, there are four men in it, each randomly chosen, so that the probability of any one of them (e.g. the newcomer) being the tallest is 1/4. All this has nothing whatsoever to do with how tall the men actually are. RATLAM (talk) 13:41, 6 October 2015 (UTC)
There are no "other experiments" that need to be discussed here. The first sentence in the section Estimating quantiles from a sample of the Quantile page reads: "When one has a sample drawn from an unknown population, the cumulative distribution function and quantile function of the underlying population are not known and the task becomes that of estimating the quantiles." This is our problem, not something related to known distributions or intervals. RATLAM (talk) 11:08, 7 October 2015 (UTC)
Our problem is defined in the quoted sentence, in which I have now emphasized important words in Italics. Your example is a very different problem.
But I will briefly reply. In your example, where you know the distribution, estimation with the observations only is a poor alternative. One should utilize the information on the CDF when available. You have no bet, because nobody is suggesting that the estimation should be done so that an observed value as such (here 1.9 meters) is taken as a quantile in your example. Rather, a probability h/(N+1) is associated with each observed value, and the quantiles are obtained by the CDF fitted to all the points. RATLAM (talk) 14:48, 7 October 2015 (UTC)
I disagree with the last statement (8). As discussed, the probability of not exceeding the h-th smallest value in the sample of N is h/(N+1). Because the CDF is defined by the non-exceedance probability, R-6 gives the best empirical estimate of the quantiles. "Efficiency" and "bias" are related to the expected value. Nothing can be said about them based on a single artificial sample. RATLAM (talk) 09:15, 14 October 2015 (UTC)
Yes, you have it right. A poor estimate is still an estimate. The point that should, in my opinion, be clear on the Quantile-page is that because R-6 gives the correct probabilities associated with order ranked observations, there is no benefit from using any of the other formulas in estimating quantiles, albeit they have historically been used. RATLAM (talk) 20:42, 15 October 2015 (UTC)
We know that the non-exceedance probability is h/(N+1) exactly. CDF and the quantiles are defined by the non-exceedance probability. Probability has a unique value assigned to an event. Therefore, it makes no sense to speak of its average or median. RATLAM (talk) 09:10, 17 October 2015 (UTC)
Your claim is incomprehensible, because the concept of "almost surely" applies to an outcome of an event. Probability is not an outcome, but a specific number assigned to an event. The condition x ≤ xh defines an event in a two-dimensional space with variates x and xh, and P(x ≤ xh) is the probability assigned to this event. A mathematical proof has been given by Madsen et al. (2006) and Makkonen and Pajari (2014) that P(x ≤ xh) equals h/(N+1). If you wish to claim otherwise, you would need to point out an error in this proof. RATLAM (talk) 07:38, 20 October 2015 (UTC)
But then, to be useful at all, each of the formulas in the Table should be connected to a specific definition for "close". This would mean that when discussing quantiles, one should always outline which "closeness" criterion they are based on. This would not be in harmony with the specific mathematical definition of quantiles (which requires the use of R-6). Moreover, for most methods in the Table, no valid closeness criterion has been presented in the literature. Based on Monte Carlo simulations, some of them have been claimed to give unbiased estimates for the distribution parameters for a particular distribution, but these "proofs" include a fundamental error discussed in detail in Makkonen and Pajari (2014).
How to get from the initial two-dimensional case of order-ranked observations to the one-dimensional case of the final estimate of the CDF is explained graphically in Fig. 2 of the reference Makkonen et al. (2012) Closure to Problems in the extreme value analysis. Structural Safety 40, 65-67. RATLAM (talk) 16:50, 23 October 2015 (UTC)
The formulas in the Table date from the time when it was thought that P(x ≤ xh) needs to be estimated in some way, i.e. when it was not understood that h/(N+1) gives the exact non-exceedance probability associated with each order ranked observation xh. Since all of them, except R-6, result in unnecessarily poor estimates, the Table should be removed from the page rather than the notes in it updated.
The two-dimensional formulation of considering Ph(x ≤ xh) is not "wrong". On the contrary, order-ranking is the foundation of the analysis, because it is the only way to use random observations for estimating the underlying CDF and the quantiles. The starting point is to plot the points (Ph, xh). Using them, one can then make a fit and proceed to estimating the actual CDF. RATLAM (talk) 18:47, 24 October 2015 (UTC)
All the formulas in the Table are based on treating xh as a variable, i.e. a two-dimensional interpretation. In regard to the interplay between ordinary statistics and order statistics, there is no issue of one being more appropriate than the other. One starts the analysis by order-ranking the observations, associates the rank probabilities with them, and then uses some fitting method to transfer to the one-dimensional interpretation, i.e., to making the final estimate of the CDF and the quantiles. Obviously, it makes sense to begin this procedure by using the correct rank probabilities (R-6) rather than something else.
But, I think that I am repeating the arguments already made, and I would like to close this long general discussion on my part here. It would be good to discuss the specific improvements to be made on the Quantile page. RATLAM (talk) 09:03, 28 October 2015 (UTC)
It is self-evident that an estimate of a continuous variable will almost surely not give the exact result. Therefore, Criterion 2 is of no help. To compare the methods, one needs a criterion that provides a measure of the goodness of the estimate. Here, that criterion is:
Let Frp(S) be the estimate of the probability based on sample S, and let us call it the sample frequency. The frequency interpretation of probability tells us that the mean of Frp(S) over all possible samples S is Pr. Thus, Criterion 5 becomes Criterion 6: ES[Frp(S)] approaches p as the number of samples grows.
Regarding the methods in the Table, R-6 provides the correct probability Pr = E[Frp(S)], whereas the other methods do not. RATLAM (talk) 19:16, 4 November 2015 (UTC)
Answers:
RATLAM (talk) 19:03, 5 November 2015 (UTC)
Frp(S) is an estimate, based on sample S, of the probability Prx[x ≤ G(p)] at points p=h/(N+1). This probability equals Prx[x ≤ G(h/(N+1))] = h/(N+1) = ES[Frp(S)]. Here G(p) is the underlying quantile function that Qp(S) tries to estimate. RATLAM (talk) 16:39, 9 November 2015 (UTC)
One could say that it was Andrey Kolmogorov, whose probability axioms define probability in the probability theory. The third axiom means that probability is additive. Therefore, its best estimate is the mean. This is also evident in the frequentist probability that is defined as: "The relative frequency of occurrence of an event, observed in a number of repetitions of the experiment, is a measure of the probability of that event." The relative frequency here is the sum of sample frequencies scaled by the number of samples, i.e. the mean of sample frequencies (not the median). RATLAM (talk) 15:41, 11 November 2015 (UTC)
The problem is described by Criterion 5. Criterion 5 equals Criterion 6. Thus, the problem is finding a method so that ES[Frp(S)] approaches p when S gets large. This method is R-6. RATLAM (talk) 12:08, 13 November 2015 (UTC)
We simply wish to find a method that best meets Criteria 5 and 6. We may use the fact that with a continuous variate x, x* = Q(p*) is close to x** = Q(p**) if and only if p* is close to p**. RATLAM (talk) 20:11, 15 November 2015 (UTC)
Stating that ES[Frp(S)] = ES{Prx[x ≤ Qp(S)]} = Prx[x ≤ G(p)] = p does not require pretending anything, because it is the definition of probability (see frequentist probability). This definition transforms Criterion 5 into Criterion 6. Thus, the requirement to use Criterion 6, i.e. the mean, could not be more fundamental.
One could still say that the method does not matter if "close to" could not be quantified. But it can. This is explained in connection with the figure above (Week of September 27, 2015). Of course, we should use the theoretically justified and quantitatively best method. It is R-6. RATLAM (talk) 11:23, 17 November 2015 (UTC)
Perhaps this problem originated on 9th November when you wrote; "You are then stating that since Frp(S) is an estimate of a probability, only functions Qp(S) that give an expectation ES[Frp(S)] = ES[Prx[x ≤ Qp(S)]] equal to p are reasonable."
This was not the crux of my argument. Frp(S) represents a sample, and the argument is: Only functions Qp(S) that give an expectation ES[Frp(S)] = ES[Prx[x ≤ Qp(S)]] equal to p are reasonable, because probability is defined so that ES[Frp(S)] = Prx[x ≤ G(p)] = p. RATLAM (talk) 16:46, 17 November 2015 (UTC)
The question related to Criterion 6 is not: How close Frp(S) is to p?. It is: How close ES[Frp(S)] is to p? RATLAM (talk) 18:03, 18 November 2015 (UTC)
I agree on the first two statements, but not the third one. Expectation over S has been mentioned, because p has been mentioned and p = ES[Frp(S)].
In statistics, one sample value does not give the truth. The truth comes out in a repeated test. This is where the "definition of the probability" enters the scene.
Your comment on what you may care most about applies to measures of the variable of interest, i.e. on the x-axis. On the p-axis one is interested in the probability only (i.e. the mean of Frp(S), not some other measure of Frp(S)), because it is the probability that needs to be associated with x in order to estimate the CDF or its inverse.
We need a statistically waterproof verification for the method we are using, and the choice of the method must be based on the definition of probability and its consequences. Otherwise, our quantile estimate will be biased, and no one knows by how much. Since Prx[x ≤ xh] = h/(N+1) = ES[Frp(S)], a unique probability h/(N+1) is associated with xh also when we have only one sample. This is done in method R-6. When doing so, and considering any given distribution, it comes out that the number of hits of random x-values in the bins (-∞,G(p1)], …, (G(ph), G(ph+1)], …, (G(pN),+∞) approaches a uniform distribution when the number of samples S increases. This is a fundamental property of the original distribution, too. No other criterion for "close to" gives this result. RATLAM (talk) 14:22, 20 November 2015 (UTC)
Yes, using your formalism, I meant Prx,S[x ≤ xh].
Your comment “I still do not see why I should have to follow a recipe that is designed to estimate/predict the latter” appears to include a misunderstanding. Taking the probability as the mean of sample frequencies has nothing to do with estimating or predicting probability. It is the definition of probability.
Let us try again with the bin criterion demonstrated in the figure. Perhaps this will clarify the issues related to some of your questions. Considering any given distribution, it comes out that the number of hits of random x-values in the bins (-∞,G(p1)], …, (G(ph), G(ph+1)], …, (G(pN),+∞), where ph = h/(N+1), approaches a uniform distribution when the number of samples S increases. This is equivalent to the fact that the number of hits of p = F(x)-values in the bins [0,p1], (ph,ph+1], …, (pN,1] also approaches a uniform distribution. For a given sample S and given Qp(S), take m random x-values, and record how many times the values of the inverse of Qp(S) hit each bin [0,p1], (ph,ph+1], …, (pN,1]. Repeat this with a new sample S, and so on. Then, the total number of hits in each bin [0,p1], (ph,ph+1], …, (pN,1] approaches a uniform distribution. This is the case only if ph, the probability associated with xh in Qp(S), is h/(N+1). RATLAM (talk) 22:11, 22 November 2015 (UTC)
You have that right. For a given p, xp,S = Qp(S) is a variate which depends on N variates x1,..., xN. However, for a given p and S, F(xp,S) = Prx[x ≤ Qp(S)] is a number, the value of which cannot be calculated because F is unknown. Therefore, it is impossible to compare whether this number is close to p or not. Even more important is the fact that, in statistics, the goodness of an estimate can never be based on one sample. In other words, for given p and S, it is irrelevant to ask whether Prx[x ≤ Qp(S)] is close to p or not. RATLAM (talk) 13:30, 24 November 2015 (UTC)
Yes. RATLAM (talk) 11:34, 25 November 2015 (UTC)
You don't want to compare apples with oranges. What you need is a statistical measure of Frp(S) that can be compared with p. p is a probability. Therefore, your measure must be a probability, i.e., it must be ES[Frp(S)]. RATLAM (talk) 07:57, 26 November 2015 (UTC)
I was not using the word "equal". Comparing with p, of course, means quantifying "close to p". For doing that, one needs an estimator of p. You are arguing that medianS[Frp(S)] is a better estimator of p than ES[Frp(S)].
Consider an analogy: A coin is flipped ten times, and this sampling is repeated S times. We then get sample frequencies for the coin landing heads-up, Fr1, Fr2, ..., FrS. We wish to know how close to being fair the coin is. To find that, we need to compare an estimator of Frp(S) with the probability p = 0.5 of an ideally fair coin. Since p is a probability, and probability is defined as ES[Frp(S)] when S gets large, our unbiased estimator is the mean, and our criterion is "how close ES[Frp(S)] is to p". When we test many coins, the one for which ES[Frp(S)] is closest to p is the fairest one.
But, in analogy with this, you insist that the criterion should not be the one used above, but instead "how close medianS[Frp(S)] is to p". Note that the distribution related to an unfair coin may not be symmetric.
Why? Do you actually think that such a criterion would provide us with a better selection of the fairest coin?
RATLAM (talk) 14:45, 28 November 2015 (UTC)
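The coin analogy can be put in code (a sketch in Python with NumPy; the true p of the coin is an assumption chosen for illustration):

import numpy as np

# S experiments of 10 flips each; compare the mean and median of the
# per-experiment heads frequencies with the coin's true p.
rng = np.random.default_rng(3)
true_p, flips, experiments = 0.35, 10, 100_000
freqs = rng.binomial(flips, true_p, size=experiments) / flips
print(freqs.mean())      # the total relative frequency; converges to true_p
print(np.median(freqs))  # granular: always a multiple of 1/10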
How do you define "close to"?
How do you quantify "close to" without using an estimator?
The other approaches are not "fine", because they give an erroneous probability P(x ≤ xh), as has been discussed (e.g. Week of October 18, 2015).
The analogy was not about quantiles, but about how to evaluate "close to". I would like to understand what you imply, and hence repeat my question: Do you think that the median of sample frequencies results in a better quantitative estimate of the probability than the mean? RATLAM (talk) 08:48, 1 December 2015 (UTC)
Thank you for clarifying the nature of our disagreement. I think that it is becoming clear now, so that we should finally be able to sort this out.
The main disagreement is reflected in your response "How to define and quantify "close to" depends upon context and purpose". I agree with that. However, what you are missing is that, in our task, the context and purpose are known! The purpose is to estimate the quantiles, which is a well defined statistics problem, for which there exists a specific statistical measure.
About the coin analogy: You imply that the only reason for the median not to work better than the mean is granularity? What if we have 1000 tosses per experiment?
I will comment on your understanding of my assertions below.
1. Yes. It is the same problem, so that we can use "can" instead of "must" if you wish. However, I am not comfortable with your wording here, as it implies that this is our goal. Comparing random samples individually with p does not lead to anything. We need a statistical measure. Thus, only the next issue (2.) is relevant.
2. Not equal. The statistic should be a consistent estimator of p. The mean of sample frequencies is the consistent estimator of a probability.
3. Yes. But this can be formulated more generally: It should be a consistent estimator of p.
4. See, 3. above.
As to your thoughts about issue 2, I agree that one can, in principle, use the minimum of a cost function as the statistical measure here. However, for that measure to be reasonable, it would have to be a consistent estimator of p. Since the consistent estimator of p is the mean of Frp(S), the function used must be the squared-error loss function (see loss function). Thus, it cannot be freely chosen, and using it reduces the estimation problem to applying the mean of Frp(S) in the first place.
RATLAM (talk) 21:42, 2 December 2015 (UTC)
You ask who says that we must use a consistent estimator. The estimation theory says so, for example "Consistency is a relatively weak property and is considered necessary of all reasonable estimators" (http://www.cc.gatech.edu/~lebanon/notes/consistency.pdf).
We now appear to be able to reduce the core of our disagreement into one question: Is the mean or median of sample frequencies to be favoured when estimating a probability?
This I believe because you imply, in your reply to the coin analogy (in which N is constant and the number of FrS values grows without bound), that you would favour the median (in the absence of the granularity issue), and argue for using a cost function that corresponds to the median estimator (Loss function: "the median is the estimator that minimizes expected loss experienced under the absolute-difference loss function").
Accordingly, I will resolve this issue by showing that the mean must be favoured.
THEORY:
Frequentist probability: "As the number of trials approaches infinity, the relative frequency will converge exactly to the true probability". The mean of sample frequencies equals the total relative frequency. Therefore, as the number of trials approaches infinity, the mean of sample frequencies converges exactly to the true probability. Thus, the mean of sample frequencies is a strongly consistent estimator of probability.
A reasonable estimator should be consistent, but the median of sample frequencies does not converge to the true probability. This is easy to see, because the mean does so exactly, and the mean and the median converge to different values (for an asymmetric distribution). Hence, the median of sample frequencies is an inconsistent estimator of probability.
In fact, in our problem, we take the mean over the whole parameter space S, and that mean ES[Frp(S)] equals probability by definition. Obviously, no estimator can do a better job. This is the reason why we should not even consider formulas other than R-6 when estimating quantiles by a sample.
NUMERICAL EXAMPLE:
In the figure enclosed, the probabilistic bin-analysis, described on 22nd November, is applied to a Monte Carlo simulation for a uniform distribution between 0 and 1. Three random values of x are chosen; the first two of them are order-ranked (x1 and x2), a straight line L is drawn through points (x1,p1) and (x2,p2), where p1 and p2 are the plotting positions, and the probability estimate of the third value, i.e. L(x3), is taken from this line. This cycle is repeated 50 000 times both for the Weibull formula (mean) and Jenkinson's formula (median). The frequency of hits of L(x3) in bins 1 ([0,1/3]), 2 ((1/3,2/3]) and 3 ((2/3,1]) is recorded. "Exact" refers to the original distribution. The method based on the mean criterion works as expected, whereas the method based on the median criterion is far from ideal.
RATLAM (talk) 21:56, 3 December 2015 (UTC)
Because the p that minimizes the expected loss experienced under a loss function is an estimator of p. Our problem is to find p which is close to Prx[x ≤ Qp(S)], not vice versa.
We know the ordered x-values xh of a sample, but we do not know F(xh). Therefore, we need to associate with each xh a certain probability ph, called the plotting position. This ph depends on the rank h but not on the actual value of xh. The crucial point here is that it is ph we have to find, not xh. Take e.g. the loss function ∑S[F(xh) - ph]^2. The relevant question is how to choose ph to minimize the loss function when xh takes all possible values, and the answer is ph = ES(F(xh)) = h/(N+1). The reverse question - with a given p, how to choose x to minimize the loss function - results in nothing useful.
Different criteria have historically been used to determine ph. They each result in different ph-values. This is easy to see by taking the uniform distribution between 0 and 1, assuming that x1 = 1/3, x2 = 2/3, and plotting the Qp curves according to the methods R-4 - R-9 on paper. It is statistically impossible to have two or more quantiles for a given distribution and a given p. Thus, only one of the corresponding criteria can be correct. As I showed in my previous reply, it is R-6.
RATLAM (talk) 19:54, 4 December 2015 (UTC)
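The divergence of the methods on exactly this two-point sample can be inspected with NumPy, which exposes the Hyndman-and-Fan types by name (NumPy 1.22+ assumed):

import numpy as np

# Quantile estimates from the sample {1/3, 2/3} under methods R-4 ... R-9.
# All but R-4 coincide at p = 0.5; the methods spread out at p = 0.4.
x = np.array([1/3, 2/3])
methods = ["interpolated_inverted_cdf", "hazen", "weibull",
           "linear", "median_unbiased", "normal_unbiased"]  # R-4 ... R-9
for p in (0.4, 0.5):
    print(p, [round(float(np.quantile(x, p, method=m)), 4) for m in methods])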
I am sorry, but I have no resources to continue this discussion any further. It seems to be endless and keeps diverging into irrelevant issues. The important aspects have been outlined above, e.g. in "THEORY" and there is no reason to repeat them. There are also journal publications that include clear explanations of this topic, in particular, Makkonen and Pajari (2014). Numerical simulations attached to this discussion also speak for themselves.
None of the items of your list are both accurate and relevant to our problem. I mention a few problems. The formalism is not clear, e.g., Qp(S) cannot be both an estimator (item 1.) and an estimate (item 4). "Number of data points" in items 5 and 11 should read "number of samples". Items 13 and 14 are irrelevant to our problem, which is to estimate quantiles by a sample, i.e. our N is fixed. The estimators of item 15 are not useful for anything, so that their bias is irrelevant. 16 is true, but hides the most important issue, which is that only R-6 provides a consistent estimator. 17 and 18 are very misleading statements. Generally speaking, they are true, but there is no "If" in our problem. For our problem, we definitely need to use the mean as the estimator, because it is the only consistent estimator of probability. This means that the loss function minimized must be the squared error. We do not want to minimize anything else, because that would lead to an estimator which is not the mean, and to an incorrect quantile.
I believe that the main reason for these difficulties is that thinking of this problem in terms of the reverse of the CDF is difficult. It is much easier to consider this in terms of estimating the CDF. Below is this explanation.
There it is in its simplicity. I doubt that anyone has been following our extensive discussion, but if so, I would advise them to neglect all of it and consider this explanation only.
I would like to see the Quantile page improved, and accordingly, also the Q-Q plot page, where the plotting positions are discussed. I would be pleased to comment on concrete suggestions towards that goal. But this discussion, I need to close on my part now. Thank you.
RATLAM (talk) 20:50, 9 December 2015 (UTC)
This long-lasting discussion between two parties is related to a more-than-century-long dispute about plotting positions. So far, it seems to have closed without agreement. I wonder whether the following contribution could help in finding some kind of convergence.
In methods R-4 … R-9, a probability estimate of the form ph= (h+a)/(N+b) is associated to each observed, order-ranked value xh. a and b are constants specific to each method. The different values of a and b form the essential difference between the methods. ph is an estimate of F(xh), the true probability of x not exceeding xh. We can concentrate on the properties of (xh,ph) pairs where h is 1,...,N, because all other (x,p) pairs predicted by the methods are obtained either by interpolation or extrapolation.
By a loose definition, a consistent estimator of a parameter is a sequence which converges to the correct value of the parameter when the number of samples “grows to infinity”. Leegrc states that all methods R-4 … R-9 are consistent estimators of Qp(S). I disagree. It is true and nice to know that when N increases indefinitely, all such estimates Qp(S) converge in probability to G(p), but this does not mean that the methods are consistent estimators because N is the size of one sample, not the number of samples.
Instead of Qp(S) I would like to use the notation Qi(p,Sj), where i = 4,…,9 refers to the index of methods R-4 … R-9 and j to the index of the sample. In order statistics, a sample consists of a set of order-ranked values x1,...,xN. Increasing the number of samples (=M) means increasing the number of such sets, each of size N, not increasing N. Given a probability p, the sequence provided by any of the methods above is Qi(p,SM), which does not converge with increasing M. It follows that none of the methods is a consistent estimator of G(p) as such.
We can construct a consistent estimator A e.g. by calculating the mean of successive values, i.e. A = [Qi(p,S1)+...+Qi(p,SM)]/M, which is a consistent estimator of the parameter E(F(xh)) when ph = h/(N+1). In the same way, B = Median[Qi(p,S1),...,Qi(p,SM)] can be regarded as an approximately consistent estimator of Median(F(xh)) when ph = (h-1/3)/(N+1/3). When we estimate two different parameters, the estimators are different, and the existence of a consistent estimator does not tell us which method is better.
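Both parameters have exact values via the Beta(h, N+1−h) distribution of F(x(h)), which makes the contrast concrete (a sketch assuming SciPy):

from scipy.stats import beta

# Mean vs. median of F(x_(h)) ~ Beta(h, N+1-h), compared with the plotting
# positions h/(N+1) (exact mean) and (h-1/3)/(N+1/3) (approximate median).
N = 5
for h in range(1, N + 1):
    print(h,
          beta.mean(h, N + 1 - h),   h / (N + 1),
          beta.median(h, N + 1 - h), (h - 1/3) / (N + 1/3))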
The previous discussion has dealt with the question: If Qi(p,Sj) is close to G(p), does it matter how we define "close to"? Is it e.g. OK to minimize the sum of absolute values of the deviations p-F(xh), as in R-8, or should we minimize the sum of squared deviations, as in R-6? When the loss functions are minimized, the resulting ph,6 and ph,8 are different. The original question can now be reformulated: Does it matter whether we are close to p1 or close to p2 when p1 ≠ p2? The answer should be obvious, but let's continue. It follows that Q6(ph,6,S) = Q8(ph,8,S) = xh. Any estimator e applied to M samples gives e(Q6) = e(Q8), which both converge to the same value, say x0. If G(p) is the parameter to be estimated and e is a consistent estimator of G(p), then x0 = G(ph,6) = G(ph,8). This is impossible because ph,6 ≠ ph,8. Consequently, either R-6 or R-8 is wrong, or they both are.
Since both LEEGRC and RATLAM approve R-6, there is no need to promote it here anymore. All other methods applying plotting positions other than h/(N+1) contradict R-6 in the same way as R-8. If we prefer estimates of G(p) which are unbiased and consistent for N probability values, we choose R-6 from the Table. It is very difficult to find a situation in which a biased estimate, given by any other method and converging to a parameter other than G(p) should be preferred. Therefore, the article could be improved by adding a warning and, in the long run, by introducing new methods, e.g. R-6 modified in such a way that the interpolation is carried out on probability paper, and by moving the methods other than R-6 to a historical review with explanation. BERKOM 18:59, 18 December 2015 (UTC) BERKOM 13:10, 22 December 2015 (UTC) — Preceding unsigned comment added by BERKOM (talk • contribs)
It is true that when N increases indefinitely, all methods R-4 ... R-9 converge to the same value. More generally, when b is an arbitrary positive real number, the probability h/(N+b) associated with each observed order-ranked value xh also produces an estimator which converges to the same value when N increases indefinitely. It is obvious that such a convergence alone is not enough to justify any of the methods in any relevant case, i.e. when N is finite. BERKOM 14:16, 22 November 2016 (UTC)
Hello fellow Wikipedians,
I have just modified one external link on Quantile. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FAQ for additional information. I made the following changes:
When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know (documentation at {{Sourcecheck}}).
This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}}
(last update: 5 June 2024).
Cheers.—InternetArchiveBot (Report bug) 12:35, 21 July 2016 (UTC)
The central message in the "Notes" column was obscured by extra sentences on handling values outside the range of the sample data (for example "When p < 1 / (N+1), use x1. When p ≥ N / (N + 1), use xN.")
Besides being a distraction, there was also a question of accuracy. There is no general agreement on how to extrapolate outside the range of data elements — different packages have made different choices. For example, MS Excel's PERCENTILE.EXC refuses to guess and returns @NA. Linear extrapolation from the two nearest end points is also reasonable given that linear interpolation is being used for all the other segments. The article previously described flattening the curve and assuming the underlying distribution is constant at the end-points (contrary to Hyndman & Fan's property P5 which posits "that for a continuous distribution, we expect there to be a positive probability for values beyond the range of the data").
To make the table clearer and to improve accuracy, those notes were moved to a bullet-point comment underneath the table. Additionally, the notes no longer take a position on how or whether to extrapolate beyond the range of the sample data.
Rdhettinger (talk) 17:59, 31 May 2019 (UTC)
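A given package's tail policy can be probed directly by querying p outside [1/(N+1), N/(N+1)] and seeing what comes back; a sketch with NumPy (other packages may extrapolate, clip, or return an error instead):

import numpy as np

# For N = 3, the R-6 plotting positions span [0.25, 0.75]; the values
# returned outside that range reveal the implementation's tail choice.
x = np.array([10.0, 20.0, 30.0])
for p in (0.05, 0.25, 0.5, 0.75, 0.95):
    print(p, np.quantile(x, p, method="weibull"))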
I just made a preliminary edit where I made it clear that the word percentile (and probably also quantile) has two distinct meanings. It's either the scalar that this article talks about, or one of the intervals that the scalar values delimit. In my commit comment I wrote:
"The word percentile (and, I guess, any quantile) can also mean the interval that the scalar percentiles (or, more generally, the scalar quantiles) limit. They're just different meanings used in different contexts. I guess this should really be stated more prominently, as the interval meaning is in fact being used in scientific papers, and the existence of two separate but equally valid meanings depending on context seems to be a source of confusion."
I mention it here as well, as I believe this is in fact an important distinction to make the readers aware of, to avoid confusion. Words can have different meanings depending on context, and that is fine, as long as it's clearly defined and clearly understood.
I do believe, as mentioned, that we should state this distinction more prominently than under the "discussion", as it is quite relevant. I just don't know how. Maybe someone else has an idea of how to do it? It should of course also be stated more precisely. My edit was just meant to at least not write that it is wrong to use the interval meaning of percentile or quantile. It is in fact correct in the proper context and with a universally agreed upon meaning.
Isn't the definition of k-th q-quantile wrong? Shouldn't it be "Pr[X ≥ x] ≥ 1- k/q" instead of "Pr[X ≥ x] ≥ k/q"?
The current definition reads that "q-quantiles are values that partition a finite set of values into q subsets of (nearly) equal sizes". Which does not sound that sane. Take, for example, the normal distribution. Not only is its value set infinite, it is not even bounded. And even if we were to take some bounded interval from it, that interval would still be an infinite set, because it is continuous. Should the word "finite" be removed from this definition? 188.242.96.147 (talk) 13:57, 8 October 2023 (UTC)
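For a finite sample, the defining pair of inequalities (with the corrected second line suggested above) can be applied directly; a sketch in Python with an illustrative 10-point sample:

# Smallest data value x with Pr[X <= x] >= k/q and Pr[X >= x] >= 1 - k/q,
# probabilities taken from the empirical distribution of the sample.
def kth_q_quantile(data, k, q):
    xs = sorted(data)
    n = len(xs)
    for x in xs:
        le = sum(v <= x for v in xs) / n
        ge = sum(v >= x for v in xs) / n
        if le >= k / q and ge >= 1 - k / q:
            return x
    return xs[-1]

sample = [3, 6, 7, 8, 8, 10, 13, 15, 16, 20]
print([kth_q_quantile(sample, k, 4) for k in (1, 2, 3)])  # the three quartiles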