standelds science forum addict
Joined: 09 Sep 2005
Posts: 55
|
Posted: Sun Jun 11, 2006 6:46 am Post subject:
Variance
|
|
|
A friend of mine asked me this the other day, and I was stuck not knowing
the answer. I was slightly embarrassed as I am in the senior year of a BS in
math.
In the formula for sample variance,
s^2 = sum(x - mean(x))^2/(n-1)
why do you subtract one from the sample size?
Thanks,
Dustin |
matt271829-news@yahoo.co. science forum Guru
Joined: 11 Sep 2005
Posts: 846
|
Posted: Sun Jun 11, 2006 10:53 am Post subject:
Re: Variance
|
|
|
standelds wrote:
| Quote: | A friend of mine asked me this the other day, and I was stuck not knowing
the answer. I was slightly embarrassed as I am in the senior year of a BS in
math.
In the formula for sample variance,
s^2 = sum(x - mean(x))^2/(n-1)
why do you subtract one from the sample size?
Thanks,
Dustin
|
If you calculate the mean from the sample, and then use this sample
mean to calculate the sample variance, then the obvious calculation s^2
= sum(x - mean(x))^2/n will be biased, in the sense that if you took a
sample of size n very many times, calculated the variance in this way
for each sample, and then averaged the variances, the result would not
converge to the true variance of the population; it would converge to
something smaller.
Intuitively, the reason is that the sample mean sits at the "centre"
of the sample values (it is the point that minimises the sum of squared
deviations), so the squared deviations from the sample mean will tend
to add up to less than those from the true mean.
The "n - 1 correction" adjusts for this underestimation, and gives an
unbiased estimate of the true population variance.
However, if in the variance calculation you use the true mean (as, say,
known from theory) rather than the sample mean, then the "n - 1
correction" should *not* be used. |
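A quick way to see this for yourself is to simulate it. The sketch below is only an illustration, not part of the original reply: the normal population with mean 0 and variance 4, the sample size of 5, and the function name are all arbitrary assumptions.

```python
import random

def average_variance_estimates(n, trials, seed=0):
    """Average three variance estimates over many samples of size n.

    Assumed population (illustration only): normal, true mean 0, true variance 4.
    """
    rng = random.Random(seed)
    true_mean, true_sd = 0.0, 2.0
    total_div_n = total_div_n_minus_1 = total_known_mean = 0.0
    for _ in range(trials):
        xs = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        m = sum(xs) / n
        ss_about_sample_mean = sum((x - m) ** 2 for x in xs)
        ss_about_true_mean = sum((x - true_mean) ** 2 for x in xs)
        total_div_n += ss_about_sample_mean / n              # "obvious" estimate
        total_div_n_minus_1 += ss_about_sample_mean / (n - 1)  # n - 1 correction
        total_known_mean += ss_about_true_mean / n           # true mean, no correction
    return (total_div_n / trials,
            total_div_n_minus_1 / trials,
            total_known_mean / trials)

biased, corrected, known_mean = average_variance_estimates(n=5, trials=100_000)
print(biased, corrected, known_mean)
```

With these settings the first average should land near 4 * (5 - 1)/5 = 3.2, while the other two should land near the true variance of 4.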
Stan Brown science forum Guru Wannabe
Joined: 06 May 2005
Posts: 279
|
Posted: Sun Jun 11, 2006 2:14 pm Post subject:
Re: Variance
|
|
|
Sun, 11 Jun 2006 06:46:05 GMT from standelds
<standelds@hawaii.rr.com>:
| Quote: | A friend of mine asked me this the other day, and I was stuck not knowing
the answer. I was slightly embarrassed as I am in the senior year of a BS in
math.
In the formula for sample variance,
s^2 = sum(x - mean(x))^2/(n-1)
why do you subtract one from the sample size?
|
Short answer: to adjust for the fact that this is a sample and not
the whole population.
Longer answer: http://www.childrens-mercy.org/stats/ask/df.asp
--
Stan Brown, Oak Road Systems, Tompkins County, New York, USA
http://OakRoadSystems.com/ |
Virgil science forum Guru
Joined: 24 Mar 2005
Posts: 5536
|
Posted: Sun Jun 11, 2006 5:12 pm Post subject:
Re: Variance
|
|
|
In article <1150023213.497484.26170@y43g2000cwc.googlegroups.com>,
matt271829-news@yahoo.co.uk wrote:
| Quote: | standelds wrote:
A friend of mine asked me this the other day, and I was stuck not knowing
the answer. I was slightly embarrassed as I am in the senior year of a BS in
math.
In the formula for sample variance,
s^2 = sum(x - mean(x))^2/(n-1)
why do you subtract one from the sample size?
Thanks,
Dustin
If you calculate the mean from the sample, and then use this sample
mean to calculate the sample variance, then the obvious calculation s^2
= sum(x - mean(x))^2/n will be biased, in the sense that if you took a
sample of size n very many times, calculated the variance in this way
for each sample, and then averaged the variances, the result would not
converge to the true variance of the population.
Intuitively, the reason is that the sample mean will tend to be biased
towards the "centre" of the sample values, so the deviations from the
sample mean will tend to be lower than from the true mean.
The "n - 1 correction" adjusts for this underestimation, and gives an
unbiased estimate of the true population variance.
However, if in the variance calculation you use the true mean (as, say,
known from theory) rather than the sample mean, then the "n - 1
correction" should *not* be used.
|
Note also that taking the square root of the corrected variance does not
give an unbiased estimate of the standard deviation: it still tends to
come out slightly low. That bias is small and is usually ignored. |
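The same kind of simulation shows the leftover bias in the standard deviation. Again this is just a sketch under assumptions that are not in the post: a standard normal population (true standard deviation 1), samples of size 5, and a made-up function name.

```python
import math
import random

def average_sample_sd(n, trials, seed=0):
    """Average of sqrt(sum((x - mean)^2)/(n - 1)) over many samples of size n.

    Assumed population (illustration only): standard normal, true sd = 1.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        m = sum(xs) / n
        corrected_variance = sum((x - m) ** 2 for x in xs) / (n - 1)
        total += math.sqrt(corrected_variance)
    return total / trials

avg_sd = average_sample_sd(n=5, trials=50_000)
print(avg_sd)
```

The average should come out somewhat below 1, even though the average of the corrected variances themselves is close to 1. The shortfall shrinks as the sample size grows.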
Jasen Betts science forum Guru Wannabe
Joined: 31 Jul 2005
Posts: 176
|
Posted: Sun Jun 11, 2006 8:46 pm Post subject:
Re: Variance
|
|
|
On 2006-06-11, standelds <standelds@hawaii.rr.com> wrote:
| Quote: | A friend of mine asked me this the other day, and I was stuck not knowing
the answer. I was slightly embarrassed as I am in the senior year of a BS in
math.
In the formula for sample variance,
s^2 = sum(x - mean(x))^2/(n-1)
why do you subtract one from the sample size?
|
That's the formula for predicting the variance of a population given only a
sample of the population, and since every sample is somewhat biased the -1
compensates for that bias.
With a BS in math you may be able to prove that; I did only one year of stats
and recall that explanation from the first semester.
--
Bye.
Jasen |
matt271829-news@yahoo.co. science forum Guru
Joined: 11 Sep 2005
Posts: 846
|
Posted: Mon Jun 12, 2006 9:58 am Post subject:
Re: Variance
|
|
|
Jasen Betts wrote:
| Quote: | On 2006-06-11, standelds <standelds@hawaii.rr.com> wrote:
A friend of mine asked me this the other day, and I was stuck not knowing
the answer. I was slightly embarrassed as I am in the senior year of a BS in
math.
In the formula for sample variance,
s^2 = sum(x - mean(x))^2/(n-1)
why do you subtract one from the sample size?
That's the formula for predicting the variance of a population given only a
sample of the population, and since every sample is somewhat biased the -1
compensates for that bias.
|
It's not that the *sample* is (systematically) biased - we assume here
that it isn't. The point is that, even with an unbiased sampling
technique, the sample *variance* is systematically biased if you use
the sample mean to calculate it.
| Quote: |
With a BS in math you may be able to prove that; I did only one year of stats
and recall that explanation from the first semester.
--
Bye.
Jasen |
|
David C. Ullrich science forum Guru
Joined: 28 Apr 2005
Posts: 2250
|
Posted: Mon Jun 12, 2006 11:20 am Post subject:
Re: Variance
|
|
|
On Sun, 11 Jun 2006 06:46:05 GMT, "standelds"
<standelds@hawaii.rr.com> wrote:
| Quote: | A friend of mine asked me this the other day, and I was stuck not knowing
the answer. I was slightly embarrassed as I am in the senior year of a BS in
math.
In the formula for sample variance,
s^2 = sum(x - mean(x))^2/(n-1)
why do you subtract one from the sample size?
|
The reason there's a "correction" is that the variance
within the sample tends to be less than the real variance.
(Suppose your sample is just one observation. Then there's no
variation at all in your sampled data. Does that mean
you want to estimate the actual variance to be 0?
No, a single observation gives no information at all
about the real variance.)
To see that the correction should be exactly what
it is you do the math:
Let's suppose to simplify things that the real mean is 0
and the real variance in the underlying distribution is 1.
Say we take N independent observations X_1, ..., X_N. Now, we
don't know that the real mean is 0. Our best guess
for the real mean is
m = (X_1 + ... + X_N)/N.
So our estimate for the variance is going to have
_something_ to do with the expected value of
the sum of (X_j - m)^2. Let's see what that is.
First, since the X_j are independent with mean 0 and
variance 1, we have E[X_j^2] = 1 and E[X_i X_j] = 0 for
i != j, so the cross terms drop out when we expand the square:
E[(X_1 - m)^2] = E[((N-1)X_1 - X_2 - X_3 - ... - X_N)^2]/N^2
= ((N-1)^2 + 1 + 1 + ... + 1)/N^2
= ((N-1)^2 + (N-1))/N^2
= N(N-1)/N^2 = (N-1)/N.
The other terms are all the same, so we get
E[sum((X_j - m)^2)] = N-1.
So if we define the sample variance to be
sv = sum(X_j-m)^2/(N-1)
then we get
E[sv] = 1.
So. Defining the sample variance in that funny way
makes the expected value of the sample variance
exactly equal to the real variance.
************************
David C. Ullrich |
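For small N the expected value above can also be checked exactly by brute force. The sketch below assumes one particular distribution with mean 0 and variance 1, namely a fair coin taking the values -1 and +1. That choice and the function name are illustrative only; the derivation itself does not depend on the distribution.

```python
from fractions import Fraction
from itertools import product

def expected_sum_of_squares(N):
    """Exact E[sum_j (X_j - m)^2] for N independent fair +/-1 draws.

    Every one of the 2**N outcomes is equally likely, so the expectation
    is the plain average over all outcomes.
    """
    total = Fraction(0)
    for xs in product((-1, 1), repeat=N):
        m = Fraction(sum(xs), N)                 # sample mean of this outcome
        total += sum((x - m) ** 2 for x in xs)   # sum of squared deviations
    return total / 2 ** N

for N in range(2, 7):
    print(N, expected_sum_of_squares(N))
```

Each line should show N followed by exactly N - 1, which is why dividing by N - 1 gives an expected value of 1.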
Paul@Methuselah.fsnet.co. science forum beginner
Joined: 12 Jun 2006
Posts: 1
|
Posted: Mon Jun 12, 2006 11:11 pm Post subject:
Re: Variance
|
|
|
standelds wrote:
| Quote: | A friend of mine asked me this the other day, and I was stuck not knowing
the answer. I was slightly embarrassed as I am in the senior year of a BS in
math.
In the formula for sample variance,
s^2 = sum(x - mean(x))^2/(n-1)
why do you subtract one from the sample size?
Thanks,
Dustin
|
If memory serves, that S^2 thing is termed "the sample unbiased
estimate of the population variance": if you use the bog-standard
sample variance, V, where V = {sum(x - mean(x))^2}/n, you find that the
expectation of V, E[V], comes out as E[V] = sigma^2 * (n - 1) / n,
where sigma^2 is the *actual* population variance. (The proof isn't
too bad.) Hence V is *not* an unbiased estimator for the true
population variance (remember that in order for a sample statistic, T,
to be an unbiased estimator for some population parameter, theta, its
expectation must be equal to theta, i.e. E[T] = theta).
But, by using S^2 = V * n / (n - 1) instead of plain old V, we get
E[S^2] = E[V] * n / (n - 1) = sigma^2, which is just what the doctor
ordered.
And as for being embarrassed, don't worry - better to look a bit daft
now rather than at exam time. I remember my own chagrin at being the
only student in my quantum mechanics tutorial who didn't know what the
idea of valency was all about. My tutor, a kindly man, turned to me
with a beneficent look and began slowly, "well, you can think of it as
if atoms have little sticks poking out of them...". Oh dear.
Best wishes,
Paul. |
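The relation S^2 = V * n / (n - 1) is easy to confirm on a concrete data set. This sketch uses Python's standard `statistics` module, where `pvariance` divides by n (the V above) and `variance` divides by n - 1 (the S^2 above). The data values are made up for illustration.

```python
import statistics

# Hypothetical data, chosen only so the arithmetic is easy to follow.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)

v = statistics.pvariance(data)   # sum((x - mean)^2) / n
s2 = statistics.variance(data)   # sum((x - mean)^2) / (n - 1)

print(v, s2, v * n / (n - 1))
```

Here the mean is 5 and the squared deviations add up to 32, so V is 32/8 = 4 and S^2 is 32/7, which matches V * 8/7.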
Rob Johnson science forum Guru
Joined: 26 May 2005
Posts: 318
|
Posted: Sat Jun 17, 2006 5:09 pm Post subject:
Re: Variance
|
|
|
In article <njiq82912brblgl6sgr80i5c120f6k4rit@4ax.com>,
David C. Ullrich <ullrich@math.okstate.edu> wrote:
| Quote: | On Sun, 11 Jun 2006 06:46:05 GMT, "standelds"
<standelds@hawaii.rr.com> wrote:
A friend of mine asked me this the other day, and I was stuck not knowing
the answer. I was slightly embarrassed as I am in the senior year of a BS in
math.
In the formula for sample variance,
s^2 = sum(x - mean(x))^2/(n-1)
why do you subtract one from the sample size?
The reason there's a "correction" is that the variance
in the sample is going to be less than the real variance.
(Suppose you just take one sample. Then there's no
variation at all in your sampled data. Does that mean
you want to estimate the actual variance to be 0?
No, taking just one sample gives no information at all
about the real variance.)
To see that the correction should be exactly what
it is you do the math:
Let's suppose to simplify things that the real mean is 0
and the real variance in the underlying distribution is 1.
Say we take N independent samples X_1, .. X_N. Now, we
don't know that the real mean is 0. Our best guess
for the real mean is
m = (X_1 + ... + X_N)/N.
So our estimate for the variance is going to have
_something_ to do with the expected value of
the sum of (X_j - m)^2. Let's see what that is.
First,
E(X_1 - m)^2 = E((N-1)X_1 - X_2 - X_3 ... - X_N)^2/N^2
= ((N-1)^2 + 1 + 1 + ... + 1)/N^2
= ((N-1)^2 + (N-1))/N^2
= N(N-1)/N^2 = (N-1)/N.
The other terms are all the same, so we get
E(sum(X_j-m)^2) = N-1.
So if we define the sample variance to be
sv = sum(X_j-m)^2/(N-1)
then we get
E[sv] = 1.
So. Defining the sample variance in that funny way
makes the expected value of the sample variance
exactly equal to the real variance.
|
This was the only post in this thread that really did the math and
did not wave hands making reference to degrees of freedom.
Another way to look at this is to consider the sample mean, m_s, vs
the distribution mean, m_d. If the sample size is n, then the sum
of the samples is n m_s and the expected value of the sum of the
samples is n m_d.
Suppose the variance of the distribution is v_d. Since n m_s is just
the sum of n independent variates, we know that the variance of n m_s
is n times the variance of one variate; that is, n v_d.
Since, as we mentioned previously, the expected value of n m_s is
n m_d, we can state this as E((n m_s - n m_d)^2) = n v_d. Using the
linearity of the expected value to pull out the factor n^2, we get
E((m_s - m_d)^2) = v_d/n.
Compute the expected sample variance, E(v_s), using the last equation
(as is done in <http://www.whim.org/nebula/math/varn-1.html>), and we
get that E(v_s) = ((n-1)/n) v_d, where v_s is the sample variance
computed with divisor n. Same result, slightly different approach.
Rob Johnson <rob@trash.whim.org>
take out the trash before replying
to view any ASCII art, display article in a monospaced font |
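One way to fill in that last step is sketched below. It keeps the notation m_s, m_d, v_s, v_d from the post, takes v_s to be the sample variance with divisor n, and writes x_i for the individual samples, a symbol introduced here only for illustration. The linked page may organise the computation differently.

```latex
\begin{align*}
\sum_{i=1}^{n} (x_i - m_d)^2
  &= \sum_{i=1}^{n} (x_i - m_s)^2 + n\,(m_s - m_d)^2
  && \text{(the cross term vanishes because } \textstyle\sum_i (x_i - m_s) = 0\text{)} \\
n\,v_d
  &= E\Bigl(\sum_{i=1}^{n} (x_i - m_s)^2\Bigr) + n \cdot \frac{v_d}{n}
  && \text{(take expectations, using } E\bigl((m_s - m_d)^2\bigr) = v_d/n\text{)} \\
E(v_s)
  &= \frac{1}{n}\,E\Bigl(\sum_{i=1}^{n} (x_i - m_s)^2\Bigr)
   = \frac{n-1}{n}\,v_d
\end{align*}
```

The first line is pure algebra and holds for every sample; only the second line uses the distribution.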
Copyright © 2004-2005 DeniX Solutions SRL