standelds science forum addict
Joined: 09 Sep 2005
Posts: 55

Posted: Sun Jun 11, 2006 6:46 am Post subject:
Variance



A friend of mine asked me this the other day, and I was stuck not knowing
the answer. I was slightly embarrassed as I am in the senior year of a BS in
math.
In the formula for sample variance,
s^2 = sum(x - mean(x))^2/(n-1)
why do you subtract one from the sample size?
Thanks,
Dustin 



matt271829news@yahoo.co. science forum Guru
Joined: 11 Sep 2005
Posts: 846

Posted: Sun Jun 11, 2006 10:53 am Post subject:
Re: Variance



standelds wrote:
Quote:  A friend of mine asked me this the other day, and I was stuck not knowing
the answer. I was slightly embarrassed as I am in the senior year of a BS in
math.
In the formula for sample variance,
s^2 = sum(x - mean(x))^2/(n-1)
why do you subtract one from the sample size?
Thanks,
Dustin

If you calculate the mean from the sample, and then use this sample
mean to calculate the sample variance, then the obvious calculation s^2
= sum(x - mean(x))^2/n will be biased, in the sense that if you took a
sample of size n very many times, calculated the variance in this way
for each sample, and then averaged the variances, the result would not
converge to the true variance of the population.
Intuitively, the reason is that the sample mean will tend to be biased
towards the "centre" of the sample values, so the deviations from the
sample mean will tend to be lower than from the true mean.
The "n - 1 correction" adjusts for this underestimation, and gives an
unbiased estimate of the true population variance.
However, if in the variance calculation you use the true mean (as, say,
known from theory) rather than the sample mean, then the "n - 1
correction" should *not* be used.
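
The bias described in this post can be checked exactly on a toy case. The sketch below is not from the thread: it assumes the "population" is a fair six-sided die and enumerates every possible sample of size 3, so the averages are exact expectations rather than simulation estimates. All names (faces, average_over_all_samples and so on) are made up for illustration.

```python
from fractions import Fraction
from itertools import product

faces = [1, 2, 3, 4, 5, 6]   # assumed population: one fair die
n = 3                        # assumed sample size

true_mean = Fraction(sum(faces), len(faces))
true_var = sum((x - true_mean) ** 2 for x in faces) / len(faces)

def squared_deviations(sample, centre):
    """Sum of squared deviations of the sample values from `centre`."""
    return sum((x - centre) ** 2 for x in sample)

def sample_mean(sample):
    return Fraction(sum(sample), len(sample))

def average_over_all_samples(estimator):
    """Exact expectation: average the estimator over every equally likely sample."""
    samples = list(product(faces, repeat=n))
    return sum(estimator(s) for s in samples) / len(samples)

# Deviations from the sample mean, divided by n: biased low.
biased = average_over_all_samples(
    lambda s: squared_deviations(s, sample_mean(s)) / n)
# Deviations from the sample mean, divided by n - 1: unbiased.
corrected = average_over_all_samples(
    lambda s: squared_deviations(s, sample_mean(s)) / (n - 1))
# Deviations from the true mean, divided by n: unbiased without any correction.
known_mean = average_over_all_samples(
    lambda s: squared_deviations(s, true_mean) / n)

print(true_var, biased, corrected, known_mean)
```

For this die the true variance is 35/12. Dividing by n averages out to 35/18, while both the n - 1 version and the known-mean version average out to exactly 35/12.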



Stan Brown science forum Guru Wannabe
Joined: 06 May 2005
Posts: 279

Posted: Sun Jun 11, 2006 2:14 pm Post subject:
Re: Variance



Sun, 11 Jun 2006 06:46:05 GMT from standelds
<standelds@hawaii.rr.com>:
Quote:  A friend of mine asked me this the other day, and I was stuck not knowing
the answer. I was slightly embarrassed as I am in the senior year of a BS in
math.
In the formula for sample variance,
s^2 = sum(x - mean(x))^2/(n-1)
why do you subtract one from the sample size?

Short answer: to adjust for the fact that this is a sample and not
the whole population.
Longer answer: http://www.childrensmercy.org/stats/ask/df.asp

Stan Brown, Oak Road Systems, Tompkins County, New York, USA
http://OakRoadSystems.com/ 



Virgil science forum Guru
Joined: 24 Mar 2005
Posts: 5536

Posted: Sun Jun 11, 2006 5:12 pm Post subject:
Re: Variance



In article <1150023213.497484.26170@y43g2000cwc.googlegroups.com>,
matt271829news@yahoo.co.uk wrote:
Quote:  standelds wrote:
A friend of mine asked me this the other day, and I was stuck not knowing
the answer. I was slightly embarrassed as I am in the senior year of a BS in
math.
In the formula for sample variance,
s^2 = sum(x - mean(x))^2/(n-1)
why do you subtract one from the sample size?
Thanks,
Dustin
If you calculate the mean from the sample, and then use this sample
mean to calculate the sample variance, then the obvious calculation s^2
= sum(x - mean(x))^2/n will be biased, in the sense that if you took a
sample of size n very many times, calculated the variance in this way
for each sample, and then averaged the variances, the result would not
converge to the true variance of the population.
Intuitively, the reason is that the sample mean will tend to be biased
towards the "centre" of the sample values, so the deviations from the
sample mean will tend to be lower than from the true mean.
The "n - 1 correction" adjusts for this underestimation, and gives an
unbiased estimate of the true population variance.
However, if in the variance calculation you use the true mean (as, say,
known from theory) rather than the sample mean, then the "n - 1
correction" should *not* be used.

Note also that taking the square root of the corrected variance to
estimate the standard deviation introduces a slight bias of its own
(the result tends to come out a little low), which is quite small and
is usually ignored.
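
A small exact check of this point, as a sketch under assumptions that are not in the thread: the population is again a fair die, the sample size is 4, and every possible sample is enumerated. The helper name corrected_variance is invented for the example.

```python
import math
from itertools import product

faces = [1, 2, 3, 4, 5, 6]   # assumed population: one fair die
n = 4                        # assumed sample size

mu = sum(faces) / len(faces)
sigma = math.sqrt(sum((x - mu) ** 2 for x in faces) / len(faces))

def corrected_variance(sample):
    """Sample variance about the sample mean, with the n - 1 divisor."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample) / (len(sample) - 1)

samples = list(product(faces, repeat=n))

# Average of s^2 over all samples: matches sigma^2 (unbiased).
mean_s2 = sum(corrected_variance(s) for s in samples) / len(samples)
# Average of s = sqrt(s^2) over all samples: falls below sigma.
mean_s = sum(math.sqrt(corrected_variance(s)) for s in samples) / len(samples)

print(f"sigma^2 = {sigma ** 2:.4f}, average s^2 = {mean_s2:.4f}")
print(f"sigma   = {sigma:.4f}, average s   = {mean_s:.4f}")
```

The square root is a concave function, so the average of s has to fall below the square root of the average of s^2. The gap shrinks as the sample size grows.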



Jasen Betts science forum Guru Wannabe
Joined: 31 Jul 2005
Posts: 176

Posted: Sun Jun 11, 2006 8:46 pm Post subject:
Re: Variance



On 2006-06-11, standelds <standelds@hawaii.rr.com> wrote:
Quote:  A friend of mine asked me this the other day, and I was stuck not knowing
the answer. I was slightly embarrassed as I am in the senior year of a BS in
math.
In the formula for sample variance,
s^2 = sum(x - mean(x))^2/(n-1)
why do you subtract one from the sample size?

That's the formula for predicting the variance of a population given only a
sample of the population, and since every sample is somewhat biased the -1
compensates for that bias.
With a BS in math you may be able to prove that; I did only one year of stat
and recall that explanation from the first semester.

Bye.
Jasen 



matt271829news@yahoo.co. science forum Guru
Joined: 11 Sep 2005
Posts: 846

Posted: Mon Jun 12, 2006 9:58 am Post subject:
Re: Variance



Jasen Betts wrote:
Quote:  On 2006-06-11, standelds <standelds@hawaii.rr.com> wrote:
A friend of mine asked me this the other day, and I was stuck not knowing
the answer. I was slightly embarrassed as I am in the senior year of a BS in
math.
In the formula for sample variance,
s^2 = sum(x - mean(x))^2/(n-1)
why do you subtract one from the sample size?
That's the formula for predicting the variance of a population given only a
sample of the population, and since every sample is somewhat biased the -1
compensates for that bias.

It's not that the *sample* is (systematically) biased; we assume here
that it isn't. The point is that, even with an unbiased sampling
technique, the sample *variance* is systematically biased if you use
the sample mean to calculate it.
Quote: 
With a BS in math you may be able to prove that; I did only one year of stat
and recall that explanation from the first semester.

Bye.
Jasen 




David C. Ullrich science forum Guru
Joined: 28 Apr 2005
Posts: 2250

Posted: Mon Jun 12, 2006 11:20 am Post subject:
Re: Variance



On Sun, 11 Jun 2006 06:46:05 GMT, "standelds"
<standelds@hawaii.rr.com> wrote:
Quote:  A friend of mine asked me this the other day, and I was stuck not knowing
the answer. I was slightly embarrassed as I am in the senior year of a BS in
math.
In the formula for sample variance,
s^2 = sum(x - mean(x))^2/(n-1)
why do you subtract one from the sample size?

The reason there's a "correction" is that the variance
in the sample is going to be less than the real variance.
(Suppose you just take one sample. Then there's no
variation at all in your sampled data. Does that mean
you want to estimate the actual variance to be 0?
No, taking just one sample gives no information at all
about the real variance.)
To see that the correction should be exactly what
it is you do the math:
Let's suppose to simplify things that the real mean is 0
and the real variance in the underlying distribution is 1.
Say we take N independent samples X_1, ..., X_N. Now, we
don't know that the real mean is 0. Our best guess
for the real mean is
m = (X_1 + ... + X_N)/N.
So our estimate for the variance is going to have
_something_ to do with the expected value of
the sum of (X_j - m)^2. Let's see what that is.
First,
E(X_1 - m)^2 = E((N-1)X_1 - X_2 - X_3 ... - X_N)^2/N^2
= ((N-1)^2 + 1 + 1 + ... + 1)/N^2
= ((N-1)^2 + (N-1))/N^2
= N(N-1)/N^2 = (N-1)/N.
The other terms are all the same, so we get
E(sum(X_j-m)^2) = N-1.
So if we define the sample variance to be
sv = sum(X_j-m)^2/(N-1)
then we get
E[sv] = 1.
So. Defining the sample variance in that funny way
makes the expected value of the sample variance
exactly equal to the real variance.
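
The algebra in this post can be confirmed by brute force. The following is a sketch under assumptions of my own rather than anything the post specifies: N = 5, and each X_j is -1 or +1 with probability 1/2, which gives the real mean 0 and real variance 1 the derivation asks for. The helper names expectation and m are invented.

```python
from fractions import Fraction
from itertools import product

N = 5  # assumed number of samples

# Assumed distribution: each X_j is -1 or +1 with probability 1/2,
# so the real mean is 0 and the real variance is 1.
outcomes = list(product([-1, 1], repeat=N))

def expectation(f):
    """Exact expected value of f(X_1, ..., X_N) over all equally likely outcomes."""
    return sum(Fraction(f(xs)) for xs in outcomes) / len(outcomes)

def m(xs):
    """The sample mean (X_1 + ... + X_N)/N."""
    return Fraction(sum(xs), N)

first_term = expectation(lambda xs: (xs[0] - m(xs)) ** 2)            # E(X_1 - m)^2
total = expectation(lambda xs: sum((x - m(xs)) ** 2 for x in xs))    # E(sum(X_j - m)^2)
sv = expectation(lambda xs: sum((x - m(xs)) ** 2 for x in xs) / (N - 1))

print(first_term, total, sv)   # 4/5 4 1
```

The three printed values are (N-1)/N, N-1 and 1, matching the three displayed results of the derivation.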
************************
David C. Ullrich 



Paul@Methuselah.fsnet.co. science forum beginner
Joined: 12 Jun 2006
Posts: 1

Posted: Mon Jun 12, 2006 11:11 pm Post subject:
Re: Variance



standelds wrote:
Quote:  A friend of mine asked me this the other day, and I was stuck not knowing
the answer. I was slightly embarrassed as I am in the senior year of a BS in
math.
In the formula for sample variance,
s^2 = sum(x - mean(x))^2/(n-1)
why do you subtract one from the sample size?
Thanks,
Dustin

If memory serves, that S^2 thing is termed "the sample unbiased
estimate of the population variance": if you use the bog-standard
sample variance, V, where V = {sum(x - mean(x))^2}/n, you find that the
expectation of V, E[V], comes out as E[V] = sigma^2 * (n - 1) / n,
where sigma^2 is the *actual* population variance. (The proof isn't
too bad.) Hence V is *not* an unbiased estimator for the true
population variance (remember that in order for a sample statistic, T,
to be an unbiased estimator for some population parameter, theta, its
expectation must be equal to theta, i.e. E[T] = theta).
But, by using S^2 = V * n / (n - 1) instead of plain old V, we get
E[S^2] = E[V] * n / (n - 1) = sigma^2, which is just what the doctor
ordered.
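
For anyone wanting to see V and S^2 side by side on actual numbers, Python's standard statistics module provides both: pvariance divides by n and variance divides by n - 1. The data below are made up purely for illustration.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up sample
n = len(data)

V = statistics.pvariance(data)    # sum of squared deviations divided by n
S2 = statistics.variance(data)    # sum of squared deviations divided by n - 1

# S^2 is just V rescaled by n / (n - 1).
print(V, S2, V * n / (n - 1))
print(math.isclose(S2, V * n / (n - 1)))
```

The rescaling only matters for small n. For large samples n / (n - 1) is close to 1 and the two numbers nearly coincide.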
And as for being embarrassed, don't worry: better to look a bit daft
now rather than at exam time. I remember my own chagrin at being the
only student in my quantum mechanics tutorial who didn't know what the
idea of valency was all about. My tutor, a kindly man, turned to me
with a beneficent look and began slowly, "well, you can think of it as
if atoms have little sticks poking out of them...". Oh dear.
Best wishes,
Paul. 



Rob Johnson science forum Guru
Joined: 26 May 2005
Posts: 318

Posted: Sat Jun 17, 2006 5:09 pm Post subject:
Re: Variance



In article <njiq82912brblgl6sgr80i5c120f6k4rit@4ax.com>,
David C. Ullrich <ullrich@math.okstate.edu> wrote:
Quote:  On Sun, 11 Jun 2006 06:46:05 GMT, "standelds"
<standelds@hawaii.rr.com> wrote:
A friend of mine asked me this the other day, and I was stuck not knowing
the answer. I was slightly embarrassed as I am in the senior year of a BS in
math.
In the formula for sample variance,
s^2 = sum(x - mean(x))^2/(n-1)
why do you subtract one from the sample size?
The reason there's a "correction" is that the variance
in the sample is going to be less than the real variance.
(Suppose you just take one sample. Then there's no
variation at all in your sampled data. Does that mean
you want to estimate the actual variance to be 0?
No, taking just one sample gives no information at all
about the real variance.)
To see that the correction should be exactly what
it is you do the math:
Let's suppose to simplify things that the real mean is 0
and the real variance in the underlying distribution is 1.
Say we take N independent samples X_1, ..., X_N. Now, we
don't know that the real mean is 0. Our best guess
for the real mean is
m = (X_1 + ... + X_N)/N.
So our estimate for the variance is going to have
_something_ to do with the expected value of
the sum of (X_j - m)^2. Let's see what that is.
First,
E(X_1 - m)^2 = E((N-1)X_1 - X_2 - X_3 ... - X_N)^2/N^2
= ((N-1)^2 + 1 + 1 + ... + 1)/N^2
= ((N-1)^2 + (N-1))/N^2
= N(N-1)/N^2 = (N-1)/N.
The other terms are all the same, so we get
E(sum(X_j-m)^2) = N-1.
So if we define the sample variance to be
sv = sum(X_j-m)^2/(N-1)
then we get
E[sv] = 1.
So. Defining the sample variance in that funny way
makes the expected value of the sample variance
exactly equal to the real variance.

This was the only post in this thread that really did the math and
did not wave hands making reference to degrees of freedom.
Another way to look at this is to consider the sample mean, m_s, vs
the distribution mean, m_d. If the sample size is n, then the sum
of the samples is n m_s and the expected value of the sum of the
samples is n m_d.
Suppose the variance of the distribution is v_d. Since n m_s is just
the sum of n independent variates, we know that the variance of n m_s
is n times the variance of one variate; that is, n v_d.
Since, as we mentioned previously, the expected value of n m_s is
n m_d, we can state this as E((n m_s - n m_d)^2) = n v_d. Using the
linearity of the expected value, we get E((m_s - m_d)^2) = (1/n) v_d.
Compute the expected sample variance, E(v_s), using the last equation
(as is done in <http://www.whim.org/nebula/math/varn1.html>), and we
get that E(v_s) = ((n-1)/n) v_d. Same result, slightly different
approach.
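
One way to carry out that last step (a sketch, and not necessarily what the linked page does) is to use the identity that the mean squared deviation about m_d equals v_s plus (m_s - m_d)^2, where v_s is taken with divisor n. The check below assumes a small skewed distribution, the values 0, 1 and 5 each equally likely, and a sample size of 3. Both choices are arbitrary and all function names are invented.

```python
from fractions import Fraction
from itertools import product

values = [0, 1, 5]   # assumed distribution: each value equally likely
n = 3                # assumed sample size

m_d = Fraction(sum(values), len(values))
v_d = sum((x - m_d) ** 2 for x in values) / len(values)

samples = list(product(values, repeat=n))

def expectation(f):
    """Exact expected value of f over all equally likely samples."""
    return sum(f(s) for s in samples) / len(samples)

def m_s(s):
    return Fraction(sum(s), n)

def v_s(s):
    """Sample variance with divisor n, taken about the sample mean."""
    return sum((x - m_s(s)) ** 2 for x in s) / n

def v_about_m_d(s):
    """Mean squared deviation about the distribution mean."""
    return sum((x - m_d) ** 2 for x in s) / n

# The decomposition holds for every single sample, not just on average.
identity_holds = all(
    v_about_m_d(s) == v_s(s) + (m_s(s) - m_d) ** 2 for s in samples)

mean_error = expectation(lambda s: (m_s(s) - m_d) ** 2)   # E((m_s - m_d)^2)
expected_v_s = expectation(v_s)                           # E(v_s)

print(identity_holds, mean_error == v_d / n, expected_v_s == v_d * (n - 1) / n)
```

Taking expectations of the identity gives v_d = E(v_s) + v_d/n, which rearranges to E(v_s) = ((n-1)/n) v_d.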
Rob Johnson <rob@trash.whim.org>
take out the trash before replying
to view any ASCII art, display article in a monospaced font 


