Variance
standelds
science forum addict


Joined: 09 Sep 2005
Posts: 55

Posted: Sun Jun 11, 2006 6:46 am    Post subject: Variance

A friend of mine asked me this the other day, and I was stuck not knowing
the answer. I was slightly embarrassed as I am in the senior year of a BS in
math.

In the formula for sample variance,

s^2 = sum(x - mean(x))^2/(n-1)

why do you subtract one from the sample size?

Thanks,
Dustin
matt271829-news@yahoo.co.
science forum Guru


Joined: 11 Sep 2005
Posts: 846

Posted: Sun Jun 11, 2006 10:53 am    Post subject: Re: Variance

standelds wrote:
Quote:
A friend of mine asked me this the other day, and I was stuck not knowing
the answer. I was slightly embarrassed as I am in the senior year of a BS in
math.

In the formula for sample variance,

s^2 = sum(x - mean(x))^2/(n-1)

why do you subtract one from the sample size?

Thanks,
Dustin

If you calculate the mean from the sample, and then use this sample
mean to calculate the sample variance, then the obvious calculation s^2
= sum(x - mean(x))^2/n will be biased, in the sense that if you took a
sample of size n very many times, calculated the variance in this way
for each sample, and then averaged the variances, the result would not
converge to the true variance of the population.

Intuitively, the reason is that the sample mean sits at the "centre" of
the sample values (it minimises the sum of squared deviations over the
sample), so the deviations from the sample mean will tend to be smaller
than the deviations from the true mean.

The "n - 1 correction" adjusts for this underestimation, and gives an
unbiased estimate of the true population variance.

However, if in the variance calculation you use the true mean (as, say,
known from theory) rather than the sample mean, then the "n - 1
correction" should *not* be used.
Stan Brown
science forum Guru Wannabe


Joined: 06 May 2005
Posts: 279

Posted: Sun Jun 11, 2006 2:14 pm    Post subject: Re: Variance

Sun, 11 Jun 2006 06:46:05 GMT from standelds
<standelds@hawaii.rr.com>:
Quote:
A friend of mine asked me this the other day, and I was stuck not knowing
the answer. I was slightly embarrassed as I am in the senior year of a BS in
math.

In the formula for sample variance,

s^2 = sum(x - mean(x))^2/(n-1)

why do you subtract one from the sample size?

Short answer: to adjust for the fact that this is a sample and not
the whole population.

Longer answer: http://www.childrens-mercy.org/stats/ask/df.asp

--
Stan Brown, Oak Road Systems, Tompkins County, New York, USA
http://OakRoadSystems.com/
Virgil
science forum Guru


Joined: 24 Mar 2005
Posts: 5536

Posted: Sun Jun 11, 2006 5:12 pm    Post subject: Re: Variance

In article <1150023213.497484.26170@y43g2000cwc.googlegroups.com>,
matt271829-news@yahoo.co.uk wrote:

Quote:
standelds wrote:
A friend of mine asked me this the other day, and I was stuck not knowing
the answer. I was slightly embarrassed as I am in the senior year of a BS in
math.

In the formula for sample variance,

s^2 = sum(x - mean(x))^2/(n-1)

why do you subtract one from the sample size?

Thanks,
Dustin

If you calculate the mean from the sample, and then use this sample
mean to calculate the sample variance, then the obvious calculation s^2
= sum(x - mean(x))^2/n will be biased, in the sense that if you took a
sample of size n very many times, calculated the variance in this way
for each sample, and then averaged the variances, the result would not
converge to the true variance of the population.

Intuitively, the reason is that the sample mean sits at the "centre" of
the sample values (it minimises the sum of squared deviations over the
sample), so the deviations from the sample mean will tend to be smaller
than the deviations from the true mean.

The "n - 1 correction" adjusts for this underestimation, and gives an
unbiased estimate of the true population variance.

However, if in the variance calculation you use the true mean (as, say,
known from theory) rather than the sample mean, then the "n - 1
correction" should *not* be used.

Note also that even with this correction, the sample standard deviation
(the square root of the corrected sample variance) is still a slightly
biased estimator of the population standard deviation; that bias is
quite small and is usually ignored.
Jasen Betts
science forum Guru Wannabe


Joined: 31 Jul 2005
Posts: 176

Posted: Sun Jun 11, 2006 8:46 pm    Post subject: Re: Variance

On 2006-06-11, standelds <standelds@hawaii.rr.com> wrote:
Quote:
A friend of mine asked me this the other day, and I was stuck not knowing
the answer. I was slightly embarrassed as I am in the senior year of a BS in
math.

In the formula for sample variance,

s^2 = sum(x - mean(x))^2/(n-1)

why do you subtract one from the sample size?


That's the formula for predicting the variance of a population given only a
sample of the population, and since every sample is somewhat biased the -1
compensates for that bias.

With a BS in math you may be able to prove that; I did only one year of
stats and recall that explanation from the first semester.

--

Bye.
Jasen
matt271829-news@yahoo.co.
science forum Guru


Joined: 11 Sep 2005
Posts: 846

Posted: Mon Jun 12, 2006 9:58 am    Post subject: Re: Variance

Jasen Betts wrote:
Quote:
On 2006-06-11, standelds <standelds@hawaii.rr.com> wrote:
A friend of mine asked me this the other day, and I was stuck not knowing
the answer. I was slightly embarrassed as I am in the senior year of a BS in
math.

In the formula for sample variance,

s^2 = sum(x - mean(x))^2/(n-1)

why do you subtract one from the sample size?


That's the formula for predicting the variance of a population given only a
sample of the population, and since every sample is somewhat biased the -1
compensates for that bias.

It's not that the *sample* is (systematically) biased - we assume here
that it isn't. The point is that, even with an unbiased sampling
technique, the sample *variance* is systematically biased if you use
the sample mean to calculate it.

Quote:

With a BS in math you may be able to prove that; I did only one year of
stats and recall that explanation from the first semester.

--

Bye.
Jasen
David C. Ullrich
science forum Guru


Joined: 28 Apr 2005
Posts: 2250

Posted: Mon Jun 12, 2006 11:20 am    Post subject: Re: Variance

On Sun, 11 Jun 2006 06:46:05 GMT, "standelds"
<standelds@hawaii.rr.com> wrote:

Quote:
A friend of mine asked me this the other day, and I was stuck not knowing
the answer. I was slightly embarrassed as I am in the senior year of a BS in
math.

In the formula for sample variance,

s^2 = sum(x - mean(x))^2/(n-1)

why do you subtract one from the sample size?

The reason there's a "correction" is that the variance
in the sample is going to be less than the real variance.
(Suppose you just take one sample. Then there's no
variation at all in your sampled data. Does that mean
you want to estimate the actual variance to be 0?
No, taking just one sample gives no information at all
about the real variance.)

To see that the correction should be exactly what
it is you do the math:

Let's suppose to simplify things that the real mean is 0
and the real variance in the underlying distribution is 1.

Say we take N independent samples X_1, .. X_N. Now, we
don't know that the real mean is 0. Our best guess
for the real mean is

m = (X_1 + ... + X_N)/N.

So our estimate for the variance is going to have
_something_ to do with the expected value of
the sum of (X_j - m)^2. Let's see what that is.
First,

E(X_1 - m)^2 = E((N-1)X_1 - X_2 - X_3 ... - X_N)^2/N^2
= ((N-1)^2 + 1 + 1 + ... + 1)/N^2
= ((N-1)^2 + (N-1))/N^2
= N(N-1)/N^2 = (N-1)/N.

The other terms are all the same, so we get

E(sum(X_j-m)^2) = N-1.

So if we define the sample variance to be

sv = sum(X_j-m)^2/(N-1)

then we get

E[sv] = 1.

So. Defining the sample variance in that funny way
makes the expected value of the sample variance
exactly equal to the real variance.
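
The expected value derived above also checks out numerically. Here is a
minimal Monte Carlo sketch (assuming Python with NumPy), using the same
setup of true mean 0 and true variance 1:

    import numpy as np

    rng = np.random.default_rng(1)
    N, trials = 6, 200000

    # Many samples of size N from a distribution with mean 0 and variance 1.
    X = rng.normal(0.0, 1.0, size=(trials, N))
    m = X.mean(axis=1, keepdims=True)        # sample mean m of each sample
    ss = ((X - m) ** 2).sum(axis=1)          # sum(X_j - m)^2 for each sample

    print(ss.mean())               # about N - 1 = 5, as computed above
    print((ss / (N - 1)).mean())   # about 1, the true variance
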




************************

David C. Ullrich
Paul@Methuselah.fsnet.co.
science forum beginner


Joined: 12 Jun 2006
Posts: 1

Posted: Mon Jun 12, 2006 11:11 pm    Post subject: Re: Variance

standelds wrote:

Quote:
A friend of mine asked me this the other day, and I was stuck not knowing
the answer. I was slightly embarrassed as I am in the senior year of a BS in
math.

In the formula for sample variance,

s^2 = sum(x - mean(x))^2/(n-1)

why do you subtract one from the sample size?

Thanks,
Dustin

If memory serves, that S^2 thing is termed "the sample unbiased
estimate of the population variance": if you use the bog-standard
sample variance, V, where V = {sum(x - mean(x))^2}/n, you find that the
expectation of V, E[V], comes out as E[V] = sigma^2 * (n - 1) / n,
where sigma^2 is the *actual* population variance. (The proof isn't
too bad.) Hence V is *not* an unbiased estimator for the true
population variance (remember that in order for a sample statistic, T,
to be an unbiased estimator for some population parameter, theta, its
expectation must be equal to theta, i.e. E[T] = theta).

But, by using S^2 = V * n / (n - 1) instead of plain old V, we get
E[S^2] = E[V] * n / (n - 1) = sigma^2, which is just what the doctor
ordered.
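
For reference, the computation behind E[V] = sigma^2 * (n - 1)/n is short.
Assuming the X_i are independent with mean mu and variance sigma^2 (so that
E[X_i^2] = mu^2 + sigma^2 and Var(mean(X)) = sigma^2/n), and writing it in
the same ASCII style as the rest of the thread:

    sum(X_i - mean(X))^2 = sum(X_i^2) - n*mean(X)^2

so, taking expectations,

    E[sum(X_i^2)] = n*(mu^2 + sigma^2)
    E[mean(X)^2]  = Var(mean(X)) + mu^2 = sigma^2/n + mu^2

and therefore

    E[V] = (1/n)*(n*(mu^2 + sigma^2) - n*(sigma^2/n + mu^2))
         = sigma^2 - sigma^2/n
         = sigma^2 * (n - 1)/n.
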

And as for being embarrassed, don't worry - better to look a bit daft
now rather than at exam time. I remember my own chagrin at being the
only student in my quantum mechanics tutorial who didn't know what the
idea of valency was all about. My tutor, a kindly man, turned to me
with a beneficent look and began slowly, "well, you can think of it as
if atoms have little sticks poking out of them...". Oh dear.

Best wishes,


Paul.
Rob Johnson
science forum Guru


Joined: 26 May 2005
Posts: 318

Posted: Sat Jun 17, 2006 5:09 pm    Post subject: Re: Variance

In article <njiq82912brblgl6sgr80i5c120f6k4rit@4ax.com>,
David C. Ullrich <ullrich@math.okstate.edu> wrote:
Quote:
On Sun, 11 Jun 2006 06:46:05 GMT, "standelds"
<standelds@hawaii.rr.com> wrote:

A friend of mine asked me this the other day, and I was stuck not knowing
the answer. I was slightly embarrassed as I am in the senior year of a BS in
math.

In the formula for sample variance,

s^2 = sum(x - mean(x))^2/(n-1)

why do you subtract one from the sample size?

The reason there's a "correction" is that the variance
in the sample is going to be less than the real variance.
(Suppose you just take one sample. Then there's no
variation at all in your sampled data. Does that mean
you want to estimate the actual variance to be 0?
No, taking just one sample gives no information at all
about the real variance.)

To see that the correction should be exactly what
it is you do the math:

Let's suppose to simplify things that the real mean is 0
and the real variance in the underlying distribution is 1.

Say we take N independent samples X_1, .. X_N. Now, we
don't know that the real mean is 0. Our best guess
for the real mean is

m = (X_1 + ... + X_N)/N.

So our estimate for the variance is going to have
_something_ to do with the expected value of
the sum of (X_j - m)^2. Let's see what that is.
First,

E(X_1 - m)^2 = E((N-1)X_1 - X_2 - X_3 ... - X_N)^2/N^2
= ((N-1)^2 + 1 + 1 + ... + 1)/N^2
= ((N-1)^2 + (N-1))/N^2
= N(N-1)/N^2 = (N-1)/N.

The other terms are all the same, so we get

E(sum(X_j-m)^2) = N-1.

So if we define the sample variance to be

sv = sum(X_j-m)^2/(N-1)

then we get

E[sv] = 1.

So. Defining the sample variance in that funny way
makes the expected value of the sample variance
exactly equal to the real variance.

This was the only post in this thread that really did the math and
did not wave hands making reference to degrees of freedom.

Another way to look at this is to consider the sample mean, m_s, vs
the distribution mean, m_d. If the sample size is n, then the sum
of the samples is n m_s and the expected value of the sum of the
samples is n m_d.

Suppose the variance of the distribution is v_d. Since n m_s is just
the sum of n independent variates, the variance of n m_s is n times
the variance of one variate; that is, n v_d.

Since, as we mentioned previously, the expected value of n m_s is
n m_d, we can state this as E((n m_s - n m_d)^2) = n v_d. Pulling the
factor n^2 out of the expectation, we get E((m_s - m_d)^2) = v_d/n.

Compute the expected sample variance, E(v_s), using the last equation
(as is done in <http://www.whim.org/nebula/math/varn-1.html>), and we
get that E(v_s) = ((n-1)/n) v_d. Same result, slightly different
approach.
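
The key step, E((m_s - m_d)^2) = v_d/n, is also easy to confirm
numerically. A minimal sketch (assuming Python with NumPy; v_d = 9 and
m_d = 2 are arbitrary illustrative values):

    import numpy as np

    rng = np.random.default_rng(2)
    n, trials = 10, 200000
    m_d, v_d = 2.0, 9.0        # distribution mean and variance

    samples = rng.normal(m_d, np.sqrt(v_d), size=(trials, n))
    m_s = samples.mean(axis=1)                # sample mean of each sample

    print(np.mean((m_s - m_d) ** 2))          # about v_d / n = 0.9
    print(np.mean(samples.var(axis=1)))       # E(v_s): about (n-1)/n * v_d = 8.1
                                              # (ndarray.var divides by n by default)
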

Rob Johnson <rob@trash.whim.org>
take out the trash before replying
to view any ASCII art, display article in a monospaced font