standelds

Joined: 09 Sep 2005
Posts: 55

Posted: Sun Jun 11, 2006 6:46 am    Post subject: Variance

A friend of mine asked me this the other day, and I was stuck not knowing
the answer. I was slightly embarrassed, as I am in my senior year of a BS
in math.

In the formula for sample variance,

s^2 = sum(x - mean(x))^2/(n-1)

why do you subtract one from the sample size?

Thanks,
Dustin
matt271829-news@yahoo.co.
science forum Guru

Joined: 11 Sep 2005
Posts: 846

Posted: Sun Jun 11, 2006 10:53 am    Post subject: Re: Variance

standelds wrote:
 Quote: [...] In the formula for sample variance, s^2 = sum(x - mean(x))^2/(n-1), why do you subtract one from the sample size?

If you calculate the mean from the sample, and then use this sample
mean to calculate the sample variance, then the obvious calculation s^2
= sum(x - mean(x))^2/n will be biased, in the sense that if you took a
sample of size n very many times, calculated the variance in this way
for each sample, and then averaged the variances, the result would not
converge to the true variance of the population.

Intuitively, the reason is that the sample mean will tend to be biased
towards the "centre" of the sample values, so the deviations from the
sample mean will tend to be lower than from the true mean.

The "n - 1 correction" adjusts for this underestimation, and gives an
unbiased estimate of the true population variance.

However, if in the variance calculation you use the true mean (as, say,
known from theory) rather than the sample mean, then the "n - 1
correction" should *not* be used.
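The bias described above is easy to see numerically. A minimal Monte Carlo sketch, where the normal distribution, the sample size n = 5, and the trial count are arbitrary illustrative choices (not from the thread): dividing by n is biased low when the sample mean is used, but unbiased when the true mean is used.

```python
# Monte Carlo sketch of the bias described above. The normal
# distribution, n = 5, and the trial count are arbitrary choices.
import random

random.seed(0)
n = 5                 # small sample size makes the bias obvious
trials = 100_000
true_mean = 0.0       # known population mean
true_var = 1.0        # known population variance

sum_naive = 0.0       # accumulates sum(x - sample_mean)^2 / n
sum_truemean = 0.0    # accumulates sum(x - true_mean)^2 / n
for _ in range(trials):
    xs = [random.gauss(true_mean, 1.0) for _ in range(n)]
    m = sum(xs) / n
    sum_naive += sum((x - m) ** 2 for x in xs) / n
    sum_truemean += sum((x - true_mean) ** 2 for x in xs) / n

print(sum_naive / trials)     # ~ (n-1)/n * true_var = 0.8 (biased low)
print(sum_truemean / trials)  # ~ true_var = 1.0 (no correction needed)
```

The second average illustrates the last paragraph: with the true mean, dividing by n is already unbiased and the n - 1 correction would overcorrect.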
Stan Brown
science forum Guru Wannabe

Joined: 06 May 2005
Posts: 279

Posted: Sun Jun 11, 2006 2:14 pm    Post subject: Re: Variance

Sun, 11 Jun 2006 06:46:05 GMT from standelds
<standelds@hawaii.rr.com>:
 Quote: [...] In the formula for sample variance, s^2 = sum(x - mean(x))^2/(n-1), why do you subtract one from the sample size?

Short answer: to adjust for the fact that this is a sample and not
the whole population.

--
Stan Brown, Oak Road Systems, Tompkins County, New York, USA
Virgil
science forum Guru

Joined: 24 Mar 2005
Posts: 5536

Posted: Sun Jun 11, 2006 5:12 pm    Post subject: Re: Variance

matt271829-news@yahoo.co.uk wrote:

 Quote: [...] The "n - 1 correction" adjusts for this underestimation, and gives an unbiased estimate of the true population variance. [...]

Note also, that when using this correction for the standard deviation
instead of the variance, it introduces a slight bias of its own, which
is quite small and is usually ignored.
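That slight residual bias can also be seen numerically. A small sketch (normal data, n = 5, and the trial count are arbitrary choices of mine): the square root of the unbiased variance estimate systematically underestimates the true standard deviation.

```python
# Sketch of the slight bias noted above: even with the n - 1
# correction, sqrt(s^2) underestimates the true standard deviation.
# Normal(0, 1) data and n = 5 are arbitrary illustrative choices.
import math
import random

random.seed(1)
n = 5
trials = 100_000

sum_s = 0.0
for _ in range(trials):
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / (n - 1)  # unbiased for the variance
    sum_s += math.sqrt(s2)                        # but its square root is not

print(sum_s / trials)  # ~ 0.94 for n = 5 with normal data, not 1.0
```

The gap shrinks as n grows, which is why it is usually ignored in practice.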
Jasen Betts
science forum Guru Wannabe

Joined: 31 Jul 2005
Posts: 176

Posted: Sun Jun 11, 2006 8:46 pm    Post subject: Re: Variance

On 2006-06-11, standelds <standelds@hawaii.rr.com> wrote:
 Quote: [...] In the formula for sample variance, s^2 = sum(x - mean(x))^2/(n-1), why do you subtract one from the sample size?

That's the formula for predicting the variance of a population given only a
sample of the population, and since every sample is somewhat biased the -1
compensates for that bias.

With a BS in math you may be able to prove that; I did only one year of stat
and recall that explanation from the first semester.

--

Bye.
Jasen
matt271829-news@yahoo.co.
science forum Guru

Joined: 11 Sep 2005
Posts: 846

Posted: Mon Jun 12, 2006 9:58 am    Post subject: Re: Variance

Jasen Betts wrote:
 Quote: [...] That's the formula for predicting the variance of a population given only a sample of the population, and since every sample is somewhat biased the -1 compensates for that bias.

It's not that the *sample* is (systematically) biased - we assume here
that it isn't. The point is that, even with an unbiased sampling
technique, the sample *variance* is systematically biased if you use
the sample mean to calculate it.

David C. Ullrich
science forum Guru

Joined: 28 Apr 2005
Posts: 2250

Posted: Mon Jun 12, 2006 11:20 am    Post subject: Re: Variance

On Sun, 11 Jun 2006 06:46:05 GMT, "standelds"
<standelds@hawaii.rr.com> wrote:

 Quote: [...] In the formula for sample variance, s^2 = sum(x - mean(x))^2/(n-1), why do you subtract one from the sample size?

The reason there's a "correction" is that the variance
in the sample is going to be less than the real variance.
(Suppose you just take one sample. Then there's no
variation at all in your sampled data. Does that mean
you want to estimate the actual variance to be 0?
No, taking just one sample gives no information at all
about the real variance.)
To see that the correction should be exactly what
it is you do the math:

Let's suppose to simplify things that the real mean is 0
and the real variance in the underlying distribution is 1.

Say we take N independent samples X_1, ..., X_N. Now, we
don't know that the real mean is 0. Our best guess
for the real mean is

m = (X_1 + ... + X_N)/N.

So our estimate for the variance is going to have
_something_ to do with the expected value of
the sum of (X_j - m)^2. Let's see what that is.
First,

E(X_1 - m)^2 = E((N-1)X_1 - X_2 - X_3 - ... - X_N)^2/N^2
= ((N-1)^2 + 1 + 1 + ... + 1)/N^2
= ((N-1)^2 + (N-1))/N^2
= N(N-1)/N^2 = (N-1)/N.

The other terms are all the same, so we get

E(sum(X_j-m)^2) = N-1.

So if we define the sample variance to be

sv = sum(X_j-m)^2/(N-1)

then we get

E[sv] = 1.

So. Defining the sample variance in that funny way
makes the expected value of the sample variance
exactly equal to the real variance.
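The derivation above can be checked numerically. A quick sketch (N = 8 and the trial count are arbitrary choices of mine): for mean-0, variance-1 data, the average of sum((X_j - m)^2) over many samples comes out near N - 1 rather than N.

```python
# Numeric check of the derivation above: for mean-0, variance-1 data,
# E(sum((X_j - m)^2)) comes out to N - 1 rather than N. N = 8 and the
# trial count are arbitrary choices.
import random

random.seed(2)
N = 8
trials = 100_000

total = 0.0
for _ in range(trials):
    xs = [random.gauss(0.0, 1.0) for _ in range(N)]
    m = sum(xs) / N
    total += sum((x - m) ** 2 for x in xs)

print(total / trials)  # ~ N - 1 = 7
```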


************************

David C. Ullrich
Paul@Methuselah.fsnet.co.
science forum beginner

Joined: 12 Jun 2006
Posts: 1

Posted: Mon Jun 12, 2006 11:11 pm    Post subject: Re: Variance

standelds wrote:

 Quote: [...] In the formula for sample variance, s^2 = sum(x - mean(x))^2/(n-1), why do you subtract one from the sample size?

If memory serves, that S^2 thing is termed "the sample unbiased
estimate of the population variance": if you use the bog-standard
sample variance, V, where V = {sum(x - mean(x))^2}/n, you find that the
expectation of V, E[V], comes out as E[V] = sigma^2 * (n - 1) / n,
where sigma^2 is the *actual* population variance. (The proof isn't
too bad.) Hence V is *not* an unbiased estimator for the true
population variance (remember that in order for a sample statistic, T,
to be an unbiased estimator for some population parameter, theta, its
expectation must be equal to theta, i.e. E[T] = theta).

But, by using S^2 = V * n / (n - 1) instead of plain old V, we get
E[S^2] = E[V] * n / (n - 1) = sigma^2, which is just what the doctor
ordered.
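Both expectations above can be spot-checked with a simulation. In this sketch, sigma^2 = 4 and n = 4 are arbitrary choices of mine, so E[V] should come out near sigma^2 * (n - 1)/n = 3 and E[S^2] near sigma^2 = 4.

```python
# Rough check of the two expectations above: E[V] = sigma^2 * (n-1)/n
# and E[S^2] = sigma^2, with arbitrary sigma^2 = 4 and n = 4.
import random

random.seed(3)
n = 4
sigma = 2.0          # population variance sigma^2 = 4
trials = 100_000

sum_V = 0.0
sum_S2 = 0.0
for _ in range(trials):
    xs = [random.gauss(0.0, sigma) for _ in range(n)]
    m = sum(xs) / n
    ss = sum((x - m) ** 2 for x in xs)
    sum_V += ss / n         # plain sample variance V
    sum_S2 += ss / (n - 1)  # S^2 = V * n / (n - 1)

print(sum_V / trials)   # ~ sigma^2 * (n-1)/n = 3.0
print(sum_S2 / trials)  # ~ sigma^2 = 4.0
```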

And as for being embarrassed, don't worry - better to look a bit daft
now rather than at exam time. I remember my own chagrin at being the
only student in my quantum mechanics tutorial who didn't know what the
idea of valency was all about. My tutor, a kindly man, turned to me
with a beneficent look and began slowly, "well, you can think of it as
if atoms have little sticks poking out of them...". Oh dear.

Best wishes,

Paul.
Rob Johnson
science forum Guru

Joined: 26 May 2005
Posts: 318

Posted: Sat Jun 17, 2006 5:09 pm    Post subject: Re: Variance

In article <njiq82912brblgl6sgr80i5c120f6k4rit@4ax.com>,
David C. Ullrich <ullrich@math.okstate.edu> wrote:
 Quote: [...] So. Defining the sample variance in that funny way makes the expected value of the sample variance exactly equal to the real variance.

This was the only post in this thread that really did the math and
did not wave hands making reference to degrees of freedom.

Another way to look at this is to consider the sample mean, m_s, vs
the distribution mean, m_d. If the sample size is n, then the sum
of the samples is n m_s and the expected value of the sum of the
samples is n m_d.

Suppose the variance of the distribution is v_d. Since n m_s is just
the sum of n variates, we know that the variance of n m_s is n times
the variance of one variate; that is, n v_d.

Since, as we mentioned previously, the expected value of n m_s is
n m_d, we can state this as E((n m_s - n m_d)^2) = n v_d. Pulling the
factor n^2 out of the expectation, we get E((m_s - m_d)^2) = v_d/n.

Compute the expected sample variance, E(v_s), using the last equation
(as is done in <http://www.whim.org/nebula/math/varn-1.html>), and we
get that E(v_s) = (n-1)/n v_d. Same result, slightly different
approach.
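The key intermediate fact, E((m_s - m_d)^2) = v_d/n, is easy to verify by simulation. A quick sketch using a uniform(0, 1) distribution as an arbitrary example of mine, where m_d = 1/2 and v_d = 1/12:

```python
# Quick check of E((m_s - m_d)^2) = v_d / n, using a uniform(0, 1)
# distribution as an arbitrary example: m_d = 1/2 and v_d = 1/12.
import random

random.seed(4)
n = 10
trials = 100_000
m_d = 0.5
v_d = 1.0 / 12.0

total = 0.0
for _ in range(trials):
    m_s = sum(random.random() for _ in range(n)) / n
    total += (m_s - m_d) ** 2

print(total / trials)  # ~ v_d / n = 1/120, about 0.00833
```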

Rob Johnson <rob@trash.whim.org>
take out the trash before replying
to view any ASCII art, display article in a monospaced font
