Statistics and series


Here is a nice application of series which appeared in my Calculus II course.

In class we defined a probability density function (pdf) {p(x)} to be a function {p:\mathbb R\rightarrow\mathbb R} such that

  • {p(x) \geq 0} for all {x}, and
  • {\int_{-\infty}^\infty p(x)dx =1}.

We then discussed how to describe a pdf using two important quantities:

  • the mean {\mu}, which is a measure of the center and is given by

    \displaystyle  \mu = \int_{-\infty}^\infty x\, p(x)dx,

    and

  • the standard deviation {\sigma}, which is a measure of how far from {\mu} typical measurements tend to be (or how “spread out” the pdf is) and is given by

    \displaystyle  \sigma^2 = \int_{-\infty}^\infty( x-\mu)^2\, p(x)dx.

In all of this we are assuming that any real number is a possible value for {x} (though we can restrict to an interval by defining {p(x)=0} outside it).
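Before moving on, it can be reassuring to check these integrals numerically. Here is a short Python sketch (the uniform density on {[0,1]} and all function names are my choice, not from the course) that approximates the total probability, the mean, and the standard deviation with a midpoint Riemann sum:

```python
import math

def p(x):
    # Example pdf (my choice): uniform density on [0, 1], zero elsewhere.
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def integrate(f, a, b, n=100_000):
    # Midpoint Riemann sum approximation of the integral of f over [a, b].
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

total = integrate(p, 0, 1)                             # should be close to 1
mu = integrate(lambda x: x * p(x), 0, 1)               # exact value: 1/2
var = integrate(lambda x: (x - mu) ** 2 * p(x), 0, 1)  # exact value: 1/12
sigma = math.sqrt(var)
print(total, mu, sigma)
```

For the uniform density the exact answers are {\mu = 1/2} and {\sigma^2 = 1/12}, and the approximations agree to many decimal places.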

Discrete distributions

Suppose that we are measuring some variable, but the only possible values for the variable are integers. We want to repeat what we did earlier, defining the mean and standard deviation of a probability distribution.

Let’s call the possible measurement values {k} and let {p_k} be the probability of measuring a value of {k}. We require two properties of {p_k}:

  • We must have {p_k\geq 0} for each {k} and
  • the total probability must add up to one:

    \displaystyle \sum_k p_k = 1.

As before we define the mean {\mu} to be the sum of each possible value, multiplied by the probability of that value:

\displaystyle \mu = \sum_k k\, p_k

and the standard deviation {\sigma} as the “distance from being all average”:

\displaystyle \sigma^2 = \sum_k (k-\mu)^2p_k.

These formulas are particularly interesting when {p_k\neq 0} for infinitely many integers {k}, since we then need to deal with convergence issues.
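For a distribution with only finitely many possible values there are no convergence issues, and the sums can be evaluated directly. Here is a small Python sketch (the fair-die example is my choice, not from the text):

```python
from fractions import Fraction

# Example (my choice): a fair six-sided die, p_k = 1/6 for k = 1, ..., 6.
p = {k: Fraction(1, 6) for k in range(1, 7)}

assert sum(p.values()) == 1                           # total probability is 1

mu = sum(k * pk for k, pk in p.items())               # mean: 7/2
var = sum((k - mu) ** 2 * pk for k, pk in p.items())  # sigma^2: 35/12
print(mu, var)
```

Using exact fractions rather than floats keeps both answers exact: {\mu = 7/2} and {\sigma^2 = 35/12}.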

Example

Suppose that we have {k=1,2,3,\dots} and that the probability of measuring each value is given by

\displaystyle p_1=\frac{1}{2},\quad p_2 = \frac{1}{4}, \quad p_3 = \frac{1}{8}, \quad \text{etc.}

We know from geometric series that the total probability is {1} because {p_k = \frac{1}{2^k}} and thus

\displaystyle  \sum_{k=1}^\infty p_k = \sum_{k=1}^\infty \frac{1}{2^k} = 1.

We can therefore try to compute the mean for this probability distribution, which would be given by

\displaystyle  \mu = \sum_{k=1}^\infty k \,p_k = \sum_{k=1}^\infty \frac{k}{2^k}.

It is easy to see (do it!) that this series converges, but what does it converge to?

In fact, it is possible to compute the mean exactly. To do this, we first write

\displaystyle \mu = \frac12 \sum_{k=1}^\infty k \left(\frac{1}{2}\right)^{k-1}.

We then define the function {f(x)} by

\displaystyle f(x) = \sum_{k=1}^\infty k x^k

so that {\mu = f(\frac{1}{2})}.

One can check that {f(x)} converges absolutely for {|x|<1}. Furthermore,

\displaystyle  f(x) = x\sum_{k=1}^\infty k x^{k-1} = x\frac{d}{dx} \sum_{k=0}^\infty x^k = x \frac{d}{dx}\left[ \frac{1}{1-x}\right] = \frac{x}{(1-x)^2}.

Therefore {\mu = f(\frac{1}{2}) = 2}.
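This value is easy to corroborate numerically: the partial sums of {\sum_{k=1}^\infty k/2^k} approach {2} quite rapidly. A quick Python check (the function name is mine):

```python
def mean_partial_sum(n):
    # Partial sum of the series for the mean: sum_{k=1}^{n} k / 2^k.
    return sum(k / 2 ** k for k in range(1, n + 1))

for n in (5, 10, 20, 50):
    print(n, mean_partial_sum(n))  # values approach mu = 2
```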

We now consider the standard deviation, given by

\displaystyle  \sigma^2 = \sum_{k=1}^\infty \frac{(k-2)^2 }{2^k} = \sum_{k=1}^\infty (k^2 - 4k + 4) \left( \frac{1}{2}\right)^k.

We can split this into three sums, the second and third of which can be computed using what we know already:

\displaystyle  \sum_{k=1}^\infty (-4k)\left( \frac{1}{2}\right)^k = -4\sum_{k=1}^\infty k\left( \frac{1}{2}\right)^k = -4\mu = -8

and

\displaystyle  \sum_{k=1}^\infty (4) \left( \frac{1}{2}\right)^k = 4\sum_{k=1}^\infty \left( \frac{1}{2}\right)^k = 4.

(In fact, there is a general phenomenon going on here; see below.)

For the first sum in the expression for {\sigma^2}, we repeat the trick from before: Let

\displaystyle  g(x) = \sum_{k=1}^\infty k^2 x^k = x\sum_{k=1}^\infty k^2 x^{k-1} = x\frac{d}{dx} \sum_{k=1}^\infty k x^k = x f^\prime(x).

Differentiating the formula for {f(x)} above gives {f^\prime(x) = \frac{1+x}{(1-x)^3}}, so we conclude that

\displaystyle  g(x) = x\frac{1+x}{(1-x)^3}

and hence

\displaystyle  \sigma^2 = g(\tfrac{1}{2}) - 8 + 4 = 6 - 8 + 4 = 2.
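As with the mean, partial sums corroborate both numbers: {g(\tfrac12)} approaches {6} and the series for {\sigma^2} approaches {2}. A quick Python check (the variable names are mine):

```python
# Partial sums of g(1/2) = sum k^2 / 2^k and of sigma^2 = sum (k-2)^2 / 2^k.
g_half = sum(k ** 2 / 2 ** k for k in range(1, 60))
var = sum((k - 2) ** 2 / 2 ** k for k in range(1, 60))
print(g_half, var)  # close to 6 and 2
```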

Exercises

  1. Suppose {p(x)} is the pdf given for {x\geq 0} by

    \displaystyle  p(x) = e^{-x}.

    Find {\mu} and {\sigma}.

  2. Let {p_k} be any discrete probability distribution with mean {\mu}. Use the facts that {\sum_k p_k =1} and {\sum_k k\, p_k = \mu} to deduce that

    \displaystyle \sigma^2 = \sum_{k} k^2 p_k - \mu^2.

    What is the integral version of this identity?

  3. Suppose that {p_k} is given, for {k=1,2,3,\dots}, by

    \displaystyle p_k = \frac{2}{3^k}.

    Find the mean {\mu} and the standard deviation {\sigma} using the methods of the example.

  4. Let {p_k} be given by

    \displaystyle  p_k = \frac{2^k}{e^2\, k!}

    for {k=0,1,2,3,\dots}. Explain why {p_k} is a valid probability distribution, then find the mean {\mu} and standard deviation {\sigma}.
