The roots of *stereology*, which may be defined as the statistical inference of geometric parameters from sampled information, are found in the fields of statistics and geometrical probability. Whenever stereology is used to estimate a geometric parameter, it is important to have some idea of how *accurate* and *precise* the estimate is. *Bias* is a measure of the accuracy of an estimator. As the number of trials tends to infinity, the mean of the estimates produced by an unbiased estimator with finite variance converges on the correct (true) result.

The archery targets shown in Figure 4.1 illustrate the concepts of bias and precision. The objective is to hit the centre of the target. The shots fired at the uppermost targets are unbiased. As the number of shots fired increases, their mean position tends towards the centre of the target. The shots fired at the left-hand targets are precise. Alternatively put, the variance of the set of numbers representing the distances between each shot and their mean position is small.

**Figure 4.1:** Illustration of bias and precision. The shots fired at the uppermost targets are unbiased, while those fired at the left-hand targets are precise.

Stereological estimators described in this chapter are unbiased. The estimation of precision is a more complex matter and is the subject of Section 4.10.

Stereology works by interrogating objects under investigation with geometric test probes such as points, straight lines and curved lines. There are two approaches to stereology - *design*-based and *model*-based (Cruz-Orive, 1997; Karlsson and Cruz-Orive, 1992, section 2.2). The design-based approach requires that test probes hit an object (or finite, non-random population of objects) with a well-defined mechanism of randomness. The model-based approach is applicable when the target object (or structure) can be regarded as a realisation of a homogeneous random set of particles. For an object to be homogeneous, the mean contents of the object in some window must not depend on the position of the window in space. In this case, test probes may be arbitrarily located, as the randomness required to estimate geometric parameters is “contained” within the target object.

As an example of these concepts, consider the problem of determining the area, \(A\), of the unit circle, \(C\). The area is, of course, \(\pi\). One method for tackling the problem is shown in Figure 4.2(a). \(C\) is broken into strips of width \(\delta x\) and height \(L = 2 \sqrt{1 - x^2} \). The area of the circle is determined by the equation $$ \begin{align} A = 2 \int_{-1}^{1} \sqrt{1 - x^2} \, \mathrm{d}x = \pi. \tag{4.1} \end{align} $$

Suppose a grid of points is overlain on the circle, as shown in Figure 4.2(b). The total number of points falling within the circle multiplied by the area per point is an unbiased estimate of the area of \(A\). In effect, the systematic sampling of the circle by a grid of points is a numerical integration of (4.1). The design-based stereological approach, which is employed throughout this thesis, stipulates the grid of points must hit the object with a well-defined mechanism of randomness. This is achieved by randomly positioning the grid of points with respect to the stationary circle.

Refining the grid by reducing the distance between points, or averaging repeated estimates, are two ways of increasing precision. Both methods involve counting more points.

**Figure 4.2:** (a) The area of the unit circle can be determined by solving the appropriate integral.

**Figure 4.2:** (b) The area of the unit circle can be estimated by point counting. The dotted square shows the area associated with each point.
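The point-counting estimator of Figure 4.2(b) can be sketched numerically. The following minimal Python sketch (the grid spacing of 0.1 and the 200 repetitions are arbitrary choices for the illustration) positions a square grid uniform randomly with respect to the stationary unit circle and averages repeated estimates, which should settle near \(\pi\).

```python
import math
import random

def point_count_area(spacing, rng):
    """One design-based estimate of the unit circle's area by point counting.

    A square grid with the given spacing is overlain with a uniform random
    offset; (points inside the circle) x (area per point) is an unbiased
    estimate of the area.
    """
    ox = rng.uniform(0.0, spacing)   # uniform random grid offset in [0, spacing)
    oy = rng.uniform(0.0, spacing)
    n = int(2.0 / spacing) + 2       # enough grid indices to cover [-1, 1]
    hits = 0
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            x, y = ox + i * spacing, oy + j * spacing
            if x * x + y * y <= 1.0:
                hits += 1
    return hits * spacing ** 2       # area associated with each point

rng = random.Random(1)
estimates = [point_count_area(0.1, rng) for _ in range(200)]
print(sum(estimates) / len(estimates))   # close to pi
```

Refining the grid spacing, or increasing the number of repetitions averaged, both reduce the scatter of the final figure.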

This chapter summarises some of the fundamental results regarding the stereology of single, bounded objects (Cruz-Orive, 1997). Much of the work is based on the book *Geometrical Probability* by Kendall and Moran (1963). Section 4.1 describes random variables; Section 4.2 then discusses the representation of orientations in the plane and in \(\mathbb{R}^3\). This groundwork is followed by the derivations of various stereological estimators.

Section 4.3 opens by considering the problems of area and volume estimation by point counting. An important result, named after the Italian mathematician Bonaventura Cavalieri, is arrived at. In Section 4.4, the classical two-dimensional (2D) problem of Buffon's needle (1777) is tackled. Section 4.5 applies the theory of Buffon's needle to the problem of estimating the unknown length of a bounded, rectifiable curve in the plane (Steinhaus, 1930). A curve is rectifiable if it has a well-defined length. Curves that are smooth (infinitely differentiable or \(C^{\infty}\)) are rectifiable. An example of a non-rectifiable curve is the coastline of Britain which exhibits some degree of self-similarity and is said to be fractal (Mandelbrot, 1967).

The extension of Buffon's needle to three dimensions (Section 4.6) motivates the problems of estimating the length of a bounded rectifiable curve in \(\mathbb{R}^3\) (Saltykov, 1946) (Section 4.7) and of estimating the area of a bounded piecewise smooth surface in three dimensional (3D) space (Saltykov, 1945) (Section 4.8). Section 4.9 describes the problem of estimating the length of a bounded rectifiable curve in \(\mathbb{R}^3\) from total vertical projections. Finally, the precision of various stereological estimators is investigated in Section 4.10.

## 4.1 Random variables

In this section probability theory and random variables are introduced. Key results that underpin further sections are given here. If more information is required, the book "*Probability and Random Processes*" by Grimmett & Stirzaker (1982) is a good place to start.

Suppose a needle is dropped onto a floor made up of planks of wood. The needle may or may not intersect one of the joints between the planks. A single throw of the needle is called an *experiment* or *trial*. There are two possible *outcomes*: either the needle intersects one or more joints, or it lands entirely between the joints. By repeating the experiment a large number of times, the probability \(\mathbf{P}\) of a particular *outcome* or *event* can be estimated. Let \(A\) be the event "needle intersects a joint". Let \(N\left(A\right)\) be the number of occurrences of \(A\) over \(n\) trials. As \(n \to \infty\), \(N\left(A\right) / n\) converges to \(\mathbf{P}\left(A\right)\), the probability that \(A\) occurs on any particular trial. On occasion, a probability can be assigned to an outcome without experiment. Section 4.4 describes such an approach, attributable to Buffon (1777), for the needle throwing experiment. The set of all possible outcomes of an experiment is called the *sample space* and is denoted by \(\Omega\). A *random variable* is a function \(X : \Omega \to \mathbb{R}\).
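The convergence of the relative frequency \(N\left(A\right)/n\) can be demonstrated by simulation. The sketch below assumes planks of width \(d = 1\) and a needle of length \(l = 1\) (arbitrary choices for the illustration); for \(l \leq d\) the limiting value is \(2l/\pi d\), the result derived in Section 4.4.

```python
import math
import random

# Monte Carlo sketch of the needle-throwing trial.  The event A is
# "needle intersects a joint"; N(A)/n should settle towards P(A).
rng = random.Random(2)
d, l = 1.0, 1.0                          # plank width and needle length (assumed)
n, hits = 200_000, 0
for _ in range(n):
    y = rng.uniform(0.0, d)              # centre's distance past the last joint
    phi = rng.uniform(0.0, math.pi)      # angle between needle and joints
    reach = (l / 2.0) * math.sin(phi)    # half-extent perpendicular to the joints
    if y < reach or d - y < reach:       # crosses the joint below or above
        hits += 1
print(hits / n)                          # settles near 2*l / (pi*d)
```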

Uppercase letters will be used to represent generic random variables, whilst lowercase letters will be used to represent possible numerical values of these variables. To describe the probability of possible values of \(X\), consider the following definition. The *distribution function* of a random variable \(X\) is the function \(F_X : \mathbb{R} \to \left[0, 1\right]\) given by \(F_X\left(x\right) = \mathbf{P}\left(X \leq x\right)\).

### 4.1.1 Discrete random variables

The random variable \(X\) is *discrete* if it takes values in some countable subset \(\left\{x_1, x_2, ...\right\}\), only, of \(\mathbb{R}\). The distribution function of such a random variable has jump discontinuities at the values \(x_1, x_2, ...\) and is constant in between. The function \(f_X : \mathbb{R} \to \left[0, 1\right]\) given by \(f_X\left(x\right) = \mathbf{P}\left(X = x\right)\) is called the *(probability) mass function* of \(X\). The *mean value*, or *expectation*, or *expected value* of \(X\) with mass function \(f_X\), is defined to be $$ \begin{align} \mathbf{E}\left(X\right) &= \sum\limits_{x} x f_X\left(x\right) \\ &= \sum\limits_{x} x \mathbf{P}\left(X = x\right). \tag{4.2} \end{align} $$ The expected value of \(X\) is often written as \(\mu\).

It is often of great interest to measure the extent to which a random variable \(X\) is dispersed. The *variance* of \(X\) or \(\mathrm{Var}\left(X\right)\) is defined as follows: $$ \mathrm{Var}\left(X\right) = \mathbf{E}\left(\left(X - \mathbf{E}\left(X\right)\right)^2\right). \tag{4.3} $$ The variance of \(X\) is often written as \(\sigma^2\), while its positive square root is called the *standard deviation*. Since \(X\) is discrete, (4.3) can be re-expressed accordingly: $$ \begin{align} \sigma^2 &= \mathrm{Var}\left(X\right) \\ &= \mathbf{E}\left(\left(X - \mu\right)^2\right) \\ &= \sum\limits_{x}\left(x - \mu\right)^2 f_X\left(x\right). \tag{4.4} \end{align} $$ In the special case where \(X\) takes \(n\) real values, each with mass \(1/n\), (4.4) reduces to the well known equation for the variance of a set of \(n\) numbers: $$ \sigma^2 = \frac{1}{n} \sum\limits_{x}\left(x - \mu\right)^2. \tag{4.5} $$

Events \(A\) and \(B\) are said to be *independent* if and only if the occurrence of \(A\) does not change the probability of \(B\) occurring. An equivalent statement is \(\mathbf{P}\left(A \cap B\right) = \mathbf{P}\left(A\right)\mathbf{P}\left(B\right)\). Similarly, the discrete random variables \(X\) and \(Y\) are called *independent* if the numerical value of \(X\) does not affect the distribution of \(Y\). In other words, the events \(\left\{X = x\right\}\) and \(\left\{Y = y\right\}\) are independent for all \(x\) and \(y\). The *joint distribution function* \(F_{X, Y} : \mathbb{R}^2 \to \left[0, 1\right]\) of \(X\) and \(Y\) is given by \(F_{X, Y}\left(x, y\right) = \mathbf{P}\left(X \leq x \text{ and } Y \leq y\right)\). Their *joint mass function* \(f_{X, Y} : \mathbb{R}^2 \to \left[0, 1\right]\) is given by \(f_{X, Y}\left(x, y\right) = \mathbf{P}\left(X = x \text{ and } Y = y\right)\). \(X\) and \(Y\) are independent if and only if \(f_{X, Y}\left(x, y\right) = f_X\left(x\right)f_Y\left(y\right)\) for all \(x, y \in \mathbb{R}\).

Consider an archer, shooting arrows at the target shown in Figure 4.3. Suppose the archer is a very poor shot and hits the target randomly - in other words, target regions of equal area will have the same probability of being hit. For simplicity, it is assumed the archer always hits the target. If the archer is allowed to fire two arrows, the sample space is $$ \Omega = \left\{ \begin{array}{ccccc} AA, & AB, & AC, & AD, & AE, \\ BA, & BB, & \text{...} & DD, & DE,\\ EA, & EB, & EC, & ED, & EE \end{array} \right\}. $$

**Figure 4.3:** An archery target. A hit in region A scores 4 points, B scores 3 points, C scores 2 points, D scores 1 point and E scores nothing.

Let the variable \(X\left(\omega\right)\) represent the score of a particular outcome. The scoring guidelines outlined in Figure 4.3 imply $$ \begin{array}{rcl} X\left(AA\right) & = & 8, \\ X\left(AB\right) = X\left(BA\right) & = & 7, \\ X\left(AC\right) = X\left(BB\right) = X\left(CA\right) & = & 6, \\ \text{...} & & \\ X\left(CE\right) = X\left(DD\right) = X\left(EC\right) & = & 2, \\ X\left(DE\right) = X\left(ED\right) & = & 1, \\ X\left(EE\right) & = & 0. \end{array} $$ Clearly \(X\) is a discrete random variable, mapping the sample space \(\Omega\) to scores (real numbers).

The probability that an arrow hits a target region is directly proportional to the area of the region. The regions A to E are annuli with inner and outer radii as shown in Figure 4.3. The probabilities of hitting A to E are 1/25, 3/25, 5/25, 7/25 and 9/25 respectively. The mass function of \(X\), \(f_X\left(x\right)\), is then $$ \begin{array}{rcl} f_X\left(0\right) & = & \mathbf{P}\left(X = 0\right) \\ & = & \mathbf{P}\left(\text{Hit E}\right)\mathbf{P}\left(\text{Hit E}\right) \\ & = & 81 / 625, \\ f_X\left(1\right) & = & \mathbf{P}\left(X = 1\right) \\ & = & 2 \cdot \mathbf{P}\left(\text{Hit D}\right)\mathbf{P}\left(\text{Hit E}\right) \\ & = & 126 / 625, \\ f_X\left(2\right) & = & \mathbf{P}\left(X = 2\right) \\ & = & 2 \cdot \mathbf{P}\left(\text{Hit C}\right)\mathbf{P}\left(\text{Hit E}\right) + \\ & & \mathbf{P}\left(\text{Hit D}\right)\mathbf{P}\left(\text{Hit D}\right) \\ & = & 139 / 625, \\ & & \text{...} \end{array} $$ From (4.2), the expected value of \(X\) is $$ \begin{array}{rcl} \mathbf{E}\left(X\right) & = & 0 \cdot 81 / 625 + \\ & & 1 \cdot 126 / 625 + \\ & & 2 \cdot 139 / 625 + \\ & & 3 \cdot 124 / 625 + \\ & & \text{...} \\ & = & 2.4. \end{array} $$ From (4.4), the variance of \(X\) is $$ \begin{array}{rcl} \mathrm{Var}\left(X\right) & = & 2.4^2 \cdot 81/625 + \\ & & 1.4^2 \cdot 126/625 + \\ & & 0.4^2 \cdot 139/625 + \\ & & 0.6^2 \cdot 124/625 + \\ & & \text{...} \\ & = & 2.72. \end{array} $$ The distribution function of \(X\), \(F_X\left(x\right)\), is then $$ \begin{array}{rcccl} F_X\left(0\right) & = & \mathbf{P}\left(X \leq 0\right) & = & f_X\left(0\right), \\ F_X\left(1\right) & = & \mathbf{P}\left(X \leq 1\right) & = & f_X\left(1\right) + f_X\left(0\right), \\ F_X\left(2\right) & = & \mathbf{P}\left(X \leq 2\right) & = & f_X\left(2\right) + f_X\left(1\right) + \\ & & & & f_X\left(0\right), \\ & & & & \text{...} \end{array} $$ The distribution function \(F_X\left(x\right)\) is shown in Figure 4.4.

**Figure 4.4:** The distribution function \(F_X\) of \(X\) for the archery target.
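The archery calculations above can be checked by exact enumeration of the 25 outcomes. A minimal sketch using Python's `fractions` module reproduces the mass function, expectation and variance of the two-arrow score:

```python
from fractions import Fraction
from itertools import product

# Region hit probabilities are proportional to the annulus areas, 1/25 ... 9/25.
p = {'A': Fraction(1, 25), 'B': Fraction(3, 25), 'C': Fraction(5, 25),
     'D': Fraction(7, 25), 'E': Fraction(9, 25)}
score = {'A': 4, 'B': 3, 'C': 2, 'D': 1, 'E': 0}

# Mass function of the total score X over the outcomes of Omega.
f = {}
for a, b in product(p, repeat=2):
    x = score[a] + score[b]
    f[x] = f.get(x, Fraction(0)) + p[a] * p[b]

mean = sum(x * fx for x, fx in f.items())             # E(X), from (4.2)
var = sum((x - mean) ** 2 * fx for x, fx in f.items())  # Var(X), from (4.4)
print(f[0], f[1], f[2])   # 81/625 126/625 139/625
print(mean, var)          # 12/5 68/25, i.e. 2.4 and 2.72
```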

### 4.1.2 Continuous random variables

The random variable \(X\) is *continuous* if its distribution function can be expressed as $$ \begin{align} F_X\left(x\right) &= \mathbf{P}\left(X \leq x\right) \\ &= \int_{-\infty}^{x} f_X\left(u\right) \mathrm{d}u, x \in \mathbb{R}, \tag{4.6} \end{align} $$ for some integrable function \(f_X : \mathbb{R} \to \left[0, \infty\right)\). In this case, \(f_X\) is called the *(probability) density function* of \(X\). The fundamental theorem of calculus and (4.6) imply $$ \begin{align} \mathbf{P}\left(a \leq X \leq b\right) &= F_X\left(b\right) - F_X\left(a\right) \\ &= \int_a^b f_X\left(x\right) \mathrm{d}x. \end{align} $$ \(f_X\left(x\right)\delta x\) can be thought of as the element of probability \(\mathbf{P}\left(x \leq X \leq x + \delta x\right)\) where $$ \begin{align} \mathbf{P}\left(x \leq X \leq x + \delta x\right) &= F_X\left(x + \delta x\right) - F_X\left(x\right) \\ &\approx f_X\left(x\right) \delta x. \tag{4.7} \end{align} $$ If \(B_1\) is a measurable subset of \(\mathbb{R}\) (such as a line segment or union of line segments) then $$ \mathbf{P}\left(X \in B_1\right) = \int_{B_1} f_X\left(x\right) \mathrm{d}x \tag{4.8} $$ where \(\mathbf{P}\left(X \in B_1\right)\) is the probability that the outcome of this random choice lies in \(B_1\). The expected value (or *expectation*) of \(X\) with density function \(f_X\) is $$ \mu = \mathbf{E}X = \int_{-\infty}^{\infty} x f_X\left(x\right) \mathrm{d}x \tag{4.9} $$ whenever this integral exists. The *variance* of \(X\) or \(\mathrm{Var}\left(X\right)\) is defined by the already familiar (4.3). Since \(X\) is continuous, (4.3) can be re-expressed accordingly: $$ \begin{align} \sigma^2 &= \mathrm{Var}\left(X\right) \\ &= \mathbf{E}\left(\left(X - \mu\right)^2\right) \\ &= \int_{-\infty}^{\infty}\left(x - \mu\right)^2 f_X\left(x\right) \mathrm{d}x. \tag{4.10} \end{align} $$

The *joint distribution function* of the continuous random variables \(X\) and \(Y\) is the function \(F_{X, Y} : \mathbb{R}^2 \to \left[0, 1\right]\) given by \(F_{X, Y}\left(x, y\right) = \mathbf{P}\left(X \leq x, Y \leq y\right)\). \(X\) and \(Y\) are *(jointly) continuous* with *joint (probability) density function* \(f_{X, Y} : \mathbb{R}^2 \to \left[0, \infty\right)\) if $$ F_{X, Y}\left(x, y\right) = \int_{-\infty}^y \int_{-\infty}^x f_{X, Y}\left(u, v\right) \mathrm{d}u \mathrm{d}v $$ for each \(x, y \in \mathbb{R}\). The fundamental theorem of calculus suggests the following result $$ \mathbf{P}\left(a \leq X \leq b, c \leq Y \leq d\right) $$ $$ = F_{X, Y}\left(b, d\right) - F_{X, Y}\left(a, d\right) $$ $$ - F_{X, Y}\left(b, c\right) + F_{X, Y}\left(a, c\right) $$ $$ = \int_{c}^{d} \int_{a}^{b} f_{X, Y}\left(x, y\right) \mathrm{d}x \mathrm{d}y. $$ \(f_{X, Y}\left(x, y\right) \delta x \delta y\) can be thought of as the element of probability \(\mathbf{P}\left(x \leq X \leq x + \delta x, y \leq Y \leq y + \delta y\right)\) where $$ \mathbf{P}\left(x \leq X \leq x + \delta x, y \leq Y \leq y + \delta y\right) $$ $$ = F_{X, Y}\left(x + \delta x, y + \delta y\right) - F_{X, Y}\left(x, y + \delta y\right) $$ $$ - F_{X, Y}\left(x + \delta x, y\right) + F_{X, Y}\left(x, y\right) $$ $$ \approx f_{X, Y}\left(x, y\right) \delta x \delta y. \tag{4.11} $$ If \(B_2\) is a measurable subset of \(\mathbb{R}^2\) (such as a rectangle or union of rectangles and so on) then $$ \mathbf{P}\left(\left(X, Y\right) \in B_2\right) = \int \int_{B_2} f_{X, Y}\left(x, y\right) \mathrm{d}x \mathrm{d}y \tag{4.12} $$ where \(\mathbf{P}\left(\left(X, Y\right) \in B_2\right)\) is the probability that the outcome of this random choice lies in \(B_2\). \(X\) and \(Y\) are independent if and only if \(\left\{X \leq x\right\}\) and \(\left\{Y \leq y\right\}\) are independent events for all \(x, y \in \mathbb{R}\). 
If \(X\) and \(Y\) are independent, \(F_{X, Y}\left(x, y\right) = F_X\left(x\right) F_Y\left(y\right)\) for all \(x, y \in \mathbb{R}\). An equivalent condition is \(f_{X, Y}\left(x, y\right) = f_X\left(x\right)f_Y\left(y\right)\) whenever \(F_{X, Y}\) is differentiable at \(\left(x, y\right)\).

An example of a continuous random variable can be found in the needle throwing described earlier. A needle is thrown onto the floor and lands with random angle \(\omega\) relative to some fixed axis. The sample space \(\Omega = \left[0, 2\pi\right)\). The angle \(\omega\) is equally likely to lie anywhere in the real interval \(\left[0, 2\pi\right)\). Therefore, the probability that the angle lies in some interval is directly proportional to the length of the interval. Consider the continuous random variable \(X\left(\omega\right) = \omega\). The distribution function of \(X\), shown graphically in Figure 4.5, is $$ \begin{array}{rclcl} F_X\left(0\right) & = & \mathbf{P}\left(X \leq 0\right) & = & 0, \\ F_X\left(x\right) & = & \mathbf{P}\left(X \leq x\right) & = & x / 2\pi \\ & & & & \left(0 \leq x \lt 2\pi\right), \\ F_X\left(2\pi\right) & = & \mathbf{P}\left(X \leq 2\pi\right) & = & 1. \\ \end{array} $$ The density function, \(f_X\), of \(F_X\) is as follows: $$ F_X\left(x\right) = \int_{-\infty}^{x} f_X\left(u\right) \mathrm{d}u $$ where $$ f_X\left(u\right) = \begin{cases} 1/2\pi & \text{ if $0 \leq u \lt 2\pi$ } \\ 0 & \text{ otherwise } \end{cases} $$

**Figure 4.5:** The distribution function \(F_X\) of \(X\) for the needle.
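A quick numerical check of this distribution function: the empirical fraction of sampled angles below \(x\) should be close to \(F_X\left(x\right) = x/2\pi\). The sample size and the evaluation points below are arbitrary choices for the illustration.

```python
import math
import random

# Empirical check that the landing angle has distribution function x/(2*pi).
rng = random.Random(3)
angles = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(100_000)]
F_emp = {x: sum(a <= x for a in angles) / len(angles)
         for x in (1.0, math.pi, 5.0)}
for x, F in F_emp.items():
    print(x, F, x / (2.0 * math.pi))  # empirical value vs F_X(x) = x/(2*pi)
```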

### 4.1.3 Further properties of random variables

In this section, fundamental results regarding the expectation and variance of random variables (discrete or continuous) are stated and proved. For some constant \(c \in \mathbb{R} \) and random variable \(X\), $$ \mathbf{E}\left(cX\right) = c\mathbf{E}\left(X\right). \tag{4.13} $$ The proof of (4.13) is trivial. From (4.2), for \(X\) discrete $$ \begin{align} \mathbf{E}\left(cX\right) &= \sum_x cx \cdot f_X\left(x\right) \\ &= c \sum_x x \cdot f_X\left(x\right) \\ &= c \mathbf{E}\left(X\right), \end{align} $$ while from (4.9), for \(X\) continuous $$ \begin{align} \mathbf{E}\left(cX\right) &= \int_{-\infty}^{\infty} cx \cdot f_X\left(x\right) \mathrm{d}x \\ &= c \int_{-\infty}^{\infty} x \cdot f_X\left(x\right) \mathrm{d}x \\ &= c \mathbf{E}\left(X\right). \end{align} $$ $$\tag*{$\blacksquare$}$$

For discrete or continuous random variables \(X\) and \(Y\), $$ \mathbf{E}\left(X + Y\right) = \mathbf{E}\left(X\right) + \mathbf{E}\left(Y\right). \tag{4.14} $$ The proof of (4.14) is as follows. Suppose \(X\) and \(Y\) have joint mass function \(f_{X,Y} : \mathbb{R}^2 \to \left[0, 1\right]\) given by \(f_{X,Y}\left(x, y\right) = \mathbf{P}\left(X = x \text{ and } Y = y\right)\). Then, for \(X\) and \(Y\) discrete, an extension of (4.2) gives $$ \begin{array}{rcl} \mathbf{E}\left(X + Y\right) & = & \sum_x \sum_y \left(x + y\right) \cdot f_{X, Y}\left(x, y\right) \\ & = & \sum_x \sum_y x \cdot f_{X, Y}\left(x, y\right) + \\ & & \sum_x \sum_y y \cdot f_{X, Y}\left(x, y\right) \\ & = & \sum_x x \sum_y f_{X, Y}\left(x, y\right) + \\ & & \sum_y y \sum_x f_{X, Y}\left(x, y\right) \\ & = & \sum_x x \cdot f_X\left(x\right) + \sum_y y \cdot f_Y\left(y\right) \\ & = & \mathbf{E}X + \mathbf{E}Y. \end{array} $$ Noting (4.9), for \(X\) and \(Y\) continuous the proof begins $$ \mathbf{E}\left(X + Y\right) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \left(x + y\right) \cdot f_{X, Y}\left(x, y\right) \mathrm{d}x \mathrm{d}y $$ where \(f_{X, Y}\left(x, y\right) : \mathbb{R}^2 \to \left[0, \infty\right)\) is the joint density function of \(X\) and \(Y\). The proof then proceeds in a similar way to the discrete case with summations replaced by integrations. $$\tag*{$\blacksquare$}$$

For discrete or continuous independent random variables \(X\) and \(Y\), $$ \mathbf{E}\left(XY\right) = \mathbf{E}\left(X\right) \cdot \mathbf{E}\left(Y\right). \tag{4.15} $$ The proof of (4.15) is first presented for discrete random variables \(X\) and \(Y\). Let \(X\) and \(Y\) have joint mass function \(f_{X,Y} : \mathbb{R}^2 \to \left[0, 1\right]\) given by \(f_{X,Y}\left(x, y\right) = \mathbf{P}\left(X = x \text{ and } Y = y\right)\). If \(X\) and \(Y\) are independent, then (by definition) the distribution of \(Y\) is not affected by the value taken by \(X\). For \(X\) and \(Y\) independent, $$ \begin{align} \mathbf{P}\left(X = x \text{ and } Y = y\right) &= \mathbf{P}\left(\left(X = x\right) \cap \left(Y = y\right)\right) \\ &= \mathbf{P}\left(X = x\right) \cdot \mathbf{P}\left(Y = y\right), \end{align} $$ so that \(f_{X,Y}\left(x, y\right) = f_X\left(x\right) \cdot f_Y\left(y\right)\). Therefore, $$ \begin{align} \mathbf{E}\left(XY\right) &= \sum_x \sum_y xy \cdot f_{X, Y}\left(x, y\right) \\ &= \sum_x \sum_y xy \cdot f_X\left(x\right) f_Y\left(y\right) \\ &= \sum_x \left\{ x \cdot f_X\left(x\right) \cdot \sum_y y \cdot f_Y\left(y\right) \right\} \\ &= \sum_x \left\{ x \cdot f_X\left(x\right) \cdot \mathbf{E}Y\right\} \\ &= \mathbf{E}X \cdot \mathbf{E}Y. \end{align} $$ For \(X\) and \(Y\) continuous, the proof begins $$ \mathbf{E}\left(XY\right) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} xy \cdot f_{X, Y}\left(x, y\right) \mathrm{d}x \mathrm{d}y $$ where \(f_{X, Y} : \mathbb{R}^2 \to \left[0, \infty\right)\) is the joint density function of \(X\) and \(Y\). The proof then proceeds in a similar way to the discrete case with summations replaced by integrations. $$\tag*{$\blacksquare$}$$

For the discrete or continuous random variable \(X\), $$ \begin{align} \sigma^2 &= \mathbf{E}\left(\left(X - \mathbf{E}\left(X\right)\right)^2\right) \\ &= \mathbf{E}\left(X^2\right) - \left(\mathbf{E}\left(X\right)\right)^2. \tag{4.16} \end{align} $$ The proof of (4.16) holds for \(X\) discrete or continuous. As a shorthand notation, let \(\mu = \mathbf{E}\left(X\right)\). Then, $$ \begin{array}{rcl} \mathbf{E}\left(\left(X - \mu\right)^2\right) & = & \mathbf{E}\left(X^2 - 2 \cdot \mu \cdot X + \mu^2\right) \\ & = & \mathbf{E}\left(X^2\right) - 2 \cdot \mu \cdot \mathbf{E}\left(X\right) + \mu^2 \\ & & \text {from (4.13) and (4.14)} \\ & = & \mathbf{E}\left(X^2\right) - \mu^2 \\ & = & \mathbf{E}\left(X^2\right) - \left(\mathbf{E}\left(X\right)\right)^2. \end{array} $$ $$\tag*{$\blacksquare$}$$

For the discrete or continuous random variable \(X\) and the constant \(c \in \mathbb{R} \), $$ \mathrm{Var}\left(cX\right) = c^2 \cdot \mathrm{Var}\left(X\right). \tag{4.17} $$ The proof of (4.17) holds for \(X\) discrete or continuous. Again, let \(\mu = \mathbf{E}\left(X\right)\). Then, $$ \begin{align} \mathrm{Var}\left(cX\right) &= \mathbf{E}\left(\left(cX - c\mu\right)^2\right) \\ &= \mathbf{E}\left(c^2 \cdot \left(X - \mu\right)^2\right) \\ &= c^2 \cdot \mathbf{E}\left(\left(X - \mu\right)^2\right) \\ &= c^2 \cdot \mathrm{Var}\left(X\right). \end{align} $$ $$\tag*{$\blacksquare$}$$

Finally, for discrete or continuous independent random variables \(X\) and \(Y\), $$ \mathrm{Var}\left(X + Y\right) = \mathrm{Var}\left(X\right) + \mathrm{Var}\left(Y\right). \tag{4.18} $$ The proof of (4.18) holds for \(X\) and \(Y\) discrete or continuous. As a shorthand notation, let \(\mu_X = \mathbf{E}\left(X\right)\) and \(\mu_Y = \mathbf{E}\left(Y\right)\). Then, $$ \mathrm{Var}\left(X + Y\right) = \mathbf{E}\left\{\left(\left(X + Y\right) - \left(\mu_X + \mu_Y\right)\right)^2\right\} $$ $$ \begin{array}{rcl} & = & \mathbf{E}\left\{\left(\left(X - \mu_X\right) + \left(Y - \mu_Y\right)\right)^2\right\} \\ & = & \mathbf{E}\left\{\begin{array}{c}\left(X - \mu_X\right)^2 + \\ 2\left(X - \mu_X\right) \cdot \left(Y - \mu_Y\right) + \\ \left(Y - \mu_Y\right)^2\end{array}\right\} \\ & = & \mathbf{E}\left(\left(X - \mu_X\right)^2\right) + \\ & & 2\mathbf{E}\left(\left(X - \mu_X\right) \cdot \left(Y - \mu_Y\right)\right) + \\ & & \mathbf{E}\left(\left(Y - \mu_Y\right)^2\right) \\ & = & \mathrm{Var}\left(X\right) + \\ & & 2\mathbf{E}\left(\left(X - \mu_X\right) \cdot \left(Y - \mu_Y\right)\right) + \\ & & \mathrm{Var}\left(Y\right). \end{array} $$ However, from (4.15), if \(X\) and \(Y\) are independent random variables, then $$ \mathbf{E}\left(\left(X - \mu_X\right) \cdot \left(Y - \mu_Y\right)\right) $$ $$ = \mathbf{E}\left(X - \mu_X\right) \cdot \mathbf{E}\left(Y - \mu_Y\right) = 0. $$ $$\tag*{$\blacksquare$}$$
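Results (4.13) to (4.18) can be verified exactly on a small example. The sketch below checks each identity for two independent fair dice, using exact rational arithmetic (the dice are an arbitrary choice of toy distribution).

```python
from fractions import Fraction
from itertools import product

vals = range(1, 7)          # faces of a fair die
p = Fraction(1, 6)          # mass of each face

def E(g):
    """Expectation of g(X, Y) under the independent joint mass function."""
    return sum(g(x, y) * p * p for x, y in product(vals, repeat=2))

EX, EY = E(lambda x, y: x), E(lambda x, y: y)
VarX = E(lambda x, y: (x - EX) ** 2)
VarY = E(lambda x, y: (y - EY) ** 2)

assert E(lambda x, y: 3 * x) == 3 * EX                        # (4.13)
assert E(lambda x, y: x + y) == EX + EY                       # (4.14)
assert E(lambda x, y: x * y) == EX * EY                       # (4.15), independence
assert E(lambda x, y: x * x) - EX ** 2 == VarX                # (4.16)
assert E(lambda x, y: (3 * x - 3 * EX) ** 2) == 9 * VarX      # (4.17)
assert E(lambda x, y: (x + y - EX - EY) ** 2) == VarX + VarY  # (4.18)
print(EX, VarX)  # 7/2 35/12
```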

### 4.1.4 Sampling theory

As an introduction to sampling theory, consider the problem of estimating the average IQ of students attending the University of Liverpool. To test the entire group, or *population*, of students would take too long. Instead, it is decided that tests should be handed out to a *sample* of the student population. From the sample, results regarding the population can be *statistically inferred*. The reliability of the survey depends on whether the sample is properly chosen.

IQ scores range between 0 and 200. The set of all possible scores can be represented by the sample space \(\Omega = \left\{ 0, 1, 2, ..., 200\right\}\). Let the variable \(X\left(\omega\right) = \omega\) represent a particular outcome after completing a test. Clearly \(X\) is a discrete random variable. An alphabetic roll call of students is used to select a *systematic* sample. The list is first split into groups of \(k\) students (where \(k\) is an integer greater than 1). If the population size, \(N\), is not a multiple of \(k\), the last group is smaller than \(k\). Next a random integer, \(r\), between 0 and \(k - 1\) is chosen. Students are included in the sample if their position in the roll call is congruent to \(r\) modulo \(k\).
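The selection step can be sketched as follows; the roll call names and the value of \(k\) are hypothetical choices for the illustration.

```python
import random

def systematic_sample(names, k, rng):
    """Systematic sample: include every name whose (0-based) position in
    the roll call is congruent to a random start r modulo k."""
    r = rng.randrange(k)    # random integer between 0 and k - 1
    return [name for i, name in enumerate(names) if i % k == r]

rng = random.Random(4)
roll_call = [f"student{i:03d}" for i in range(10)]   # hypothetical roll call
sample = systematic_sample(roll_call, 4, rng)
print(sample)
```

Note that when \(N\) is not a multiple of \(k\), the sample size depends slightly on the random start \(r\).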

Let the size of the sample be \(n\). On receipt of \(n\) tests, each student in the sample is assigned a score, \(x_i\), in the range 0 to 200 where \(x_i\) is the value of a random variable \(X_i\). The *sample mean* is a random variable defined by $$ \bar{X} = \frac{1}{n} \sum_{i=1}^n X_i $$ whose value is $$ \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i. $$ \(X_1\), ..., \(X_n\) are independent random variables whose distribution functions are the same as that of the population, which has mean \(\mu\) and variance \(\sigma^2\). The expected value of the sample mean, \(\mathbf{E}\left(\bar{X}\right)\), is the population mean, \(\mu\), because $$ \mathbf{E}\left(\bar{X}\right) = \frac{1}{n}\left(\sum_{i=1}^n \mathbf{E}\left(X_i\right)\right) = \frac{1}{n}\left(n\mu\right) = \mu. $$ Furthermore, \(X_1\), ..., \(X_n\) have variance \(\sigma^2\) and so the variance of \(\bar{X}\), \(\mathrm{Var}\left(\bar{X}\right)\), is $$ \begin{align} \mathbf{E}\left(\left(\bar{X} - \mu\right)^2\right) &= \mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^n X_i\right) \\ &= \frac{1}{n^2}\sum_{i=1}^n \mathrm{Var}\left(X_i\right) \\ &= \frac{1}{n^2} n \sigma^2 \\ &= \frac{\sigma^2}{n}. \tag{4.19} \end{align} $$ As the sample size increases, the variance of the sample mean tends to zero.
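Result (4.19) can be checked exactly on a toy population. The sketch below enumerates all samples of \(n = 4\) fair coin flips (values 0 and 1, an arbitrary stand-in for the IQ scores) and confirms \(\mathrm{Var}\left(\bar{X}\right) = \sigma^2 / n\).

```python
from fractions import Fraction
from itertools import product

vals = (0, 1)               # population values (fair coin, assumed toy example)
p = Fraction(1, 2)          # mass of each value
mu = Fraction(1, 2)         # population mean
sigma2 = Fraction(1, 4)     # population variance

n = 4
var_xbar = Fraction(0)
for sample in product(vals, repeat=n):   # all 2**n equally likely samples
    xbar = Fraction(sum(sample), n)
    var_xbar += (xbar - mu) ** 2 * p ** n
print(var_xbar, sigma2 / n)  # both 1/16
```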

From (4.5) the random variable, \(S^2\), giving the *sample variance* is $$ S^2 = \frac{1}{n} \sum_{i=1}^n \left(X_i - \bar{X}\right)^2. $$

It turns out that the sample variance, \(S^2\), is not an unbiased estimator of the population variance, \(\sigma^2\). \(S^2\) underestimates \(\sigma^2\) by a factor of \((n - 1)/n\) so that $$ \mathbf{E}\left(S^2\right) = \frac{n-1}{n} \sigma^2. \tag{4.20} $$ The proof of (4.20) is as follows. Consider the term \(X_i - \bar{X} = \left(X_i - \mu\right) - \left(\bar{X} - \mu\right)\). Then, \(\left(X_i - \bar{X}\right)^2 = \left(X_i - \mu\right)^2 - 2\left(X_i - \mu\right)\left(\bar{X} - \mu\right) + \left(\bar{X} - \mu\right)^2\) and so $$ \begin{align} \sum_{i=1}^n \left(X_i - \bar{X}\right)^2 &= \sum_{i=1}^n\left(X_i - \mu\right)^2 - \\ & 2 \left(\bar{X} - \mu\right) \sum_{i=1}^n\left(X_i - \mu\right) + \\ & \sum_{i=1}^n\left(\bar{X} - \mu\right)^2 \\ &= \sum_{i=1}^n\left(X_i - \mu\right)^2 - \\ & 2n\left(\bar{X} - \mu\right)^2 + \\ & n\left(\bar{X} - \mu\right)^2. \tag{4.21} \end{align} $$ Equation (4.19) together with the expectation of (4.21) give $$ \begin{align} \mathbf{E}\left(\sum_{i=1}^n\left(X_i - \bar{X}\right)^2\right) &= \mathbf{E}\left(\sum_{i=1}^n\left(X_i - \mu\right)^2\right) - \\ & n\mathbf{E}\left(\left(\bar{X} - \mu\right)^2\right) \\ &= n\sigma^2 - n\left(\frac{\sigma^2}{n}\right) \\ &= (n - 1)\sigma^2 \end{align} $$ so that $$ \mathbf{E}\left(S^2\right) = \frac{n-1}{n}\sigma^2 $$ and $$ \sigma^2 = \frac{n}{n-1}\mathbf{E}\left(S^2\right). \tag{4.22} $$ Equation (4.22) is an important result that is relevant to the experimental work of Chapters 7 and 8. It states that the population variance, \(\sigma^2\), is equal to the expected sample variance, \(\mathbf{E}\left(S^2\right)\), multiplied by \(n/(n - 1)\).
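Equation (4.20) can likewise be verified by exact enumeration; the sketch below uses samples of \(n = 3\) fair die rolls (an arbitrary toy population with \(\sigma^2 = 35/12\)).

```python
from fractions import Fraction
from itertools import product

vals = range(1, 7)          # faces of a fair die (assumed toy population)
p = Fraction(1, 6)
mu = sum(v * p for v in vals)                    # population mean, 7/2
sigma2 = sum((v - mu) ** 2 * p for v in vals)    # population variance, 35/12

n = 3
ES2 = Fraction(0)
for sample in product(vals, repeat=n):           # all 6**n equally likely samples
    xbar = Fraction(sum(sample), n)
    s2 = sum((x - xbar) ** 2 for x in sample) / n   # sample variance S^2
    ES2 += s2 * p ** n
print(ES2, Fraction(n - 1, n) * sigma2)  # both 35/18
```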

## 4.2 Random position and orientation

An orientation in the plane can be represented by points on the boundary of a circle. An orientation is simply an angle \(\mathit{\Phi} \in \left[0, 2\pi\right)\). The orientation is said to be *isotropic random* (IR) if and only if the angle \(\mathit{\Phi}\) is equally likely, i.e. *uniform random* (UR), in the interval \(\left[0, 2\pi\right)\) (written \(\mathit{\Phi} \in \mathrm{UR}\left[0, 2\pi\right)\)). An equivalent statement is that the orientation, \(\mathit{\Phi}\), is IR if and only if the *density function* of \(\mathit{\Phi}\) is constant.

To determine the probability density function of \(\mathit{\Phi}\), consider an arc on the unit circle given by \(\left\{ \mathit{\Phi} : \phi \leq \mathit{\Phi} \leq \phi + \delta\phi \right\}\). The arc, of length \(\delta\phi\), is shown in Figure 4.6(a). The probability of an orientation falling inside the arc is $$ \mathbf{P}\left(\phi \leq \mathit{\Phi} \leq \phi + \delta\phi\right) $$ $$ = \frac{\text{boundary length of arc}}{\text{boundary length of circle}} $$ $$ = \frac{\delta\phi}{2\pi}. \tag{4.23} $$ Equations (4.7) and (4.23) imply the density function is \(1/2\pi\), which is constant, as required.

The idea can be extended to three dimensions. Orientations in \(\mathbb{R}^3\) can be represented by points on the surface of the unit sphere. Figure 4.6(b) shows how points on the unit sphere can be obtained by choosing \(\mathit{\Phi} \in \left[0, 2\pi\right)\) and \(\mathit{\Theta} \in \left[0, \pi\right)\). To determine an IR orientation, choose a point uniform randomly on the surface of the sphere (so that regions of equal area have the same probability of containing the point). It does not suffice to choose \(\mathit{\Phi}\) and \(\mathit{\Theta}\) uniform randomly in the intervals \(\left[0, 2\pi\right)\) and \(\left[0, \pi\right)\) respectively. Figure 4.7(a) shows 10,000 orientations chosen this way. The orientations cluster about the north and south poles.

**Figure 4.6:** (a) An orientation in the plane (represented by the angle \(\phi\)). (b) An orientation in \(\mathbb{R}^3\) (represented by the angles \(\phi\) and \(\theta\)).

Consider a surface patch on the unit sphere given by \(\left\{\left(\mathit{\Phi}, \mathit{\Theta}\right) : \phi \leq \mathit{\Phi} \leq \phi + \delta\phi, \theta \leq \mathit{\Theta} \leq \theta + \delta\theta\right\}\). For \(\delta\phi\) and \(\delta\theta\) small, the patch approximates a rectangle with dimensions \(\sin\left(\theta\right)\delta\phi\) by \(\delta\theta\). Suppose an orientation is chosen uniform randomly on the surface of the sphere. The probability of the chosen orientation falling inside the patch is $$ \mathbf{P}\left(\phi \leq \mathit{\Phi} \leq \phi + \delta\phi, \theta \leq \mathit{\Theta} \leq \theta + \delta\theta\right) $$ $$ = \frac{\text{surface area of patch}}{\text{surface area of sphere}} $$ $$ = \frac{\sin\left(\theta\right)\delta\phi\delta\theta}{4\pi}. \tag{4.24} $$ Equations (4.11) and (4.24) imply the joint density function of \(\mathit{\Phi}\) and \(\mathit{\Theta}\) for an IR orientation is \(\sin\left(\theta\right) / 4\pi\). Choosing \(\mathit{\Phi}\) uniform randomly in \(\left[0, 2\pi\right)\) supplies the factor \(1/2\pi\); the remaining factor, \(\sin\left(\theta\right)/2\), is the required marginal density of \(\mathit{\Theta}\). Consider the *sine-weighting*, or change of variable, \(\theta = \arccos\left(1 - 2u\right)\), where \(u\) is uniform random in the interval \(\left[0, 1\right)\). Since \(\cos\left(\theta\right) = 1 - 2u\), differentiating gives \(\sin\left(\theta\right) \mathrm{d}\theta = 2 \, \mathrm{d}u\), so the element of probability \(\left(\sin\left(\theta\right)/2\right)\delta\theta\) equals \(\delta u\). Equal increments of \(u\) therefore carry equal probability, and choosing \(u\) uniform randomly yields the required sine-weighted \(\mathit{\Theta}\). Figure 4.7(b) shows 10,000 IR orientations generated in this way.
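The sine-weighting can be sketched as follows. The sanity checks use the fact that, for IR orientations, \(\cos\left(\mathit{\Theta}\right)\) is uniform on \(\left[-1, 1\right]\); the sample size is an arbitrary choice.

```python
import math
import random

def isotropic_orientation(rng):
    """One IR orientation on the unit sphere: phi uniform in [0, 2*pi),
    theta sine-weighted via the change of variable theta = arccos(1 - 2u)."""
    phi = rng.uniform(0.0, 2.0 * math.pi)
    u = rng.random()                     # uniform random in [0, 1)
    theta = math.acos(1.0 - 2.0 * u)     # cos(theta) is then uniform on (-1, 1]
    return phi, theta

# For IR orientations the mean of cos(theta) is ~0, and the spherical cap
# cos(theta) > 0.5 holds a quarter of the sphere's area, so ~25% of samples.
rng = random.Random(5)
cos_t = [math.cos(isotropic_orientation(rng)[1]) for _ in range(100_000)]
print(sum(cos_t) / len(cos_t))                       # near 0
print(sum(c > 0.5 for c in cos_t) / len(cos_t))      # near 0.25
```

Choosing \(\theta\) uniform randomly instead would reproduce the polar clustering of Figure 4.7(a).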

An object (such as a straight line or curve) in some domain \(D \subset \mathbb{R}^2\) is said to be *Isotropic Uniform Random* (IUR), if it is IR and has UR position within \(D\). The latter condition holds if a single identifiable point \(P\) on the object has UR position within \(D\). This is equivalent to asking that the coordinates of \(P\) be chosen so that regions of equal area have the same probability of containing \(P\). Similarly an object such as a plane in some domain \(D \subset \mathbb{R}^3\) is said to be IUR if it is IR and has UR position within \(D\).

**Figure 4.7:** 10,000 orientations chosen (a) non-isotropically and (b) isotropically.