ABC Statistics


Statistics is the study of large numbers, such as those produced by government departments, with the aim of extracting some approximate truth from them.

There are descriptive statistics and inferential statistics.

Statistics: Origin and Meaning see timeline


One of the origins of statistics is state arithmetic.

See moral statistics
History of statistics: Roman census - 1086 - Probability - John Graunt - 1710 - 1801 British census - the beginning of crime statistics - Benjamin Gompertz - British Association - 1833: Statistical Societies formed - Quetelet's average person - 1837: births, deaths and marriages - objective statistics - 1841 British census - suicide statistics - 1857: Henry Thomas Buckle - Alexander von Oettingen - 1878?: cycles and sunspots - Enrico Morselli on suicide statistics - Charles Booth - 1913: Class - 1922 Fisher: Mathematical Foundations - 1927: Carr-Saunders and Caradog Jones social structure statistics - Wartime Social Survey - 1960s: computers and statistics - Statistical Package for the Social Sciences - 1970: Office of Population Censuses and Surveys - 1976: Radical statistics - 1992: UK Cochrane Centre - 1996: Office for National Statistics


Descriptive Statistics

Descriptive statistics allow us to describe groups of many numbers. One way to do this is by reducing them to a few numbers that are typical of the groups, or describe their characteristics. The average is one kind of descriptive statistic. Measures of spread are another kind.

Grouping numbers into frequency distributions, and drawing charts to illustrate those distributions, are other examples of descriptive statistics.

Inferential Statistics, Statistical Tests or Inductive Statistics

Inferential statistics do not just describe numbers; they are used to infer causes. We use them to draw inferences (informed guesses) about situations where we have only gathered part of the information that exists. The part we have gathered is called a sample. The whole body of information from which it is taken is called the population. In a basic statistical test we would have two samples and would try to establish whether they are significantly different.

Statistical inference is the use of samples to reach conclusions about the population from which those samples have been drawn.

For example, twenty people who have not read this passage would be one sample. The same twenty people after they have read the passage is another sample. The population from which both samples are drawn is all the people who could (in theory) read this passage. If we found that,

  1. before reading this passage, many of the twenty people could explain what inferential statistics is, and

  2. after reading this passage, none of the twenty people could explain what inferential statistics is,

we might reasonably conclude (infer, guess) that reading the passage caused the deterioration in their understanding of inferential statistics.


Statistical analysis refers to doing something useful with data (letting its meaning free). The two main ways that data is analysed are:

  1. by bringing out the information it contains. This is part of descriptive statistics.

  2. by testing or retesting a hypothesis. Hypothesis testing is part of inferential statistics.

See Advice about Statistical Analysis - The different meanings of analysis

Primary analysis is the original analysis of data in a research study.

Secondary analysis is the re-analysis of data. This could be to answer new questions or to use improved statistical techniques.

Meta-analysis is the statistical analysis of a large collection of analysis results to integrate the findings.



Statistical Data

Raw Data
The raw data of statistics is groups of numbers that the statistician intends to work on. They are the numbers before they have been organised and processed.

For example:
6, 6, 7, 9, 3, 4, 4, 7, 2, 3, 8, 5, 6, 6, 7, 1, 2, 3, 1, 7, 3, 4, 8, 5, 5, 8, 2, 9, 5, 4, 4, 5, 6, 5

Array
In mathematics, an array is an arrangement of numbers or symbols in rows and columns. In statistics it is a group of numbers in rows and columns with the smallest at the beginning and the rest in order of size up to the largest at the end.

For example:

1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 9, 9



Frequency Distribution
A frequency distribution is an arrangement of a group of numbers in a pattern that shows how frequently each occurs. This is done by grouping (classifying) the numbers and arranging them in a table according to size:

For example:

1, 1
2, 2, 2
3, 3, 3, 3
4, 4, 4, 4, 4
5, 5, 5, 5, 5, 5
6, 6, 6, 6, 6
7, 7, 7, 7
8, 8, 8
9, 9
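
As a rough sketch of the three steps above, the short Python program below takes the raw data, puts it in order of size to make an array, and then counts how often each number occurs to make a frequency distribution. The names raw_data, ordered_array and frequencies are invented for this illustration.

```python
from collections import Counter

# The raw data from the example above, in the order it was collected.
raw_data = [6, 6, 7, 9, 3, 4, 4, 7, 2, 3, 8, 5, 6, 6, 7, 1, 2,
            3, 1, 7, 3, 4, 8, 5, 5, 8, 2, 9, 5, 4, 4, 5, 6, 5]

# An array in the statistical sense: the same numbers in order of size.
ordered_array = sorted(raw_data)

# A frequency distribution: how many times each number occurs.
frequencies = Counter(ordered_array)

# Print one row per number, smallest first, with its frequency.
for value in sorted(frequencies):
    print(value, frequencies[value])
```

The printed table has one row for each number from 1 to 9, with the number 5 occurring most often (six times).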



Statistically Significant
Statistical significance is a way of estimating the likelihood that a difference between two samples indicates a real difference between the populations from which the samples are taken, rather than being due to chance. If a result is very unlikely to have arisen by chance we say it is statistically significant.

If a result would only arise by chance once in a hundred times, we would call it significant. If only once in 1,000 times, it would be highly significant. If once in twenty times, only probably significant.

Statistical significance is expressed in terms of probability. We say, for example, that:
  • "This result would arise by chance once in twenty times" (p = 0.05)
See Statistical Experiments
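
The rough labels above can be sketched as a small Python function. This is only an illustration: the function name describe_significance is invented, treating each figure as a cut-off ("this probability or smaller") is an assumption, and so is the label "not significant" for anything more likely than once in twenty, which the passage does not name.

```python
def describe_significance(p):
    """Label a probability p using the rough thresholds in the passage.

    Assumption: each threshold means "this probability or smaller".
    """
    if not 0 <= p <= 1:
        raise ValueError("a probability must lie between 0 and 1")
    if p <= 0.001:          # once in 1,000 times or rarer
        return "highly significant"
    if p <= 0.01:           # once in a hundred times or rarer
        return "significant"
    if p <= 0.05:           # once in twenty times or rarer
        return "probably significant"
    return "not significant"  # assumed label, not taken from the passage


print(describe_significance(0.05))
```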



Average: Averages are ways of describing what is typical, normal or usual.

People come in lots of sizes, but the average adult is not 8 feet tall or 3 foot small.

How can we measure what the average is? The way that most of us know for making an average is technically called:

The arithmetic mean. To work out one of these you:

First add a group of numbers together.
Then divide by the number of numbers in the group.

For example:

The (mean) average of 3 and 8 is the total (11)
divided by the number of numbers (2),
giving 5.5 as the average (arithmetic mean).

The (mean) average of 3, 4, 5, 5, 6, 6, 6, 7 and 8 is the total (50)
divided by the number of numbers (9),
giving 5.56 (rounded to two decimal places) as the average (arithmetic mean).
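
The two steps (add, then divide) can be written as a very short Python function. This is a minimal sketch; the name arithmetic_mean is made up for the illustration, and the function assumes the group contains at least one number.

```python
def arithmetic_mean(numbers):
    """Add the numbers together, then divide by how many there are."""
    total = sum(numbers)      # first: add the group of numbers together
    count = len(numbers)      # the number of numbers in the group
    return total / count      # then: divide the total by the count


print(arithmetic_mean([3, 8]))
print(round(arithmetic_mean([3, 4, 5, 5, 6, 6, 6, 7, 8]), 2))
```

The second result is 50 divided by 9, which does not come out exactly, so the sketch rounds it to two decimal places.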

Another kind of average is:

The median

Instead of adding the group of numbers together, we arrange them in order of size from the smallest to the largest. The median is the middle number.

For example:

The median of
3, 4, 5, 5, 6, 6, 6, 7, 8
is 6

If the number of numbers is even, there will be two middle numbers.
In these cases, the median is the number half-way between the two middle numbers.

For example:

The median of 3 and 8 is 5.5

The median of 4, 5, 5, 5, 6, 6, 6, 7 is also 5.5

The median of 1, 2, 3, 4, 5, 6, 7, 8 is 4.5
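
The rule for the median, including the half-way rule for an even number of numbers, might be sketched in Python like this. The function name median is chosen for the illustration, and it assumes the group is not empty.

```python
def median(numbers):
    """Return the middle number once the group is in order of size."""
    ordered = sorted(numbers)   # arrange from smallest to largest
    count = len(ordered)
    middle = count // 2
    if count % 2 == 1:
        # Odd number of numbers: there is a single middle number.
        return ordered[middle]
    # Even number of numbers: take the point half-way between the two middle ones.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([3, 4, 5, 5, 6, 6, 6, 7, 8]))
print(median([4, 5, 5, 5, 6, 6, 6, 7]))
```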

Another kind of average is:

The mode

The mode (or modal number) is the number that occurs most frequently.

For example:

The mode of
3, 4, 5, 5, 6, 6, 6, 7, 8
is 6

If two or more numbers occur the same number of times, and more often than the other numbers, there is more than one mode.

For example:

The modes of
3, 4, 5, 5, 5, 5.5, 6, 6, 6, 7, 8
are 5 and 6
  • If there is only one mode, the group of numbers is unimodal.
  • If there is more than one mode the group of numbers is multimodal.

    If each number occurs the same number of times, the group of numbers has no mode. There is no mode in this group: 3, 3, 4, 4, 5, 5.
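
The three cases above (one mode, more than one mode, no mode) can be sketched in Python as follows. The function name modes is invented for this illustration. It assumes the group is not empty, and it treats a group in which only one distinct number appears as having that number as its mode, which is a choice made here rather than something the passage states.

```python
from collections import Counter


def modes(numbers):
    """Return a list of the modes, or an empty list if there is no mode."""
    counts = Counter(numbers)             # how often each number occurs
    highest = max(counts.values())        # the largest frequency
    every_count_equal = all(c == highest for c in counts.values())
    if len(counts) > 1 and every_count_equal:
        return []                         # each number occurs equally often: no mode
    return sorted(value for value, c in counts.items() if c == highest)


print(modes([3, 4, 5, 5, 6, 6, 6, 7, 8]))          # unimodal
print(modes([3, 4, 5, 5, 5, 5.5, 6, 6, 6, 7, 8]))  # multimodal
print(modes([3, 3, 4, 4, 5, 5]))                   # no mode
```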



    Census

    A Census is an official counting of people (or sometimes things) in a state or other political area. It is one of the oldest, and most important, forms of descriptive statistics.

    The birth of Christ is said to have taken place in Bethlehem because of a Roman census. The regular ten-yearly census of all the British population began in 1801.



    Combinations and Permutations

    A Combination is the selection of a number of items from a larger number in which the order of selection does not matter.

    For example, if we have five items, how many ways can we select three from the five if we are not bothered about what order the items appear in?

    We will call the five items A,B,C,D,E

    We can now set them out to show that there are ten ways in which they can be combined in groups of three:

    ABC ABD ABE
    ACD ACE
    ADE
    BCD BCE
    BDE
    CDE

    A Permutation is the selection of a number of items from a larger number when the order in which the items appear does matter.

    For example, ABC, ACB, BAC, BCA, CAB and CBA are all the same combination, because they contain the same three items, but they are six different permutations, because the items are in a different order.
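
The lists above can be reproduced with Python's standard itertools module. This is a small sketch; the variable names are made up for the illustration.

```python
from itertools import combinations, permutations

items = "ABCDE"

# Combinations: choose three of the five items, order does not matter.
three_item_combinations = ["".join(c) for c in combinations(items, 3)]
print(three_item_combinations)
print(len(three_item_combinations))   # ten ways, as set out above

# Permutations of one combination: the same three items in every possible order.
abc_orderings = ["".join(p) for p in permutations("ABC")]
print(abc_orderings)

# Permutations of three items chosen from all five: order does matter.
three_item_permutations = list(permutations(items, 3))
print(len(three_item_permutations))
```

Each of the ten combinations can be arranged in six orders, so there are sixty permutations of three items taken from five.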


    Control Group

    A scientific control group is a group that is matched with an experimental group on everything believed relevant - except the thing to be tested.

    William Farr used the patients in Gloucester County Asylum as his control group when trying to find out if the death rate in London madhouses was abnormally high.

    This illustrates one of the reasons for a control group. Farr could not say if there was anything abnormal about a high death rate in London madhouses unless he could show that it was lower in another asylum.

    If you were trying to show that criminals come from broken homes, you would need a control group of non-criminals to compare. If 20% of criminals and 25% of non-criminals came from broken homes, you could not argue (on the basis of those statistics) that broken homes are a cause of criminal behaviour.

    See Statistical Experiments


    Correlation

    If two things correlate, one changes in relation to changes in the other. The change can be positive or negative. For example, the height of a pile of coins increases as extra coins are added (positive correlation). On the other hand, pain may decrease with an increase in painkiller (negative correlation).

    Correlation is a measure of the association between the two things that change (the variables). It is measured on a scale from -1 to 1. The closer the measure is to 1, the stronger the positive correlation. The closer it is to -1, the stronger the negative correlation.
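
The passage does not say how the measure is worked out. One common choice, assumed here, is Pearson's correlation coefficient, and the sketch below works it out by hand in Python. The function name correlation is invented, and the figures for coins and painkiller are made-up numbers chosen only to illustrate a positive and a negative correlation.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least two items each")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How far the two variables move together, away from their own means.
    together = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How far each variable moves on its own.
    squares_x = sum((x - mean_x) ** 2 for x in xs)
    squares_y = sum((y - mean_y) ** 2 for y in ys)
    return together / sqrt(squares_x * squares_y)


# Hypothetical figures: each extra coin adds 2 mm to the pile.
coins = [1, 2, 3, 4, 5]
pile_height_mm = [2, 4, 6, 8, 10]

# Hypothetical figures: pain score falls as the dose of painkiller rises.
painkiller_dose = [0, 1, 2, 3, 4]
pain_score = [9, 7, 6, 4, 1]

print(correlation(coins, pile_height_mm))        # close to 1: strong positive
print(correlation(painkiller_dose, pain_score))  # close to -1: strong negative
```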



    Population
    In statistics, population does not necessarily mean people. It is a technical term for the whole group of people or objects the results apply to. In a statistical survey of London pigeons, the population is all the pigeons in London. As there are a lot of pigeons in London, a statistician would take a sample rather than inspecting every one.



    Probability
    Probability is the chance that something will happen.
    See 1622 - 1710 - 1824 - 1835

    If you toss a coin into the air it can land on one of two sides.
    The probability that it will land on a particular side is 50%, or one chance in two.

    Probability is measured on a scale going from 0 for an absolute impossibility to 1 for an absolute certainty.

    I never buy lottery tickets so:

  • The probability of my winning the lottery appears to be 0
  • The probability of saving the lottery ticket price is 1

    No possibility is written p = 0
    One chance in 1,000 is written p = 0.001
    One chance in 100 is written p = 0.01
    One chance in 20 is written p = 0.05
    One chance in 10 is written p = 0.1
    One chance in 2 is written p = 0.5
    7 chances in 10 is written p = 0.7
    99 chances in 100 is written p = 0.99
    Absolute certainty is written p = 1
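
Each line in the list above is a division: the number of chances divided by the number they are out of. A minimal Python sketch, with an invented function name probability, might look like this.

```python
def probability(chances, out_of):
    """Turn 'so many chances in so many' into a value of p between 0 and 1."""
    if out_of <= 0 or not 0 <= chances <= out_of:
        raise ValueError("chances must lie between 0 and out_of, and out_of must be positive")
    return chances / out_of


print(probability(1, 20))     # one chance in 20
print(probability(7, 10))     # 7 chances in 10
print(probability(99, 100))   # 99 chances in 100
```

The check at the top of the function refuses anything that would fall outside the scale from 0 (impossible) to 1 (certain).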


    Sample and example

    Example and sample originally meant more or less the same. A sample was a particular incident or story that was supposed to support a general statement. An example was a typical fact that illustrated a general statement. The difficulties of making a fair "sample" in this sense are obvious. We would not (I hope) accept "all men are nasty - take Hitler" as conclusive proof.

    The difficulties of selecting one example to illustrate the others are illustrated by the use of fair sample and excellent sample in the speeches of Gordon and Ashley. Gordon took the best to illustrate how bad things were; Ashley took one of the worst to illustrate how bad things were. Statistical sampling tries to overcome the bias inherent in taking one example.

    Sample in statistics: A statistical sample is a selection of part of a population chosen to provide information about the whole population.

    See Wikipedia article on sampling in statistics

    A random sample is one where any item in the population was as likely as any other to be in the sample, and where the selection of one item did not affect the selection of any other. In everyday life people try to make random samples by putting tickets in a hat, shaking them up, and then asking someone to pick a few out without looking into the hat. Random samples are made by blind chance.
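
Picking tickets out of a hat can be imitated in Python with the standard random module. This is a sketch only: the function name draw_from_hat and the population of a hundred numbered tickets are made up for the illustration. Like tickets drawn from a hat, an item that has been picked is not put back, so it cannot be picked twice.

```python
import random


def draw_from_hat(population, how_many, seed=None):
    """Pick how_many items by blind chance, without putting any back."""
    hat = random.Random(seed)   # a seed makes the draw repeatable; None means unpredictable
    return hat.sample(population, how_many)


# Hypothetical population: a hundred numbered tickets.
tickets = [f"ticket {number}" for number in range(1, 101)]

sample = draw_from_hat(tickets, 5)
print(sample)
```

Every ticket has the same chance of ending up in the sample, which is what separates a random sample from a sample chosen for convenience.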

    See Statistical Experiments

    An opportunity sample is one where the sample is selected because it is convenient for the researcher. Interviewees, for example, might be selected because they are easily accessible, willing to participate and available at a convenient time. An example would be a student researcher using fellow students to interview.



    Spread or Dispersion.

    Spread, or dispersion, refers to how far out our figures go. A chunk of butter on a slice of bread is hardly spread at all. The same chunk can be spread out thickly or thinly. The larger the area it covers, the more it is spread. It is the same with figures.

    The first array of figures is thinly spread:

    1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 9, 9

    The second array of figures is thickly spread:

    4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6

    Nevertheless, the averages of the figures are the same: each array has 34 figures that add up to 170, so the arithmetical mean is 5, and the median and the mode are both 5.
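
The passage does not name a particular measure of spread. Two common ones, assumed here for illustration, are the range (the largest figure minus the smallest) and the standard deviation. The Python sketch below applies both to the two arrays above; the names first_array, second_array and value_range are invented. Each array has 34 figures adding up to 170, so both have a mean of 5.

```python
from statistics import mean, pstdev

# The two arrays above, written as "number repeated so many times".
first_array = [1] * 2 + [2] * 3 + [3] * 4 + [4] * 5 + [5] * 6 + [6] * 5 + [7] * 4 + [8] * 3 + [9] * 2
second_array = [4] * 11 + [5] * 12 + [6] * 11


def value_range(numbers):
    """The distance from the smallest figure to the largest."""
    return max(numbers) - min(numbers)


for name, figures in (("first", first_array), ("second", second_array)):
    print(name,
          "mean:", mean(figures),
          "range:", value_range(figures),
          "standard deviation:", round(pstdev(figures), 2))
```

The means agree, but both measures of spread are larger for the first array, which matches the description of it as the more thinly (widely) spread of the two.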


    Time series

    Some of the statistics with the oldest history are figures year by year (or some other period) showing how something has changed. These are time series. Examples are balance of trade series, crime series and suicide series. The time series linked above are composed of absolute figures, but time series may also be composed of ratios.

    External link STEPS Glossary




    © Andrew Roberts


