Student: I notice that people sometimes use the words statistics and probability when talking about
the same things. Are these two words just different names for the same concept?
Mentor: What do you think?
Student: I want to check a dictionary first and see what it says.
Mentor: Check several dictionaries and, based on what you find, write a definition for each
word. A scientific or mathematical dictionary will give you more detailed information.
Probability
1: being probable
2: something that is probable
3: a ratio expressing the chances that a certain event will occur
4: a branch of mathematics studying chances of random events
Statistics
1: facts or data assembled and classified so as to present significant information
2: collection, calculation, description, manipulation, and interpretation of the mathematical
attributes of large sets or populations
3: a branch of mathematics dealing with collection, analysis and interpretation of data
Student: So statistics is all about data, and probability is all about chance.
Mentor: Exactly. Let me talk about probability as the measure of chance. Specialists look at this
meaning of probability in two different ways, called the Frequency View and the Personal View
(or Subjective View, as philosophers call it).
Frequency View
Example: To find the chances (probability) of getting a 3 on a six-sided die, you roll the die
1,000,000 times. On 166,549 of those rolls, the result is a 3. You find the proportion of 3's
by dividing:
166,549 / 1,000,000 = 0.166549
This is approximately 1/6, so you conclude that the probability of getting a 3 on this
particular die is 1/6.
Definition: The probability of an event in an experiment is the proportion (or frequency) of
that event when the exact same experiment is repeated many times.
Who likes it: Scientists, mathematicians.
vs.
Personal View
Example: To find the chances (probability) of getting a 3 on a six-sided die, you sit down and
think. You reason that all the sides of the die are the same, and that you can believe
that the die does not have holes or heavy objects inserted into it. You conclude that
each side of the die should have the same chance of landing face up, and therefore that
when you roll the die, you have one chance in six of getting a 3. Your answer is that the
probability of getting a 3 is 1/6.
Definition: The probability of an event is what a person who studies it believes about the
chances of the event. People who define probabilities this way use their knowledge about the
world to make "the best possible guess."
Who likes it: Philosophers, economists, mathematicians.
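The Frequency View example can be tried out without a physical die. The sketch below is only an
illustration, not part of the original lesson: it assumes that Python's pseudo-random generator
is a fair stand-in for a real die, and the function name estimate_probability is made up for
this example. It repeats the "same experiment" many times and reports the proportion of 3's.

```python
import random

def estimate_probability(target, rolls, seed=0):
    """Proportion of `rolls` simulated die rolls that show `target`.

    Assumption: random.randint(1, 6) stands in for a fair physical die.
    """
    rng = random.Random(seed)  # fixed seed so the run can be repeated exactly
    hits = sum(1 for _ in range(rolls) if rng.randint(1, 6) == target)
    return hits / rolls

# More repetitions should bring the proportion closer to 1/6 (about 0.1667).
for rolls in (100, 10_000, 1_000_000):
    print(rolls, estimate_probability(3, rolls))
```

With only 100 rolls the proportion can be noticeably off from 1/6; with a million rolls it
usually agrees to two or three decimal places, which is the point of the frequency definition.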
Mentor: Which of these two ways of looking at probability is closer to statistics?
Student: The Frequency View, because it talks about collecting data.
Mentor: A very important part of the Frequency View definition is that you need to repeat the
exact same experiment to find the probability. This is almost never possible where humans are
concerned, for example, in sports or medicine. I would like to offer you several quotes, and
you can find and correct the errors in them.
Student: Sounds like fun. When I learn to do it, I can find quotes in journals or on TV and correct
them, too!
Quote 1: "Our team won about 3/4 of the games in every season so far. I tell you, the
probability of us winning the next game is 3 out of 4!"
Error: Each game is different from other games. Maybe the opposing team will be much stronger
than usual next time. Maybe the weather will be different. Maybe a key player will be
sick. And so on. Also, the team may always win against a particular opponent (the one it
plays next), which will affect the chances.
Conclusion that may be true: "Our team won about 3/4 of the games in every season so far. If
nothing major changes, I believe we are going to win about 3/4 of the games in this season, too."
Quote 2: "One out of eight women in the USA develops breast cancer during her lifetime.
Therefore, if you are female, the probability of you having this form of cancer is 1/8."
Error: You are unique (just like everybody else). There is no way for a person to know her
exact chances in anything that is connected with health. Studies have linked factors such as
body proportions, diet, weight, number of pregnancies, and breastfeeding to breast cancer
rates in women. Even though "one out of eight" is the average for the USA, it does not tell
much about each particular person.
Conclusion that may be true: "One out of eight women in the USA develops breast cancer during
her lifetime. If we randomly select 1,000,000 women and look at their medical histories, we
can expect about 125,000 (not exactly!) of them to develop breast cancer."
Quote 3: "On the average, drivers have accidents once every two years. Your last accident was 3
years ago, so you can expect an accident any time now."
Error: Rates of accidents vary greatly with experience, car type, age and health of the driver,
driving habits, and so on. The national average says close to nothing about your chances of
having an accident.
Conclusion that may be true: "On the average, drivers have accidents once every two years. If
you randomly choose 1000 drivers, you can expect them all together to have had about 5000
accidents over the previous 10 years."
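The numbers in the conclusions that may be true come from multiplying a group size by an average
rate. The short sketch below only rechecks that arithmetic. The helper name expected_count is
made up for this illustration, and the results are expected values for large random groups, not
predictions for any one person.

```python
from fractions import Fraction

def expected_count(group_size, rate):
    """Expected number of events in a random group, given an average rate per member."""
    return group_size * rate

# One out of eight women, in a random sample of 1,000,000 women.
women_with_cancer = expected_count(1_000_000, Fraction(1, 8))

# One accident every two years per driver, for 1000 drivers over 10 years.
accidents = expected_count(1000, Fraction(1, 2)) * 10

print(women_with_cancer)  # 125000
print(accidents)          # 5000
```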
Student: All these errors are of the same type. They take data about large numbers of people and
try to apply it to individual cases.
Mentor: Collecting data about large numbers of people (or other objects), and using this data to
study other large groups of people, as you did in the "Conclusion that may be true" entries,
belongs to statistics. The only time it can be used for probability, that is, for studying
chances in individual cases, is when all the experiments are the same (or almost the same).
You can use data (statistics) from rolling a six-sided die one million times (in exactly the
same manner!) to find the chances (probability) of rolling a 5 on your next try. You cannot use
data (statistics) from studying the driving records of a million people to find the chances
(probability) of yourself having an accident today.
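One way to see why a population average says little about an individual is to imagine a
population made of different kinds of drivers. The groups and rates in the sketch below are
entirely hypothetical, chosen only so that the overall average comes out to one accident every
two years; they are not real data.

```python
from fractions import Fraction

# HYPOTHETICAL population: (share of all drivers, accidents per driver per year).
groups = {
    "cautious": (Fraction(4, 5), Fraction(1, 4)),
    "risky": (Fraction(1, 5), Fraction(3, 2)),
}

# The population average weights each group's rate by its share of drivers.
average_rate = sum(share * rate for share, rate in groups.values())

print("population average:", average_rate)  # 1/2, i.e. once every two years
for name, (share, rate) in groups.items():
    print(name, "drivers:", rate, "accidents per year")
```

In this made-up population the average is exactly one accident every two years, yet no single
driver has that rate. Knowing only the average, you cannot tell which group you belong to.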
Student: So statistics deals with data that may or may not be useful for finding probability.
Mentor: Yes. Data can also be useful by itself, without any connection to probability. For example,
you need to know, at least approximately, how many voters live in a particular city in order
to prepare for elections. You may want to know the average amount of hazardous chemicals each
factory discharges into a particular water basin per month in order to find out if there is a
serious environmental problem. You might want to know the proportion of people who get the flu
during each year in order to compare several years and to try to find out what may cause
increases in flu rates.
Student: I am just glad there are computers to help us to deal with all that data!