
Cambridge International AS & A Level Mathematics: Probability & Statistics 2 Practice Book

Contents

How to use this book
1 Hypothesis testing
2 The Poisson distribution
3 Linear combinations of random variables
4 Continuous random variables
5 Sampling
6 Estimation
The standard normal distribution function
Answers

How to use this book

Throughout this book you will notice particular features that are designed to help your learning. This section provides a brief overview of these features.

■ Find means and variances of linear combinations of random variables.
■ Calculate probabilities of linear combinations of random variables.
■ Solve problems involving linear combinations of random variables.

Learning objectives indicate the important concepts within each chapter and help you to navigate through the practice book.

WORKED EXAMPLE 2.2

On average, flaws occur in a roll of cloth at the rate of 3.6 per metre. Assuming a Poisson distribution is appropriate, find the probability of:
a exactly nine flaws in three metres of cloth
b less than three flaws in half a metre of cloth.

Answer

Use the interval to determine the mean, λ. For three metres, λ = 3 × 3.6 = 10.8.

Worked examples provide step-by-step approaches to answering questions. The left side shows a fully worked solution, while the right side contains a commentary explaining each step in the working.

Throughout each chapter there are multiple exercises containing practice questions. The questions are coded:

PS: These questions focus on problem-solving.
P: These questions focus on proofs.
M: These questions focus on modelling.

The End-of-chapter review exercise contains exam-style questions covering all topics in the chapter. You can use this to check your understanding of the topics you have covered.

END-OF-CHAPTER REVIEW EXERCISE 2

1 The random variable X follows a Poisson distribution. Given that P(X ⩽ 1) = 0.4, find:
a the mean of the distribution
b P(2 < X < 6).

2 Patients arrive at random at an emergency room in a hospital at the rate of 14 per hour throughout the day.

Chapter 1 Hypothesis testing

■ Understand the nature of a hypothesis test, the difference between one-tailed and two-tailed tests, and the terms null hypothesis, alternative hypothesis, significance level, rejection region (or critical region), acceptance region and test statistic.
■ Formulate hypotheses and carry out a hypothesis test in the context of a single observation from a population which has a binomial distribution, using:
□ direct evaluation of probabilities
□ a normal approximation to the binomial.
■ Interpret outcomes of hypothesis tests in context.
■ Understand the terms Type I error and Type II error in relation to hypothesis tests.
■ Calculate the probabilities of making Type I and Type II errors in specific situations involving tests based on a normal distribution or direct evaluation of binomial probabilities.


1.2 One-tailed and two-tailed hypothesis tests

WORKED EXAMPLE 1.2

A newspaper reported that 75% of students regularly cycle to college. A college dean believes that figure to be different at his college. He asks a sample of 160 students; 109 say they do cycle. Test the dean’s belief at the 5% level of significance.

Answer

H0 : p = 0.75, H1 : p ≠ 0.75

State the hypotheses.

X ~ B(160, 0.75) ≈ N(120, 30)

P(X ⩽ 109) = P(X < 109.5) = P(Z < (109.5 − 120)/√30) = P(Z < −1.917)

Remember to use a continuity correction.

= 0.0277 > 0.025

For a two-tailed test compare the test result with half the significance level.

Accept H0. There is insufficient evidence to show that the proportion cycling at the college is different.
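The normal approximation in Worked Example 1.2 can be checked numerically. The sketch below is only an illustration, not the book's method: it uses Python's standard library, computes Φ with `math.erf`, applies the continuity correction (109 becomes 109.5), and also sums the exact binomial tail for comparison. The helper names `normal_cdf`, `binom_cdf` and `approx_lower_tail` are hypothetical. Because the book reads Φ from tables, the last decimal place may differ slightly from 0.0277.

```python
from math import comb, erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function, Phi(z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def binom_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(k + 1))


def approx_lower_tail(k: int, n: int, p: float) -> float:
    """P(X <= k) via N(np, np(1 - p)), using the continuity correction k + 0.5."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    return normal_cdf((k + 0.5 - mean) / sd)


# Worked Example 1.2: H0: p = 0.75, H1: p != 0.75, n = 160, 109 observed
n, p0, observed, alpha = 160, 0.75, 109, 0.05
approx = approx_lower_tail(observed, n, p0)
exact = binom_cdf(observed, n, p0)

# Two-tailed test: compare the lower-tail probability with alpha / 2
decision = "reject H0" if approx < alpha / 2 else "accept H0"
print(f"normal approximation: {approx:.4f}")
print(f"exact binomial:       {exact:.4f}")
print(decision)
```

The exact binomial tail is a little larger than the approximation here, so both lead to the same conclusion as the worked example.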

EXERCISE 1B

1 Write down the null and alternative hypotheses for the following tests, defining the meaning of any parameters.
a Sofia has a coin that she thinks is biased, and wants to use a hypothesis test to check this.
b Max knows that, last year, 26% of entries in AS Level Psychology were graded ‘A’ and wants to check whether this proportion has changed.

2 In each part you are given null and alternative hypotheses (where p stands for the population proportion), the significance level and the observed data. Decide whether or not there is sufficient evidence to reject the null hypothesis.
a H0 : p = 0.4, H1 : p ≠ 0.4, significance level 8%, observed 35 successes out of 100 trials
b H0 : p = 0.8, H1 : p ≠ 0.8, significance level 5%, observed 17 successes out of 20 trials.

3 (PS) A magazine article reported that 70% of computer owners use the internet regularly. Marie believed that the true figure was different and she consulted 12 of her friends who owned computers. Twelve said that they were regular users of the internet.
a Test Marie’s belief at the 10% significance level.
b Comment on the reliability of the test in the light of Marie’s sample.

4 (PS) In the last general election, Party Z won 36% of the vote. An opinion poll surveys 100 people and finds that 45 support Party Z. Does this provide sufficient evidence at the 10% significance level that the proportion of voters who support Party Z has changed?

5 (PS) A teacher knows that in his old school, a third of all final-year students had a younger sibling at the school. He moves to a new school and wants to find out whether this proportion is different. He asks a sample of 60 final-year students, and finds that 27 of them have a younger sibling at the school. Conduct a hypothesis test at the 5% significance level to decide whether there is evidence that the proportion of final-year students with a younger sibling at the new school is different to that of the old school.

6 (PS) A large athletics club had the same running coach for several years. Records show that 28% of his athletes could run 100 metres in under 12 seconds. The club brings in a new coach and over the following year 26 out of a sample of 75 athletes recorded 100-metre times under 12 seconds. Do these data support the hypothesis that the proportion of athletes who can run 100 metres in under 12 seconds has changed? Use the 5% significance level for your test.

7 (PS) A jar contains a large number of coloured beads, some of which are red. A random sample of 80 of these beads is selected and 19 are found to be red. Test, at the 10% significance level, whether 30% of the beads in the jar are red.

8 (PS) In order to test a coin for bias it is tossed 12 times. The result is 9 heads and 3 tails. Test, at the 10% significance level, whether the coin is biased.

9 (PS) If births are equally likely on any day of the week then the proportion of babies born at the weekend should be 2/7. Out of a random sample of 490 children it was found that 132 were born at the weekend. Does this provide evidence, at the 5% significance level, that the proportion of babies born at the weekend differs from 2/7?

10 (PS) A test is constructed to see if a coin is biased. It is tossed 10 times and if there are 10 heads, 9 heads, 1 head or 0 heads it is declared to be biased. For each of the following, explain whether or not it could be the significance level for this test:
a 1%
b 2%
c 10%
d 20%

1.3 Type I and Type II errors

WORKED EXAMPLE 1.3

A researcher claims that when people estimate values, 70% of the time the values estimated are a multiple of 10. Jinan believes the percentage should be higher. She conducts her own experiment with a sample of 19 people and tests the claim at the 5% significance level.
a Show that the claim will be rejected if the estimates of 17 people are a multiple of 10, but not if the estimates of 16 people are a multiple of 10.
b Calculate the probability of a Type I error.
c If the proportion of people who estimate a multiple of 10 is 80%, find the probability that the test will result in a Type II error.

Answer

a H0 : p = 0.7, H1 : p > 0.7

The claim gives p = 0.7. It is a one-tailed test because Jinan believes that the percentage should be higher.

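X ~ B(19, 0.7)

P(X ⩾ 17) = P(X = 17) + P(X = 18) + P(X = 19) = ¹⁹C₁₇ × 0.7¹⁷ × 0.3² + ¹⁹C₁₈ × 0.7¹⁸ × 0.3 + 0.7¹⁹ = 0.0462

Calculate the probability of the tested value or a more extreme one.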

P(X ⩾ 16) = P(X = 16) + P(X ⩾ 17) = ¹⁹C₁₆ × 0.7¹⁶ × 0.3³ + 0.0462 = 0.133

Use the previous working if you can.

0.0462 < 0.05, whereas 0.133 > 0.05, so H0 is rejected if X = 17 but not if X = 16. The rejection region is X ⩾ 17.

A statement fully answers the question and shows you understand the calculations.

b P(Type I error) = P(X ⩾ 17 | p = 0.7) = 0.0462

For a Type I error, H0 is rejected when it should be accepted. We have found this probability in part a.

c

X ~ B(19, 0.8)

Use p = 0.8.

P(X < 17) = 1 − P(X ⩾ 17)

For a Type II error, H0 is accepted when it should be rejected. H0 is rejected for X ⩾ 17 so we need to find P(X < 17).

= 1 − (¹⁹C₁₇ × 0.8¹⁷ × 0.2² + ¹⁹C₁₈ × 0.8¹⁸ × 0.2 + 0.8¹⁹)

= 1 − 0.237 = 0.763
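All three parts of Worked Example 1.3 can be reproduced with exact binomial sums. In this sketch (standard-library Python, written for illustration), the hypothetical helper `critical_value` searches for the smallest c with P(X ⩾ c | p = 0.7) ⩽ 0.05, which gives the rejection region X ⩾ c. The Type I and Type II error probabilities then follow directly from that region.

```python
from math import comb


def upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ B(n, p); the sum is empty (0) when k > n."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(k, n + 1))


def critical_value(n: int, p0: float, alpha: float) -> int:
    """Smallest c such that P(X >= c | p0) <= alpha, for an upper-tailed test.

    Returns n + 1 (an empty rejection region) if even X = n is not significant.
    """
    for c in range(n + 2):
        if upper_tail(c, n, p0) <= alpha:
            return c
    return n + 1  # not reached: upper_tail(n + 1, ...) is always 0


n, p0, alpha = 19, 0.7, 0.05
crit = critical_value(n, p0, alpha)     # rejection region is X >= crit
type_1 = upper_tail(crit, n, p0)        # P(reject H0 | p = 0.7)
type_2 = 1 - upper_tail(crit, n, 0.8)   # P(accept H0 | p = 0.8)

print(f"reject H0 if X >= {crit}")
print(f"P(X >= 16 | p = 0.7) = {upper_tail(16, n, p0):.3f}")
print(f"P(Type I error)  = {type_1:.4f}")
print(f"P(Type II error) = {type_2:.3f}")
```

Running it should report the rejection region X ⩾ 17, together with values that agree with the 0.133, 0.0462 and 0.763 found above.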

EXERCISE 1C

1 (PS) A supplier of orchid seeds claims that their germination rate is 0.95. A purchaser of the seeds suspects that the germination rate is lower than this. In order to test this claim the purchaser plants 20 seeds in similar conditions and counts the number, X, which germinate. He rejects the claim if X ⩽ 17.
a Formulate suitable null and alternative hypotheses to test the seed supplier’s claim.
b What is the probability of a Type I error using this test?
c Calculate P(Type II error) if the probability that a seed germinates is in fact 0.80.

2 (PS) A manufacturer claims that the probability that an electric fuse is faulty is no more than 0.03. A purchaser tests this claim by testing a box of 500 fuses. A significance test is carried out at the 5% level using X, the number of faulty fuses in a box of 500, as the test statistic.
a For what values of X would you conclude that the probability that a fuse is faulty is greater than 0.03?
b Estimate P(Type I error) for this test.
c For this test estimate P(Type II error) if the probability that a fuse is faulty is, in fact, 0.06.

3 (PS) A newspaper reported that 55% of households own more than one television set. Each of a random sample of 12 households in a town is contacted and the number of households owning more than one television set is denoted by N. A test of whether the proportion p of households in the town owning more than one television set is greater than 55% is carried out. It is decided to accept that p is greater than 55% if N > 9.
a Calculate P(Type I error).
b Calculate P(Type II error) when the actual value of p is 60%.

4 (PS) It is suspected that the die used in a board game is biased away from a 6. In order to test this theory, the die is rolled 30 times and the number, X, of 6s is counted. If the number of 6s is less than 3 it is accepted that the die is biased away from a 6.
a Set up suitable null and alternative hypotheses for testing the theory that the die is biased away from a 6.
b Calculate the significance level of the test.
c State the probability of a Type I error.
d If, in fact, the probability of getting a 6 with the die is 0.1, calculate the probability of a Type II error.

5 (PS) A drug for treating phlebitis has proved effective in 75% of cases when it has been used. A new drug has been developed which, it is believed, will be more successful and it is used on a sample of 16 patients with phlebitis. A test is carried out to determine whether the new drug has a greater success rate than 75%. The test statistic is X, the number of patients cured by the new drug. It is decided to accept that the new drug is more effective if X > 14.
a Find α, the probability of making a Type I error.
b Find β, the probability of making a Type II error when the actual success rate is 80%.

6 (PS) Of a certain make of electric toaster, 10% have to be returned for repair within three months of purchase. A modification to the toaster is made in the hope that it will be more reliable. Out of 24 modified toasters sold in a store, none was returned for repair within three months of purchase. The proportion of all the modified toasters that are returned for repair within three months of purchase is denoted by p.
a State, in terms of p, suitable hypotheses for a test.
b Test whether there is evidence, at a nominal 10% significance level, that the modified toaster is more reliable than the previous model in that it requires fewer repairs.
c What is the probability of making a Type I error in the test?

7 (PS) It is known that many crimes are committed by people with backgrounds of drug abuse. A proportion of 60% has been suggested and, to investigate this, a researcher undertakes a study of 100 criminals and will carry out a test at a nominal 10% significance level. The null hypothesis is that the proportion of such criminals is 60% and the alternative hypothesis is that the proportion differs from 60%.
a Find the rejection region of the test.
b Find P(Type I error) for the test.
c Find P(Type II error) for the test when the actual proportion is 40%.

8 (PS) An investigator suspects that operatives using a spring balance are reluctant to give 0 as the last digit of a recorded weight, for example, 4.10 or 0.30. In order to test her theory she takes a random sample of 40 recorded weights and counts the number, X, which end in 0.
a State suitable hypotheses, involving a probability, for a hypothesis test which could indicate whether the operatives avoid ending a recorded weight with 0.
b Show that, for a test at the 10% significance level, the null hypothesis will be rejected if X = 1 but not if X = 2.
c State the rejection region for the test in terms of X.
d Calculate the value of P(Type I error).

END-OF-CHAPTER REVIEW EXERCISE 1

1 (PS) Angela is playing a board game with her friends, but thinks the die is biased and that a 6 is rolled too infrequently. In the subsequent 40 rolls of the die she got only three 6s. Test Angela’s belief at the 10% significance level.

2 (PS) The proportion of 18- to 19-year-old students in favour of a new uniform is known to be 70%. Rhianna wants to find out whether the proportion of 16- to 17-year-old students is the same. She proposes to test the null hypothesis H0 : p = 0.7 against two different alternative hypotheses, H1 : p < 0.7 and H2 : p ≠ 0.7, using the 10% significance level. Rhianna asks a sample of 25 students aged 16 to 17. The data give her sufficient evidence to reject H0 in favour of H1, but not in favour of H2. How many students in Rhianna’s sample were in favour of the new uniform?

3 (PS) A student tests the hypothesis H0 : p = 0.4 against H1 : p > 0.4, where p is the proportion of brown cats of a particular breed. In a sample of 80 cats of this breed, 40 were brown. This leads him to reject the null hypothesis. What can you say about the significance level he used for his test?

4 (PS) Sean has an eight-sided die and wants to check whether it is biased by looking at the probability, p, of rolling a 4. He sets up the following hypotheses:

H0 : p = 0.125, H1 : p ≠ 0.125

To test them he decides to roll the die 80 times and reject the null hypothesis if the number of 4s is greater than 15 or fewer than 5.
a Let X be the number of 4s observed out of the 80 rolls. State the name given to the region 5 ⩽ X ⩽ 15.
b What is the probability that Sean incorrectly rejects the null hypothesis?

5 (PS) A new cold relief drug is tested for effectiveness on 150 volunteers, and 124 of them found the drug beneficial. The manufacturers believe that more than 75% of people suffering from a cold will find the drug beneficial. Test the manufacturers’ belief at the significance level.

6 (PS) A supermarket buys a large batch of plastic bags from a manufacturer to be used in the store. In previous batches 7% of the bags were defective. A quality control manager wishes to test whether the batch has a higher defective rate than 7%, in which case the batch will be returned to the manufacturer. He examines 125 randomly selected bags and finds that 14 are defective. Carry out the manager’s test at the 3% significance level and state whether he should return the batch.

7 (PS) A doctor knows that 20% of people suffer from side effects when treated with a certain drug. He wants to see if the proportion of people suffering from side effects is lower with a new drug. He looks at a random sample of 30 people treated with the new drug. What is the largest number of people suffering from side effects that would still allow the doctor to conclude at 5% significance that the new drug has a lower proportion of side effects?

8 (PS) Aneka is investigating attitudes to sport among students at her school. She decides to carry out a survey using a sample of 70 students. One of Aneka’s questions is about participation in school sports teams. She wants to find out whether more than 40% of students play for a school team. She sets up the following hypotheses: H0 : p = 0.4, H1 : p > 0.4, where p is the proportion of students who play for a school sports team.
a Find the critical region for the hypothesis test at the 10% significance level, using a sample of 70 students.
b What is the probability of incorrectly rejecting the null hypothesis?
c In Aneka’s sample, 32 students play for a school team. State the conclusion of the test.

9 The proportion of students getting an A in AS Mathematics is currently 33%. A publisher produces a new textbook that they hope will lead to improved performance. They trial their textbook with a sample of 120 students and want to test their hypothesis at the 5% significance level. Find the critical region for this test.

10 A machine produces smartphone parts. Previous experience suggests that, on average, 7 in every 200 parts are faulty. After the machine was accidentally moved, a technician suspects that the proportion of faulty parts may have increased. She decides to test this hypothesis using a random sample of 85 parts.
a State suitable null and alternative hypotheses.
The technician decides that the critical region for the test should be X ⩾ 5. After checking her sample, she finds that 4 parts are faulty.
b
c What is the probability of incorrectly rejecting the null hypothesis?

Chapter 2 The Poisson distribution

■ Understand the Poisson distribution as a probability model.
■ Calculate probabilities using the Poisson distribution.
■ Solve problems involving linear combinations of independent Poisson distributions.
■ Use the Poisson distribution as an approximation to the binomial distribution.
■ Use the normal distribution as an approximation to the Poisson distribution.
■ Carry out hypothesis tests of a Poisson model.

2.1 Introduction to the Poisson distribution

WORKED EXAMPLE 2.1

The mean number of text messages Joel receives per day is 3.4.
a What assumptions do you need to make to use the Poisson distribution as a model for the number of text messages received per day by Joel?
b Assuming the Poisson distribution is a suitable model, find the probability of receiving, on one day:
i exactly five text messages
ii less than two text messages.

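Answer

a Text messages are received independently and at random.

Write clearly and in the context of the question.

b i P(X = 5) = e^(−3.4) × 3.4⁵/5! = 0.126

Use P(X = x) = e^(−λ) × λ^x/x!.

ii P(X < 2) = P(X = 0) + P(X = 1) = e^(−3.4)(1 + 3.4) = 0.147

Factorise e^(−λ) to simplify the working.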
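The formula P(X = x) = e^(−λ)λ^x/x! used in Worked Example 2.1 translates directly into code. This is a small standard-library Python sketch for checking answers; the function names `poisson_pmf` and `poisson_cdf` are hypothetical helpers, not part of any library.

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for X ~ Po(lam): e^(-lam) * lam^x / x!."""
    return exp(-lam) * lam**x / factorial(x)


def poisson_cdf(x: int, lam: float) -> float:
    """P(X <= x) for X ~ Po(lam), summing the terms r = 0, 1, ..., x."""
    return sum(poisson_pmf(r, lam) for r in range(x + 1))


lam = 3.4                               # mean number of texts per day
p_exactly_five = poisson_pmf(5, lam)    # part b i
p_fewer_than_two = poisson_cdf(1, lam)  # part b ii: P(X < 2) = P(X <= 1)

print(f"P(X = 5) = {p_exactly_five:.3f}")
print(f"P(X < 2) = {p_fewer_than_two:.3f}")
```

Note that "less than two" becomes P(X ⩽ 1), the same translation the worked answer makes when it adds only the terms for 0 and 1.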

EXERCISE 2A

1 A random variable X follows a Poisson distribution with mean 1.7. Copy and complete the following table of probabilities, giving results to three significant figures:

| x | 0 | 1 | 2 | 3 | 4 | > 4 |
|---|---|---|---|---|---|---|
| P(X = x) | | | | | | |

2 Calculate the following probabilities.
a If X ~ Po(2): i P(X = 3) ii P(X = 1)
b If Y ~ Po(1.4): i P(Y ⩽ 3) ii P(Y ⩽ 1)
c If X ~ Po(5.9): i P(X ⩾ 3) ii P(X > 1)
d If X ~ Po(11.4): i P(8 < X < 11) ii P(8 ⩽ X ⩽ 12)

3 The random variable T has a Poisson distribution with mean 3. Calculate:
a P(T = 2)
b P(T ⩽ 1)
c P(T ⩾ 3)

4 Given that U ~ Po(3.25), calculate:
a P(U = 3)
b P(U ⩽ 2)
c P(U ⩾ 2)

5 The random variable W has a Poisson distribution with mean 2.4. Calculate:
a P(W ⩽ 3)
b P(W ⩾ 2)
c P(W = 3)

6 The random variable X has a Poisson distribution with mean 5. Calculate:
a P(X ⩽ 5)
b P(3 < X ⩽ 5)
c P(X ≠ 4)

7 (PS) The number of eagles observed in a forest in one day follows a Poisson distribution with mean 1.4.
a Find the probability that more than three eagles will be observed on a given day.
b Given that at least one eagle is observed on a particular day, find the probability that exactly two eagles are seen that day.

8 (M) Seven observations of the random variable X, the number of power surges per day in a power cable, are shown below:
0, 1, 2, 2, 3, 4, 6
a Estimate the mean and standard deviation of X, based upon these observations.
b Use your answer to a to explain why the Poisson distribution is a plausible model for X.

9 The number of mistakes a teacher makes whilst marking homework has a Poisson distribution with a mean of 1.6 errors per piece of homework. Find the probability that there are at least two marking errors in a randomly chosen piece of homework.

10 The number of particles emitted per second by a radioactive source has a Poisson distribution with mean 5. Calculate the probabilities of the following emissions in a time interval of 1 second:
a 0
b 1
c 2
d 3 or more.

11 (M) For each of the following situations state whether the Poisson distribution would provide a suitable model. Give reasons for your answers.
a The number of cars per minute passing under a road bridge between 10 a.m. and 11 a.m. when the traffic is flowing freely.
b The number of cars per minute entering a city-centre car park on a busy Saturday between 9 a.m. and 10 a.m.
c The number of particles emitted per second by a radioactive source.
d The number of raisins in cakes sold at a particular baker’s shop on a particular day.
e The number of blood cells per ml in a dilute solution of blood which has been left standing for 24 hours.
f The number of blood cells per ml in a well-shaken dilute solution of blood.

2.2 Adapting the Poisson distribution for different intervals

WORKED EXAMPLE 2.2

On average, flaws occur in a roll of cloth at the rate of 3.6 per metre. Assuming a Poisson distribution is appropriate, find the probability of:
a exactly nine flaws in three metres of cloth
b less than three flaws in half a metre of cloth.

Answer

Use the interval to determine the mean, λ. For three metres, λ = 3 × 3.6 = 10.8.

a P(X = 9) = e^(−10.8) × 10.8⁹/9! = 0.112

b For half a metre, λ = 0.5 × 3.6 = 1.8.

P(X < 3) = P(X ⩽ 2) = e^(−1.8)(1 + 1.8 + 1.8²/2!) = 0.731
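The key step in Worked Example 2.2 is scaling the mean in proportion to the length of the interval before using the Poisson formula. A minimal standard-library Python sketch of that idea follows; the helper `poisson_pmf` and the variable names are illustrative assumptions.

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for X ~ Po(lam)."""
    return exp(-lam) * lam**x / factorial(x)


rate_per_metre = 3.6

# Scale the mean in proportion to the length of cloth considered
lam_three_m = rate_per_metre * 3.0   # 10.8 flaws expected in 3 m
lam_half_m = rate_per_metre * 0.5    # 1.8 flaws expected in 0.5 m

p_a = poisson_pmf(9, lam_three_m)                        # exactly nine flaws
p_b = sum(poisson_pmf(r, lam_half_m) for r in range(3))  # fewer than three

print(f"a: {p_a:.3f}  b: {p_b:.3f}")
```

Scaling λ is valid only under the usual Poisson assumption that flaws occur independently and at a constant average rate along the cloth.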

EXERCISE 2B

1 State the distribution of the variable in each of the following situations.
a Cars pass under a motorway bridge at an average rate of 6 per 10-second period.
i The number of cars passing under the bridge in one minute.
ii The number of cars passing under the bridge in 15 seconds.
b Leaks occur in water pipes at an average rate of 12 per kilometre.
i The number of leaks in 200 m.
ii The number of leaks in 10 km.
c On average, 12 worms are found in a 1 m² area of a garden.
i The number of worms found in a 0.3 m² area.
ii The number of worms found in a 2 m by 3 m area of garden.

2 (PS) From a particular observatory, shooting stars are observed in the night sky at an average rate of one every five minutes. Assuming that this rate is constant and that shooting stars occur (and are observed) independently of each other, what is the probability that more than 20 are seen over a period of one hour?

3 A wire manufacturer is looking for flaws. Experience suggests that there are on average 1.8 flaws per metre in the wire.
a Determine the probability that there is exactly one flaw in one metre of wire.
b Determine the probability that there is at least one flaw in two metres of wire.

4 Salah is sowing flower seeds in his garden. He scatters seeds randomly so that the number of seeds falling on any particular region is a random variable with a Poisson distribution, with mean value proportional to the area. He intends to sow fifty thousand seeds over an area of 2 m².
a Calculate the expected number of seeds falling on a 1 cm² region.
b Calculate the probability that a given 1 cm² area receives no seeds.

5 On average, 15 customers a minute arrive at the checkouts of a busy supermarket. Assuming that a Poisson distribution is appropriate, calculate:
a the probability that no customers arrive at the checkouts in a given 10-second interval
b the probability that more than three customers arrive at the checkouts in a 15-second interval.

6 On average, a cycle shop sells 1.8 cycles per week. Assuming that the sales occur at random:
a find the probability that exactly two cycles are sold in a given week
b find the probability that exactly four cycles are sold in a given two-week period.

7 The number of demands for taxis to a taxi firm is Poisson distributed with, on average, 4 demands every 30 minutes. Find the probabilities of:
a one demand in one hour
b fewer than two demands in 15 minutes.

8 (PS) Assume that cars pass under a bridge at a rate of 100 per hour and that a Poisson distribution is appropriate.
a What is the probability that no cars will pass under the bridge during a given 3-minute period?
b For what time interval is the probability that no car will pass under the bridge during that interval at least 0.25?

9 If an average of 12 buses per hour arrive at a bus stop, find the probability that there are more than 6 buses in 30 minutes.

2.3 The Poisson distribution as an approximation to the binomial distribution

WORKED EXAMPLE 2.3

At a factory, items on a production line are defective at random with a constant probability of 0.02. The items are packed in boxes of 250. Find the probability that a randomly chosen box contains fewer than seven defective items.

Answer

There is a fixed number of items in each box so we can model the situation by a binomial distribution, X ~ B(250, 0.02). For the binomial, n is large and p small so we can approximate to a Poisson distribution where λ = np = 250 × 0.02 = 5.

X ~ Po(5)

P(X < 7) = P(X ⩽ 6) = e^(−5)(1 + 5 + 5²/2! + 5³/3! + 5⁴/4! + 5⁵/5! + 5⁶/6!) = 0.762
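How good is the approximation in Worked Example 2.3? The sketch below, a standard-library Python illustration rather than anything from the book, computes P(X ⩽ 6) both from the exact distribution B(250, 0.02) and from the approximating Po(5), so the two can be compared directly.

```python
from math import comb, exp, factorial

n, p = 250, 0.02
lam = n * p  # Poisson mean np = 5

# Exact binomial P(X <= 6) for X ~ B(250, 0.02)
exact = sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(7))

# Poisson approximation P(X <= 6) for X ~ Po(5)
approx = sum(exp(-lam) * lam**r / factorial(r) for r in range(7))

print(f"binomial: {exact:.4f}  Poisson: {approx:.4f}")
```

With n large and p small the two values differ only in the third decimal place, which is why the approximation is acceptable here; the agreement worsens as p grows.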

EXERCISE 2C

1 a There are 1000 pupils in a school. Find the probability that exactly three of them have their birthdays on 1 January, by using:
i a suitable binomial distribution
ii a suitable Poisson approximation.
b There are 5000 students in a university. Calculate the probability that exactly 15 of them have their birthdays on 1 January, by using:
i a suitable binomial distribution
ii a suitable Poisson approximation.

For the rest of the exercise, use the Poisson approximation to the binomial distribution where appropriate.

2 Calculate P(X < 3) given that X ~ B(100, 0.02).

3 If X ~ B(200, 0.03) find P(X ⩽ 3).

4 If X ~ B(300, 0.004) find:
a P(X < 3)
b P(X > 4).

5 The probability that a patient has a particular disease is 0.008. One day 80 people go to their doctor.
a What is the probability that exactly two of them have the disease?
b What is the probability that three or more of them have the disease?

6 (PS) The probability of success in an experiment is 0.01. Find the probability of four or more successes in 100 trials of the experiment.

7 When eggs are packed in boxes the probability that an egg is broken is 0.008.
a What is the probability that in a box of six eggs there are no broken eggs?
b Calculate the probability that in a consignment of 500 eggs fewer than four eggs are broken.

8 (PS) When a large number of flashlights leaving a factory is inspected, it is found that the bulb is faulty in 1% of the flashlights and the switch is faulty in 1.5% of them. Assuming that the faults occur independently and at random, find:
a the probability that a sample of 10 flashlights contains no flashlights with a faulty bulb
b the probability that a sample of 80 flashlights contains at least one flashlight with both a faulty bulb and a faulty switch
c the probability that a sample of 80 flashlights contains more than two faulty flashlights.

9 (M) Explain why a Poisson distribution is not a suitable approximation to use to calculate the following probabilities.
a P(X < 10) given that X ~ B(60, 0.3)
b P(X < 2) given that X ~ B(10, 0.1)

2.4 Using the normal distribution as an approximation to the Poisson distribution

WORKED EXAMPLE 2.4

It is thought that potholes on a given stretch of road occur at the rate of 1.6 per week. Assuming that potholes occur independently and at random, find the probability of fewer than 100 potholes occurring on that stretch of road in one year.

Answer

Let X be the number of potholes on the given stretch of road in one year.

X ~ Po(83.2) ≈ N(83.2, 83.2)

Define the random variable and its parameters. In one year, 52 weeks, λ = 52 × 1.6 = 83.2. For large values of λ, it is appropriate to use a normal approximation N(λ, λ).

P(X < 100) = P(X < 99.5) = P(Z < (99.5 − 83.2)/√83.2) = P(Z < 1.787) = 0.963

For the probability, standardise using Z = (X − λ)/√λ. Remember to use a continuity correction.
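The approximation in Worked Example 2.4 can be compared with the exact Poisson probability. In this standard-library Python sketch (the helpers `normal_cdf` and `poisson_cdf` are hypothetical names), the exact sum builds each term from the previous one, term_r = term_(r−1) × λ/r, which avoids computing very large powers and factorials when λ = 83.2.

```python
from math import erf, exp, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function, Phi(z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def poisson_cdf(k: int, lam: float) -> float:
    """Exact P(X <= k) for X ~ Po(lam), updating term_r = term_{r-1} * lam / r."""
    term = exp(-lam)  # P(X = 0)
    total = term
    for r in range(1, k + 1):
        term *= lam / r
        total += term
    return total


lam = 52 * 1.6  # 83.2 potholes expected in one year

# P(X < 100) = P(X <= 99); the continuity correction puts the boundary at 99.5
approx = normal_cdf((99.5 - lam) / sqrt(lam))
exact = poisson_cdf(99, lam)

print(f"normal approximation: {approx:.3f}  exact Poisson: {exact:.3f}")
```

The two agree to within a few thousandths; the small gap comes from the slight right skew of the Poisson distribution, which the symmetric normal curve cannot capture.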

EXERCISE 2D

1 If X ~ Po(30) find:
a P(X ⩽ 31)
b P(35 ⩽ X ⩽ 40)
c P(29 ⩽ X ⩽ 32)

2 Accidents occur in a factory at an average rate of five per month. Find the probability that:
a there will be fewer than four accidents in one month
b there will be exactly 62 accidents in one year.

3 The number of accidents on a road follows a Poisson distribution with a mean of 8 per week. Find the probability that in one year (assumed to be 52 weeks) there will be fewer than 400 accidents.

4 Insect larvae are distributed at random in a pond at a mean rate of 8 per m³ of pond water. The pond has a volume of 40 m³. Calculate the probability that there are more than 350 insect larvae in the pond.

5 Water taken from a river contains on average 16 bacteria per ml. Assuming a Poisson distribution, find the probability that 5 ml of the water contains:
a between 65 and 85 bacteria, inclusive
b exactly 80 bacteria.

6 A company receives an average of 40 telephone calls an hour. The number of calls follows a Poisson distribution.
a Find the probability that there are between 35 and 50 calls (inclusive) in a given hour.
b Find the probability that there are exactly 42 calls in a given hour.

7 (PS) Given X ~ Po(50) and P(X > x) ⩽ 0.05, find the minimum integer value of x.

8 (PS) Sales of cooking oil bought in a shop during a week follow a Poisson distribution with mean 100. How many units should be kept in stock to be at least 99% certain that supply will be able to meet demand?

2.5 Hypothesis tests with the Poisson distribution

WORKED EXAMPLE 2.5

A company receives complaints at an average rate of 4.2 per day. Following a company reorganisation, no complaints are received the next day.
a Test at the 5% level of significance whether the number of complaints had decreased.
In the following week (7 days) there are 21 complaints.
b Does this information affect your previous conclusion? Comment on the reliability of your conclusions.

Answer

a Let X be the random variable for the number of complaints received.

State the random variable and the distribution.

X ~ Po(4.2) H0: λ = 4.2, H1: λ < 4.2

State the null and alternative hypotheses.

P(X = 0) = e^(−4.2) = 0.015

Calculate the probability.

0.015 < 0.05 Reject H0.

Compare the probability with the significance level and interpret the result.

There is sufficient evidence to show that the number of complaints had decreased.

b

X ~ Po(29.4) ≈ N(29.4, 29.4)

State the distribution and the approximating distribution. Mean = 4.2 × 7 = 29.4

H0 : μ = 29.4, H1 : μ < 29.4

State the null and alternative hypotheses.

P(X ⩽ 21) = P(X < 21.5) = P(Z < (21.5 − 29.4)/√29.4) = P(Z < −1.457) = 0.0725

Calculate the probability. Remember to use a continuity correction.

0.0725 > 0.05 Accept H0.

Compare the probability to the significance level and interpret the result.

There is insufficient evidence to suggest that the number of complaints had decreased. The conclusions differ. The second conclusion is likely to be more reliable as the data was collected over a longer period of time.

Make sure you answer all parts of the question fully.
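The two probabilities in Worked example 2.5 can be reproduced numerically. This sketch uses only the Python standard library; Φ is computed from `math.erf`. It also computes the exact Poisson value of P(X ⩽ 21) for comparison, which is an addition to the worked example rather than part of it.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function, computed from the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Part a: X ~ Po(4.2); 0 complaints observed, lower-tail test
p_a = math.exp(-4.2)                      # P(X = 0) = P(X <= 0)

# Part b: weekly total ~ Po(29.4), approximated by N(29.4, 29.4)
lam = 7 * 4.2
z = (21 + 0.5 - lam) / math.sqrt(lam)     # continuity correction: P(X <= 21) -> P(Y < 21.5)
p_b = normal_cdf(z)

# Exact Poisson tail for comparison: P(X <= 21) with X ~ Po(29.4)
term = math.exp(-lam)
p_exact = term
for k in range(1, 22):
    term *= lam / k
    p_exact += term

print(f"a: {p_a:.4f}  b (normal approx): {p_b:.4f}  b (exact): {p_exact:.4f}")
```

Both the normal approximation and the exact value lie above 0.05, so the conclusion in part b does not depend on the approximation.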

EXERCISE 2E PS

1

The random variable X has a Poisson distribution with mean λ. A single observation of X has the value 4. Test the null hypothesis λ = 2 against the alternative hypothesis λ > 2 at the 5% significance level.

PS

2

The number of car accidents that occur along a certain stretch of road may be assumed to have a Poisson distribution with mean four per week. In the first two weeks after a new warning sign had been erected, three accidents occurred on the road. a

Test, at the 5% significance level, whether this indicates a reduction in the mean accident rate. During the next three weeks, eleven accidents occurred on the road.

b

Does this extra information alter the conclusion of the test?
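For small means, the p-value in questions 1 and 2 is an exact Poisson tail probability. The sketch below assumes one-tailed tests in the directions stated, and the helper `poisson_p_value` is a hypothetical name. Part 2b is handled by pooling all five weeks, which is one possible approach rather than the only one.

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam); returns 0 for k < 0."""
    if k < 0:
        return 0.0
    term = total = math.exp(-lam)
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total

def poisson_p_value(observed, lam, tail):
    """One-tailed p-value: P(X >= observed) if tail == 'upper', P(X <= observed) if 'lower'."""
    if tail == "upper":
        return 1 - poisson_cdf(observed - 1, lam)
    return poisson_cdf(observed, lam)

alpha = 0.05
# Question 1: H0: lambda = 2, H1: lambda > 2, one observation x = 4
p1 = poisson_p_value(4, 2, "upper")
# Question 2a: two weeks at 4 per week -> Po(8) under H0; 3 accidents observed
p2a = poisson_p_value(3, 8, "lower")
# Question 2b (pooled): five weeks -> Po(20) under H0; 3 + 11 = 14 accidents
p2b = poisson_p_value(14, 20, "lower")
for name, p in [("1", p1), ("2a", p2a), ("2b", p2b)]:
    print(name, round(p, 4), "reject H0" if p < alpha else "do not reject H0")
```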

PS

3

For a particular secretary, the number of errors in a page of text has a Poisson distribution with mean 1.4. A proofreader finds four errors on a randomly chosen page of a document. Test, at the 10% significance level, whether this indicates that the document was not produced by the secretary.

PS

4

Over a period of time it has been found that the mean number of letters per week passing through a small sorting office is 3245. In the week following a campaign to promote letter writing, 3455 letters passed through the office. Assuming that the number of letters per week can be modelled by a Poisson distribution, test, at the 5% level, whether there is evidence that the publicity campaign has been effective.
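With a mean as large as 3245, the usual approach is the normal approximation N(3245, 3245) with a continuity correction. A minimal sketch, standard library only:

```python
import math

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

lam = 3245        # mean letters per week under H0
observed = 3455   # letters in the week after the campaign

# Po(3245) is approximated by N(3245, 3245); P(X >= 3455) becomes P(Y > 3454.5)
z = (observed - 0.5 - lam) / math.sqrt(lam)
p_value = 1 - normal_cdf(z)
print(f"z = {z:.3f}, p = {p_value:.6f}")
print("reject H0" if p_value < 0.05 else "do not reject H0")
```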

PS

5

The average number of calls received each day by a telephone help line was 1.5. After a publicity campaign in the press and on radio, it was found that the total number of calls to the line, over a period of two days, was five.

a

State a suitable probability distribution to use in a test of whether the daily average number of calls to the line has increased. What must be assumed for the chosen probability distribution to be valid?

b

Carry out the test at the 5% significance level.

PS

6

A store discovers that its credit card machine rejects, on average, one credit card in every 890 transactions. Let X denote the number of rejections in a randomly chosen 2136 transactions.

a

Explain why the distribution of X may be approximated by a Poisson distribution.

On a particular day when there were 2136 transactions the number of rejected credit cards was six.

b

Test, at the 5% significance level, whether there is evidence that the average number of rejected credit cards has increased.

PS

7

A company manufactures 5 amp fuses and, under normal conditions, 7% of the fuses are faulty. They are packed in boxes of 60. a

Explain why the number of faulty fuses in a randomly chosen box has an approximate Poisson distribution.

b

A box randomly chosen from a day’s production has one faulty fuse. Test, at the 5% significance level, whether the percentage of faulty fuses on that day is lower than 7%.

PS

8

In the past an office photocopier has failed, on average, three times every two weeks. A new, more expensive, photocopier, which the manufacturers claim is more reliable, is on trial. In the first four weeks of use this new photocopier fails once. Assuming that the failures of the photocopier occur independently and at random, test, at the 5% significance level, whether there is evidence that the new photocopier is more reliable than the old one.

PS

9

The number of red cells in a small standard volume of blood of a healthy person is modelled by a Poisson distribution with mean 20. A doctor suspects that Rani has an abnormally high red cell count so she is given a blood test. The number of red cells in a standard volume of her blood is denoted by R. A statistical test of whether the doctor’s suspicion is confirmed is carried out. If R ⩽ 25, Rani’s blood count is accepted as normal. a

Estimate the probability of making a Type I error in the test.

b

Estimate the probability of making a Type II error when the mean is actually 30.
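Both error probabilities in question 9 follow from the decision rule "accept as normal if R ⩽ 25". The sketch below computes them from exact Poisson sums rather than the normal approximation the word "estimate" may suggest. It uses only the Python standard library.

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam)."""
    term = total = math.exp(-lam)
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total

# Decision rule: accept H0 (mean 20) if R <= 25, otherwise reject H0.
type_1 = 1 - poisson_cdf(25, 20)   # P(R > 25 | mean 20): rejecting H0 when it is true
type_2 = poisson_cdf(25, 30)       # P(R <= 25 | mean 30): accepting H0 when it is false
print(round(type_1, 4), round(type_2, 4))
```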

END-OF-CHAPTER REVIEW EXERCISE 2

1

The random variable X follows a Poisson distribution. Given that P(X ⩽ 0) = 0.1, find:

a

the mean of the distribution

b

P(2 < X < 6).

PS

2

Patients arrive at random at an emergency room in a hospital at the rate of 14 per hour throughout the day.

a

Find the probability that exactly four patients will arrive at the emergency room between 18:00 and 18:15.

b

Given that fewer than 15 patients arrive in one hour, find the probability that more than 12 arrive.

3

Accidents on a busy urban road occur at a mean rate of two per week. Assuming that the number of accidents per week follows a Poisson distribution, calculate the probability that:

a

there will be no accidents in a particular week

b

there will be exactly two accidents in a particular week

c

there will be fewer than three accidents in a given two-week period.

PS

4

During the period from May 2014 to April 2017, 18 laptop computers were lost by employees at an international company. After a vigorous enquiry it was hoped that the rate of loss would drop.

a

State what must be assumed for the number of laptop computers lost during a fixed period of time to have a Poisson distribution.

b

Find the greatest number of laptop computers that can be lost during the next year in order to have sufficient evidence, at the 2% significance level, of a drop in the loss rate.

PS

5

A machine that weaves a carpet of width 2 m produces slight flaws in the carpet at a rate of 1.8 per metre length.

a

State what must be assumed for the number of flaws in a given length of carpet to have a Poisson distribution.

b

After the machine is given an overhaul, a random sample of 3 m length of the carpet is examined and found to have two flaws. Test, at the 5% significance level, whether the rate of incidence of the flaws has decreased.

c

A further 8.5 m length of the carpet is found to have six flaws. Pooling the two results, determine whether the conclusion of the above test changes.

PS

6

Wild flowers of a certain species grow randomly in a forest area and at a uniform rate of 7.6 per 10 000 m².

a

Suggest a suitable probability distribution for the number of flowers of the species that grow in an area of 2500 m² of the forest.

After an unusually busy month, when the forest was visited by a large number of tourists, the forest managers wished to investigate whether the number of flowers of the species had decreased. In a pilot test a randomly chosen 2500 m² of the forest was studied and found to contain no flowers of the species.

b

Test whether this indicates, at the 5% significance level, that the number of flowers of the species has decreased.

c

In a further study, the managers examined 50 randomly selected areas of 2500 m² of forest. In 13 of the areas no flower of the species was found. Using a significance level of 5%, test whether this indicates that the number of flowers of the species has decreased.

PS

7

In patients given laser surgery to treat astigmatism and short-sightedness, the proportion who suffer complications is reported to be, on average, 1 in 20. A monitoring agency is concerned that a newly-formed company that offers this treatment appears to have a high number of reported complications. Records of the first 60 patients treated by the company are obtained and the agency carries out a test, at a 10% significance level, of whether the true proportion of patients suffering complications is 1/20. The number of patients who suffer complications is denoted by N.

a

Explain why the distribution of N can be approximated by a Poisson distribution.

b

State, with a reason, whether the agency should carry out a one-tail or a two-tail test.

c

Show that the null hypothesis will be rejected if N = 6, but not if N = 5.

d

State the type of error that might be made in the cases:

i

N = 4

ii

N = 10.

e

Estimate the probability of making a Type II error in the test when the actual proportion of complications attributed to the company is .

PS

8

The number of times that a printing machine stops for attention during a given week has a Poisson distribution with mean 3.7. The machine undergoes some intensive adjustment and a two-tail test is carried out, based on the total number of stoppages, X, that occur over a period of 6 weeks. The test is of whether the mean number of stoppages per week has changed. The nominal significance level of the test is 5%.

a

Find, in terms of X, the rejection region of the test.

b

Estimate the actual significance level of the test.

c

State the conclusion of the test for the case X = 18.

d

Estimate the probability of making a Type II error when the actual mean number of weekly stoppages (after the adjustments) is 4.0.

9

It is thought that the number of serious accidents, X, in a time interval of t weeks, on a given stretch of road, can be modelled by a Poisson distribution with mean 0.4t. Find the probability of:

a

one or fewer accidents in a randomly chosen 2-week interval

b

12 or more accidents in a randomly chosen year.
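Several answers in this review exercise can be checked numerically. The sketch below covers question 1 (where e^(−λ) = 0.1 gives λ = ln 10), question 7c (N ~ Po(3) under H0, assuming an upper-tail test at 10%) and question 8a and b (six-week total ~ Po(22.2), two-tailed at a nominal 5%, putting at most 2.5% in each tail). The 2.5%-per-tail rule is an assumption about how the rejection region is chosen.

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam); 0 for k < 0."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1)) if k >= 0 else 0.0

# Question 1: P(X <= 0) = e^(-lam) = 0.1, so lam = ln 10
lam1 = math.log(10)
p1b = sum(poisson_pmf(k, lam1) for k in (3, 4, 5))     # P(2 < X < 6)

# Question 7c: N ~ Po(60 * 1/20) = Po(3) under H0, upper-tail test at 10%
p_n6 = 1 - poisson_cdf(5, 3)    # P(N >= 6)
p_n5 = 1 - poisson_cdf(4, 3)    # P(N >= 5)

# Question 8: six-week total X ~ Po(6 * 3.7) = Po(22.2) under H0
lam8 = 6 * 3.7
lower = max(c for c in range(100) if poisson_cdf(c, lam8) <= 0.025)             # reject if X <= lower
upper = min(c for c in range(1, 100) if 1 - poisson_cdf(c - 1, lam8) <= 0.025)  # reject if X >= upper
actual_level = poisson_cdf(lower, lam8) + 1 - poisson_cdf(upper - 1, lam8)
print(round(lam1, 4), round(p1b, 4), round(p_n6, 4), round(p_n5, 4))
print("reject if X <=", lower, "or X >=", upper, "; actual level", round(actual_level, 4))
print("X = 18 in rejection region:", 18 <= lower or 18 >= upper)
```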

Chapter 3 Linear combinations of random variables

■ Find means and variances of linear combinations of random variables.
■ Calculate probabilities of linear combinations of random variables.
■ Solve problems involving linear combinations of random variables.

3.1 Expectation and variance WORKED EXAMPLE 3.1 The random variable X is the number of heads obtained when two unbiased coins are spun. Work out E(4X − 2) and Var(4X − 2). Answer If E(X) and Var(X) are not given in the question, first work out their values.

Apply the general results E(aX + b) = aE(X) + b and Var(aX + b) = a²Var(X).
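The two steps above, finding E(X) and Var(X) and then applying the linear rules, can be checked with exact fractions. This sketch uses Python's standard `fractions` module. The helper names are illustrative, and the second example uses the table from Exercise 3A question 1.

```python
from fractions import Fraction as F

def mean_var(dist):
    """Mean and variance of a discrete distribution given as {value: probability}."""
    mean = sum(x * p for x, p in dist.items())
    var = sum(x * x * p for x, p in dist.items()) - mean ** 2
    return mean, var

def linear(a, b, mean, var):
    """E(aX + b) and Var(aX + b) from E(X) and Var(X)."""
    return a * mean + b, a ** 2 * var

# Worked example 3.1: X = number of heads when two fair coins are spun
heads = {0: F(1, 4), 1: F(1, 2), 2: F(1, 4)}
m, v = mean_var(heads)                  # E(X) = 1, Var(X) = 1/2
print(linear(4, -2, m, v))              # E(4X - 2), Var(4X - 2)

# Exercise 3A question 1
dist = {2: F(3, 10), 3: F(1, 4), 5: F(7, 20), 7: F(1, 10)}
m1, v1 = mean_var(dist)
print(m1, v1, linear(3, 1, m1, v1))
```

Note that the constant b shifts the mean but has no effect on the variance.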

EXERCISE 3A 1

The random variable X has the probability distribution given in the table.

x           2      3      5      7
P(X = x)    0.3    0.25   0.35   0.1

a

Find E(X) and Var(X).

b

Use your answers to part a to calculate E(3X + 1) and Var(3X + 1).

2

Let X be a random variable with E(X) = 4 and Var(X) = 2. Find:

a

E(3X)

b

Var(3X)

3

The random variable X has the probability distribution shown below.

x           0      1      2
P(X = x)    0.2    0.3    0.5

a

Find E(X) and Var(X).

A variable Y is defined by Y = 3X + 2.

b

c

Verify your answers to part b by calculating E(Y) and Var(Y) from the probability distribution of Y.

4

A random variable X has mean 24 and variance 5. Find the mean and variance of:

a

20 − X

b

4X − 7

M

5

The random variable X has expectation 5 and variance 0.6.

a

Find:

i

E(2X + 3)

ii

Var(2X + 3)

b

Find two pairs of values for the constants a and b where E(aX + b) = 14 and Var(aX + b) = 20.

6

The random variable X ~ B(12, 0.3). Find:

a

E(2X − 1)

b

Var(2X − 1).

7

Given that X is the score from a single roll of a fair six-sided die, find:

a

E(2X + 5)

b

Var(2X + 5).

PS

8

The temperature, in degrees Fahrenheit, on a remote island is a random variable with mean 59 and variance 27. Find the mean and variance of the temperature in degrees Celsius, given that to convert degrees Fahrenheit to degrees Celsius you subtract 32 and then multiply by 5/9.
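Questions 6 to 8 all reduce to E(aX + b) = aE(X) + b and Var(aX + b) = a²Var(X) once E(X) and Var(X) are known. The sketch below, using exact fractions from the standard library, takes np and np(1 − p) for the binomial case, computes the die moments directly, and writes the Celsius conversion as C = (5/9)F − 160/9.

```python
from fractions import Fraction as F

def linear(a, b, mean, var):
    """E(aX + b) and Var(aX + b) = a^2 Var(X)."""
    return a * mean + b, a * a * var

# Question 6: X ~ B(12, 0.3), so E(X) = np and Var(X) = np(1 - p)
n, p = 12, F(3, 10)
q6 = linear(2, -1, n * p, n * p * (1 - p))

# Question 7: score on a fair die
die = range(1, 7)
mean_die = F(sum(die), 6)
var_die = F(sum(x * x for x in die), 6) - mean_die ** 2
q7 = linear(2, 5, mean_die, var_die)

# Question 8: C = (5/9)(F - 32) = (5/9)F - 160/9
q8 = linear(F(5, 9), F(-160, 9), 59, 27)
print(q6, q7, q8)
```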

3.2 Sum and difference of independent random variables WORKED EXAMPLE 3.2 The oranges for sale at two shops are of different sizes. The mean mass of an orange at shop A is 154 g and the variance is 13 g², while at shop B the oranges have mean mass 102 g and variance 4 g². Joy believes that the total mass of two oranges chosen at random from shop A will be more than three times the mass of a randomly selected orange from shop B. Assuming that all oranges are chosen independently, find the mean and variance of the random variable C, where C is the amount by which two oranges from shop A are heavier than three times an orange from shop B. Answer Let A be the mass of an orange from shop A and B be the mass of an orange from shop B.

Note that the two oranges from shop A are chosen independently, while from shop B you are interested in three times the mass of one orange.
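The distinction in the note above shows up directly in the arithmetic for C = A₁ + A₂ − 3B. Each of the two separate oranges from shop A contributes its own variance, while the single orange from shop B is multiplied by 3, so its variance is multiplied by 3². A short sketch:

```python
# Worked example 3.2: C = A1 + A2 - 3B, with A1, A2 and B independent
mean_A, var_A = 154, 13     # grams, grams^2
mean_B, var_B = 102, 4

mean_C = mean_A + mean_A - 3 * mean_B
# Two separate oranges from shop A: add one variance per orange.
# Three times ONE orange from shop B: Var(3B) = 9 Var(B).
var_C = var_A + var_A + 3 ** 2 * var_B

# For comparison only: three separate oranges B1 + B2 + B3 would give 3 Var(B) instead.
var_if_three_separate = var_A + var_A + 3 * var_B
print(mean_C, var_C, var_if_three_separate)
```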

EXERCISE 3B 1

Let X and Y be two independent variables with E(X) = −1, Var(X) = 2, E(Y) = 4 and Var(Y) = 4. Find the expectation and variance of:

a

i

X − Y

ii

X + Y

b

i

3X + 2Y

ii

2X − 4Y

c

i ii

2

Let X and Y be two independent variables with E(X) = 4, Var(X) = 2, E(Y) = 1 and Var(Y) = 6. Find:

a

E(3X − Y + 1)

b

Var(3X − Y + 1).

3

The random variable S is the score when an ordinary fair die is thrown. The random variable T is the number of tails obtained when a fair coin is tossed once. Find:

a

E(S)

b

Var(S)

c

E(6T)

d

Var(6T)

e

E(6S + 6T)

f

Var(6S + 6T).

P

4

The random variable X is the number of even numbers obtained when two ordinary fair dice are thrown. The random variable Y is the number of even numbers obtained when two fair pentagonal spinners, each numbered 1, 2, 3, 4, 5, are spun simultaneously.

a

Copy and complete the following probability distributions.

x           0      1      2
P(X = x)    0.25

y           0      1      2
P(Y = y)    0.36

x + y               0      1      2      3      4
P(X + Y = x + y)    0.09

b

By using the probability distributions, find:

i

E(X)

ii

Var(X)

iii

E(Y)

iv

Var(Y)

v

E(X + Y)

vi

Var(X + Y).

c

Verify that E(X + Y) = E(X) + E(Y) and Var(X + Y) = Var(X) + Var(Y).

5

X and Y are independent random variables with probability distributions as shown.

x           1      2      3
P(X = x)    0.4    0.2    0.4

y           1      2      3
P(Y = y)    0.3    0.5    0.2

You are given that E(X) = 2, Var(X) = 0.8, E(Y) = 1.9 and Var(Y) = 0.49. The random variable T is defined as 2X − Y. Find E(T) and Var(T).

6

The independent random variables W, X and Y have means 10, 8 and 6 respectively and variances 4, 5 and 3 respectively. Find:

a

E(W + X + Y)

b

Var(W + X + Y)

c

E(2W − X − Y)

d

Var(2W − X − Y).

7

The random variable X has the probability distribution shown in the table.

x           1      2      3      4
P(X = x)

Find the mean and variance of the distribution of the sum of three independent observations of X.

PS

8

A piece of laminated plywood consists of three pieces of wood of type A and two pieces of type B. The thickness of A has mean 2 mm and variance 0.04 mm². The thickness of B has mean 1 mm and variance 0.01 mm². Find the mean and variance of the thickness of the laminated plywood.

PS

9

My journey to work is made up of four stages: a walk to the bus stop, a wait for the bus, a bus journey and a walk at the other end. The times taken for these four stages are independent random variables U, V, W and X with expected values (in minutes) of 4.7, 5.6, 21.6 and 3.7 respectively and standard deviations of 1.1, 1.2, 3.1 and 0.8 respectively. What is the expected time and standard deviation for the total journey?

PS

10 The length, L (in cm), of the rectangular boxes produced by a machine is a random variable with mean 26 and variance 4 and the width, B (in cm), is a random variable with mean 14 and variance 1. The variables L and B are independent. What are the expected value and variance of: a

the perimeter of the boxes,

b

the difference between the length and the width?
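The sketch below checks questions 5, 8, 9 and 10. It assumes that all the component variables are independent and, in question 8, that the variances are in mm². For question 5 it recomputes E(Y) and Var(Y) from the table instead of relying on the quoted values. In question 9 the standard deviations are squared before they are added.

```python
import math

def mean_var(dist):
    """Mean and variance of a discrete distribution {value: probability}."""
    m = sum(x * p for x, p in dist.items())
    return m, sum(x * x * p for x, p in dist.items()) - m * m

# Question 5: T = 2X - Y with X and Y independent
mx, vx = mean_var({1: 0.4, 2: 0.2, 3: 0.4})
my, vy = mean_var({1: 0.3, 2: 0.5, 3: 0.2})
e_t, var_t = 2 * mx - my, 4 * vx + vy

# Question 8: three pieces of type A and two of type B
e_ply = 3 * 2 + 2 * 1
var_ply = 3 * 0.04 + 2 * 0.01

# Question 9: total journey time; add variances, not standard deviations
means = [4.7, 5.6, 21.6, 3.7]
sds = [1.1, 1.2, 3.1, 0.8]
e_trip, sd_trip = sum(means), math.sqrt(sum(s * s for s in sds))

# Question 10: perimeter 2L + 2B, and difference L - B
e_perim, var_perim = 2 * 26 + 2 * 14, 4 * 4 + 4 * 1
e_diff, var_diff = 26 - 14, 4 + 1
print(round(e_t, 2), round(var_t, 2), e_ply, var_ply, e_trip, round(sd_trip, 3))
```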

3.3 Working with normal distributions WORKED EXAMPLE 3.3 The masses, in grams, of large and small packets of peanuts are denoted by X and Y respectively, where X ~ N(150, 62) and Y ~ N(80, 38). Find the probability that the mass of two randomly chosen small packets of peanuts is more than the mass of a large packet of peanuts. Answer E(Y1 + Y2 − X) = 80 + 80 − 150 = 10

Find the mean and variance.

Var(Y1 + Y2 − X) = 38 + 38 + 62 = 138 If Y1 + Y2 > X then Y1 + Y2 − X > 0.
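The final step standardises D = Y₁ + Y₂ − X ~ N(10, 138) and finds P(D > 0). A minimal sketch with the standard library, where Φ is computed from `math.erf`:

```python
import math

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mean_D = 80 + 80 - 150        # E(Y1 + Y2 - X)
var_D = 38 + 38 + 62          # variances add, even for the subtracted X
# P(Y1 + Y2 > X) = P(D > 0)
p = 1 - normal_cdf((0 - mean_D) / math.sqrt(var_D))
print(round(p, 4))
```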

EXERCISE 3C 1

If X ~ N(12, 16) and Y ~ N(8, 25) find:

a

i

P(X − Y > −2)

ii

P(X + Y < 24)

b

i

P(3X + 2Y > 50)

ii

P(2X − 3Y > −2)

c

i

P(X > 2Y)

ii

P(2X < 3Y)

d

i

P(X > 2Y − 2)

ii

P(3X + 1 < 5Y)

PS

2

An airline has found that the mass of their passengers follows a normal distribution with mean 82.2 kg and variance 10.7 kg². The mass of their hand luggage follows a normal distribution with mean 9.1 kg and variance 5.6 kg². a

State the distribution of the total mass of a passenger and their hand luggage and find any necessary parameters.

b

What is the probability that the total mass of a passenger and their hand luggage exceeds 100 kg?

PS

3

The heights of a population of male students are distributed normally with mean 178 cm and standard deviation 5 cm. The heights of a population of female students are distributed normally with mean 168 cm and standard deviation 4 cm. Find the probability that a randomly chosen female is taller than a randomly chosen male.

PS

4

W is the mass of lemonade in a fully filled bottle, B is the mass of the bottle and C is the mass of the crate into which 12 filled bottles are placed for transportation. All masses are in grams. It is given that W ~ N(825, 15²), B ~ N(400, 10²) and C ~ N(1500, 20²). Find the probability that a fully filled crate weighs less than 16.1 kg.

PS

5

Evidence suggests that the times Aaron takes to run 100 m are normally distributed with mean 13.1 seconds and standard deviation 0.4 seconds. The times Bashir takes to run 100 m are normally distributed with mean 12.8 seconds and standard deviation 0.6 seconds.

a

Find the mean and standard deviation of the difference (Aaron − Bashir) between Aaron’s and Bashir’s times.

b

Find the probability that Aaron finishes a 100 m race before Bashir.

c

What is the probability that Bashir beats Aaron by more than one second?

PS

6

The times of four athletes for the 400 m race are each distributed normally with mean 47 seconds and standard deviation 2 seconds. The four athletes are to compete in a 4 × 400 m relay race. Find the probability that their total time is less than three minutes.

PS

7

The diameters of a consignment of bolts are distributed normally with mean 1.05 cm and standard deviation 0.1 cm. The diameters of the holes in a consignment of nuts are distributed normally with mean 1.1 cm and standard deviation 0.1 cm. Find the probability that a randomly chosen bolt will not fit inside a randomly chosen nut.

PS

8

The capacities of small bottles of perfume are distributed normally with mean 50 ml and standard deviation 3 ml. The capacities of large bottles of the same perfume are distributed normally with mean 80 ml and standard deviation 5 ml. Find the probability that the total capacity of three small bottles is greater than the total capacity of two large bottles.

9

The mass of an empty lift cage is 210 kg. If the masses (in kg) of adults are distributed as N(70, 950), what is the probability that the mass of the lift cage containing 10 adults chosen at random exceeds 1000 kg?
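Each of these questions combines independent normal variables into a single normal variable and then standardises it. The sketch below works through questions 2, 3, 4 and 9. It assumes all masses and heights are independent, and in question 4 it reads the variances as 15², 10² and 20² grams². The helper `prob_greater` is a hypothetical name.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def prob_greater(mean, var, c):
    """P(W > c) for W ~ N(mean, var)."""
    return 1 - normal_cdf((c - mean) / math.sqrt(var))

# Question 2: passenger + hand luggage ~ N(82.2 + 9.1, 10.7 + 5.6)
q2 = prob_greater(82.2 + 9.1, 10.7 + 5.6, 100)
# Question 3: female - male ~ N(168 - 178, 4^2 + 5^2); P(F - M > 0)
q3 = prob_greater(168 - 178, 4 ** 2 + 5 ** 2, 0)
# Question 4: 12 lemonade masses + 12 bottles + 1 crate, in grams
mean4 = 12 * 825 + 12 * 400 + 1500
var4 = 12 * 15 ** 2 + 12 * 10 ** 2 + 20 ** 2
q4 = 1 - prob_greater(mean4, var4, 16100)
# Question 9: cage mass 210 kg plus 10 independent adults ~ N(70, 950)
q9 = prob_greater(210 + 10 * 70, 10 * 950, 1000)
print([round(p, 4) for p in (q2, q3, q4, q9)])
```

In question 4 the 12 bottles are 12 separate observations, so the variance is 12 × 15², not 12² × 15².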

3.4 Linear combinations of Poisson distributions WORKED EXAMPLE 3.4 The number of goals scored per match by a hockey team during a season follows a Poisson distribution with mean 2.7. The number of goals scored per match by another team follows a Poisson distribution with mean 2.5. Work out the probability that at least four goals are scored in any one match between the two teams. Answer Let the random variable G be the total number of goals scored.

Define the variable.

G ~ Po(5.2)

2.7 + 2.5 = 5.2. Remember, the probability of ‘greater than’ is the same as ‘1 − less than or equal to’, so P(G ⩾ 4) = P(G > 3) = 1 − P(G ⩽ 3).
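A quick numerical check of Worked example 3.4, assuming, as the method does, that the two teams' scores are independent so that their total is Po(5.2):

```python
import math

lam = 2.7 + 2.5     # G ~ Po(5.2)
# P(G >= 4) = 1 - P(G <= 3)
p_at_most_3 = sum(math.exp(-lam) * lam ** k / math.factorial(k) for k in range(4))
p = 1 - p_at_most_3
print(round(p, 4))
```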

EXERCISE 3D 1

You are given that X ~ Po(3) and Y ~ Po(2).

a

Find the mean and variance of:

i

X + Y

ii

X − Y

iii

3X + 2.

b

Which of i, ii and iii has a Poisson distribution?

2

The independent random variables X and Y are such that X ~ Po(2.5) and Y ~ Po(1.1).

a

State the distribution of W, where W = X + Y.

b

Find:

i

P(W < 6)

ii

P(2 ⩽ W ⩽ 3).

3

The independent random variables A, B and C are such that A ~ Po(1), B ~ Po(2) and C ~ Po(3). a

State the distribution of V, where V = A + B + C.

b

Work out: i

P(V = 3)

ii

P(V ⩾ 3).

PS

4

The number of vehicles travelling on a particular road towards a town centre has a Poisson distribution with mean 6 per minute. The number of vehicles travelling away from the town centre on the same road at the same time of day has a Poisson distribution with mean 3 per minute. Find the probability that the total number of vehicles seen passing a given point in a one-minute period is less than 6.

PS

5

The number of goals scored per match by a football team during a season has a Poisson distribution with mean 1.5. The number of goals conceded per match by the same team during the same season has a Poisson distribution with mean 1. Find the probability that a match involving the team produced more than three goals.

PS

6

The numbers of emissions per minute from two radioactive sources are modelled by the independent random variables X and Y, which have Poisson distributions with means 5 and 8 respectively. Calculate the probability that, in any minute, the total number of emissions from the two sources is less than 6.

PS

7

The random variable R is the number of robins who visit a bird table each hour. The random variable T is the number of thrushes who visit a bird table each hour. These are the only types of birds to visit the table. It is believed that R ~ Po(1.5) and T ~ Po(2.0). B is the random variable ‘Number of birds visiting the table each hour’. a

Assuming birds arrive singly and at random, write down the distribution of B.

b

Find the probability that no birds visit the table in one hour.

c

Find P(1 < B ⩽ 6).

PS

8

The number of cars arriving at a car park in a five-minute interval follows a Poisson distribution with mean 7, and the number of motorbikes follows a Poisson distribution with mean 2. Find the probability that exactly ten vehicles arrive at the car park in a particular five-minute interval.

M

9

Hywel receives an average of 4.2 emails and 3.1 texts each hour. These are the only types of messages he receives. a

Assuming that the emails and texts each form an independent Poisson distribution, find the probability that he receives more than 4 messages in an hour.

b

Explain why the assumption that the emails and texts form independent Poisson distributions is unlikely to be true.
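Most questions in this exercise rely on one fact: the sum of independent Poisson variables is Poisson, with the means added. The sketch below applies that to questions 3, 4, 5, 6, 8 and 9a and assumes independence wherever the question implies it. "Less than 6" is read as P(⩽ 5), and "more than 3" as 1 − P(⩽ 3).

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam)."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))

q3_eq3 = poisson_pmf(3, 1 + 2 + 3)             # V ~ Po(6): P(V = 3)
q3_ge3 = 1 - poisson_cdf(2, 1 + 2 + 3)         # P(V >= 3)
q4 = poisson_cdf(5, 6 + 3)                     # vehicles ~ Po(9): P(total < 6)
q5 = 1 - poisson_cdf(3, 1.5 + 1)               # goals ~ Po(2.5): P(more than 3)
q6 = poisson_cdf(5, 5 + 8)                     # emissions ~ Po(13): P(total < 6)
q8 = poisson_pmf(10, 7 + 2)                    # vehicles ~ Po(9): P(exactly 10)
q9 = 1 - poisson_cdf(4, 4.2 + 3.1)             # messages ~ Po(7.3): P(more than 4)
print([round(p, 4) for p in (q3_eq3, q3_ge3, q4, q5, q6, q8, q9)])
```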

END-OF-CHAPTER REVIEW EXERCISE 3 PS

1

The mean mass of a man in an office is 85 kg with standard deviation 12 kg. The mean mass of a woman in the office is 68 kg with standard deviation 8 kg. The empty lift has a mass of 500 kg. What are the expectation and standard deviation of the total mass of the lift when three women and four men are inside?

PS

2

The amount of black coffee dispensed by a drinks machine is distributed normally with mean 200 ml and standard deviation 5 ml. If a customer requires white coffee, milk is also dispensed. The amount of milk is distributed normally with mean 20 ml and standard deviation 2 ml. Find the probability that the total amount of liquid dispensed when a customer chooses white coffee is less than 210 ml.

3

Given that X ~ N(μ, 10), Y ~ N(12, σ²) and 3X − 4Y ~ N(0, 234), find μ and σ².

4

X is a random variable with mean μ and variance σ². Y is a random variable with mean m and variance s². Find, in terms of μ, σ, m and s:

a

E(X − 2Y)

b

Var(X − 2Y)

c

Var(4X)

d

Var(X₁ + X₂ + X₃ + X₄), where Xᵢ is the ith observation of X.

M

5

If X is the random variable ‘mass of a gerbil’, explain the difference between 2X and X₁ + X₂.

PS

6

The masses of workers in a factory are known to be normally distributed with mean 80 kg and standard deviation 6 kg. There is an elevator with a maximum recommended load of 600 kg. With seven workers in the elevator, calculate the probability that their combined mass exceeds the maximum recommended load.

PS

7

Davina makes bracelets by threading purple and yellow beads onto a string. Each bracelet consists of seven randomly selected purple beads and four randomly selected yellow beads. The lengths of the beads are normally distributed with standard deviation 0.4 cm. The mean length of a purple bead is 1.5 cm and the mean length of a yellow bead is 2.1 cm. Find the probability that the length of the bracelet is less than 18 cm.

PS

8

The masses of the parents at a school are normally distributed with mean 78 kg and variance 30 kg², and the masses of the students are normally distributed with mean 33 kg and variance 62 kg². Let the random variable P represent the combined mass of two randomly chosen parents and the random variable S represent the combined mass of four randomly chosen students.

a

Find the mean and variance of S − P.

b

Find the probability that four students weigh more than two parents.

PS

9

The marks students scored in a Maths test follow a normal distribution with mean 63 and variance 64. The marks of the same group of students in an English test follow a normal distribution with mean 61 and variance 71. Find the probability that a randomly chosen student scored a higher mark in English than in Maths.

PS

10 Jamal rents a phone for a fixed charge of $8 per month with calls charged at $0.20 per minute. Selina rents her phone for a fixed charge of $20 with calls charged at $0.10 per minute. The number of minutes that Jamal uses his phone in a randomly chosen month is denoted by J and the number of minutes that Selina uses her phone in a randomly chosen month is denoted by S. It is given that J ~ N(120, 49) and S ~ N(130, 25), and that J and S are independent. a

Find the distribution of the amount spent by Jamal on his phone in a randomly chosen month.

b

Find the distribution of the amount spent by Selina on her phone in a randomly chosen month.

c

Find the probability that in a randomly chosen month Jamal pays more for his phone than Selina.
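The sketch below checks questions 1, 3, 7 and 10. It assumes all variables are independent, including X and Y in question 3. In question 10 it takes Selina's $20 fixed charge to be monthly, like Jamal's. The monthly costs are 8 + 0.2J and 20 + 0.1S, and their difference is normal.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Question 1: lift (500 kg) + 3 women + 4 men
e1 = 500 + 3 * 68 + 4 * 85
sd1 = math.sqrt(3 * 8 ** 2 + 4 * 12 ** 2)

# Question 3: E(3X - 4Y) = 3*mu - 4*12 = 0 and Var(3X - 4Y) = 9*10 + 16*sigma^2 = 234
mu = 4 * 12 / 3
sigma_sq = (234 - 9 * 10) / 16

# Question 7: 7 purple + 4 yellow beads, each with s.d. 0.4 cm
e7 = 7 * 1.5 + 4 * 2.1
var7 = 11 * 0.4 ** 2
p7 = normal_cdf((18 - e7) / math.sqrt(var7))

# Question 10: D = (8 + 0.2J) - (20 + 0.1S); P(Jamal pays more) = P(D > 0)
e_d = (8 + 0.2 * 120) - (20 + 0.1 * 130)
var_d = 0.2 ** 2 * 49 + 0.1 ** 2 * 25
p10 = 1 - normal_cdf((0 - e_d) / math.sqrt(var_d))
print(e1, round(sd1, 2), mu, sigma_sq, round(p7, 4), round(p10, 4))
```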

Chapter 4 Continuous random variables

■ Understand the concept of a continuous random variable.
■ Recall and use the properties of a probability density function.
■ Calculate the mean and variance of a continuous distribution.
■ Find the median and other percentiles of a distribution.
■ Solve problems involving probabilities.

4.1 Introduction to continuous random variables WORKED EXAMPLE 4.1 The continuous random variable X has probability density function:

a

Find the value of the constant k.

b

Find P(−0.3 ⩽ X ⩽ 0.6).

c

Use integration to show that the value of a, where , is 0. Explain how a sketch of f(x) would verify a = 0.

For this to be a probability density function, the area under the curve = 1.

b

Work out the definite integral.

c

Use a as the lower limit of the integral.

A sketch would show that the graph is symmetrical about x = 0.

Your explanation should have sufficient information.

EXERCISE 4A 1

A continuous random variable X has the following probability density function:

a

Find the value of the constant c.

b

Find P(X ⩾ 6).

c

Find P(4 ⩽ X ⩽ 6).

2

A continuous random variable X has the following probability density function:

a

Find the value of the constant k.

b

Find P(X ⩽ 2).

c

Find P(1.5 ⩽ X ⩽ 2.5).

d

Given that the probability that X is less than h is 0.2, find the value of h, correct to two decimal places.

3

The sketch shows the probability density function:

a

Use the information given in the sketch and the properties of probability density functions to find the values of a and k.

b

Find .

PS

4

The life, X, of the StayBrite light bulb is modelled by the probability density function:

where X is measured in thousands of hours.

a

Find k.

b

Find the probability that a StayBrite bulb lasts longer than 1000 hours.

c

Find the probability that a StayBrite bulb lasts less than 500 hours.

PS

5

A printer ink cartridge has a life of X hours. The variable X is modelled by the probability density function:

a

Find k.

b

Find the probability that such a cartridge has a life of at least 500 hours.

c

Find the probability that a cartridge will have to be replaced before 600 hours of use.

d

Find the probability that two cartridges will have to be replaced before each has been used for 600 hours.

PS

6

An internet surfer suggests that the time, t minutes, that he spends on the internet can be modelled by the probability density function:

a

Verify that this is a properly defined probability density function.

b

Find the probability that the surfer spends less than four minutes on the internet.

c

Find the probability that the surfer spends more than ten minutes on the internet.

PS

7

A model predicts that the angle, G, by which an alpha particle is deflected by a nucleus is modelled by the probability density function:

a

Find the value of the constant k.

b

10 000 alpha particles are fired at a nucleus. Assuming that the model is correct, estimate the number of alpha particles deflected by less than .

8

A random variable Y has the following probability density function:

Find the exact value of P(Y > 2).

9

Given , find .

10

D is the random variable ‘distance a seed is found from a tree’. The pdf of D, f(d), is proportional to . The minimum distance that a seed is from the tree is 0.5 m. Find the probability of a seed being found more than 1 m from the tree.

PS

11 It is proposed to model the annual salary, X, measured in thousands of $, paid to sales persons in a large company by the probability density function:

a

Find the value of c.

b

Find the probability that a person in this company chosen at random earns between $20 000 and $30 000 per year.

4.2 Finding the median and other percentiles of a continuous random variable WORKED EXAMPLE 4.2 The continuous random variable X has probability density function:

a

Find the median value of X.

b

Find the value of a such that P(X < a) = 40%.

Let m be the median.

The median value occurs when the probability is 50% or

m³ = 13.5, so m = 2.38

Substitute and find a valid solution within the range. The working is similar to finding the median, only this time the probability is 0.4.

b
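Solving F(m) = 0.5 (or F(a) = 0.4) can also be done numerically when the algebra is awkward. The sketch below integrates the pdf with Simpson's rule and uses bisection to find the percentile. The worked example's own pdf is not reproduced here, so the code uses a hypothetical pdf, f(x) = x²/9 for 0 ⩽ x ⩽ 3. It was chosen only because its median also satisfies m³ = 13.5. Simpson's rule and bisection are general numerical methods, not the method used in the book.

```python
def simpson(f, a, b, n=200):
    """Composite Simpson's rule for the integral of f over [a, b] (n must be even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def percentile(f, lo, hi, q, tol=1e-10):
    """Value x in [lo, hi] with P(X < x) = q, found by bisection on the numerical CDF."""
    left, right = lo, hi
    while right - left > tol:
        mid = (left + right) / 2
        if simpson(f, lo, mid) < q:
            left = mid
        else:
            right = mid
    return (left + right) / 2

def f(x):
    """Hypothetical pdf for illustration: x^2 / 9 on 0 <= x <= 3 (so F(x) = x^3 / 27)."""
    return x * x / 9

median = percentile(f, 0, 3, 0.5)   # solves m^3 / 27 = 0.5
a_40 = percentile(f, 0, 3, 0.4)     # solves a^3 / 27 = 0.4
print(round(median, 3), round(a_40, 3))
```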

EXERCISE 4B 1

A continuous random variable X has the following probability density function:

a

Sketch the graph of f(x).

b

Find the median value of X.

2

A continuous random variable X has the following probability density function:

a

Sketch the graph of f(x).

b

Find the median value of X.

c

Find the interquartile range of X.

3

The continuous random variable X has the following probability density function:

4

Given that the continuous random variable X has probability density function:

Find the exact value of the median of X.

Find the interquartile range of X.

5

If

a

Find b in terms of k if P(b < X < b²) = k.

b

Find a in terms of k if P(2 − a < X ⩽ 2 + a) = k.

6

The continuous random variable X has the following probability density function:

Find the value of a such that P(X < a) = 20%.

7

The continuous random variable X has the following probability density function:

Find the value of b such that P(X > b) = 30%.

8

The continuous random variable X has the following probability density function:

Find the values of c and d such that P(c < X < d) represents the middle 80% of X.

PS

9

Two models are proposed for a garage’s weekly sales, X, of petrol measured in units of 100 000 litres. The first is The second is a

Find the median for the first model.

b

Verify that the median for the second model is the same as the median for the first model.

4.3 Finding the expectation and variance WORKED EXAMPLE 4.3 The continuous random variable X has the following probability density function:

Find the mean and variance of X. Answer Mean or expectation = $\int_a^b x\,f(x)\,dx$, where a and b are the end points of the range of values over which f(x) is defined.

Variance = $\int_a^b x^2 f(x)\,dx - \text{mean}^2$
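The two integrals above can be evaluated numerically as a check on hand working. The sketch below uses Simpson's rule, which is exact for polynomials up to degree 3. It first checks that the area under f is 1, then returns E(X) and Var(X). The two pdfs in the example are hypothetical illustrations, not taken from the exercises: a continuous uniform pdf on [1, 4] and f(x) = 2x on [0, 1].

```python
def simpson(f, a, b, n=200):
    """Composite Simpson's rule for the integral of f over [a, b] (n even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def mean_and_variance(f, a, b):
    """Area, E(X) = integral of x f(x), Var(X) = integral of x^2 f(x) minus E(X)^2, over [a, b]."""
    area = simpson(f, a, b)                       # should be 1 for a valid pdf
    mean = simpson(lambda x: x * f(x), a, b)
    var = simpson(lambda x: x * x * f(x), a, b) - mean ** 2
    return area, mean, var

uniform = mean_and_variance(lambda x: 1 / 3, 1, 4)      # expect 1, 2.5, 0.75
linear_pdf = mean_and_variance(lambda x: 2 * x, 0, 1)   # expect 1, 2/3, 1/18
print(uniform, linear_pdf)
```

For the uniform case, the results agree with the standard formulas (a + b)/2 and (b − a)²/12.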

EXERCISE 4C 1

The mass, X kg, of silicon produced in a manufacturing process is modelled by the probability density function:

Find the mean and variance of the mass of silicon produced.

2

The EverOn battery has a life of X hours. The variable X is modelled by the probability density function:

Find the mean and variance of the lives of EverOn batteries.

3

A printer ink cartridge has a life of X pages. The variable X is modelled by the probability density function:

a

Show that k = 720.

b

Find the mean and variance of the lives of these cartridges.

4

The life, X, of the EverBrite light bulb is modelled by the probability density function:

where X is measured in thousands of hours. Find the mean and variance of the lives of EverBrite light bulbs. 5

5 The radioactivity of krypton decays according to the probability model:

a Show that λ = k.

b Find the mean and variance of X in terms of k.

6 The length, in metres, of offcuts of wood found in a timber yard can be modelled by a continuous uniform distribution with density function:

a Write down the value of k.

b State the mean length.

c Calculate the variance of the length.

7 A continuous random variable B has the following probability density function:

a Find the value of the constant a.

b Find E(B).

8 A continuous random variable X has probability density function:

Find the mean of X.

9 Given that X is a continuous random variable with probability density function:

a find the value of k

b find the expectation of X

c find the variance of X.

10 For the continuous random variable X with probability density function defined by:

Find:

a the mean

b the variance

c P(μ − σ < X < μ + σ).

END-OF-CHAPTER REVIEW EXERCISE 4

P

1 A printer ink cartridge has a life of X pages. The variable X is modelled by the probability density function:

a Find the median lifetime of these cartridges.

b Find the value of b such that P(X ⩽ b) = 0.6.

2 Given that E(X) = 3, find k if:

3 Y is a continuous random variable with probability density function:

a Show that

b Given that Var(Y) = 5, find the exact value of k.

4 The continuous random variable T has probability density function given by f(t) = 16t³ + 6t for 0 < t < k. Prove that there is only one possible value of k, and state its value.

PS

5 The continuous random variable X has a probability density function f(x) where:

a Show that k = 1.

b What is the probability that the random variable X has a value that lies between and ? Give your answer in terms of e.

c Find the mean and variance of the distribution. Give your answers in terms of e.

The random variable X above represents the lifetime, in years, of a certain type of battery.

d Find the probability that a battery lasts more than six months.

A calculator is fitted with three of these batteries. Each battery fails independently of the other two.

e Find the probability that at the end of six months:

i none of the batteries has failed

ii exactly one of the batteries has failed.

6 The random variable X has probability density function:

a Find the median value of X.

The above distribution, with a = 0.8, is proposed as a model for the length of life, in years, of a species of bird.

b Find the expected number out of a total of 50 birds that would fall in the class interval 2–3 years.

7 Given that

a find b in terms of k if P(b < X < b²) = k

b find a in terms of k if P(2 − a < X ⩽ 2 + a) = k.

8 Given that X is a continuous random variable with probability density function:

and , find the values of the constants a and b.

PS

9 When a measurement is quoted to the nearest kilogram it is equally likely to be anywhere within 0.5 kg of the stated value. A large number of measurements of different objects, all of which round to 42 kg, are made and their accurate values are noted. Find the probability that an object quoted as being 42 kg to the nearest kg is actually more than 0.3 kg away from 42 kg.

Chapter 5 Sampling

■ Understand the distinction between a sample and a population.
■ Understand how to use random numbers and appreciate the necessity for choosing random samples.
■ Explain why a sampling method may be unsatisfactory.
■ Recognise that a sample mean can be regarded as a random variable, and use the facts that $E(\bar{X}) = \mu$ and that $\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$.
■ Use the fact that $\bar{X}$ has a normal distribution if X has a normal distribution.
■ Use the Central Limit Theorem where appropriate.
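The two facts $E(\bar{X}) = \mu$ and $\mathrm{Var}(\bar{X}) = \sigma^2/n$ can be seen in a quick simulation. This is an illustrative sketch, not a proof; it assumes a uniform population on [0, 1), which has μ = 0.5 and σ² = 1/12, and the sample size, number of repeats and seed are arbitrary choices.

```python
import random
from statistics import fmean, pvariance


def sample_means(rng, n, repeats):
    """Means of `repeats` independent samples of size n from a uniform [0, 1) population."""
    return [fmean(rng.random() for _ in range(n)) for _ in range(repeats)]


rng = random.Random(42)        # fixed seed so the run is reproducible
n, repeats = 30, 20_000
means = sample_means(rng, n, repeats)

mu, sigma2 = 0.5, 1 / 12       # mean and variance of the uniform [0, 1) population
print(f"mean of sample means     {fmean(means):.4f}  (theory {mu})")
print(f"variance of sample means {pvariance(means):.6f}  (theory {sigma2 / n:.6f})")
```

The population here is not normal, so the two formulas hold exactly, but the normal shape of the sample means is only approximate and comes from the Central Limit Theorem.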

5.1 Introduction to sampling

WORKED EXAMPLE 5.1

Mia wishes to choose a random sample of six students from her school year. She considers choosing the first six students she sees at school one day.

a Explain why this method is biased.

There are 122 students in Mia’s year at school. Mia numbers them 001 to 122 and uses the random number generator on her calculator to choose her sample. The random number generator produces random numbers between 0 and 1, given to 3 decimal places, for example 0.058.

b Give one reason why Mia might need to generate more than six random numbers to choose her sample.

Answer

a If the six students are together they could be friends, and friends are likely to agree with each other.

Give a sufficiently clear explanation.

b There could be a duplicate number in the first six numbers generated.

Another possible reason is that a number generated may be greater than 0.122.
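Mia's procedure in part b can be sketched in Python. This is a minimal sketch, assuming Python's `random.Random` stands in for the calculator's generator; the function name `choose_sample` is hypothetical. Each three-decimal number is read as a student label (0.058 becomes 058), and any number that gives 000, a label above 122, or a repeat is rejected. This is why more than six numbers may be needed.

```python
import random


def choose_sample(population_size, sample_size, rng):
    """Select distinct labels 1..population_size using 3-d.p. random numbers in [0, 1).

    Each number such as 0.058 is read as label 058. Labels of 000 or above
    population_size are rejected, and so are repeats.
    """
    chosen = []
    while len(chosen) < sample_size:
        u = round(rng.random(), 3)   # mimic a calculator giving 3 decimal places
        label = round(u * 1000)      # 0.058 -> 58
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
    return chosen


print(choose_sample(122, 6, random.Random(2024)))
```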

EXERCISE 5A

M

1 Comment on possible sources of bias in the following samples.

a Determining the attitude towards university tuition fees by asking final-year high school students.

b Finding out about people’s perception of the cost of food by asking people in supermarkets to estimate the cost of a carton of milk.

c Measuring the average height of people in a county using a sample taken from a school.

d Predicting the outcome of the 1948 US presidential election by a telephone survey of US citizens.

2 Larissa wants to find out how many students at her college travel by train. She decides to conduct a survey on a simple random sample of 70 students. Describe how she could obtain such a sample.

3 Mateo needs to take a simple random sample of five students at his college to take part in a psychology experiment. He obtains the list of all 847 students and uses a random number function on his calculator to generate random numbers. The calculator produces three-digit random numbers from 001 to 999. It produces the following numbers:

016, 762, 938, 537, 016, 722, 886, 152, 721

Suggest which five students Mateo should select for his sample.

4 A shop has 40 staff in each of 10 branches in different parts of the country. The owner wants to find out about staff wellbeing by interviewing a sample of 20 staff. The following methods of choosing the sample are suggested.

A Pick four branches at random and then interview five randomly chosen people at each branch.

B Use a random number generator to pick 20 staff from all staff.

C Ask the manager of each branch to select two staff members to send for interview.

D Use a 20-sided die to randomly select two members from each branch to send for interview.

a Which suggestion is likely to give the least biased sample?

b Why might suggestion A be used instead of suggestion B?

5 A polling firm wants to investigate the voting intentions of a London borough. They have access to details of all registered voters in the borough, so they make a numbered list and use a random number generator to select 100 participants. They send a questionnaire to each person selected and study the responses. Explain why this will not necessarily produce a simple random sample.

6 An ecologist wants to study the proportion of adult fish in the North Sea. She believes that 40% of fish in the North Sea are cod, 40% are haddock and 20% are of other varieties. She catches fish until she has 20 cod, 20 haddock and 10 of other varieties. Explain why a random sampling method is not feasible in this situation.

M

7 Describe briefly how to use random numbers to choose a sample of 120 students from a university with 1246 students.

8 An athletics club has 90 members. Hasina wishes to take a random sample of 10 members of the club. He uses this table of random numbers, starting at 49 on the top row, to select his sample.

49 05 74 64 00 26 07 49 74 37 50 94 13 90 20 26 22 66 98 37 53 48 87 77 66 91 42 98 08 72 87 33 58 12 08

Hasina has an alphabetical list of members and numbers them 01 to 90. Write down the 10 members in his sample.

M

9 A clothing company carries out a survey to find out the average height and the range of heights of 17-year-old girls in the UK. A sample of 200 girls from a large sixth form college has a mean height of 158 cm and a range of 28 cm.

a Is the true population mean more likely to be larger or smaller than 158 cm?

b State one possible reason why taking a sample from a single college may not result in a good estimate for the mean height.

10 Comment on possible sources of bias in each of these samples:

a The basketball team as a sample of students at a college used to estimate the average height of all students.

b Dineth is investigating attitudes to healthy eating for his Biology project. He decides to interview ten students while waiting in a queue at an ice cream shop.

c A sample taken from those people in a waiting room at a doctor’s surgery for a survey to find out how many days people in the country have had off sick this year.

11 A student wants to take a sample of students from his college. He has a list of all students, numbered 1 to 478. He uses a random number generator on his calculator, which can generate three-digit random numbers between 001 and 999, inclusive. The first ten numbers he obtains are:

237, 155, 623, 078, 523, 078, 003, 381, 554, 263

Suggest which could be the first four students in his sample.

12 Explain why the following samples have not been found using simple random sampling.

a A sample is formed by taking a telephone book and calling the person at the top of each page.

b A market researcher is required to sample 100 men and 100 women in a supermarket to find out how much they are spending on that day.

5.2 The distribution of sample means

WORKED EXAMPLE 5.2

The mass of a randomly chosen 11-year-old student at a large school may be modelled by a normal distribution with mean 35 kg and standard deviation 2.4 kg. Five students are chosen from this group.

a Calculate the probability that the mean mass of the five students is greater than 32 kg.

A second sample of size n is chosen from the same group.

b How large does n have to be for there to be at most a 5% chance that the mean mass of the sample differs from the mean mass of the population by more than 1 kg?

Answer

a State the distribution. Use the normal distribution tables to find the probability.

b State the distribution.

The difference from the mean is 1.

Use the standardised normal distribution and rearrange to solve for n. Use normal tables and solve for n.
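The working for both parts can be reproduced with Python's standard library. This is a sketch that replaces the normal tables with `statistics.NormalDist`, so the answers may differ from table-based values in the last decimal place. The variable names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

mu, sigma = 35, 2.4          # population mean and standard deviation, in kg

# a) X is normal, so the mean of n = 5 masses is exactly N(mu, sigma^2 / n)
n = 5
sample_mean = NormalDist(mu, sigma / sqrt(n))   # NormalDist takes the standard deviation
p_a = 1 - sample_mean.cdf(32)                   # P(mean mass > 32 kg)

# b) P(|mean - mu| > 1) <= 0.05 requires 1 / (sigma / sqrt(n)) >= z, where P(Z < z) = 0.975
z = NormalDist().inv_cdf(0.975)                 # about 1.96
n_min = ceil((z * sigma / 1) ** 2)              # smallest whole n satisfying the inequality

print(round(p_a, 4), n_min)
```

Part b is two-tailed because the sample mean may differ from 35 kg in either direction, so the 5% is split equally between the two tails.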

EXERCISE 5B

1 The random variable Y has mean 200 and standard deviation 25. A sample of size n is found. Find, where possible:

a

i if n = 100

ii if n = 200

b

i if n = 2

ii if n = 3.

2 A random variable X has mean 12 and standard deviation 3.5. A sample of 40 independent observations of X is taken. Use the Central Limit Theorem to calculate the probability that the mean of the sample is between 13 and 14.

3 The mass of a breed of dog is known to follow a normal distribution with a mean of 10 kg and a standard deviation of 2.5 kg. A random sample of four dogs is taken. What is the probability that their mean mass is more than 9 kg?

4 The volume of apple juice in a carton follows a normal distribution with a mean of 152 ml and a standard deviation of 4 ml. A quality control process rejects a batch if a random sample of 16 cartons has a mean of less than 150 ml. Find the probability that a batch gets rejected.

PS

5 The mass of a student in a group is X kg, where X follows the N(70, 9) distribution. In a sample of four observations find the probability that:

a the total mass of all the students is less than 300 kg

b the heaviest student has a mass less than 75 kg.

6 The number of tickets sold each day at a city railway station has mean 512 and variance 1600. For a randomly chosen period of 60 days, find the probability that the total number of tickets sold is less than 30 000.

7 Random samples of three are drawn from a population of beetles whose lengths have a normal distribution with mean 2.4 cm and standard deviation 0.36 cm. The mean length is calculated for each sample.

a State the distribution of the sample mean length, giving the values of its parameters.

b Find

c State which of the numerical values above, if any, depend on the Central Limit Theorem.

8 The masses of kilogram bags of flour produced in a factory have a normal distribution with mean 1.005 kg and standard deviation 0.0082 kg. A shelf in a store is loaded with 22 of these bags, assumed to be a random sample.

a Find the probability that a randomly chosen bag has mass less than 1 kg.

b Find the probability that the mean mass of the 22 bags is less than 1 kg.

c State, giving a reason, which of the above answers would be little changed if the distribution of masses were not normal.

9 X is the energy (in eV) of beta particles emitted from a radioactive isotope. It is known that X ~ N(40, 25). X̄ is the mean energy of 100 beta particles.

a Stating one necessary assumption, write down the distribution of X̄ along with its parameters.

b Find

10 Esme eats an average of 1900 kcal each day with a standard deviation of 400 kcal. What is the probability that in a random sample of 31 days she eats more than 2000 kcal per day on average?

M

11 The average weight of a sheet of A4 paper is 5 g, and the standard deviation of the weights is 0.08 g.

a Find the mean and standard deviation of the weight of a ream of 500 sheets of A4 paper.

b Find the probability that the weight of a ream of 500 sheets is within 5 g of the expected weight.

c Explain how you have used the Central Limit Theorem in your answer.

END-OF-CHAPTER REVIEW EXERCISE 5

PS

1 A goods lift can carry up to 5000 kg and is to be loaded with crates whose masses are normally distributed with mean 79.2 kg and standard deviation 5.5 kg. Show that:

a it is highly likely that the lift can take 62 randomly selected crates without being overloaded

b 65 randomly selected crates would almost certainly overload the lift.

2 The mean of a random sample of 500 observations of the random variable X, where X ~ N(25, 18), is denoted by X̄.

a Find the value of a for which

b State, giving a reason, if your answer depends on the Central Limit Theorem.

PS

3 The time, T hours, taken to repair a piece of equipment has a probability density function which can be modelled by:

a Find E(T) and Var(T).

b T̄ denotes the mean of 30 randomly chosen repairs. Assuming that the Central Limit Theorem holds, estimate

c State, giving a reason, whether your answer has little error or considerable error.

4 A machine is set to produce ball-bearings with mean diameter 1.2 cm. Each day a random sample of 50 ball-bearings is selected and the diameters accurately measured. If the sample mean diameter lies outside the range 1.18 cm to 1.22 cm then it will be taken as evidence that the mean diameter of the ball-bearings produced is not 1.2 cm. The machine will then be stopped and adjustments made to it. Assuming that the diameters have standard deviation 0.075 cm, find the probability that:

a the machine is stopped unnecessarily

b the machine is not stopped when the mean diameter of the ball-bearings is 1.15 cm.

5 The number of night calls to a fire station serving a small town can be modelled by a Poisson distribution with mean 2.7 calls per night.

a State the expectation and variance of the mean number of night calls over a period of n nights.

b Estimate the probability that during a given year of 365 days the total number of night calls will exceed 1050.

6 The random variable X has a B(40, 0.3) distribution. The mean of a random sample of n observations of X is denoted by X̄. Find:

a when n = 49

b the smallest value of n for which

M

7 A random sample of 2n observations is taken of the random variable X ~ B(n, p) and p > 0.5. The sample mean is denoted by X̄. It is given that and .

a Find the values of p and n.

b Find

8 The lifetime, X hours, of a light bulb is modelled by N(10 000, σ²). 5% of samples of 100 light bulbs have a mean lifetime of less than 9 900 hours. Find the value of σ.

9 The life of Powerlong batteries, sold in packs of six, may be assumed to have a normal distribution with mean 32 hours and standard deviation σ hours. Find the value of σ so that for one pack in 100 (on average) the mean life of the batteries is less than 30 hours.

10 The times Markus takes to answer a multiple choice question are normally distributed with mean 1.5 min and standard deviation 0.6 min. He has one hour to complete a test consisting of 35 questions.

a Assuming the questions are independent, find the probability that Markus does not complete the test in time.

b Explain why you did not need to use the Central Limit Theorem in your answer to part a.

11 A random variable has mean 15 and standard deviation 4. A large number of independent observations of the random variable is taken. Find the minimum sample size so that the probability that the sample mean is more than 16 is less than 0.05.

Chapter 6 Estimation

■ Calculate unbiased estimates of the population mean and variance from a sample.
■ Formulate hypotheses and carry out a hypothesis test concerning the population mean in cases where the population is normally distributed with known variance or where a large sample is used.
■ Determine and interpret a confidence interval for a population mean in cases where the population is normally distributed with known variance or where a large sample is used.
■ Determine, from a large sample, an approximate confidence interval for a population proportion.

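6.1 Unbiased estimates of population mean and variance

WORKED EXAMPLE 6.1

A student is doing a project on the playing time of DVDs for comedy films. She found the playing time, x minutes, for a random sample of 10 such DVDs. The times were:

96, 102, 124, 98, 107, 88, 90, 116, 108, 111

Calculate unbiased estimates for the population mean and variance for these data.

Answer

Unbiased estimate of the mean: $\bar{x} = \frac{\Sigma x}{n} = \frac{1040}{10} = 104$.

An unbiased estimate of population variance is $s^2 = \frac{1}{n-1}\left(\Sigma x^2 - \frac{(\Sigma x)^2}{n}\right) = \frac{1}{9}\left(109\,334 - \frac{1040^2}{10}\right) = \frac{1174}{9} \approx 130.4$.

Alternative ways of writing the formula are $s^2 = \frac{\Sigma (x - \bar{x})^2}{n-1}$ and $s^2 = \frac{n}{n-1}\left(\frac{\Sigma x^2}{n} - \bar{x}^2\right)$.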
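The same calculation can be written as a short Python function that works from Σx and Σx². This is a sketch; the function name `unbiased_estimates` is illustrative. The library function `statistics.variance` also divides by n − 1, so it gives a convenient cross-check.

```python
from statistics import variance


def unbiased_estimates(data):
    """Return (x_bar, s^2), the unbiased estimates of the population mean and variance."""
    n = len(data)
    sum_x = sum(data)
    sum_x2 = sum(x * x for x in data)
    x_bar = sum_x / n
    s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)   # divide by n - 1, not n
    return x_bar, s2


times = [96, 102, 124, 98, 107, 88, 90, 116, 108, 111]
x_bar, s2 = unbiased_estimates(times)
print(x_bar, round(s2, 2), round(variance(times), 2))
```

If you divided by n instead of n − 1, you would get the sample variance 117.4. On average that underestimates the population variance.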

EXERCISE 6A

1 Calculate unbiased estimates for the population mean and variance based on the following data.

a i 1, 5, 8, 10

ii 12, 15, 28, 34, 60

b i −2, 0, 1, 1, 6

ii −3, −1, 0, 5, 10

c i Σx = −9, Σx² = 135, n = 5

ii Σx = 25, Σx² = 439, n = 5

2 In order to find out the mean and standard deviation of masses of a particular breed of cat, Ben measures a random sample of 20 cats. The results are summarised as follows: Σw = 51, Σw² = 138.32

a Find the mean weight, and show that the unbiased estimate of the variance is 0.435 to three significant figures.

b

Ben hopes to obtain a more accurate estimate for the mean by taking a sample of size 100. The mean of this sample is 2.63. Is this necessarily a better estimate of the population mean than the one found in part a? Explain your answer.

3

A sample is found to have values 0, 1, 1, 6, 10. Find unbiased estimates for the mean and variance of the population based on this sample.

4 A random sample of ten people working for a certain company with 4000 employees are asked, at the end of a day, how much they had spent on lunch that day. The results, in $, are as follows:

1.98 1.84 1.75 1.94 1.56 1.88 1.05 2.10 1.85 2.35

Calculate unbiased estimates of the mean and variance of the amounts spent on lunch that day by all workers employed by the company.

5 The diameters of 20 randomly chosen plastic doorknobs of a certain make were measured. The results, x cm, are summarised by Σx = 102.3 and Σx² = 523.54. Find an unbiased estimate of the variance of the diameters of all knobs produced.

6 The number of vehicle accidents occurring each day along a long stretch of a particular road was monitored for a period of 100 randomly chosen days. The results are summarised in the following table.

No. accidents   0    1    2    3    4    5    6
No. days        8   12   27   35   13    4    1

Find unbiased estimates of the mean and variance of the daily number of accidents.

7 A random sample of 150 pebbles was collected from a beach. The masses of the pebbles, correct to the nearest gram, are summarised in the following grouped frequency table.

Mass (g)    10–19  20–29  30–39  40–49  50–59  60–69  70–79  80–89
Frequency     1      4     22     40     49     28      4      2

Find, to three decimal places, unbiased estimates of the mean and variance of the masses of all the pebbles on the beach.

8 Unbiased estimates of the mean and variance of a population, based on a random sample of 24 observations, are 5.5 and 2.42 respectively. Another random observation of 8.0 is obtained. Find new unbiased estimates of the mean and variance with this new information.

9 Thirty oranges are chosen at random from a large box of oranges. Their masses, x grams, are summarised by Σx = 3033 and Σx² = 306 676. Find, to four significant figures, unbiased estimates for the mean and variance of the mass of an orange in the box.

6.2 Hypothesis tests of the population mean

WORKED EXAMPLE 6.2

A particular wax candle has a mean burn time of six hours. The standard deviation of burn time is 1.4 hours. It is claimed that the burn time of the wax candle is longer if it is kept in the freezer to harden before use.

a Mia tests the claim with a random sample of 18 wax candles. She finds that the average burn time of her sample of candles is 6.48 hours. Assuming the standard deviation of the burn time of the candles in the sample is the same as in the population, test the claim at the 5% level of significance.

b Tabor decides to test the claim using a larger sample of 60 wax candles. For a burn time, x hours, his results are Σx = 408.9 and Σx² = 3787.5. Using Tabor’s results, and without making any assumptions about the variance of the sample, test the claim at the 5% level of significance.

Answer

a Let μ be the mean burn time.

H₀: μ = 6, H₁: μ > 6
One-tailed test at 5% level of significance

Remember, in this case $\bar{X} \sim N\left(6, \frac{1.4^2}{18}\right)$, so $z = \frac{6.48 - 6}{1.4/\sqrt{18}} = 1.455$ and $P(Z > 1.455) = 0.0728$.

State the hypotheses. It is a one-tailed test as we are looking for an increase in burn time. Calculate the test statistic using $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ and compare the resulting probability with the 5% significance level. Comment on the result in the context of the question.

0.0728 > 0.05 Accept H₀. There is insufficient evidence to support the claim that the burn time is longer for wax candles kept in the freezer.

b We need to find unbiased estimates of mean and variance: $\bar{x} = \frac{408.9}{60} = 6.815$ and $s^2 = \frac{1}{59}\left(3787.5 - \frac{408.9^2}{60}\right) = 16.96$, so $z = \frac{6.815 - 6}{\sqrt{16.96/60}} = 1.533$.

H₀: μ = 6, H₁: μ > 6
One-tailed test at 5% level of significance, critical value 1.645

The hypotheses are the same as for part a. Stating the critical value allows us to show the working differently. This time we find the z value and compare it with the critical value.

1.533 < 1.645 Accept H₀. There is insufficient evidence to support the claim that burn time is longer if wax candles are kept in the freezer.

We still need to write down the conclusion and comment in the context of the question.
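Both versions of this test can be sketched in Python using the worked example's figures. Part a uses the p-value approach, and part b uses the critical-value approach with an unbiased variance estimate from the large sample. `statistics.NormalDist` stands in for the normal tables, so the p-value may differ slightly from the table-based 0.0728 in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

# a) sigma assumed known (1.4 h), n = 18, observed sample mean 6.48 h
mu0, sigma, n, x_bar = 6, 1.4, 18, 6.48
z_a = (x_bar - mu0) / (sigma / sqrt(n))
p_value = 1 - Z.cdf(z_a)              # one-tailed, because H1 is mu > 6
reject_a = p_value < 0.05

# b) large sample: estimate the variance unbiasedly from the summary statistics
n_b, sum_x, sum_x2 = 60, 408.9, 3787.5
x_bar_b = sum_x / n_b
s2 = (sum_x2 - sum_x ** 2 / n_b) / (n_b - 1)
z_b = (x_bar_b - mu0) / sqrt(s2 / n_b)
critical = Z.inv_cdf(0.95)            # about 1.645
reject_b = z_b > critical

print(round(p_value, 4), reject_a, round(z_b, 3), reject_b)
```

In both parts H₀ is not rejected, which matches the conclusions above.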

EXERCISE 6B

PS

1 The average height of 18-year-olds in England is 168.8 cm. Caroline believes that the students in her school are taller than average. To test her belief she measures the heights of 30 students in her class.

a State the hypotheses for Caroline’s test.

The students in Caroline’s class have an average height of 171.4 cm and a standard deviation of 12 cm.

b State an assumption required to conduct a normal hypothesis test on this data.

c Test Caroline’s belief at the 5% level of significance.

d

PS

2 All students in a large school are given a typing test and it was found that the times taken to type one page of text have a mean 10.3 minutes and standard deviation 3.7 minutes. The students are given a month-long typing course and then a random sample of 40 students was asked to take the typing test again. The mean time was 9.2 minutes.

a Test at the 10% significance level whether there is evidence that the time the students take to type a page of text has decreased.

b

PS

3 The national mean score in a Mathematics exam is 4.73 with a standard deviation of 1.21. In a particular school the mean score of 50 students is 4.81.

a State two assumptions that are needed to perform a hypothesis test to see if the mean is better in this school than in the background population.

b Assuming that these assumptions are met, test at the 5% significance level whether the school is producing better results than the national mean.

PS

4 A doctor has a large number of patients starting a new diet in order to lose weight. Before the diet the weights of the patients were normally distributed with mean 82.4 kg and standard deviation 7.9 kg. The doctor assumes that the diet does not change the standard deviation of the weights. After the patients have been on the diet for six months, the doctor takes a sample of 40 patients and finds their mean weight.

a The doctor believes that the average weight of the patients has decreased following the diet. He wishes to test his belief at the 5% level of significance. Find the critical region for this test.

b

c The average weight of the 40 patients after the diet was 78.4 kg. State the conclusion of the test.

PS

5

The blood pressure of a group of hospital patients with a certain type of heart disease has mean 85.6. A random sample of 25 of these patients volunteered to be treated with a new drug and a week later their mean blood pressure was found to be 70.4. Assuming a normal distribution with standard deviation 15.5 for blood pressures, and using a 1% significance level, test whether the mean blood pressure for all patients treated with the new drug is less than 85.6.

PS

6

Two-litre bottles of a brand of spring water are advertised as containing 6.8 mg of magnesium. In a random sample of ten of these bottles the mean amount of magnesium was found to be 6.92 mg. Assuming that the amounts of magnesium are normally distributed with standard deviation 0.18 mg, test whether the mean amount of magnesium in all similar bottles differs significantly from 6.8 mg. Use a 5% significance level.

PS

7

Cans of lemonade are filled by a machine which is set to dispense an amount that is normally distributed with mean 330 ml and standard deviation 2.4 ml. A quality control manager suspects that the machine is over-dispensing and tests a random sample of eight cans. The volumes of the contents, in ml, are as follows: 329 327 331 326 334 343 328 339 Test, at the 2.5% significance level, whether the manager’s suspicion is justified.

PS

8

A machine set to produce metal discs of diameter 11.90 cm has an annual maintenance check. After the check a random sample of 36 discs is measured and found to have a mean diameter of 11.930 cm and an estimated population standard deviation of 0.072 cm. Test, at the 1% significance level, whether the machine is now producing discs of mean diameter greater than 11.90 cm.

PS

9

In Lanzarote the mean daily number of hours of sunshine during April is reported to be 5 hours. During a particular year, in April, the daily amounts of sunshine, x hours, were recorded and the results are summarised by Σx = 162.3 and Σx² = 950.6. Test, at the 5% significance level, whether these results indicate that the reported mean of 5 hours is too low.

6.3 Confidence intervals for population mean

WORKED EXAMPLE 6.3

A manufacturer uses a particular brand of battery in the satellite navigation system they produce. A random sample of batteries is taken and the length of time, x hours, the battery lasts in the satellite navigation system is recorded.

a For a random sample of 18 batteries, the average time they last is 6.48 hours. Assuming the standard deviation of the time the batteries last in the sample is 1.4 hours, find a 95% confidence interval.

b For a larger sample of 60 batteries, the times they last in the satellite navigation system are summarised as Σx = 408.9 and Σx² = 3787.5. Without making any assumptions about the variance of the sample, find a 95% confidence interval.

Answer

a For a 95% confidence interval use $\bar{x} \pm 1.96 \times \frac{\sigma}{\sqrt{n}}$.

The confidence interval is (5.83, 7.13).

For other % confidence intervals use the normal tables to find the multiplier for $\frac{\sigma}{\sqrt{n}}$.

b This time, first find unbiased estimates for the mean and variance. We use $s^2 = \frac{1}{n-1}\left(\Sigma x^2 - \frac{(\Sigma x)^2}{n}\right)$ to find an unbiased estimate for the variance. The sample is sufficiently large so we use $\bar{x} \pm 1.96 \times \frac{s}{\sqrt{n}}$.

The confidence interval is (5.77, 7.86).

Note that the confidence interval will be approximate as the population standard deviation is unknown.
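The two intervals can be reproduced with a small helper. This is a sketch; `z_interval` is an illustrative name. The multiplier for any confidence level comes from `statistics.NormalDist().inv_cdf`, which replaces the normal tables (for 95% it gives about 1.96).

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar, sd, n, level=0.95):
    """Symmetric confidence interval x_bar +/- z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    half_width = z * sd / sqrt(n)
    return x_bar - half_width, x_bar + half_width


# a) standard deviation taken as 1.4, n = 18
lo_a, hi_a = z_interval(6.48, 1.4, 18)

# b) n = 60: estimate the mean and variance from the summary statistics first
n, sum_x, sum_x2 = 60, 408.9, 3787.5
x_bar = sum_x / n
s = sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))
lo_b, hi_b = z_interval(x_bar, s, n)

print(f"({lo_a:.2f}, {hi_a:.2f})  ({lo_b:.2f}, {hi_b:.2f})")
```

To get a 90% or 99% interval, pass `level=0.90` or `level=0.99`.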

EXERCISE 6C

1 The blood oxygen level (measured in %) of an individual is known to be normally distributed with a standard deviation of 3%. Based upon six readings, Niamh finds that her blood oxygen levels have a mean of 88.2%. Find a 95% confidence interval for Niamh’s true blood oxygen level.

2 The birth weight of male babies in a hospital is known to be normally distributed with variance 2 kg². Find a 90% confidence interval for the mean birth weight if a random sample of ten male babies has a mean weight of 3.8 kg.

3 A data set is summarised below:

Find a 90% confidence interval for the mean.

4 A sample of 50 people in a town have a mean wage of $24 506 with an unbiased variance of $144 million. Find a 95% confidence interval for the mean wage in the town.

5

The volume of milk in litre cartons filled by a machine has a normal distribution with

mean μ litres and standard deviation 0.05 litres. A random sample of 25 cartons was selected and the contents, x litres, measured. The results are summarised by Σx = 25.11. Calculate:

PS

PS

6

7

a

symmetric 98% confidence interval for μ

b

the width of a symmetric 90% confidence interval for μ based on the volume of milk in a random sample of 50 cartons.

In a particular country the heights of fully-grown males may be modelled by a normal distribution with mean 178 cm and standard deviation 7.5 cm. The 11 male biology students present at a university seminar had a mean height of 175.2 cm. a

Assuming a standard deviation of 7.5 cm, and stating any further assumptions, calculate a symmetric 99% confidence interval for the mean height of all fully-grown male biology students.

b

Does the confidence interval suggest that fully-grown male biology students have a different mean height from 178 cm?

A machine is designed to produce metal rods of length 5 cm. In fact, the lengths are distributed normally with mean 5.00 cm and standard deviation 0.032 cm. The machine is moved to a new site and, in order to check whether or not the mean length has altered, the lengths of a random sample of eight rods are measured. The results, in cm, are as follows: 5.07 4.95 4.98 5.06 5.13 5.05 4.98 5.06

PS

PS

PS

PS

PS

8

9

a

Assuming that the standard deviation is unchanged, calculate a symmetric 95% confidence interval for the mean length of the rods produced by the machine in its new position.

b

State, giving a reason, whether you consider that the mean length has changed.

The pulse rates of 90 eight-year-old children chosen randomly from different schools in a city gave a sample mean of beats per minute and an unbiased estimate of 2 population variance of s = 106.09. a

Calculate a symmetric 98% confidence interval for the mean pulse rate of all eightyear-old children in the city.

b

If, in fact, the 90 children were chosen randomly from those attending a hospital clinic, comment on how this information affects the interpretation of the confidence interval found in part a.

Rolls of sticky tape of a particular brand are claimed by the manufacturer to give, on average, at least 60 m of tape. After receiving some complaints, the manufacturer’s quality control manager obtains a random sample of 64 rolls and measures the length of the tape, tm, on each. The results are summarised by Σt = 3953.28 and Σt2 = 244 557.00. a

Calculate a symmetric 99% confidence interval for the population mean length of tape for this brand.

b

Does the confidence interval support the customers’ complaints? Give a reason for your answer.

10 When a scientist measures the concentration of a solution, the measurement obtained may be assumed to be a normally distributed random variable with standard deviation 0.2. a

The scientist makes 18 independent measurements of the concentration of a particular solution and correctly calculates the confidence interval for the true value as [43.908, 44.092]. Determine the confidence level of this interval.

b

The scientist is given a different solution and asked to determine a 90% confidence interval for its concentration. The confidence interval is required to have width less than 0.05. Find the minimum number of measurements required.

11 The masses of bananas are investigated. The mass of a random sample of 100 of these bananas was measured and the mean was found to be 168 g, with an unbiased estimate of the variance of 200 g².

a

Find a 95% confidence interval for μ, the population mean mass.

b

State, with a reason, whether or not your answer requires the assumption that the masses are normally distributed.

12 The pH of a river is believed to be normally distributed with a standard deviation of 0.1. What is the smallest number of samples that should be taken to get a 90% confidence interval with a width of less than 0.05?

6.4 Confidence intervals for population proportion

WORKED EXAMPLE 6.4

It was reported that harlequin ladybirds were commonly found in gardens. Gemma conducted a survey in her garden and, of the 120 ladybirds surveyed, 48 were found to be harlequin ladybirds.

a

Calculate a 90% confidence interval for the proportion of harlequin ladybirds.

b

Estimate the random sample size required to give a 99% confidence interval of the proportion of harlequin ladybirds with a width of 0.3.

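Answer

First find the sample proportion: $\hat{p} = \frac{48}{120} = 0.4$.

a

Use k = 1.645 for a 90% confidence interval. Calculate

$$\hat{p} \pm k\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.4 \pm 1.645\sqrt{\frac{0.4 \times 0.6}{120}} = 0.4 \pm 0.0736$$

The confidence interval is (0.326, 0.474).

b

Use k = 2.58 for a 99% confidence interval. Interval width is given by

$$2k\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 2 \times 2.58\sqrt{\frac{0.4 \times 0.6}{n}} = 0.3$$

so $n = 0.24 \times \left(\frac{5.16}{0.3}\right)^2 \approx 71$, and a random sample of about 71 ladybirds is needed.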
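The arithmetic in Worked example 6.4 can be checked with a short Python sketch. It is an illustration only: `proportion_interval` and `n_for_width` are hypothetical helper names, part a takes z from `statistics.NormalDist` (about 1.645 for 90%), and part b uses k = 2.58 exactly as the worked example does.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes, n, level):
    """Approximate interval p_hat ± z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def n_for_width(p_hat, k, width):
    """Solve width = 2 * k * sqrt(p_hat * (1 - p_hat) / n) for n (left unrounded)."""
    return p_hat * (1 - p_hat) * (2 * k / width) ** 2


ladybird_ci = proportion_interval(48, 120, 0.90)
n_needed = n_for_width(0.4, 2.58, 0.3)  # k = 2.58, as in the worked example
print([round(v, 3) for v in ladybird_ci], round(n_needed, 2))
```

Since p̂ = 0.4 comes from Gemma's own survey, the sample size found in part b is an estimate rather than a guarantee.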

EXERCISE 6D

1 For the following sample sizes, n, and sample proportions, p, calculate symmetric 95% confidence intervals:

a

n = 60, p = 0.1

b

n = 14, p = 0.35.

2 For a sample size of 80 and sample proportion p = 0.2, calculate r% confidence intervals for:

a

r = 90%

b

r = 99%.

3

In a study of computer usage a random sample of 200 private households in a particular town was selected and the number that own at least one computer was found to be 68. Calculate a symmetric 90% confidence interval for the percentage of households in the town that own at least one computer.

4

Of 500 cars passing under a bridge on a busy road, 92 were found to be red.

a

Find a symmetric 98% confidence interval for the population proportion of red cars.

b

State any assumption required for the validity of the interval.

c

Describe a suitable population to which the interval applies.

d

If this experiment is carried out 50 times, what is the expected number of confidence intervals that would contain the population proportion of red cars?

5 A biased die was thrown 600 times and 224 sixes were obtained.

a

Calculate a symmetric 99% confidence interval of p, the probability of obtaining a six in a single throw of the die.

b

Estimate the smallest number of times the die should be thrown for the width of the symmetric 99% confidence interval of p to be at most 0.08.

c

Give two reasons why the answer to part b is only an estimate.

M

6 Two machines, machine 1 and machine 2, produce shirt buttons. Random samples of 80 buttons are selected from the output of each machine and each button is inspected for faults. The sample from machine 1 contained 8 faulty buttons and the sample from machine 2 contained 19 faulty buttons.

a

Calculate symmetric 90% confidence intervals for the proportions, p1 and p2, of faulty buttons produced by each machine.

b

State, giving a reason, what the confidence intervals indicate about the relative sizes of p1 and p2.

M

7

An opinion poll is to be carried out to estimate the proportion of the electorate of a country who will vote ‘yes’ in a forthcoming referendum. In a trial run a random sample of 100 people were questioned; 42 said they would vote ‘yes’. Estimate the random sample size required to give a 99% confidence interval of the proportion with a width of 0.02.
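Question 5b and question 7 both ask for the smallest n that brings the width 2z√(p̂(1 − p̂)/n) down to a target, with p̂ taken from a pilot sample. A minimal Python sketch follows, with a hypothetical helper name and z from `statistics.NormalDist`; the requested "width of 0.02" in question 7 is treated here as "at most 0.02".

```python
from math import ceil
from statistics import NormalDist


def min_n_for_width(p_hat, level, max_width):
    """Smallest n with 2 * z * sqrt(p_hat * (1 - p_hat) / n) <= max_width.

    p_hat comes from a pilot sample, so the answer is itself only an estimate.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return ceil(p_hat * (1 - p_hat) * (2 * z / max_width) ** 2)


n_die = min_n_for_width(224 / 600, 0.99, 0.08)   # question 5b
n_poll = min_n_for_width(42 / 100, 0.99, 0.02)   # question 7
print(n_die, n_poll)
```

The result for question 7 is a little over 16 000; rounded to three significant figures it is 16 200.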

M

8

A random sample of 200 fish was collected from a pond containing a large number of fish. Each fish was marked and returned to the pond. A day later, 400 fish were collected and 22 were found to be marked. These were also returned to the pond.

a

Obtain a symmetric 90% confidence interval for the proportion of marked fish in the pond.

b

Assuming that the number of marked fish is still 200, what can be said about the size of the population of fish in the pond?
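Question 8 turns an interval for the proportion of marked fish into statements about the population size N. Assuming, as part b does, that exactly 200 fish are marked, p = 200/N, so N = 200/p; because N decreases as p increases, the interval limits swap. A short Python sketch under that assumption (variable names are illustrative):

```python
from math import sqrt
from statistics import NormalDist

marked_in_pond = 200          # fish marked on the first day (part b assumes this is unchanged)
caught, marked_caught = 400, 22

p_hat = marked_caught / caught
z = NormalDist().inv_cdf(0.95)                     # 90% two-sided
half_width = z * sqrt(p_hat * (1 - p_hat) / caught)
p_low, p_high = p_hat - half_width, p_hat + half_width

# N = 200 / p: a larger proportion means a smaller population, so the limits swap.
pop_low, pop_high = marked_in_pond / p_high, marked_in_pond / p_low
pop_point = marked_in_pond / p_hat
print(round(p_low, 4), round(p_high, 4), round(pop_low), round(pop_high), round(pop_point))
```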

END-OF-CHAPTER REVIEW EXERCISE 6

1 The time taken for a full kettle to boil is known to follow a normal distribution with mean 40 seconds and standard deviation 5 seconds. After cleaning, the kettle is boiled ten times to test if the time taken to boil has decreased.

a

Stating two necessary assumptions, find the critical region for this hypothesis test at the 5% significance level.

b

The mean time is found to be 37 seconds. State the outcome of the hypothesis test.

c

Why should there be a long delay between the ten observations for this test to be valid?

2 The distance an athlete jumps in a long jump is known to be normally distributed with mean 5.84 m and standard deviation 0.31 m. After a change of technique the athlete looks at the mean of n jumps, using a 5% significance level, to see if the mean distance has changed.

a

Write down appropriate null and alternative hypotheses for this test.

b

If the jumps are still normally distributed with standard deviation 0.31 m, find the acceptance region in terms of n.

c

If the mean is found to be 5.6 m, find the smallest value of n that would result in the null hypothesis being rejected.

3 In a large typesetting company the time taken for a typesetter to type one page is normally distributed with mean 7.80 minutes and standard deviation 1.22 minutes. A new training scheme is introduced and, after all the typesetters have completed the training, the times taken by a random sample of 35 typesetters are recorded. The mean time for the sample is 7.24 minutes. You may assume that the population standard deviation is unchanged.

a

Test, at the 1% significance level, whether the mean time to type a page has decreased.

b

It is required to redesign the test so that the probability of incorrectly rejecting the null hypothesis is less than 0.01 when the sample mean is 7.50. Find the smallest sample size needed.

4 A study from the 1990s showed that 15-year-olds spent an average of 1.8 hours each week playing computer games. A study of time spent playing computer games in 2017 is summarised by the following data, where t is the time spent by each 15-year-old in hours:

a

Find the unbiased variance of the data, giving your answer to three significant figures.

b

Conduct a hypothesis test at the 5% significance level to see if the mean time spent playing computer games has increased.

c

State one other fact you need to know about the sample before you would be confident in your conclusion.

5 The school canteen sells coffee in cups claiming to contain 250 ml. It is known that the amount of coffee in a cup is normally distributed with standard deviation 6 ml. Adam believes that on average the cups contain less coffee than claimed. He wishes to test his belief at the 5% significance level.

a

Adam measures the amount of coffee in 36 randomly chosen cups and finds the average to be 248.8 ml. Show that he cannot conclude that the average amount of coffee in a cup is less than 250 ml.

b

Adam decides to collect a larger sample. He finds the average to be 248.8 ml again, but this time this is sufficient evidence to conclude that the average amount of coffee in a cup is less than 250 ml. What is the minimum sample size he must have used?

6

a

A set of 40 data items produces a confidence interval for the mean of (94.93, 105.07). If Σx² = 424 375, find the confidence level, giving your answer to two significant figures.

b

A student wants to test the following hypotheses: H0 : μ = 94, H1 : μ > 94.

Use the given confidence interval to conduct a hypothesis test, stating the significance level.
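Questions 1, 2, 3, 5 and 6 of this review exercise reduce to a handful of z-test calculations. The Python sketch below shows them with illustrative helper names; it assumes, as the questions do, normally distributed data with the stated population standard deviation, takes critical values from `statistics.NormalDist`, and for question 6 recovers the sample mean as the midpoint of the interval.

```python
from math import floor, sqrt
from statistics import NormalDist

STD = NormalDist()


def z_statistic(sample_mean, mu0, sigma, n):
    return (sample_mean - mu0) / (sigma / sqrt(n))


def lower_critical_mean(mu0, sigma, n, alpha):
    """Reject H0: mu = mu0 for H1: mu < mu0 when the sample mean falls below this value."""
    return mu0 + STD.inv_cdf(alpha) * sigma / sqrt(n)


def min_n_to_reject(sample_mean, mu0, sigma, alpha, two_tailed=False):
    """Smallest n for which |z| is strictly greater than the critical value."""
    tail = alpha / 2 if two_tailed else alpha
    z_crit = STD.inv_cdf(1 - tail)
    return floor((z_crit * sigma / abs(sample_mean - mu0)) ** 2) + 1


q1_critical = lower_critical_mean(40, 5, 10, 0.05)                 # kettle
q2_n = min_n_to_reject(5.6, 5.84, 0.31, 0.05, two_tailed=True)     # long jump
q3_z = z_statistic(7.24, 7.80, 1.22, 35)                           # typesetters
q3_n = min_n_to_reject(7.50, 7.80, 1.22, 0.01)
q5_z = z_statistic(248.8, 250, 6, 36)                              # coffee
q5_n = min_n_to_reject(248.8, 250, 6, 0.05)

# Question 6: the interval's midpoint is the sample mean, so sum(x) = n * mean.
n6, mean6 = 40, (94.93 + 105.07) / 2
s6 = sqrt((424375 - n6 * mean6 ** 2) / (n6 - 1))
z6 = (105.07 - mean6) / (s6 / sqrt(n6))
q6_level = 2 * STD.cdf(z6) - 1

print(round(q1_critical, 2), q2_n, round(q3_z, 2), q3_n, round(q5_z, 2), q5_n, round(q6_level, 2))
```

For question 1, a sample mean of 37 s lies below the critical value of about 37.4 s, so H0 is rejected; each sample size printed is the smallest n for which the given sample mean would lead to rejection.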

7 From experience, you know that the variance in the increase between marks in a beginning-of-year test and an end-of-year test is 64. A random sample of four students in class A was selected and the results in the two tests were recorded.

| | Alma | Brenda | Claron | Dominique |
|---|---|---|---|---|
| Beginning of year | 98 | 62 | 88 | 82 |
| End of year | 124 | 92 | 120 | 116 |

a

Assuming that the difference can be modelled by a normal distribution with variance 64, find a 90% confidence interval for the mean increase.

b

How could the width of the confidence interval be decreased?

c

Do these data provide evidence at the 5% significance level that class A is doing better than the school average of a 20-mark increase?

8 A data set is summarised below:

a

Calculate the unbiased estimate of the variance of these data.

b

Find a 95% confidence interval for the mean.

c

Conduct a two-tailed test at the 5% significance level to determine if there is a change from μ = 5.

d

9 The depth of water in a lake was measured at 50 randomly chosen points on a particular day. The depths, d metres, are summarised by Σd = 366.45 and Σd² = 2978.16.

a

Calculate an unbiased estimate of the variance of the depth of the lake.

b

Calculate a symmetric 99% confidence interval for the mean depth of the lake.

The three months after the depths were obtained were very hot and dry, and the water level in the lake dropped by 1.35 m.

c

Based on readings taken at the same 50 points as before, which were all greater than 1.34 m, what will be a symmetric 99% confidence interval for the new mean depth?
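Question 9 can be sketched in Python as follows: the unbiased variance comes from the totals, the large sample allows a z-interval, and because every reading drops by the same 1.35 m the interval in part c is the part b interval shifted down by 1.35 m while the variance is unchanged. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, sum_d, sum_d2 = 50, 366.45, 2978.16
mean_d = sum_d / n
s2 = (sum_d2 - sum_d ** 2 / n) / (n - 1)      # unbiased estimate of the variance
z = NormalDist().inv_cdf(0.995)                # two-sided 99%
half_width = z * sqrt(s2 / n)
ci = (mean_d - half_width, mean_d + half_width)

# Part c: every depth falls by 1.35 m, so the mean shifts and the variance does not.
drop = 1.35
ci_after = (ci[0] - drop, ci[1] - drop)

print(round(s2, 2), [round(v, 2) for v in ci], [round(v, 2) for v in ci_after])
```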

PS

10 The lives of a certain make of battery have a normal distribution with mean 30 h and variance 2.54 h². When a large consignment of these batteries is delivered to a store the quality control manager tests the lives of eight randomly chosen batteries. The mean life was 28.8 h. Test whether there is cause for complaint. Use a 3% significance level.

PS

11 The masses of loaves from a certain bakery have a normal distribution with mean μ grams and standard deviation σ grams. When the baking procedure is under control, μ = 508 and σ = 18. A random sample of 25 loaves from a day’s output had a total mass of 12 554 grams. Does this provide evidence at the 10% significance level that the process is not under control?

PS

12 The Galia melons produced by a fruit grower under usual conditions have a mean mass of 0.584 kg. The fruit grower decides to produce a crop organically and a random sample of 75 melons, ready for market, had masses, x kg, summarised by Σx = 45.39 and Σx2 = 29.03. Test, at the 10% significance level, whether melons grown organically are heavier, on average, than those grown under the usual conditions.
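Questions 10 to 12 are one-sample z tests. Here is a minimal Python sketch, assuming normal populations as stated and, for question 12, using the large-sample estimate s from the totals in place of σ; the helper name is illustrative and critical values come from `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()


def z_statistic(sample_mean, mu0, sigma, n):
    return (sample_mean - mu0) / (sigma / sqrt(n))


# Question 10: H1: mu < 30 at 3%, known variance 2.54.
z10 = z_statistic(28.8, 30, sqrt(2.54), 8)
reject10 = z10 < STD.inv_cdf(0.03)

# Question 11: two-tailed at 10%; the sample mean is the total mass / 25.
z11 = z_statistic(12554 / 25, 508, 18, 25)
reject11 = abs(z11) > STD.inv_cdf(0.95)

# Question 12: large sample, so s from the totals replaces sigma; H1: mu > 0.584 at 10%.
n12, sum_x, sum_x2 = 75, 45.39, 29.03
s12 = sqrt((sum_x2 - sum_x ** 2 / n12) / (n12 - 1))
z12 = z_statistic(sum_x / n12, 0.584, s12, n12)
reject12 = z12 > STD.inv_cdf(0.90)

print(round(z10, 3), reject10, round(z11, 3), reject11, round(z12, 3), reject12)
```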

THE STANDARD NORMAL DISTRIBUTION FUNCTION

If Z is normally distributed with mean 0 and variance 1, the table gives the value of Φ(z) for each value of z, where Φ(z) = P(Z ⩽ z).

Use Φ(−z) = 1 − Φ(z) for negative values of z.

Critical values for the normal distribution

The table gives the value of z such that P(Z ⩽ z) = p, where Z ~ N(0, 1).

| p | 0.75 | 0.90 | 0.95 | 0.975 | 0.99 | 0.995 | 0.9975 | 0.999 | 0.9995 |
|---|---|---|---|---|---|---|---|---|---|
| z | 0.674 | 1.282 | 1.645 | 1.960 | 2.326 | 2.576 | 2.807 | 3.090 | 3.291 |
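The critical values listed for the normal distribution can be reproduced with Python's standard library: `NormalDist().inv_cdf(p)` returns the z with P(Z ⩽ z) = p. This is offered as a cross-check, not as a replacement for the printed tables used in examinations.

```python
from statistics import NormalDist

STD = NormalDist()
PROBS = (0.75, 0.90, 0.95, 0.975, 0.99, 0.995, 0.9975, 0.999, 0.9995)

# Critical values: the z with P(Z <= z) = p, rounded to 3 decimal places as in the table.
critical = {p: round(STD.inv_cdf(p), 3) for p in PROBS}
for p, z in critical.items():
    print(f"p = {p:<6}  z = {z:.3f}")

# The symmetry rule Phi(-z) = 1 - Phi(z), checked at one value.
assert abs(STD.cdf(-1.3) - (1 - STD.cdf(1.3))) < 1e-12
```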

1 Hypothesis testing

Exercise 1A

1 a i H0 : p = 0.6, H1 : p > 0.6; p is the proportion of children who like football

ii H0 : p = , H1 : p > ; p is the proportion of households with a pet

b i H0 : p = 0.06, H1 : p < 0.06; p is the proportion of faulty components produced

ii H0 : p = 0.05, H1 : p < 0.05; p is the proportion of children who eat 5 or more pieces of fruit per day

2 a i p-value 19.3% > 5%, insufficient evidence

ii p-value 91.4% > 10%, insufficient evidence

b i p-value 31.8% > 10%, insufficient evidence

ii p-value 18.2% > 3%, insufficient evidence

3

p = 6.51% < 10% The result is significant at the 10% level. This indicates that the reported figure is too high for young people at the student’s college.

4

a

H0 : p = ½, H1 : p > ½ (or μ = 2.5 and μ > 2.5), where p is the proportion of nails with length greater than 2.5 cm

b

P(X ⩾ 13) = 0.0106 < 0.025, so accept H1 : p > ½ and μ > 2.5.

The symmetry of the normal distribution about its mean is used in the statement H0 : p = ½.

5

Insufficient evidence of a probability less than

6

Insufficient evidence of an increase

7

X ⩽ 10

8

a

H0 : p = 0.85, H1 : p > 0.85 where p is the proportion of patients the drug cures

b

X ⩾ 162

9

Insufficient evidence that the new treatment is better

10 H0 : p = 0.132, H1 : p > 0.132; X ~ B(95, 0.132) ≈ N(12.54, 10.88472); P(X ⩾ 20) = 1 − Φ(2.110); since 2.110 > 2.054 (or since 0.0174 < 0.02), reject H0 and accept that the drop-out rate is greater for science students.

11 Yes
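The calculation in answer 10 (a normal approximation to B(95, 0.132) with a continuity correction) can be reproduced in a few lines of Python. The sketch also sums the exact binomial tail with `math.comb` for comparison; that comparison is an addition, not part of the book's answer.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 95, 0.132
mean, var = n * p, n * p * (1 - p)          # 12.54 and 10.88472
# Continuity correction: P(X >= 20) for the discrete X becomes P(Y > 19.5) for Y ~ N(mean, var).
z = (19.5 - mean) / sqrt(var)
approx = 1 - NormalDist().cdf(z)
critical = NormalDist().inv_cdf(0.98)        # one-tailed test at the 2% level
exact = sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(20, n + 1))
print(round(z, 3), round(approx, 4), round(critical, 3), round(exact, 4))
```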

Exercise 1B

1 a H0 : p = 0.5, H1 : p ≠ 0.5; p is the probability that the coin shows heads

b H0 : p = 0.26, H1 : p ≠ 0.26; p is the proportion of entries in AS Level Psychology graded ‘A’

2 a p-value 17.9% > 4%, insufficient evidence

b p-value 41.1% > 2.5%, insufficient evidence

3 a H0 : p = 0.7, H1 : p ≠ 0.7; X ~ B(12, 0.7); P(X = 12) = 0.0138 < 0.05, so reject H0 and accept that the true figure is not 70%.

b Marie’s friends do not form a random sample, so the test is unreliable.

4

There is evidence, at the 10% significance level, that the proportion of voters who support Party Z has changed.

5

Insufficient evidence that the proportion with a younger sibling at the same school is different

6

Insufficient evidence that the proportion able to run 100 metres in under 12 seconds has changed

7

H0 : p = 0.3, H1 : p ≠ 0.3; X ~ B(80,0.3) ≈ N(24,16.8); P(X ⩽ 19) = Φ(−1.098), since −1.098 > −1.645 (or since 0.136 > 0.05), accept H0, 30% of the beads are red.

8

H0 : p = ½ and H1 : p ≠ ½; P(X ⩾ 9) = 7.30% > 5%, so the null hypothesis is not rejected: there is insufficient evidence, at the 10% level, to say that the coin is biased.

9 H0 : p = and H1 : p ≠ ; X ≈ N(140, 100). Accept H0.

10 a No

b No

c Yes

d No

Exercise 1C

1 a H0 : p = 0.95, H1 : p < 0.95 b 0.0755 c 0.206

2 a X ⩾ 22 b 0.0442 c 0.0547

3 a 0.0421 b 0.917

4 a H0 : p = , H1 : p < b 10.3% c 0.103 d 0.589

5 a 0.0635 b 0.859

6 a H0 : p = 0.1, H1 : p < 0.1

b P(X = 0) = 0.0798 < 0.10, so reject H0 and accept that the modified toaster is more reliable.

c 0.0798

7 a X ⩽ 51, X ⩾ 69 b 0.0828 c 0.0094

8 a If operatives do not avoid ending a recorded weight with 0, then the probability that a value chosen at random ends in 0 is 0.1; if they do avoid 0, then this probability will be less than 0.1. So take H0 : p = 0.1 and H1 : p < 0.1.

b Under H0, X ~ B(40, 0.1). As P(X ⩽ 1) = 0.080 47… < 0.1, H0 is rejected if X = 1. As P(X ⩽ 2) = 0.222 80… > 0.1, H0 is not rejected if X = 2.

c The rejection region is X ⩽ 1.

d P(Type I error) = P(X ⩽ 1 │ p = 0.1). From part b, P(Type I error) = 0.080 47… = 0.0805, correct to 3 significant figures.
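Answer 8 of Exercise 1C can be checked numerically: for a lower-tail test the rejection region is X ⩽ c for the largest c with P(X ⩽ c) ⩽ α, and P(Type I error) is that tail probability. A minimal Python sketch (the helper name `binom_cdf` is illustrative):

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


n, p0, alpha = 40, 0.1, 0.10
# Lower-tail test: the rejection region X <= c uses the largest c whose tail probability is at most alpha.
c = max(k for k in range(n + 1) if binom_cdf(k, n, p0) <= alpha)
p_type_one = binom_cdf(c, n, p0)
print(c, round(binom_cdf(1, n, p0), 5), round(binom_cdf(2, n, p0), 5), round(p_type_one, 4))
```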

End-of-chapter review exercise 1

1

There is evidence that a 6 has probability significantly less than

2

14

3

It was greater than 4.45%

4

a

Acceptance region

b

0.0600

5

H0 : p = 0.75, H1 : p > 0.75; X ~ B(150, 0.75) ≈ N(112.5, 28.125); P(X ⩾ 124) = 1 − Φ(2.074); since 2.074 > 1.96 (or since 0.0190 < 0.025), reject H0 and accept that more than 75% get relief.

6

H0 : p = 0.07, H1 : p > 0.07; X ~ B(125, 0.07) ≈ N(8.75, 8.1375); P(X ⩾ 14) = 1 − Φ(1.665); since 1.665 < 1.881 (or since 0.048 > 0.03), accept H0 and retain the batch.

7

2

8

a

X ⩾ 34

b

0.0906

c

There is insufficient evidence that the proportion of students who play in a school sports team is greater than 40%.

9

X ⩾ 49

10 a

H0 : p = 0.035 H1 : p > 0.035 (where p is the proportion of faulty parts)

b

The test statistic X = 4 is not in the critical region, so there is not sufficient evidence to reject H0. There is not sufficient evidence that the proportion of faulty parts has increased.

c

0.177

2 The Poisson distribution

Exercise 2A

1

| x | 1 | 2 | 3 | 4 | >4 |
|---|---|---|---|---|---|
| P(X = x) | 0.311 | 0.264 | 0.150 | 0.0636 | 0.0296 |

2 a i 0.180 ii 0.271 b i 0.946 ii 0.592 c i 0.933 ii 0.981 d i 0.215 ii 0.525

3 a 0.224 b 0.199 c 0.577

4 a 0.222 b 0.370 c 0.835

5 a 0.779 b 0.692 c 0.209

6 a 0.616 b 0.351 c 0.825

7 a 0.0537 b 0.321

8 a Mean = 2.57, sd = 1.84 b Mean ≈ variance

9 0.475

10 a

0.00674

b

0.0337

c

0.0842

d

0.875

11 a

The Poisson distribution should be a good model for this situation as the appropriate conditions should be met: since the traffic is flowing freely the cars should pass independently and at random; it is not possible for cars to pass simultaneously; the average rate of traffic flow is likely to be constant over the time interval given.

b

The Poisson distribution is unlikely to be a good model: if it is a busy day the cars will be queuing for the car park and so they will not be moving independently.

c

The Poisson distribution should be a good model provided that the time period over which the measurements are made is much longer than the lifetime of the source: this will ensure that the average rate at which the particles are emitted is constant. Radioactive particles are emitted independently and at random and, for practical purposes, they can be considered to be emitted singly.

d

The Poisson distribution should be a good model provided that the following conditions are met: all the buns are prepared from the same mixture so that the average number of raisins per bun is constant; the mixture is well stirred so that the raisins are distributed at random; the raisins do not stick to each other or touch each other, so that they are positioned independently.

e

The Poisson distribution will not be a good model because the blood cells will have tended to sink towards the bottom of the solution. Thus the average number of blood cells per ml will be greater at the bottom than the top.

f

If the solution has been well shaken the Poisson distribution will be a suitable model. The blood cells will be distributed at random and at a constant average rate. Since the solution is dilute the blood cells will not be touching and so will be positioned independently.

Exercise 2B

1 a i Po(36) ii Po(9) b i Po(2.4) ii Po(120) c i Po(3.6) ii Po(72)

2 0.0116

3 a 0.298 b 0.973

4 a 2.5 b 0.0821

5 a 0.0821 b 0.516

6 a 0.268 b 0.191

7 a 0.00268 b 0.406

8 a 0.00674 b 49.9 s

9 0.394

Exercise 2C

1 a i 0.222 ii 0.221 b i 0.0966 ii 0.0965

2 0.677

3 0.151

4 a 0.879 b 0.00775

5 a 0.108 b 0.0273

6 0.0190

7 a 0.953 b 0.434

8 a 0.904 b 0.0119 c 0.320

9 a n large, but p not small enough to make np < 5

b p small, but n not large enough to use the Poisson approximation

Exercise 2D

1 a 0.608 b 0.178 c 0.212

2 a 0.265 b 0.0498

3 0.209

4 0.0441

5 a 0.689 b 0.0446

6 a 0.759 b 0.0599

7 62

8 123

Exercise 2E

1 X ~ Po(2); P(X ⩾ 4) = 0.143 > 0.05, so accept H0 that λ = 2.

2

a

H0 : λ = 4, H1 : λ < 4, where λ is the mean number of accidents per week. X ~ Po(8); P(X ⩽ 3) = 0.0424 < 0.05, so accept that the mean has reduced.

b

X ~ Po(20) including all 5 weeks. Approximate by N(20, 20). P(X ⩽ 14) = Φ(−1.230); since −1.230 > −1.645 (or since 0.1093 > 0.05), the conclusion changes.

3

H0 : λ = 1.4, H1 : λ ≠ 1.4; X ~ Po(1.4); P(X ⩾ 4) = 0.0537 > 0.05, so accept H0: the secretary is probably responsible.

4

H0 : λ = 3245, H1 : λ > 3245; X ~ Po(3245) ≈ N(3245, 3245); P(X ⩾ 3455) = 1 − Φ(3.678). Since 3.678 > 1.645, accept that the mean has increased.
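Answers 2b and 4 use a normal approximation to a Poisson distribution with a continuity correction. The Python sketch below reproduces both z values; for question 2b it also sums the exact Poisson probabilities for comparison (an addition, not part of the book's answer). The exact sum is not attempted for Po(3245), since e^(−3245) underflows to zero in floating point, which is one practical reason for using the approximation there.

```python
from math import exp, sqrt
from statistics import NormalDist

STD = NormalDist()


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam), building terms with the ratio P(i+1)/P(i) = lam / (i+1)."""
    term, total = exp(-lam), 0.0
    for i in range(k + 1):
        total += term
        term *= lam / (i + 1)
    return total


# Question 2b: X ~ Po(20) approximated by N(20, 20); P(X <= 14) becomes P(Y < 14.5).
z2 = (14.5 - 20) / sqrt(20)
approx2, exact2 = STD.cdf(z2), poisson_cdf(14, 20)

# Question 4: X ~ Po(3245) approximated by N(3245, 3245); P(X >= 3455) becomes P(Y > 3454.5).
z4 = (3454.5 - 3245) / sqrt(3245)
print(round(z2, 3), round(approx2, 4), round(exact2, 4), round(z4, 3))
```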

5

a

Po(3); calls occur randomly, at a uniform rate over 2 days.

b

H0 : λ = 1.5, H1 : λ > 1.5, where λ is the mean number of calls per day. P(X ⩾ 5) = 0.1847 > 0.05, so accept H0: the daily average has not increased.

6 a n > 50 and np = 2.4 < 5, so the Poisson approximation to the binomial distribution is applicable.

b H0 : μ = 2.4, H1 : μ > 2.4; X ~ Po(2.4); P(X ⩾ 6) = 0.0357 < 0.05, so reject H0 and accept that the mean has increased.

7 a n = 60 > 50 and np = 4.2 < 5, so the Poisson approximation to the binomial distribution applies.

b H0 : λ = 4.2 or p = 0.07, H1 : λ < 4.2 or p < 0.07; X ~ Po(4.2); P(X ⩽ 1) = 0.0780 > 0.05, so accept H0: the proportion is 7%.

8 H0 : λ = 1.5 and H1 : λ < 1.5; P(X ⩽ 1) = 1.74% < 5%, so reject H0: there is evidence that the new photocopier is more reliable than the old one.

9

a

0.109

b

0.206

End-of-chapter review exercise 2

1 a 2.30 b 0.375

2 a 0.189 b 0.212

3 a 0.135 b 0.271 c 0.238

4 a Computers must be lost randomly and independently throughout the period at a uniform rate.

b H0 : λ = 18, H1 : λ < 18; 9

5 a Flaws must occur randomly at a uniform rate per metre length.

b H0 : λ = 1.8, H1 : λ < 1.8; X ~ Po(5.4); P(X ⩽ 2) = 0.0948 > 0.05, so accept H0: the rate has not decreased.

c X ~ Po(20.7) ≈ N(20.7, 20.7); P(X ⩽ 8) = Φ(−2.681). Since −2.681 < −1.645 (or since 0.0037 < 0.05), accept H1: the conclusion changes.

6 a X ~ Po(1.9)

b H0 : μ = 1.9, H1 : μ < 1.9; P(X = 0) = 0.1496 > 0.05, so accept H0: there is no decrease.

c H0 : p = 0.1496, H1 : p > 0.1496; X ~ B(50, 0.1496) ≈ N(7.48, 6.361); P(X ⩾ 13) = 1 − Φ(1.990); since 1.990 > 1.645, reject H0 and accept that the numbers have decreased.

7 a Sample size n is greater than 50 and np = 3 < 5.

b One-tailed, since it would not be concerned if the rate were smaller than 5%.

c Proof

d i Type II ii Type I

e 0.703

8 a X ⩽ 12, X ⩾ 32 b 4.40% c The mean has not changed. d 0.928

9 a 0.809 b 0.979

3 Linear combinations of random variables

Exercise 3A

1 a E(X) = 3.8, Var(X) = 2.66 b E(3X + 1) = 12.4, Var(3X + 1) = 23.94

2 a 12 b 18

3 a 1.3, 0.61 b 5.9, 5.49 c Proof

4 a −4, 5 b 89, 80

5 a i 13 ii 2.4 b

6 a 6.2 b 10.08

7 a 12 b

8 Mean 15, variance

Exercise 3B

1 a i −5, 6 ii 3, 6 b i 5, 34 ii −18, 72 c i −2.4, 1.52 ii

2 a 12 b 24

3 a 3.5, 2 b c 3 d 9 e 6.5 f

4 a

| x | 0 | 1 | 2 |
|---|---|---|---|
| P(X = x) | 0.25 | 0.5 | 0.25 |

| y | 0 | 1 | 2 |
|---|---|---|---|
| P(Y = y) | 0.36 | 0.48 | 0.16 |

| x + y | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| P(X + Y = x + y) | 0.09 | 0.3 | 0.37 | 0.2 | 0.04 |

b i 1 ii 0.5 iii 0.8 iv 0.48 v 1.8 vi 0.98

c Proof
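The distribution of X + Y in answer 4 a can be built by convolution, assuming (as the products in the table indicate) that X and Y are independent. This Python sketch also confirms the means and variances in part b, including Var(X + Y) = Var(X) + Var(Y); the helper names are illustrative.

```python
from itertools import product

px = {0: 0.25, 1: 0.5, 2: 0.25}
py = {0: 0.36, 1: 0.48, 2: 0.16}

# Assumes independence, so P(X = x, Y = y) = P(X = x) * P(Y = y).
p_sum = {}
for (x, p_x), (y, p_y) in product(px.items(), py.items()):
    p_sum[x + y] = p_sum.get(x + y, 0.0) + p_x * p_y


def mean(dist):
    return sum(v * p for v, p in dist.items())


def var(dist):
    m = mean(dist)
    return sum((v - m) ** 2 * p for v, p in dist.items())


print({k: round(v, 2) for k, v in sorted(p_sum.items())})
print(mean(px), var(px), mean(py), var(py), mean(p_sum), var(p_sum))
```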

5

3.1, 3.69

6

a

24

b

12

c

6

d

24

7

,

8

8 mm, 0.14 mm²

9

mean 35.6 min, sd 3.59

10 a mean 80 cm, var 20 cm² b mean 12 cm, var 5 cm²

Exercise 3C

1 a i 0.826 ii 0.734 b i 0.551 ii 0.547 c i 0.355 ii 0.5 d i 0.426 ii 0.543

2 a N(91.3, 16.3) b 0.0156

3

0.0592

4

0.0637

5

a

0.3 s, 0.721 s

b

0.339

c

0.166

6

0.0228

7

0.362

8

0.127

9

0.178

Exercise 3D

1 a i 5, 5 ii 1, 5 iii 11, 27 b i

2 a W ~ Po(3.6)

b

i

0.844

ii

0.390

is Poisson

3

a

V ~ Po(6)

b

i

0.0892

ii

0.938

4

0.116

5

0.242

6

0.0107

7

a

B ~ Po(3.5)

b

0.0302

c

0.799

8

0.119

9

a

0.853

b

The rate of arrival of messages is unlikely to be constant; there will probably be more at some times of the day than others. Within each distribution messages are not likely to be independent, as they may occur as part of a conversation. The two distributions are also probably not independent of each other, as times when more emails arrive might be similar to times when more texts arrive.

End-of-chapter review exercise 3

1 1044 kg, 27.7 kg

2

0.0317

3

μ = 16, σ² = 9

4

a

μ − 2m

b

σ² + 4s²

c

16σ²

d

4σ²

5

2X is twice the weight of a single gerbil, X1 + X2 is the sum of the weights of two different gerbils.

6

0.00587

7

0.249

8

a

−24 kg, 308 kg²

b

0.0857

9

0.432

10 a

N(32, 1.96)

b

N(33, 0.25)

c

0.250

4 Continuous random variables

Exercise 4A

1

a b c

2

a b

3

c

0.454

d

1.75

a

3,

b 4

5

a

2

b

0.135

c

0.632

a

400

b c d 6

7

a

Proof

b

0.330

c

0.368

a

0.0968

b

370

8

e⁻⁶

9

0.560

10 0.5

11 a 2.5 b 0.000356

Exercise 4B

1

a

b 2

5 −

a

b

2+

c

(

− 1)

3 4

0.695

5

a

b = ek

b

a =

6

2.39

7

4.49

8

c = 2.68 d = 5.73

9

a

or 0.707

b

and substitute M21 = , to verify that

.

Exercise 4C

1 2 kg, kg²

2 15 hours, 75 hours²

3 a Proof b 584 pages, 19100 pages²

4 500 hours, 250 hours²

5 a Proof b ,

6 a 0.6 b 0.5 m c 0.03 m²

7 a a = b E(B) =

8

9 a b c

10 a

1

b

0.2

c

0.626

End-of-chapter review exercise 4

1

a

800 pages

b

1000

2 3

a

Proof

b 4

k = 0.5

5

a

Proof

b

6

7

c

E(X) = 0.5e − 1, Var(X) =

d

0.290

e

i

0.0243

ii

0.179

a b

6 (to the nearest integer)

a

b = ek

b 8

a = −

9

0.4

, b =

5 Sampling

Exercise 5A

1

a

Final-year students may be more likely to progress to university and so be more worried about fees than people who are not affected.

b

People in supermarkets are more likely to be knowledgeable about food prices.

c

School children will be shorter than average.

d

This is a real-life example. Mostly richer people had telephones, so the poll predicted the wrong winner.

2

Obtain a numbered list of all the students; use a random number generator to generate 70 different random numbers; select students with those numbers.

3

16, 762, 537, 722, 152

4

a

D

b

Less time-consuming – no need to travel as far; cheaper

5

Not everybody will return the questionnaire so not all possible samples are equally likely.

6

This would require a complete list of all fish in the North Sea and the ability to measure any fish required.

7

Number each student; generate four-digit random numbers on the calculator or from random number tables, ignoring repeats and numbers above 1246, until 120 students have been chosen.

8

49 05 74 64 26 07 37 50 13 90 Or 49 20 48 08 05 74 26 87 72 37

9

a

Sample mean equally likely to be larger or smaller than true population mean

b

May not be typical, e.g. high density of a particular ethnic group which is typically taller or shorter than general population

10 a Basketball players are on average taller than the general population.

b Students who eat ice cream may not be interested in healthy eating.

c People at a doctor’s surgery are likely to have poorer health than the general population.

11 237 155 78 3

12 a Not all possible samples are equally likely; for example, a person at the bottom of a page is not going to be selected.

b The researcher would have to know in advance who was going to be shopping on that day to create a random sample, and this is not feasible.

Exercise 5B

1 a i 0.212 ii 0.129 b i Cannot say ii Cannot say

2

0.0352

3

0.788

4

0.0228

5

a

0.9996

b

0.822

6

0.010

7

a

N(2.4, 0.0432)

b

0.315

c

Neither of them depends on the Central Limit Theorem.

8 a 0.271 b 0.0021

c Part b is unchanged: by the Central Limit Theorem, the sample mean is approximately normally distributed even if the underlying population is not normally distributed.

9 a N(40, 0.25), assuming independence of emissions b 0.955

10 0.0820

11 a 2500 g, 1.79 g b 0.995 c You could use the normal distribution in part b.

End-of-chapter review exercise 5

1 a Yes, because probability = 0.981 b 1 − Φ(−3.338) ≈ 1

2 a 24.9 b No, since X has a normal distribution.

3 a , b 0.0289 c The distribution of T is very skewed, so the answer is not very accurate.

4 a 0.0594 b 0.0023

5 a 2.7, b 0.0192

6 a 0.114 b 81

7 a p = 0.8, n = 80 b 0.0385

8 608

9 2.11 hours

10 a 0.0173 b The mean of normally distributed variables is normally distributed.

11 44

6 Estimation

Exercise 6A

1 a i Mean: 6, variance: 15.3 ii Mean: 29.8, variance: 367.2

b i Mean: 1.2, variance: 8.7 ii Mean: 2.2, variance: 27.7

c i Mean: −1.8, variance: 29.7 ii Mean: 5, variance: 78.5

2 a 2.55 b It is likely to be better, but with random fluctuations it is possible that it is worse.

3

Mean: 3.6, variance: 18.3

4

\$1.83, 0.119 \$²

5

0.0145 cm²

6

2.49, 1.61

7

50.633 g, 147.365 g²

8

Mean: 5.6, variance: 2.57

9

Mean: 101.1 g, variance: 1.369 g²

Exercise 6B

1 a H0 : μ = 168.8, H1 : μ > 168.8

b That the sample standard deviation can be used as the population standard deviation for all 18-year-olds.

c p-value = 0.118; accept H0: there is insufficient evidence to support Caroline’s belief.

d Yes, she used it when calculating probabilities from a normal distribution.

2 a p-value = 0.0300; reject H0: the average time has decreased.

b Assume that the standard deviation is unchanged.

3 a The scores of pupils in the school follow a normal distribution. The standard deviation is still 1.21.

b p = 0.320; do not reject H0: results at this school are not better than the national mean.

4 a x̄ < 80.3 b The weights follow a normal distribution. c Reject H0: the diet is working.

5 H0 : μ = 85.6, H1 : μ < 85.6; critical region x̄ ⩽ 78.39; reject H0 and accept that the mean is less than 85.6.

6 H0 : μ = 6.8, H1 : μ ≠ 6.8; critical region x̄ ⩽ 6.69 or x̄ ⩾ 6.91; reject H0, and accept that the mean is greater than 6.8.

7 H0 : μ = 330, H1 : μ > 330; z = 2.504 > 1.96; accept H1: the manager’s suspicion is correct.

8 H0 : μ = 11.90, H1 : μ > 11.90; z = 2.5 > 2.326; reject H0 and accept that the mean is greater than 11.90 cm.

9 H0 : μ = 5, H1 : μ < 5; s = 1.582, z = 0.554. Do not reject H0: there is insufficient evidence that temperatures are lower.

Exercise 6C

1 (85.8, 90.6)

2 (3.06, 4.54)

3 (−0.358, 1.96)

4 (21 200, 27 800)

5 a [0.981, 1.03] b 0.0233 litres

6 a [169.4, 181.0]

b No, 178 is within the interval. It is assumed that the biology students at the seminar are a random sample of all fully grown male biology students.

7 a [5.013, 5.057]

b 5.00 cm is outside the interval, indicating that the mean has increased.

8 a [84.1, 89.1]

b The interval is only valid for the mean pulse rate of eight-year-olds attending the clinic.

9 a [61.0, 62.5]

b No, both limits are greater than 60.

10 a 95% b 174

11 a (165, 171) b No, the large sample means that you can use the Central Limit Theorem.

12 44

Exercise 6D

1 a [0.0241, 0.176] b [0.100, 0.600]

2 a [0.126, 0.274] b [0.0846, 0.315]

3

[28.5, 39.5]

4

a

[0.144, 0.224]

b

Sample is random.

c

For example, all cars passing under the bridge over a specified period

d

49

5 a [0.322, 0.424] b n = 971 c The variance is an estimate and the normal approximation is used.

6 a [0.045, 0.155], [0.159, 0.316]

b This indicates p1 < p2, since the intervals do not intersect and the interval for p1 lies below the interval for p2.

7 16200

8

a

[0.0362, 0.0738]

b

Using the CI, population is from 2710 to 5520. Using the proportion, population is 3640.

End-of-chapter review exercise 6

1 a The times are still normally distributed, with standard deviation still 5 seconds. Critical region x̄ < 37.4.

b Reject H0: there is some evidence to suggest that the time taken has decreased.

c The observations would not be independent; the water needs to cool down to the original temperature.

2 a H0 : μ = 5.84, H1 : μ ≠ 5.84

b

c 7

3 a Reject H0: there is sufficient evidence that the mean time has decreased (z = −2.72, p = 0.0033).

b 90

4 a 8.46 b p-value = 0.0659; do not reject H0.

c The sampling method: the conclusion is only valid if the sample is representative of all 15-year-olds.

5 a p-value = 0.115 b 68

6 a 80% b Reject H0 at the 10% significance level.

7 a (23.9, 37.1) b Increase the sample size. c Yes

8 a 2 b (4.41, 5.19) c Do not reject H0.

d Yes, because we do not know whether the data set is normally distributed.

9 a 5.97 m² b [6.44, 8.22]

c [5.09, 6.87]. Every depth falls by 1.35 m, so the mean is 1.35 m below its previous value and the interval is the previous interval shifted down by 1.35 m.

10 H0 : μ = 30, H1 : μ < 30; critical region x̄ ⩽ 28.94; since x̄ = 28.8 lies in it, reject H0: there is cause for complaint.

11 H0 : μ = 508, H1 : μ ≠ 508; z = −1.622. This lies between −1.645 and 1.645, so accept H0: the process is under control.

12 H0 : μ = 0.584, H1 : μ > 0.584; x̄ = 0.6052, s = 0.1452, z = 1.264 < 1.282, so accept H0: melons grown organically are not heavier on average.

University Printing House, Cambridge CB2 8BS, United Kingdom
One Liberty Plaza, 20th Floor, New York, NY 10006, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
314–321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre, New Delhi – 110025, India
79 Anson Road, #06–04/06, Singapore 079906

Cambridge University Press is part of the University of Cambridge. It furthers the University’s mission by disseminating knowledge in the pursuit of education, learning and research at the highest international levels of excellence.

www.cambridge.org
Information on this title: www.cambridge.org/9781108444927

© Cambridge University Press 2018

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2018

20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2

A catalogue record for this publication is available from the British Library

ISBN 978-1-108-44492-7 Paperback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate. Information regarding prices, travel timetables, and other factual information given in this work is correct at the time of first printing but Cambridge University Press does not guarantee the accuracy of such information thereafter.

The questions, example answers, marks awarded and/or comments that appear in this book were written by the author(s). In examination, the way marks would be awarded to answers like these may be different.
This Practice Book has been compiled and authored by Jayne Kranat, using questions from:

Cambridge International AS and A Level Mathematics: Statistics 2 Coursebook (Revised edition) by Steve Dobbs, Jane Miller and Julian Gilbey, which was originally published in 2016.

A Level Mathematics for OCR A Student Book 2 (Year 2) by Vesna Kadelburg, Ben Woolley, Paul Fannon and Stephen Ward

A Level Further Mathematics for OCR A Statistics Student Book (AS/A Level) by Vesna Kadelburg, Ben Woolley, Paul Fannon and Stephen Ward

Cover image: Pinghung Chen/EyeEm/Getty Images

NOTICE TO TEACHERS IN THE UK

It is illegal to reproduce any part of this work in material form (including photocopying and electronic storage) except under the following circumstances: (i) where you are abiding by a licence granted to your school or institution by the Copyright Licensing Agency; (ii) where no such licence exists, or where you wish to exceed the terms of a licence, and you have gained the written permission of Cambridge University Press; (iii) where you are allowed to reproduce without permission under the provisions of Chapter 3 of the Copyright, Designs and Patents Act 1988, which covers, for example, the reproduction of short passages within certain types of educational anthology and reproduction for the purposes of setting examination questions.