Mathematics for Sustainability

Mathematical models for sustainability calculations, theory and exercises.


English Pages 249 Year 2020


Table of contents :
Algebra and Population Growth
The Malthusian Growth Model
The Logistic Growth Model
Exercises
Set Theory and Renewable Energy
Sets
Set Operations
Venn Diagrams
Exercises
Probability and Organic Agriculture
Probability Preliminaries
General Probability Theory
Random Variables
Exercises
Statistics and Climate Change
Descriptive Statistics
Linear Regression and Correlation
The Normal Distribution
The Distribution of a Sample Mean
Confidence Intervals
Hypothesis Testing
Exercises
Network Theory and Green Transportation
Graph Theory Preliminaries
Eulerian Circuits
Graph Connectivity
Exercises
Geometry, Trigonometry, and Natural Building
Geometry
Trigonometry
Exercises
Calculus and Social Justice
Differentiation
Integration
Exercises


Mathematics for Sustainability Jacob Duncan Department of Mathematics and Statistics Winona State University [email protected]

Contents

1 Algebra and Population Growth
   1.1 The Malthusian Growth Model
   1.2 The Logistic Growth Model
   1.3 Exercises
2 Set Theory and Renewable Energy
   2.1 Sets
   2.2 Set Operations
   2.3 Venn Diagrams
   2.4 Exercises
3 Probability and Organic Agriculture
   3.1 Probability Preliminaries
   3.2 General Probability Theory
   3.3 Random Variables
   3.4 Exercises
4 Statistics and Climate Change
   4.1 Descriptive Statistics
   4.2 Linear Regression and Correlation
   4.3 The Normal Distribution
   4.4 The Distribution of a Sample Mean
   4.5 Confidence Intervals
   4.6 Hypothesis Testing
   4.7 Exercises
5 Network Theory and Green Transportation
   5.1 Graph Theory Preliminaries
   5.2 Eulerian Circuits
   5.3 Graph Connectivity
   5.4 Exercises
6 Geometry, Trigonometry, and Natural Building
   6.1 Geometry
   6.2 Trigonometry
   6.3 Exercises
7 Calculus and Social Justice
   7.1 Differentiation
   7.2 Integration
   7.3 Exercises

1 Algebra and Population Growth

Topics: Proportionality; Exponential function models; Graphing; Solving exponential equations; Relative error; Logarithms; Logistic function models; Solving linear equations

Human overpopulation (or population overshoot) is when there are too many people for the environment to sustain (with food, drinkable water, breathable air, energy resources, etc.). In more scientific terms, there is overshoot when the ecological footprint of a human population in a geographical area exceeds that place’s carrying capacity, damaging the environment faster than nature can repair it, potentially leading to an ecological and societal collapse. Overpopulation may be considered the root cause of nearly all sustainability problems we face today. With 7.7 billion people on the planet to sustain, resources are becoming relatively scarce.

* World population growth videos:
• https://www.youtube.com/watch?v=khFjdmp9sZk
• https://www.youtube.com/watch?v=PUwmA3Q0_OE
• http://www.worldometers.info/world-population/

Example 1.1. Basic population growth
Suppose for a single-celled organism (e.g., bacteria), a cell divides every minute.

* Bacteria population growth videos:
• https://www.youtube.com/watch?v=gEwzDydciWc
• https://www.youtube.com/watch?v=4grQSLmWXQk

1. Derive a formula for predicting the population P at time t (in minutes), starting from a single cell.
   $P = 2^t$
2. What is the population after 10 minutes?
   $P = 2^{10} = 1{,}024$ cells
3. What is the population after 20 min?
   $P = 2^{20} = 1{,}048{,}576$ cells
4. Graph P vs t.

[Figure: graph of Population, P versus time, t for Example 1.1.]

Example 1.2. Infectious disease spread https://www.youtube.com/watch?v=Kas0tIxDvrg&vl=en

1.1 The Malthusian Growth Model

The Malthusian growth model is named after Thomas Robert Malthus, who wrote An Essay on the Principle of Population (1798), one of the earliest and most influential books on population dynamics. The Malthusian model assumes there are no limitations (such as food or space) on growth and that the rate of growth is proportional to the size of the population. That is,

growth rate = r · population size

for some constant r called the intrinsic growth rate. The intrinsic growth rate can be thought of as the per capita rate of increase (birth rate minus death rate). This defining assumption leads to the following exponential function model.

Definition 1.1. For a population under ideal conditions (no limitation on food, space, etc.), the number of individuals in the population at a given point in time can be predicted by the Malthusian Model,

$$P(t) = P_0 e^{rt},$$

where P is the population at time t, $P_0$ is the initial population (when t = 0), and r is the intrinsic growth rate (per capita growth rate) of the population.

Note: The base of the exponential function above is Euler’s number: e ≈ 2.718.

Note: A mathematical function is an input-output machine. For the function P(t) above, t is the input and P is the output.
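Definition 1.1 can be tried out numerically. The following is a minimal sketch, not part of the original notes; the function name `malthusian` and the sample values (500 individuals, 10% growth per time unit) are assumptions chosen only for illustration.

```python
import math

def malthusian(p0, r, t):
    """Population at time t under the Malthusian model P(t) = P0 * e^(r*t)."""
    return p0 * math.exp(r * t)

# Hypothetical values: 500 individuals, per capita growth rate of 10% per time unit.
for t in range(4):
    print(t, round(malthusian(500, 0.10, t), 1))
```

At t = 0 the function returns the initial population, as the definition requires.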

Example 1.3. Exponential (Malthusian) population growth
Suppose for a species of bacteria we have 1000 cells (initially) and the population grows at a per capita rate of 5% per hour. Then the exponential model specific to this species is

$$P(t) = 1000e^{0.05t},$$

where t is measured in hours.

1. Use the model to predict the population after 3 hours.
   $P(3) = 1000e^{0.05 \cdot 3} \approx 1161.8$ cells
2. When will the population reach 2000?
   Solve the following equation for t: $2000 = 1000e^{0.05t}$
   Divide both sides by 1000 to isolate the exponential factor: $2 = e^{0.05t}$
   Use a logarithm to convert the exponential factor into a linear term: $\ln 2 = 0.05t$
   Solve for t: $t = 20 \ln 2 \approx 13.86$ hours

[Figure: graph of the model; Population, P (# of cells) versus time, t. Marked points: (3, 1162) and (13.86, 2000).]
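The logarithm step in Example 1.3 generalizes: solving $P_0 e^{rt} = T$ for t gives $t = \ln(T/P_0)/r$. The sketch below is an illustration only; the helper name `time_to_reach` is an assumption, not something from the notes.

```python
import math

def time_to_reach(p0, r, target):
    """Solve p0 * e^(r*t) = target for t by taking natural logs."""
    return math.log(target / p0) / r

# Example 1.3: 1000 cells growing at 5% per hour, target of 2000 cells.
t_double = time_to_reach(1000, 0.05, 2000)
print(round(t_double, 2))

# Substituting the time back into the model should recover the target.
check = 1000 * math.exp(0.05 * t_double)
print(round(check, 6))
```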

Class Problem 1.1. Exponential (Malthusian) population growth
Suppose for a species of bacteria we have 300 cells (initially) and the population grows at a per capita rate of 8% per hour. Then the exponential model specific to this species is

____________ ,

where t is measured in hours.

1. Use the model to predict the population after 1 day.
2. When will the population reach 1000?
3. Sketch a graph of the model.


Example 1.4. Human population growth The number of humans on Earth in 1900 was 1.65 billion. The world population in 1980 was 4.4 billion.

[Figure: graph of the Malthusian model; Population, P (billions) versus time, t (year), from 1900 to 2050. Marked points: (1980, 4.402) and (2017, 6.89). Actual population in 2017 = 7.6 billion.]

1. Using the Malthusian model, what is our per capita growth rate?
   Solution: r = 0.0123, i.e., 1.23% per year.
2. What does the model predict for the population in 2017? Compute the relative error, given that the actual population was 7.6 billion in 2017.
   Note: $\text{relative error} = \dfrac{\text{actual value} - \text{predicted value}}{\text{actual value}}$
   Solution: P(117) = 6.9 billion

   $\text{relative error} = \dfrac{7.6 - 6.9}{7.6} = 0.092 = 9.2\%$
3. What does the model predict for our 2080 population?
   Solution: P(180) = 15.1 billion
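The growth rate in Example 1.4 follows from solving $4.4 = 1.65e^{80r}$ for r. The following sketch reproduces that step and the relative error calculation; the function names `fit_rate` and `relative_error` are assumptions introduced for illustration. Because the code keeps the unrounded r, its digits differ slightly from the rounded values in the solution (about 6.9 billion and an error near 9%).

```python
import math

def fit_rate(p0, p1, dt):
    """Per capita rate r such that p0 * e^(r*dt) = p1."""
    return math.log(p1 / p0) / dt

def relative_error(actual, predicted):
    """(actual - predicted) / actual, as defined in the notes."""
    return (actual - predicted) / actual

# Data from Example 1.4: 1.65 billion in 1900, 4.4 billion in 1980.
r = fit_rate(1.65, 4.4, 80)
p_2017 = 1.65 * math.exp(r * 117)   # 2017 is 117 years after 1900
err = relative_error(7.6, p_2017)

print(round(r, 4))
print(round(p_2017, 2), round(err, 3))
```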


Class Problem 1.2. Human population growth
The number of humans on Earth in 1950 was 2.5 billion. The world population in 2000 was 6.1 billion.

1. Using the Malthusian model, what is our per capita growth rate?
2. What does the model predict for the population in 2019? Compute the relative error, given that the actual population is 7.7 billion in 2019.
   Note: $\text{relative error} = \dfrac{\text{actual value} - \text{predicted value}}{\text{actual value}}$
3. What does the model predict for our 2050 population?
4. Sketch a graph of the model.

* Yeast Population Growth Lab: https://digitalcommons.usu.edu/lemb/3/ *

Baker’s yeast: Saccharomyces cerevisiae (used in fermentation to make bread or alcohol)

* Baker’s Yeast: https://www.youtube.com/watch?v=iyWtp_L0Kzc

Fermentation: sugar (glucose) ⇒ alcohol (ethanol) and CO2

$$\mathrm{C_6H_{12}O_6} \Rightarrow 2\,\mathrm{CH_3CH_2OH} + 2\,\mathrm{CO_2}$$

Assumptions about unlimited growth are not valid since sugar (yeast food) is not replenished (sugar eventually runs out). Furthermore, the flask has a finite size and could not accommodate an exponentially increasing population of yeast.

1.2 The Logistic Growth Model

The Malthusian model assumes there are no limitations on population growth. For most realistic populations, however, growth is constrained by the finiteness of resources, e.g., food, water, space, etc. The logistic growth model accounts for such constraints by incorporating a carrying capacity: the maximum population that a particular environment can support. Unlike the Malthusian model, which assumes that the per capita growth rate is constant (regardless of population size), the logistic model allows for a per capita growth rate that varies with the size of the population. In particular, the per capita growth rate decreases as the population increases, and approaches zero as the population approaches the carrying capacity.

Definition 1.2. For a population whose growth is constrained by its environment, the number of individuals in the population at a given point in time can be predicted by the Logistic Growth Model,

$$P(t) = \frac{C}{1 + Ae^{-rt}},$$

where C is the carrying capacity of the environment, r is the intrinsic growth rate, and $A = \dfrac{C - P_0}{P_0}$.
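A short numerical sketch can make the two limiting behaviours of Definition 1.2 concrete: the model starts at $P_0$ and levels off at C. The function name `logistic` and the sample values (5 individuals, capacity 50, rate 0.3) are assumptions for illustration, not data from the notes.

```python
import math

def logistic(p0, c, r, t):
    """Population at time t under the logistic model C / (1 + A*e^(-r*t))."""
    a = (c - p0) / p0          # A = (C - P0) / P0
    return c / (1 + a * math.exp(-r * t))

# Hypothetical values: P0 = 5, C = 50, r = 0.3.
for t in (0, 5, 10, 20, 40):
    print(t, round(logistic(5, 50, 0.3, t), 2))
```

The printed values rise from the initial population toward the carrying capacity without exceeding it.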

Example 1.5. Logistic yeast population growth
Suppose a yeast population (with 2 billion cells initially) in a flask has a carrying capacity of 100 billion and intrinsic growth rate of 10% per hour.

1. Then the logistic growth model is

$$P(t) = \frac{100}{1 + 49e^{-0.1t}}.$$

Note that $A = \dfrac{C - P_0}{P_0} = \dfrac{100 - 2}{2} = 49$.

[Figure: graph of the logistic model; Population, P (# of cells in billions) versus time, t (hours), from 0 to 120 hours.]


2. What does the model predict for the population after 50 hrs?
Solution:

$$P(50) = \frac{100}{1 + 49e^{-0.1 \cdot 50}} \approx 75$$

3. When does the population reach half its carrying capacity?
Solution: Solve for t in:

$$P(t) = \frac{100}{1 + 49e^{-0.1t}} = \frac{1}{2} \cdot 100$$

$$\frac{100}{1 + 49e^{-0.1t}} = \frac{50}{1}$$

Cross multiply: $50\left(1 + 49e^{-0.1t}\right) = 100$
Divide by 50: $1 + 49e^{-0.1t} = 2$
Subtract 1, then divide by 49: $e^{-0.1t} = \dfrac{1}{49}$
Take natural log on both sides: $\ln\left(e^{-0.1t}\right) = \ln\left(\dfrac{1}{49}\right)$
Simplify: $-0.1t \approx -3.892$
Divide by −0.1: t = 38.92 ≈ 39 hrs.
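The algebra in part 3 works for any target population below the carrying capacity: isolating the exponential in $P = C/(1 + Ae^{-rt})$ gives $t = -\ln\big((C/P - 1)/A\big)/r$. The sketch below applies that rearrangement; the helper name `logistic_time_to_reach` is an assumption for illustration.

```python
import math

def logistic_time_to_reach(p0, c, r, target):
    """Solve C / (1 + A*e^(-r*t)) = target for t (requires p0 < target < c)."""
    a = (c - p0) / p0
    return -math.log((c / target - 1) / a) / r

# Example 1.5: P0 = 2, C = 100, r = 0.1, target = half the carrying capacity.
t_half = logistic_time_to_reach(2, 100, 0.1, 50)
print(round(t_half, 2))
```

For a target of exactly C/2 the expression reduces to ln(A)/r, which matches the hand calculation above.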

Class Problem 1.3. Logistic yeast population growth
Suppose a yeast population (with 1 billion cells initially) in a flask has a carrying capacity of 10 billion and intrinsic growth rate of 20% per hour.

1. Then the logistic growth model is ...
2. What does the model predict for the population after 2 days?
3. When does the population reach half its carrying capacity?
4. Sketch a graph of the model.


* Yeast Population Growth Lab * Construct a logistic growth model of our yeast population growth using CO2 level as a proxy variable (since CO2 level is proportional to yeast population size).

Theorem 1.1. Carrying Capacity Formula
Suppose a population grows logistically with initial population $P_0$ and intrinsic growth rate r. If the population, P, is known at some time, t, then the carrying capacity can be calculated by the following formula:

$$C = \frac{P_0 P \left(e^{-rt} - 1\right)}{P e^{-rt} - P_0}.$$

Example 1.6. Logistic human population growth
The number of humans on Earth in 1900 was 1.65 billion. The world population in 1980 was 4.4 billion. Assume the intrinsic growth rate is 0.015.

1. Predict the carrying capacity of the Earth.
Solution: Using Theorem 1.1,

$$C = \frac{1.65 \cdot 4.4 \cdot \left(e^{-0.015 \cdot 80} - 1\right)}{4.4e^{-0.015 \cdot 80} - 1.65} \approx 15.6 \text{ billion people.}$$

2. What does the model predict for the population in 2019? Compute the relative error, given that the actual 2019 population is 7.7 billion.
Solution: Using the given information and the carrying capacity calculated above, the logistic model is

$$P(t) = \frac{15.6}{1 + 8.45e^{-0.015t}}.$$

Our predicted population for 2019 is

$$P(119) = \frac{15.6}{1 + 8.45e^{-0.015 \cdot 119}} \approx 6.5 \text{ billion.}$$

Now, the relative error in our prediction is

$$\text{relative error} = \frac{7.7 - 6.5}{7.7} = 0.156 = 15.6\%$$


3. What does the model predict for our 2080 population?
Solution:

$$P(180) = \frac{15.6}{1 + 8.45e^{-0.015 \cdot 180}} \approx 10 \text{ billion.}$$

4. Sketch a graph of the model.

[Figure: graph of the logistic model; Population, P (billions) versus time, t (years since 1900), from 0 to 400. Marked points: (0, 1.650), (80, 4.4), (119, 6.455), and (180, 9.958).]

5. Use the graph to estimate when we reach half of our carrying capacity.
Solution: Note that half the carrying capacity is 7.8 billion. Since P(140) ≈ 7.8, we expect to reach half carrying capacity around the year 2040.
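The carrying capacity calculation in Example 1.6 can be checked numerically. This is a minimal sketch under the example's stated data; the names `carrying_capacity` and `predict` are assumptions introduced here. The code keeps unrounded values for C and A, so its predictions differ slightly from the hand calculations that use C = 15.6 and A = 8.45.

```python
import math

def carrying_capacity(p0, p, r, t):
    """Theorem 1.1: C = P0*P*(e^(-r*t) - 1) / (P*e^(-r*t) - P0)."""
    decay = math.exp(-r * t)
    return p0 * p * (decay - 1) / (p * decay - p0)

# Example 1.6: 1.65 billion in 1900, 4.4 billion in 1980, r = 0.015.
c = carrying_capacity(1.65, 4.4, 0.015, 80)
a = (c - 1.65) / 1.65

def predict(t):
    """Logistic prediction (billions) t years after 1900."""
    return c / (1 + a * math.exp(-0.015 * t))

print(round(c, 1))
# The fitted curve should pass through both data points used to build it.
print(round(predict(0), 2), round(predict(80), 2))
```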


Class Problem 1.4. Logistic human population growth
The number of humans on Earth in 1950 was 2.5 billion. The world population in 2000 was 6.1 billion. Assume the intrinsic growth rate is 0.02.

1. Predict the carrying capacity of the Earth.
2. What does the model predict for our current population? Compute the relative error, given that the actual population is 7.7 billion.
3. What does the model predict for our 2100 population?
4. Sketch a graph of the model.
5. Use the graph to estimate when we reach half of our carrying capacity.

1.3 Exercises

1. Watch The History of the World: Every Year at
   • https://www.youtube.com/watch?v=-6Wu0Q7x5D0
   • The video sets up a historical context and time scale in which to study sustainability, with particular regard to human civilization, discovery, invention, and technology.
   • Write a 1 page summary (12 pt font, double-spaced). Include answers to the following questions:
     (a) What does BCE mean?
     (b) Where and when did humanity (Homo sapiens) begin? (What present day countries?)
     (c) Where and when were the first few advanced civilizations? Include the names of the civilizations.
     (d) What were the first advanced cultures in the U.S.? When did they appear?
2. Suppose for a single-celled organism (e.g., bacteria), a cell divides every minute. Use the basic population growth model (see Example 1.1) to predict the population after 15 minutes.
3. Suppose for a species of bacteria we have 20 cells (initially) and the population grows at a per capita rate of 2% per hour.
   (a) What is the Malthusian (exponential) model specific to this species?
   (b) Use the model to predict the population after 1 day.
   (c) When will the population reach 1000?
4. Suppose a yeast population (with 1 billion cells initially) in a flask has a carrying capacity of 10 billion and intrinsic growth rate of 20% per hour.
   (a) What is the logistic model specific to this species?
   (b) Use the model to predict the population after 2 days.
   (c) When will the population reach 5 billion?

Extra credit: Derive the carrying capacity formula from Theorem 1.1.

Extra credit: Watch OVERPOPULATED - BBC Documentary at
   • https://www.youtube.com/watch?v=VUTP93qWV7I
   • Write a 1 page summary (12 pt font, double-spaced).

2 Set Theory and Renewable Energy

Topics: Definition of a set; Cardinality; Intersection and Union; Venn diagrams; Set membership; Subsets of the real numbers; Complements; Inclusion-Exclusion Principle

Figure 2.1: Wind and solar farm

Renewable energy is energy that is collected from renewable resources, which are naturally replenished on a human timescale, such as sunlight, wind, rain, tides, waves, and geothermal heat. In international public opinion surveys there is strong support for promoting renewable sources such as solar power and wind power (Fig. 2.2). Iceland and Norway generate all their electricity using renewable energy already. In Denmark, the government decided to switch the total energy supply (electricity, mobility and heating/cooling) to 100% renewable energy by 2050.

For a description of each energy source, see https://www.dropbox.com/s/w9yyrlgos14hj90/Energy_Sources.pdf?dl=0

Figure 2.2: Global public support for energy sources

2.1 Sets

Definition 2.1. A set is a collection of objects.

Example 2.1. Breakdown of 2015 Electricity Generation by Energy Source:
• www.tsp-data-portal.org/

Let U represent the set of the world’s electrical energy sources:
U = {oil, coal, gas, nuclear, hydro, wind, solar, tidal, geothermal, biomass}
Let A represent a subset of Afghanistan’s energy sources:
A = {oil, hydro}
Let G represent a subset of Argentina’s energy sources:
G = {oil, gas, hydro, nuclear}
Let N represent a subset of the Netherlands’ energy sources:
N = {coal, gas, nuclear, wind, biomass}

Terminology: The set U is called the universe if it contains all elements in the context of a given situation. For example, set U in Example 2.1 is the set of all the world’s electrical energy sources.

Notation: If object x is an element of set B, we write x ∈ B. If x is not an element of set B, we write x ∉ B.

Definition 2.2. The number of elements in a set, A, is called the cardinality of A, denoted by n(A).

Example 2.2.
1. Refer to Example 2.1.
   (a) biomass ∈ N but hydro ∉ N.
   (b) n(U) = 10 and n(G) = 4.
   (c) If X is the set of all renewable energy sources, then wind ∈ X but coal ∉ X.

2. Let ℕ denote the set of natural numbers (positive whole numbers), i.e., ℕ = {1, 2, 3, 4, 5, ...}. Let ℚ denote the set of rational numbers. A number is rational if it can be written as a fraction of two integers. Then 5/3 ∈ ℚ but 5/3 ∉ ℕ.

Definition 2.3. Set B is a subset of A if every element in B is also in A.
Notation: If set B is a subset of set A, we write B ⊆ A.

Notation: The empty set (the set containing no elements) is denoted by ∅, i.e., ∅ = {}.

Definition 2.4. The complement of set S is the set of all elements in the universe that are not in S.
Notation: The complement of set S is denoted by S′.

Theorem 2.1. The Complement Rule:

n(A′) = n(U) − n(A)

Example 2.3.
1. Refer to Example 2.1.
   (a) A ⊆ G. Also, A, G, N ⊆ U since U is the universe.
   (b) The complement of G is G′ = {coal, wind, solar, tidal, geothermal, biomass}.
   (c) Find n(G′) using the complement rule.
       Solution: n(G′) = n(U) − n(G) = 10 − 4 = 6.
2. ℕ ⊆ ℚ (the natural numbers are a subset of the rational numbers) since every whole number can be written as a fraction (itself over 1), e.g., 5 = 5/1.
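The membership, cardinality, subset, and complement statements of Examples 2.2 and 2.3 map directly onto Python's built-in `set` type. The sketch below is an illustration only; the variable name `G_complement` is an assumption, and the complement is computed as a set difference from the universe.

```python
# Sets from Example 2.1.
U = {"oil", "coal", "gas", "nuclear", "hydro", "wind",
     "solar", "tidal", "geothermal", "biomass"}
A = {"oil", "hydro"}
G = {"oil", "gas", "hydro", "nuclear"}
N = {"coal", "gas", "nuclear", "wind", "biomass"}

print("biomass" in N, "hydro" in N)   # membership: biomass is in N, hydro is not
print(len(U), len(G))                 # cardinalities n(U) and n(G)
print(A <= G)                         # subset test: is A a subset of G?

# Complement of G relative to the universe U.
G_complement = U - G
print(sorted(G_complement))

# Complement rule: n(G') = n(U) - n(G).
print(len(G_complement) == len(U) - len(G))
```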

2.2 Set Operations

We now examine the two primary ways of combining sets: intersection and union.

Definition 2.5. The intersection of sets A and B is the set of all elements that are in both A and B.
Notation: The intersection of sets A and B is denoted by A ∩ B.

Example 2.4.
1. Refer to Example 2.1, where we had A = {oil, hydro}, G = {oil, gas, hydro, nuclear}, and N = {coal, gas, nuclear, wind, biomass}.
   (a) Find the intersection of G and N.
       Solution: G ∩ N = {gas, nuclear}
   (b) Find the intersection of A and N.
       Solution: A ∩ N = ∅
   (c) Find the intersection of A and G.
       Solution: A ∩ G = {oil, hydro}
2. Find the intersection of the natural numbers ℕ with the set S = {−2, −1.5, −1, −0.5, 0, 0.5, 1, 1.5, 2, 2.5, 3}.
   Solution: ℕ ∩ S = {1, 2, 3}
3. Let ℤ denote the set of integers (positive and negative whole numbers and zero), i.e., ℤ = {..., −4, −3, −2, −1, 0, 1, 2, 3, 4, ...}. Find ℤ ∩ ℚ, where ℚ is the set of rational numbers.
   Solution: ℤ ∩ ℚ = ℤ, since every integer can be written as itself over 1 (a fraction), and is therefore a rational number. Hence, we can say ℤ ⊆ ℚ.


Definition 2.6. Sets A and B are disjoint if A ∩ B = ∅. Disjoint sets have no elements in common.

Example 2.5. Refer to Example 2.4. Since A ∩ N = ∅, A and N are disjoint.

Definition 2.7. The union of sets A and B is the set of all elements that are in either A or B (or both).
Notation: The union of sets A and B is denoted by A ∪ B.

Example 2.6.
1. Refer to Example 2.4, where A = {oil, hydro}, G = {oil, gas, hydro, nuclear}, and N = {coal, gas, nuclear, wind, biomass}.
   (a) Find A ∪ N.
       Solution: A ∪ N = {oil, coal, gas, hydro, nuclear, wind, biomass}
   (b) Find G ∪ A.
       Solution: G ∪ A = {oil, gas, hydro, nuclear}. Since A ⊆ G, the union is G itself.
   (c) Find G ∪ N.
       Solution: G ∪ N = {oil, coal, gas, hydro, nuclear, wind, biomass}
2. Let 𝕀 denote the set of irrational numbers. Irrational numbers are those that cannot be represented as a fraction, i.e., non-terminating, non-repeating decimal numbers (such as π, e, or √2). Find ℚ ∪ 𝕀, where ℚ is the set of rational numbers.
   Solution: ℚ ∪ 𝕀 = ℝ, where ℝ is the set of real numbers. The set of real numbers includes all positive and negative whole numbers, fractions, repeating and nonrepeating decimals, terminating and nonterminating decimals, square roots of positive numbers, etc. Essentially, the set of real numbers contains all types of numbers (except imaginary numbers!).
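Intersections, unions, and disjointness from Examples 2.4 through 2.6 can be checked with Python's set operators `&` and `|`. This is a small sketch for verification; the result variable names are assumptions introduced for illustration.

```python
# Sets from Example 2.1.
A = {"oil", "hydro"}
G = {"oil", "gas", "hydro", "nuclear"}
N = {"coal", "gas", "nuclear", "wind", "biomass"}

g_and_n = G & N          # intersection: elements in both G and N
a_and_n = A & N          # empty, so A and N are disjoint
a_or_n = A | N           # union: elements in A or N (or both)
g_or_a = G | A           # A is a subset of G, so this is G itself

print(sorted(g_and_n))
print(a_and_n == set(), A.isdisjoint(N))
print(sorted(a_or_n))
print(g_or_a == G)
```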

2.3 Venn Diagrams

A Venn diagram is a way of visualizing sets and set operations.

Figure 2.3: Venn diagram of sets A and B

Figure 2.4: The shaded region represents the intersection of sets A and B, A ∩ B

Figure 2.5: The shaded region represents the union of sets A and B, A ∪ B

Example 2.7. Energy sources by state: • https://www.washingtonpost.com/graphics/national/power-plants/


30 US states get some electrical energy from nuclear power and 41 states get some energy from wind power. Kentucky is the only state that has no nuclear power and no wind power.

[Venn diagram: circles N (nuclear) and W (wind) inside a rectangle labeled 50; 8 in N only, 22 in the overlap, 19 in W only, and 1 outside both circles.]

Note that the total number of states is indicated by the 50 at the top of the rectangle which represents the universe set (the United States). I.e.,
n(U) = 50
The total number of states inside the circle for nuclear power is 30. I.e.,
n(N) = 30
Similarly, the total in circle W is 41. I.e.,
n(W) = 41
The number of states that do not get any power from nuclear plants is 20 (the sum of numbers outside circle N). I.e.,
n(N′) = 19 + 1 = 20
Similarly, the number of states that do not get any power from wind farms is 9 (the sum of numbers outside circle W). I.e.,
n(W′) = 8 + 1 = 9
We see that 22 states get some electrical energy from both nuclear power plants and wind farms. I.e.,
n(N ∩ W) = 22
Furthermore, 49 states get some electrical energy from either nuclear power plants or wind farms. I.e.,
n(N ∪ W) = 8 + 19 + 22 = 49
Notice that
n(N ∪ W) = n(N) + n(W) − n(N ∩ W) = 30 + 41 − 22 = 49
is an alternative way of finding the number of elements in the union of the two sets. The number 1 outside of either circle is for Kentucky, which gets no energy from either nuclear or wind sources. I.e.,
n((N ∪ W)′) = 1
Note that
n((N ∪ W)′) = n(U) − n(N ∪ W) = 50 − 49 = 1

Alternatively, we can represent this energy source information using percents: 60% of US states get some energy from nuclear power and 82% of states get some energy from wind power. Venn diagram:


Suppose we would like to find the number of elements in the union of sets A and B (the set of objects in either A or B). It may be tempting to simply add the elements of each set to find n(A ∪ B). However, in n(A) + n(B), some elements are counted twice (e.g., the 22 states that get some energy from both nuclear and wind power). The elements that were counted twice are exactly those that belong to A (one count) and also belong to B (the second count), i.e., the elements of A ∩ B. To obtain the correct number of elements in the union we have to subtract from n(A) + n(B) the number of elements that get counted twice, i.e., the intersection n(A ∩ B). This is known as the Inclusion-Exclusion Principle. Theorem 2.2. The Inclusion-Exclusion Principle:

n(A ∪ B) = n(A) + n(B) − n(A ∩ B)

Example 2.8. Yemen gets its electrical energy from a total of 3 sources. Uruguay gets its electrical energy from a total of 5 sources. The two countries together get electrical energy from a total of 6 sources. How many sources do they have in common?

Solution: Let Y denote the set of Yemen’s electrical energy sources. Let U denote the set of Uruguay’s electrical energy sources. The first two sentences of the problem statement yield n(Y) = 3 and n(U) = 5. The third sentence tells us that n(Y ∪ U) = 6. We must find n(Y ∩ U). Substituting the known quantities into the Inclusion-Exclusion Principle we have,

n(Y ∪ U) = n(Y) + n(U) − n(Y ∩ U)
6 = 3 + 5 − n(Y ∩ U).

Thus, n(Y ∩ U) = 2.
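The Inclusion-Exclusion Principle can be rearranged to give either the union size or the intersection size. The sketch below checks both the US states count and Example 2.8; the function names `union_size` and `intersection_size` are assumptions introduced for illustration.

```python
def union_size(n_a, n_b, n_both):
    """n(A ∪ B) = n(A) + n(B) - n(A ∩ B)."""
    return n_a + n_b - n_both

def intersection_size(n_a, n_b, n_union):
    """Rearranged form: n(A ∩ B) = n(A) + n(B) - n(A ∪ B)."""
    return n_a + n_b - n_union

# Nuclear and wind states: n(N) = 30, n(W) = 41, n(N ∩ W) = 22.
print(union_size(30, 41, 22))

# Example 2.8: n(Y) = 3, n(U) = 5, n(Y ∪ U) = 6.
print(intersection_size(3, 5, 6))
```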

Example 2.9. Refer to Example 2.8. Assuming there are 10 world electrical energy sources total, construct a complete Venn diagram of Yemen and Uruguay electrical energy sources.

Solution: The intersection, n(Y ∩ U) = 2, is expressed in the Venn diagram as the number in the overlap of the two circles representing Yemen and Uruguay’s electrical energy sources. To find the number of sources that Yemen uses but Uruguay does not, we subtract the sources they have in common from Yemen’s total:

n(Y ∩ U′) = n(Y) − n(Y ∩ U) = 3 − 2 = 1.

Thus, we place a 1 in the region inside of circle Y but outside of circle U. In a similar fashion we see that

n(U ∩ Y′) = n(U) − n(U ∩ Y) = 5 − 2 = 3.

Let W denote the set of world energy sources. To find the number of sources that neither country uses, we subtract the number in their union from the total:

n(W) − n(U ∪ Y) = 10 − 6 = 4.

Thus, we place a 4 in the region outside of both circles.


Example 2.10. The Real Number System Construct a Venn diagram of the following subsets of Real numbers: N, Z, Q, I, R.

The set of real numbers ℝ (represented by the rectangle) is partitioned into two sets: the irrational numbers 𝕀 and the rational numbers ℚ. Notice that the circles representing these two sets do not overlap. This indicates that they are disjoint, i.e., ℚ ∩ 𝕀 = ∅ (no number can be both rational and irrational). Furthermore, there are no real numbers outside of both sets, indicating that the union of the rational with irrational numbers is the entire set of real numbers, i.e., ℚ ∪ 𝕀 = ℝ. A circle contained entirely inside of another circle in a Venn diagram indicates a subset relationship (the inner circle is a subset of the outer). For example, the circle representing the integers ℤ is inside the circle representing the rational numbers. Hence, ℤ ⊆ ℚ. Similarly, ℕ ⊆ ℤ for the natural numbers ℕ.


Class Problem 2.1. Let U represent the set of world energy sources:
U = {oil, coal, gas, nuclear, hydro, wind, solar, tidal, geothermal, biomass/waste}

1. Let A represent the set of USA’s energy sources. Find A (ignore the “Others” category).
2. Let B represent the set of Colombia’s energy sources. Find B (ignore the “Others” category).
3. True or False: wind ∉ A
4. True or False: coal ∈ B
5. Find n(A) and n(B).
6. Find A ∩ B and n(A ∩ B).
7. Find B ∪ A and n(B ∪ A).
8. Use the inclusion-exclusion principle to verify your answer in (7).
9. Are sets A and B disjoint? Explain.
10. Find A′ and n(A′).
11. Find n(A′) using the complement rule to verify your answer from (10).
12. Find a subset of A.
13. Is B ∩ A a subset of A? Is B ∩ A a subset of B? Is this always the case? Why or why not?
14. Geothermal Electricity Generation
    [Figure: map of geothermal electricity generation by state.]
    Given that 41 states get some energy from wind power,
    (a) Use the geothermal map (above) to determine the number of states that generate electricity (or plan to) from geothermal power.
    (b) Draw a Venn diagram with a set containing states that get electricity from wind and a set containing states that get electricity from geothermal sources. Use https://www.washingtonpost.com/graphics/national/power-plants/ to aid in constructing the Venn diagram.
    (c) Verify that the Inclusion-Exclusion Principle is true in this situation.
15. The real number system (ℕ natural numbers, ℤ integers, ℚ rational numbers, 𝕀 irrational numbers, ℝ real numbers).
    (a) Is ℕ ⊆ ℤ?
    (b) Is ℚ ⊆ ℤ?
    (c) Is ℤ ⊆ ℝ?
    (d) Is ℤ ⊆ 𝕀?
    (e) Is 0 ∈ ℤ?
    (f) Is −15.2 ∈ ℤ?
    (g) Is π ∈ ℤ?
    (h) Is π ∈ ℚ?
    (i) Is 0.25 ∈ ℚ?
    (j) Is $0.\bar{3}$ ∈ ℚ?
    (k) Is −15.2 ∈ ℚ?

2.4 Exercises

1. Let U be the set of all renewable (green) electrical energy sources, U = {hydro, tidal, wave, wind, solar, geothermal, biomass}. Let H be the set of water-based (renewable) energy sources, H = {hydro, tidal, wave}. Let T be the set of sources that drive a turbine to generate electricity, T = {hydro, wind, tidal, wave, geothermal}. Let E be the set of heat-based sources, E = {geothermal}.
   (a) True or False: hydro ∈ H
   (b) True or False: wave ∉ T
   (c) True or False: U ⊆ T
   (d) Which sets are subsets of other sets? Use appropriate notation in your answer.
   (e) What is the universe set and why?
   (f) Find H ∩ T.
   (g) Find the union of H and T.
   (h) Find the intersection of E and H.
   (i) Find E ∪ T.
   (j) What is the cardinality of the universe?
   (k) Find n(T).
   (l) Find n(E ∩ T).
2. The real number system (ℕ natural numbers, ℤ integers, ℚ rational numbers, 𝕀 irrational numbers, ℝ real numbers).
   (a) Is ℕ ⊆ ℤ?
   (b) Is ℚ ⊆ ℤ?
   (c) Is ℤ ⊆ ℝ?
   (d) Is ℤ ⊆ 𝕀?
   (e) Is 0 ∈ ℤ?
   (f) Is −15.2 ∈ ℤ?
   (g) Is π ∈ ℤ?
   (h) Is π ∈ ℚ?
   (i) Is 0.25 ∈ ℚ?
   (j) Is $0.\bar{3}$ ∈ ℚ?
   (k) Is −15.2 ∈ ℚ?


3. * Energy sources by state:
   • https://www.washingtonpost.com/graphics/national/power-plants/
   Use the website above to determine the number of states that get some energy from solar power. 41 states get some energy from wind power, and 35 states get energy from both sources. Verify these numbers using the website. Draw a Venn diagram of this situation. How many states get no energy from solar or wind?

3 Probability and Organic Agriculture

Topics: Experiments & Sample space; Events; Probability (and properties); Complement rule; Independence; Inclusion-Exclusion Principle; Random variables; Expectation

The goal of sustainable agriculture is to meet society’s food and textile needs in the present without compromising the ability of future generations to meet their own needs. Practitioners of sustainable agriculture seek to integrate three main objectives into their work: a healthy environment, economic profitability, and social and economic equity. Every person involved in the food system—growers, food processors, distributors, retailers, consumers, and waste managers—can play a role in ensuring a sustainable agricultural system.

3.1 Probability Preliminaries

Definition 3.1. An experiment (or trial) is any procedure that can be repeated indefinitely and has a well-defined set of possible outcomes, known as the sample space. An experiment is said to be a random process if the result cannot be determined with certainty. Examples of random processes include flipping a coin (perhaps multiple times), administering a poll on voting tendencies, seed germination, and precipitation on a given day.

Example 3.1. (Seed germination rate)
* Seed germination video: https://www.youtube.com/watch?v=E__rbDzNOZI
The minimum germination rate for chives as set by the U.S. Federal Seed Act is 50%. In other words, if you plant 100 chive seeds (with 50% germination rate), you can expect about 50 to actually sprout. Suppose you plant 1 seed and make note of whether or not the seed germinates (sprouts). Let G denote the outcome that the seed germinates, and N the outcome that the seed does not germinate. Then the sample space of this experiment is S = {G, N}. Note that the cardinality (size) of the sample space is 2, i.e., n(S) = 2. Also note that these two outcomes are equally likely (since the germination rate is 50%).

Example 3.2. Suppose you plant 3 chives seeds (50% germination rate) in a row and make note of which seeds germinate. Draw a tree diagram of this experiment to determine the sample space. Solution:

Tree diagram: germination of three seeds. From the tree diagram we see that the sample space of this experiment is S = {GGG, GGN, GNG, GNN, NGG, NGN, NNG, NNN}. Note that n(S) = 8 and all 8 of these outcomes are equally likely (since the germination rate is 50%).
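The tree diagram of Example 3.2 can also be traced out by a short program. The following is a minimal sketch, not part of the original notes; the function name sample_space is an illustrative choice. It lists every sequence of G (germinates) and N (does not germinate) for a given number of seeds.

```python
from itertools import product

def sample_space(num_seeds):
    """Return every germination outcome for num_seeds seeds as a string of G/N."""
    # product("GN", repeat=k) walks the branches of the tree diagram in order.
    return ["".join(outcome) for outcome in product("GN", repeat=num_seeds)]

space = sample_space(3)
print(space)       # the outcomes of Example 3.2, from GGG to NNN
print(len(space))  # the cardinality n(S)
```

Calling sample_space with a different number of seeds gives a quick way to check a hand-drawn tree diagram.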

Class Problem 3.1. 1. Suppose you plant 2 chives seeds in a row and make note of which seeds germinate. Draw a tree diagram of this experiment to determine the sample space, and the cardinality of the sample space. 2. Suppose you plant 4 chives seeds in a row and make note of which seeds germinate. Draw a tree diagram of this experiment to determine the sample space, and the cardinality of the sample space. 3. Suppose you plant N chives seeds in a row and make note of which seeds germinate. Find a relationship (i.e., a formula) between the number of seeds planted and the cardinality of the sample space. 4. Find the total number of possible outcomes if you plant 10 chives seeds.

Definition 3.2. An event is a set of outcomes from an experiment. I.e., an event, E, is a subset of the sample space, S, of an experiment; E ⊆ S.

Example 3.3. Suppose you plant 3 chives seeds. Write out the following events and find their corresponding cardinalities. (Use the tree diagram from Example 3.2.) Solution:
1. Let E1 be the event that only 1 seed germinates. Then E1 = {GNN, NGN, NNG} and n(E1) = 3.
2. Let E2 be the event that 2 seeds germinate. Then E2 = {GGN, GNG, NGG} and n(E2) = 3.
3. Let E3 be the event that at least one seed germinates. Then E3 = {GNN, NGN, NNG, GGN, GNG, NGG, GGG} and n(E3) = 7.
4. Let E4 be the event that all seeds germinate. Then E4 = {GGG} and n(E4) = 1.


* Germination rate of spinach * Students plant seeds in class. Observe the number that germinate. Calculate germination rate.

Definition 3.3. (Probability with equally likely outcomes) In an experiment with sample space S of equally likely outcomes, the probability of event E, denoted by P(E), is given by

P(E) = n(E) / n(S).

In other words, the probability of an event is the number of ways the event can occur, divided by the total number of possible outcomes.

Example 3.4. Find the probabilities of the events defined in Example 3.3 (plant 3 chives seeds). Solution:
1. Let E1 be the event that only 1 seed germinates (E1 = {GNN, NGN, NNG}). Then P(E1) = n(E1)/n(S) = 3/8 = 37.5%.
2. Let E2 be the event that 2 seeds germinate (E2 = {GGN, GNG, NGG}). Then P(E2) = n(E2)/n(S) = 3/8 = 37.5%.
3. Let E3 be the event that at least one seed germinates (E3 = {GNN, NGN, NNG, GGN, GNG, NGG, GGG}). Then P(E3) = n(E3)/n(S) = 7/8 = 87.5%.
4. Let E4 be the event that all seeds germinate (E4 = {GGG}). Then P(E4) = n(E4)/n(S) = 1/8 = 12.5%.
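The counting in Examples 3.3 and 3.4 can be checked mechanically. The sketch below is illustrative only (the helper name prob is an assumption, not notation from the text): it builds each event as a list of outcomes and applies P(E) = n(E)/n(S), which is valid here because all 8 outcomes are equally likely.

```python
from fractions import Fraction
from itertools import product

# Sample space for 3 chive seeds (50% germination rate, equally likely outcomes).
S = ["".join(o) for o in product("GN", repeat=3)]

def prob(event):
    """P(E) = n(E) / n(S); only valid when all outcomes are equally likely."""
    return Fraction(len(event), len(S))

E1 = [o for o in S if o.count("G") == 1]  # only 1 seed germinates
E2 = [o for o in S if o.count("G") == 2]  # 2 seeds germinate
E3 = [o for o in S if o.count("G") >= 1]  # at least one seed germinates
E4 = [o for o in S if o.count("G") == 3]  # all seeds germinate

for name, event in [("E1", E1), ("E2", E2), ("E3", E3), ("E4", E4)]:
    print(name, event, prob(event), float(prob(event)))
```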

Example 3.5. (Crop pests: alfalfa stem nematodes) The alfalfa stem nematode (ASN) is a plant parasite (worm) that can dramatically reduce harvest yields, raising considerable concern among alfalfa producers. ASN attacks and reproduces only inside alfalfa plants. Due to flood irrigation and harvesting tools employed for cutting the plants, nematodes get dispersed evenly over relatively large distances throughout a field. Thus, nematodes that leave an infested host plant have an equal chance of infesting any other plant in the field.

Alfalfa stem nematode (ASN)

Alfalfa field infested by ASN

Suppose you have a small organic farm with 10 rows of alfalfa, each with 20 (uninfested) plants. Also suppose a single ASN worm is introduced into your field (through runoff irrigation from other infested fields, for example). Solution:
1. What is the probability the nematode infests the 5th plant in the 2nd row? First note that the sample space contains 200 outcomes since any one of the plants can be infested by the single ASN worm. Let A denote the event that the nematode infests the 5th plant in the 2nd row. Then, since all plants have an equal probability of being infested (equally likely outcomes), P(A) = n(A)/n(S) = 1/200 = 0.5%.
2. What is the probability it infests the 18th plant in the 6th row? Let B denote the event that the nematode infests the 18th plant in the 6th row. Then, P(B) = n(B)/n(S) = 1/200 = 0.5%.
3. What is the probability it infests a plant in the 9th row? Let E denote the event that the nematode infests a plant in the 9th row. Since there are 20 plants in the 9th row, there are 20 different ways for event E to occur. Thus, n(E) = 20 and we have P(E) = n(E)/n(S) = 20/200 = 10%.
4. What is the probability it infests the 4th, 5th, or 6th plant in any of the rows? Let F denote the event that the nematode infests the 4th, 5th, or 6th plant in any of the rows. Since there are 10 rows, and 3 plants in question (4th, 5th, or 6th), there are 30 different ways for event F to occur. Thus, n(F) = 30 and we have P(F) = n(F)/n(S) = 30/200 = 15%.
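The field in Example 3.5 can be modelled as a grid of (row, plant) pairs, one pair per possible outcome. The sketch below is an illustration under that assumption; the names ROWS, PLANTS_PER_ROW, and prob are hypothetical.

```python
from fractions import Fraction

ROWS, PLANTS_PER_ROW = 10, 20

# Each outcome is a (row, plant) pair naming the plant the single worm infests.
S = [(r, p) for r in range(1, ROWS + 1) for p in range(1, PLANTS_PER_ROW + 1)]

def prob(condition):
    """P(E) = n(E) / n(S), where E holds the outcomes satisfying condition."""
    favorable = [outcome for outcome in S if condition(*outcome)]
    return Fraction(len(favorable), len(S))

p_A = prob(lambda r, p: (r, p) == (2, 5))  # 5th plant in the 2nd row
p_E = prob(lambda r, p: r == 9)            # any plant in the 9th row
p_F = prob(lambda r, p: p in (4, 5, 6))    # 4th, 5th, or 6th plant of any row

print(len(S), p_A, p_E, p_F)
```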


Class Problem 3.2. Suppose you plant 4 chives seeds and make note of which seeds germinate. Find the probabilities of the following events: 1. Let E0 be the event that no seeds germinate. 2. Let E1 be the event that only 1 seed germinates. 3. Let E2 be the event that 2 seeds germinate. 4. Let E3 be the event that 3 seeds germinate. 5. Let E4 be the event that all seeds germinate. 6. Let E5 be the event that an odd number of seeds germinate. 7. Let E6 be the event that at least one seed germinates.

3.2 General Probability Theory

Definition 3.4. The probability of any event E is given by, P(E) = sum of probabilities of the outcomes in E.

Example 3.6. Suppose you plant 3 chives seeds (50% germination rate). Let E be the event that an even number of seeds germinates. Find P(E). Solution: Note that E = {GGN, GNG, NGG, NNN}. Since the germination rate is 50%, all outcomes are equally likely with probability 1/8 each (see Example 3.2). Hence,

P(E) = P(GGN) + P(GNG) + P(NGG) + P(NNN) = 1/8 + 1/8 + 1/8 + 1/8 = 4/8 = 1/2.

Definition 3.5. The complement of event E, denoted by E′, is the set of all outcomes not in E. I.e., the complement of E is the event that E does not happen.

Example 3.7. Suppose you plant 3 chives seeds. Let E be the event that an even number of seeds germinate and O the event that an odd number of seeds germinate. Then events E and O are complements of each other.

Definition 3.6. (Complement rule) For any event E, P(E) + P(E′) = 1. I.e., P(E′) = 1 − P(E).

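Example 3.8. Suppose you plant 3 chives seeds. Let E be the event that an even number of seeds sprout and O the event that an odd number of seeds sprout. Show that E and O are complements of each other and check the complement rule. Solution: Every outcome has either an even or an odd number of sprouted seeds, never both, so O consists of exactly the outcomes that are not in E. That is, O is the complement of E. As a check, P(E) = 0.5 (from Example 3.6) and, similarly, P(O) = 0.5, so P(E) + P(O) = 1, in agreement with the complement rule.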


Example 3.9. Suppose the weather report for your area today claims that the chance of rain is 30%, hail is 10%, and snow is 40%. What is the sample space for this experiment? Find the probability that you get some kind of precipitation. What is the chance of no precipitation? Solution: Sample space: S = {rain, hail, snow, no precip}. Treating these four outcomes as the only possibilities (so that no two kinds of precipitation occur together), the probability of getting some kind of precipitation is P(some precip) = P(rain) + P(hail) + P(snow) = 0.3 + 0.1 + 0.4 = 80%. The chance of no precipitation is P(no precip) = 1 − P(some precip) = 1 − 0.8 = 0.2 = 20%.

Example 3.10.
1. Suppose you plant 3 chives seeds. Use the complement rule to find the probability that at least one seed sprouts. Solution: P(at least one) = 1 − P(not at least one) = 1 − P(none) = 1 − P(NNN) = 1 − 1/8 = 7/8.
2. Suppose you plant 4 chives seeds. Use the complement rule to find the probability that at least one seed sprouts. Solution: P(at least one) = 1 − P(not at least one) = 1 − P(none) = 1 − P(NNNN) = 1 − 1/16 = 15/16. Note: Construct a tree diagram to see that P(NNNN) = 1/16.

Theorem 3.1. Properties of Probabilities:
1. The sum of the probabilities of all outcomes of an experiment is 1.
2. For any event E, P(E) is a real number between 0 and 1.
3. If P(E) = 0, then event E is impossible (cannot/will not occur).
4. If P(E) = 1, the event E is certain to occur (absolutely will happen).
5. P(∅) = 0.
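The complement-rule calculation of Example 3.10 follows the same pattern for any number of chive seeds. The following sketch is an illustration, not part of the original notes; it assumes a 50% germination rate, so that all outcomes are equally likely and exactly one of them (all N) has no germination. The function name p_at_least_one is hypothetical.

```python
from fractions import Fraction

def p_at_least_one(num_seeds):
    """P(at least one seed sprouts) = 1 - P(none), for equally likely outcomes."""
    n_S = 2 ** num_seeds        # each seed doubles the number of branches in the tree
    p_none = Fraction(1, n_S)   # only the all-N outcome has no germination
    return 1 - p_none

print(p_at_least_one(3))  # 7/8
print(p_at_least_one(4))  # 15/16
```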


Example 3.11. Verify Property 1 for 1. Example 3.1 (Plant 1 chives seed.) Solution: P(G) + P(N ) = 0.5 + 0.5 = 1. 2. The case where we plant 2 chives seeds. Solution: First note that the sample space is S = {GG, GN, N G, N N }. Hence, P(GG) + P(GN ) + P(N G) + P(N N ) = 0.25 + 0.25 + 0.25 + 0.25 = 1. 3. Example 3.2 (Plant 3 chives seeds.)


We now turn our attention to experiments and random phenomena which do not have equally likely outcomes. Example 3.12. Asparagus has a germination rate of 70%. Suppose you plant 1 asparagus seed. Let G denote the outcome that the seed germinates, and N the outcome that the seed does not germinate. Then the sample space of this experiment is S = {G, N }. Note that these two outcomes are not equally likely since P(G) = 0.7 and P(N ) = 0.3.

Definition 3.7. Two events A and B are independent if the probability of one is not affected by the occurrence of the other. I.e., they are unrelated (in terms of probability).

Example 3.13. Suppose you plant two asparagus seeds (labelled seed 1 and seed 2) in a row. The probability that one seed germinates is 70% regardless of whether or not the other seed germinated. Thus, the events A = seed 1 germinates and B = seed 2 germinates are independent.

Theorem 3.2. If A and B are independent events, then P(A ∩ B) = P(A)P(B).

If events A and B are independent, the probability that A occurs AND B occurs is the product of the individual probabilities.

Example 3.14. Suppose you plant two asparagus seeds (labelled seed 1 and seed 2) in a row. Let A = seed 1 germinates and B = seed 2 germinates. What is the probability that both seeds germinate? Solution: P(both germinate) = P(seed 1 germinates AND seed 2 germinates) = P(A ∩ B) = P(A)P(B) = 0.7 · 0.7 = 0.49.

Example 3.15. Suppose you plant 3 asparagus seeds. Find the probabilities of all outcomes in the sample space and verify probability property 1. Solution: To calculate the probability that the first seed germinates but the last two do not, for instance, we consider that
P(GNN) = P(seed 1 sprouts AND seed 2 does not sprout AND seed 3 does not sprout)
= P(seed 1 sprouts ∩ seed 2 does not sprout ∩ seed 3 does not sprout)
= P(seed 1 sprouts) · P(seed 2 does not sprout) · P(seed 3 does not sprout)
= 0.7 · 0.3 · 0.3 = 0.063.
Hence, we multiply down the branches to get probabilities for outcomes. E.g., P(GNG) = 0.7 · 0.3 · 0.7 = 0.147.

Tree diagram: germination of three asparagus seeds. Each branch (Seed 1, Seed 2, Seed 3) is labelled 0.7 (germinates) or 0.3 (does not germinate), and the outcome probabilities at the ends of the branches are P(GGG) = 0.343, P(GGN) = P(GNG) = P(NGG) = 0.147, P(GNN) = P(NGN) = P(NNG) = 0.063, and P(NNN) = 0.027.

From the tree diagram we see that the sample space of this experiment is S = {GGG, GGN, GNG, GNN, NGG, NGN, NNG, NNN}. Note that outcomes are not equally likely.
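Multiplying down the branches, as in Example 3.15, is easy to automate. The sketch below is illustrative (the names branch and outcome_prob are assumptions); it relies on the independence of the seeds (Theorem 3.2) and checks property 1 of Theorem 3.1, i.e., that the outcome probabilities sum to 1.

```python
from itertools import product
from math import isclose, prod

P_G = 0.7  # germination rate of asparagus (Example 3.12)
branch = {"G": P_G, "N": 1 - P_G}

# Multiply the branch probabilities along each path of the tree diagram.
outcome_prob = {
    "".join(o): prod(branch[s] for s in o)
    for o in product("GN", repeat=3)
}

for outcome, p in outcome_prob.items():
    print(outcome, round(p, 3))

total = sum(outcome_prob.values())
print(isclose(total, 1.0))  # property 1: the probabilities sum to 1
```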

Definition 3.8. Two events A and B are disjoint if they have no outcomes in common, i.e., cannot occur simultaneously.

Example 3.16. Suppose you plant 3 asparagus seeds. Let A be the event that more than one seed germinates, B the event that an even number of seeds germinate, and C the event that no more than one germinates. Identify any disjoint events. Solution: Note that
A = {2, 3}
B = {0, 2}
C = {0, 1}
where the numbers in the events indicate the quantity of seeds that germinate. Since A and C have no outcomes in common, they are disjoint.

Theorem 3.3. If A and B are disjoint events, then 1. P(A ∩ B) = 0, and 2. P(A ∪ B) = P(A) + P(B).

If events A and B are disjoint, the probability that A occurs OR B occurs is the sum of the individual probabilities.

Example 3.17. Suppose you plant 3 asparagus seeds. Let A be the event that more than one seed germinates, B the event that an even number of seeds germinate, and C the event that no more than one germinates. Find the probability of the intersection and union of any disjoint events. Solution: From Example 3.16 we know that A and C are disjoint. Now P(A ∩ C) = P({}) = 0 since A and C have no outcomes in common and, hence, the intersection is empty. (Note that P({}) = 0 by property 5 of Theorem 3.1.) Event A can be written as A = {GGN, GNG, NGG, GGG} and has probability P(A) = P(GGN) + P(GNG) + P(NGG) + P(GGG) = 0.147 + 0.147 + 0.147 + 0.343 = 0.784 (see Example 3.15). Similarly, C = {NNN, GNN, NGN, NNG} and P(C) = P(NNN) + P(GNN) + P(NGN) + P(NNG) = 0.027 + 0.063 + 0.063 + 0.063 = 0.216. Hence, P(A ∪ C) = P(A) + P(C) = 0.784 + 0.216 = 1.

Theorem 3.4. (Inclusion-Exclusion Principle) If A and B are any two events, then P(A ∪ B) = P(A) + P(B) − P(A ∩ B).


If A and B are not disjoint and we attempt to calculate the probability of the union using Theorem 3.3 (part 2), we would mistakenly count outcomes that are in both events (i.e., the intersection) twice. Hence, we must subtract the probability of the intersection to accurately compute the probability of the union.

Example 3.18. Suppose you plant 3 asparagus seeds. Let A be the event that more than one seed germinates and B the event that an even number of seeds germinate (as in Example 3.16). Find P(A ∪ B). Solution: Referring to Examples 3.16 and 3.17, we have P(A ∩ B) = P(2 seeds germinate) and P(2 seeds germinate) = P(GGN) + P(GNG) + P(NGG) = 0.147 + 0.147 + 0.147 = 0.441. Furthermore, P(A) = 0.784 and P(B) = P(no seeds germinate) + P(2 seeds germinate) = 0.027 + 0.441 = 0.468. Thus, P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 0.784 + 0.468 − 0.441 = 0.811. That is, the probability that more than one seed germinates or an even number of seeds germinate is 0.811.
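The Inclusion-Exclusion Principle can be checked numerically by computing both sides independently. The sketch below is an illustration, not the author's method: it takes A to be the event that more than one seed germinates and B the event that an even number germinate (the events of Example 3.16), represents them as Python sets of outcomes, and sums outcome probabilities as in Definition 3.4. The helper name P is hypothetical.

```python
from itertools import product
from math import prod

branch = {"G": 0.7, "N": 0.3}  # asparagus: 70% germination rate
S = ["".join(o) for o in product("GN", repeat=3)]

def P(event):
    """Sum the probabilities of the outcomes in an event (Definition 3.4)."""
    return sum(prod(branch[s] for s in o) for o in event)

A = {o for o in S if o.count("G") > 1}       # more than one seed germinates
B = {o for o in S if o.count("G") % 2 == 0}  # an even number of seeds germinate

lhs = P(A | B)                  # probability of the union, computed directly
rhs = P(A) + P(B) - P(A & B)    # Inclusion-Exclusion Principle
print(round(lhs, 3), round(rhs, 3))
```

Both printed values should agree; dropping the − P(A & B) term shows the double counting described above.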

Example 3.19. Suppose you plant 2 asparagus seeds. Define the events A = seed 1 germinates and B = seed 2 germinates. What is the probability that at least one seed germinates? Draw a Venn diagram to represent the sample space. What is the probability that only one seed germinates? Compare the results with a tree diagram of this situation. Solution: First note that P(A ∩ B) = P(A)P(B) = 0.7 · 0.7 = 0.49 by Theorem 3.2 (since the two seeds are independent of each other in terms of germination). Also, P(A ∩ B′) = P(A) − P(A ∩ B) = 0.7 − 0.49 = 0.21. Similarly, P(A′ ∩ B) = 0.21. The event that at least one seed germinates is equivalent to the event that seed 1 germinates or seed 2 germinates, i.e., A ∪ B. Now, by the Inclusion-Exclusion Principle, P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 0.7 + 0.7 − 0.49 = 0.91. Furthermore, P(A′ ∩ B′) = 1 − P(A ∪ B) = 1 − 0.91 = 0.09 and hence, we have the following Venn diagram:


Venn diagram: germination of 2 asparagus seeds. Note that we can also calculate the union as P(A ∪ B) = 0.21 + 0.49 + 0.21 = 0.91 using the Venn diagram. Finally, using the Venn diagram we see that the probability that only one seed germinates is 0.21 + 0.21 = 0.42. Alternatively, we can calculate the probability that only one seed germinates using a tree diagram as P(GN or N G) = P(GN ) + P(N G) = 0.21 + 0.21 = 0.42.


Tree diagram: germination of 2 asparagus seeds.

Example 3.20. Germination rates of select seeds: http://www.webgrower.com/information/seed_germ_standards.html Suppose you plant an artichoke seed, a beet seed, and a broccoli seed in a row (germination rates of 60%, 65%, and 75%, respectively). Draw a tree diagram to help answer the following: 1. What is the probability that all seeds germinate? 2. What is the probability that at least 1 seed germinates? 3. What is the probability that only 1 seed germinates? 4. What is the probability that exactly 2 seeds germinate? Solution:


Tree diagram
1. P(all germinate) = P(GGG) = 0.6 · 0.65 · 0.75 = 0.2925.
2. P(at least one) = 1 − P(none) = 1 − P(NNN) = 1 − 0.0350 = 0.965.
3. P(only one) = P(GNN) + P(NGN) + P(NNG) = 0.0525 + 0.0650 + 0.1050 = 0.2225.
4. P(exactly two) = P(GGN) + P(GNG) + P(NGG) = 0.0975 + 0.1575 + 0.1950 = 0.45.
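When each seed has its own germination rate, the tree-diagram calculation of Example 3.20 can be organised by the number of seeds that germinate. The sketch below is illustrative; it uses the rates that appear in the solution (0.6, 0.65, 0.75), assumes the seeds germinate independently, and the names rates, outcome_probability, and by_count are hypothetical.

```python
from itertools import product

# Germination rates used in the solution of Example 3.20, in planting order.
rates = {"artichoke": 0.60, "beet": 0.65, "broccoli": 0.75}

def outcome_probability(outcome):
    """Multiply down the branches: rate for G, (1 - rate) for N."""
    p = 1.0
    for result, rate in zip(outcome, rates.values()):
        p *= rate if result == "G" else 1 - rate
    return p

# Group the 8 outcomes by how many seeds germinate.
by_count = {k: 0.0 for k in range(4)}
for outcome in product("GN", repeat=3):
    by_count[outcome.count("G")] += outcome_probability(outcome)

print("all germinate:", round(by_count[3], 4))
print("at least one: ", round(1 - by_count[0], 4))
print("only one:     ", round(by_count[1], 4))
print("exactly two:  ", round(by_count[2], 4))
```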

Example 3.21. * Germination rates of select seeds: http://www.webgrower.com/information/seed_germ_standards.html Suppose you plant a field with 5 rows of artichokes each with 10 plants and another field with 2 rows of beets each with 10 plants. 1. What is the probability that all seeds germinate? 2. What is the probability that at least 1 seed germinates? 3. What is the probability that all artichoke seeds germinate or all beet seeds germinate?

Solution:
1. Let A be the event that all artichokes germinate and B be the event that all beets germinate. Since there are 50 artichoke plants, the probability that all of them germinate is P(A) = P(1st plant germinates AND 2nd plant germinates AND ... AND 50th plant germinates) = 0.6 · 0.6 · ... · 0.6 = (0.6)^50. Similarly, the probability that all beets germinate is P(B) = (0.65)^20. Now, P(all germinate) = P(A ∩ B) = P(A)P(B) = (0.6)^50 · (0.65)^20 ≈ 1.5 × 10^−15.
2. The probability that at least 1 seed germinates can be calculated using the complement rule: P(at least one germinates) = 1 − P(NOT at least one germinates) = 1 − P(none germinate) = 1 − P(all artichokes AND all beets fail to germinate) = 1 − [(0.4)^50 · (0.35)^20] ≈ 1. (The event that none germinate requires every seed to fail. It is not the same as the event that not all artichokes germinate and not all beets germinate.)
3. We use the Inclusion-Exclusion Principle to determine the probability that all artichokes OR all beets germinate: P(all artichokes OR all beets germinate) = P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = (0.6)^50 + (0.65)^20 − (0.6)^50 · (0.65)^20 ≈ 0.0002.
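The very small and very large probabilities in Example 3.21 are tedious by hand but immediate in code. The following sketch is illustrative and assumes, as the example does, that all seeds germinate independently with rates 0.6 (artichoke) and 0.65 (beet); the variable names are hypothetical.

```python
p_A = 0.6 ** 50   # all 50 artichoke seeds germinate
p_B = 0.65 ** 20  # all 20 beet seeds germinate

p_all = p_A * p_B                                # independence: P(A ∩ B) = P(A)P(B)
p_at_least_one = 1 - (0.4 ** 50) * (0.35 ** 20)  # complement rule: 1 - P(none germinate)
p_A_or_B = p_A + p_B - p_A * p_B                 # Inclusion-Exclusion Principle

print(f"P(all germinate)      = {p_all:.2e}")
print(f"P(at least one)       = {p_at_least_one}")
print(f"P(all art. or all bt) = {p_A_or_B:.4f}")
```

Note that P(none germinate) is so small that 1 minus it is indistinguishable from 1 in ordinary floating-point arithmetic, which matches the "≈ 1" in the solution.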


Class Problem 3.3.
1. Suppose you plant 5 chives seeds and make note of which seeds germinate. Find the likelihood that at least one seed germinates (using the complement rule).
2. Use http://www.webgrower.com/information/seed_germ_standards.html to get the germination rate of Swiss chard. Suppose you plant 2 Swiss chard seeds, numbered 1 and 2. Let A be the event that seed 1 germinates and B the event that seed 2 germinates. (a) Find P(A ∩ B). (b) Find P(A ∪ B) using the Inclusion-Exclusion Principle. (c) Find P(A ∩ B′). (d) Find P(A′ ∩ B). (e) Draw a completely labelled Venn diagram of this scenario. (f) Use the Venn diagram to find P(A ∩ B′), P(A′ ∩ B), and P((A ∪ B)′).
3. Suppose you plant 3 Swiss chard seeds. (a) Draw a tree diagram of the sample space and verify properties 1 and 2 of the probability properties (Theorem 3.1). (b) Let E be the event that the number of seeds that sprout is a prime number. Let F be the event that no seeds sprout.
i. Are E and F disjoint? Justify your answer!
ii. Are E and F independent? Justify your answer!
iii. Find P(E ∩ F).
iv. Find P(E ∪ F).
4. Suppose you plant a cauliflower seed, a watercress seed, and a moringa seed (90% germination rate) in a row. Draw a tree diagram to help answer the following: (a) What is the probability that all seeds germinate? (b) What is the probability that at least 1 seed germinates? (c) What is the probability that only 1 seed germinates? (d) What is the probability that exactly 2 seeds germinate?
5. Suppose you plant a field with 6 rows of cauliflower each with 8 plants and another field with 3 rows of carrots each with 10 plants. (a) What is the probability that all seeds germinate? (b) What is the probability that at least 1 seed germinates? (c) What is the probability that all cauliflower seeds germinate or all carrot seeds germinate?

3.3 Random Variables

Example 3.22. Suppose you plant 4 chive seeds (50% germination rate). Let X denote the number of seeds that actually germinate. What are the possible values of X? Solution: X = 0, 1, 2, 3, or 4.

The variable X in Example 3.22 is called a random variable since its value cannot be determined with certainty. Here X counts the number of seeds that germinate; more generally, a random variable attaches a number to each outcome of an experiment.

Definition 3.9. A random variable on a sample space S is an assignment of real numbers to the elements of S.

Definition 3.10. The probability density function (pdf) of a random variable X assigns probabilities to values of X.

Example 3.23. Suppose you plant 4 chive seeds. Let X denote the number of seeds that germinate. Find the pdf of X and graph it. Hint: use a tree diagram. Solution: P(0) = 1/16, P(1) = 4/16, P(2) = 6/16, P(3) = 4/16, P(4) = 1/16.

Or, we could represent the pdf in the form of a table:

X      P(X)
0      1/16
1      4/16
2      6/16
3      4/16
4      1/16

We can also represent a pdf as a graph:

Graph of the pdf: X (from 0 to 4) on the horizontal axis and P(X) (from 0 to 0.5) on the vertical axis.
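The pdf in Example 3.23 comes from counting how many of the 16 equally likely outcomes give each value of X. The sketch below is an illustration of that counting (the names outcomes, counts, and pdf are assumptions); note that Python's Fraction prints values in lowest terms, so 4/16 appears as 1/4 and 6/16 as 3/8.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 16 equally likely outcomes for 4 chive seeds (50% germination rate).
outcomes = list(product("GN", repeat=4))

# X = number of seeds that germinate; count the outcomes giving each value of X.
counts = Counter(o.count("G") for o in outcomes)
pdf = {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

for x, p in pdf.items():
    print(x, p)

print(sum(pdf.values()))  # the probabilities in a pdf sum to 1
```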

Example 3.24. Suppose you plant 4 chive seeds. Let X denote the number of seeds that germinate. Suppose you paid $5 for the 4 seeds. If more than half sprout, you can sell them for $25. Let Y denote your profit. Find the pdf of Y. Solution:

Y      P(Y)
−5     11/16
20     5/16

* Simulate Example 3.24 with coin flipping! *
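The coin-flipping simulation suggested above can also be run on a computer. The sketch below is one possible version, not a prescribed procedure: each "coin flip" is a random number below 0.5, a harvest is 4 flips, and the fixed seed 0 and the trial count of 100,000 are arbitrary choices made so that the run is repeatable.

```python
import random

def simulate_profit(rng):
    """One harvest: flip 4 fair coins (heads = the seed germinates)."""
    sprouted = sum(rng.random() < 0.5 for _ in range(4))
    # More than half sprout: $25 sale minus $5 cost; otherwise lose the $5.
    return 20 if sprouted > 2 else -5

rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 100_000
profits = [simulate_profit(rng) for _ in range(trials)]

share_of_sales = profits.count(20) / trials
average_profit = sum(profits) / trials
print(share_of_sales)   # should be close to P(Y = 20) = 5/16 = 0.3125
print(average_profit)   # long-run average profit per harvest
```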

Example 3.25. Suppose you plant 2 asparagus seeds with 70% germination rate. Let X be the number of seeds that germinate. Find the pdf of X. Solution:

X      P(X)
0      0.09
1      0.42
2      0.49

Example 3.26. Suppose you plant 2 asparagus seeds with 70% germination rate. Suppose further that a single seed costs you $1 and you can sell the plants for $2 each. Let Y be your profit and find the pdf of Y. Solution:

Y      P(Y)
−2     0.09
0      0.42
2      0.49

Definition 3.11. Let X be a random variable with pdf given by

X      P(x)
x₁     p₁
x₂     p₂
⋮      ⋮
xₙ     pₙ

The expected value (or mean) of X, denoted by E(X), is defined as

E(X) = x₁p₁ + x₂p₂ + ⋯ + xₙpₙ.

Note: The expected value of a random variable X gives the long-term average value of X. I.e., E(X) is the value you can expect X to be.


Example 3.27.
1. Suppose you plant 4 chive seeds. Let X denote the number of seeds that actually germinate. Find and interpret E(X). Solution: E(X) = 0 · 1/16 + 1 · 4/16 + 2 · 6/16 + 3 · 4/16 + 4 · 1/16 = 2. In the long run we can expect 2 of the four seeds to sprout on average. Note here that E(X) = number of seeds × germination rate = 4 · 0.5 = 2.
2. Suppose you plant 4 chive seeds which cost you $5 total. If more than half sprout, you can sell them for $25. Let Y denote your profit. Find and interpret E(Y). Solution: E(Y) = −5 · 11/16 + 20 · 5/16 = 45/16 ≈ $2.81. We can expect about $2.81 profit (per harvest of 4 chives seeds) on average.
3. Suppose you plant 2 asparagus seeds. Let X be the number of seeds that germinate. Find and interpret E(X). Solution: E(X) = 0 · 0.09 + 1 · 0.42 + 2 · 0.49 = 1.4. In the long run we can expect 1.4 of the 2 seeds to sprout on average. Note here that E(X) = number of seeds × germination rate = 2 · 0.7 = 1.4.
4. Suppose you plant 2 asparagus seeds. Suppose further that a single seed costs you $1 and you can sell the plants for $2 each. Let Y be your profit. Find your expected (long-term per harvest average) profit. Solution: E(Y) = −2 · 0.09 + 0 · 0.42 + 2 · 0.49 = $0.80.
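The expected value in Definition 3.11 is a single weighted sum, so one small function covers all four parts of Example 3.27. The sketch below is illustrative: it stores each pdf as a Python dictionary mapping values to probabilities, and the function and variable names are hypothetical.

```python
def expected_value(pdf):
    """E(X) = x1*p1 + x2*p2 + ... + xn*pn for a pdf given as {value: probability}."""
    return sum(x * p for x, p in pdf.items())

chive_count = {0: 1/16, 1: 4/16, 2: 6/16, 3: 4/16, 4: 1/16}  # Example 3.23
chive_profit = {-5: 11/16, 20: 5/16}                          # Example 3.24
asparagus_count = {0: 0.09, 1: 0.42, 2: 0.49}                 # Example 3.25
asparagus_profit = {-2: 0.09, 0: 0.42, 2: 0.49}               # Example 3.26

for name, pdf in [("chive count", chive_count), ("chive profit", chive_profit),
                  ("asparagus count", asparagus_count),
                  ("asparagus profit", asparagus_profit)]:
    print(name, round(expected_value(pdf), 4))
```

For the chive profit the sum is 45/16 = 2.8125 dollars per harvest.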

Example 3.28. Suppose you plant 2 asparagus seeds. Assume each seed costs you $1. Suppose you need to make $1 (profit) per harvest by charging $p per plant. Let Y be your profit and find the pdf of Y . Use the expected value of Y (long-term average per harvest profit) to find the appropriate (minimum) price to charge. Solution: We first construct the pdf for Y :

Y          P(Y)
−2         0.09
p − 2      0.42
2p − 2     0.49

Now set the expected profit equal to $1,
E(Y) = −2 · 0.09 + (p − 2) · 0.42 + (2p − 2) · 0.49 = 1,
and solve for the price p:
−0.18 + 0.42p − 0.84 + 0.98p − 0.98 = 1
1.4p − 2 = 1
p ≈ 2.143.
To ensure that you get at least $1 profit, round this price up to $2.15 per plant. Notice that the penultimate equation above is 1.4p − 2 = 1, where 1.4 is the expected number of germinated seeds (E(X) = 1.4 from Example 3.27(3)) and $2 is the total cost for the two seeds. Hence, we have total revenue minus total cost equals required profit.

Example 3.29. Suppose you plant N seeds with germination rate g.
1. Find the expected number of seeds that will germinate. Solution: E(X) = Ng
2. Find your expected revenue if you charge $p per plant. Solution: Revenue = E(X)p = Ngp
3. Find your total cost assuming each seed costs $c. Solution: Cost = Nc
4. Find your expected profit. Solution: Profit = Revenue − Cost = Ngp − Nc
5. If you plant 1000 broccoli seeds (75% germination rate) and each seed cost $0.10, how much should you charge per plant in order to make $950.00 profit on the harvest? Solution: Note that N = 1000, g = 0.75, c = 0.10. From the profit equation in #4, we have
950 = 1000 · 0.75 · p − 1000 · 0.10
950 = 750p − 100
p = $1.40
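Solving the profit equation Profit = Ngp − Nc for the price gives p = (Profit + Nc) / (Ng). The sketch below wraps that rearrangement in a function; it is an illustration under the same assumptions as Example 3.29, and the function and parameter names are hypothetical.

```python
def required_price(num_seeds, germination_rate, seed_cost, target_profit):
    """Solve target_profit = N*g*p - N*c for the price p per plant."""
    expected_plants = num_seeds * germination_rate  # E(X) = N g
    total_cost = num_seeds * seed_cost              # Cost = N c
    return (target_profit + total_cost) / expected_plants

# Example 3.29(5): 1000 broccoli seeds, 75% germination, $0.10 per seed, $950 profit.
print(round(required_price(1000, 0.75, 0.10, 950), 2))  # 1.4

# Example 3.28: 2 asparagus seeds, 70% germination, $1 per seed, $1 profit.
print(round(required_price(2, 0.7, 1.00, 1), 3))        # 2.143
```

The result is a break-even price for the target profit on average; rounding up, as in Example 3.28, keeps the expected profit at or above the target.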

* Analyze the paper: Managing the spread of alfalfa stem nematodes (Ditylenchus dipsaci): The relationship between crop rotation periods and pest reemergence https://www.dropbox.com/s/3jmqq0f90w0wy24/ASN.pdf?dl=0

* Visit Sustainable/Organic Farm *


Class Problem 3.4. 1. Suppose you plant 5 chive seeds (50% germination rate). Let X denote the number of seeds that actually germinate. (a) Find the pdf of X (in table form). (b) Graph the pdf. (c) Calculate the expected value of X. 2. Suppose you paid $10 for the 5 seeds. If more than half sprout, you can sell them for $30. Let Y denote your profit. (a) Find the pdf of Y . (b) What is your mean profit (per harvest)? 3. Suppose you plant 2 cucumber seeds with 80% germination rate. Let X be the number of seeds that germinate. (a) Find the pdf of X. (b) How many seeds can you expect to sprout (per harvest) in the long run (over many growing seasons)? 4. Suppose you plant 2 cucumber seeds with 80% germination rate. Suppose further that a single seed costs you $1.75 and you can sell the plants for $3.00 each. What can you expect for your per harvest profit? 5. Suppose you plant 2 cucumber seeds with 80% germination rate. Suppose further that a single seed costs you $1.75 and that you need to make $2.50 (profit) per harvest by charging $p per plant. Find the appropriate (minimum) price to charge. 6. If you plant 20,000 cucumber seeds (80% germination rate) and each seed cost $0.15, how much should you charge for each plant in order to make $5,000.00 profit on the harvest?

3.4 Exercises

1. Refer to Examples 3.1 and 3.2. Suppose you plant 4 chive seeds in a row and make note of which seeds germinate. Draw a tree diagram of this experiment to determine the sample space and the cardinality of the sample space.
2. Suppose for a particular plant species, a seed either germinates, waits to germinate until the next season, or dies. Assume all these outcomes are equally likely.
(a) Suppose you plant 1 seed.
i. Determine the sample space (and its cardinality) of this experiment using G for germinates, N for waits to germinate until the next season, and D for dies.
ii. Find the probabilities for all outcomes.
(b) Suppose you plant 2 seeds.
i. Draw a tree diagram and determine the sample space (and its cardinality) of this experiment.
ii. Find the probabilities for all outcomes.
iii. What is the probability of the event that both seeds germinate?
iv. What is the probability that at least one seed germinates?
(c) Suppose you plant 3 seeds.
i. What is the cardinality of the sample space?
ii. What is the probability that all seeds germinate?
(d) Suppose you plant m seeds.
i. What is the cardinality of the sample space?
ii. What is the probability that all seeds germinate?
3. * Germination rates of select seeds: http://www.webgrower.com/information/seed_germ_standards.html Suppose you plant in a row a carrot seed, a beet seed, and a cucumber seed. (a) What is the probability that all seeds germinate? (b) What is the probability that at least 1 seed germinates? (c) What is the probability that only 1 seed germinates? (d) What is the probability that exactly 2 seeds germinate?
4. * Germination rates of select seeds: http://www.webgrower.com/information/seed_germ_standards.html Suppose you plant a field with 3 rows of carrots each with 10 plants and another field with 2 rows of beets each with 20 plants. (a) What is the probability that all seeds germinate? (b) What is the probability that at least 1 seed germinates? (c) What is the probability that all carrot seeds germinate or all beet seeds germinate?
5. Refer to Example 3.5. Suppose your field has 5 rows each with 15 alfalfa plants and a single ASN worm is introduced into your field.


(a) Describe the sample space, S, and find n(S). (b) If O is any outcome in S, find P(O). (c) What is the probability the nematode infests the 15th plant in the 2nd row? (d) What is the probability it infests a plant in an even numbered row? (e) What is the probability it infests an odd numbered plant in any of the rows? (f) What is the probability it infests a plant in row 4, or, the 10th plant of any row?
6. Refer to Example 3.5. Suppose your field has 5 rows each with 15 alfalfa plants and two ASN worms are introduced into your field. Note that it is possible for two worms to infest the same plant. Furthermore, the two worms do not communicate or influence each other’s behavior in any way. (a) What can you say about the relationship between events A = worm 1 infests plant x and B = worm 2 infests plant y? (b) What is the likelihood that both worms infest the first plant in the first row? (c) Let C represent the event that both worms infest the same plant (anywhere in the field). Find P(C). (d) Let D represent the event that both worms infest the same plant in row 1. Find P(D).
7. Suppose you plant 3 chive seeds (50% germination rate). Let X denote the number of seeds that actually germinate. (a) Find the pdf of X (in table form). (b) Graph the pdf. (c) Calculate the expected value of X.
8. Suppose you paid $6 for the 3 seeds. If more than half sprout, you can sell them for $20. Let Y denote your profit. (a) Find the pdf of Y. (b) What is your mean profit (per harvest)?
9. Suppose you plant 2 cucumber seeds with 80% germination rate. Let X be the number of seeds that germinate. (a) Find the pdf of X. (b) How many seeds can you expect to sprout (per harvest) in the long run (over many growing seasons)?


10. Suppose you plant 2 cucumber seeds with 80% germination rate. Suppose further that a single seed costs you $1.50 and you can sell the plants for $2.50 each. What can you expect for your per harvest profit? 11. Suppose you plant 2 cucumber seeds with 80% germination rate. Suppose further that a single seed costs you $1.50 and that you need to make $1.25 (profit) per harvest by charging $p per plant. Find the appropriate (minimum) price to charge. 12. If you plant 5,000 eggplant seeds (60% germination rate) and each seed cost $0.05, how much should you charge for each plant in order to make $3,200.00 profit on the harvest?

4

4

STATISTICS AND CLIMATE CHANGE

64

Statistics and Climate Change

Topics:
• Data sets
• Measures of center
• Measures of spread
• Regression and Correlation
• The Normal distribution
• The Central Limit Theorem
• Confidence intervals
• Hypothesis testing

Climate change, also referred to as global warming, is the observed century-scale rise in the average temperature of the Earth’s climate system and its related effects. Multiple lines of scientific evidence show that the climate system is warming. Many of the observed changes since the 1950s are unprecedented in the instrumental temperature record which extends back to the mid-19th century, and in paleoclimate proxy records covering thousands of years. In 2013, the Intergovernmental Panel on Climate Change (IPCC) Fifth Assessment Report concluded that “It is extremely likely that human influence has been the dominant cause of the observed warming since the mid-20th century.” The largest human influence has been the emission of greenhouse gases such as carbon dioxide, methane and nitrous oxide. Climate model projections summarized in the report indicated that during the 21st century, the global surface temperature is likely to rise a further 0.3 to 1.7 °C (0.5 to 3.1 °F) in the lowest emissions scenario, and 2.6 to 4.8 °C (4.7 to 8.6 °F) in the highest emissions scenario. These findings have been recognized by the national science academies of the major industrialized nations and are not disputed by any scientific body of national or international standing.


The greenhouse effect is the process by which radiation from a planet’s atmosphere warms the planet’s surface to a temperature above what it would be without its atmosphere. If a planet’s atmosphere contains radiatively active gases (i.e., greenhouse gases) they will radiate energy in all directions. Part of this radiation is directed towards the surface, warming it. Earth’s natural greenhouse effect is critical to supporting life. Human activities, mainly the burning of fossil fuels and clearing of forests, have strengthened the greenhouse effect and caused global warming.

• Changes in Earth’s Surface Temperature Distribution: https://gfycat.com/silkycorruptconch-climate-change-global-warming-rsciences
• Global surface temperature - heat map animation: https://commons.wikimedia.org/wiki/File:1880-_Global_surface_temperature_-_heat_map_animation_-_NASA_SVS.webm
• The Sound of a Changing Climate: https://www.nelsonguda.com/project/threshold/
• A Song of Our Warming Planet: https://www.youtube.com/watch?v=5t08CLczdK4

4.1 Descriptive Statistics

Temperature readings from local weather stations.

“We are a way for the cosmos to know itself.” – Carl Sagan, Cosmos (1980)

Knowledge has been a human pursuit for at least as long as recorded history. We, as a species, appear to have a strong desire to understand what has been, what is and what can be. But how do we generate that understanding? How do we know if what we believe is correct? One way of answering those questions begins with observation. It is through observation that we collect information about reality.

So that we have a robust understanding, we generally want a lot of accurate information. That means taking many observations with a reasonable amount of precision. Once we have recorded the information gathered from many careful observations, we have to be able to deduce something from it. We want to somehow build an understanding from the information we gathered. As you’ll see in this section, it can feel overwhelming to look at raw data. Therefore, we will discuss how we can summarize large amounts of information so that we might understand the bigger picture.

Statistics is the science of collecting, analyzing, and drawing conclusions from data.

Definition 4.1. The entire collection of individuals or objects about which information is desired is called the population of interest. A sample is a subset of the population, selected for study in some prescribed manner.

Definition 4.2. A simple random sample (SRS) of size n consists of n individuals from the population chosen in such a way that every set of n individuals has an equal chance to be the sample actually selected.
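As a rough illustration of Definition 4.2, a simple random sample can be drawn with Python's standard library. This is only a sketch: the population of 200 station ID numbers and the helper name `simple_random_sample` are assumptions made for illustration and do not come from the text.

```python
import random

# Hypothetical population: ID numbers for 200 weather stations (an assumption
# made purely for illustration; the text does not list a population).
population = list(range(1, 201))

def simple_random_sample(items, n, seed=None):
    """Return n distinct items so that every subset of size n is equally likely."""
    rng = random.Random(seed)    # a fixed seed makes the draw reproducible
    return rng.sample(items, n)  # sampling without replacement

sample = simple_random_sample(population, 30, seed=2018)
print(sorted(sample))
```

Sampling without replacement matters here: it guarantees that no individual appears twice, which is what "a set of n individuals" requires.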


Example 4.1. Temperature Data (South Bend, IN weather stations)
• https://www.wunderground.com/weather/us/in/south-bend

Consider the sample (SRS) of temperatures taken from 30 weather stations in South Bend at 6pm on January 27th, 2018:

45.9  42.6  43.0  43.8  42.2  42.7  44.4  41.4  41.2  41.9
43.5  42.7  42.5  43.3  42.2  42.7  43.0  42.0  44.1  41.7
41.8  42.7  44.1  42.5  43.7  43.4  41.0  40.9  42.7  44.0

1. Describe the population in this scenario. Solution: The set of all possible temperatures in the entire city of South Bend (on that particular day and time).

2. Make a histogram of this data set. Solution: We first construct a frequency table:

Bin (range)    Frequency (count)
[40, 41)       1
[41, 42)       6
[42, 43)       11
[43, 44)       7
[44, 45)       4
[45, 46)       1

In a frequency table, the Bin indicates a range of values (temperatures in this case) and the Frequency is the count of values (temperatures) that fall within that range. Note the use of interval notation wherein [a, b) is the set of all numbers between a and b, including a but not including b. For example, the temperature reading of 43 is counted in bin [43, 44), not in [42, 43). A histogram is a bar graph of a frequency table:

[Figure: histogram of the temperature sample. Horizontal axis: Temperature (39 to 47); vertical axis: Frequency (0 to 12).]

Note that the center appears to be around 43.
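The frequency table behind a histogram can be tallied directly from the data. The sketch below assumes the 30 readings listed in Example 4.1 and bins of width 1 of the form [a, b); the function name `frequency_table` is hypothetical.

```python
import math
from collections import Counter

# The 30 South Bend readings from Example 4.1 (Jan 27th, 6pm)
temps = [45.9, 42.6, 43.0, 43.8, 42.2, 42.7, 44.4, 41.4, 41.2, 41.9,
         43.5, 42.7, 42.5, 43.3, 42.2, 42.7, 43.0, 42.0, 44.1, 41.7,
         41.8, 42.7, 44.1, 42.5, 43.7, 43.4, 41.0, 40.9, 42.7, 44.0]

def frequency_table(data, width=1.0):
    """Count how many values fall in each bin [k*width, (k+1)*width)."""
    counts = Counter(math.floor(x / width) for x in data)
    return {(k * width, (k + 1) * width): counts[k] for k in sorted(counts)}

table = frequency_table(temps)
for (lo, hi), count in table.items():
    print(f"[{lo:g}, {hi:g}): {count}")
```

Using `math.floor` places a value that sits exactly on a boundary (such as 43.0) in the bin that starts at that boundary, which matches the interval notation [a, b).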


In statistics we are interested in understanding the distribution of values in a sample (and hence, of the population from which the sample came). A distribution can be visualized by a histogram (see Example 4.1) or number line graph (see Example 4.2). A distribution can be summarized by measures of center (e.g., mean and median) and spread (e.g., range and standard deviation).

Measures of center: mean and median

Definition 4.3. The mean (or average) of a sample (data set) {x1, x2, ..., xn}, denoted by x̄, is given by

\[ \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}. \]

Definition 4.4. The median, denoted by M , of a (sorted) data set with n values is the middle value if n is odd and the mean of the two middle values if n is even. Note: To find the median, the data set must first be sorted in increasing order.

Example 4.2. Given the data set S = {5.0, 4.2, 10.1, 9.8, 6.5, 6.0, 37.2}, 1. Graph the data. Solution: We can graph a data set on a number line where each value is represented as a point:

2. Find the mean and median. Solution:

Mean:
\[ \bar{x} = \frac{5.0 + 4.2 + 10.1 + 9.8 + 6.5 + 6.0 + 37.2}{7} \approx 11.3 \]

Median: Since the sample size is 7 (n(S) = 7), we just sort the data and take the middle number:
Sorted data: S = {4.2, 5.0, 6.0, 6.5, 9.8, 10.1, 37.2}, M = 6.5


The data point 37.2 is an outlier (it is far from the rest of the values). Note how much larger the mean is than the median. The outlier has a strong effect on the mean, pulling it to the right, compared to the median which only sees another value (37.2) but does not consider what the actual value is.

3. If we remove the outlier, the data set is Ŝ = {4.2, 5.0, 6.0, 6.5, 9.8, 10.1}. The mean of Ŝ is

\[ \bar{x} = \frac{4.2 + 5.0 + 6.0 + 6.5 + 9.8 + 10.1}{6} \approx 6.9 \]

and the median is, since now there are only 6 (an even number) data points,

\[ M = \frac{6.0 + 6.5}{2} = 6.25. \]

Note that without the outlier, the mean and median are much closer. Furthermore, the mean changed a lot more than the median, hence, we say the mean is sensitive to outliers while the median is not.
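The sensitivity of the mean to outliers can be checked numerically. The following sketch uses Python's standard `statistics` module on the data set S from Example 4.2; the variable names are illustrative.

```python
from statistics import mean, median

S = [5.0, 4.2, 10.1, 9.8, 6.5, 6.0, 37.2]
S_trimmed = [x for x in S if x != 37.2]  # the same data with the outlier removed

for label, data in [("with outlier", S), ("without outlier", S_trimmed)]:
    print(f"{label}: mean = {mean(data):.2f}, median = {median(data):.2f}")

# How far each measure of center moves when the outlier is dropped
mean_shift = mean(S) - mean(S_trimmed)
median_shift = median(S) - median(S_trimmed)
print(f"mean shift = {mean_shift:.2f}, median shift = {median_shift:.2f}")
```

Comparing the two shifts makes the point of the example concrete: one extreme value moves the mean by several units while the median barely changes.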

Example 4.3. Refer to Example 4.1. Find the mean and median of the South Bend temperature data.

\[ \bar{x} = \frac{45.9 + 42.6 + 43.0 + 43.8 + \cdots + 42.7 + 44.0}{30} \approx 42.8 \]

To find the median, first order the data. Since the sample size is an even number (30 temps), the median is the average of the two middle numbers:

\[ M = \frac{42.7 + 42.7}{2} = 42.7 \]

Measures of variability (spread): range, standard deviation, and interquartile range


Definition 4.5. The range of an (ordered) data set {xmin, ..., xmax}, denoted by R, is the distance between the smallest and largest values, R = xmax − xmin.

Example 4.4. Find the range of the sample S = {5.0, 4.2, 10.1, 9.8, 6.5, 6.0, 37.2}. Solution: The range is the largest value minus the smallest value: R = 37.2 − 4.2 = 33. Suppose the outlier (37.2) is suspected to be due to some inaccuracy in data collection. If we neglect this value, the range is R = 10.1 − 4.2 = 5.9. Notice that the data is much less spread out (i.e., less variable) without the outlier and thus, the range is much smaller.

Definition 4.6. The standard deviation, denoted by s, of a sample {x1, x2, ..., xn} is given by

\[ s = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n}}. \]

The standard deviation is an approximate measure of the average deviation (of the data) from the mean; it is the square root of the average squared deviation from the mean.

Example 4.5. Find the standard deviation of the sample S = {5.0, 4.2, 10.1, 9.8, 6.5, 6.0, 37.2}. Solution:

\[ s = \sqrt{\frac{(5.0 - 11.3)^2 + (4.2 - 11.3)^2 + (10.1 - 11.3)^2 + (9.8 - 11.3)^2 + (6.5 - 11.3)^2 + (6.0 - 11.3)^2 + (37.2 - 11.3)^2}{7}} \approx 10.8. \]

Suppose the outlier (37.2) is suspected to be due to some inaccuracy in data collection. If we neglect this value, the standard deviation is s = 2.3. Notice that the data is much less spread out (i.e., less variable) without the outlier and thus, the standard deviation is much smaller.

Definition 4.7. The 1st and 3rd quartiles, denoted Q1 and Q3, (along with the median) divide a sample into 4 subsets of equal size (number of values). I.e., Q1 is the median of the left half of the sample, and Q3 is the median of the right half.


The interquartile range (IQR) is the distance between the first and third quartiles, IQR = Q3 − Q1 . Example 4.6. Find the quartiles and interquartile range of the sample S = {5.0, 4.2, 10.1, 9.8, 6.5, 6.0, 37.2}. Solution: First sort the data: S = {4.2, 5.0, 6.0, 6.5, 9.8, 10.1, 37.2}. Then find the median: M = 6.5 (middle value). The first quartile is the median of the lower half of the sample, {4.2, 5.0, 6.0}: Q1 = 5.0, and the third quartile is the median of the upper half of the sample, {9.8, 10.1, 37.2}: Q3 = 10.1. Hence, the interquartile range is IQR = Q3 − Q1 = 10.1 − 5.0 = 5.1. Suppose the outlier (37.2) is suspected to be due to some inaccuracy in data collection. If we neglect this value, the new data set is S = {4.2, 5.0, 6.0, 6.5, 9.8, 10.1} which has median M = 6.25, Q1 = 5.0, and Q3 = 9.8. Thus, IQR = 9.8 − 5.0 = 4.8. Notice that the data is less spread out (i.e., less variable) without the outlier and thus, the interquartile range is smaller.
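The range, standard deviation, and quartiles for the sample S can be reproduced with a short script. This is a sketch under two assumptions taken from the definitions above: the standard deviation divides by n (Definition 4.6), and the quartiles are medians of the lower and upper halves, leaving out the middle value when n is odd (Definition 4.7). Function names are illustrative, and other software may use different quartile conventions that give slightly different values.

```python
from math import sqrt
from statistics import median

def data_range(data):
    """Largest value minus smallest value."""
    return max(data) - min(data)

def std_dev(data):
    """Square root of the average squared deviation from the mean (divides by n)."""
    n = len(data)
    xbar = sum(data) / n
    return sqrt(sum((x - xbar) ** 2 for x in data) / n)

def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves of the sorted data."""
    s = sorted(data)
    half = len(s) // 2  # for odd n the middle value belongs to neither half
    return median(s[:half]), median(s[-half:])

S = [5.0, 4.2, 10.1, 9.8, 6.5, 6.0, 37.2]
S_trimmed = [x for x in S if x != 37.2]  # outlier removed

for label, data in [("with outlier", S), ("without outlier", S_trimmed)]:
    q1, q3 = quartiles(data)
    print(f"{label}: R = {data_range(data):.1f}, s = {std_dev(data):.1f}, "
          f"Q1 = {q1}, Q3 = {q3}, IQR = {q3 - q1:.1f}")
```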

Notice that the standard deviation changed much more than the IQR upon removal of the outlier. We say that the standard deviation is sensitive to outliers while the IQR is not. This is because s depends on the actual values in the sample, including the mean which is sensitive to outliers, and the quartiles are really just medians which are not sensitive to outliers.

Example 4.7. Refer to Example 4.1. Find the range, standard deviation, quartiles, and interquartile range of the (Jan 27th) South Bend temperature data.

R = 45.9 − 40.9 = 5

\[ s = \sqrt{\frac{(45.9 - 42.8)^2 + (42.6 - 42.8)^2 + \cdots + (44.0 - 42.8)^2}{30}} \approx 1.1 \]

To find the quartiles, first sort the data:

40.9  41.0  41.2  41.4  41.7  41.8  41.9  42.0  42.2  42.2
42.5  42.5  42.6  42.7  42.7  42.7  42.7  42.7  43.0  43.0
43.3  43.4  43.5  43.7  43.8  44.0  44.1  44.1  44.4  45.9

Then find the median:

\[ M = \frac{42.7 + 42.7}{2} = 42.7 \]

The first quartile is the median of the first half of the data: Q1 = 42. The third quartile is the median of the last half of the data: Q3 = 43.5. Now, the interquartile range is IQR = Q3 − Q1 = 43.5 − 42 = 1.5.


Definition 4.8. A parameter is a quantity that measures some aspect of a population. A statistic is a quantity that measures some aspect of a sample. Note: The value of a parameter is rarely known. One goal of statistics is to estimate the value of a population parameter with a sample statistic. Greek letters are commonly used to denote parameters. E.g., the mean of a population is denoted by µ and the standard deviation is denoted by σ.

Example 4.8. Refer to Example 4.1. We may use the sample mean temperature, x̄ = 42.8 (in South Bend on Jan 27th, 2018), to estimate the actual mean temperature (population mean), µ, across the entire city.

Definition 4.9. The five number summary of a sample consists of the following summary statistics: minimum value, Q1 , Median, Q3 , maximum value A box plot is a graph of a five number summary:

Definition 4.10. A value in a sample is an outlier if it falls outside the interval [Q1 − 1.5IQR, Q3 + 1.5IQR].

Example 4.9. Determine any outliers, calculate the five number summary, and create a box plot for the sample S = {5.0, 4.2, 10.1, 9.8, 6.5, 6.0, 37.2}. Solution: From Example 4.6, we have Q1 = 5.0, Q3 = 10.1, and hence, IQR = 5.1. The outlier range is thus,

[Q1 − 1.5IQR, Q3 + 1.5IQR] = [5.0 − 1.5 · 5.1, 10.1 + 1.5 · 5.1] = [−2.65, 17.75].

The only value falling outside this interval is 37.2. Therefore, 37.2 is the only outlier. The five number summary is

min = 4.2, Q1 = 5.0, M = 6.5, Q3 = 10.1, max = 37.2

The box plot (not accounting for outliers) is

Box plots may be presented vertically. The box plot for this sample which accounts for outliers is


Example 4.10. Temperature Data (South Bend, IN weather stations)
• https://www.wunderground.com/weather/us/in/south-bend

Consider the sample of temperatures taken from 30 weather stations in South Bend at 6pm on January 27th, 2018:

45.9  42.6  43.0  43.8  42.2  42.7  44.4  41.4  41.2  41.9
43.5  42.7  42.5  43.3  42.2  42.7  43.0  42.0  44.1  41.7
41.8  42.7  44.1  42.5  43.7  43.4  41.0  40.9  42.7  44.0

Find any outliers. Solution: From Example 4.7, we have Q1 = 42, Q3 = 43.5, and IQR = 1.5. Thus, the lower end of the outlier interval is Q1 − 1.5IQR = 42 − 1.5 · (1.5) = 42 − 2.25 = 39.75


and the upper end of the outlier interval is Q3 + 1.5IQR = 43.5 + 1.5 · (1.5) = 43.5 + 2.25 = 45.75. No values fall below the lower cutoff. However, the value 45.9 is larger than the upper cutoff. Hence, 45.9 is the only outlier.
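The outlier check used in Examples 4.9 and 4.10 follows a fixed recipe, so it is easy to script. The sketch below assumes the Jan 27th readings from Example 4.10 and the median-of-halves quartile convention of Definition 4.7; the function names `five_number_summary` and `outliers` are hypothetical.

```python
from statistics import median

temps_jan27 = [45.9, 42.6, 43.0, 43.8, 42.2, 42.7, 44.4, 41.4, 41.2, 41.9,
               43.5, 42.7, 42.5, 43.3, 42.2, 42.7, 43.0, 42.0, 44.1, 41.7,
               41.8, 42.7, 44.1, 42.5, 43.7, 43.4, 41.0, 40.9, 42.7, 44.0]

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using medians of the two halves."""
    s = sorted(data)
    half = len(s) // 2
    return s[0], median(s[:half]), median(s), median(s[-half:]), s[-1]

def outliers(data):
    """Values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] (Definition 4.10)."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < lower or x > upper]

print("five number summary:", five_number_summary(temps_jan27))
print("outliers:", outliers(temps_jan27))
```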

Example 4.11. Temperature Data (South Bend, IN weather stations)
• https://www.wunderground.com/weather/us/in/south-bend

Consider the sample of temperatures taken from 30 weather stations in South Bend at 3pm on January 31st, 2018:

39.2  40.1  41.1  39.4  38.7  38.9  39.0  39.8  39.6  40.2
39.9  40.0  39.5  40.1  39.4  38.5  40.2  39.0  39.7  39.9
40.1  40.5  39.9  40.5  39.7  40.1  41.9  40.6  40.3  40.6

Use statistical software to 1. Make a histogram and box plot. 2. Find the mean and standard deviation. 3. Find the 5-number summary. 4. Find the range and interquartile range. 5. Identify any outliers. 6. Compare/contrast the center and spread (distribution) with that of Jan 27th from Example 4.1. Solution: 1. Histogram and box plot (for both Jan 27th and Jan 31st temperature samples):

2. Mean and standard deviation: x̄ = 39.88, s = 0.72

3. 5-number summary: min = 38.5, Q1 = 39.4, M = 39.9, Q3 = 40.2, max = 41.9

4. Range and interquartile range: R = 3.4, IQR = 0.8

5. Identify any outliers: 41.9 is an outlier.

6. Compare/contrast the center and spread (distribution) with that of Jan 27th from Example 4.1:

          Mean    Median   Std. Dev.   IQR
Jan 27    42.8    42.7     1.1         1.5
Jan 31    39.88   39.9     0.72        0.8

The histograms and box plots indicate that the distribution of temperatures was higher, as well as more spread out (higher variability), on the 27th. The mean and median were both higher on the 27th as well, further supporting the notion that the center of the distribution was higher on the 27th. Furthermore, the standard deviation of temperatures (as well as the interquartile range) was higher on the 27th than on the 31st, which implies that temperatures were more variable on the 27th (as indicated by the histograms and box plots).
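The summary statistics for the two days can be recomputed with Python's `statistics` module, used here as one possible choice of statistical software. Note that software often offers two standard deviations: `pstdev` divides by n, as in Definition 4.6, while `stdev` divides by n − 1. The two usually differ only slightly for a sample of 30, and the reported value 0.72 for Jan 31st appears to correspond to the n − 1 convention.

```python
from statistics import mean, median, pstdev, stdev

jan27 = [45.9, 42.6, 43.0, 43.8, 42.2, 42.7, 44.4, 41.4, 41.2, 41.9,
         43.5, 42.7, 42.5, 43.3, 42.2, 42.7, 43.0, 42.0, 44.1, 41.7,
         41.8, 42.7, 44.1, 42.5, 43.7, 43.4, 41.0, 40.9, 42.7, 44.0]
jan31 = [39.2, 40.1, 41.1, 39.4, 38.7, 38.9, 39.0, 39.8, 39.6, 40.2,
         39.9, 40.0, 39.5, 40.1, 39.4, 38.5, 40.2, 39.0, 39.7, 39.9,
         40.1, 40.5, 39.9, 40.5, 39.7, 40.1, 41.9, 40.6, 40.3, 40.6]

for label, data in [("Jan 27", jan27), ("Jan 31", jan31)]:
    print(f"{label}: mean = {mean(data):.2f}, median = {median(data):.2f}, "
          f"s (divide by n) = {pstdev(data):.2f}, "
          f"s (divide by n - 1) = {stdev(data):.2f}")
```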


Class Problem 4.1. Descriptive Statistics and Atmospheric CO2 Concentrations 1. Given the sample of atmospheric CO2 concentrations taken in the 1970s, S = {325, 323, 330, 340, 337, 320, 334, 339, 328, 322}, calculate the following by hand: (a) Measures of Center: i. Mean 1970s CO2 concentration. ii. Median 1970s CO2 concentration. (b) Measures of Variability: i. Range. ii. Standard deviation. iii. Interquartile range. (c) Five-number summary. (d) Identify any outliers. (e) Data visualizations: i. Histogram. ii. Box plot. 2. Given the sample of atmospheric CO2 concentrations taken in the 1990s, S = {355, 343, 364, 365, 363, 368, 366, 357, 358, 361}, calculate the following using statistical software: (a) Measures of Center: i. Mean 1990s CO2 concentration. ii. Median 1990s CO2 concentration. (b) Measures of Variability: i. Range. ii. Standard deviation. iii. Interquartile range. (c) Five-number summary. (d) Identify any outliers. (e) Data visualizations: i. Histogram. ii. Box plot. 3. Compare/contrast the distribution of 1990s CO2 concentrations with that of the 1970s.


4. Suppose you wish to estimate the average municipal solid waste generated by an American household in a day. You measure the daily waste production of 1000 households and find that the average is 11.5 lbs. (a) Describe the population of interest in this study. (b) Describe the sample. Indicate the sample size. (c) Describe and identify (if possible) any parameters. Use appropriate notation! (d) Describe and identify (if possible) any statistics. Use appropriate notation!

4.2 Linear Regression and Correlation

4.3 The Normal Distribution

* Galton Board * https://www.dropbox.com/s/rz7vkjdn9s85ko7/bidjXXg.mp4?dl=0 A probability distribution is a description of a population or random phenomenon in terms of the probabilities of events. Distributions can be used to model the overall shape of a data set histogram, i.e., the pattern of the data in terms of how frequently certain values occur. For example, a bell curve (also called the normal distribution) might be a good approximation of the overall pattern of temperature readings at a particular location and time.


Bell curves capture the general pattern in (South Bend) temperature data. For the top figure above, the normal curve, i.e., bell curve, has mean µ = 42.8 and standard deviation σ = 1.1. I.e., the distribution of South Bend temperatures (for Jan 27th) is approximately N(42.8, 1.1). Here we are using the sample mean and standard deviation, x̄ and s, to estimate the population mean and standard deviation, µ and σ, in the normal curve (see Examples 4.3 and 4.7 for x̄ and s calculations). Similarly for Jan 31st (the bottom figure), the distribution is approximately N(39.9, 0.72) (see Example 4.11 for x̄ and s calculations).

* The normal distribution is one of the most important and widely used distributions in probability and statistics, because it is the most ubiquitous in nature (we will see why in the next section).

Definition 4.11. The normal probability density function (pdf), or “bell curve,” is given by the function:

\[ y = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \]

where µ is the mean (center) of the distribution and σ is the standard deviation (spread) of the distribution. Notation: The normal distribution with mean µ and standard deviation σ is denoted by N (µ, σ). Note: Normal distribution = Gaussian distribution = Bell curve (these terms all mean the same thing) Note: The standard normal distribution has mean µ = 0 and standard deviation σ = 1.
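To get a feel for the density function, it can be evaluated numerically. The sketch below assumes the standard form of the normal density, with normalizing constant 1/(σ√(2π)), and checks with a simple midpoint-rule sum that the total area under the curve is very close to 1. The function names `normal_pdf` and `area_under_pdf` are illustrative.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the bell curve N(mu, sigma) at the point x."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

def area_under_pdf(a, b, mu=0.0, sigma=1.0, steps=10_000):
    """Midpoint-rule approximation of the area under the pdf between a and b."""
    width = (b - a) / steps
    heights = (normal_pdf(a + (i + 0.5) * width, mu, sigma) for i in range(steps))
    return sum(heights) * width

print(f"peak of N(0, 1):       {normal_pdf(0):.4f}")
print(f"peak of N(42.8, 1.1):  {normal_pdf(42.8, 42.8, 1.1):.4f}")
print(f"area from -8 to 8:     {area_under_pdf(-8, 8):.6f}")
```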

[Figure: Graph of standard normal distribution (mean: µ = 0 and standard deviation: σ = 1); horizontal axis from −4 to 4, vertical axis from 0 to 0.5.]

[Figure: Graph of the normal distribution N(µ, σ) (mean: µ and standard deviation: σ); horizontal axis marked at µ − 3σ, µ − 2σ, µ − σ, µ, µ + σ, µ + 2σ, µ + 3σ.]

Notice that there is an inflection point on the curve at exactly 1 standard deviation from the mean. An inflection point is a point where a curve changes from increasing slope to decreasing slope, or vice versa.

Example 4.12. Temperature anomaly distributions.


The figure below illustrates that temperature anomalies (deviations in °F from the 1951-1980 average) follow a normal distribution.

1. Estimate the mean and standard deviation of the 2001-2011 distribution. Solution:


We find the mean by locating the approximate center of the purple histogram: µ ≈ 1.2. One standard deviation away from the mean is found by locating the inflection points. The inflection point to the left of the mean is approximately 0.1; this is a distance of 1.1 units away from the mean (1.2 − 0.1 = 1.1). Hence, the standard deviation is approximately 1.1: σ ≈ 1.1.

2. The mean temperature anomaly for the 1980s appears to be approximately 0.25 °F. The standard deviation appears to be approximately 0.08 °F. Sketch and label the normal probability density function (pdf) for this distribution. Solution:


The following graphs illustrate the effect of changing the mean or standard deviation of a normal distribution. Increasing or decreasing the mean shifts the bell curve right or left, respectively.

[Figures: normal pdfs N(1, 1), N(1.7, 1), and N(−2, 1), each plotted on a horizontal axis from −4 to 4.]

Increasing (decreasing) the standard deviation increases (decreases) the spread of the distribution. I.e., the bell curve gets wider (and shorter) with larger standard deviation.

[Figures: normal pdfs N(0, 0.5), N(0, 1), and N(0, 1.5), each plotted on a horizontal axis from −5 to 5.]
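The effect of µ and σ on the shape of the curve can also be seen numerically: the peak sits at x = µ, and its height shrinks as σ grows. The following sketch locates the peak by a simple grid search over [−5, 5]; the grid spacing and the helper name `peak` are assumptions made for illustration.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Height of the bell curve N(mu, sigma) at the point x."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

def peak(mu, sigma, lo=-5.0, hi=5.0, steps=1000):
    """Grid search for the highest point of the curve; returns (x, height)."""
    xs = [lo + i * (hi - lo) / steps for i in range(steps + 1)]
    best = max(xs, key=lambda x: normal_pdf(x, mu, sigma))
    return best, normal_pdf(best, mu, sigma)

# The six curves named in the figures above
for mu, sigma in [(1, 1), (1.7, 1), (-2, 1), (0, 0.5), (0, 1), (0, 1.5)]:
    x, height = peak(mu, sigma)
    print(f"N({mu}, {sigma}): peak near x = {x:.2f}, height = {height:.3f}")
```

Because the total area under every normal curve is 1, a wider curve has to be shorter, which is why the peak height falls as σ increases.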

Example 4.13. Refer to Example 4.1. The normal distribution with mean µ = 42.8 and standard deviation σ = 1.1 is a good approximation of the general pattern in the (Jan 27th) South Bend temperature data. That is, we can make predictions about the temperature using the normal distribution.


For example, the probability that the temperature is 44° (±0.5°) is about 0.2. What is the chance that the temperature is 41° (±0.5°)? Solution: Using the graph, we see that 0.1 is the probability corresponding to the temperature value of 41°. Thus, the likelihood that the temperature is 41° is approximately 10%.
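The graphical estimates in Example 4.13 can be compared with a direct calculation. The sketch below assumes the N(42.8, 1.1) model from the example and interprets "44° (±0.5°)" as the interval from 43.5 to 44.5. It uses the error function `math.erf` to evaluate the normal cumulative probability; the helper names are illustrative.

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """P(X < x) for a normal random variable X with mean mu and std. dev. sigma."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

def prob_between(a, b, mu, sigma):
    """P(a < X < b): area under the normal pdf between a and b."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)

MU, SIGMA = 42.8, 1.1  # Jan 27th South Bend model from Example 4.13

p44 = prob_between(43.5, 44.5, MU, SIGMA)  # temperature is 44 (plus or minus 0.5)
p41 = prob_between(40.5, 41.5, MU, SIGMA)  # temperature is 41 (plus or minus 0.5)
print(f"P(43.5 < T < 44.5) = {p44:.3f}")
print(f"P(40.5 < T < 41.5) = {p41:.3f}")
```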


Calculating Normal Distribution Probabilities: The probability that a normal random variable, x, with mean µ and standard deviation σ takes a value between a and b is given by the area under the normal density function between x = a and x = b:

P(a < x < b).

[...] P(∆T > 2) = 0.9772 ≈ 98%.

2. What is the probability that the temperature change by 2100 will be within one standard deviation of the mean? Solution:


P(2.4 < ∆T < 3.2) = 0.6827 ≈ 68%.

We can also calculate the value of a normal random variable, x, corresponding to a particular area under a normal distribution using the inverse normal calculator: * Inverse normal probability calculator: http://onlinestatbook.com/2/calculators/inverse_normal_dist.html

Example 4.16. Global Warming (Refer to Example 4.15) The multi-model mean surface warming projected for the year 2100 using emissions scenario A1B is approximately 2.8 °C with a standard deviation of 0.4 °C. Assume that climate model temperature predictions are normally distributed.

1. Only 5% of all warming values are above what temperature? Solution: Here we must solve P(∆T > x) = 0.05 for x. From the inverse normal probability calculator we have P(∆T > 3.458) = 0.05.


Thus, 5% of ∆T values are above 3.46 °C.

2. 95% of warming values are within how many standard deviations of the mean? Solution: We must first solve P(x1 < ∆T < x2) = 0.95 for x1 and x2. From the inverse normal probability calculator we have P(2.016 < ∆T < 3.584) = 0.95.


Since 3.584 is 0.784 away from the mean (3.584 − 2.8 = 0.784), 95% of ∆T values are within

\[ \frac{0.784}{\sigma} = \frac{0.784}{0.4} = 1.96 \]

standard deviations from the mean.
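The online calculators used in these examples can be replaced by a few lines of code. The sketch below assumes the A1B values stated in Example 4.16 (mean 2.8 and standard deviation 0.4) and uses `statistics.NormalDist` from the Python standard library (version 3.8 or later), whose `cdf` and `inv_cdf` methods play the roles of the normal and inverse normal calculators. Variable names are illustrative.

```python
from statistics import NormalDist

warming = NormalDist(mu=2.8, sigma=0.4)  # projected warming by 2100, scenario A1B

# Normal calculator: probabilities from values
p_above_2 = 1 - warming.cdf(2.0)
p_within_1sd = warming.cdf(3.2) - warming.cdf(2.4)

# Inverse normal calculator: values from probabilities
top_5_cutoff = warming.inv_cdf(0.95)  # 5% of values lie above this
lower_95, upper_95 = warming.inv_cdf(0.025), warming.inv_cdf(0.975)

print(f"P(dT > 2)         = {p_above_2:.4f}")
print(f"P(2.4 < dT < 3.2) = {p_within_1sd:.4f}")
print(f"top 5% cutoff     = {top_5_cutoff:.3f}")
print(f"central 95%       = ({lower_95:.3f}, {upper_95:.3f})")
```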

Example 4.17. Global Warming (Refer to Example 4.15) Using the figure in Example 4.15, estimate the multi-model mean surface warming projected for the year 2140 using emissions scenario B1. Also estimate the standard deviation. Assume that climate model temperature predictions are normally distributed.


1. What is the probability that the temperature will increase by less than 1 °C? Solution: First we estimate the mean warming value projected by all climate models using the figure in Example 4.15: µ ≈ 1.9. Similarly, we estimate the standard deviation of all projections: σ ≈ 0.45. Now, we find P(∆T < 1) = 0.0228 using the normal probability calculator.

2. What is the probability that the temperature change by 2140 will be between 1 °C and 2 °C? Solution: P(1 < ∆T < 2) = 0.5652.

3. There is a 5% chance that the increase in temperature, ∆T, will be less than what temperature? Solution: We use the inverse normal probability calculator to find P(∆T < 1.16) = 0.05. Thus, there is a 5% chance that warming will be less than 1.16 °C.

4. 90% of warming values are within how many standard deviations of the mean? Solution: We must first solve P(x1 < ∆T < x2) = 0.90 for x1 and x2. From the inverse normal probability calculator we have P(1.16 < ∆T < 2.64) = 0.90. Since 2.64 is 0.74 away from the mean (2.64 − 1.9 = 0.74), 90% of ∆T values are within

\[ \frac{0.74}{\sigma} = \frac{0.74}{0.45} \approx 1.64 \]

standard deviations from the mean.
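Questions of the form "what central percentage lies within how many standard deviations" have the same answer for every normal distribution, so they can be settled once on the standard normal. The sketch below is one way to do this with `statistics.NormalDist`; the helper names `central_z` and `central_interval` are hypothetical.

```python
from statistics import NormalDist

def central_z(p):
    """Number k of standard deviations with P(mu - k*sigma < X < mu + k*sigma) = p."""
    # The central area p leaves (1 - p)/2 in each tail, so the upper
    # endpoint sits at cumulative probability (1 + p)/2.
    return NormalDist().inv_cdf((1 + p) / 2)

def central_interval(p, mu, sigma):
    """Endpoints of the interval centered at mu that contains probability p."""
    k = central_z(p)
    return mu - k * sigma, mu + k * sigma

for p in (0.90, 0.95, 0.99):
    print(f"central {p:.0%}: within {central_z(p):.3f} standard deviations")

low, high = central_interval(0.90, mu=1.9, sigma=0.45)  # values from Example 4.17
print(f"90% of projected warming values lie between {low:.2f} and {high:.2f}")
```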

Example 4.18. Mean Global Temperature
https://climate.nasa.gov/vital-signs/global-temperature/

The average temperature in 2015 is estimated at approximately 58 °F with a standard deviation of 0.2 °F. Assume global surface temperatures are normally distributed in any given year. Suppose a single temperature, denoted by T, is taken at random in 2015.

1. What is the probability that the temperature is within 1 standard deviation of the mean? Graph the (fully labelled) Normal probability density function (bell curve) along with the corresponding probability. Solution:

P(µ − σ < x < µ + σ) = P(58 − 0.2 < x < 58 + 0.2) = P(57.8 < x < 58.2) = 0.6827 ≈ 68%

2. Find P(µ − 2σ < T < µ + 2σ). Graph the pdf along with the corresponding probability.


Solution: P(µ − 2σ < x < µ + 2σ) = P(58 − 2 · 0.2 < x < 58 + 2 · 0.2) = P(58 − 0.4 < x < 58 + 0.4) = P(57.6 < x < 58.4) = 0.9545 ≈ 95%

3. What percent of temperatures are no more than 3 standard deviations from average? Graph the pdf along with the corresponding probability. Solution:

P(µ − 3σ < x < µ + 3σ) = P(58 − 3 · 0.2 < x < 58 + 3 · 0.2) = P(58 − 0.6 < x < 58 + 0.6) = P(57.4 < x < 58.6) = 0.9973 ≈ 99.7%

The results from the above calculations are called the Empirical Rule. It is also known as the 68-95-99.7 Rule.

Theorem 4.1. The Empirical Rule
Suppose x is an observation taken from a Normal distribution with mean µ and standard deviation σ. Then

1. 68% of all observations fall within 1 standard deviation of the mean, i.e., P(µ − σ < x < µ + σ) = 0.68
2. 95% of all observations fall within 2 standard deviations of the mean, i.e., P(µ − 2σ < x < µ + 2σ) = 0.95
3. 99.7% of all observations fall within 3 standard deviations of the mean, i.e., P(µ − 3σ < x < µ + 3σ) = 0.997
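The three percentages in the Empirical Rule do not depend on µ or σ, which can be confirmed with a short calculation. The sketch below relies on the standard identity that, for any normal distribution, the probability of falling within k standard deviations of the mean equals erf(k/√2); the function name `prob_within` is illustrative.

```python
from math import erf, sqrt

def prob_within(k):
    """P(mu - k*sigma < x < mu + k*sigma) for any normal distribution."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.4f}")
```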

Example 4.19. The Empirical Rule and Sea Level Rise Global sea level has been rising over the past century, and the rate has increased in recent decades. In 2014, global sea level was 2.6 inches above the 1993 average - the highest annual average in the satellite record (1993-present). Sea level continues to rise at a rate of about one-eighth of an inch per year. Higher sea levels mean that deadly and destructive storm surges push farther inland than they once did, which also means more frequent nuisance flooding. Disruptive and expensive, nuisance flooding is estimated to be from 300 percent to 900 percent more frequent within U.S. coastal communities than it was just 50 years ago.


[Figure: Mean sea level anomaly relative to 1880]

Assume sea level measurements in a given year are normally distributed. For 1981, the mean sea level anomaly (relative to 1880) is approximately 180 mm with a standard deviation of approximately 2 mm. Let x represent the value of a random sea level measurement from 1981. Use the Empirical Rule to answer the following:

1. What is the chance that x is less than 182 mm? Solution:


From the graph above, we see that P(x < 182) = 0.16 + 0.68 = 0.84 2. 95% of all measurements are between what two sea level heights (centered around the mean)? Solution: The Empirical Rule tells us that 95% of all observations are within 2σ of µ. Thus, 95% of all measurements are between µ − 2σ = 180 − 2 · 2 = 180 − 4 = 176 mm and µ + 2σ = 180 + 2 · 2 = 180 + 4 = 184 mm.


3. Find the probability that a random sea level measurement from 1981 is greater than 186 mm. Solution:

\[ P(x > 186) = \frac{100\% - 99.7\%}{2} = 0.15\% \]
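Since the Empirical Rule uses rounded percentages, it is worth seeing how close its answers are to the exact normal probabilities. The sketch below assumes the N(180, 2) model from Example 4.19 and compares the two for parts 1 and 3; the dictionary names are illustrative.

```python
from statistics import NormalDist

sea_level = NormalDist(mu=180, sigma=2)  # 1981 sea level anomaly in mm

# Answers obtained from the Empirical Rule in Example 4.19
approx = {
    "P(x < 182)": 0.16 + 0.68,       # left tail plus the central 68%
    "P(x > 186)": (1 - 0.997) / 2,   # half of what lies beyond 3 std. devs.
}

# The same probabilities computed from the normal distribution itself
exact = {
    "P(x < 182)": sea_level.cdf(182),
    "P(x > 186)": 1 - sea_level.cdf(186),
}

for event in approx:
    print(f"{event}: Empirical Rule = {approx[event]:.4f}, exact = {exact[event]:.4f}")
```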

* Plinko! * https://www.dropbox.com/s/rz7vkjdn9s85ko7/bidjXXg.mp4?dl=0


Diffusion of Air Pollution (e.g., greenhouse gases)

https://www.shutterstock.com/video/clip-27964867-stock-footage-smoking-power-plant-chimney-closeup.html?src=rel/23233789:0/3p

The Gaussian plume model: https://www.dropbox.com/s/iwyynng2lm0n5lh/Smokestack-plumes.pdf?dl=0


Class Problem 4.2. The Normal Distribution and Global Warming 1. The normal distribution with mean µ = 42.8 and standard deviation σ = 1.1 is a good approximation of the general pattern in the (Jan 27th) South Bend temperature data. That is, we can make predictions about the temperature using the normal distribution.

(a) Using only the graph, the probability that the temperature is 43° (±0.5°) is about ______.
(b) What is the chance that the temperature is 40° (±0.5°)?

2. The average surface temperature in 1975 is estimated at approximately 57 °F with a standard deviation of 0.1 °F. Assume global surface temperatures are normally distributed in any given year. Suppose a single temperature, denoted by T, is measured at random in 1975. (a) Sketch a graph of the probability density function (pdf), i.e., bell curve, for 1975 global surface temperature. Make sure to label 3 standard deviations from the mean (locating the first std. dev. by the inflection point). (b) What is the probability that the temperature is between 56.95 °F and 57.05 °F? Sketch a graph of the pdf including appropriate shading corresponding to the probability. (c) Find P(T < 57.1). Sketch a graph of the pdf including appropriate shading corresponding to the probability. (d) Find the 90th percentile for temperatures in 1975. (The 90th percentile is the temperature that 90% of all other temperatures are below, and 10% are above.) Sketch a graph of the pdf including appropriate shading.


3. Using the figure in Example 4.15 estimate the multi-model mean surface warming projected for the year 2080 using emissions scenario A2 as well as the standard deviation. Assume that climate model temperature predictions are normally distributed. (a) What is the probability that the temperature will increase by more than 3◦ C? (b) What is the probability that the temperature change by 2080 will be between 1◦ C and 2◦ C? (c) There is a 5% chance that all warming values, ∆T , are below what temperature? (d) 99% of warming values are within how many standard deviations of the mean? 4. Assume sea level measurements in a given year are normally distributed. For 2012, the mean sea level anomaly (relative to 1880) is approximately 250 mm with a standard deviation of approximately 5 mm. Let x represent the value of a random sea level measurement from 2012. Sketch an appropriate normal pdf and use the Empirical Rule to answer each of the following: (a) Find the probability that a random sea level measurement from 2012 is greater than 250 mm. (b) Find the probability that a random sea level measurement is less than 250 mm. (c) Find the probability that a measurement is less than 240 mm. (d) What is the chance that x is between 245 and 265 mm?

4.4 The Distribution of a Sample Mean

A Year in the Life of Earth’s CO2: https://www.youtube.com/watch?v=x1SgmFa0r04

Example 4.20. Atmospheric carbon dioxide (CO2) concentrations. A greenhouse gas is a gas in an atmosphere that absorbs and emits radiant energy within the thermal infrared range. This process is the fundamental cause of the greenhouse effect. The primary greenhouse gases in Earth’s atmosphere are water vapor, carbon dioxide, methane, nitrous oxide, and ozone. Without greenhouse gases, the average temperature of Earth’s surface would be about −18°C (0°F), rather than the present average of 15°C (59°F). The greenhouse effect is the process by which radiation from a planet’s atmosphere warms the planet’s surface to a temperature above what it would be without its atmosphere. If a planet’s atmosphere contains greenhouse gases, they will radiate energy in all directions. Part of this radiation is directed towards the surface, warming it. The intensity of the downward radiation – that is, the strength of the greenhouse effect – will depend on the atmosphere’s temperature and on the amount of greenhouse gases that the atmosphere contains.


Human activities since the beginning of the Industrial Revolution (around 1750) have produced an increase of about 45% in the atmospheric concentration of carbon dioxide (CO2), from 280 ppm in 1750 to 406 ppm in early 2017. This increase has occurred despite the uptake of more than half of the emissions by various natural “sinks” involved in the carbon cycle. The vast majority of anthropogenic carbon dioxide emissions (i.e., emissions produced by human activities) come from combustion of fossil fuels, principally coal, oil, and natural gas, with comparatively modest additional contributions coming from deforestation, changes in land use, soil erosion, and agriculture.


It has been estimated that if greenhouse gas emissions continue at their present rate, Earth’s surface temperature could exceed historical values as early as 2047, with potentially harmful effects on ecosystems, biodiversity and the livelihoods of people worldwide. Recent estimates also suggest that at current emission rates the Earth could pass a threshold of 2°C global warming, which the United Nations’ IPCC designated as the upper limit to avoid “dangerous” global warming, by 2036.

Measurements from Mauna Loa Observatory: ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt


Take 4 samples of size n = 40 CO2 concentration measurements from the 1960s:

The mean of these sample means,

$\mu_{\bar{x}} = \frac{320.38 + 320.76 + 320.65 + 321.35}{4} = 320.79,$

is a good estimate of the true mean, µ, atmospheric CO2 concentration in the 1960s.

* Animation: https://www.dropbox.com/s/z32y3n4czyht0na/co2_dist.m?dl=0

Example 4.21. Sample means of 1960s Mauna Loa CO2 data. Take 500 samples of size n = 30, n = 60, n = 90, and n = 120 each:

Note that the distributions of the sample mean are approximately bell-shaped (normal) with more or less constant mean µx¯ ≈ 320.3. However, the standard deviation σx¯ (spread) of the distributions decreases as the sample size n increases.

The previous example illustrates that the more data we collect (the larger n is), the more we can “narrow down” where the true mean µ is. I.e., the larger the sample, the smaller the variation and thus, less uncertainty about the value of the true mean. These ideas are summed up and formalized in the Central Limit Theorem.
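The narrowing described above can be imitated with a short simulation. This is only a sketch: the population below is a hypothetical stand-in (120 simulated values with mean near 320.3 ppm and standard deviation near 3 ppm), not the actual Mauna Loa record, and samples are drawn with replacement for simplicity.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical stand-in population (an assumption, not the Mauna Loa data):
# 120 values with roughly the 1960s mean and spread quoted in the text.
population = [random.gauss(320.3, 3.0) for _ in range(120)]
sigma = statistics.pstdev(population)

def sample_mean_spread(n, trials=500):
    """Return (mean, std dev) of `trials` sample means, each from n draws."""
    means = [statistics.fmean(random.choices(population, k=n))
             for _ in range(trials)]
    return statistics.fmean(means), statistics.stdev(means)

results = {n: sample_mean_spread(n) for n in (30, 60, 90, 120)}
for n, (center, spread) in results.items():
    print(f"n={n:3d}  mean of sample means={center:.2f}  "
          f"spread={spread:.3f}  sigma/sqrt(n)={sigma / n ** 0.5:.3f}")
```

The printed spread should shrink as n grows while the center stays close to the population mean, which is the pattern the figure in Example 4.21 illustrates.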


Theorem 4.2. The Central Limit Theorem (CLT) Let $\bar{x}$ denote the mean of the observations in a random sample of size n from a population having mean µ and standard deviation σ. If n is sufficiently large, then the sampling distribution of $\bar{x}$ is approximately normal with mean $\mu_{\bar{x}} = \mu$ and standard deviation $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.

The Central Limit Theorem tells us that as sample sizes get larger, the sampling distribution of the mean becomes approximately normal, even if the data within each sample are not normally distributed. The CLT is important in statistics because it allows us to safely assume that the sampling distribution of the mean will be approximately normal in most cases. This means that we can take advantage of statistical techniques that assume a normal distribution, as we will see in the next section.

Example 4.22. Atmospheric CO2 concentrations from Mauna Loa Observatory. Suppose it is known that the actual mean atmospheric CO2 concentration in the 1960s is 320.3 ppm with a standard deviation of 3 ppm. Take a sample of 60 CO2 concentrations.

1. What is the approximate shape of the distribution of the sample mean? Solution: First note that we know µ = 320.3 and σ = 3 for this population. Thus, the sampling distribution of $\bar{x}$ will be approximately bell shaped (normal) with mean $\mu_{\bar{x}} = \mu = 320.3$ and standard deviation

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{60}} \approx 0.3873.$

Note that $\mu_{\bar{x}} = 320.3$ and $\sigma_{\bar{x}} = 0.3873$ agree with what was calculated in the above figure where n = 60.

2. What is the probability that the sample mean will be less than 319.5 ppm? Solution: Since we know from the CLT that the distribution of $\bar{x}$ is approximately N(320.3, 0.3873), we can calculate this probability from the normal probability calculator at http://onlinestatbook.com/2/calculators/normal_dist.html

We find that $P(\bar{x} < 319.5) = 0.0194 \approx 2\%$.

3. What is the probability that the sample mean will be between 319.5 and 320.2 ppm? Solution: $P(319.5 < \bar{x} < 320.2) = 0.3787 \approx 38\%$.
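As an alternative to the online calculator, the probabilities in Example 4.22 can be checked with Python's standard library. This is a minimal sketch; the variable names are illustrative and not part of the text.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 320.3, 3.0, 60

# Sampling distribution of the sample mean, per the CLT: N(mu, sigma / sqrt(n)).
x_bar = NormalDist(mu, sigma / sqrt(n))

p_below = x_bar.cdf(319.5)                        # P(x_bar < 319.5)
p_between = x_bar.cdf(320.2) - x_bar.cdf(319.5)   # P(319.5 < x_bar < 320.2)

print(f"sigma_xbar = {x_bar.stdev:.4f}")
print(f"P(x_bar < 319.5) = {p_below:.4f}")
print(f"P(319.5 < x_bar < 320.2) = {p_between:.4f}")
```

The printed values should agree with the calculator results quoted in the example up to rounding.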

Example 4.23. Mean global temperature. https://climate.nasa.gov/vital-signs/global-temperature/ Assume global surface temperatures are normally distributed in any given year. Graph the (fully labelled) probability density function (bell curve) for parts (a) and (b) of (1) and (2) along with the corresponding probabilities.


1. The average surface temperature in 1975 is estimated at approximately 57°F with a standard deviation of 0.1°F. (a) Suppose a single temperature is taken at random in 1975. What is the probability that the temperature is between 56.95°F and 57.05°F? Solution: The distribution of temperatures is N(57, 0.1). Thus, from the normal probability calculator, we have

$P(56.95 < x < 57.05) = 0.3829 \approx 38.3\%.$


(b) Suppose a random sample of 16 temperatures is taken in 1975. What is the probability that the sample mean temperature is between 56.95°F and 57.05°F? Solution: Note that we are given a sample of size n = 16. Thus, we know from the CLT that the distribution of $\bar{x}$ is approximately normal with mean $\mu_{\bar{x}} = \mu = 57$ and standard deviation

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.1}{\sqrt{16}} = \frac{0.1}{4} = 0.025.$

That is, the sampling distribution of $\bar{x}$ is N(57, 0.025). From the probability calculator we find

$P(56.95 < \bar{x} < 57.05) = 0.9545 \approx 95.5\%.$

Notice that this probability is substantially larger than in part (a) even though the probability calculations involve the same range of values (56.95 to 57.05). The only difference between (a) and (b) is that in (a), we are asking about just a single temperature observation, x, whereas in (b), we are asking about the sample mean, $\bar{x}$, of a sample of n = 16 observations. Thus, for (b), we must first recalculate the standard deviation using the CLT, i.e., $\sigma_{\bar{x}} = \sigma/\sqrt{n}$, to account for the reduced variability caused by taking a sample. The mean of the sampling distribution does not change from that of the population distribution, i.e., $\mu_{\bar{x}} = \mu$.


2. The average temperature 40 years later, in 2015, is estimated at approximately 58◦ F with a standard deviation of 0.2◦ F. (a) Suppose a single temperature is taken at random in 2015. What is the probability that the temperature is below 57.9◦ F? Solution: A normal distribution calculation yields P(x < 57.9) = 0.3085 ≈ 30.9%.


(b) Suppose a random sample of 25 temperatures is taken in 2015. What is the probability that the sample mean temperature is below 57.9°F? Solution: Since we have a sample, the CLT is needed to obtain the correct standard deviation:

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.2}{\sqrt{25}} = \frac{0.2}{5} = 0.04.$

The mean for the sampling distribution, $\mu_{\bar{x}}$, is the same as the originally given population mean, µ, which is 58. From the normal probability calculator we have $P(\bar{x} < 57.9) = 0.0062 \approx 0.6\%$.
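The four probabilities in Example 4.23 differ only in which standard deviation is used: σ for a single observation and σ/√n for a sample mean. The sketch below makes that explicit; the helper names `mean_dist` and `prob_between` are illustrative assumptions, not notation from the text.

```python
from math import sqrt
from statistics import NormalDist

def mean_dist(mu, sigma, n=1):
    """Normal model for one observation (n=1) or for the mean of n observations."""
    return NormalDist(mu, sigma / sqrt(n))

def prob_between(dist, lo, hi):
    """P(lo < value < hi) under the given normal distribution."""
    return dist.cdf(hi) - dist.cdf(lo)

# Part 1 (1975): single temperature vs. mean of 16 temperatures.
p_1a = prob_between(mean_dist(57, 0.1), 56.95, 57.05)
p_1b = prob_between(mean_dist(57, 0.1, n=16), 56.95, 57.05)

# Part 2 (2015): single temperature vs. mean of 25 temperatures.
p_2a = mean_dist(58, 0.2).cdf(57.9)
p_2b = mean_dist(58, 0.2, n=25).cdf(57.9)

for label, p in [("1(a)", p_1a), ("1(b)", p_1b), ("2(a)", p_2a), ("2(b)", p_2b)]:
    print(f"{label}: {p:.4f}")
```

Passing `n` is the only change between the (a) and (b) calculations, which mirrors the remark that the mean stays at µ while the spread shrinks.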


Class Problem 4.3. Atmospheric CO2, Global Temperature, and The Central Limit Theorem

1. The average temperature in the year 2000 is estimated at approximately 57.4°F with a standard deviation of 0.3°F. Assume global surface temperatures are normally distributed in any given year. Graph the (fully labelled) probability density function (bell curve) for parts (a) and (b) along with the corresponding probabilities.
(a) Suppose a single temperature is taken at random in 2000. What is the probability that the temperature is above 57.3°F?
(b) Suppose a random sample of 100 temperatures is taken in 2000.
i. What is the probability that the sample mean temperature is above 57.3°F?
ii. Find the 80th percentile for sample means.

2. Suppose it is known that the actual mean atmospheric CO2 concentration in the 1980s was 345.54 ppm with a standard deviation of 4.97 ppm. Suppose you take a sample of 80 CO2 concentrations.
(a) What is the approximate shape of the distribution of the sample mean?
(b) What is the probability that the sample mean will be more than 344 ppm?
(c) What is the probability that the sample mean will be less than 344 ppm?
(d) What percent of sample means would be within $3\sigma_{\bar{x}}$ of the true mean?
(e) 99% of all samples would have means between what two concentrations?

4.5 Confidence Intervals

Arctic sea ice is now declining at a rate of 13.1 percent per decade, relative to the 1981 to 2010 average. https://climate.nasa.gov/vital-signs/arctic-sea-ice/

In this section we develop a method for estimating a population mean using a range (i.e., interval) of feasible values centered around a sample mean. The interval, called a confidence interval, is ultimately based on the sample size, sample mean, sample standard deviation, and how successful the interval is at capturing the true population mean.

Definition 4.12. A point estimate of a population mean is a single number that is based on sample data and that represents a plausible value of the mean.

Note: We can use a sample mean $\bar{x}$ as a point estimate of the population mean µ provided that the sample size is sufficiently large.

Definition 4.13. A confidence interval (CI) for a population mean is an interval estimate of plausible values of the mean. The confidence level associated with a CI is the success rate of the method used to construct the interval.

Note: The confidence level gives the percentage of all samples resulting in a CI that actually contains the population mean. A typical confidence level is 95%.


To construct a 95%-confidence interval, we need the number of standard deviations from the mean (of a sampling distribution and hence, a normal distribution) that would contain 95% of all sample means.

Example 4.24. 95% of all observations in a standard normal distribution are within how many standard deviations of the mean? Recall that the mean and standard deviation of a standard normal distribution are 0 and 1, respectively. Solution: We must solve P(x1 < X < x2) = 0.95 for x1 and x2. From the inverse normal probability calculator we have P(−1.96 < X < 1.96) = 0.95. Thus, 95% of all observations (from any normal distribution) fall within 1.96 standard deviations of the mean.

Figure: The shaded area is 0.95 under the standard normal density curve.
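The value 1.96 can also be recovered without an online calculator. The sketch below uses the inverse cumulative distribution function of the standard normal from Python's standard library; the function name `critical_z` is an illustrative assumption.

```python
from statistics import NormalDist

def critical_z(confidence):
    """Number of standard deviations that capture the central `confidence` share."""
    tail = (1 - confidence) / 2          # area left over in each tail
    return NormalDist().inv_cdf(1 - tail)

z95 = critical_z(0.95)
z99 = critical_z(0.99)
print(f"95%: z = {z95:.3f}")
print(f"99%: z = {z99:.3f}")
```

Splitting the leftover 5% evenly between the two tails is what makes the interval symmetric about the mean.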

Theorem 4.3. In the distribution of a sample mean, 95% of all sample means are within $1.96\,\sigma_{\bar{x}} = 1.96\,\sigma/\sqrt{n}$ of the population mean µ, where n is the size of the sample and σ is the population standard deviation.

Definition 4.14. Given a random sample of n observations with sample mean $\bar{x}$, the 95% confidence interval to estimate the mean µ of a population with known standard deviation σ is

$\bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}}.$

We can be 95% confident that the true mean is in the interval $\left(\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}\right)$. That is, we can expect 95% of all samples (of size n) to yield confidence intervals containing the actual mean µ.

The margin of error, ME, of the confidence interval, given by $1.96\,\sigma/\sqrt{n}$, is the radius of the interval (distance from the center, $\bar{x}$, to the boundary). Note that the larger the sample size, n, the smaller the ME. The margin of error is an indication of the precision of the confidence interval; the smaller the margin of error, the more precise the confidence interval.

Example 4.25. Estimating Mean Sea Level Anomaly

1. A sample of 2008 sea level anomaly (in mm relative to the year 1880) measurements using coastal tide gauges is shown below:

229.5 229.1 235.5 239.3 243.1 243.1 242.3 238 233.9 229.8 229.8 229

Assuming the population standard deviation is 5.1 mm, estimate the mean sea level anomaly in 2008 using a 95% confidence interval. Interpret the result. Solution: From the sample, we find $\bar{x} = 235.2$ and n = 12. Also note that σ = 5.1 is given. The confidence interval, given by Definition 4.14, is

$\bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}} = 235.2 \pm 1.96\,\frac{5.1}{\sqrt{12}} = 235.2 \pm 1.96 \cdot 1.4722 = 235.2 \pm 2.9.$

Thus, we estimate the mean sea level anomaly for 2008 to be 235.2 mm with a margin of error of 2.9 mm. I.e., we are 95% confident that the actual (population) mean sea level anomaly is within 2.9 mm of 235.2 mm. Another form of the interval, also given in Definition 4.14, is

$\left(\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) = (235.2 - 2.9,\ 235.2 + 2.9) = (232.3,\ 238.1).$

Note that the sample mean, $\bar{x}$, is at the center of the CI. From this form we can say we are 95% confident that the mean sea level anomaly is between 232.3 mm and 238.1 mm.
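The tide gauge calculation can be reproduced directly from the twelve data values. This is a minimal sketch of Definition 4.14 with a known population standard deviation; the function name `mean_ci_95` is an illustrative assumption.

```python
from math import sqrt
from statistics import fmean

def mean_ci_95(sample, sigma):
    """95% CI for the mean when the population standard deviation is known."""
    n = len(sample)
    x_bar = fmean(sample)
    margin = 1.96 * sigma / sqrt(n)      # margin of error, ME
    return x_bar, margin, (x_bar - margin, x_bar + margin)

tide_gauge = [229.5, 229.1, 235.5, 239.3, 243.1, 243.1,
              242.3, 238, 233.9, 229.8, 229.8, 229]

x_bar, margin, ci = mean_ci_95(tide_gauge, sigma=5.1)
print(f"x_bar = {x_bar:.1f} mm, ME = {margin:.1f} mm")
print(f"95% CI: ({ci[0]:.1f}, {ci[1]:.1f})")
```

The same function applies to any other sample by swapping in its data list and the assumed σ.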

2. A different sample of 2008 sea level measurements is taken using satellite altimetry:


217.86 219.08 222.4 222.08 219.89 221.21 218.26 221.45 221.92 221.01 219.56 220.88 220.7 219.88 225.29 225.22 222.95 221.66 223.38 224.41 224.21 222.5 222.19 221.19 219.66 222.16 220.72 218.36 220.61 220.9 219.32 220.7 222.53 221.46 223.64 221.23 221.48 224.48 223.91 227.03 227.31 222.41 221.14 220.63

Assuming the population standard deviation is 5.1 mm, estimate the mean sea level anomaly in 2008 with a 95% confidence interval using this satellite data. Interpret the result. Solution: From the sample, we find $\bar{x} = 221.8$ and n = 44. Note that σ = 5.1 is the same since it is a population parameter. The confidence interval is

$\bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}} = 221.8 \pm 1.96\,\frac{5.1}{\sqrt{44}} = 221.8 \pm 1.5.$

Thus, we estimate the mean sea level anomaly for 2008 to be 221.8 mm with a margin of error of 1.5 mm. I.e., we are 95% confident that the actual (population) mean sea level anomaly is within 1.5 mm of 221.8 mm. The alternative form of the CI is

$\left(\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) = (221.8 - 1.5,\ 221.8 + 1.5) = (220.3,\ 223.3).$

From this form we can say we are 95% confident that the mean sea level anomaly is between 220.3 mm and 223.3 mm.

3. Do the confidence intervals from parts 1 and 2 indicate that the two measurement techniques (or data collection methods) are inconsistent? Solution: Since the confidence intervals do not overlap, we do have evidence that the measurement or data collection techniques are inconsistent. That is, at least one of the methods is likely biased, since the true (population) mean sea level anomaly cannot be in both confidence intervals.

4. Which confidence interval is more precise? Solution: Since the sample in part 2 is larger, the margin of error is smaller. Hence, the CI in part 2 is more precise.

Example 4.26. 1960s atmospheric CO2 concentrations from Mauna Loa Observatory.

1. We wish to estimate the true mean CO2 concentration in the 1960s using a 95% confidence interval. We take a sample of 30 observations (CO2 concentration from the 60s) and calculate the sample mean to be 320.71 ppm. Suppose we know that the standard deviation of the population of 1960s CO2 concentrations is 2.9732 ppm. Construct the 95% CI from this sample. Solution: Note that $\bar{x} = 320.71$, n = 30, and σ = 2.9732. Thus, the CI is

$\bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}} = 320.71 \pm 1.96\,\frac{2.9732}{\sqrt{30}} = 320.71 \pm 1.06.$

Note that ME = 1.06 for this CI. From the CI, we conclude that we are 95% confident that the true mean CO2 concentration µ is in the interval (320.71 − 1.06, 320.71 + 1.06) = (319.65, 321.77).

2. Suppose a climate change skeptic claims that the mean CO2 concentration in the 1960s was no more than 315 ppm. Does our sample provide statistically significant evidence to reject this claim? Explain. Solution: Yes, our sample provides statistically significant evidence to reject the claim that the mean 1960s CO2 concentration was no more than 315 ppm, since 315 falls below our CI, i.e., outside the likely range of values for µ. More precisely, we can think of the skeptic’s claim that µ was no more than 315 ppm as the inequality µ ≤ 315, or equivalently, that µ is in the interval (−∞, 315]. Since this interval does not overlap with the CI, it is very unlikely that the true population mean is in the claimed interval and not in the CI.

3. Suppose another 95% CI is calculated from a sample of 1960s CO2 concentrations from a different observatory: (317.80, 319.92).

(a) What must have been the sample mean used to generate this CI? Solution: The sample mean is at the center of the CI. Hence, we find $\bar{x}$ by taking the midpoint (average) of the boundaries of the CI:

$\bar{x} = \frac{317.80 + 319.92}{2} = 318.86$

(b) What is the margin of error for this CI? Solution: The margin of error is the distance from the center to either edge. Hence, we can find ME by taking the difference between the mean and the left boundary:

$ME = 318.86 - 317.80 = 1.06$

4. Is there statistically significant evidence that these two observatories are yielding inconsistent CO2 concentration measurements? Solution: No. Since the two CIs do overlap, the actual mean CO2 concentration may very well be in their intersection (i.e., in both intervals), which would be consistent with both observatories being accurate.

5. Which CI is more precise? Solution: Since both confidence intervals have the same ME, they have the same level of precision.
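The steps of Example 4.26 (building a CI from summary statistics, recovering the center and margin of error from a reported interval, and checking for overlap) can be sketched as follows. The helper names are illustrative assumptions.

```python
from math import sqrt

def ci_95(x_bar, sigma, n):
    """95% CI from a sample mean, known population sigma, and sample size."""
    margin = 1.96 * sigma / sqrt(n)
    return (x_bar - margin, x_bar + margin)

def center_and_margin(ci):
    """Recover the sample mean (midpoint) and margin of error (half-width)."""
    lo, hi = ci
    return (lo + hi) / 2, (hi - lo) / 2

def overlaps(ci_a, ci_b):
    """True when the two intervals share at least one point."""
    return max(ci_a[0], ci_b[0]) <= min(ci_a[1], ci_b[1])

mauna_loa = ci_95(320.71, 2.9732, 30)
other = (317.80, 319.92)            # interval reported for the second observatory

center, margin = center_and_margin(other)
claim_rejected = 315 < mauna_loa[0]  # the claim "mu <= 315" lies entirely below the CI

print(f"Mauna Loa CI: ({mauna_loa[0]:.2f}, {mauna_loa[1]:.2f})")
print(f"Other observatory: center = {center:.2f}, ME = {margin:.2f}")
print("Claim of at most 315 ppm rejected:", claim_rejected)
print("Intervals overlap:", overlaps(mauna_loa, other))
```

Treating each interval as a simple pair of endpoints keeps the overlap test to a single comparison.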


Class Problem 4.4. Confidence Intervals and Arctic Sea Ice Extent

Arctic sea ice is now declining at a rate of 13.1 percent per decade, relative to the 1981 to 2010 average.

https://climate.nasa.gov/vital-signs/arctic-sea-ice/ 1. A sample of Arctic Sea ice extent measurements (in millions of square kilometers) taken in the year 1980 is shown below:

14.2 14.771 15.401 15.639 15.76 15.988 16.005 15.946 15.945 13.531 13.405 13.34 13.206 11.283 11.075 10.983 8.868 8.76 7.761 7.717 7.758 8.213 8.485 8.532 12.096 12.256 12.476 15.503 15.627 15.646 15.242 15.202 15.135 12.863 12.721 12.652 7.724 7.667 7.593 9.609 9.719 13.217 13.418


(a) Assuming the population standard deviation is 3 million km2 , estimate the mean sea ice extent in 1980 using a 95% confidence interval. (b) Graph the CI. (c) What is the margin of error? (d) Interpret the CI. (e) Suppose it is claimed that sea ice extent was greater than 13 million km2 in 1980. Does the data provide statistically significant evidence to reject this claim? Explain. 2. A 95% confidence interval for the mean ice extent in the year 2020 is (9.7, 10.5) (a) What is the mean of the sample that the CI was constructed from? (b) What is the margin of error for this CI? 3. Of the two confidence intervals, which is more precise? Why? 4. Suppose a climate change skeptic claims that sea ice extent has not changed in the last 40 years. Do the confidence intervals provide evidence to the contrary? Explain.

4.6 Hypothesis Testing

4.7 Exercises

1. Temperature Data: https://www.wunderground.com/weather/us/in/south-bend Take a sample of 20 temperatures from weather stations in an area of your choosing using the link above. Show all work and use appropriate notation in doing the following by hand: (a) Find the 5-number summary. (b) Make a histogram and box plot. (c) Find the mean and standard deviation. (d) Find the range and interquartile range. (e) Identify any outliers. 2. Temperature Data: https://www.wunderground.com/weather/us/in/south-bend Take a sample of 20 temperatures from weather stations in another area of your choosing using the link above. Do the following using statistical software: (a) Find the 5-number summary. (b) Make a histogram and box plot. (c) Find the mean and standard deviation. (d) Find the range and interquartile range. (e) Identify any outliers. (f) Compare/contrast the center and spread (distribution) with that of problem #1.

3. Describe what is meant by variation in a data set.

4. Does a small standard deviation of a data set correspond to high or low variation?

5. Does a large standard deviation of a data set correspond to high or low variation?

6. Suppose you wish to estimate the average amount of municipal solid waste recycled by an American household in a day. You measure the daily recycling of 800 households and find that the average is 4 lbs. (a) Describe the population of interest in this study. (b) Describe the sample. Indicate the sample size. (c) Describe and identify (if possible) any parameters. Use appropriate notation! (d) Describe and identify (if possible) any statistics. Use appropriate notation!

7. Global Warming (Refer to Example 4.9) Using the figure in Example 4.9 estimate the multi-model mean surface warming projected for the year 2060 using emissions scenario A2 as well as the standard deviation. Assume that climate model temperature predictions are normally distributed. (a) What is the probability that the temperature will increase by more than 2.1°C? (b) What is the probability that the temperature change by 2060 will be between 1.8°C and 2.1°C?


(c) There is a 1% chance that warming, ∆T , will be below what temperature? (d) 99% of warming values are within how many standard deviations of the mean?

8. Atmospheric CO2 concentrations (Mauna Loa). (Refer to Examples 4.12.) ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt Take 5 random samples of 6 CO2 concentrations from the 2000s decade using the random number generator at https://www.random.org/ Use 1 for the minimum and 120 for the maximum values; a random number represents the month number in the decade, e.g., 1 is January of 2000, 2 is February of 2000, ... 13 is January of 2001, 14 is February of 2001, ... etc. (a) Calculate the sample mean for each sample. (b) Represent the sampling distribution of mean CO2 concentration as a histogram for x ¯.

9. Atmospheric CO2 concentrations. (Refer to Examples 4.12 and 4.14.) Suppose it is known that the actual mean atmospheric CO2 concentration in the 2000s is 379.61 ppm with a standard deviation of 6.73 ppm. Suppose you take a sample of 100 CO2 concentrations. (a) What is the approximate shape of the distribution of the sample mean? (b) What is the probability that the sample mean will be less than 379.61 ppm? (c) What percent of sample means would be within 2σx¯ of the true mean? (d) 95% of all samples would have means between what two concentrations?

10. Atmospheric CO2 concentrations. (a) Suppose you wish to estimate the true mean CO2 concentration in the 2000s using a 95% confidence interval. Take a random sample of 20 observations (CO2 concentration from the oughts) using https://www.random.org/ and calculate the sample mean. Suppose we know that the standard deviation of the population of 2000s CO2 concentrations is 6.73 ppm. Construct and interpret a 95% CI from this sample. (b) Suppose a climate change denier claims that the mean CO2 concentration in the 2000s was no more than 375 ppm. Does your sample provide statistically significant evidence to reject this claim? Explain.

11. Suppose a climate change denier claims that the observed increase in atmospheric CO2 concentration is merely due to natural random fluctuations and that the true mean CO2 concentration has not changed from the 1960s to the 2000s. Use the confidence interval from Example 4.26 in conjunction with your confidence interval from Exercise 10 to refute this claim.

5 Network Theory and Green Transportation

Concerns about the environment, health effects of burning fossil fuels, greenhouse gas emissions, energy security, oil supply, peak oil, land use, resource utilization and more are forcing us to reconsider the design of the transportation system. Some of the greenest and most sustainable forms of transportation include bicycling, walking, trains, and electric cars.

Example 5.1. Seven Bridges of Königsberg. Can each of the seven bridges be traversed exactly once, starting and ending at the same place?

Graph representation:


5.1 Graph Theory Preliminaries

Definition 5.1. A graph consists of two sets: V , the set of vertices, and E, the set of edges.

Example 5.2. For the seven bridges problem, V = {A, B, C, D} and E = {a, b, c, d, e, f, g}.

Definition 5.2. A walk from vertex v0 to vn is a sequence w = {v0 , e1 , v1 , e2 , ..., vn−1 , en , vn } of (alternating) vertices and edges from v0 to vn .

Example 5.3. For the seven bridges problem, a walk from vertex B to vertex D is {B, b, A, d, C, g, D}. Name two other walks from B to D. Possible Solutions: {B, c, C, g, D} or {B, a, A, d, C, c, B, b, A, f, D}, or {B, b, A, d, C, d, A, e, D}

Definition 5.3. A graph is connected if there is a walk between every pair of vertices.

Example 5.4. Is the seven bridges graph connected? Solution: Yes


Example 5.5. Is the following graph connected?

Solution: No

Definition 5.4. The degree of vertex V , denoted by |V |, is the number of edges connected to it.

Example 5.6. For the seven bridges problem, what is the degree of vertex A? What is the degree of C? Solution: The degree of A is 5, i.e., |A| = 5. |C| = 3.

Theorem 5.1. Euler’s Sum-Degree Theorem The sum of the degrees of the vertices of a graph is twice the number of edges. Proof: By definition, each edge has two vertices connected to it; so each edge gets counted twice in summing the degrees of the vertices.

Example 5.7. For the seven bridges problem, find the degree of each vertex. Find the sum of the degrees of the vertices. Solution: |A| = 5, |B| = 3, |C| = 3, |D| = 3. The sum of the degrees is 14 (twice the number of edges!).
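The degree count in Example 5.7 can be checked by machine. In the sketch below, the endpoints of each bridge are an assumption inferred from the walks listed in Example 5.3 (for instance, the walk {B, b, A, d, C, g, D} implies that b joins A and B, d joins A and C, and g joins C and D); the figure itself is not reproduced here.

```python
from collections import Counter

# Edge name -> endpoints, inferred from the walks in Example 5.3 (an assumption).
bridges = {
    "a": ("A", "B"), "b": ("A", "B"), "c": ("B", "C"), "d": ("A", "C"),
    "e": ("A", "D"), "f": ("A", "D"), "g": ("C", "D"),
}

degree = Counter()
for u, v in bridges.values():
    degree[u] += 1   # each edge contributes one to the degree
    degree[v] += 1   # of each of its two endpoints

print(dict(degree))
print("sum of degrees:", sum(degree.values()),
      "= 2 x", len(bridges), "edges")
```

Because every edge adds exactly two to the running total, the sum of the degrees equals twice the number of edges, as Theorem 5.1 states.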

5.2 Eulerian Circuits

Definition 5.5. A trail is a walk with no repeated edges.

Definition 5.6. A minimal trail between vertices U and V is a trail with the least number of edges.

Definition 5.7. The distance between vertices U and V , denoted by d(U, V ), is the number of edges in a minimal trail between U and V .

Example 5.8. Consider the seven bridges problem.

1. Find a walk from B to D that is not a trail. 2. Find a minimal trail between B and D. 3. Find d(B, D). Possible Solutions: 1. {B, b, A, d, C, d, A, e, D} 2. {B, b, A, e, D} 3. d(B, D) = 2
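The distance in Example 5.8 can be found with a breadth-first search, since a walk with the fewest edges never repeats an edge and is therefore a minimal trail. This is a sketch only; the bridge endpoints are an assumption inferred from the walks listed in Example 5.3, and the function name `distance` is illustrative.

```python
from collections import deque

# Edge name -> endpoints, inferred from the walks in Example 5.3 (an assumption).
bridges = {
    "a": ("A", "B"), "b": ("A", "B"), "c": ("B", "C"), "d": ("A", "C"),
    "e": ("A", "D"), "f": ("A", "D"), "g": ("C", "D"),
}

def distance(edges, start, goal):
    """Number of edges in a minimal trail from start to goal (None if unreachable)."""
    neighbors = {}
    for u, v in edges.values():
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    steps = {start: 0}            # vertex -> edges used to reach it
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return steps[node]
        for nxt in neighbors.get(node, ()):
            if nxt not in steps:
                steps[nxt] = steps[node] + 1
                queue.append(nxt)
    return None

print("d(B, D) =", distance(bridges, "B", "D"))
```

Breadth-first search explores vertices in order of increasing edge count, so the first time it reaches the goal it has found the least number of edges.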


Definition 5.8. An Eulerian trail is a trail which traverses every edge exactly once.

Example 5.9. For the following graph, find an Eulerian trail.

Possible Solution: {V0 , e0 , V1 , e4 , V5 , e5 , V0 , e3 , V3 , e6 , V5 , e8 , V4 , e7 , V3 , e2 , V2 , e1 , V1 }.

Example 5.10. Find an Eulerian trail for the seven bridges problem. Solution: Not Possible!

Definition 5.9. An Eulerian circuit is an Eulerian trail which starts and ends on the same vertex.

Example 5.11. For the following graph, find an Eulerian circuit.


Possible Solution: {V0 , e0 , V1 , e1 , V2 , e2 , V3 , e3 , V1 , e4 , V5 , e6 , V3 , e7 , V4 , e8 , V5 , e5 , V0 }.

Example 5.12. Find an Eulerian circuit for the seven bridges problem. Solution: Not Possible! Why?

Theorem 5.2. A connected graph contains an Eulerian circuit if and only if every vertex has even degree. Proof (partial and informal): Each time we visit a vertex, V , we enter through one edge and leave through another. Thus, the degree of V must be twice the number of times we visit it. Hence, the degree is an even number.

Example 5.13. Solve the seven bridges problem: Can the seven bridges be traversed exactly once starting and ending at the same place? Solution: Since not all vertices have even degree (e.g., |D| = 3), there does not exist an Eulerian circuit. Thus, the seven bridges cannot be traversed exactly once starting and ending at the same place (this would amount to an Eulerian circuit).
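The test in Theorem 5.2 (connected, and every vertex of even degree) is easy to automate. The sketch below is one possible implementation; the seven bridges edge list is an assumption inferred from the walks in Example 5.3, and the square graph is a hypothetical example added for contrast.

```python
from collections import Counter

def has_eulerian_circuit(edges):
    """Apply Theorem 5.2 to a list of (u, v) edges; parallel edges are allowed."""
    if not edges:
        return False
    degree = Counter()
    neighbors = {}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    # Depth-first search to confirm that the graph is connected.
    start = edges[0][0]
    seen, stack = {start}, [start]
    while stack:
        for nxt in neighbors[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    connected = len(seen) == len(neighbors)

    return connected and all(d % 2 == 0 for d in degree.values())

# Seven bridges, with endpoints inferred from Example 5.3 (an assumption).
seven_bridges = [("A", "B"), ("A", "B"), ("B", "C"), ("A", "C"),
                 ("A", "D"), ("A", "D"), ("C", "D")]
# Hypothetical four-vertex cycle, where every vertex has degree 2.
square = [("P", "Q"), ("Q", "R"), ("R", "S"), ("S", "P")]

print("Seven bridges:", has_eulerian_circuit(seven_bridges))
print("Square:", has_eulerian_circuit(square))
```

The function only decides whether a circuit exists; it does not construct one.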

Example 5.14. A municipal recycling company would like to minimize its costs (truck mileage) in recycling collection (and subsequently reduce CO2 emissions). Consider the graph representation of neighborhoods in a city where neighborhoods are represented by vertices and major thoroughfares connecting the neighborhoods correspond to edges:


Can a truck service all neighborhoods without retracing any part of its route? If so, find such a route. Note: A truck must service residences along all thoroughfares and hence, must traverse all edges. Solution: Since the degree of every vertex is even, the graph contains an Eulerian circuit. Hence, a truck can traverse all edges exactly once (beginning and ending at the same place). Here is a possible route (omitting edge names) if the recycling center happens to be located in neighborhood A: {A, B, C, D, F, E, D, G, E, C, A}.
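A proposed route can be checked mechanically. In the sketch below the edge set is an assumption: since the figure is not reproduced here, the ten thoroughfares are taken to be exactly the consecutive pairs in the route given in the solution, with no parallel edges. The function name `is_eulerian_circuit` is illustrative.

```python
# Thoroughfares assumed from the route in Example 5.14 (the figure is not available).
edges = {frozenset(pair) for pair in [
    ("A", "B"), ("B", "C"), ("C", "D"), ("D", "F"), ("F", "E"),
    ("E", "D"), ("D", "G"), ("G", "E"), ("E", "C"), ("C", "A"),
]}

def is_eulerian_circuit(route, edges):
    """True if the route starts and ends at one vertex and uses every edge once."""
    steps = [frozenset(pair) for pair in zip(route, route[1:])]
    return (route[0] == route[-1]
            and len(steps) == len(edges)   # no edge skipped or repeated ...
            and set(steps) == edges)       # ... and every step is a real edge

route = ["A", "B", "C", "D", "F", "E", "D", "G", "E", "C", "A"]
print(is_eulerian_circuit(route, edges))
```

Comparing both the count and the set of steps rules out a route that repeats one edge while skipping another.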


Class Problem 5.1. The bike-friendly bridges of New York City problem

1. Use the NYC Bike Map at http://www.nycbikemaps.com/maps/nyc-bike-map/ in conjunction with the figure below to construct a graph model of the New York City bike-friendly bridges connecting the 5 boroughs and New Jersey. Use the Seven Bridges of Königsberg problem (Example 5.1) as a guide.

Notes:
• Since Brooklyn and Queens are on the same land mass (Long Island), merge them together as just a single location.
• Assume that the Verrazzano-Narrows Bridge and the Bayonne Bridge are now bike-friendly.
• Use the following abbreviations:
– MH - Manhattan
– BX - The Bronx
– BQ - Brooklyn/Queens
– SI - Staten Island
– NJ - New Jersey

2. Find a walk from SI to BX that is not a trail.
3. Find a trail from SI to BX that is not a minimal trail.
4. Find a minimal trail from SI to BX.
5. What is the distance between SI and BX?
6. Find the degree of each vertex.
7. On a bike tour of NYC starting (and ending) in Manhattan, is it possible to cross all bridges only once? Explain.
8. If possible, find an Eulerian circuit.
9. Find an Eulerian trail from SI to BX.
10. Develop a theorem for determining whether or not an Eulerian trail exists between two given vertices (for any graph).

5.3 Graph Connectivity

Some connected graphs are “more connected” than others. For instance, some connected graphs can be disconnected by the removal of a single edge, whereas others remain connected unless more edges are removed. Determining the number of edges that must be removed to disconnect a given connected graph applies directly to analyzing the connectivity of existing or proposed networks, such as the network of bicycle routes and lanes in a city. Another description of connectivity is the number of alternative paths between each pair of vertices.

Definition 5.10. The connectivity of a connected graph G, denoted by c(G), is the minimum number of edges whose removal can disconnect G.
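For small graphs, Definition 5.10 can be applied by brute force: try removing every set of 1 edge, then every set of 2 edges, and so on, until the graph falls apart. The sketch below is one such illustration; the function names and the three test graphs (a 4-cycle, a path, and the complete graph on 4 vertices) are hypothetical and are not taken from the text. The approach is only practical for graphs with few edges.

```python
from itertools import combinations

def is_connected(vertices, edges):
    """Depth-first search check that every vertex is reachable from the first one."""
    vertices = list(vertices)
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen = {vertices[0]}
    stack = [vertices[0]]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(vertices)

def connectivity(vertices, edges):
    """Smallest number of edges whose removal disconnects a connected graph
    (Definition 5.10), found by exhaustive search."""
    for k in range(1, len(edges) + 1):
        for removed in combinations(range(len(edges)), k):
            kept = [e for i, e in enumerate(edges) if i not in removed]
            if not is_connected(vertices, kept):
                return k
    return None  # a single-vertex graph cannot be disconnected

cycle = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
path = [("A", "B"), ("B", "C")]
complete = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D")]

print(connectivity("ABCD", cycle))     # 2
print(connectivity("ABC", path))       # 1
print(connectivity("ABCD", complete))  # 3
```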

Example 5.15. Determine the connectivity of the following graphs:

[Figures: graphs 1 through 4]

Solution:
1. c(G) = 2
2. c(G) = 3
3. c(G) = 3
4. c(G) = 1

Definition 5.11. Let U and V be two vertices in a graph G. A collection of U -V trails in G is said to be edge-disjoint if no two trails in the collection have an edge in common.

Definition 5.12. A collection of edge-disjoint U-V trails is said to be maximal if it includes the largest number of such trails.

Definition 5.13. The connectivity of two vertices U and V, denoted by c(U, V), is the number of trails in a maximal collection of edge-disjoint U-V trails.

Definition 5.14. The mean connectivity of a graph G, denoted by cavg (G), is the average connectivity between all vertex pairs.

Example 5.16.

1. Consider the graph below:
(a) Find a collection of edge-disjoint trails between vertices V1 and V5.
(b) Find a maximal collection of edge-disjoint V1-V5 trails.

2. Consider the graph below:
(a) Find a maximal collection of edge-disjoint E-F trails.
(b) Are there any edge-disjoint pairs of A-G trails?

3. Consider graph G below:
(a) Find the connectivity of graph G.
(b) Find the connectivity of each pair of vertices in G.
(c) Find the mean connectivity of G.

Solution:

1. (a) {{e4}, {e0, e5}}
   (b) {{e4}, {e0, e5}, {e1, e2, e6}}

2. (a) {{E, F}, {E, D, F}, {E, G, F}}
   (b) No

3. (a) c(G) = 2
   (b) i. c(X, Y) = 2
       ii. c(X, Z) = 2
       iii. c(X, W) = 3
       iv. c(Y, W) = 2
       v. c(Y, Z) = 2
       vi. c(W, Z) = 2
   (c) cavg = (2 + 2 + 3 + 2 + 2 + 2)/6 = 13/6 ≈ 2.17
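Pairwise connectivity and mean connectivity (Definitions 5.13 and 5.14) can also be computed programmatically. The sketch below counts edge-disjoint trails with a unit-capacity maximum-flow search (breadth-first augmenting paths), a standard technique that is not the method used in the text. Since the figure for graph G is not reproduced here, the edge list is a hypothetical graph on X, Y, Z, W chosen only because it reproduces the connectivities listed in part 3(b); it may differ from the graph in the figure.

```python
from collections import deque
from itertools import combinations

def pair_connectivity(edges, s, t):
    """Maximum number of edge-disjoint s-t trails, found by repeatedly
    searching for an augmenting path in a unit-capacity flow network."""
    cap, adj = {}, {}
    for u, v in edges:
        for a, b in ((u, v), (v, u)):  # each undirected edge can be used once, in either direction
            cap[(a, b)] = cap.get((a, b), 0) + 1
            adj.setdefault(a, set()).add(b)
    count = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return count  # no further trail can be added
        v = t
        while parent[v] is not None:  # push one unit of flow back along the path
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        count += 1

def mean_connectivity(vertices, edges):
    """Average of c(U, V) over all vertex pairs (Definition 5.14)."""
    pairs = list(combinations(vertices, 2))
    return sum(pair_connectivity(edges, u, v) for u, v in pairs) / len(pairs)

# Hypothetical graph consistent with the values in part 3(b).
vertices = ["X", "Y", "Z", "W"]
edges = [("X", "W"), ("X", "Y"), ("Y", "W"), ("X", "Z"), ("Z", "W")]

print(pair_connectivity(edges, "X", "W"))            # 3
print(round(mean_connectivity(vertices, edges), 2))  # 2.17
```

The smallest pairwise value (here 2) equals the connectivity c(G) of a connected graph, which matches part 3(a).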

Example 5.17. Find the mean connectivity of graph K given below

Solution: 1.5

Example 5.18. Network model of Bike Infrastructure 1. Construct a graph model of Winona’s bicycle infrastructure (use G to denote the graph):

[Figure: City of Winona Bikeways Map, July 2016. Legend: bike lane, wide shoulder, trail path, recommended mountain bike trails, trail access points.]

Use vertices to represent Latsch Island, Downtown (pink shading on map), WSU (between Huff & Main, and Wabasha & Sarnia), Lake Winona, and Kolter Bike Shop (Mankato Ave); use labels I, D, W, L, and K, respectively. Connect two vertices only if there is a bike lane (orange) or recommended route (red) between them.

Notes:
• If a vertex is within one city block of an edge, connect the edge to the vertex.
• There is more than one valid way to construct this graph.

2. Find c(G).
3. Find c(D, W).
4. Find cavg(G).

Solution:

1. [Figure: graph model G]
2. c(G) = 1
3. c(D, W) = 3
4. cavg(G) = 1.4

Class Problem 5.2. Network model of Winona Bike Infrastructure

1. Construct a graph model of Winona’s bicycle infrastructure (use G to denote the graph):

[Figure: City of Winona Bikeways Map, July 2016. Legend: bike lane, wide shoulder, trail path, recommended mountain bike trails, trail access points.]

Use vertices to represent Latsch Island, Downtown (pink shading on map), WSU (between Huff & Main, and Wabasha & Sarnia), Lake Winona, Kolter Bike Shop (Mankato Ave), Garvin Heights, St. Mary’s U., and Mugby Jct (on Frontenac). Use labels LI, DT, WS, LW, KB, GH, and SM, respectively. Connect two vertices only if there is a bike lane (orange) or recommended route (red) between them.

Notes:
• If a vertex is within one city block of an edge, connect the edge to the vertex.
• There is more than one valid way to construct this graph.

2. Find c(G).
3. Find c(DT, WS).
4. Find cavg(G).

5.4 Exercises

1. Consider the graph

(a) Find a walk from B to F.
(b) Find the degree of E.
(c) Find |F|.
(d) Is {C, D, E, F, D, E, G} a trail from C to G? Why or why not?
(e) Is {B, A, C, D, F} a minimal trail between B and F? If not, find one.
(f) Find d(B, F).

2. If a graph has 130 edges, what is the sum of the degrees of all vertices?

3. Does the following graph contain an Eulerian circuit? Why or why not? If it does, find one.

4. Does the following graph contain an Eulerian circuit? Why or why not? If it does, find one.

5. The band Cloud Caravan is going on a tour of capital cities in the southeastern U.S. (circled in the map below). Since Cloud Caravan is a very environmentally conscious band, they want to pick a route that visits all cities without retracing any part of the route. The tour kicks off in Nashville (hence, the route must begin and end in Nashville).

(a) Model the tour as a graph where cities are vertices and interstate highways are edges (there are several valid graph models).
(b) Can the band hit all capitals without retracing any part of its route (based on your graph from part (a))? Why or why not? If so, find a route.
(c) If the answer to part (b) was “No”, modify your graph so that the band can conduct their tour in an environmentally friendly way. Then find a route.

6. In the following graphs, find
(a) The connectivity of the graph.
(b) The mean connectivity of the graph.

[Figures: graphs i through ix]

6 Geometry, Trigonometry, and Natural Building

The Physiological and Psychological Benefits of Natural Environments

If you live in a big city, you may have noticed new buildings popping up: a high-rise here, a skyscraper there. The concrete jungles that we’ve built over the past century have allowed millions of us to live in close proximity, and modern economies to flourish. But in this shift to an increasingly urbanized society, are we missing something important?

For more than 30 years, psychologist Ming Kuo has studied the effects of nature on humans. She came to this field of research not from an interest in greenery, but from a fascination with crowding and noise, the negative impacts of urban environments. “I was interested in the dark side of the environment,” she says. “I was interested in how violent or dangerous or you know bad urban environments had detrimental effects on people.” Kuo says she eventually became intrigued by the positive effects of nature after she started to dig into the data. “It’s only when you look at the patterns of what people are like with more and less access to nature that you start to see this pattern,” she says.

More at Hidden Brain podcast: “Our Better Nature: How The Great Outdoors Can Improve Your Life” https://www.npr.org/2018/09/10/646413667/

Natural Building

Natural building involves a range of building systems and materials that place major emphasis on sustainability. Ways of achieving sustainability through natural building focus on durability and the use of minimally processed, plentiful or renewable resources, as well as those that, while recycled or salvaged, produce healthy living environments and maintain indoor air quality. Natural building tends to rely on human labor more than technology and depends on local ecology, geology and climate. The basis of natural building is the need to lessen the environmental impact of buildings and other supporting systems, without sacrificing comfort or health.
To be more sustainable, natural building uses primarily abundantly available, renewable, reused or recycled materials. The use of rapidly renewable materials is increasingly a focus. In addition to relying on natural building materials, the emphasis on the architectural design is heightened. The orientation of a building, the utilization of local climate and site conditions, the emphasis on natural ventilation through design, fundamentally lessen operational costs and positively impact the environment. Building compactly and minimizing the ecological footprint is common, as are on-site handling of energy acquisition, on-site water capture, alternate sewage treatment and water reuse. More info at https://strawbalestudio.org/


In passive solar building design, windows, walls, and floors are made to collect, store, and distribute solar energy in the form of heat in the winter and reject solar heat in the summer. This is called passive solar design because, unlike active solar heating systems, it does not involve the use of mechanical and electrical devices.

Passive Solar Building Design

Consider the problem of window placement for optimal utilization of solar energy seasonality. That is, given a particular roof overhang and window dimensions, where should the window be placed so that it is fully exposed to sunlight in the winter (particularly at the winter solstice) but fully shaded in the summer (at the summer solstice)? There are two key parameters in this problem: roof pitch and solar altitude.

Roof pitch, denoted by p, is a numerical measure of the steepness of a roof and is equivalent to the algebraic concept of slope. That is, the pitch of a roof is its vertical rise divided by its horizontal run:

$$p = \frac{\text{rise}}{\text{run}}$$

The angle of inclination of a roof is the angle made by the roof with the ground.

Solar altitude is the angle between the horizon and the center of the Sun’s disc (see Fig. 6.1). On any given day, the solar altitude is largest at solar noon; this is called the maximal solar altitude.

Figure 6.1: Solar Altitude

The day of the year with the largest maximal solar altitude is the summer solstice (June 21 or 22). The day of the year with the smallest maximal solar altitude is the winter solstice (December 21 or 22). See https://www.youtube.com/watch?v=WgHmqv_-UbQ

Figure 6.2: The Earth’s revolution about the Sun, in conjunction with the Earth’s tilt, gives rise to the four seasons.

Because of the Earth’s 23.44° tilt (relative to the Earth-Sun plane), the Sun’s radiant energy waves (e.g., light and heat) are most direct on the Tropic of Capricorn (23.44° south latitude) at the winter solstice (see Fig. 6.2). At the summer solstice, they are most direct on the Tropic of Cancer (23.44° north latitude). At the spring and fall equinoxes, the sun shines most directly on the equator (0° latitude).

6.1 Geometry

Definition 6.1. If the measures of all the angles of two triangles are the same, the triangles are called similar.

Theorem 6.1. Similar Triangles Theorem Ratios of corresponding sides of similar triangles are equal.

Example 6.1. Using the figure below, find the values of x and c.

Solution: Note that the triangles are similar since all angles are the same. To find c, we use Theorem 6.1 to form the proportion

$$\frac{c}{5} = \frac{10}{8}.$$

Then solve for c:

$$c = 5 \cdot \frac{10}{8} = 6.25.$$

To find x, we form the proportion

$$\frac{x}{8} = \frac{3}{5}.$$

Then solve for x:

$$x = 8 \cdot \frac{3}{5} = 4.8.$$
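The two proportions in this example can be checked with exact rational arithmetic. This is a minimal sketch; the helper name `solve_proportion` is hypothetical and simply encodes the step of multiplying both sides by the known denominator.

```python
from fractions import Fraction

def solve_proportion(a, b, d):
    """Solve x / a = b / d for x using exact rational arithmetic."""
    return Fraction(a) * Fraction(b) / Fraction(d)

c = solve_proportion(5, 10, 8)  # c / 5 = 10 / 8
x = solve_proportion(8, 3, 5)   # x / 8 = 3 / 5

print(float(c), float(x))  # 6.25 4.8
```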

Definition 6.2. A right triangle is a triangle with a right angle (90◦ angle). The hypotenuse is the side across from the right angle. The other two sides are called legs.

Theorem 6.2. Pythagorean Theorem Consider a right triangle with legs labeled a and b, and hypotenuse labeled c.

The sum of the squares of the legs equals the square of the hypotenuse. I.e.,

$$a^2 + b^2 = c^2$$

Proof: Consider the two squares below:

Note that they have the same area.

The area of the square on the left is the sum of the areas of the 4 triangles and the 2 squares:

$$\text{Area} = 4\left(\tfrac{1}{2}ab\right) + a^2 + b^2 = 2ab + a^2 + b^2.$$

The area of the square on the right is the sum of the areas of the 4 triangles and the square in the center:

$$\text{Area} = 4\left(\tfrac{1}{2}ab\right) + c^2 = 2ab + c^2.$$

Since the areas are equal, we have

$$2ab + a^2 + b^2 = 2ab + c^2.$$

Subtracting 2ab from both sides yields

$$a^2 + b^2 = c^2.$$
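As a numerical illustration of the theorem, the sketch below computes a hypotenuse from two legs and a leg from the hypotenuse and the other leg. The function names are hypothetical; `math.hypot` and `math.sqrt` are from the Python standard library.

```python
import math

def hypotenuse(a, b):
    """Length c of the hypotenuse, from a^2 + b^2 = c^2."""
    return math.hypot(a, b)

def other_leg(a, c):
    """Length b of the remaining leg, from b^2 = c^2 - a^2."""
    if c <= a:
        raise ValueError("the hypotenuse must be longer than either leg")
    return math.sqrt(c**2 - a**2)

print(hypotenuse(3, 4))  # 5.0
print(other_leg(5, 13))  # 12.0
```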

(The scarecrow didn’t quite get it right: https://www.youtube.com/watch?v=jbvip1Ot6jQ)

Example 6.2. Suppose we have a right triangle with leg lengths 3 and 4. Find the length of the hypotenuse.

Solution: Using the Pythagorean theorem, we have

$$
\begin{aligned}
a^2 + b^2 &= c^2 \\
3^2 + 4^2 &= c^2 \\
9 + 16 &= c^2 \\
25 &= c^2 \\
c &= \sqrt{25} = 5.
\end{aligned}
$$

Example 6.3. Suppose we have a right triangle with leg length 5 and hypotenuse length 13. Find the length of the other leg.

Solution: Using the Pythagorean theorem, we have

$$
\begin{aligned}
a^2 + b^2 &= c^2 \\
5^2 + b^2 &= 13^2 \\
25 + b^2 &= 169 \\
b^2 &= 169 - 25 = 144 \\
b &= \sqrt{144} = 12.
\end{aligned}
$$

Definition 6.3. Two angles are complementary if their sum is 90°. Two angles are supplementary if their sum is 180°.

Definition 6.4. Suppose a transversal intersects two parallel lines:

1. Vertical angles are each of the pairs of opposite angles made by two intersecting lines. E.g., angles 2 and 3 are vertical angles.
2. Corresponding angles are the angles that occupy the same relative position at each intersection where a straight line crosses two parallel lines. E.g., angles 1 and 5.
3. Alternate interior angles are pairs of angles on the inner side of each of the parallel lines but on opposite sides of the transversal. E.g., angles 3 and 6.
4. Alternate exterior angles are pairs of angles on the outer side of each of the parallel lines but on opposite sides of the transversal. E.g., angles 1 and 8.

Postulate 6.1. Corresponding angles are equal.

Theorem 6.3.
1. Vertical angles are equal.
2. Alternate interior angles are equal.
3. Alternate exterior angles are equal.

Proof:
1. Using the figure below, we will show that angle α equals angle β.

Since α and γ are supplementary, α + γ = 180 and thus, γ = 180 − α. Now, since γ and β are also supplementary, γ + β = 180. Substituting γ = 180 − α, we have (180 − α) + β = 180, which yields α = β.

2. Using the figure below, we will show that angle α equals angle β.

Since α and γ are corresponding angles, α = γ. Since γ and β are vertical angles, γ = β. Hence, α = β.

3. (see Class Problem 6.1)

Example 6.4. In the first two figures, find the value of the angle α. In the last figure, find an expression involving x for α, then suppose x = 38 and find the value of α.

Solution:
1. By the corresponding angles postulate, β = 60. Also, since the angle (α + β) and the right angle are supplementary, (α + β) + 90 = 180, which implies that α = 90 − β. Hence, α = 90 − 60 = 30.
2. By the corresponding angles postulate, β = 40 + 15 = 55. Now, α = 90 − β = 90 − 55 = 35.
3. By the corresponding angles postulate, β = x + 20. Now, α = 90 − β = 90 − (x + 20) = 70 − x. If x = 38, then α = 70 − 38 = 32.

Example 6.5. Winter Solstice Solar Altitude

Figure 6.3: Solar altitude, denoted by α, is the angle between the horizon and the sun and measures how high in the sky the sun is at a given time of day and year.

Figure 6.4: If we rotate our perspective, we see the solar altitude as the angle between a line parallel to the earth’s surface (at your location) and a horizontal line to the sun. Also note that the line tangent to the surface is perpendicular to the radius of the Earth at that point.

Calculate the maximal solar altitude on the winter solstice. Denote the angle by $\alpha_w$.

Figure 6.5: The latitude (lat) of a point on Earth is the angle made between the equator and a radius to that point. The Sun’s radiant energy waves (e.g., light and heat) are most direct on the Tropic of Capricorn (23.44° south latitude) at the winter solstice.

Solution: By the corresponding angles postulate, the angle above the top sunray made with the extended radius is (lat + 23.44) (see Fig. 6.6). Since the extended radius is perpendicular to the surface of the Earth (because Earth is a sphere), the angle (lat + 23.44) and the solar altitude, $\alpha_w$, are complementary. Hence,

$$
\begin{aligned}
(\text{lat} + 23.44) + \alpha_w &= 90 \\
\alpha_w &= 90 - (\text{lat} + 23.44) \\
\alpha_w &= 66.56 - \text{lat}
\end{aligned}
$$

Figure 6.6: The extended radius is a transversal through the parallel sunrays.

South Bend, IN is at latitude 41.68° N. What is $\alpha_w$ for South Bend?

Solution: $\alpha_w = 66.56^\circ - 41.68^\circ = 24.88^\circ$
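The formula derived above is simple enough to wrap in a small function. This is a minimal sketch for a northern-hemisphere latitude given in degrees; the names `TILT_DEG` and `winter_solstice_altitude` are illustrative.

```python
TILT_DEG = 23.44  # Earth's axial tilt in degrees, as used in the text

def winter_solstice_altitude(lat_deg):
    """Maximal solar altitude (degrees) on the winter solstice for a
    northern-hemisphere latitude, using alpha_w = 90 - (lat + 23.44)."""
    return 90.0 - (lat_deg + TILT_DEG)

# South Bend, IN (latitude 41.68 degrees N)
print(round(winter_solstice_altitude(41.68), 2))  # 24.88
```

A negative result (latitudes above 66.56° N) means the Sun stays below the horizon at noon on that day.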


Example 6.6. Solar Panels Suppose a natural builder wishes to determine the pitch of a roof with solar panels affixed to it. Since demand for electricity is greatest in the winter, the panels should be perpendicular to sunrays at the winter solstice (see Fig. 6.7). Find the appropriate roof angle, θ (see Fig. 6.8).

Figure 6.7: Solar panels should be perpendicular to sunrays for optimal energy capture.

Figure 6.8: A radius from the center of Earth to your location is a transversal through the parallel sunrays.

Solution: By the corresponding angles postulate, the angle between the radius and sunray is (lat + 23.44). Since we require the sunlight hit the solar panel at a right angle, the angle made by the panel and radius is 90 − (lat + 23.44) = 66.56 − lat. Now, since the radius is perpendicular to the ground, we have θ = 90 − (66.56 − lat). Thus, θ = 23.44° + lat.

Thus, in South Bend, we would need a roof angle of θ = 23.44° + 41.68° = 65.12°. As we will see later, this corresponds to a roof pitch of approximately 2.16, an Elizabethan roof!
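The roof-angle formula and the corresponding pitch can be evaluated numerically. This is a minimal sketch assuming a northern-hemisphere latitude in degrees; the function names are illustrative, and the pitch is computed as the tangent of the roof angle (rise over run).

```python
import math

TILT_DEG = 23.44  # Earth's axial tilt in degrees, as used in the text

def winter_panel_roof_angle(lat_deg):
    """Roof angle (degrees) that makes the panel perpendicular to the
    sunrays at the winter solstice: theta = 23.44 + lat."""
    return TILT_DEG + lat_deg

def pitch_from_angle(angle_deg):
    """Roof pitch (rise / run) for a roof angle given in degrees."""
    return math.tan(math.radians(angle_deg))

theta = winter_panel_roof_angle(41.68)    # South Bend, IN
print(round(theta, 2))                    # 65.12
print(round(pitch_from_angle(theta), 2))  # 2.16
```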

Class Problem 6.1. Geometry and Passive Solar Design

1. Prove that alternate exterior angles are equal.
2. Use the figure below and the formula for the area of a trapezoid

$$A = \frac{1}{2}(\text{base}_1 + \text{base}_2) \cdot \text{height}$$

to prove the Pythagorean Theorem.
3. Calculate the maximal solar altitude on the winter solstice for Winona, MN.
4. Derive a formula for the solar altitude on the summer solstice for a location in the northern hemisphere in terms of its latitude.
5. Calculate the maximal solar altitude on the summer solstice for Winona, MN.
6. Derive a formula for the optimal angle of a solar paneled roof on the summer solstice (for a location in the northern hemisphere), where there is more of a power draw for air conditioning in the summer than for heating in the winter (e.g., Arizona).
7. Calculate the optimal angle of a solar paneled roof on the summer solstice for Tucson, AZ.
8. Derive a formula for the optimal angle of a solar paneled roof on the winter solstice (for a location in the northern hemisphere) in terms of the solar altitude. Hint: Use Fig. 6.8 and extend the top sunray into the Earth; identify the solar altitude angle and use the vertical angles theorem in conjunction with the definition of complementary angles.

6.2 Trigonometry

Definition 6.5. A circle is the set of all points (in a plane) at a fixed distance from a single point called the center. The fixed distance is called the radius. The diameter d of a circle is twice its radius (the distance across the circle through its center). The circumference C of a circle is the distance around (perimeter) the circle.

Definition 6.6. The real number π is defined as the ratio of the circumference of a (any) circle to the diameter:

$$\pi = \frac{C}{d}$$

Definition 6.7. The unit circle is the circle centered at the origin (in the x-y-plane) with radius equal to 1.

Definition 6.8. Consider a point on the unit circle with coordinates (x, y) (see Fig. 6.9). Suppose the angle made by the positive x-axis and a line connecting the center of the circle to the point is θ. Then the trigonometric functions are defined as follows:
• The cosine of θ is the x-coordinate of the point, i.e., cos θ = x.
• The sine of θ is the y-coordinate of the point, i.e., sin θ = y.
• The tangent of θ is the ratio of the y-coordinate to the x-coordinate, i.e., $\tan\theta = \frac{y}{x} = \frac{\sin\theta}{\cos\theta}$.

Figure 6.9: The unit circle (radius of 1 and center at the origin).

Example 6.7. Find cos(90°), sin(90°), and tan(90°).

Solution: From the figure above, we see that the x-coordinate of the point is x = 0. Thus, since the cosine of an angle is defined as the x-coordinate, cos(90°) = 0. Similarly, since the sine of an angle is defined as the y-coordinate, sin(90°) = 1. Now, the tangent of an angle is defined as the ratio of y to x, i.e., $\frac{y}{x}$. However, since x = 0, tan(90°) is undefined.

Example 6.8. Find cos(0◦ ), sin(0◦ ), and tan(0◦ ).

Solution:

$$\cos(0^\circ) = x = 1, \qquad \sin(0^\circ) = y = 0, \qquad \tan(0^\circ) = \frac{\sin(0^\circ)}{\cos(0^\circ)} = \frac{0}{1} = 0$$

Example 6.9. Find cos(45◦ ), sin(45◦ ), and tan(45◦ ).

Solution: To find cos(45°), we need to find the x-coordinate of the point in the above figure. We form a right triangle whose side lengths are x and y, and whose hypotenuse is 1 (since this is the unit circle). By the Pythagorean theorem, we have

$$x^2 + y^2 = 1^2.$$

Since the angle happens to be 45°, it follows that x = y so we can substitute x for y in the equation,

$$x^2 + x^2 = 1$$

which simplifies to

$$x^2 = \frac{1}{2}.$$

Hence,

$$x = \sqrt{\frac{1}{2}} = \frac{1}{\sqrt{2}} = \frac{\sqrt{2}}{2}.$$

Since y = x,

$$y = \frac{\sqrt{2}}{2}.$$

Thus,

$$\cos(45^\circ) = \frac{\sqrt{2}}{2} \quad\text{and}\quad \sin(45^\circ) = \frac{\sqrt{2}}{2}.$$

Finally,

$$\tan(45^\circ) = \frac{y}{x} = 1$$

since x and y are the same.

The trigonometric functions cosine, sine, and tangent can be defined using a right triangle (for θ less than 90°) (see Fig. 6.10). With the leg across from θ labeled opposite, the leg next to θ labeled adjacent, and the hypotenuse labeled hypotenuse, we have

• $\sin\theta = \frac{\text{opposite}}{\text{hypotenuse}}$
• $\cos\theta = \frac{\text{adjacent}}{\text{hypotenuse}}$
• $\tan\theta = \frac{\text{opposite}}{\text{adjacent}}$

Figure 6.10: SOHCAHTOA: A mnemonic device for remembering the three primary trigonometric functions.

To see why the first relationship is true, we use similar triangles in the unit circle (Fig. 6.11) (ratios of corresponding sides of similar triangles are equal), to obtain

$$\frac{y}{1} = \frac{\text{opp}}{\text{hyp}}.$$

But since y = sin θ by definition, we have

$$\sin\theta = \frac{\text{opp}}{\text{hyp}}.$$

The other two relationships are derived in similar fashion.

Figure 6.11: The unit circle, similar triangles, and SOHCAHTOA.

Example 6.10. Using the triangle below, find the cosine, sine, and tangent of the (non-right) angles.

Figure 6.12: 30-60-90 Triangle

Solution:

1. • $\cos(30^\circ) = \frac{\text{adjacent}}{\text{hypotenuse}} = \frac{\sqrt{3}}{2}$
   • $\sin(30^\circ) = \frac{\text{opposite}}{\text{hypotenuse}} = \frac{1}{2}$
   • $\tan(30^\circ) = \frac{\text{opposite}}{\text{adjacent}} = \frac{1}{\sqrt{3}}$

2. • $\cos(60^\circ) = \frac{\text{adjacent}}{\text{hypotenuse}} = \frac{1}{2}$
   • $\sin(60^\circ) = \frac{\text{opposite}}{\text{hypotenuse}} = \frac{\sqrt{3}}{2}$
   • $\tan(60^\circ) = \frac{\text{opposite}}{\text{adjacent}} = \frac{\sqrt{3}}{1} = \sqrt{3}$
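The right-triangle ratios can be compared against the built-in trigonometric functions. The sketch below assumes the 30-60-90 triangle has side lengths 1, √3, and 2 (the figure is not reproduced here, so these lengths are an assumption consistent with the ratios in the solution); the helper `trig_ratios` is hypothetical.

```python
import math

def trig_ratios(opposite, adjacent, hypotenuse):
    """Return (sin, cos, tan) of an acute angle from the side lengths (SOHCAHTOA)."""
    return (opposite / hypotenuse, adjacent / hypotenuse, opposite / adjacent)

# Assumed side lengths of the 30-60-90 triangle.
short_leg, long_leg, hyp = 1.0, math.sqrt(3), 2.0

# For the 30 degree angle the short leg is opposite; for 60 degrees the roles swap.
sin30, cos30, tan30 = trig_ratios(short_leg, long_leg, hyp)
sin60, cos60, tan60 = trig_ratios(long_leg, short_leg, hyp)

# Compare with the library values (angles must be converted to radians).
print(math.isclose(tan30, math.tan(math.radians(30))))  # True
print(math.isclose(tan60, math.tan(math.radians(60))))  # True
print(math.isclose(tan60, math.sqrt(3)))                # True
```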

Definition 6.9. A radian is equal to an angle at the center of a circle whose arc is equal in length to the radius.

For an illustration of the definition of a radian, see https://www.dropbox.com/s/mcdawncia21lg97/Circle_radians.gif?dl=0

Since the circumference of the unit circle is 2π, and we have 360° in one full circle, we have the following relationship between degrees and radians:

$$2\pi \text{ rad} = 360^\circ.$$

Hence,

$$\frac{2\pi \text{ rad}}{360^\circ} = \frac{\pi \text{ rad}}{180^\circ} = 1.$$

Thus, if we need to convert from degrees to radians, we must multiply by $\frac{\pi}{180}$. On the other hand, to convert from radians to degrees, we multiply by $\frac{180}{\pi}$.

The figure below gives cosine and sine values for some typical angles measured in degrees and radians.

Figure 6.13: The Unit Circle

Example 6.11.

1. (a) Convert 90° to radians.
   (b) Convert 60° to radians.
   (c) Convert 45° to radians.
   (d) Convert π rad to degrees.
   (e) Convert $\frac{\pi}{6}$ rad to degrees.

2. Find sin(π), cos(π), tan(π), $\cos(\frac{\pi}{4})$, $\sin(\frac{\pi}{3})$, and $\tan(\frac{\pi}{6})$.

Solution:

1. (a) Convert 90° to radians: $90^\circ \cdot \frac{\pi \text{ rad}}{180^\circ} = \frac{\pi}{2}$ rad
   (b) Convert 60° to radians: $60^\circ \cdot \frac{\pi \text{ rad}}{180^\circ} = \frac{\pi}{3}$ rad
   (c) Convert 45° to radians: $45^\circ \cdot \frac{\pi \text{ rad}}{180^\circ} = \frac{\pi}{4}$ rad
   (d) Convert π rad to degrees: $\pi \text{ rad} \cdot \frac{180^\circ}{\pi \text{ rad}} = 180^\circ$
   (e) Convert $\frac{\pi}{6}$ rad to degrees: $\frac{\pi}{6} \text{ rad} \cdot \frac{180^\circ}{\pi \text{ rad}} = 30^\circ$

2. Find sin(π), cos(π), tan(π), $\cos(\frac{\pi}{4})$, $\sin(\frac{\pi}{3})$, and $\tan(\frac{\pi}{6})$. Using the unit circle (Fig. 6.13), we have
   • sin(π) = 0, cos(π) = −1, and $\tan(\pi) = \frac{0}{-1} = 0$.
   • $\cos(\frac{\pi}{4}) = \frac{\sqrt{2}}{2}$
   • $\sin(\frac{\pi}{3}) = \frac{\sqrt{3}}{2}$
   • $\tan(\frac{\pi}{6}) = \frac{\sin(\pi/6)}{\cos(\pi/6)} = \frac{1/2}{\sqrt{3}/2} = \frac{1}{\sqrt{3}} = \frac{\sqrt{3}}{3}$
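The degree and radian conversions in this example can be reproduced in a few lines. This is a minimal sketch with hypothetical helper names; Python's standard library also provides `math.radians` and `math.degrees` for the same purpose.

```python
import math

def deg_to_rad(deg):
    """Multiply by pi/180 to convert degrees to radians."""
    return deg * math.pi / 180

def rad_to_deg(rad):
    """Multiply by 180/pi to convert radians to degrees."""
    return rad * 180 / math.pi

# Part 1: express each angle as a multiple of pi.
for deg in (90, 60, 45):
    print(deg, "degrees =", deg_to_rad(deg) / math.pi, "* pi rad")

print(math.isclose(rad_to_deg(math.pi), 180))     # True
print(math.isclose(rad_to_deg(math.pi / 6), 30))  # True

# Part 2: floating-point values agree with the exact answers up to rounding.
print(math.isclose(math.tan(math.pi / 6), math.sqrt(3) / 3))  # True
```

Because π cannot be represented exactly in floating point, values such as `math.sin(math.pi)` come out as a tiny nonzero number rather than exactly 0, so comparisons should use a tolerance.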

Note: If there is no degree symbol, °, after the angle measure, we assume the angle is in radians.

Example 6.12. Converting roof angle to pitch. Convert the roof angle obtained in Example 6.6 (Elizabethan roof in South Bend, IN) to roof pitch.

Solution: Using Fig. 6.14, we have

$$\text{pitch} = \text{slope} = \frac{\text{rise}}{\text{run}} = \frac{\text{opposite}}{\text{adjacent}} = \tan\theta = \tan(65.12^\circ) \approx 2.16$$

(using a calculator for the tangent function).

Figure 6.14: Roof pitch (slope) is equal to the tangent of the roof angle, tan θ.

Example 6.13. Thatch roof pitch: https://strawbalestudio.org/ The higher the roof pitch, the less rainwater can penetrate the roofing and the less time is needed for the roof to dry after a rain:
• 25° roof → up to 15 years roof life
• 30° roof → 10-20 years roof life
• 45° roof → 25-45 years roof life
• 50° roof → 45 years and longer roof life

Calculate the roof pitch if you want your thatch roof to last 10-20 years.

Solution: For a 10-20 year roof life, we need a roof angle of 30°. Thus, the correct pitch is

$$\text{pitch} = \tan(30^\circ) = \frac{\sqrt{3}}{3} \approx 0.58.$$

Definition 6.10. Inverse trigonometric functions
• If x = cos(θ), then θ = arccos(x).
• If y = sin(θ), then θ = arcsin(y).
• If z = tan(θ), then θ = arctan(z).

Notational Note: The notation $\cos^{-1}(x)$ is also used to denote arccos x (similarly for arcsin x and arctan x). I.e.,
• $\cos^{-1}(x) = \arccos(x)$
• $\sin^{-1}(x) = \arcsin(x)$
• $\tan^{-1}(x) = \arctan(x)$

Example 6.14. Converting roof pitch to angle. Suppose a roof has pitch $\frac{1}{4}$. Find the corresponding roof angle in radians and degrees.

Solution: Since pitch = tan(θ),

$$\theta = \arctan(\text{pitch}) = \arctan\left(\frac{1}{4}\right) \approx 0.245 \text{ rad}.$$

Note: A calculator will typically return angles in radians. Thus we must convert to degrees using the formula degrees = $\frac{180}{\pi}$ · radians:

$$\frac{180}{\pi} \cdot 0.245 \approx 14^\circ.$$
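Converting between roof pitch and roof angle in either direction takes one call each to the tangent and arctangent functions. This is a minimal sketch; the function names are illustrative, and `math.atan` returns its result in radians.

```python
import math

def angle_from_pitch(pitch):
    """Return the roof angle as (radians, degrees) for a given pitch (rise / run)."""
    theta_rad = math.atan(pitch)
    return theta_rad, math.degrees(theta_rad)

def pitch_from_angle(angle_deg):
    """Return the pitch (rise / run) for a roof angle given in degrees."""
    return math.tan(math.radians(angle_deg))

rad, deg = angle_from_pitch(1 / 4)
print(round(rad, 3), round(deg, 2))    # 0.245 14.04

# Thatch roof with a 30 degree angle.
print(round(pitch_from_angle(30), 2))  # 0.58
```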

* Measure Current Solar Altitude *


Class Problem 6.2. Passive Solar Window Design
Given the dimensions of a window (in particular, its height y), the distance from the ground to the roof H, and the solar altitudes at the summer and winter solstices, α_s and α_w respectively, we wish to find the length of the overhang L and the height of the window above ground h such that no sunlight enters the window on the summer solstice but none of the window is shaded on the winter solstice.

[Figure: side views of the window on the summer solstice and on the winter solstice, labeled with the overhang length L, the lengths x, y, H, and h, and the solar altitudes α_s and α_w.]

1. Following the steps below, determine the length of the overhang and the height above ground that the window should be in order to optimally utilize solar energy.
(a) Find a relationship between α_w, x, and L. Then solve for x.
(b) Find a relationship between α_s, x, y, and L. Then solve for x.
(c) Equate the expressions for x (found in parts (a) and (b)) and solve for L.
(d) Substitute the expression for L (from part (c)) into your result from part (a).
(e) Find a relationship between H, h, x, and y. Solve for h.
(f) Substitute the expression for x (from part (d)) into the result of part (e).
(g) Summarize: What are the optimal overhang length and window placement formulas?
2. Suppose you are building a 15 ft tall house (ground to roof overhang) in the Winona area with a 4 ft tall window. Find the optimal overhang length and window placement.

6.3 Exercises

1. Listen to the Hidden Brain podcast episode "Our Better Nature: How The Great Outdoors Can Improve Your Life" (https://www.npr.org/2018/09/10/646413667/). Write a 1 page summary of the podcast.
2. Suppose we have a right triangle with leg lengths 1 and 6. Find the length of the hypotenuse.
3. Suppose we have a right triangle with leg length 12.5 and hypotenuse length 20. Find the length of the other leg.
4. Angles β and γ are supplementary with γ = π/3 rad. Find the measure of β in radians and in degrees.
5. Prove #4 in Theorem 5.2 (you may use #1, 2, and 3 of the theorem if you need).
6. Calculate the maximal solar altitude on the equinoxes for a location with latitude lat.
7. Calculate the maximal solar altitude on the equinoxes for your favorite location on Earth.
8. Suppose a builder in the southern hemisphere wishes to determine the pitch of a roof with solar panels affixed to it. Since demand for electricity is greatest in the winter (southern hemisphere winter), the panels should be perpendicular to the sun's rays at the winter solstice. Find the appropriate roof angle, θ.
9. Apply the Pythagorean Theorem to the right triangle below to obtain the Pythagorean Identity.
10. Consider the right triangle from Exercise 2. Find the measures of the other two angles.
11. Consider the right triangle from Exercise 3. Find the measures of the other two angles.
12. What is π? (Give the definition, not the approximate value.)
13. Find the sine, cosine, and tangent of π using the definitions (not a calculator). Draw a graph to illustrate how you got the answers.
14. Convert 10° to radians.
15. Convert 4 rad to degrees.
16. Suppose a 20 foot tall tree casts a 40 foot long shadow on the ground. Find the solar altitude.

7 Calculus and Social Justice

* Mathematics of Social Justice Activity * Connections between Social Bias, Segregation, and Gerrymandering

In this chapter we focus on measuring the distribution of resources (e.g., money, land, energy, food, etc.) using techniques of differential and integral calculus. In particular, we will examine a method for deciding whether or not a resource (such as income) is divided equally among the individuals in a country. For more background on this problem read about the Lorenz Curve (UMAP Module 60, sections 1.1-1.3): https://www.dropbox.com/s/dqu92004evrst04/UMAP60.PDF?dl=0 “We are the 99%” is a political slogan widely used and coined by the Occupy movement. A related statistic, the 1%, refers to the top 1% wealthiest people in society that have a disproportionate share of capital, political influence, and the means of production.

Before discussing the application of calculus to the problem of equal distribution of resources, we must first define a few of the essential mathematical tools from calculus.

7.1 Differentiation

Definition 7.1. The Derivative
The derivative of a function f at a point x is the instantaneous rate of change of f, and is denoted by $f'(x)$. That is, the derivative measures how fast the y-coordinate increases or decreases as we move along the graph of f. Thus, the derivative gives the slope of the graph of f at the particular point x. The process of finding a derivative is called differentiation.

A derivative is essentially a speed (velocity): the rate at which a quantity (such as distance traveled) changes for unit increases in some other quantity (such as time). For instance, if you are driving 60 mph, then your distance is changing by 60 miles for every 1 hour (unit increase in time). Hence, your derivative at that moment is 60.

Suppose you begin to accelerate from rest (at a stop light) in your car. Your initial instantaneous velocity is 0 mi/hr. That is, at time t = 0, your derivative (instantaneous rate of change of position with respect to time) is equal to 0. As you accelerate, you speed up. The derivative of your position at any time is equivalent to your speed at that time.

Example 7.1. Suppose your position, y = f(t), (distance from your home) with respect to time t is given by the following graph:

Assume you left work by bike, 1 mile from home, at time t = 0 and headed toward a store 2 miles away (in the direction opposite to home).


1. How fast were you traveling in the first four minutes?
Solution: Since you traveled 2 miles in 4 minutes (starting 1 mile from home and arriving 3 miles from home at minute 4), your speed was 0.5 mile per minute (30 mph). Thus, $f'(t) = 0.5$ for all t between 0 min and 4 min. For example, $f'(1.25) = 0.5$, $f'(3.14159) = 0.5$, etc. Note that this is the slope of the graph of f for 0 < t < 4.
2. What direction were you traveling in the first four minutes?
Solution: Since your distance from home was increasing, your direction was away from home. This corresponds to the positive slope of f over this time interval.
3. How fast were you traveling between minutes 4 and 6?
Solution: Since your position did not change in this time interval (at the store, y = 3 for those two minutes), your speed must have been zero. Thus, $f'(t) = 0$ for all t between 4 min and 6 min. Note that this is the slope of the graph of f for 4 < t < 6.
4. How fast were you traveling between minutes 6 and 9?
Solution: Since you traveled 3 miles in 3 minutes, your speed was 1 mi/min (60 mph). Furthermore, since your distance from home was decreasing, $f'(t) = -1$ for all t between 6 min and 9 min. Note that this is the (negative) slope of the graph of f for 6 < t < 9.
5. Where were you at minute 9?
Solution: Since your position at t = 9 was 0, i.e., f(9) = 0, you were home!
6. Graph the derivative function.
Solution:
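The slopes found in Example 7.1 can be double-checked numerically. The sketch below is an illustration only: it assumes the position graph consists of the three straight segments described in the solutions, and the names position and slope are hypothetical helpers, not notation from the text.

```python
def position(t):
    """Distance from home (miles) at time t (minutes), as described in Example 7.1."""
    if t <= 4:
        return 1 + 0.5 * t    # riding away from home at 0.5 mi/min
    if t <= 6:
        return 3.0            # stopped at the store
    return 3 - (t - 6)        # riding home at 1 mi/min

def slope(f, t, h=1e-6):
    """Central-difference estimate of the derivative f'(t)."""
    return (f(t + h) - f(t - h)) / (2 * h)

# One sample time inside each of the three segments.
for t in (1.25, 5, 7.5):
    print(t, round(slope(position, t), 3))

print(position(9))  # distance from home at minute 9
```

The estimate is only meaningful away from the corner points t = 4 and t = 6, where the graph has no single slope.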


Other notations for the derivative:
$$f'(x) = y' = \frac{dy}{dx} = \frac{d}{dx}(\text{Function})$$

*Note: The notation $\frac{dy}{dx}$ indicates that the derivative gives the slope of a function at a particular point, as in $\frac{\Delta y}{\Delta x} = \frac{\text{change in } y}{\text{change in } x}$. For a curve, the derivative at a point gives the slope of the tangent line at that point on the curve.


Example 7.2. Suppose a biodiesel rocket launched upward accelerates such that its position (miles above the ground) is given by $f(t) = t^3$, where t is time in minutes (see the figure below).
1. Using the graph of f, find the rocket's velocity (instantaneous speed and direction)
(a) 0.5 min after launch,
(b) 1 min after launch,
(c) 1.5 min after launch, and
(d) 2 min after launch.
2. Graph the derivative of f, i.e., plot the velocities from part 1.


Solution:
1. (a) The velocity of the rocket at 0.5 min is equal to the slope of the tangent line (purple) to the position curve (red) at t = 0.5. To find the slope of the tangent line, we need two points on the line. Using the graph, we see that the tangent line contains the point of tangency (as all tangent lines do), which is (0.5, 0.125). The line also contains the point (1, 0.5). Thus, the slope (i.e., velocity) is
$$\frac{\Delta y}{\Delta x} = \frac{y_2 - y_1}{x_2 - x_1} = \frac{0.5 \text{ mi} - 0.125 \text{ mi}}{1 \text{ min} - 0.5 \text{ min}} = \frac{0.375 \text{ mi}}{0.5 \text{ min}} = 0.75 \ \frac{\text{mi}}{\text{min}} = 45 \text{ mph}.$$
(b) The rocket's velocity after 1 min is equal to the slope of the tangent line (green) to the position curve (red) at t = 1. Using the points (1, 1) and (1.5, 2.5) on the line, we find the velocity as
$$\frac{\Delta y}{\Delta x} = \frac{y_2 - y_1}{x_2 - x_1} = \frac{2.5 \text{ mi} - 1 \text{ mi}}{1.5 \text{ min} - 1 \text{ min}} = \frac{1.5 \text{ mi}}{0.5 \text{ min}} = 3 \ \frac{\text{mi}}{\text{min}} = 180 \text{ mph}.$$
(c) The rocket's velocity after 1.5 min is equal to the slope of the tangent line (orange) to the position curve (red) at t = 1.5. Using the points (1.5, 3.375) and (1, 0) on the line, we find the velocity as
$$\frac{\Delta y}{\Delta x} = \frac{y_2 - y_1}{x_2 - x_1} = \frac{3.375 \text{ mi} - 0 \text{ mi}}{1.5 \text{ min} - 1 \text{ min}} = \frac{3.375 \text{ mi}}{0.5 \text{ min}} = 6.75 \ \frac{\text{mi}}{\text{min}} = 405 \text{ mph}.$$
(d) The rocket's velocity after 2 min is equal to the slope of the tangent line (blue) to the position curve (red) at t = 2. Using the points (2, 8) and (1.5, 2) on the line, we find the velocity as
$$\frac{\Delta y}{\Delta x} = \frac{y_2 - y_1}{x_2 - x_1} = \frac{8 \text{ mi} - 2 \text{ mi}}{2 \text{ min} - 1.5 \text{ min}} = \frac{6 \text{ mi}}{0.5 \text{ min}} = 12 \ \frac{\text{mi}}{\text{min}} = 720 \text{ mph}.$$
2. Graph of the derivative $f'(t)$ (velocity function):


* From the graph we see that the derivative function is $f'(t) = 3t^2$.
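The tangent-line slopes for the rocket can also be estimated numerically and compared with the formula 3t². This is a minimal sketch under the assumption that a central difference with a small step h is an acceptable stand-in for reading slopes off the graph; the helper name slope is hypothetical.

```python
def f(t):
    """Rocket position in miles at time t in minutes."""
    return t ** 3

def slope(f, t, h=1e-5):
    """Central-difference estimate of the derivative f'(t)."""
    return (f(t + h) - f(t - h)) / (2 * h)

for t in (0.5, 1, 1.5, 2):
    v = slope(f, t)  # velocity in miles per minute
    print(f"t = {t}: {v:.2f} mi/min = {60 * v:.0f} mph, 3t^2 = {3 * t ** 2}")
```

Multiplying by 60 converts miles per minute to miles per hour, which is how the mph figures in Example 7.2 arise.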

We now present several techniques for determining the derivative of a function.

Rules for differentiation:
1. Power Rule: $\frac{d}{dx}(x^n) = n x^{n-1}$ for any constant n.
2. Derivative of a Constant Function: $\frac{d}{dx}(c) = 0$, since the graph of a constant function is a horizontal line (which has slope 0).
3. Derivative of a Linear Function: $\frac{d}{dx}(mx + b) = m$, since the slope of a linear function is m.
4. Constant Multiple Rule: $\frac{d}{dx}[c f(x)] = c \frac{d}{dx} f(x)$ for any constant c.
5. Sum/Difference Rule: $\frac{d}{dx}[f(x) \pm g(x)] = \frac{d}{dx} f(x) \pm \frac{d}{dx} g(x)$.
6. Derivative of Exponential Functions: $\frac{d}{dx} e^{kx} = k e^{kx}$ for any constant k.
7. Derivative of the Natural Logarithmic Function: $\frac{d}{dx} \ln x = \frac{1}{x}$.
8. Derivative of Cosine Functions: $\frac{d}{dx} \cos(bx) = -b \sin(bx)$ for any constant b.
9. Derivative of Sine Functions: $\frac{d}{dx} \sin(bx) = b \cos(bx)$ for any constant b.


Example 7.3.
1. Use the power rule to determine the derivative of the position function given in Example 7.2.
Solution: For $f(t) = t^3$, we have $f'(t) = 3t^{3-1} = 3t^2$.
2. Use the derivative to verify parts (a)-(d) of Example 7.2.

Example 7.4.
1. If f(x) = 6.5 (for all x), find $f'(x)$.
Solution: $f'(x) = \frac{d}{dx}(6.5) = 0$ (since f is a constant function).
2. If h(t) = 3t − 9, what is the derivative of h?
Solution: Since h is linear with slope 3 and the derivative is slope, we have $h'(t) = 3$.
3. Differentiate $x^4$.
Solution: $\frac{d}{dx}(x^4) = 4x^3$
4. Differentiate $5x^4$.
Solution: $\frac{d}{dx}(5x^4) = 5 \frac{d}{dx}(x^4) = 5 \cdot 4x^{4-1} = 20x^3$
5. Find $\frac{dy}{dx}$ for the function $y = 4x + \frac{1}{2}x^2$.
Solution: $\frac{dy}{dx} = 4 + x$
6. If $f(x) = 3e^x + \frac{1}{x}$, what is $f'(x)$?
Solution: $f'(x) = 3\frac{d}{dx}(e^x) + \frac{d}{dx}(x^{-1}) = 3e^x + (-1) \cdot x^{-2} = 3e^x - \frac{1}{x^2}$
7. If g(t) = 8 cos(t), what is $g'(t)$?
Solution: $g'(t) = -8 \sin(t)$
8. Find the slope of the curve given by $f(x) = x^2$ at the point where x = 3.
Solution: We first find the derivative of f: $f'(x) = 2x$. The slope of the curve is given by the derivative evaluated at the given x-value: $f'(3) = 2(3) = 6$.
9. Suppose a road is represented by the graph of the function f(x) = sin(0.2x).


(a) What is the grade of the road at x = 4?
Solution: The grade of a road is the slope of the road (usually given as a %). Thus, we must first find the derivative of f: $f'(x) = 0.2 \cos(0.2x)$. The slope of the curve is given by the derivative evaluated at the given x-value:
$$f'(4) = 0.2 \cos(0.2(4)) = 0.2 \cos(0.8) \approx 0.14 = 14\%.$$

(b) At what point does the car reach the top of the hill?
Solution: When the car is at the top of the hill, the road is essentially flat. That is, the slope at the top is 0 (not sloped at all). Therefore, to find this point, we set the derivative (slope) equal to zero:
$$f'(x) = 0.2 \cos(0.2x) = 0$$
$$\cos(0.2x) = 0$$
$$0.2x = \cos^{-1}(0) = \frac{\pi}{2}$$
$$x = \frac{5\pi}{2} \approx 7.85.$$

10. Find the point on the curve given by $r(p) = 2e^{0.3p}$ where the slope is equal to 1.
Solution: We must set the derivative of r(p) equal to 1:
$$r'(p) = 2 \cdot 0.3 e^{0.3p} = 1$$
$$0.6 e^{0.3p} = 1$$
$$e^{0.3p} = \frac{5}{3}$$
$$0.3p = \ln\frac{5}{3}$$
$$p = \frac{10}{3} \ln\frac{5}{3} \approx 1.7.$$

11. Determine the point(s) (if any) on the graph of $f(x) = \frac{1}{2}x + e^{-10x}$ where the slope is equal to zero.
Solution: Since the slope of a curve is given by the derivative, we need to take the derivative of f and set it equal to 0:
$$f'(x) = \frac{1}{2} + (-10)e^{-10x} = 0.$$
Now solve for x:
$$-10e^{-10x} = -\frac{1}{2}$$
$$e^{-10x} = \frac{1}{20}$$
$$-10x = \ln\left(\frac{1}{20}\right)$$
$$x = -\frac{1}{10} \ln\left(\frac{1}{20}\right)$$
$$x = \frac{1}{10} \ln 20 \approx 0.3.$$
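The answers to parts 9 through 11 of Example 7.4 can be verified by substituting them back into the derivative. The following Python sketch is illustrative only; the variable names are chosen here and do not appear in the text.

```python
import math

# Part 9: road f(x) = sin(0.2x), so f'(x) = 0.2 cos(0.2x).
grade_at_4 = 0.2 * math.cos(0.2 * 4)   # slope of the road at x = 4
hilltop = (math.pi / 2) / 0.2          # solves cos(0.2x) = 0, i.e. x = 5*pi/2

# Part 10: r(p) = 2 e^(0.3p), so r'(p) = 0.6 e^(0.3p) = 1.
p = math.log(5 / 3) / 0.3

# Part 11: f(x) = x/2 + e^(-10x), so f'(x) = 1/2 - 10 e^(-10x) = 0.
x = math.log(20) / 10

print(round(grade_at_4, 2), round(hilltop, 2))
print(round(p, 1), 0.6 * math.exp(0.3 * p))       # second value should be close to 1
print(round(x, 1), 0.5 - 10 * math.exp(-10 * x))  # second value should be close to 0
```

Substituting a solution back into the equation it came from is a quick way to catch algebra slips such as a dropped sign.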


Example 7.5. Leaky Oil Pipeline For many members of the Standing Rock Sioux Tribe, the Dakota Access pipeline is just another hit against a group that’s been taking economic punches for generations. The tribe has lived in North and South Dakota for hundreds of years, yet members have little say in how their ancestral land is used by the federal government. Meanwhile, sky-high unemployment rates and poverty levels have afflicted the Sioux for decades. Now, tribe members and protestors say the pipeline could imperil the Sioux drinking water and the environmental health of the land they live on. The pipeline was originally slated to be built north of Bismarck, but was moved further south after public outcry that a leak would destroy drinking water. That same criticism appears to be being ignored when it’s made by native people. Source: https://en.wikipedia.org/wiki/Dakota_Access_Pipeline_protests


Consider an oil pipeline that is leaking oil. The volume V gallons (gal) of oil leaked by time t minutes (with t = 0 corresponding to the time the leak was sprung) is given by
$$V(t) = \frac{1}{12}t^3 - 5t^2 + 100t,$$
for 0 ≤ t ≤ 20.
1. Find the amount of oil leaked 10 minutes after the leak was sprung.
Solution:
$$V(10) = \frac{1}{12}(10)^3 - 5(10)^2 + 100(10) \approx 583.3 \text{ gal}$$
2. Find the rate at which oil is leaking from the pipeline initially.
Solution: First we must find the derivative of the volume function, which gives the rate of change of oil with respect to time:
$$\frac{dV}{dt} = \frac{1}{4}t^2 - 10t + 100$$
The words initially or initial always refer to t = 0. Thus, we need to evaluate the derivative at time t = 0. That is, we need to find $V'(0)$. (Note that $V'(t)$ is the same as $\frac{dV}{dt}$.)
$$V'(0) = \frac{1}{4} \cdot 0^2 - 10 \cdot 0 + 100 = 100 \text{ gal/min}.$$
3. How fast is oil leaking from the pipeline after 10 minutes?
Solution:
$$V'(10) = \frac{1}{4} \cdot 10^2 - 10 \cdot 10 + 100 = 25 \text{ gal/min}.$$


4. When will the oil have stopped flowing from the leak? (Assume the flow was shut off immediately when the leak was detected, at t = 0.)
Solution: We need to find the time when zero oil is flowing, i.e., when $\frac{dV}{dt} = 0$:
$$0 = \frac{1}{4}t^2 - 10t + 100$$
Multiply by 4:
$$0 = t^2 - 40t + 400$$
Factor (unFOIL):
$$0 = (t - 20)(t - 20)$$
$$t - 20 = 0$$
$$t = 20 \text{ min}$$
5. How much oil in total leaked from the pipeline?
Solution:
$$V(20) = \frac{1}{12}(20)^3 - 5(20)^2 + 100(20) \approx 666.7 \text{ gal}$$
6. Graph the volume function and its derivative function.
Solution:

[Figure: two graphs against time, t. Top: volume of oil leaked, V(t). Bottom: rate of oil leak, V′(t).]

Note that the derivative gives the slope of a function.
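The leaky pipeline calculations in Example 7.5 are easy to reproduce in code. The sketch below simply encodes the volume function from the example and its derivative; the function name dV is a label chosen here for V′(t).

```python
def V(t):
    """Volume of oil leaked (gal) by time t (min), for 0 <= t <= 20."""
    return t ** 3 / 12 - 5 * t ** 2 + 100 * t

def dV(t):
    """Leak rate V'(t) in gal/min, obtained with the power rule."""
    return t ** 2 / 4 - 10 * t + 100

print(round(V(10), 1))   # volume leaked after 10 minutes
print(dV(0), dV(10))     # initial leak rate and leak rate at 10 minutes
print(dV(20))            # the leak rate reaches zero at t = 20
print(round(V(20), 1))   # total volume leaked
```

Evaluating dV at a few more times between 0 and 20 shows the rate falling steadily, which matches the flattening of the volume curve.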


Class Problem 7.1. Differentiation
1. Suppose a biodiesel rocket launched upward accelerates such that its position (miles above the ground) is given by $h(t) = \frac{1}{8}t^4 - \cos t + 1$, where t is time in minutes.
(a) Find the rocket's velocity 1 min after launch.
(b) Find the rocket's velocity 2 min after launch.
(c) Find the rocket's velocity 10 min after launch.
(d) Graph h and its derivative (use Desmos, a calculator, or the like).
2. If $y = 2e^{5x} + \frac{1}{x^2}$, what is $\frac{dy}{dx}$?
3. Suppose a road is represented by the graph of the function E(x) = −x(x − 6), where E denotes elevation and x denotes the horizontal distance from some reference point (at the origin).
(a) Graph E(x).
(b) What is the grade of the road at x = 2?
(c) Where is the highest point on the road?
4. Consider an oil pipeline that is leaking oil. The volume V barrels (bar) of oil leaked by time t minutes (with t = 0 corresponding to the time the leak was sprung) is given by $V(t) = 1000 - 1000e^{-0.01t} - 0.5t$, for 0 ≤ t ≤ 300.
(a) Find the amount of oil leaked 1 hour after the leak was sprung.
(b) Find the rate at which oil is leaking from the pipeline initially.
(c) How fast is oil leaking from the pipeline after 1 hour?
(d) When will the oil have stopped flowing from the leak? (Assume the flow was shut off immediately when the leak was detected, at t = 0.)
(e) How much oil, in total, leaked from the pipeline?


Economic Inequality and the Equal Share Coefficient

Left: An affluent house in Holmby Hills, Los Angeles. Right: Tents of the homeless on the sidewalk in Skid Row, Los Angeles (only a few miles from Holmby Hills). https://en.wikipedia.org/wiki/Economic_inequality

“The people and the nations of the world own many resources; money, land, energy, food, oil, etc. One of the most important questions about these resources has to do with how they are distributed among people or among groups of people.” - UMAP Module 60-62: https://www.dropbox.com/s/dqu92004evrst04/UMAP60.PDF?dl=0 In what follows, we will see how the derivative can be used to measure how unequal the distribution of a resource may be for a certain population. One way to graphically represent the distribution of a resource is through a Lorenz curve.


Lorenz Curves: A Lorenz curve is the graph of a function that gives the fraction of a resource owned by certain proportions of people, entities, or countries. Figure 7.1 shows an example Lorenz curve describing global income distribution in 2011.

Figure 7.1: Data and Lorenz curve (right) for global income distribution in 2011.

Example 7.6. Consider the Lorenz curve approximating the income distribution in the U.S. in the early 1970s:
$$r(p) = p^2,$$
where r represents the fraction of the resource (income) received by the lowest proportion p of the population.


Lorenz curve for 1970s U.S. income distribution.

From the graph (or table) of the Lorenz curve, we can see that, for example, the lowest 20% of people (that is, the 20% receiving the lowest annual income) were receiving only 4% of the total income:
$$r(0.2) = (0.2)^2 = 0.04$$
Similarly, the lowest 40% received only 16% of the income. In a perfectly equitable society, we would expect the lowest 20% of people to get 20% of the resource, the lowest 40% of people to get 40% of the resource, and so on. This is called complete equality. On the other hand, if all of the resource is owned by a single person, we have complete inequality. For income in the early 1970s U.S., the distribution is somewhere between those two extremes; however, there is certainly significant inequality.

1. Use the Lorenz curve to estimate the fraction of income going to the lower half of the population. What fraction is going to the upper half of the population?
Solution: For p = 0.5, we have r(0.5) = 0.25. Thus, the lower half of the population receives only 25% of the income. The upper half receives 75% of the income, a disproportionately large share.


2. Use the Lorenz curve to estimate the lowest earning proportion of people receiving half of the income. What fraction of the highest earners is getting half the income?
Solution: Since r(0.7) ≈ 0.5, approximately 70% of the lowest earners must divide up half of the income while only 30% of the highest earners get to divide up the other half. This is another clear illustration of inequality in the income distribution.
3. The bottom 95% of people are getting what fraction of the income? What fraction is going to the richest 5%?
Solution: Since r(0.95) ≈ 0.90, the lowest 95% of earners get about 90% of the income. Thus, the richest 5% are getting about 10% of the income.
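The readings taken from the Lorenz curve in Example 7.6 can be reproduced directly from the formula r(p) = p². The sketch below is only an illustration; the variable p_half is a name introduced here for the proportion of lowest earners who together receive half of the income.

```python
import math

def r(p):
    """Lorenz curve from Example 7.6: income share of the lowest proportion p."""
    return p ** 2

print(r(0.5), 1 - r(0.5))       # shares of the lower and upper halves

# Solving r(p) = 0.5 exactly gives p = sqrt(0.5); the graph reading is about 0.7.
p_half = math.sqrt(0.5)
print(round(p_half, 3))

print(round(r(0.95), 4), round(1 - r(0.95), 4))   # bottom 95% and richest 5%
```

The exact values are slightly different from the rounded graph readings in the example (for instance, about 0.707 rather than 0.7), which is expected when estimating from a plot.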

Example 7.7. Consider the Lorenz curve approximating the income distribution in the U.S. in the early 1980s:
$$r(p) = p^3,$$
where r represents the fraction of the resource (income) received by the lowest proportion p of the population.


Lorenz curves for U.S. income distribution. The graph shows the Lorenz curves from the early 70s and 80s as well as the ideal case of complete equality (the straight line, r = p).

Note that for the 1980s income distribution, the bottom 20% of people are only getting 0.8% of the income (compared to 4% in the 1970s):
$$r(0.2) = (0.2)^3 = 0.008$$
Similarly, the lowest 40% received only 6.4% in the 1980s (compared to 16% in the 1970s).

In the previous example, we see that there was more inequality in the 1980s income distribution than in the 1970s. This is reflected in the graphs of the Lorenz curves by how far the curves deviate (sag) from the line of absolute equality. The 1980s curve is clearly further from absolute equality than the 1970s curve. In general, the degree to which a Lorenz curve sags below the absolute equality line determines the magnitude of inequality in the distribution.

One way to quantitatively measure the degree to which a Lorenz curve sags below equality (i.e., the degree of inequality in a resource distribution) is the equal share coefficient (ESC). The ESC divides the population into two groups: those that receive less than an equal share, and those that receive more than an equal share. The next example illustrates how the ESC relates to the slope of the Lorenz curve.


Example 7.8. Consider the early 1970s U.S. income distribution with Lorenz curve $r(p) = p^2$.

Lorenz curve for 1970s U.S. income distribution. The 10% between the lowest 10% and lowest 20% receives only 3% of the total income, a less than equal share.

We see from the graph above that the lowest 10% receives only 1% of the income, and the lowest 20% receives only 4%. Now, the 10% between the lowest 10% and lowest 20% receives 4% − 1% = 3% of the income. Hence, this 10% receives less than an equal share (an equal share for 10% of the population would be 10% of the income). Note that the slope of the line containing the points corresponding to the lowest 10% and lowest 20% is
$$\text{slope} = \frac{0.04 - 0.01}{0.2 - 0.1} = \frac{0.03}{0.1} = 0.3.$$
Note that a slope exactly equal to 1 would correspond to an equal share. Since the calculated slope is less than 1, the corresponding segment of the population (between the 10th and 20th percentiles) gets less than an equal share of the income.

1. Calculate the share received by the segment between the lowest 80% and the lowest 90%.
Solution:


Lorenz curve for 1970s U.S. income distribution. The 10% of the population between the 80th and 90th percentiles receives 17% of the total income, a more than equal share.

The share we seek is the rise, 0.81 − 0.64 = 0.17, of the line containing the points corresponding to the lowest 80% and lowest 90%. To compare it with an equal share, we compute the slope of that line:
$$\text{slope} = \frac{0.81 - 0.64}{0.9 - 0.8} = \frac{0.17}{0.1} = 1.7.$$
Since this slope is greater than 1, the 10% of the population between the 80th and 90th percentiles receives more than an equal share.

So, we have determined that the group between the 10th and 20th percentiles receives less than an equal share, and the group between the 80th and 90th percentiles receives more than an equal share. Therefore, there must be a point which divides the population into those who receive less than a fair share (slope < 1) and those who receive more than a fair share (slope > 1), i.e., the point on the Lorenz curve with slope = 1.


2. Find the point on the Lorenz curve where the slope equals 1.
Solution: To find the slope of the Lorenz curve, we take the derivative:
$$r'(p) = 2p$$
Now, set the derivative equal to 1 (we want the slope to equal 1) and solve for p:
$$2p = 1$$
$$p = \frac{1}{2} = 0.5.$$
The ESC for this Lorenz curve is 0.5. The slope of the curve is less than one for p < 0.5 and greater than one for p > 0.5. Hence, the bottom 50% receives less than an equal share, while the top 50% receives more than an equal share.

The p-value where the slope of the Lorenz curve equals 1 is the equal share coefficient (ESC). (The slope of the dashed line is 1.) The ESC for the 1970s income distribution Lorenz curve is p = 0.5. That is, the bottom 50% of the population receives less than an equal share, while the top 50% receives more than an equal share.
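For Lorenz curves of the form r(p) = pⁿ with n > 1, setting the derivative n·pⁿ⁻¹ equal to 1 can be solved in closed form. The Python sketch below illustrates that calculation; the function name esc_power_curve is hypothetical and the closed form applies only to this family of curves, not to Lorenz curves in general.

```python
def esc_power_curve(n):
    """Equal share coefficient for r(p) = p**n with n > 1.

    Solves r'(p) = n * p**(n - 1) = 1 for p.
    """
    return (1 / n) ** (1 / (n - 1))

def slope(n, p):
    """Slope r'(p) of the Lorenz curve r(p) = p**n."""
    return n * p ** (n - 1)

esc = esc_power_curve(2)
print(esc)                                      # ESC for r(p) = p**2
print(slope(2, esc - 0.1), slope(2, esc + 0.1)) # below 1 before the ESC, above 1 after it
```

Checking the slope on either side of the ESC confirms that it separates those receiving less than an equal share from those receiving more.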

Definition 7.2. The Equal Share Coefficient (ESC)
The equal share coefficient (ESC) for a resource distribution with Lorenz curve, r(p), is the p-value corresponding to the point on the Lorenz curve where the slope is 1. The ESC represents the proportion of the population that receives less than an equal share of a resource. (1 − ESC is the proportion that receives a more than equal share.)

Note that the ESC is always between 0 and 1, 0 ≤ ESC ≤ 1. The closer the ESC is to 0, the closer the resource distribution is to complete equality. On the other hand, the closer the ESC is to 1, the closer the distribution is to complete inequality. Therefore, the ESC measures the degree of inequality in a resource distribution.

Example 7.9. Find the ESC for the early 1980s U.S. income distribution with Lorenz curve given by $r(p) = p^3$.
Solution: We set the derivative of the Lorenz curve equal to 1 and solve for p:
$$r'(p) = 3p^2 = 1$$
$$p^2 = \frac{1}{3}$$
$$p = \left(\frac{1}{3}\right)^{1/2} \approx 0.58 = 58\%.$$
Hence, 58% of the population receives less than their fair share.


Class Problem 7.2. Lorenz Curves and the ESC
* For every question below, comment on what your answer implies about inequality in the resource distribution.
1. 2011 world income distribution.
(a) Use the data given in Fig. 7.1 to construct a Lorenz curve of the form $r(p) = p^n$ (for some constant n) for global income in 2011. Note that r is the fraction of the total income; thus, you will need to divide each income value by the total income, 26,316 (recall that this is in billions of dollars). To construct the Lorenz curve, plot the data points in Desmos and graph the function $r(p) = p^n$ with a slider for n. Use the slider to choose the value of n that makes the Lorenz curve fit the data best.
(b) Use the Lorenz curve to estimate the fraction of income going to the lower half of the population. What fraction is going to the upper half of the population?
(c) Estimate the lowest earning proportion of people receiving half of the income. What fraction of the highest earners is getting half the income?
(d) Find the fraction of income going to the segment of the population between the 30th and 35th percentiles. What is the slope of the Lorenz curve for this segment?
(e) Find the fraction of people getting less than an equal share.
2. Consider the 2010 distribution of wealth in the U.S. represented by the Lorenz curve $r(p) = 0.000000015\,(e^{18p} - 1)$.
(a) The bottom 90% of people are getting what fraction of the income?
(b) What fraction is going to the richest 1%?
(c) Find and interpret the equal share coefficient.

7.2 Integration

Another measure of inequality in a resource distribution, the Gini index, involves calculating areas above and below Lorenz curves. Areas can be calculated using the calculus process of integration, which involves reversing the process of differentiation, i.e., finding antiderivatives.

Definition 7.3. Antiderivative
A function F is called an antiderivative of the function f if the derivative of F is f, that is, if $F'(x) = f(x)$.

Example 7.10.
1. An antiderivative of f(x) = x + 1 is $F(x) = \frac{1}{2}x^2 + x$ since x + 1 is the derivative of $\frac{1}{2}x^2 + x$, i.e., $F'(x) = f(x)$.
2. An antiderivative of $g(x) = \cos x + e^x$ is $G(x) = \sin x + e^x$ since $G'(x) = g(x)$.
3. Find an antiderivative of f(x) = 2x.
Solution: $F(x) = x^2$ is an antiderivative of f since $F'(x) = 2x = f(x)$.
4. Find an antiderivative of $f(t) = 3t^2$.
Solution: $F(t) = t^3$ is an antiderivative of f since $F'(t) = 3t^2 = f(t)$.
5. Find an antiderivative of $h(x) = 4x^3$.
Solution: $H(x) = x^4$ is an antiderivative of h since $H'(x) = 4x^3 = h(x)$.
6. Find an antiderivative of $g(t) = \frac{1}{t^2}$.
Solution: Note that g(t) can be written as $g(t) = t^{-2}$. Hence, $G(t) = -t^{-1} = -\frac{1}{t}$ is an antiderivative of g since $G'(t) = g(t)$.
7. Find an antiderivative of $f(x) = x^3 - \sin x$.
Solution: $F(x) = \frac{1}{4}x^4 + \cos x$ is an antiderivative of f since $F'(x) = f(x)$.

Example 7.11.
1. An antiderivative of f(x) = x + 1 is $F(x) = \frac{1}{2}x^2 + x$ since $F'(x) = f(x)$.
2. An antiderivative of f(x) = x + 1 is $F(x) = \frac{1}{2}x^2 + x + 2$ since $F'(x) = f(x)$.
3. An antiderivative of f(x) = x + 1 is $F(x) = \frac{1}{2}x^2 + x - \pi$ since $F'(x) = f(x)$.
4. An antiderivative of f(x) = x + 1 is $F(x) = \frac{1}{2}x^2 + x + C$ for any constant C since $F'(x) = f(x)$.

Theorem 7.1. If F is an antiderivative of f , then so is F (x) + C for any constant C.

Example 7.12.
1. Find all antiderivatives of $g(x) = \cos x + e^x$.
Solution: $G(x) = \sin x + e^x + C$
2. Find all antiderivatives of f(x) = 1.
Solution: $F(x) = x + C$
3. Find all antiderivatives of f(x) = x.
Solution: $F(x) = \frac{1}{2}x^2 + C$
4. Find all antiderivatives of $f(x) = x^2$.
Solution: $F(x) = \frac{1}{3}x^3 + C$
5. Find all antiderivatives of $f(x) = x^3$.
Solution: $F(x) = \frac{1}{4}x^4 + C$
6. Find all antiderivatives of $f(x) = x^n$ for any real number n ≠ −1.
Solution: $F(x) = \frac{1}{n+1}x^{n+1} + C$

Definition 7.4. Indefinite Integral
The set of all antiderivatives of a function f(x) is called the indefinite integral of f and is denoted by
$$\int f(x)\,dx.$$
The process of finding an indefinite integral is called integration.

Example 7.13.
1. If $g(x) = \cos x + e^x$, find $\int g(x)\,dx$.
Solution: $\int g(x)\,dx = \sin x + e^x + C$
2. Find $\int (x + 1)\,dx$.
Solution: $\int (x + 1)\,dx = \frac{1}{2}x^2 + x + C$
3. Evaluate $\int (4t^3 - 2t^2)\,dt$.
Solution: $\int (4t^3 - 2t^2)\,dt = t^4 - \frac{2}{3}t^3 + C$


Indefinite Integral Rules: Assume k and n are real number constants.

1. Constant Multiple Rule: $\int k f(x)\,dx = k \int f(x)\,dx$.
2. Sum/Difference Rule: $\int [f(x) \pm g(x)]\,dx = \int f(x)\,dx \pm \int g(x)\,dx$.
3. Power Rule for Integration: $\int x^n\,dx = \frac{1}{n+1}x^{n+1} + C$ (for n ≠ −1).
4. Integral of a Constant: $\int k\,dx = kx + C$.
5. $\int e^{kx}\,dx = \frac{1}{k}e^{kx} + C$.
6. $\int \frac{1}{x+k}\,dx = \ln(x + k) + C$ (for x + k > 0).
7. $\int \cos x\,dx = \sin x + C$.
8. $\int \sin x\,dx = -\cos x + C$.

Example 7.14.
1. Find $\int \frac{10}{x}\,dx$.
Solution: $\int \frac{10}{x}\,dx = 10 \int \frac{1}{x}\,dx = 10 \ln x + C$
2. Find $\int dx$.
Solution: $\int dx = x + C$
3. Find the indefinite integral of $f(x) = \sqrt{x}$.
Solution:
$$\int \sqrt{x}\,dx = \int x^{1/2}\,dx = \frac{1}{\frac{1}{2} + 1}x^{\frac{1}{2} + 1} + C = \frac{2}{3}x^{3/2} + C$$
4. Integrate $\sqrt{x^3}$.
Solution:
$$\int \sqrt{x^3}\,dx = \int x^{3/2}\,dx = \frac{1}{\frac{3}{2} + 1}x^{\frac{3}{2} + 1} + C = \frac{2}{5}x^{5/2} + C$$
5. Evaluate $\int e^{-2x}\,dx$.
Solution:
$$\int e^{-2x}\,dx = \frac{1}{-2}e^{-2x} + C = -\frac{1}{2}e^{-2x} + C.$$
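An antiderivative can always be checked by differentiating it. The sketch below does this numerically for the five answers in Example 7.14 (with C = 0), assuming a central difference is an acceptable approximation to the derivative; the helper name slope and the list pairs are introduced here for illustration.

```python
import math

def slope(F, x, h=1e-6):
    """Central-difference estimate of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

# Each pair is (integrand f, proposed antiderivative F) from Example 7.14.
pairs = [
    (lambda x: 10 / x,            lambda x: 10 * math.log(x)),
    (lambda x: 1.0,               lambda x: x),
    (lambda x: math.sqrt(x),      lambda x: (2 / 3) * x ** 1.5),
    (lambda x: math.sqrt(x ** 3), lambda x: (2 / 5) * x ** 2.5),
    (lambda x: math.exp(-2 * x),  lambda x: -0.5 * math.exp(-2 * x)),
]

# Largest mismatch between F'(x) and f(x) over a few positive sample points.
max_err = max(abs(slope(F, x) - f(x)) for f, F in pairs for x in (0.5, 1.0, 2.0))
print(max_err < 1e-6)
```

Only positive sample points are used because ln x and the square roots are defined there.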

Example 7.15. Initial Value Problems (IVP)
1. Find the function f so that $f'(x) = e^x + \cos(x)$ and f(0) = 3.
Solution: First we need to find a function f whose derivative is $e^x + \cos(x)$. In other words, we need to find $\int (e^x + \cos(x))\,dx$:
$$\int (e^x + \cos(x))\,dx = e^x + \sin x + C$$
So let $f(x) = e^x + \sin x + C$. Next we need to determine the value of C so that f(0) = 3. We set
$$f(0) = e^0 + \sin(0) + C = 1 + 0 + C = 1 + C = 3,$$
so we must have C = 2. Therefore, the function we are looking for is $f(x) = e^x + \sin x + 2$.
Check (differentiate and make sure the initial value is satisfied): $f'(x) = e^x + \cos(x)$, and $f(0) = e^0 + \sin(0) + 2 = 1 + 0 + 2 = 3$.

2. Find the function $y(t)$ so that $\frac{dy}{dt} = 4e^t + \sin(t) - \frac{1}{t+1}$ and $y(0) = 10$.

Solution: We need an antiderivative of $4e^t + \sin(t) - \frac{1}{t+1}$. I.e., we need to integrate:
$$y = \int 4e^t + \sin(t) - \frac{1}{t+1}\,dt = 4e^t - \cos(t) - \ln(t + 1) + C.$$
Now we incorporate the initial value:
$$y(0) = 4e^0 - \cos(0) - \ln(1) + C = 4 - 1 - 0 + C = 3 + C \overset{\text{set}}{=} 10.$$
Thus, $C = 7$, and we have $y(t) = 4e^t - \cos(t) - \ln(t + 1) + 7$ as the solution.
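The two steps of an initial value problem (integrate, then solve for the constant) can also be traced in a few lines of Python. This is a hedged sketch for item 2 of Example 7.15, not the author's method: the function names `rhs`, `antiderivative`, `y`, and `slope` are hypothetical, and the derivative check uses a numerical estimate at a handful of sample times.

```python
import math

def rhs(t):
    """Right-hand side of dy/dt from Example 7.15, item 2."""
    return 4 * math.exp(t) + math.sin(t) - 1 / (t + 1)

def antiderivative(t):
    """Antiderivative of rhs with the constant C left out."""
    return 4 * math.exp(t) - math.cos(t) - math.log(t + 1)

# The initial value y(0) = 10 fixes the constant: C = 10 - antiderivative(0).
C = 10 - antiderivative(0)

def y(t):
    """Candidate solution of the initial value problem."""
    return antiderivative(t) + C

def slope(func, t, h=1e-6):
    """Central-difference estimate of the derivative of func at t."""
    return (func(t + h) - func(t - h)) / (2 * h)

print("C =", C)
print("y(0) =", y(0))

# Compare the estimated slope of y with the required dy/dt at a few times.
mismatch = max(abs(slope(y, t) - rhs(t)) for t in (0.0, 0.5, 1.0, 2.0))
print("largest slope mismatch:", mismatch)
```

The computed constant should agree with the value of C found by hand, and the slope mismatch should be negligible.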

There are two types of integrals: the indefinite integral and the definite integral. The definite integral (in conjunction with the indefinite integral) can be used to measure wealth inequality by comparing areas around a Lorenz curve.

Definition 7.5. Definite Integral
The definite integral of the function $f$ from $a$ to $b$ is the area under the graph of $f$ (and above the x-axis) between the vertical lines $x = a$ and $x = b$.

The definite integral is denoted by
$$\int_a^b f(x)\,dx.$$

Example 7.16.

1. Evaluate $\int_2^5 7\,dx$.

Solution: The area we are looking for is a rectangle with height equal to 7 and length equal to $5 - 2 = 3$. Thus, since area = height × length for a rectangle, the area is 21. So, the definite integral is
$$\int_2^5 7\,dx = \text{area of rectangle} = 21.$$

2. Calculate $\int_0^2 f(x)\,dx$ if $f(x) = 4x$.

Solution: The area we are looking for is a triangle with height equal to 8 and length equal to 2. Thus, since area = $\frac{1}{2}$ × height × length for a triangle, the area is $\frac{1}{2} \cdot 8 \cdot 2 = 8$. So, the definite integral is
$$\int_0^2 f(x)\,dx = \text{area of triangle} = 8.$$

3. Find $\int_1^3 3t + 1\,dt$.

Solution: The area we are looking for is a combination of two sub-areas: a rectangle and a triangle. Thus, we have
$$\int_1^3 3t + 1\,dt = \text{area of rectangle} + \text{area of triangle} = 4 \cdot 2 + \frac{1}{2} \cdot 6 \cdot 2 = 14.$$

4. What is the value of $\int_{-1}^{1} \sqrt{1 - x^2}\,dx$?

Solution: First note that if we let $y = \sqrt{1 - x^2}$, square both sides, and add the $x^2$ term to both sides, we get $x^2 + y^2 = 1$, which is the equation of the unit circle. Keep in mind that the original function in the integral was the positive root $\sqrt{1 - x^2}$; hence, we need to find the area under the top half of the circle. The formula for the area of a circle is area = $\pi r^2$. Since the radius of the unit circle is 1, the area of the top half is $\frac{1}{2}\pi(1)^2 = \frac{1}{2}\pi$. Thus,
$$\int_{-1}^{1} \sqrt{1 - x^2}\,dx = \frac{1}{2}\,(\text{area of circle}) = \frac{1}{2}\pi.$$

In the previous example, the areas we were required to compute have known formulas. However, if the area does not have a pre-derived formula, we must resort to calculus techniques. For example, how could we go about finding $\int_0^1 x^2\,dx$?

The general procedure for computing the definite integral involves approximating the area under the curve by rectangles (see Fig. 7.2).
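The rectangle idea can be tried directly on $\int_0^1 x^2\,dx$. The sketch below is an illustration rather than the text's own procedure: it assumes rectangles whose heights are taken at the midpoint of each subinterval (the figure may use left or right endpoints instead), and the names `riemann_sum` and `square` are hypothetical.

```python
def riemann_sum(f, a, b, n):
    """Approximate the area under f on [a, b] with n midpoint rectangles."""
    width = (b - a) / n
    total = 0.0
    for i in range(n):
        midpoint = a + (i + 0.5) * width   # sample point in the i-th strip
        total += f(midpoint) * width       # height times width of one rectangle
    return total

def square(x):
    return x ** 2

# More rectangles give a finer approximation of the area under y = x^2.
for n in (4, 16, 64, 256):
    print(n, riemann_sum(square, 0.0, 1.0, n))
```

As the number of rectangles grows, the printed sums settle toward 1/3, which is the exact area under $y = x^2$ on $[0, 1]$.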


Figure 7.2: In computing the definite integral, we approximate the area under the curve by rectangles.

Definite integral animations:
https://www.dropbox.com/s/ucfpa8c0vgskt9p/intAnim1.gif?dl=0
https://www.dropbox.com/s/yob5asmrirreqd4/intAnim2.gif?dl=0

Here we present a way of finding the definite integral (area) which employs the indefinite integral (an antiderivative). We first need a connection between the integral and the derivative. This connection is encapsulated in the Fundamental Theorem of Calculus:

Theorem 7.2. The Fundamental Theorem of Calculus (FTC)
If $F$ is an antiderivative of $f$, then
$$\int_a^b f(x)\,dx = F(b) - F(a).$$

Example 7.17. Use FTC to

1. Evaluate $\int_2^5 7\,dx$.

Solution: An antiderivative of $f(x) = 7$ is $F(x) = 7x$. Following FTC, we have
$$\int_2^5 7\,dx = F(5) - F(2) = 7(5) - 7(2) = 35 - 14 = 21.$$

2. Calculate $\int_0^2 f(x)\,dx$ if $f(x) = 4x$.

Solution: An antiderivative of $f(x) = 4x$ is $F(x) = 2x^2$ (using the power rule). Following FTC, we have
$$\int_0^2 4x\,dx = F(2) - F(0) = 2(2)^2 - 2(0)^2 = 8 - 0 = 8.$$

*Notational Note: We sometimes use the shorthand
$$F(x)\Big|_a^b = F(b) - F(a).$$

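Example 7.18. Use FTC to

1. Find $\int_1^3 3t + 1\,dt$.

Solution:
$$\int_1^3 3t + 1\,dt = \left(\frac{3}{2}t^2 + t\right)\Big|_1^3 = \left(\frac{3}{2}(3)^2 + (3)\right) - \left(\frac{3}{2}(1)^2 + (1)\right) = \left(\frac{27}{2} + 3\right) - \left(\frac{3}{2} + 1\right) = 14.$$

2. Integrate $\int_0^1 x^2\,dx$.

Solution:
$$\int_0^1 x^2\,dx = \frac{1}{3}x^3\Big|_0^1 = \frac{1}{3}(1)^3 - \frac{1}{3}(0)^3 = \frac{1}{3}.$$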

Example 7.19. Find the following:

1. $\int_0^1 x^4\,dx$

Solution:
$$\int_0^1 x^4\,dx = \frac{1}{5}x^5\Big|_0^1 = \frac{1}{5}(1)^5 - \frac{1}{5}(0)^5 = \frac{1}{5} = 0.2.$$

2. $\int_0^1 x - x^4\,dx$

Solution:
$$\int_0^1 x - x^4\,dx = \left(\frac{1}{2}x^2 - \frac{1}{5}x^5\right)\Big|_0^1 = \left(\frac{1}{2}(1)^2 - \frac{1}{5}(1)^5\right) - \left(\frac{1}{2}(0)^2 - \frac{1}{5}(0)^5\right) = \frac{3}{10} = 0.3.$$

3. $\int_0^\pi \sin x\,dx$

Solution:
$$\int_0^\pi \sin x\,dx = -\cos x\Big|_0^\pi = -\cos(\pi) - (-\cos(0)) = -(-1) + 1 = 2.$$

4. $\int_{-1}^0 e^t\,dt$

Solution:
$$\int_{-1}^0 e^t\,dt = e^t\Big|_{-1}^0 = e^0 - e^{-1} = 1 - \frac{1}{e}.$$

5. $\int_1^2 \frac{1}{x}\,dx$

Solution:
$$\int_1^2 \frac{1}{x}\,dx = \ln x\Big|_1^2 = \ln(2) - \ln(1) = \ln 2$$
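The FTC results in Example 7.19 can be compared against a direct rectangle approximation of each area. The following Python sketch is offered as a cross-check under stated assumptions, not as part of the original text: the helper names `ftc` and `midpoint_sum` are hypothetical, and the rectangle sums use midpoint heights with 10,000 strips.

```python
import math

def ftc(F, a, b):
    """Definite integral via the Fundamental Theorem: F(b) - F(a)."""
    return F(b) - F(a)

def midpoint_sum(f, a, b, n=10_000):
    """Rectangle approximation of the area under f on [a, b]."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

# (integrand f, antiderivative F, a, b) for the five parts of Example 7.19
cases = [
    (lambda x: x ** 4, lambda x: x ** 5 / 5, 0.0, 1.0),
    (lambda x: x - x ** 4, lambda x: x ** 2 / 2 - x ** 5 / 5, 0.0, 1.0),
    (math.sin, lambda x: -math.cos(x), 0.0, math.pi),
    (math.exp, math.exp, -1.0, 0.0),
    (lambda x: 1 / x, math.log, 1.0, 2.0),
]

results = [(ftc(F, a, b), midpoint_sum(f, a, b)) for f, F, a, b in cases]
for exact, approx in results:
    print(f"FTC: {exact:.6f}   rectangles: {approx:.6f}")
```

The two columns should agree to several decimal places, which illustrates that evaluating an antiderivative at the endpoints gives the same area as summing many thin rectangles.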

Example 7.20. Finding Areas

1. Find the area under $f(x) = x$ between $x = 0$ and $x = 1$.

Solution:
$$\text{area} = \int_0^1 x\,dx = \frac{1}{2}x^2\Big|_0^1 = \frac{1}{2}(1)^2 - \frac{1}{2}(0)^2 = \frac{1}{2}.$$

2. Find the area under $g(x) = x^4$ between $x = 0$ and $x = 1$.

Solution:
$$\text{area} = \int_0^1 x^4\,dx = \frac{1}{5}x^5\Big|_0^1 = \frac{1}{5}(1)^5 - \frac{1}{5}(0)^5 = \frac{1}{5}.$$

3. Find the area between $f(x) = x$ and $g(x) = x^4$ (between $x = 0$ and $x = 1$).

Solution:
$$\text{area} = \int_0^1 f(x)\,dx - \int_0^1 g(x)\,dx = \int_0^1 x\,dx - \int_0^1 x^4\,dx = \frac{1}{2} - \frac{1}{5} = \frac{3}{10}.$$


Note that the area could equivalently be found by combining the two integrals involved:
$$\text{area} = \int_0^1 f(x)\,dx - \int_0^1 g(x)\,dx = \int_0^1 [f(x) - g(x)]\,dx.$$

Note: The general formula for finding the area between two curves defined by functions $f$ and $g$ (such that the graph of $f$ is above that of $g$) is
$$\text{area} = \int_a^b [f(x) - g(x)]\,dx$$
on the interval $[a, b]$.


Example 7.21. Area Between Curves
Find the area between $f(x) = \sin x$ and $g(x) = e^{-x}$ on the interval $[1, 2]$.

Solution:
$$\text{area} = \int_a^b [f(x) - g(x)]\,dx = \int_1^2 \sin x - e^{-x}\,dx = \left(-\cos x - \frac{1}{-1}e^{-x}\right)\Big|_1^2$$
$$= \left(e^{-x} - \cos x\right)\Big|_1^2 = \left(e^{-2} - \cos 2\right) - \left(e^{-1} - \cos 1\right) \approx 0.72.$$
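The area-between-curves formula lends itself to a short numerical check. The sketch below assumes midpoint rectangles and a hypothetical helper named `area_between`; it also assumes, as the formula requires, that the graph of `f` lies above the graph of `g` on the whole interval. It is applied to the curves from Example 7.20 and Example 7.21.

```python
import math

def area_between(f, g, a, b, n=10_000):
    """Midpoint-rule estimate of the integral of f(x) - g(x) over [a, b].

    Assumes f(x) >= g(x) everywhere on [a, b].
    """
    width = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * width
        total += (f(x) - g(x)) * width   # thin strip between the two curves
    return total

# Example 7.20, item 3: f(x) = x above g(x) = x^4 on [0, 1]
area_poly = area_between(lambda x: x, lambda x: x ** 4, 0.0, 1.0)

# Example 7.21: f(x) = sin x above g(x) = e^(-x) on [1, 2]
area_trig = area_between(math.sin, lambda x: math.exp(-x), 1.0, 2.0)

print(f"{area_poly:.4f}")
print(f"{area_trig:.4f}")
```

Both estimates should match the hand computations (3/10 and roughly 0.72) to the displayed precision.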


Class Problem 7.3. Integration

1. Let $f(x) = x^5 + \cos(x)$.
(a) Find an antiderivative of $f(x)$.
(b) Find the indefinite integral of $f(x)$.
(c) Find the function $F(x)$ whose derivative is $f(x)$ such that $F(0) = 1$.
(d) Find the definite integral of $f(x)$ on $[0, \pi]$.

2. Find $\int e^{0.2t} - \frac{3}{t-1}\,dt$.

3. Find the area under $f(x) = \frac{1}{2}x$ between $x = 2$ and $x = 5$ using
(a) known area formulas;
(b) calculus.

4. Find the area between $f(x) = \frac{1}{x}$ and $g(x) = 1 - \frac{1}{2}x$ on $[1, 2]$.


Economic Inequality and the Gini Index

Recall that the further a Lorenz curve sags below the curve of absolute equality, the more unequal the distribution of the resource. Hence, the larger the area between the two curves, the larger the inequality; this is the idea behind the Gini Index.

We wish to use the area between a Lorenz curve (given by $r(p)$) and the curve of absolute equality ($r(p) = p$), called the area of inequality, to measure the degree of inequality in a resource distribution:
$$\text{area of inequality} = \int_0^1 [p - r(p)]\,dp$$

However, we want this measure to be relative to absolute equality. Thus, since the area under the curve of absolute equality is $\frac{1}{2}$, we will divide the area of inequality by $\frac{1}{2}$; this is the Gini Index (GI). So we have
$$GI = \frac{\text{area of inequality}}{1/2} = \frac{\int_0^1 [p - r(p)]\,dp}{1/2} = 2\int_0^1 [p - r(p)]\,dp$$

Definition 7.6. The Gini Index
The Gini index, GI, is a real number between 0 and 1 which measures the degree of inequality in a resource distribution. GI = 0 (0%) corresponds to absolute equality, while GI = 1 (100%) corresponds to absolute inequality.
$$GI = 2\int_0^1 [p - r(p)]\,dp$$

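Example 7.22. Calculate the Gini index for the 1970s income distribution with Lorenz curve $r(p) = p^2$.

Solution:
$$GI = 2\int_0^1 p - p^2\,dp = 2\left(\frac{1}{2}p^2 - \frac{1}{3}p^3\right)\Big|_0^1 = 2\left(\frac{1}{2}(1)^2 - \frac{1}{3}(1)^3\right) - 2\left(\frac{1}{2}(0)^2 - \frac{1}{3}(0)^3\right) = 2\left(\frac{1}{2} - \frac{1}{3}\right) = \frac{1}{3} \approx 0.33.$$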






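Example 7.23. Calculate the Gini index for the 1980s income distribution with Lorenz curve $r(p) = p^3$.

Solution:
$$GI = 2\int_0^1 p - p^3\,dp = 2\left(\frac{1}{2}p^2 - \frac{1}{4}p^4\right)\Big|_0^1 = 2\left(\frac{1}{2}(1)^2 - \frac{1}{4}(1)^4\right) - 2\left(\frac{1}{2}(0)^2 - \frac{1}{4}(0)^4\right) = 2\left(\frac{1}{2} - \frac{1}{4}\right) = \frac{1}{2} = 0.5.$$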



Note that the Gini index for the 1980s income distribution is larger than that of the 1970s (GI = 0.33 in the 1970s compared to GI = 0.50 in the 1980s), indicating that the inequality in income distribution was more severe in the 1980s. This agrees with the ESC assessment where we saw ESC = 0.50 for the 1970s and ESC = 0.58 for the 1980s.
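The Gini index formula $GI = 2\int_0^1 [p - r(p)]\,dp$ can be evaluated numerically for any Lorenz curve, which is convenient when an antiderivative is awkward to find by hand. The following is a minimal sketch under stated assumptions: the function name `gini_index` is hypothetical, the integral is estimated with midpoint rectangles, and the two Lorenz curves are the ones from Examples 7.22 and 7.23.

```python
def gini_index(lorenz, n=10_000):
    """Estimate GI = 2 * integral over [0, 1] of (p - r(p)) dp.

    lorenz is the Lorenz curve r(p); the integral is approximated
    with n midpoint rectangles.
    """
    width = 1.0 / n
    total = 0.0
    for i in range(n):
        p = (i + 0.5) * width
        total += (p - lorenz(p)) * width   # strip of the area of inequality
    return 2.0 * total

gi_1970s = gini_index(lambda p: p ** 2)   # Lorenz curve from Example 7.22
gi_1980s = gini_index(lambda p: p ** 3)   # Lorenz curve from Example 7.23
gi_equal = gini_index(lambda p: p)        # curve of absolute equality

print(f"1970s: {gi_1970s:.2f}")
print(f"1980s: {gi_1980s:.2f}")
print(f"absolute equality: {gi_equal:.2f}")
```

The estimates should reproduce the hand-computed values for the two decades, and the curve of absolute equality should give a Gini index of zero.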


Class Problem 7.4. The Gini Index

1. Using Definition 7.6, integrate the first term to derive an alternative formula for the Gini index.

2. Use the Lorenz curve from Class Problem 7.2 to find the Gini index for the global income distribution in 2011.

3. Consider the 2010 distribution of wealth in the U.S. represented by the Lorenz curve, $r(p) = 0.000000015(e^{18p} - 1)$. Find the Gini index for this distribution.

4. Which distribution exhibits a greater level of inequality, the 2011 world income distribution, or the 2010 distribution of wealth in the U.S.?

7.3 Exercises

1. Read UMAP Module 61, sections 2.1-2.3. https://www.dropbox.com/s/dqu92004evrst04/UMAP60.PDF?dl=0

2. From UMAP Module 60, Exercises 1, 3, 4, 5, 9(a&b)

3. Watch the talk Political Geometry: Voting districts, compactness, and ideas about fairness by Dr. Moon Duchin, Tufts University: https://vimeopro.com/vcubeinc/jmm2018/page/3. Write a 1-page summary.

4. The derivative of a function at an x-value, $f'(x)$, describes what feature of the graph of $f$ at $x$?

5. Suppose your position (distance from your home), $y = f(t)$, with respect to time $t$ is given by the following graph:

Assume you were leaving work 2 miles from home at time $t = 0$.
(a) How fast were you traveling in the first 3 minutes?
(b) What direction were you traveling in the first 3 minutes?
(c) At what time did you pass by your home?
(d) How fast were you traveling between minutes 3 and 5?
(e) Find $f'(1)$.
(f) Find $f'(2.5)$.
(g) Find $f'(t)$ for $3 < t < 5$.

6.
(a) If $f(x) = -142$ (for all $x$), find $f'(x)$.
(b) If $g(x) = -5x + 62$, what is the derivative of $g$?
(c) Differentiate $4t^7$ (with respect to $t$).
(d) Find $\frac{dy}{dx}$ for the function $y = \frac{1}{2}x + \frac{1}{x^2}$.
(e) If $f(x) = 5e^x + x$, what is $f'(x)$? What is $f'(2)$?
(f) Find $\frac{d}{dx}(3\sin x + \ln x)$.

7. From UMAP Module 61, Exercises 1(a&d), 2, 5, 11(a&b)

8. From UMAP Module 61, Model Exam (section 2.8): 4 (just find the ESC)

9. Read UMAP Module 62, sections 3.1-3.4. https://www.dropbox.com/s/dqu92004evrst04/UMAP60.PDF?dl=0

10. (Refer to Exercise 6(c).) Find an antiderivative of $28t^6$.

11. Find $\int 28t^6\,dt$.

12. (Refer to Exercise 6(f).) Find an antiderivative of $3\cos x + \frac{1}{x}$.

13. Find $\int 3\cos x + \frac{1}{x}\,dx$.

14. Find the indefinite integral of $f(x) = 2x + \cos x$.

15. Find $f$ so that $f'(x) = \sqrt{x}$ and $f(1) = \frac{5}{3}$.

16.
(a) Evaluate $\int_{-1}^{3} 3\,dx$.
(b) Calculate $\int_0^1 f(x)\,dx$ if $f(x) = 2x$.
(c) Find $\int_2^4 \frac{1}{2}t + 1\,dt$.

17.
(a) $\int_0^1 x^6\,dx$

(b) $\int_0^1 x - x^6\,dx$
(c) $\int_{\pi/2}^{\pi} \cos t\,dt$
(d) $\int_0^1 e^x\,dx$
(e) $\int_1^2 \frac{1}{x} + x\,dx$

18. From UMAP Module 62, Exercises 1, 2, 4

19. From UMAP Module 62, Model Exam (section 3.8): 2, 4 (just find the Gini index for #4)
