Assignment 5: Under (blood) pressure
Raymond Guo
2020-02-19
Exercise 1
blood_pressure %>%
  gather(Age:Pulse, key = "measurement", value = "value") %>%
  ggplot() +
  geom_point(mapping = aes(x = value, y = Systol)) +
  facet_wrap(~ measurement, scales = "free_x")
[Figure: scatterplots of Systol (y-axis) against value (x-axis), one panel per measurement: Age, Calf, Chin, Forearm, Height, Pulse, Weight, Years]
Exercise 2
The Years plot shows a negative correlation with Systol.
The variables that show a positive correlation with Systol are Forearm, Weight, Calf, and Height.
Exercise 3
blood_pressure_updated <- blood_pressure %>%
  mutate(urban_frac_life = Years / Age)
Exercise 4
systol_urban_frac_model <- lm(Systol ~ urban_frac_life, data = blood_pressure_updated)
Exercise 5
systol_urban_frac_model %>%
  tidy()
term             estimate    std.error  statistic  p.value
(Intercept)      133.49572   4.038011   33.059770  0.0000000
urban_frac_life  -15.75182   9.012962   -1.747686  0.0888139
systol_urban_frac_model %>%
  glance() %>%
  select(r.squared)
r.squared
0.0762564
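For reference, the value that glance() reports here is the usual coefficient of determination for a linear model. The symbols below are notation introduced for this note, not names from the assignment: y_i is an observed Systol value, \hat{y}_i is the model's prediction for that observation, and \bar{y} is the mean of the observed values.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

A value near 0 means the model explains little of the variation in the response, and a value near 1 means it explains most of it.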
Exercise 6
systol_urban_frac_df <- blood_pressure_updated %>%
  add_predictions(systol_urban_frac_model) %>%
  add_residuals(systol_urban_frac_model)
i. The column that holds the predicted response values is pred. (The observed response is still in Systol.)
ii. The column that holds the residuals is resid.
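The relationship between these two columns can be checked directly. The sketch below is only an illustration: it uses R's built-in mtcars data as a stand-in because the blood_pressure data is not reproduced here, and the names toy_model and toy_df are hypothetical. It assumes the modelr and dplyr packages are installed.

```r
library(dplyr)   # provides the %>% pipe
library(modelr)  # provides add_predictions() and add_residuals()

# Stand-in data: the built-in mtcars data set, NOT the blood_pressure data.
toy_model <- lm(mpg ~ wt, data = mtcars)

# Same pattern as in the exercise: append pred and resid columns.
toy_df <- mtcars %>%
  add_predictions(toy_model) %>%
  add_residuals(toy_model)

# Each residual should equal the observed response minus the prediction.
resid_matches <- isTRUE(all.equal(
  unname(toy_df$resid),
  unname(toy_df$mpg - toy_df$pred)
))
print(resid_matches)
```

The same check on the assignment's data would compare resid with Systol - pred.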
Exercise 7
The model is reliable only if the response variable (Systol) has a linear relationship with the
explanatory variable (urban_frac_life).
ggplot(systol_urban_frac_df) +
geom_point(mapping = aes(x = urban_frac_life, y = Systol)) +
geom_abline(slope = systol_urban_frac_model$coefficients[2], intercept = systol_urban_frac_mo
[Figure: scatterplot of Systol (y-axis) against urban_frac_life (x-axis) with the fitted line]
Exercise 8
ggplot(systol_urban_frac_df) +
  geom_point(mapping = aes(pred, Systol)) +
  geom_abline(
    slope = 1,
    intercept = 0,
    color = "red",
    size = 1
  )
[Figure: scatterplot of Systol (y-axis) against pred (x-axis) with a red reference line of slope 1 and intercept 0]
ggplot(systol_urban_frac_df) +
  geom_point(aes(pred, resid)) +
  geom_ref_line(h = 0)
[Figure: scatterplot of resid (y-axis) against pred (x-axis) with a horizontal reference line at 0]
i. The plots suggest the linearity condition was not violated. The points show no sign of a curve.
ii. The plots suggest the constant variability condition was not violated. The points are spread
about evenly above and below the line.
Exercise 9
ggplot(data = systol_urban_frac_df) +
  geom_histogram(
    mapping = aes(x = resid), binwidth = 5
  )
[Figure: histogram of resid (x-axis) with count on the y-axis]
i. The distribution is strongly right-skewed, and its center is around a resid value of 5.
ii. The skewed shape violates the nearly normal residuals condition because there is a
disproportionate number of negative residuals compared to positive ones.
Exercise 10
ggplot(data = systol_urban_frac_df) +
  geom_qq(mapping = aes(sample = resid)) +
  geom_qq_line(mapping = aes(sample = resid))
[Figure: normal Q-Q plot of resid, with sample quantiles on the y-axis and theoretical quantiles on the x-axis]
This graph clearly shows a violation of the nearly normal residuals condition. More points fall
above the reference line than below it, which is consistent with the right-skewed shape of the
histogram.
Exercise 11
systol_weight_model <- lm(Systol ~ Weight, data = blood_pressure_updated)
systol_weight_model %>%
  glance() %>%
  select(r.squared)
r.squared
0.2718207
Yes, because its r.squared (0.272) is closer to 1 than that of the urban_frac_life model (0.076).
# Add predictions and residuals from the Weight model
systol_weight_df <- blood_pressure_updated %>%
  add_predictions(systol_weight_model) %>%
  add_residuals(systol_weight_model)
ggplot(systol_weight_df) +
  geom_point(mapping = aes(pred, Systol)) +
  geom_abline(
    slope = 1,
    intercept = 0,
    color = "red",
    size = 1
  )
[Figure: scatterplot of Systol (y-axis) against pred (x-axis) with a red reference line of slope 1 and intercept 0]
ggplot(systol_weight_df) +
  geom_point(aes(pred, resid)) +
  geom_ref_line(h = 0)
[Figure: scatterplot of resid (y-axis) against pred (x-axis) with a horizontal reference line at 0]
For the first condition, there is a linear relationship between pred and Systol. For the second
condition, there is a single outlier on the graph, which might distort the bell-shaped curve
slightly, but this is not much of a violation. For the third condition, the points are spread evenly
around h = 0. All three conditions are met, so the new model is reliable.
Exercise 12
systol_combo_model <- lm(Systol ~ urban_frac_life + Weight, data = blood_pressure_updated)
# Add predictions and residuals from the combined model
systol_combo_model_df <- blood_pressure_updated %>%
  add_predictions(systol_combo_model) %>%
  add_residuals(systol_combo_model)
ggplot(systol_combo_model_df) +
  geom_point(mapping = aes(pred, Systol)) +
  geom_abline(
    slope = 1,
    intercept = 0,
    color = "red",
    size = 1
  )
[Figure: scatterplot of Systol (y-axis) against pred (x-axis) with a red reference line]
ggplot(systol_combo_model_df) +
  geom_point(aes(pred, resid)) +
  geom_ref_line(h = 0)
[Figure: scatterplot of resid (y-axis) against pred (x-axis) with a horizontal reference line at 0]
For the first condition, there is a linear relationship between pred and Systol. For the second
condition, there is a single outlier on the graph, which might distort the bell-shaped curve
slightly, but this is not much of a violation. For the third condition, the points are spread evenly
around h = 0. All three conditions are met, so the new model is reliable.
systol_combo_model %>%
  glance() %>%
  select(r.squared)
r.squared
0.4731078
This multi-variable model performed better because its r.squared (0.473) is closer to 1 than that
of either single-variable model (0.076 and 0.272).
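One caveat on this comparison: in ordinary least squares, r.squared never decreases when a predictor is added, so a model with two predictors will always score at least as high as either model nested inside it. A common adjustment, which glance() also reports as adj.r.squared, penalizes the number of predictors. The symbols n (number of observations) and k (number of predictors) are notation introduced here, not taken from the assignment.

```latex
R^2_{\text{adj}} = 1 - \left(1 - R^2\right) \frac{n - 1}{n - k - 1}
```

Comparing adj.r.squared across the three models would give a fairer picture of whether the extra predictor is worth including.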