PS06: Multiple Regression
Overview
Practice multiple regression with both numerical and categorical predictors — including parallel slopes models, interaction models, model comparison using R², and residual analysis. The problem set has two parts: a guided analysis of US state income data, and an independent analysis of vole habitat preferences.
Read Chapter 6 of ModernDive before attempting this problem set.
Download
Download the problem set template, open it in RStudio, and complete the exercises directly in the document.
Setup
Run this at the top of your document to install and load the required packages:
if (!require(pacman)) install.packages("pacman")
pacman::p_load(ggplot2, dplyr, moderndive, readr)
Exercises
Part 1: Income, education, and urbanization
Model median household income across US states using high-school education rate and urbanization level as predictors. Compare parallel slopes and interaction models.
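As a rough illustration of the comparison, the sketch below fits both model types and tabulates their R² values. It uses R's built-in iris data as a stand-in, not the problem-set data, and only base R functions. The object names (parallel_fit, interaction_fit, model_comparison) are made up for this example. In a formula, `+` gives every group the same slope, while `*` lets the slope differ by group.

```r
# Stand-in data: the built-in iris data set, NOT the problem-set data.
# Numerical predictor: Petal.Length; categorical predictor: Species.

# Parallel slopes: one shared slope, a separate intercept per group.
parallel_fit <- lm(Sepal.Length ~ Petal.Length + Species, data = iris)

# Interaction: a separate slope and a separate intercept per group.
interaction_fit <- lm(Sepal.Length ~ Petal.Length * Species, data = iris)

# Collect the number of coefficients, R-squared, and adjusted R-squared.
model_comparison <- data.frame(
  model = c("parallel slopes", "interaction"),
  n_coefficients = c(length(coef(parallel_fit)),
                     length(coef(interaction_fit))),
  r_squared = c(summary(parallel_fit)$r.squared,
                summary(interaction_fit)$r.squared),
  adj_r_squared = c(summary(parallel_fit)$adj.r.squared,
                    summary(interaction_fit)$adj.r.squared)
)
print(model_comparison)
```

Because the parallel slopes model is nested inside the interaction model, R² cannot go down when the interaction terms are added. A small gain may therefore not justify the extra coefficients, which is why adjusted R² is shown alongside it.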
Part 2: Vole habitat
Apply multiple regression independently to ecological count data, using vegetation cover and soil type to predict vole populations.
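For the residual analysis part of an independent analysis, one possible workflow is sketched below. The vole data and its column names are not shown here, so the example uses the built-in mtcars data as a stand-in, with one numerical predictor (wt) and one categorical predictor (am). The names cars, fit, and diagnostics are hypothetical, and base graphics are used only to keep the sketch self-contained.

```r
# Stand-in data: built-in mtcars, NOT the vole data.
# Recode the 0/1 transmission indicator as a labelled factor.
cars <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))

# Parallel slopes model with one numerical and one categorical predictor.
fit <- lm(mpg ~ wt + am, data = cars)

# Pair each fitted value with its residual.
diagnostics <- data.frame(
  fitted = fitted(fit),
  residual = resid(fit)
)

# Residuals vs fitted values: look for curvature or a fan shape.
plot(diagnostics$fitted, diagnostics$residual,
     xlab = "Fitted value", ylab = "Residual")
abline(h = 0, lty = 2)

# Histogram of residuals: look for strong skew or outliers.
hist(diagnostics$residual, main = "", xlab = "Residual")
```

A residual plot with no visible pattern around the dashed zero line is what a well-behaved linear model would be expected to show. Count outcomes often produce residuals whose spread grows with the fitted value, so that is worth checking for.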
Saving your plots
Save any plots you create to the figures/ folder using ggsave(). Use descriptive file names that reflect the content of the plot:
# Scatterplot of income against high-school education rate, colored by urbanization
income_plot <- ggplot(data = hate_crimes, aes(x = hs, y = income, color = urbanization)) +
  geom_point()
# Save at 8 x 4.5 inches (16:9 aspect ratio)
ggsave("figures/income-vs-hs-by-urbanization.png", plot = income_plot,
       width = 16/2, height = 9/2)
When you are done, render to HTML and submit on Moodle. Name your file PS06_yourname.html.