PS06: Multiple Regression

Published

June 1, 2026

Overview

Practice multiple regression with both numerical and categorical predictors. Topics include parallel slopes models, interaction models, model comparison using R², and residual analysis. The problem set has two parts: a guided analysis of US state income data, and an independent analysis of vole habitat preferences.

Read Chapter 6 of ModernDive before attempting this problem set.


Download

Download the problem set template, open it in RStudio, and complete the exercises directly in the document.

PS06.zip


Setup

Run this at the top of your document to install and load the required packages:

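# Install pacman only if it is not already available
if (!require(pacman)) install.packages("pacman")
# p_load() installs any missing packages, then loads them all
pacman::p_load(ggplot2, dplyr, moderndive, readr)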

Exercises

Part 1: Income, education, and urbanization

Model median household income across US states using high-school education rate and urbanization level as predictors. Compare parallel slopes and interaction models.
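As a rough sketch of how the two model types differ, the example below fits both to R's built-in iris data, which stands in for the state income data here. The dataset, the variable names, and the object names parallel_model and interaction_model are illustrative assumptions, not part of the problem set template. In a formula, `+` gives every group the same slope (parallel slopes), while `*` lets each group have its own slope (interaction).

```r
# Stand-in data: iris ships with R. It is NOT the problem set data.
# Outcome: Sepal.Length; numerical predictor: Sepal.Width;
# categorical predictor: Species (three levels).

# Parallel slopes: one shared slope, a separate intercept per group
parallel_model <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)

# Interaction: a separate intercept AND a separate slope per group
interaction_model <- lm(Sepal.Length ~ Sepal.Width * Species, data = iris)

# Proportion of variance in the outcome explained by each model
r2_parallel <- summary(parallel_model)$r.squared
r2_interaction <- summary(interaction_model)$r.squared

# Adjusted R-squared penalizes the extra coefficients
adj_r2_parallel <- summary(parallel_model)$adj.r.squared
adj_r2_interaction <- summary(interaction_model)$adj.r.squared

print(c(parallel = r2_parallel, interaction = r2_interaction))
print(c(parallel = adj_r2_parallel, interaction = adj_r2_interaction))
```

Because the parallel slopes model is nested inside the interaction model, the interaction model's R² can never be lower. The useful question is whether the gain is large enough to justify the extra terms. With the moderndive package loaded, get_regression_table() and get_regression_summaries() should report the same coefficients and fit statistics in tidy form.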

Part 2: Vole habitat

Apply multiple regression independently to ecological count data, using vegetation cover and soil type to predict vole populations.
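One possible way to inspect residuals after fitting a model is sketched below. It again uses the built-in iris data as a stand-in, since the vole data and its column names are not shown here. The names model, model_points, and residual_plot are hypothetical.

```r
library(ggplot2)

# Stand-in model on built-in data; swap in your own outcome and predictors
model <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)

# One row per observation: the fitted value and its residual
# (residual = observed outcome minus fitted value)
model_points <- data.frame(
  fitted = fitted(model),
  residual = resid(model)
)

# For a least-squares fit with an intercept, residuals average to zero
mean_residual <- mean(model_points$residual)

# Residuals against fitted values, with a dashed reference line at zero
residual_plot <- ggplot(model_points, aes(x = fitted, y = residual)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(x = "Fitted value", y = "Residual")
```

A shapeless band of points around the zero line suggests the model is a reasonable description of the data. Curvature, or a spread that widens as fitted values grow (which can happen with count outcomes), suggests it is not.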


Saving your plots

Save any plots you create to the figures/ folder using ggsave(). Use descriptive file names that reflect the content of the plot:

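# Assumes hate_crimes is already loaded, with columns hs, income, and urbanization
income_plot <- ggplot(data = hate_crimes, aes(x = hs, y = income, color = urbanization)) +
  geom_point()

# Create figures/ if it does not exist yet, then save at a 16:9 aspect ratio (8 x 4.5 inches)
dir.create("figures", showWarnings = FALSE)
ggsave("figures/income-vs-hs-by-urbanization.png", plot = income_plot,
       width = 16/2, height = 9/2)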

When you are done, render the document to HTML, name the file PS06_yourname.html, and submit it on Moodle.