Course Introduction: Data Types, R and RStudio

Published

April 14, 2026

Overview

This week we get our first hands-on experience with R and RStudio. We will look at the different types of data you will encounter in data science, set up our working environment, and take our first steps with R as a tool for data analysis.

Readings

R for Non-Programmers

Slides

Code demo: UN Votes

This is a taste of what R can do. The demo below steps through building a ggplot visualisation of UN General Assembly voting patterns layer by layer — each slide adds one line of code and shows you the result.

Open the interactive flipbook

Don’t worry about understanding every line yet — just enjoy seeing what R can do. When you are ready to run it yourself, download the R script below and try tweaking the countries or issues.

## One step install and load ----
if (!require(pacman)) install.packages("pacman")
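# lubridate supplies year(); it is listed explicitly in case an older tidyverse does not attach it
pacman::p_load(tidyverse, lubridate, scales, unvotes)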

## Wrangle the data ----
us_uk_turkey_votes <- un_votes |>
  filter(country %in% c("United States", "United Kingdom", "Turkey")) |>
  inner_join(un_roll_calls, by = "rcid", relationship = "many-to-many") |>
  inner_join(un_roll_call_issues, by = "rcid", relationship = "many-to-many") |>
  mutate(year = year(date)) |>
  group_by(country, year, issue) |>
  summarize(
    percent_yes = mean(as.character(vote) == "yes"),
    .groups = "drop"
  )

## Plot the result ----
us_uk_turkey_votes |>
  ggplot() +
  aes(x = year, y = percent_yes, color = country) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "loess", se = FALSE) +
  facet_wrap(~issue) +
  labs(
    title = "Percentage of 'Yes' votes in the UN General Assembly",
    subtitle = "1946 to 2015",
    y = "% Yes",
    x = "Year",
    color = "Country"
  ) +
  scale_y_continuous(labels = label_percent())

Download the R script

Code demo: World Telephones

Here is a second example that shows how R can be used to explore and visualise a dataset in just a few steps. This one uses a dataset that ships with base R — no packages to install first.

The goal is to visualise how the number of telephones changed across world regions between 1951 and 1961. Along the way you will see two common plot types, a scatter plot and a stacked bar chart, and notice how the same data can tell very different stories depending on how it is displayed.

Again, don’t worry about understanding every line of code yet. Focus on what the output looks like and how it changes at each step.
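Before running the script, it can help to check how the dataset is stored, since that is why the script needs a tidying step later on. The lines below are a minimal sketch using only base R functions; they are not part of the downloadable script.

```r
# WorldPhones ships with base R (datasets package), so nothing needs installing.
class(WorldPhones)      # a matrix, not a data frame
dim(WorldPhones)        # rows are years, columns are regions
rownames(WorldPhones)   # the years covered; note they are not evenly spaced
colnames(WorldPhones)   # the region names
```

Because the years live in the row names rather than in a column, ggplot cannot map them to an axis until the data has been reshaped.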

## Load tidyverse ----
if (!require(pacman)) install.packages("pacman")
pacman::p_load(tidyverse)

## Inspect the data ----
# WorldPhones is a built-in R dataset — no download needed
WorldPhones

## Tidy the data ----
# WorldPhones is a matrix with years as row names and regions as columns.
# Move the years into a "Year" column, then pivot from wide to long for ggplot.
transformed <- WorldPhones |>
  as_tibble(rownames = "Year") |>
  pivot_longer(cols = -Year, names_to = "Region", values_to = "Count")

transformed

## Scatter plot ----
# Shows the absolute number of telephones per region over time.
# Year is stored as text, so the x-axis treats each year as a category.
ggplot(transformed, aes(x = Year, y = Count)) +
  geom_point(aes(color = Region)) +
  labs(
    title = "Number of telephones by world region, 1951–1961",
    x = "Year", y = "Telephones (thousands)"
  )

## Stacked bar chart ----
# Shows each region's share of all telephones in each year.
# geom_col() plots the values as given; position = "fill" rescales each bar to sum to 1.
ggplot(transformed) +
  geom_col(aes(x = Year, y = Count, fill = Region), position = "fill") +
  labs(
    title = "Share of world telephones by region, 1951–1961",
    x = "Year", y = "Proportion"
  )

What do the two plots tell you?

The scatter plot shows that North America, Europe, and Asia all grew steadily in absolute terms. The stacked bar chart reveals something the scatter plot hides: the proportional share of telephones shifted over the decade, with Asia and Africa gradually increasing their share while North America's share declined, even though its absolute count kept rising.
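The shift in shares can also be checked numerically rather than read off the chart. The following is a minimal sketch in base R, assuming only the built-in WorldPhones matrix; the object names shares and first_last are illustrative and do not appear in the course script.

```r
# WorldPhones ships with base R; its values are in thousands of telephones.
# Divide each row (year) by its total so that every row sums to 1.
shares <- prop.table(WorldPhones, margin = 1)

# Regional shares in the first and last year, as percentages
first_last <- round(100 * shares[c("1951", "1961"), ], 1)
first_last

# Change in share over the decade, in percentage points
round(100 * (shares["1961", ] - shares["1951", ]), 1)
```

Positive values in the last line mark regions whose share grew over the decade, and negative values mark regions whose share shrank.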

Same data, different stories. This is one of the key ideas in data visualisation.