Create interactive 2023 Toronto mayoral election map in R with leaflet
With the help of the internet and ChatGPT, I was able to create this interactive election map showing the top five candidates with the most votes in each ward. It took longer than expected, but it was fun to learn new things, such as working with JSON in R and exploring libraries like sf and leaflet. I will include the resources that helped me in the reference section.
The data comes from the City of Toronto Open Data portal. Initially, I planned to analyze opinion polls, but the aggregated data from polling firms lacked the level of detail I needed. So instead I used the unofficial by-election results from Open Data Toronto, which provided more information and allowed me to create an election map similar to those shown on local news. Although the Toronto municipal election is a direct election, it’s still interesting to see voter preferences in different areas.
This is my first time creating a map, and these notes are based on my understanding. There may be better or more efficient ways to accomplish this. If you notice any errors or have other ideas, please feel free to leave a comment :)
In this exercise, I used the following packages.
library(opendatatoronto)
library(dplyr)
library(tidyr)
library(sf)
library(leaflet)
library(htmltools)
Importing data from Open Data Toronto
The unofficial by-election data was published by Open Data Toronto and can be found here. JSON files were available for direct download, and there is even an R package, opendatatoronto, for downloading the data, which I used.
The data package contained two JSON files, but I only needed the second one. To import the data, I used tail(1) to select the last (second) file and get_resource() to download it. The imported JSON file was a list of three lists, so I converted it to a data frame using the base R function as.data.frame().
# Import from Open Data Toronto and convert to data frame
elections <- list_package_resources("6b1a2631-9b12-4242-a76a-1a707b5c00e4") %>%
  tail(1) %>%
  get_resource() %>%
  as.data.frame()
Flattening and cleaning the data
The output included a nested list in the column office.candidate.ward, which is common for JSON files. To flatten it into regular columns, I used tidyr::unnest to expand both rows and columns.
A few more things I did during the data cleaning process:
- Removed unnecessary columns
- Renamed the name and num variables to wardName and wardNum respectively, as their names appeared confusing after flattening the list-column
- Converted all variables to numeric data type, except for the candidate name and ward name
# Flatten the list-column into regular columns
elections <- elections %>%
  unnest(office.candidate.ward)
# Remove unnecessary columns and rename ward related columns
elections <- elections %>%
  select(7:15) %>%
  rename(wardName = name,
         wardNum = num)
# Change data types to numeric except candidate and ward
elections <- elections %>%
  mutate(across(-c(office.candidate.name, wardName), as.numeric))
After cleaning the data, I used str() to verify the data structure.
# Verify the updated data types
str(elections)
tibble [2,550 × 9] (S3: tbl_df/tbl/data.frame)
$ office.candidate.name : chr [1:2550] "Olivia Chow" "Olivia Chow" "Olivia Chow" "Olivia Chow" ...
$ office.candidate.votesReceived: num [1:2550] 269372 269372 269372 269372 269372 ...
$ wardName : chr [1:2550] "Etobicoke North" "Etobicoke Centre" "Etobicoke-Lakeshore" "Parkdale-High Park" ...
$ wardNum : num [1:2550] 1 2 3 4 5 6 7 8 9 10 ...
$ polls : num [1:2550] 52 67 82 65 62 52 46 62 54 90 ...
$ pollsReceived : num [1:2550] 52 67 82 65 62 52 46 62 54 90 ...
$ totalVoters : num [1:2550] 70378 89293 103271 82195 76398 ...
$ votesCounted : num [1:2550] 17822 37925 42189 39170 25024 ...
$ votesReceived : num [1:2550] 4972 8049 12424 19569 7399 ...
Now the data frame looks much cleaner and is ready for some fun data wrangling!
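To see what unnest() does on a small scale, here is a minimal sketch with made-up data. The toy tibble, its candidates, and its vote counts are hypothetical and only mimic the shape of the real results: each candidate row holds a nested data frame of ward results, and unnest() expands it into one row per candidate and ward.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one row per candidate, ward results nested in a list-column
toy <- tibble(
  candidate = c("A", "B"),
  ward = list(
    tibble(name = c("Ward 1", "Ward 2"), votesReceived = c("10", "20")),
    tibble(name = c("Ward 1", "Ward 2"), votesReceived = c("5", "30"))
  )
)

# Expand the list-column, rename the ward column, and convert votes to numeric
flat <- toy %>%
  unnest(ward) %>%
  rename(wardName = name) %>%
  mutate(across(-c(candidate, wardName), as.numeric))

flat
```

The result has four rows (two candidates times two wards), which is the same candidate-by-ward layout the real data frame ends up in.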
Data wrangling to prepare for visualization
Obviously, it is unrealistic to include all 102 candidates on the map. Instead, I wanted to show the top five candidates with the most votes in each ward. This can be achieved using group_by() and slice_max(). I first grouped the data by ward and candidate to create a variable ward_votes that sums the number of votes each candidate received in each ward, so that I could use it later for the visuals. Then, I grouped the data by ward again and used slice_max() to select the top five entries within each ward group.
# Filter the top five candidates
top_candidates <- elections %>%
  group_by(wardName, office.candidate.name) %>%
  mutate(ward_votes = sum(votesReceived)) %>%
  group_by(wardName) %>%
  slice_max(ward_votes, n = 5)
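One thing worth knowing about slice_max() is that it keeps ties by default, so a group can return more than n rows. Here is a small sketch with made-up numbers (the toy wards, candidates, and votes are hypothetical) showing how with_ties = FALSE guarantees exactly n rows per group.

```r
library(dplyr)

# Hypothetical toy data: in Ward 2, candidates B and C are tied
toy_votes <- tibble(
  wardName  = c("Ward 1", "Ward 1", "Ward 1", "Ward 2", "Ward 2", "Ward 2"),
  candidate = c("A", "B", "C", "A", "B", "C"),
  votes     = c(10, 30, 20, 5, 1, 1)
)

# Default behaviour: ties are kept, so Ward 2 returns three rows for n = 2
top_with_ties <- toy_votes %>%
  group_by(wardName) %>%
  slice_max(votes, n = 2) %>%
  ungroup()

# with_ties = FALSE returns exactly two rows per ward
top_two <- toy_votes %>%
  group_by(wardName) %>%
  slice_max(votes, n = 2, with_ties = FALSE) %>%
  ungroup()
```

In the real results a tie among the top five is unlikely, so the default was fine for my map, but the option is handy if the row count matters.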
I also wanted to know the candidate who received the most votes in each ward, so that I could fill each ward with a colour representing that candidate. Similarly to the previous step, I used slice_max() to find the winner.
# Get the winner for each ward
ward_winner <- top_candidates %>%
  group_by(wardName) %>%
  slice_max(ward_votes)
# Check the number of winners
unique(ward_winner$office.candidate.name)
[1] "Olivia Chow" "Ana Bailão"
By checking the number of winners using unique(), I confirmed that either Chow or Bailão received the most votes in each of the 25 wards. I then defined their colours as purple and avocado, respectively, as they are the main colours of their websites.
Lastly, I removed all other columns except for the ward and the winner_colour, as this data frame would be merged with the main data frame later.
# Define colours for each winner
ward_winner <- ward_winner %>%
  mutate(winner_colour = if_else(
    office.candidate.name == "Ana Bailão",
    "#9dbd89",
    "#a989bd")) %>%
  select(wardName, winner_colour)
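The if_else() approach works because there are only two winners. If there were more, one option would be a named vector used as a lookup table. This is only a sketch: winner_palette and toy_winners are names I made up for illustration, and the ward names are placeholders.

```r
library(dplyr)

# Hypothetical lookup table: candidate name -> colour
winner_palette <- c(
  "Olivia Chow" = "#a989bd",
  "Ana Bailão"  = "#9dbd89"
)

# Placeholder ward names, for illustration only
toy_winners <- tibble(
  wardName = c("Ward 1", "Ward 2"),
  office.candidate.name = c("Olivia Chow", "Ana Bailão")
)

# Look up each winner's colour by name; unname() drops the vector names
coloured <- toy_winners %>%
  mutate(winner_colour = unname(winner_palette[office.candidate.name])) %>%
  select(wardName, winner_colour)
```

Adding a third winner would then only require one more entry in the palette.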
Here comes the hard part. For the text labels in the map, I wanted to display the ward name followed by the top five candidates and their corresponding votes. Needless to say, the information should be presented in multiple lines.
However, when I created the map, it didn’t process the <br> (line break) in the defined label as I expected. After some trial and error and searching online, I discovered that defining the text labels within the data frame and using lapply(names, htmltools::HTML) seemed to be the only feasible way to display the line break in a leaflet map.
Here are the steps to make it work.
- Get and arrange the candidate names. Since we have already identified the top five candidates, I simply grouped the data by ward and arranged the votes in descending order.
- Create the text labels. I created a variable called names and concatenated the candidate’s name and their votes within each group. To ensure that the ward name appears only on the first line, I used an ifelse conditional statement to identify the first row (row_number() == 1). Simply pasting the ward name with the candidates’ names wouldn’t work, as the ward name would appear on each line.
- Fine-tune the labels. I added HTML tags such as <b> (bold) and <br> (line break) to improve the aesthetics. As mentioned earlier, we need to apply htmltools::HTML for leaflet to process the HTML tags in the map.
- Keep only what is needed. Similar to the ward_winner data frame, this data frame will also be merged later, so I only kept the ward name and the text label column. It also makes sense to remove the duplicate rows using distinct(), because the information is only meaningful at the ward level.
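# Build one HTML label per ward
names_label <- top_candidates %>%
  group_by(wardName) %>%
  # Sort by votes so each ward's candidates appear in descending order
  arrange(desc(ward_votes)) %>%
  # First row of each ward: ward name in bold plus all candidates, one per line
  mutate(names = ifelse(
    row_number() == 1,
    paste("<b>", wardName, "</b><br>", paste(office.candidate.name, ":", votesReceived, collapse = "<br>")),
    paste(office.candidate.name, ":", votesReceived))) %>%
  # Mark the labels as HTML so leaflet renders the tags
  mutate(names = lapply(names, htmltools::HTML)) %>%
  # Keep only the first row (the full label) for each ward
  distinct(wardName, .keep_all = TRUE) %>%
  select(wardName, names)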
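Looking back, the same label could probably be built with summarise(), which returns one row per ward directly and avoids the ifelse() and distinct() steps. This is an alternative sketch, not what I used for the map, and the toy data below (toy_top and its values) is made up.

```r
library(dplyr)
library(htmltools)

# Hypothetical toy data standing in for top_candidates
toy_top <- tibble(
  wardName  = c("Ward 1", "Ward 1", "Ward 2", "Ward 2"),
  candidate = c("A", "B", "A", "B"),
  votes     = c(10, 30, 25, 5)
)

toy_labels <- toy_top %>%
  group_by(wardName) %>%
  # Sort candidates within each ward by votes, highest first
  arrange(desc(votes), .by_group = TRUE) %>%
  # Collapse each ward's candidates into one string separated by <br>
  summarise(lines = paste(candidate, ":", votes, collapse = "<br>"),
            .groups = "drop") %>%
  # Put the ward name in bold on the first line
  mutate(label_text = paste0("<b>", wardName, "</b><br>", lines)) %>%
  # Mark each label as HTML so leaflet renders the tags
  mutate(names = lapply(label_text, htmltools::HTML)) %>%
  select(wardName, label_text, names)

toy_labels$label_text[1]
```

The names column is a list-column of HTML objects, which is the same form the map code expects for its label argument.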
Now we can load the shapefile for the geometry. It was my first time working with shapefiles, and it turned out to be quite straightforward. The city wards data can be downloaded here from Open Data Toronto. The model is based on the 2018 election, and I believe there haven’t been any changes since then. I used sf::read_sf to load the shapefile.
to_shapes <- read_sf("Input/25-ward-model-december-2018-wgs84-latitude-longitude/WARD_WGS84.shp")
The last step of data wrangling was to create a merged data frame that included all the information I had collected. I did this by using multiple left_join() operations.
Once I had the merged data frame top_sf, I converted it to an sf object so that the geometry information could be read properly.
top_sf <- left_join(top_candidates, to_shapes, by = c("wardName" = "AREA_NAME")) %>%
  left_join(., ward_winner, by = "wardName") %>%
  left_join(., names_label, by = "wardName") %>%
  st_as_sf()
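Because the join relies on the ward names matching exactly between the election data and the shapefile, it may be worth checking for mismatches first. Here is a minimal sketch using anti_join() on made-up data; the mismatch below is invented for illustration and does not come from the real files.

```r
library(dplyr)

# Hypothetical ward names from the election results
toy_results <- tibble(
  wardName = c("Etobicoke North", "Etobicoke Centre", "Toronto-Danforth")
)

# Hypothetical ward names from the shapefile, one spelled differently
toy_shapes <- tibble(
  AREA_NAME = c("Etobicoke North", "Etobicoke Centre", "Toronto Danforth")
)

# Rows in the results that have no matching ward in the shapes
unmatched <- anti_join(toy_results, toy_shapes,
                       by = c("wardName" = "AREA_NAME"))

unmatched
```

An empty result means every ward will pick up a geometry in the left_join(); any rows returned would end up with empty geometry on the map.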
Creating the interactive map with leaflet
Finally, it’s time to create the interactive map!
Before creating the map, I defined the legend to indicate the candidate who received the most votes in each ward, representing the community preferences. I could utilize the previous data frames, but I got lazy and created a 2x2 tibble for the two candidates.
# Define the legend
legends <- tibble(lg_labels = c("Olivia Chow",
                                "Ana Bailão"),
                  lg_colours = c("#a989bd",
                                 "#9dbd89"))
Creating a leaflet map is not very different from using ggplot2. The official leaflet documentation for R is not as detailed as that of ggplot2, but it still provides helpful information, and it also supports piping.
Here are the steps to create the map with leaflet:
- Set the leafletOptions() to keep the zoom level within a specified range
- Use addProviderTiles to define the tile style for the map. The complete provider set can be viewed here
- Use addPolygons to map the appearance based on the data. The fillColor will represent the colour of the winner in each ward, and the label will be the text labels we created. I also customized the polygons to make them semi-transparent with a smooth white boundary
- Add the colour legend and its title using addLegend
- Finally, print the map
# Create the interactive map
map <- leaflet(options = leafletOptions(minZoom = 10, maxZoom = 18)) %>%
  addProviderTiles("CartoDB.Positron") %>%
  addPolygons(data = top_sf, fillColor = ~winner_colour,
              fillOpacity = 0.2, color = "white", weight = 0.5, smoothFactor = 1,
              label = ~names,
              labelOptions = labelOptions(textsize = "12px")) %>%
  addLegend(position = "bottomright", colors = legends$lg_colours,
            labels = legends$lg_labels, title = "The candidate won the most votes")
# Print the map
map
Voilà! There it is!
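For anyone who wants to try the HTML label mechanism in isolation before dealing with shapefiles, here is a minimal sketch that draws a single rectangle with a multi-line label. The coordinates, the label text, and the names demo_label and demo_map are all made up for illustration; it is not a real ward.

```r
library(leaflet)
library(htmltools)

# Hypothetical label: wrapping it in HTML() lets leaflet render <b> and <br>
demo_label <- HTML("<b>Demo ward</b><br>Candidate A : 30<br>Candidate B : 10")

# A single made-up rectangle, given by its corner coordinates
demo_map <- leaflet() %>%
  addProviderTiles("CartoDB.Positron") %>%
  addPolygons(
    lng = c(-79.40, -79.36, -79.36, -79.40),
    lat = c(43.64, 43.64, 43.67, 43.67),
    fillColor = "#a989bd", fillOpacity = 0.2,
    color = "white", weight = 0.5,
    label = demo_label
  )

# Print the demo map
demo_map
```

Passing the same string without HTML() should show the tags as literal text, which is the problem I ran into at first.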
There were many steps to prepare before actually creating the map, but it was much easier to work with a clean and comprehensive data set instead of multiple data sets with different structures. I tried to incorporate multiple data sets into the map without joining them, but it didn’t work and caused some incorrect mappings.
Overall, I am very happy with the result. After all this trial and error, I hope it will be easier when the next election comes around :).
Reference
Download the source data of Toronto 2023 mayoral by-election: Open Data Dataset - City of Toronto Open Data Portal
Download the shape file for city wards: Open Data Dataset - City of Toronto Open Data Portal
Working with JSON data: Working with JSON Data
Static 2018 Toronto municipal election maps: RPubs - 2018 Toronto municipal election maps
Leaflet R documentation: Leaflet for R - Introduction
Add line breaks in leaflet label: R and Leaflet: How to arrange label text across multiple lines - Stack Overflow