9 Montreal bikes
In this chapter we will explore several data visualizations of the Montreal bike data set.
Chapter outline:
- We begin with some static data visualizations.
- We create an interactive visualization of accident frequency over time.
- We create an interactive data viz with four plots, showing monthly trends of counts and accidents, daily details, date ranges for each data source, and a map of counter locations.
9.1 Static figures
We begin by loading the montreal.bikes data set. To save space on CRAN, this data set is not included in the CRAN release of animint2, so you will need to install animint2 from GitHub to access it:
tryCatch({
  data(montreal.bikes, package="animint2")
}, warning=function(w){
  # data() warns when the data set is missing, which triggers the GitHub install
  remotes::install_github("animint/animint2")
})
The data are two time series:
- montreal.bikes$counter.counts are daily counts of bikers from counting machines, with one row per combination of location and day.
- montreal.bikes$accidents has one row per accident.
We will compute monthly summaries of these two time series.
9.1.1 Counters
We begin by examining the data table of counts.
month_str <- function(POSIXct)strftime(POSIXct, "%Y-%m")
library(data.table)
data(montreal.bikes, package="animint2")#works if installed from github
(counts_dt <- data.table(montreal.bikes$counter.counts)[, .(
  location, date, month.str=month_str(date), count)])
            location                date month.str count
    1:         Berri 2009-01-01 05:00:00   2009-01    29
    2:         Berri 2009-01-02 05:00:00   2009-01    19
   ---
13382: Totem_Laurier 2013-09-17 04:00:00   2013-09  3745
13383: Totem_Laurier 2013-09-18 04:00:00   2013-09  3921
Above we see one row for each combination of location and date. The bike counts are time series data which we visualize below.
counts_dt[, loc.lines := gsub("[- _]", "\n", location)]
library(animint2)
ggplot()+
theme_bw()+
theme(panel.margin=grid::unit(0, "lines"))+
facet_grid(loc.lines ~ .)+
geom_point(aes(
date, count, color=count==0),
data=counts_dt)+
scale_color_manual(values=c("TRUE"="grey", "FALSE"="black"))
Warning: Removed 407 rows containing missing values (geom_point).

In the figure above, we clearly see the seasonal regularity (fewer bikers in winter). It is also easy to see the difference between zeros and missing values.
9.1.2 Accidents
Next we examine one row from the accidents data table.
montreal.bikes$accidents[1,]
    date.str time.str deaths people.severely.injured people.slightly.injured
1 2012-01-02    18:35      0                       0                       1
  street.number             street cross.street location.int position.int
1            NA ST JEAN BAPTISTE O   AV ROULEAU           32            6
             position                             location
1 Voie de circulation En intersection (moins de 5 mètres)
Each accident has data about its date, time, location, and counts of death and slight/severe injury. Some of the values are in French (for example, Voie de circulation, En intersection, etc). For the injury count columns, we create abbreviated column names using the code below.
severity <- c(
deaths="deaths",
severe="people.severely.injured",
slight="people.slightly.injured")
montreal.bikes$accidents[, names(severity)] <-
montreal.bikes$accidents[, severity]
accidents_dt <- data.table(montreal.bikes$accidents[, c(
"date.str", "time.str", names(severity),
"street", "street.number", "cross.street")])
In the code below, we add a column for the month.
ymd2POSIXct <- function(date.str){
as.POSIXct(strptime(date.str, "%Y-%m-%d"))
}
(accidents_dt[
, date := ymd2POSIXct(date.str)
][
, month.str := month_str(date)
][])
        date.str time.str deaths severe slight             street street.number
   1: 2012-01-02    18:35      0      0      1 ST JEAN BAPTISTE O            NA
   2: 2012-01-05    21:50      0      0      1             FOSTER            NA
  ---
5594: 2014-12-27    12:35      0      0      1   CH DES PATRIOTES            NA
5595: 2014-12-30    11:55      0      0      1     PIERREFONDS BD         14965
        cross.street       date month.str
   1:     AV ROULEAU 2012-01-02   2012-01
   2:        JANELLE 2012-01-05   2012-01
  ---
5594:        1RE RUE 2014-12-27   2014-12
5595: JACQUES BIZARD 2014-12-30   2014-12
In the output above, we see that the last months for the accidents are not the same as for the counter data. We compare the time intervals using the code below:
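The comparison can be sketched as follows. This is a minimal, self-contained version rather than the chapter's exact chunk: the two toy vectors are assumed values standing in for accidents_dt$month.str and counts_dt$month.str, and month.range.mat is a hypothetical name.

```r
# Toy month strings (assumed values), standing in for the two month.str columns.
accident.months <- c("2012-01", "2013-06", "2014-12")
count.months <- c("2009-01", "2011-03", "2013-09")
# range() on "YYYY-MM" strings orders them chronologically,
# because the most significant field (the year) comes first.
month.range.mat <- sapply(
  list(accidents=accident.months, counts=count.months),
  range)
month.range.mat
```

Each column of the result holds the first and last month of one data set, which is the layout of the matrix printed next for the full data.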
accidents counts
[1,] "2012-01" "2009-01"
[2,] "2014-12" "2013-09"
The output above shows that the accident time series starts and ends later than the counter time series. Next, to compute a summary for each month, we begin by computing the unique months covered by the two data sets.
uniq.month.vec <- unique(c(
accidents_dt$month.str,
counts_dt$month.str))
month_01 <- function(mois)ymd2POSIXct(paste0(mois, "-01"))
month_dt <- data.table(month.01 = month_01(uniq.month.vec))
The code below defines the locale so that we can be sure to have the month names in English.
# despite its name, old.locale holds the newly set locale,
# because Sys.setlocale() returns the locale in use after the change
old.locale <- Sys.setlocale(locale="en_US.UTF-8")
month_english <- function(POSIXct)strftime(POSIXct, "%B %Y")
month_dt[, month.english := month_english(month.01)][]
      month.01 month.english
 1: 2012-01-01  January 2012
 2: 2012-02-01 February 2012
---
71: 2011-11-01 November 2011
72: 2011-12-01 December 2011
The output above contains a row for each month. Note that we have created the month.english variable which will be used for month selection. In the code below, we compute the total number of accidents of each type per month.
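A sketch of this aggregation, on a three-row toy table, sums the severity columns within each month. The values are assumed, toy_accidents and toy_per_month are hypothetical names, and this is one possible way to compute the totals rather than necessarily the chapter's own chunk.

```r
library(data.table)
# Toy accident table (assumed values) with the chapter's column names.
toy_accidents <- data.table(
  month.str=c("2012-01", "2012-01", "2012-02"),
  deaths=c(1L, 0L, 0L),
  severe=c(0L, 0L, 2L),
  slight=c(0L, 3L, 1L))
# Sum each severity column within each month.
toy_per_month <- toy_accidents[
  , lapply(.SD, sum)
  , by=month.str
  , .SDcols=c("deaths", "severe", "slight")]
toy_per_month
```

The result has one row per month and one column per severity level, like the table printed next.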
month.str deaths severe slight
1: 2012-01 1 0 10
2: 2012-02 0 0 20
---
35: 2014-11 1 2 69
36: 2014-12 0 0 10
The result above shows one row per month, with different columns for each level of severity. We reshape those columns into long format using the code below.
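The reshape can be sketched with data.table's melt function. The toy table below uses assumed values, and toy_wide and toy_tall are hypothetical names; the column names severity and people follow the output printed next.

```r
library(data.table)
# Toy wide table (assumed values): one row per month, one column per severity.
toy_wide <- data.table(
  month.str=c("2012-01", "2012-02"),
  deaths=c(1L, 0L),
  severe=c(0L, 2L),
  slight=c(3L, 1L))
# Stack the three severity columns into (severity, people) pairs.
toy_tall <- melt(
  toy_wide,
  id.vars="month.str",
  measure.vars=c("deaths", "severe", "slight"),
  variable.name="severity",
  value.name="people")
toy_tall
```

With two months and three severity levels the tall table has six rows, and severity is a factor whose levels follow the order of measure.vars.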
month.str severity people
1: 2012-01 deaths 1
2: 2012-02 deaths 0
---
107: 2014-11 slight 69
108: 2014-12 slight 10
Above we see one row for each combination of month and severity. In the code below, we use these data to plot the number of accidents in each month.
severity.colors <- c(
slight="#FEE0D2",#light red
severe="#FB6A4A",
deaths="#A50F15")#dark red
ggplot()+
theme_bw()+
geom_bar(aes(
month_01(month.str), people, fill=severity),
stat="identity",
data=accidents.tall)+
scale_fill_manual(
values=severity.colors, breaks=names(severity.colors))+
scale_x_datetime("month")
The figure above shows the monthly number of people killed or injured. Slight injuries are the most frequent outcome, and deaths are the least frequent.
9.2 Interactive viz of accident frequency
In this section, we want to compare, for each month, the data for counters and accidents. Do we have more accidents when there are more bikes on the road? To check, we will fit a regression model, where each observation is a month with both kinds of data. First, we compute a summary of counts per month using the code below.
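One way to compute such a summary is a grouped aggregation, sketched below on a toy table. The values are assumed, toy_counts and toy_count_summary are hypothetical names, and the three output column names are taken from the table printed next.

```r
library(data.table)
# Toy daily counts (assumed values) with the chapter's column names.
toy_counts <- data.table(
  location=c("Berri", "Berri", "Berri", "Rachel"),
  month.str=c("2009-01", "2009-01", "2009-02", "2009-01"),
  count=c(29, 19, 40, 10))
# Number of days, mean count per day, and total count, for each location and month.
toy_count_summary <- toy_counts[, .(
  count_length=.N,
  count_mean=mean(count),
  count_sum=sum(count)
), by=.(location, month.str)]
toy_count_summary
```

Note that .N counts every row in the group, including rows whose count is missing, so count_length is the number of days with a record rather than the number of non-missing counts.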
location month.str count_length count_mean count_sum
1: Berri 2009-01 31 100.3226 3110
2: Berri 2009-02 28 159.6786 4471
---
441: Totem_Laurier 2013-08 31 3162.7097 98044
442: Totem_Laurier 2013-09 18 2888.7778 51998
The output above has one row per combination of location and month, with columns for:
-
count_length: the number of days. -
count_mean: the mean count per day. -
count_sum: the total number of counts in that month.
We can see that some months have missing days. For example, there are only 18 days for Totem_Laurier in September 2013. To model only whole months, we would like to remove months with missing days. Therefore, we use the code below to compute the number of days in each month:
one.day <- 60 * 60 * 24
next_month <- function(POSIXct)month_01(POSIXct + one.day * 31)
counts.per.month[, days.in.month := as.integer(round(difftime(
month_01(month_str(next_month(month_01(month.str)))),
month_01(month.str),
units="days")))][]
          location month.str count_length count_mean count_sum days.in.month
  1:         Berri   2009-01           31   100.3226      3110            31
  2:         Berri   2009-02           28   159.6786      4471            28
 ---
441: Totem_Laurier   2013-08           31  3162.7097     98044            31
442: Totem_Laurier   2013-09           18  2888.7778     51998            30
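The date arithmetic behind days.in.month can be checked in isolation. The sketch below uses Date objects instead of the POSIXct values used above (a swapped-in simplification that sidesteps time zones), and days_in_month is a hypothetical helper name.

```r
# Number of days in each month given as "YYYY-MM" (hypothetical helper).
days_in_month <- function(month.str){
  first <- as.Date(paste0(month.str, "-01"))
  # Adding 31 days to the first of a month always lands in the next month.
  later <- first + 31
  # Truncate back to the first day of that next month.
  next.first <- as.Date(paste0(format(later, "%Y-%m"), "-01"))
  as.integer(next.first - first)
}
days_in_month(c("2009-01", "2009-02", "2012-02", "2013-09"))
```

The idea is the same as in the chapter's code: jump past the end of the month, snap back to the first of the following month, and take the difference in days.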
The output above contains the new column days.in.month. We use the code below to print only the months with missing days:
counts.per.month[
count_length < days.in.month,
.(location, month.str, count_length, days.in.month)]
         location month.str count_length days.in.month
 1:         Berri   2009-04           29            30
 2:         Berri   2011-11            3            30
---
22:        Rachel   2013-09           18            30
23: Totem_Laurier   2013-09           18            30
The data shown above are excluded from regression analysis using the code below.
complete.months <- counts.per.month[count_length == days.in.month]
Next, we make a table with counts and accidents in different columns:
city.wide.complete <- complete.months[count_sum>0, .(
locations=.N,
total.counts=sum(count_sum)
), keyby=month.str]
city.wide.accidents <- accidents_dt[, .(
total.accidents=.N
), keyby=month.str]
(scatter.not.na <- city.wide.accidents[
city.wide.complete, nomatch=0L
][, month.01 := month_01(month.str)][])
    month.str total.accidents locations total.counts   month.01
 1:   2012-01              11         7        20386 2012-01-01
 2:   2012-02              19         7        26727 2012-02-01
---
17:   2013-07             315         8       916662 2013-07-01
18:   2013-08             326         8       856066 2013-08-01
The output above shows one row per month with both count and accident data. Next, we fit a linear model which uses counts to predict accidents.
(fit <- lm(total.accidents ~ total.counts - 1, scatter.not.na))
Call:
lm(formula = total.accidents ~ total.counts - 1, data = scatter.not.na)
Coefficients:
total.counts
0.0003723
scatter.not.na[, mean(total.accidents/total.counts)]
[1] 0.0003847625
[1] 0.0003693805
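For a model without intercept, the least squares slope has the closed form sum(x*y)/sum(x^2), which is why it lands close to simpler per-cyclist rates. The sketch below compares these quantities on assumed toy totals (not the Montreal data); toy and the slope.* names are hypothetical.

```r
# Toy monthly totals (assumed values, not the Montreal data).
toy <- data.frame(
  total.counts=c(20000, 100000, 500000, 900000),
  total.accidents=c(10, 40, 190, 330))
# Linear model through the origin, as in the chapter.
toy.fit <- lm(total.accidents ~ total.counts - 1, toy)
slope.lm <- unname(coef(toy.fit))
# Closed form for least squares through the origin.
slope.closed <- with(toy, sum(total.counts*total.accidents)/sum(total.counts^2))
# Two simpler estimates of accidents per counted cyclist.
mean.of.ratios <- with(toy, mean(total.accidents/total.counts))
ratio.of.sums <- with(toy, sum(total.accidents)/sum(total.counts))
c(lm=slope.lm, closed=slope.closed,
  mean.of.ratios=mean.of.ratios, ratio.of.sums=ratio.of.sums)
```

The first two values agree up to numerical precision. The other two weight the months differently (equally for the mean of ratios, by count for the ratio of sums), so they are close to the slope but not identical.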
The output above shows that the estimated linear model coefficient is similar to the estimated empirical means. Finally, we use the code below to create an interactive graphic.
scatter.not.na[, let(
pred.accidents = predict(fit),
month.english = month_english(month.01)
)]
animint(
regression=ggplot()+
theme_bw()+
ggtitle("Numbers of accidents and cyclists")+
geom_line(aes(
total.counts, pred.accidents),
color="grey",
data=scatter.not.na)+
geom_point(aes(
total.counts, total.accidents),
clickSelects="month.english",
size=5,
alpha=0.75,
data=scatter.not.na)+
ylab("Total bike accidents (all Montreal locations)")+
xlab("Total cyclists (all Montreal locations)"),
timeSeries=ggplot()+
theme_bw()+
ggtitle("Time series of accident frequency")+
xlab("Month")+
geom_point(aes(
month.01, total.accidents/total.counts),
clickSelects="month.english",
size=5,
alpha=0.75,
data=scatter.not.na))
The data viz above contains two linked plots of city-wide accident frequency. Clicking a point in either plot selects that month in both.
- The first plot shows that the number of accidents grows with the number of cyclists.
- The second plot shows the frequency of accidents over time.
9.3 Interactive viz with map and details
In this section, we will create a visualization with several linked plots:
- Summary of counters: map of counters or min/max dates for each counter, to select a counter.
- Details of a counter, summary of months: time series of monthly totals for counters and accidents. Click to select a month.
- Details of a counter and a month: time series of daily totals, for the selected month.
9.3.1 Counter summary with map
Before examining the data table of counter locations, we first convert the name variable to unicode strings:
(counter.locations <- data.table(montreal.bikes$counter.locations)[, .(
lon = coord_X, lat = coord_Y,
nom_comptage=iconv(nom_comptage, "latin1", "UTF-8"))])
          lon      lat              nom_comptage
 1: -73.58888 45.51955              Saint-Urbain
 2: -73.57398 45.52741                   Brebeuf
---
20: -73.58221 45.51370          Parc U-Zelt Test
21: -73.60311 45.52782 Saint-Laurent U-Zelt Test
In the output above, we see that the nom_comptage column indicates the location of the counter, but the values are not exactly the same as in the location column in the table of counts. We use the code below to establish correspondence between the names in the two tables.
loc.name.code <- c(
Berri1="Berri",
Brebeuf="Brébeuf",
CSC="Côte-Sainte-Catherine",
Maisonneuve_1="Maisonneuve 1",
Maisonneuve_2="Maisonneuve 2",
Parc="du Parc",
PierDup="Pierre-Dupuy",
"Rachel/Papineau"="Rachel",
"Saint-Urbain"="Saint-Urbain",
Totem_Laurier="Totem_Laurier")
(show.locations <- counter.locations[
, location := loc.name.code[nom_comptage]
][!is.na(location)])
          lon      lat  nom_comptage      location
 1: -73.58888 45.51955  Saint-Urbain  Saint-Urbain
 2: -73.57398 45.52741       Brebeuf       Brébeuf
---
 9: -73.58883 45.52777 Totem_Laurier Totem_Laurier
10: -73.56284 45.51613        Berri1         Berri
The output above shows the geographical position of each counter. We plot these positions on a map using the code below.
map.lim <- show.locations[, lapply(.SD, range), .SDcols=c("lat","lon")]
diff.vec <- sapply(map.lim, diff)
diff.mat <- c(-1, 1) * matrix(diff.vec, 2, 2, byrow=TRUE)
scale.mat <- as.matrix(map.lim) + diff.mat
bike.paths <- data.table(montreal.bikes$path.locations)
show.paths <- bike.paths[(
lat %between% scale.mat[, "lat"]
) & (
lon %between% scale.mat[, "lon"]
)]
(mtl.map <- ggplot()+
theme_bw()+
theme(
panel.margin=grid::unit(0, "lines"),
axis.line=element_blank(), axis.text=element_blank(),
axis.ticks=element_blank(), axis.title=element_blank(),
panel.background = element_blank(),
panel.border = element_blank())+
coord_cartesian(xlim=map.lim$lon, ylim=map.lim$lat)+
scale_x_continuous(limits=map.lim$lon)+
scale_y_continuous(limits=map.lim$lat)+
geom_path(aes(
lon, lat,
tooltip=TYPE_VOIE,
group=paste(feature.i, path.i)),
color="grey",
data=show.paths)+
geom_text(aes(
lon, lat,
label=location),
clickSelects="location",
data=show.locations))
Warning: Removed 96 rows containing missing values (geom_path).

The figure above shows a map of Montreal, with text for each of the ten counters.
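The limits used to select bike paths (scale.mat in the code above) are the bounding box of the counters, widened by its own width on each side. The sketch below reproduces that arithmetic on assumed toy coordinates, using a plain matrix in place of the data.table; toy.lim and toy.scale are hypothetical names.

```r
# Toy bounding box of counter positions (assumed values):
# row 1 holds the minima, row 2 holds the maxima.
toy.lim <- cbind(lat=c(45.51, 45.53), lon=c(-73.61, -73.56))
# Width of the box in each direction.
diff.vec <- apply(toy.lim, 2, diff)
# Each row holds the two widths; c(-1, 1) is recycled down the columns,
# so row 1 (the minima) is negated and row 2 (the maxima) stays positive.
diff.mat <- c(-1, 1) * matrix(diff.vec, 2, 2, byrow=TRUE)
# Widened box: min - width and max + width in each direction.
toy.scale <- toy.lim + diff.mat
toy.scale
```

The widened box is three times as wide as the original in each direction, so paths somewhat outside the counter area are kept and drawn up to the edge of the plot.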
9.3.2 Summary of extreme dates for each counter
In this section, we compute the min and max dates for each counter.
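A sketch of this computation on a toy table is shown below. The values are assumed, toy_months and toy_ranges are hypothetical names, and a grouped aggregation is used as one possible approach; the column names month.01_min and month.01_max follow the output printed next.

```r
library(data.table)
# Toy table (assumed values): one row per location and observed month.
toy_months <- data.table(
  location=c("Berri", "Berri", "Rachel", "Rachel", "Rachel"),
  month.01=as.POSIXct(
    c("2009-01-01", "2013-09-01", "2009-05-01", "2011-02-01", "2013-09-01"),
    tz="UTC"))
# First and last observed month for each location.
toy_ranges <- toy_months[, .(
  month.01_min=min(month.01),
  month.01_max=max(month.01)
), by=location]
toy_ranges
```

Each location is reduced to a single row, which is the shape needed to draw one segment per counter.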
location month.01_min month.01_max
1: Berri 2009-01-01 2013-09-01
2: Brébeuf 2009-07-01 2010-11-01
---
9: Saint-Urbain 2009-01-01 2010-11-01
10: Totem_Laurier 2013-02-01 2013-09-01
The output above shows a row for each counter, with columns for the min and max dates observed. The plot below shows the time period that each counter was in operation.
location.colors <- c(#dput(RColorBrewer::brewer.pal(12, "Set3"))
"#8DD3C7", "grey50", "#BEBADA", "#FB8072", "#80B1D3", "#FDB462",
"#B3DE69", "#FCCDE5", "#D9D9D9", "#BC80BD", "#CCEBC5", "#FFED6F")
names(location.colors) <- show.locations$location
seg.size <- 10
(CounterRanges <- ggplot()+
theme_bw()+
xlab("min/max dates")+
ylab("data source")+
scale_color_manual(values=location.colors)+
guides(color="none")+
geom_segment(aes(
month.01_min, location,
xend=month.01_max, yend=location),
showSelected="location",
data=location.ranges,
size=seg.size+2)+
geom_segment(aes(
month.01_min, location,
xend=month.01_max, yend=location,
color=location),
clickSelects="location",
data=location.ranges,
size=seg.size))
The figure above shows a segment for each counter. With the code below, we add a segment to represent the date range of accidents.
accidents.range <- dcast(
data.table(lieu="accidents", accidents_dt),
lieu ~ .,
list(min, max),
value.var="date")
(MonthSummary <- CounterRanges+
geom_segment(aes(
date_min, lieu,
xend=date_max, yend=lieu),
color=severity.colors[["deaths"]],
data=accidents.range,
size=seg.size))
In the figure above, we see another segment (for the accidents at the bottom).
9.3.3 Monthly time series
The code below is used to plot the count data time series.
ggplot()+
theme_bw()+
geom_line(aes(
date, count, group=location),
data=counts_dt)+
scale_color_manual(values=location.colors)+
geom_point(aes(
date, count, color=location),
data=counts_dt)
Warning: Removed 407 rows containing missing values (geom_point).

The figure below visualizes the same count data, but summarized for each month.
FACET <- function(DT, facet)data.table(DT, facet)
COMPTEURS <- function(DT)FACET(DT, "counts/day")
(MonthSeries <- ggplot()+
guides(color="none")+
theme_bw()+
facet_grid(facet ~ ., scales="free")+
geom_tallrect(aes(
xmin=month.01-15*one.day, xmax=month.01+15*one.day),
clickSelects="month.english",
data=month_dt,
alpha=1/2)+
geom_line(aes(
month_01(month.str), count_mean, group=location,
color=location),
showSelected="location",
clickSelects="location",
data=COMPTEURS(counts.per.month))+
scale_color_manual(values=location.colors)+
xlab("month")+
ylab(""))
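The figure above shows a curve for each counter. The code below adds two geoms: a point for each location and month (with a tooltip), and a text label at the month with the largest mean count for each location.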
month.text <- counts.per.month[
, .SD[which.max(count_mean)]
, by=location]
(MonthText <- MonthSeries+
geom_point(aes(
month_01(month.str), count_mean, color=location,
tooltip=paste(
count_mean, "bikes at",
location, "in", month_english(month_01(month.str)))),
showSelected="location",
clickSelects="location",
size=5,
data=COMPTEURS(counts.per.month))+
geom_text(aes(
month_01(month.str), count_mean+300,
color=location, label=location),
showSelected="location",
clickSelects="location",
data=COMPTEURS(month.text)))
The code below adds the accident data in another facet.
ACCIDENTS <- function(DT)FACET(DT, "accidents")
(MonthFacet <- MonthText+
facet_grid(facet ~ ., scales="free")+
scale_fill_manual(
values=severity.colors, breaks=names(severity.colors))+
geom_bar(aes(
month_01(month.str), people,
fill=severity),
showSelected="severity",
stat="identity",
position="identity",
color=NA,
data=ACCIDENTS(accidents.tall[order(-severity)])))
The figure above shows a panel for each of the data types (accidents and counts).
9.3.4 Details for a month
The goal in this section is to create a dotplot of accidents for each month, where each dot represents one accident. Each accident records the number of people who died, along with the number who suffered severe and slight injuries. Below we classify the severity of each accident according to the worst outcome among the people affected.
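The classification rule can be sketched as a nested condition. The toy table below uses assumed values, toy_accidents is a hypothetical name, and the level order (slight, severe, deaths) is taken from the table printed next; an accident with all three counts equal to zero would fall into the slight category under this sketch, which is an assumption.

```r
library(data.table)
# Toy accidents (assumed values) with the chapter's abbreviated column names.
toy_accidents <- data.table(
  deaths=c(0L, 1L, 0L, 0L),
  severe=c(0L, 2L, 1L, 0L),
  slight=c(1L, 0L, 3L, 2L))
# Worst outcome wins: any death, else any severe injury, else slight.
toy_accidents[, severity := factor(
  fifelse(deaths > 0, "deaths", fifelse(severe > 0, "severe", "slight")),
  levels=c("slight", "severe", "deaths"))]
table(toy_accidents$severity)
```

Storing severity as a factor with ordered levels is what allows later code to sort accidents by decreasing severity.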
severity
slight severe deaths
5262 289 44
The result above shows that minor injuries are most frequent, and deaths are least frequent. In the code below, we create the accident.i variable, which serves to number the accidents in a day.
day_in_month <- function(POSIXct)as.integer(strftime(POSIXct, "%d"))
add_day_month <- function(DT)DT[, let(
day.in.month = day_in_month(date),
month.english = month_english(date))]
accidents.cumsum <- add_day_month(accidents_dt[
order(date, -severity)
][
, accident.i := seq_len(.N)
, by=date
])
ggplot()+
theme_bw()+
theme(panel.margin=grid::unit(0, "cm"))+
facet_wrap("month.str")+
scale_fill_manual(values=severity.colors)+
scale_x_continuous("day in month", breaks=c(1, 10, 20, 30))+
geom_point(aes(
day.in.month, accident.i, fill=severity),
data=accidents.cumsum)
The figure above shows a circle for each accident. In the code below, we create a grid of days.
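A grid of days can be sketched with seq. The end points below are taken from the output printed next; toy_days is a hypothetical name, Date objects are used as a simplification, and the numeric wday column is a locale-independent stand-in for the weekday names shown in the chapter.

```r
library(data.table)
# One row per calendar day over the period covered by both data sets.
toy_days <- data.table(
  date=seq(as.Date("2009-01-01"), as.Date("2015-01-01"), by="day"))
# wday does not depend on the locale: 0 = Sunday, ..., 6 = Saturday.
toy_days[, wday := as.POSIXlt(date)$wday]
toy_days[, is.weekend := wday %in% c(0L, 6L)]
c(days=nrow(toy_days), weekend.days=sum(toy_days$is.weekend))
```

This gives 2192 days, of which 626 are weekend days, in agreement with the row counts printed in this section.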
date weekday
1: 2009-01-01 Thu
2: 2009-01-02 Fri
---
2191: 2014-12-31 Wed
2192: 2015-01-01 Thu
The output above shows one row per day in the period of observed data. The code below creates a table to highlight weekends.
(weekend.dt <- add_day_month(days.dt[
grepl("Sat|Sun", weekday)])[])
           date weekday day.in.month month.english
  1: 2009-01-03     Sat            3  January 2009
  2: 2009-01-04     Sun            4  January 2009
 ---
625: 2014-12-27     Sat           27 December 2014
626: 2014-12-28     Sun           28 December 2014
The output above has one row per weekend day. Next, we create a table to visualize the name of each location.
add_day_month(counts_dt)
(day.text <- counts_dt[
, .SD[which.max(count)]
, by=.(location, month.english)])
          location  month.english                date month.str count
  1:         Berri   January 2009 2009-01-11 05:00:00   2009-01   318
  2:         Berri  February 2009 2009-02-18 05:00:00   2009-02   326
 ---
441: Totem_Laurier    August 2013 2013-08-21 04:00:00   2013-08  4293
442: Totem_Laurier September 2013 2013-09-18 04:00:00   2013-09  3921
          loc.lines day.in.month
  1:          Berri           11
  2:          Berri           18
 ---
441: Totem\nLaurier           21
442: Totem\nLaurier           18
The output above shows the day with the max count, for each location and month. Next, we use the code below to plot the daily count data.
(DaysCounters <- ggplot()+
geom_tallrect(aes(
xmin=day.in.month-0.5, xmax=day.in.month+0.5,
key=paste(date)),
showSelected="month.english",
fill="grey",
color="white",
data=weekend.dt)+
guides(color="none", fill="none")+
theme_bw()+
facet_grid(facet ~ ., scales="free")+
geom_line(aes(
day.in.month, count, group=location,
key=location, color=location),
showSelected=c("location", "month.english"),
clickSelects="location",
chunk_vars=c("month.english"),
data=COMPTEURS(counts_dt))+
scale_color_manual(values=location.colors)+
ylab("")+
geom_point(aes(
day.in.month, count, color=location,
key=paste(day.in.month, location),
tooltip=paste(
count, "cyclists at",
location, "on",
date)),
showSelected=c("location", "month.english"),
clickSelects="location",
size=5,
chunk_vars=c("month.english"),
fill="white",
data=COMPTEURS(counts_dt)))
Warning: Removed 407 rows containing missing values (geom_point).

The figure above is over-plotted, because it shows all of the months at the same time, whereas only one month will be shown in the interactive version. The code below adds the accident data.
(DaysFacet <- DaysCounters+
scale_fill_manual(
values=severity.colors, breaks=names(severity.colors))+
geom_text(aes(
15, 23, label=month.english, key=1),
showSelected="month.english",
data=ACCIDENTS(month_dt))+
scale_x_continuous("Day in month", breaks=c(1, 10, 20, 30))+
geom_point(aes(
day.in.month, accident.i,
key=paste(date.str, accident.i),
fill=severity),
showSelected=c("severity","month.english"),
size=4,
chunk_vars="month.english",
data=ACCIDENTS(accidents.cumsum)))
Warning: Removed 407 rows containing missing values (geom_point).

The figure above has a new panel for accident data on top.
9.3.5 Interactive graphic
Finally, we combine the previous ggplots into an interactive data visualization using the code below.
animint(
MonthFacet+
ggtitle("All data, select month"),
DaysFacet+
ggtitle("Selected month (weekends in grey)")+
geom_label_aligned(aes(
day.in.month, count+1500, color=location, label=location,
key=location),
showSelected=c("location", "month.english"),
clickSelects="location",
data=COMPTEURS(day.text))+
theme_animint(last_in_row=TRUE),
MonthSummary+theme_animint(width=450, height=250),
mtl.map+theme_animint(height=250),
selector.types=list(severity="multiple"),
duration=list(month.english=2000),
first=list(
location="Maisonneuve 2",
month.english="July 2012"))
The visualization above contains four plots:
- Upper left: time series with summary data for each month.
- Upper right: time series of daily details for selected month.
- Bottom left: min and max dates for each data source.
- Bottom right: counter locations on the Montreal map.
9.4 Chapter summary and exercises
We have explored several methods for visualizing time series data of bike counts and accidents in Montreal.
Exercises:
- Change location to a multiple selection variable.
- On the map, draw a circle for each location, with size that changes based on the count of the accidents in the currently selected month.
- On the MonthSummary plot, add background rectangles that can be used to select the month.
- Remove the MonthSummary plot and add a similar visualization as a third panel in the MonthFacet plot.
- In DaysFacet, add aes(tooltip) with details of each accident (address, number of people involved, etc).
Next, Chapter 10 explains how to visualize the K-Nearest-Neighbors machine learning model.