9  Montreal bikes

In this chapter we will explore several data visualizations of the Montreal bike data set.

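Chapter outline:

  • We begin with static figures of the counter and accident data.
  • We create an interactive data visualization of accident frequency.
  • We create an interactive data visualization with a map and details for each counter and month.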

9.1 Static figures

We begin by loading the montreal.bikes data set. To save space on CRAN, it is not included in the CRAN release of animint2, so you will need to install animint2 from GitHub to access it. The code below attempts to load the data set, and installs the GitHub version if loading produces a warning:

tryCatch({
  data(montreal.bikes, package="animint2")
}, warning=function(w){
  remotes::install_github("animint/animint2")
})

The data include two time series:

  • montreal.bikes$counter.counts are daily counts of bikers from counting machines, with one row per combination of location and day.
  • montreal.bikes$accidents has one row per accident.

We will compute monthly summaries of these two time series.

9.1.1 Counters

We begin by examining the data table of counts.

month_str <- function(POSIXct)strftime(POSIXct, "%Y-%m")
library(data.table)
data(montreal.bikes, package="animint2")#works if installed from github
(counts_dt <- data.table(montreal.bikes$counter.counts)[, .(
  location, date, month.str=month_str(date), count)])
            location                date month.str count
    1:         Berri 2009-01-01 05:00:00   2009-01    29
    2:         Berri 2009-01-02 05:00:00   2009-01    19
   ---                                                  
13382: Totem_Laurier 2013-09-17 04:00:00   2013-09  3745
13383: Totem_Laurier 2013-09-18 04:00:00   2013-09  3921

Above we see one row for each combination of location and date. The bike counts are time series data which we visualize below.

counts_dt[, loc.lines := gsub("[- _]", "\n", location)]
library(animint2)
ggplot()+
  theme_bw()+
  theme(panel.margin=grid::unit(0, "lines"))+
  facet_grid(loc.lines ~ .)+
  geom_point(aes(
    date, count, color=count==0),
    data=counts_dt)+
  scale_color_manual(values=c("TRUE"="grey", "FALSE"="black"))
Warning: Removed 407 rows containing missing values (geom_point).

In the figure above, we clearly see the seasonal regularity (fewer cyclists in winter). It is also easy to see the difference between zeros (grey points) and missing values (no points).
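
The distinction between zero and missing counts can also be checked numerically. The sketch below uses a small made-up vector (toy_counts is a hypothetical name, not part of montreal.bikes) to tally the two cases separately.

```r
# Toy daily counts: NA means the counter reported nothing that day,
# whereas 0 means it reported that no cyclists passed.
toy_counts <- c(29, 0, NA, 19, 0, NA, NA, 42)
count_status <- ifelse(
  is.na(toy_counts), "missing",
  ifelse(toy_counts == 0, "zero", "positive"))
(status_table <- table(count_status))
```

The same tally could presumably be computed per location on counts_dt, to see which counters report zeros rather than missing values.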

9.1.2 Accidents

Next we examine one row from the accidents data table.

montreal.bikes$accidents[1,]
    date.str time.str deaths people.severely.injured people.slightly.injured
1 2012-01-02    18:35      0                       0                       1
  street.number             street cross.street location.int position.int
1            NA ST JEAN BAPTISTE O   AV ROULEAU           32            6
             position                            location
1 Voie de circulation En intersection (moins de 5 mètres)

Each accident has data about its date, time, location, and counts of deaths and of severe and slight injuries. Some of the values are in French (for example, Voie de circulation means traffic lane, and En intersection means at an intersection). For the injury count columns, we create abbreviated column names using the code below.

severity <- c(
  deaths="deaths",
  severe="people.severely.injured",
  slight="people.slightly.injured")
montreal.bikes$accidents[, names(severity)] <-
  montreal.bikes$accidents[, severity]
accidents_dt <- data.table(montreal.bikes$accidents[, c(
  "date.str", "time.str", names(severity),
  "street", "street.number", "cross.street")])

In the code below, we add a column for the month.

ymd2POSIXct <- function(date.str){
  as.POSIXct(strptime(date.str, "%Y-%m-%d"))
}
(accidents_dt[
, date := ymd2POSIXct(date.str)
][
, month.str := month_str(date)
][])
        date.str time.str deaths severe slight             street street.number
   1: 2012-01-02    18:35      0      0      1 ST JEAN BAPTISTE O            NA
   2: 2012-01-05    21:50      0      0      1             FOSTER            NA
  ---                                                                          
5594: 2014-12-27    12:35      0      0      1   CH DES PATRIOTES            NA
5595: 2014-12-30    11:55      0      0      1     PIERREFONDS BD         14965
        cross.street       date month.str
   1:     AV ROULEAU 2012-01-02   2012-01
   2:        JANELLE 2012-01-05   2012-01
  ---                                    
5594:        1RE RUE 2014-12-27   2014-12
5595: JACQUES BIZARD 2014-12-30   2014-12

In the output above, we see that the last months for the accidents are not the same as for the counter data. We compare the time intervals using the code below:

data.list <- list(accidents=accidents_dt, counts=counts_dt)
sapply(data.list, function(DT)range(DT$month.str))
     accidents counts   
[1,] "2012-01" "2009-01"
[2,] "2014-12" "2013-09"
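
Because the month strings are zero-padded in year-month order, sorting them as text is the same as sorting them chronologically, which is why range() works on month.str. The sketch below, which copies the two ranges printed above into a hypothetical list named month_ranges, uses the same property to find the months covered by both data sets.

```r
# Ranges copied from the output above.
month_ranges <- list(
  accidents = c("2012-01", "2014-12"),
  counts = c("2009-01", "2013-09"))
# Zero-padded year-month strings sort chronologically, so character
# comparison is enough to find the interval covered by both data sets.
overlap_start <- max(sapply(month_ranges, min))
overlap_end <- min(sapply(month_ranges, max))
c(start = overlap_start, end = overlap_end)
```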

The output above shows that the accidents time series both starts and ends later than the counts time series. Next, to compute a summary for each month, we begin by computing the unique values of month in the range of the two data sets.

uniq.month.vec <- unique(c(
  accidents_dt$month.str,
  counts_dt$month.str))
month_01 <- function(mois)ymd2POSIXct(paste0(mois, "-01"))
month_dt <- data.table(month.01 = month_01(uniq.month.vec))

The code below sets the locale so that we can be sure to have the month names in English.

old.locale <- Sys.setlocale(locale="en_US.UTF-8")
month_english <- function(POSIXct)strftime(POSIXct, "%B %Y")
month_dt[, month.english := month_english(month.01)][]
      month.01 month.english
 1: 2012-01-01  January 2012
 2: 2012-02-01 February 2012
---                         
71: 2011-11-01 November 2011
72: 2011-12-01 December 2011
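
On systems where the en_US.UTF-8 locale is not installed, Sys.setlocale may not be able to honor the request. As a possible alternative, the sketch below builds the same kind of label from the base R constant month.name, which always holds the English month names. The function name month_english_portable is hypothetical and is not used elsewhere in this chapter.

```r
# month.name is a built-in base R constant holding English month names,
# so this helper does not depend on the current locale.
month_english_portable <- function(POSIXct){
  month.int <- as.integer(strftime(POSIXct, "%m"))
  paste(month.name[month.int], strftime(POSIXct, "%Y"))
}
month_english_portable(as.POSIXct(c("2012-01-01", "2011-12-01")))
```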

The output above contains a row for each month. Note that we have created the month.english variable which will be used for month selection. In the code below, we compute the total number of accidents of each type per month.

(accidents.per.month <- dcast(
  accidents_dt,
  month.str ~ .,
  sum,
  value.var=names(severity)))
    month.str deaths severe slight
 1:   2012-01      1      0     10
 2:   2012-02      0      0     20
---                               
35:   2014-11      1      2     69
36:   2014-12      0      0     10

The result above shows one row per month, with different columns for each level of severity. We reshape those columns into long format using the code below.

(accidents.tall <- melt(
  accidents.per.month,
  measure.vars=names(severity),
  variable.name="severity",
  value.name="people"))
     month.str severity people
  1:   2012-01   deaths      1
  2:   2012-02   deaths      0
 ---                          
107:   2014-11   slight     69
108:   2014-12   slight     10
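
To make the two reshaping steps concrete, here is a minimal sketch on a made-up table (toy_accidents is hypothetical). It assumes data.table is installed, and it computes the wide table with a grouped lapply(.SD, sum), which should give the same per-month sums as the dcast call above, before melting to long format.

```r
library(data.table)
# Made-up accidents: one row per accident, with people counts.
toy_accidents <- data.table(
  month.str = c("2012-01", "2012-01", "2012-02"),
  deaths = c(1, 0, 0),
  severe = c(0, 0, 1),
  slight = c(0, 2, 1))
# Wide format: one row per month, one column per severity level.
toy_wide <- toy_accidents[
, lapply(.SD, sum)
, by = month.str
, .SDcols = c("deaths", "severe", "slight")]
# Long format: one row per combination of month and severity.
(toy_tall <- melt(
  toy_wide,
  measure.vars = c("deaths", "severe", "slight"),
  variable.name = "severity",
  value.name = "people"))
```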

Above we see one row for each combination of month and severity. In the code below, we use these data to plot the number of people killed or injured in each month.

severity.colors <- c(
  slight="#FEE0D2",#light red
  severe="#FB6A4A",
  deaths="#A50F15")#dark red
ggplot()+
  theme_bw()+
  geom_bar(aes(
    month_01(month.str), people, fill=severity),
    stat="identity",
    data=accidents.tall)+
  scale_fill_manual(
    values=severity.colors, breaks=names(severity.colors))+
  scale_x_datetime("month")

The figure above is a time series of the number of deaths and injuries. It shows that slight injuries are most frequent, and deaths are least frequent.

9.2 Interactive viz of accident frequency

In this section, we want to compare, for each month, the data for counters and accidents. Do we have more accidents when there are more bikes on the road? To check, we will fit a regression model, where each observation is a month with both kinds of data. First, we compute a summary of counts per month using the code below.

(counts.per.month <- dcast(
  counts_dt[!is.na(count)],
  location + month.str ~ .,
  list(length, mean, sum),
  value.var="count"))
          location month.str count_length count_mean count_sum
  1:         Berri   2009-01           31   100.3226      3110
  2:         Berri   2009-02           28   159.6786      4471
 ---                                                          
441: Totem_Laurier   2013-08           31  3162.7097     98044
442: Totem_Laurier   2013-09           18  2888.7778     51998

The output above has one row per combination of location and month, with columns for:

  • count_length: the number of days with a non-missing count.
  • count_mean: the mean count per day.
  • count_sum: the total count in that month.

We can see that some months have missing days. For example, there are only 18 days for Totem_Laurier in September 2013. To model only whole months, we would like to remove months with missing days. Therefore, we use the code below to compute the number of days in each month:

one.day <- 60 * 60 * 24
# Add 31 days to reach the following month, then truncate to its first day.
next_month <- function(POSIXct)month_01(month_str(POSIXct + one.day * 31))
counts.per.month[, days.in.month := as.integer(round(difftime(
  next_month(month_01(month.str)),
  month_01(month.str),
  units="days")))][]
          location month.str count_length count_mean count_sum days.in.month
  1:         Berri   2009-01           31   100.3226      3110            31
  2:         Berri   2009-02           28   159.6786      4471            28
 ---                                                                        
441: Totem_Laurier   2013-08           31  3162.7097     98044            31
442: Totem_Laurier   2013-09           18  2888.7778     51998            30
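
The number of days in a month can also be computed with Date objects instead of POSIXct. The sketch below is one possible alternative, not the method used in this chapter, and the function name n_days_in_month is hypothetical. Since Date values have no time of day, daylight saving time cannot affect the difference, so no rounding is needed.

```r
# Number of days in each month given as a "YYYY-MM" string.
n_days_in_month <- function(month.str){
  first.day <- as.Date(paste0(month.str, "-01"))
  sapply(seq_along(first.day), function(i){
    # seq() with by="month" steps to the first day of the next month.
    next.first <- seq(first.day[i], by = "month", length.out = 2)[2]
    as.integer(next.first - first.day[i])
  })
}
n_days_in_month(c("2009-02", "2012-02", "2013-09"))
```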

The output above contains the new column days.in.month. We use the code below to print only the months with missing days:

counts.per.month[
  count_length < days.in.month,
  .(location, month.str, count_length, days.in.month)]
         location month.str count_length days.in.month
 1:         Berri   2009-04           29            30
 2:         Berri   2011-11            3            30
---                                                   
22:        Rachel   2013-09           18            30
23: Totem_Laurier   2013-09           18            30

The data shown above are excluded from regression analysis using the code below.

complete.months <- counts.per.month[count_length == days.in.month]

Next, we make a table with counts and accidents in different columns:

city.wide.complete <- complete.months[count_sum>0, .(
  locations=.N,
  total.counts=sum(count_sum)
), keyby=month.str]
city.wide.accidents <- accidents_dt[, .(
  total.accidents=.N
), keyby=month.str]
(scatter.not.na <- city.wide.accidents[
  city.wide.complete, nomatch=0L
][, month.01 := month_01(month.str)][])
    month.str total.accidents locations total.counts   month.01
 1:   2012-01              11         7        20386 2012-01-01
 2:   2012-02              19         7        26727 2012-02-01
---                                                            
17:   2013-07             315         8       916662 2013-07-01
18:   2013-08             326         8       856066 2013-08-01
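
The join above keeps only the months that appear in both tables. The sketch below shows the same kind of inner join on two small tables with made-up values (toy_acc and toy_cnt are hypothetical names). It assumes data.table is installed, and it uses the on argument instead of the keys that keyby sets in the code above.

```r
library(data.table)
toy_acc <- data.table(
  month.str = c("2012-01", "2012-02", "2014-12"),
  total.accidents = c(11, 19, 10))
toy_cnt <- data.table(
  month.str = c("2011-12", "2012-01", "2012-02"),
  total.counts = c(15000, 20000, 27000))
# X[Y, on=..., nomatch=0L] is an inner join: months present in only
# one of the two tables are dropped from the result.
(toy_both <- toy_acc[toy_cnt, on = "month.str", nomatch = 0L])
```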

The output above shows one row per month with both count and accident data. Next, we fit a linear model which uses counts to predict accidents.

(fit <- lm(total.accidents ~ total.counts - 1, scatter.not.na))

Call:
lm(formula = total.accidents ~ total.counts - 1, data = scatter.not.na)

Coefficients:
total.counts  
   0.0003723  
scatter.not.na[, mean(total.accidents/total.counts)]
[1] 0.0003847625
scatter.not.na[, sum(total.accidents)/sum(total.counts)]
[1] 0.0003693805
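
The three numbers above can be read as three differently weighted averages of the monthly ratio of accidents to counts. For a linear model without intercept, the least squares coefficient is sum(x*y)/sum(x^2), which weights each monthly ratio by the squared count. The ratio of sums weights by the count, and the mean of ratios weights all months equally. The sketch below checks the closed form on made-up data (not the Montreal data).

```r
# Made-up monthly totals, for illustration only.
toy <- data.frame(
  total.counts = c(20000, 100000, 500000, 900000),
  total.accidents = c(10, 45, 190, 320))
toy.fit <- lm(total.accidents ~ total.counts - 1, toy)
x <- toy$total.counts
y <- toy$total.accidents
c(
  lm.coef = coef(toy.fit)[["total.counts"]],
  closed.form = sum(x * y) / sum(x^2), # least squares through the origin
  ratio.of.sums = sum(y) / sum(x),     # pooled accidents per cyclist
  mean.of.ratios = mean(y / x))        # every month weighted equally
```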

The output above shows that the estimated linear model coefficient is similar to both empirical ratios (the mean of the monthly ratios, and the ratio of the totals). Finally, we use the code below to create an interactive graphic.

scatter.not.na[, let(
  pred.accidents = predict(fit),
  month.english = month_english(month.01)
)]
animint(
  regression=ggplot()+
    theme_bw()+
    ggtitle("Numbers of accidents and cyclists")+
    geom_line(aes(
      total.counts, pred.accidents),
      color="grey",
      data=scatter.not.na)+
    geom_point(aes(
      total.counts, total.accidents),
      clickSelects="month.english",
      size=5,
      alpha=0.75,
      data=scatter.not.na)+
    ylab("Total bike accidents (all Montreal locations)")+
    xlab("Total cyclists (all Montreal locations)"),
  timeSeries=ggplot()+
    theme_bw()+
    ggtitle("Time series of accident frequency")+
    xlab("Month")+
    geom_point(aes(
      month.01, total.accidents/total.counts),
      clickSelects="month.english",             
      size=5,
      alpha=0.75,
      data=scatter.not.na))

The data viz above contains two linked plots of city-wide accident frequency. Clicking a point in either plot selects that month, which is then highlighted in both plots.

  • The first plot shows that the number of accidents grows with the number of cyclists.
  • The second plot shows the frequency of accidents over time.

9.3 Interactive viz with map and details

In this section, we will create a visualization with several linked plots:

  • Summary of counters: map of counters or min/max dates for each counter, to select a counter.
  • Details of a counter, summary of months: time series of monthly totals for counters and accidents. Click to select a month.
  • Details of a counter and a month: time series of daily totals, for the selected month.

9.3.1 Counter summary with map

Before examining the data table of counter locations, we first convert the name variable from Latin-1 to UTF-8 encoding:

(counter.locations <- data.table(montreal.bikes$counter.locations)[, .(
  lon = coord_X, lat = coord_Y,
  nom_comptage=iconv(nom_comptage, "latin1", "UTF-8"))])
          lon      lat              nom_comptage
 1: -73.58888 45.51955              Saint-Urbain
 2: -73.57398 45.52741                   Brebeuf
---                                             
20: -73.58221 45.51370          Parc U-Zelt Test
21: -73.60311 45.52782 Saint-Laurent U-Zelt Test

In the output above, we see that the nom_comptage column indicates the location of the counter, but the values are not exactly the same as in the location column in the table of counts. We use the code below to establish correspondence between the names in the two tables.

loc.name.code <- c(
  Berri1="Berri",
  Brebeuf="Brébeuf",
  CSC="Côte-Sainte-Catherine",
  Maisonneuve_1="Maisonneuve 1",
  Maisonneuve_2="Maisonneuve 2",
  Parc="du Parc",
  PierDup="Pierre-Dupuy",
  "Rachel/Papineau"="Rachel",
  "Saint-Urbain"="Saint-Urbain",
  Totem_Laurier="Totem_Laurier")
(show.locations <- counter.locations[
, location := loc.name.code[nom_comptage]
][!is.na(location)])
          lon      lat  nom_comptage      location
 1: -73.58888 45.51955  Saint-Urbain  Saint-Urbain
 2: -73.57398 45.52741       Brebeuf       Brébeuf
---                                               
 9: -73.58883 45.52777 Totem_Laurier Totem_Laurier
10: -73.56284 45.51613        Berri1         Berri
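
The correspondence above relies on indexing a named character vector by name. The sketch below illustrates that lookup on a small hypothetical vector (toy.code), including what happens for a name with no entry.

```r
# Names are the spellings in one table, values the spellings in the other.
toy.code <- c(Berri1 = "Berri", PierDup = "Pierre-Dupuy")
toy.names <- c("PierDup", "Berri1", "Parc U-Zelt Test")
# Indexing by name returns NA for names without an entry,
# which is what the !is.na(location) filter relies on.
(toy.location <- unname(toy.code[toy.names]))
```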

The output above shows the longitude and latitude of each counter, which we plot on a map using the code below.

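# min and max of lat and lon over the counters
map.lim <- show.locations[, lapply(.SD, range), .SDcols=c("lat","lon")]
# pad the limits by one range width on each side, to select nearby paths
diff.vec <- sapply(map.lim, diff)
diff.mat <- c(-1, 1) * matrix(diff.vec, 2, 2, byrow=TRUE)
scale.mat <- as.matrix(map.lim) + diff.mat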
bike.paths <- data.table(montreal.bikes$path.locations)
show.paths <- bike.paths[(
  lat %between% scale.mat[, "lat"]
) & (
  lon %between% scale.mat[, "lon"]
)]
(mtl.map <- ggplot()+
   theme_bw()+
   theme(
     panel.margin=grid::unit(0, "lines"),
     axis.line=element_blank(), axis.text=element_blank(),
     axis.ticks=element_blank(), axis.title=element_blank(),
     panel.background = element_blank(),
     panel.border = element_blank())+
   coord_cartesian(xlim=map.lim$lon, ylim=map.lim$lat)+
   scale_x_continuous(limits=map.lim$lon)+
   scale_y_continuous(limits=map.lim$lat)+
   geom_path(aes(
     lon, lat,
     tooltip=TYPE_VOIE,
     group=paste(feature.i, path.i)),
     color="grey",
     data=show.paths)+
   geom_text(aes(
     lon, lat,
     label=location),
     clickSelects="location",
     data=show.locations))
Warning: Removed 96 rows containing missing values (geom_path).

The figure above shows a map of Montreal, with text for each of the ten counters.
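
The bike paths drawn on the map were selected using a bounding box larger than the counter limits. The sketch below reproduces that padding computation on made-up limits, using a base R data frame rather than a data table (toy.lim and the other toy names are hypothetical), to show how recycling c(-1, 1) extends each range by its own width on both sides.

```r
# Made-up limits: first row is the minimum, second row the maximum.
toy.lim <- data.frame(lat = c(45.50, 45.54), lon = c(-73.62, -73.56))
toy.diff <- sapply(toy.lim, diff) # width of each range
# Recycling c(-1, 1) down the columns negates the first row, so the
# minimum moves down by one width and the maximum moves up by one width.
toy.pad <- c(-1, 1) * matrix(toy.diff, 2, 2, byrow = TRUE)
(toy.scale <- as.matrix(toy.lim) + toy.pad)
```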

9.3.2 Summary of extreme dates for each counter

In this section, we compute the min and max dates for each counter.

(location.ranges <- dcast(
  counts.per.month[0 < count_sum][
  , month.01 := month_01(month.str)],
  location ~ .,
  list(min, max),
  value.var="month.01"))
         location month.01_min month.01_max
 1:         Berri   2009-01-01   2013-09-01
 2:       Brébeuf   2009-07-01   2010-11-01
---                                        
 9:  Saint-Urbain   2009-01-01   2010-11-01
10: Totem_Laurier   2013-02-01   2013-09-01

The output above shows a row for each counter, with columns for the min and max dates observed. The plot below shows the time period that each counter was in operation.

location.colors <- c(#dput(RColorBrewer::brewer.pal(12, "Set3")) with "#FFFFB3" replaced by "grey50"
  "#8DD3C7", "grey50", "#BEBADA", "#FB8072", "#80B1D3", "#FDB462",
  "#B3DE69", "#FCCDE5", "#D9D9D9", "#BC80BD", "#CCEBC5", "#FFED6F")
names(location.colors) <- show.locations$location
seg.size <- 10
(CounterRanges <- ggplot()+
  theme_bw()+
  xlab("min/max dates")+
  ylab("data source")+
  scale_color_manual(values=location.colors)+
  guides(color="none")+
  geom_segment(aes(
    month.01_min, location,
    xend=month.01_max, yend=location),
    showSelected="location",
    data=location.ranges,
    size=seg.size+2)+
  geom_segment(aes(
    month.01_min, location,
    xend=month.01_max, yend=location,
    color=location),
    clickSelects="location",
    data=location.ranges,
    size=seg.size))

The figure above shows a segment for each counter. With the code below, we add a segment to represent the date range of accidents.

accidents.range <- dcast(
  data.table(lieu="accidents", accidents_dt),
  lieu ~ .,
  list(min, max),
  value.var="date")
(MonthSummary <- CounterRanges+
  geom_segment(aes(
    date_min, lieu,
    xend=date_max, yend=lieu),
    color=severity.colors[["deaths"]],
    data=accidents.range,
    size=seg.size))

In the figure above, we see another segment (for the accidents at the bottom).

9.3.3 Monthly time series

The code below is used to plot the count data time series.

ggplot()+
  theme_bw()+
  geom_line(aes(
    date, count, group=location),
    data=counts_dt)+
  scale_color_manual(values=location.colors)+
  geom_point(aes(
    date, count, color=location),
    data=counts_dt)
Warning: Removed 407 rows containing missing values (geom_point).

The figure below visualizes the same count data, but summarized for each month.

FACET <- function(DT, facet)data.table(DT, facet)
COMPTEURS <- function(DT)FACET(DT, "counts/day")
(MonthSeries <- ggplot()+
  guides(color="none")+
  theme_bw()+
  facet_grid(facet ~ ., scales="free")+
  geom_tallrect(aes(
    xmin=month.01-15*one.day, xmax=month.01+15*one.day),
    clickSelects="month.english",
    data=month_dt,
    alpha=1/2)+
  geom_line(aes(
    month_01(month.str), count_mean, group=location,
    color=location),
    showSelected="location",
    clickSelects="location",
    data=COMPTEURS(counts.per.month))+
  scale_color_manual(values=location.colors)+
  xlab("month")+
  ylab(""))

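The figure above shows a curve for each counter. The code below adds two geoms: a point with a tooltip for each month, and a text label for each counter, placed at the month with its largest mean count.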

month.text <- counts.per.month[
, .SD[which.max(count_mean)]
, by=location]
(MonthText <- MonthSeries+
  geom_point(aes(
    month_01(month.str), count_mean, color=location,
    tooltip=paste(
      count_mean, "bikes at",
      location, "in", month_english(month_01(month.str)))),
    showSelected="location",
    clickSelects="location",
    size=5,
    data=COMPTEURS(counts.per.month))+
  geom_text(aes(
    month_01(month.str), count_mean+300,
    color=location, label=location),
    showSelected="location",
    clickSelects="location",
    data=COMPTEURS(month.text)))

The code below adds the accident data in another facet.

ACCIDENTS <- function(DT)FACET(DT, "accidents")
(MonthFacet <- MonthText+
   facet_grid(facet ~ ., scales="free")+
   scale_fill_manual(
     values=severity.colors, breaks=names(severity.colors))+
   geom_bar(aes(
     month_01(month.str), people,
     fill=severity),
     showSelected="severity",
     stat="identity",
     position="identity",
     color=NA,
     data=ACCIDENTS(accidents.tall[order(-severity)])))

The figure above shows a panel for each of the data types (accidents and counts).

9.3.4 Details for a month

The goal in this section is to create a dotplot of accidents for each month, where each dot represents one accident. Each accident has a count of people who died, along with counts of people who suffered severe and slight injuries. Below we classify the severity of each accident according to the worst outcome among the people affected.

accidents_dt[, severity.str := fcase(
  0 < deaths, "deaths",
  0 < severe, "severe",
  default="slight")
][
, severity := factor(severity.str, names(severity.colors))
][, table(severity)]
severity
slight severe deaths 
  5262    289     44 
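
The worst-outcome rule can be checked on a few made-up accidents. The sketch below uses nested ifelse calls in base R, rather than fcase, on a hypothetical data frame named toy.people.

```r
# Made-up accidents: counts of people by outcome, one row per accident.
toy.people <- data.frame(
  deaths = c(0, 1, 0, 0),
  severe = c(0, 2, 1, 0),
  slight = c(1, 0, 3, 2))
# The first condition that holds decides the class, so an accident with
# one death and two severe injuries is classified as "deaths".
toy.people$severity <- factor(
  ifelse(0 < toy.people$deaths, "deaths",
  ifelse(0 < toy.people$severe, "severe", "slight")),
  levels = c("slight", "severe", "deaths"))
table(toy.people$severity)
```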

The result above shows that accidents with only slight injuries are most frequent, and fatal accidents are least frequent. In the code below, we create the accident.i variable, which numbers the accidents within each day.

day_in_month <- function(POSIXct)as.integer(strftime(POSIXct, "%d"))
add_day_month <- function(DT)DT[, let(
  day.in.month = day_in_month(date),
  month.english = month_english(date))]
accidents.cumsum <- add_day_month(accidents_dt[
  order(date, -severity)
][
, accident.i := seq_len(.N)
, by=date
])
ggplot()+
  theme_bw()+
  theme(panel.margin=grid::unit(0, "cm"))+
  facet_wrap("month.str")+
  scale_fill_manual(values=severity.colors)+
  scale_x_continuous("day in month", breaks=c(1, 10, 20, 30))+
  geom_point(aes(
    day.in.month, accident.i, fill=severity),
    data=accidents.cumsum)
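
The accident.i variable numbers the accidents within each day. A base R sketch of the same idea, on a hypothetical vector of dates (toy.date), is shown below.

```r
toy.date <- c("2012-01-02", "2012-01-02", "2012-01-05", "2012-01-02")
# ave() applies seq_along() within each date, giving 1, 2, ... per day,
# which can serve as the vertical position of each dot.
(toy.accident.i <- ave(seq_along(toy.date), toy.date, FUN = seq_along))
```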

The figure above shows a circle for each accident. In the code below, we create a grid of days.

(days.dt <- month_dt[, .(date=seq(
  min(month.01),
  max(next_month(month.01)),
  by="day"
))][, weekday := strftime(date, "%a")][])
            date weekday
   1: 2009-01-01     Thu
   2: 2009-01-02     Fri
  ---                   
2191: 2014-12-31     Wed
2192: 2015-01-01     Thu

The output above shows one row per day in the period of observed data. The code below creates a table to highlight weekends.

(weekend.dt <- add_day_month(days.dt[
  grepl("Sat|Sun", weekday)])[])
           date weekday day.in.month month.english
  1: 2009-01-03     Sat            3  January 2009
  2: 2009-01-04     Sun            4  January 2009
 ---                                              
625: 2014-12-27     Sat           27 December 2014
626: 2014-12-28     Sun           28 December 2014
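
Matching on abbreviated weekday names works here because the locale was set to English earlier in the chapter. As a possible locale-independent alternative, the sketch below uses the numeric weekday stored in POSIXlt objects (toy.days is a hypothetical vector covering the first week of January 2009).

```r
toy.days <- seq(as.Date("2009-01-01"), as.Date("2009-01-07"), by = "day")
# as.POSIXlt()$wday is 0 for Sunday through 6 for Saturday in every
# locale, unlike the abbreviated names returned by strftime(x, "%a").
is.weekend <- as.POSIXlt(toy.days)$wday %in% c(0, 6)
toy.days[is.weekend]
```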

The output above has one row per weekend day. Next, we create a table used to place a text label with the name of each location.

add_day_month(counts_dt)
(day.text <- counts_dt[
, .SD[which.max(count)]
, by=.(location, month.english)])
          location  month.english                date month.str count
  1:         Berri   January 2009 2009-01-11 05:00:00   2009-01   318
  2:         Berri  February 2009 2009-02-18 05:00:00   2009-02   326
 ---                                                                 
441: Totem_Laurier    August 2013 2013-08-21 04:00:00   2013-08  4293
442: Totem_Laurier September 2013 2013-09-18 04:00:00   2013-09  3921
          loc.lines day.in.month
  1:          Berri           11
  2:          Berri           18
 ---                            
441: Totem\nLaurier           21
442: Totem\nLaurier           18

The output above shows the day with the max count, for each location and month. Next, we use the code below to plot the daily count data.

(DaysCounters <- ggplot()+
  geom_tallrect(aes(
    xmin=day.in.month-0.5, xmax=day.in.month+0.5,
    key=paste(date)),
    showSelected="month.english",
    fill="grey",
    color="white",
    data=weekend.dt)+
  guides(color="none", fill="none")+
  theme_bw()+
  facet_grid(facet ~ ., scales="free")+
  geom_line(aes(
    day.in.month, count, group=location,
    key=location, color=location),
    showSelected=c("location", "month.english"),
    clickSelects="location",
    chunk_vars=c("month.english"),
    data=COMPTEURS(counts_dt))+
  scale_color_manual(values=location.colors)+
  ylab("")+
  geom_point(aes(
    day.in.month, count, color=location,
    key=paste(day.in.month, location),
    tooltip=paste(
      count, "cyclists at",
      location, "on",
      date)),
    showSelected=c("location", "month.english"),
    clickSelects="location",
    size=5,
    chunk_vars=c("month.english"),
    fill="white",
    data=COMPTEURS(counts_dt)))
Warning: Removed 407 rows containing missing values (geom_point).

The figure above is over-plotted, because it shows all of the months at the same time, whereas only one month will be shown in the interactive version. The code below adds the accident data.

(DaysFacet <- DaysCounters+
   scale_fill_manual(
     values=severity.colors, breaks=names(severity.colors))+
   geom_text(aes(
     15, 23, label=month.english, key=1),
     showSelected="month.english",
     data=ACCIDENTS(month_dt))+
   scale_x_continuous("Day in month", breaks=c(1, 10, 20, 30))+
   geom_point(aes(
     day.in.month, accident.i,
     key=paste(date.str, accident.i),
     fill=severity),
     showSelected=c("severity","month.english"),
     size=4,
     chunk_vars="month.english",
     data=ACCIDENTS(accidents.cumsum)))
Warning: Removed 407 rows containing missing values (geom_point).

The figure above has a new panel for accident data on top.

9.3.5 Interactive graphic

Finally, we combine the previous ggplots into an interactive data visualization using the code below.

animint(
  MonthFacet+
    ggtitle("All data, select month"),
  DaysFacet+
    ggtitle("Selected month (weekends in grey)")+
    geom_label_aligned(aes(
      day.in.month, count+1500, color=location, label=location,
      key=location),
      showSelected=c("location", "month.english"),
      clickSelects="location",
      data=COMPTEURS(day.text))+
    theme_animint(last_in_row=TRUE),
  MonthSummary+theme_animint(width=450, height=250),
  mtl.map+theme_animint(height=250),
  selector.types=list(severity="multiple"),
  duration=list(month.english=2000),
  first=list(
    location="Maisonneuve 2",
    month.english="July 2012"))

The visualization above contains four plots:

  • Upper left: time series with summary data for each month.
  • Upper right: time series of daily details for selected month.
  • Bottom left: min and max dates for each data source.
  • Bottom right: counter locations on the Montreal map.

9.4 Chapter summary and exercises

We have explored several methods for visualizing time series data of bike counts and accidents in Montreal.

Exercises:

  • Change location to a multiple selection variable.
  • On the map, draw a circle for each location, with size that changes based on the count of the accidents in the currently selected month.
  • On the MonthSummary plot, add background rectangles that can be used to select the month.
  • Remove the MonthSummary plot and add a similar visualization as a third panel in the MonthFacet plot.
  • In DaysFacet, add aes(tooltip) with details of each accident (address, number of people involved, etc).

Next, Chapter 10 explains how to visualize the K-Nearest-Neighbors machine learning model.