3 The showSelected keyword
This chapter explains showSelected, one of the two main keywords that animint introduces for interactive data visualization. After reading this chapter, you will be able to
- Use the showSelected keyword in your plot sketches to specify geoms for which only a subset of data should be plotted at any time.
- Use selection menus in animint to change the subset of plotted data.
- Specify smooth transitions between data subsets using the duration option and key aesthetic.
- Create animated data visualizations using the time option.
3.1 Sketching with showSelected
In this section, we will explain how the showSelected keyword can be used in plot sketches. The showSelected keyword specifies a variable to use for subsetting the data before plotting. Each geom in a data visualization has its own data set, and its own definition of showSelected variables. That means different geoms can specify different data sets and showSelected keywords to show different data subsets.
In fact, we have already used the showSelected keyword, which was automatically created by the interactive legends that we created in the previous two chapters. For example, consider the sketch below of the Keeling Curve data visualization from Chapter 1.
The sketch above includes showSelected=month for the geom_point, meaning that it should show the subset of data for the selected months. In contrast, since the geom_line does not include showSelected keywords, it always shows the entire data set (regardless of the selected months).
As another example, consider the sketch below of the first WorldBank data visualization from Chapter 2.
The sketch above specifies showSelected=region for the geom_point, meaning that it should show the subset of data for the selected regions.
Note that the code we used in Chapter 2 did not explicitly specify showSelected=region. Instead, we specified aes(color=region), and animint automatically assigned a showSelected keyword. In general, animint will assign a showSelected keyword for each variable that is used in a categorical legend.
However, the showSelected keyword is not limited to use with categorical legends. You can use showSelected keywords for any data variables you like, by explicitly specifying the variable names in the showSelected argument of the geom.
Each variable that is used with showSelected is treated by animint as a selection variable. For example, the Keeling Curve data viz has one selection variable (month), and so does the WorldBank data viz (region). For each selection variable, animint keeps track of the currently selected values. When the selection changes, animint updates the subset of data that is shown.
Each of the data visualizations sketched above has only one selection variable. However, a data visualization can have any number of selection variables. In the next section, we will explore a visualization of the World Bank data that has selection variables for region and year.
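To make the idea of several selection variables concrete, the base R sketch below mimics what a geom with two showSelected variables displays. It is only a conceptual illustration, not animint's actual implementation: the toy data, the selection list, and the show.selected helper are all hypothetical names invented for this example.

```r
# Hypothetical toy data; the values are made up for illustration only.
toy <- data.frame(
  country=c("A", "A", "B", "B", "C", "C"),
  region=c("North", "North", "South", "South", "South", "South"),
  year=c(1980, 1981, 1980, 1981, 1980, 1981),
  stringsAsFactors=FALSE)

# Currently selected values, one entry per selection variable.
selection <- list(year=1981, region="South")

# Keep only the rows whose value is selected for every selection variable.
show.selected <- function(data, selection){
  keep <- rep(TRUE, nrow(data))
  for(var.name in names(selection)){
    keep <- keep & data[[var.name]] %in% selection[[var.name]]
  }
  data[keep, , drop=FALSE]
}

shown <- show.selected(toy, selection)
shown
```

Changing either entry of selection and calling show.selected again gives the new subset, which is the role that the selection menus play in an animint.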
3.3 Transitions: the duration option and key aesthetic
You may have noticed that there are buttons at the bottom of each data visualization created by animint. Try clicking the “Show animation controls” button above. It reveals a table that contains a row for each selection variable. The text boxes show the number of milliseconds that are used for transition durations after updating each selection variable. The default transition duration for each selection variable is 0, meaning data will be immediately placed at their new positions after updating each variable.
To illustrate the significance of transition durations, try changing the transition duration of the year variable to 2000. Then, change the selected value of the year variable. You should see the data points move slowly to their new positions, over a duration of 2 seconds.
Some transitions result in points moving only a little bit, to nearby positions (e.g. 1979-1980). Other transitions result in points moving a lot more, to far away locations (e.g. 1980-1981). Why is that?
Smooth transitions only make sense for data points that exist both before and after changing the selection. In the R code below we compute a table of counts of data points that can be plotted in each of these three years.
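The following sketch shows one way to compute such a table. It assumes the WorldBank data set from the animint2 package, and that a point can be plotted when neither life.expectancy nor fertility.rate is missing. The name three.years is taken from the subset() call used later in this section, and can.plot from the header of the printed table.

```r
# Assumes the animint2 package is installed; it provides the WorldBank data.
data(WorldBank, package="animint2")

# Keep only the three years of interest.
three.years <- subset(WorldBank, 1979 <= year & year <= 1981)

# A point can be drawn only if both plotted variables are present.
can.plot <- with(three.years, {
  (!is.na(life.expectancy)) & (!is.na(fertility.rate))
})

# Rows are years, columns count unplottable (FALSE) and plottable (TRUE) points.
table(three.years$year, can.plot)
```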
      can.plot
       FALSE TRUE
  1979    27  187
  1980    27  187
  1981    26  188
It is clear from the table above that there are 187 points that can be plotted in 1979 and 1980. However, in 1981 there is one more data point, corresponding to a country for which we did not have data in 1980. Below we show the data for that country, Kosovo.
subset(three.years, country=="Kosovo")
     iso2c country year fertility.rate life.expectancy population
5850    KV  Kosovo 1979             NA              NA    1491000
5851    KV  Kosovo 1980             NA              NA    1521000
5852    KV  Kosovo 1981         4.5758        65.93268    1552000
     GDP.per.capita.Current.USD 15.to.25.yr.female.literacy iso3c
5850                         NA                          NA   KSV
5851                         NA                          NA   KSV
5852                         NA                          NA   KSV
                                        region  capital longitude latitude
5850 Europe & Central Asia (all income levels) Pristina    20.926   42.565
5851 Europe & Central Asia (all income levels) Pristina    20.926   42.565
5852 Europe & Central Asia (all income levels) Pristina    20.926   42.565
                  income lending
5850 Lower middle income     IDA
5851 Lower middle income     IDA
5852 Lower middle income     IDA
Indeed, the table above shows that fertility rate and life expectancy are missing for Kosovo during 1979-1980. Thus it does not make sense to do a smooth transition for countries such as Kosovo, which are not plotted both before and after the transition. How do we specify that in the data visualization? In the code below, we use aes(key=country) to specify that the country variable should be used to match data points before and after changing the selection.
scatter.key <- ggplot()+
  geom_point(aes(
    x=life.expectancy, y=fertility.rate, color=region,
    key=country),
    showSelected="year",
    data=WorldBank)
The key aesthetic in the ggplot above is only meaningful for interactive data visualization, so it is ignored when rendering with the usual R graphics devices. However, if we render this ggplot using animint2, the country variable will be used to make sure transition durations are meaningful. To specify a default transition duration for the year variable, we use the duration option in the data viz below.
The duration option must be a named list. Each name should be a selection variable, and each value should specify the number of milliseconds to use for a transition duration when the selected value of that variable is changed.
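A minimal sketch of such a data viz is shown here, assuming the animint2 package and its WorldBank data set. The name viz.duration matches the object reused in the next section; the plot name scatter is an arbitrary choice made for this example.

```r
# Assumes the animint2 package is installed.
library(animint2)
data(WorldBank, package="animint2")

# Scatterplot for the selected year, with points matched by country.
scatter.key <- ggplot()+
  geom_point(aes(
    x=life.expectancy, y=fertility.rate, color=region,
    key=country),
    showSelected="year",
    data=WorldBank)

# duration is a named list: selection variable name -> milliseconds.
viz.duration <- animint(
  scatter=scatter.key,
  duration=list(year=2000))
```

Printing viz.duration at the R prompt should render the data viz, so the sketch only stores it.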
If you click “Show animation controls” in the data viz above, you will see that the text box for the year variable is 2000, as specified in the R code. If you change the selection from 1980 to 1981, you should see a proper transition.
In general the key aesthetic should be specified for all geoms that use showSelected with a variable that appears in the duration option. In this example, we used the duration option to specify a smooth transition for the year variable. Since we use showSelected=year in the geom_point, we also specified the key aesthetic for this geom.
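To see what matching on a key means, the base R sketch below compares the keys plotted before and after a change of selection. It is a conceptual illustration rather than animint's own code: the before and after tables and their values are hypothetical, with country D playing the role that Kosovo plays in the 1980-1981 transition.

```r
# Hypothetical points plotted before and after changing the selected year.
before <- data.frame(
  country=c("A", "B", "C"),
  life.expectancy=c(60, 70, 65),
  stringsAsFactors=FALSE)
after <- data.frame(
  country=c("A", "B", "C", "D"),
  life.expectancy=c(61, 70.5, 66, 68),
  stringsAsFactors=FALSE)

# Keys present both before and after: a smooth transition is meaningful.
update.keys <- intersect(before$country, after$country)
# Keys present only after: new points with no previous position.
enter.keys <- setdiff(after$country, before$country)
# Keys present only before: points that disappear.
exit.keys <- setdiff(before$country, after$country)
```

Without a key there is no way to tell that a point drawn after the change is the same country as a point drawn before it, which is why points can appear to travel to far away positions.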
3.4 Animation: the time option
The time option is used to specify a variable to use for animation. The code below specifies year as the variable to animate over time, with an update every 2000 milliseconds.
viz.duration.time <- viz.duration
viz.duration.time$time <- list(variable="year", ms=2000)
viz.duration.time
The data visualization above is animated, because the selected year advances every two seconds.
Exercise: make an animated data visualization that does NOT use smooth transitions. Hint: make a list of ggplots that has the time option but no duration option.
3.5 Chapter summary and exercises
This chapter explained the showSelected keyword, selection menus, transition durations, and animation.
Exercises:
- Make an improved version of viz.aligned from the previous chapter. Instead of fixing the year at 1975, use showSelected=year so that the user can select a year. Add geoms that show the selected year: a geom_text on the scatterplot, and a geom_vline on the time series.
- Translate one of the animation package examples to an animint. Hint: in the code for the animation package there is always a for loop over the time variable. Instead of calling a plotting function inside the for loop, use the list of data tables idiom to store the data that should be plotted. Then use those data along with showSelected to create ggplots, and render them using animint.
Next, Chapter 4 explains the clickSelects keyword, which indicates a geom that can be clicked to update a selection variable.