Ch9 Spatial Data Visualization

In this chapter, we discuss spatial data visualization techniques. Spatial data contains measurements of various geographical locations, areas, path, and etc. Therefore, its visualizations are often based on maps. In this chapter, we will go over a few typical map visualizations. To generate these visualizations, we need the following R packages.

library(tidyverse)
library(ggmap) # devtools::install_github("dkahle/ggmap")
library(RColorBrewer)
library(statebins)
library(viridis)
library(viridisLite)
library(geofacet) # takes long time to install.
library(geosphere)
library(grid)
library(gridExtra)
library(mapproj)
#For windows OS, you may need to install Rtools from: https://cran.r-project.org/bin/windows/Rtools/rtools40.html

1 Choropleth Map

A choropleth map is a type of thematic map in which areas are shaded or patterned in proportion to a statistical variable that represents an aggregate summary of a geographic characteristic within each area, such as population density or per capita income.

Choropleth maps provide an easy way to visualize how a measurement varies across a geographic area or show the level of variability within a region. A heat map is similar but does not use the geographic areas. They are the most common type of thematic map because published statistical data (from government or other sources) is generally aggregated into well-known geographic units, such as countries, states, provinces, and counties.

We have the data of the GDP per capita of each state in US in 2019. We can simply draw a barplot to visualize the data. Even though we reorder the bar according to the bar length, it is still hard to see where are the economically strong states and where are the weak ones. If we could combine the barplot with the map, the visualization will be much more informative. To generate the choropleth map, we can ggplot and maps packages together.

library(tidyverse)
library(gridExtra)
state_gdp = read.csv("data/state_gdp_per_capita_2019.csv", header = TRUE)
state_gdp %>%
  filter(State != "District of Columbia") %>%
  ggplot() +
  geom_col(aes(x = reorder(State,-gdp_per_capita2019),
               y = gdp_per_capita2019, 
               fill = gdp_per_capita2019),
           width = 0.8)+
  scale_fill_viridis_c(limits = c(40000, 100000),
                       breaks = seq(40000, 100000, 10000),
                       labels = format(seq(40000, 100000, 10000), scientific=F))+
  ylab("GDP per capita")+
  theme(axis.text.x = element_text(angle = 90, vjust = 0.3, hjust = 1),
        axis.title.x = element_blank())

# create a map
state_boundary = map_data("state") # from ggplot2
#head(state_boundary)
ggplot() + 
  geom_polygon(data=state_boundary, aes(x=long, y=lat, group=group), # try removing "group=group"
               color="black", fill="lightblue" ) +
  coord_map("conic", lat0 = 30)

If we can combine the two plot above, the visualization will be much more informative. If we fill in each region with a color corresponding to the value, we have the choropleth map.

state_gdp$state = tolower(state_gdp$State)
state_gdp_boud <- left_join(state_boundary, state_gdp, 
                            by = c("region"="state"))
ggplot() +
  geom_polygon( data = state_gdp_boud,
                aes(x = long, y = lat, group = group, 
                    fill = gdp_per_capita2019),
                color="white", size = 0.2) +
  scale_fill_viridis_c(limits = c(40000, 100000),
                       breaks = seq(40000, 100000, 10000),
                       labels = format(seq(40000, 100000, 10000),
                                       scientific=F))+
  coord_map("conic", lat0 = 30)
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

We can generate choropleth map for the counties or the states.

county_boundary = map_data("county")
#head(county_boundary)
ggplot() + 
  geom_polygon(data=county_boundary, 
               aes(x=long, y=lat, group=group), # try removing "group=group"
               color="black", fill="lightblue") +
  coord_map("conic", lat0 = 30)

Lastly, we can generate world map too.

world_boundary = map_data("world")
ggplot() + 
  geom_polygon(data=world_boundary, 
               aes(x=long, y=lat, group=group), # try removing "group=group"
               color="black", fill="lightblue" ) +
  coord_fixed(ratio = 1.3)

We now analyze the college data set. Suppose we are interested in the distribution of the colleges across US, we can use choropleth map to display the number of colleges in each state.

college = read.csv("data/college.csv", header = TRUE)
state_info = data.frame(abbrev = state.abb, 
                       fullname = state.name, 
                       fullname_lower = tolower(state.name),
                       area = state.x77[,"Area"])
college_info = left_join(college, state_info, by = c("state" = "abbrev"))
college_info = college_info %>%
  group_by(fullname_lower) %>%
  summarize(college_number = length(state),
            area = max(area)) %>%
  mutate(college_number_per_1000sqmile = college_number/area*1000)
state_boundary = map_data("state") # create a data frame containing the state boundaries. this function is from ggplot2
state_boundary = left_join(state_boundary, college_info, by = c("region" = "fullname_lower"))
ggplot() + 
  geom_polygon(data=state_boundary, 
               aes(x=long, y=lat, group=group, fill=college_number),
               color="black") +
  scale_fill_viridis_c() +
  coord_map("conic", lat0 = 30)

ggplot() + 
  geom_polygon(data=state_boundary, 
               aes(x=long, y=lat, group=group, fill=college_number_per_1000sqmile),
               color="black") +
  scale_fill_viridis_c(name = "# of colleges \nper 1000 sq. mile", trans="log10") +
  coord_map("conic", lat0 = 30)

2 Scatterplot on Map

We can use the map as a canvas and plot locations of the places of interest, for example, a scatterplot on map. The techniques we learn for the scatterplot can be directly transferred to map visualization.

For example, we can generate a bubble plot on a map. Suppose we visualize the data set on colleges in the US. We can plot the location of the each college, the tuition, and the number of students on the map.

college = read.csv("data/college.csv",header = TRUE)
ggplot() + 
  geom_polygon(data = state_boundary, 
               aes(x = long, y = lat, group = group),
               color = "black", fill = "lightblue" ) +
  geom_point(data = college %>% 
                      filter(!(state %in% c("AK","HI"))) %>%
                      arrange(tuition), 
             mapping = aes(x = lon, y = lat, 
                           color = tuition, size = undergrads)) +
  scale_color_viridis_c() +
  scale_size(range = c(0.1,5)) +
  coord_map("conic", lat0 = 30)

We can further conduct analysis on the map by generating density estimate and so on.

ggplot() + 
  geom_polygon(data = state_boundary, 
               aes(x = long, y = lat, group = group),
               color = "black", fill = "lightblue") +
  geom_point(data = college %>% filter(!(state %in% c("HI","AK"))), 
             mapping = aes(x = lon, y = lat), size = 1) +
  stat_density2d(data = college %>% filter(!(state %in% c("HI","AK"))),
                 mapping = aes(x = lon, y = lat, 
                               fill = after_stat(level)),
                 alpha = 0.3, size = 2, bins = 10, geom = "polygon") +
  coord_map("conic", lat0 = 30)

3 Google Map via ggmap

The visualization so far is based on the plotted map which contains only the boundaries of the states or counties. When plotting more local information such as crime location, we need the reference of the road, building, and etc. Therefore, it will be more convenient if we can access the map information from one of the map website, such as Google Map. Fortunately, ggmap is an R package that makes it easy to retrieve raster map tiles from popular online mapping services like Google Maps and Stamen Maps and plot them using the ggplot2 framework.

library(ggmap)

The instruction for setting up the API is at here. Another tutorial for Google platform is at here

For this course, you can use this API key, but please be careful when using the key.

ggmap::register_google(key = "AIzaSyB2dXu4azkdzKjFg8Tskk5NFPurFc2__r0")

Geocoding: We can use the geocode function to get the latitude, longitude and other information:

# geocode('Univerisity of Cincinnati',output = 'more')

## # A tibble: 1 x 9
##     lon   lat type      loctype address                  north south  east  west
##   <dbl> <dbl> <chr>     <chr>   <chr>                    <dbl> <dbl> <dbl> <dbl>
## 1 -84.5  39.1 establis~ rooftop 2600 clifton ave, cinci~  39.1  39.1 -84.5 -84.5

First, we need to us get_map() function to download the map from various sources, e.g., Google map. The get_map() function is a wrapper function for the underlying functions get_googlemap(), get_openstreetmap(), get_stamenmap(), and get_cloudmademap() which accepts a wide array of arguments and returns a classed raster object for plotting. Once the map is downloaded, we use ggmap() to create the canvas and generate visualization.

UClayer_goo_ter <- get_map(location = "University of Cincinnati", zoom = 15, 
                           source = "google", maptype = "terrain")
UClayer_goo_sat <- get_map(location = "University of Cincinnati", zoom = 15, 
                           source = "google", maptype = "hybrid")
UClayer_goo_roa <- get_map(location = "University of Cincinnati", zoom = 15, 
                           source = "google", maptype = "roadmap")
UClayer_sta_ter <- get_map(location = "University of Cincinnati", zoom = 15, 
                           source = "stamen", maptype = "terrain")
UClayer_sta_wat <- get_map(location = "University of Cincinnati", zoom = 15, 
                           source = "stamen", maptype = "watercolor")
UClayer_sta_ton <- get_map(location = "University of Cincinnati", zoom = 15, 
                           source = "stamen", maptype = "toner")
Houstonlayer    <- get_map(location = "Houston, TX",              zoom = 11, 
                           source = "google", maptype = "roadmap")
# We could directly use get_googlemap() instead of get_map()
Cincinnatilayer <- get_googlemap(center = c(-84.50, 39.137580),
                                 zoom = 12, scale = 2, maptype ='terrain', color = 'color')
USA_lon_lat     <- c(left = -128, bottom = 23, right = -65, top =52)
USAlayer        <- get_stamenmap(USA_lon_lat, zoom = 5)
library(grid)
library(gridExtra)
g1=ggmap(UClayer_goo_ter,extent = 'device') + ggtitle("Google Maps Terrain") 
g2=ggmap(UClayer_goo_sat,extent = 'device') + ggtitle("Google Maps Satellite")
g3=ggmap(UClayer_goo_roa,extent = 'device') + ggtitle("Google Maps Road")
g4=ggmap(UClayer_sta_ter,extent = 'device') + ggtitle("Stamen Maps Terrain")
g5=ggmap(UClayer_sta_wat,extent = 'device') + ggtitle("Stamen Maps Watercolor")
g6=ggmap(UClayer_sta_ton,extent = 'device') + ggtitle("Stamen Maps Toner")
grid.arrange(g1,g2,g3,g4,g5,g6,nrow = 2)

3.1 Scatterplot on Map with ggmap

Using the Cincinnati police data which is availabe at the Cincinnati Open Data website, we can download the location of different type of crimes in Cincinnati. We use the visualization below to display the location of types of crimes.

UC_crime = read.csv("data/UC_crime.csv", header = TRUE)
ggmap(UClayer_goo_ter) + 
  geom_point(data = filter(UC_crime, OFFENSE %in% c("VANDALISM","BURGLARY","ROBBERY", "ASSAULT")) ,
             mapping = aes(x=LONGITUDE_X, y=LATITUDE_X), size=0.5, alpha=0.5)+
  scale_color_viridis_d(guide = FALSE)+
  facet_wrap(~OFFENSE,ncol = 2)
## Warning: Removed 1329 rows containing missing values (`geom_point()`).
## Warning: The `guide` argument in `scale_*()` cannot be `FALSE`. This was deprecated in
## ggplot2 3.3.4.
## ℹ Please use "none" instead.
## ℹ The deprecated feature was likely used in the ggmap package.
##   Please report the issue at <https://github.com/dkahle/ggmap/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

As we can see assault takes place mostly to the south of the campus with more incidents concentrated around the southwest corner of the campus. The burglary mostly happens to the residential houses to the south of the campus. Robbery often happens to the south and to the east of the campus, where there are more shops and restaurants. This visualization clearly tells us the safe and risk areas around the campus. We can further conduct some analysis using density estimation as follows.

ggmap(UClayer_goo_ter) + 
  geom_point(data = filter(UC_crime, OFFENSE %in% c("ASSAULT", "ROBBERY")) ,
             mapping = aes(x=LONGITUDE_X, y=LATITUDE_X), size=0.5, alpha=0.5)+
  geom_density2d_filled(data = filter(UC_crime, OFFENSE %in% c("ASSAULT", "ROBBERY")) , 
                        mapping = aes(x=LONGITUDE_X, y=LATITUDE_X, fill=..level..), 
                        size = 0.1, alpha=0.4)+
  geom_density_2d(data = filter(UC_crime, OFFENSE %in% c("ASSAULT", "ROBBERY")) , 
                  mapping = aes(x=LONGITUDE_X, y=LATITUDE_X), 
                  size = 0.25, colour = "black")+
  scale_fill_viridis_d()+
  facet_wrap(~OFFENSE,ncol = 2)
## Warning: The dot-dot notation (`..level..`) was deprecated in ggplot2 3.4.0.
## ℹ Please use `after_stat(level)` instead.
## ℹ The deprecated feature was likely used in the ggmap package.
##   Please report the issue at <https://github.com/dkahle/ggmap/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: Removed 772 rows containing non-finite values
## (`stat_density2d_filled()`).
## Warning: Removed 772 rows containing non-finite values (`stat_density2d()`).
## Warning: Removed 772 rows containing missing values (`geom_point()`).

As we can, the areas with bright color indicate the higher frequencies of the crimes, which are consistent with what we find in the previous visualization.

ggmap(UClayer_goo_ter) + 
  geom_point(data = filter(UC_crime, OFFENSE %in% c("ASSAULT", "ROBBERY")) ,
             mapping = aes(x=LONGITUDE_X, y=LATITUDE_X), size=0.5, alpha=0.5)+
  stat_density_2d(data = filter(UC_crime, OFFENSE %in% c("ASSAULT", "ROBBERY")) , 
                  mapping = aes(x=LONGITUDE_X, y=LATITUDE_X, fill=..level..), 
                  size = 0.1, bins = 6, alpha=0.2,
                  geom = "polygon")+
  facet_wrap(~OFFENSE,ncol = 2)
## Warning: Removed 772 rows containing non-finite values (`stat_density2d()`).
## Warning: Removed 772 rows containing missing values (`geom_point()`).

We can further generate route information as follows.

UC_dist = mapdist('2715 Digby Ave, Cincinnati, OH 45220',
                  '2733 Short Vine St, Cincinnati, OH 45219')
UC_trek <- trek("2715 Digby Ave, Cincinnati, OH 45220", 
                "2733 Short Vine St, Cincinnati, OH 45219", 
                structure = "route")
UC_dist
## # A tibble: 1 × 9
##   from                      to        m    km miles seconds minutes  hours mode 
##   <chr>                     <chr> <int> <dbl> <dbl>   <int>   <dbl>  <dbl> <chr>
## 1 2715 Digby Ave, Cincinna… 2733…  1902  1.90  1.18     339    5.65 0.0942 driv…
UC_trek
## # A tibble: 37 × 3
##      lat   lon route
##    <dbl> <dbl> <chr>
##  1  39.1 -84.5 A    
##  2  39.1 -84.5 A    
##  3  39.1 -84.5 A    
##  4  39.1 -84.5 A    
##  5  39.1 -84.5 A    
##  6  39.1 -84.5 A    
##  7  39.1 -84.5 A    
##  8  39.1 -84.5 A    
##  9  39.1 -84.5 A    
## 10  39.1 -84.5 A    
## # ℹ 27 more rows
g1 + 
  geom_path(
    aes(x = lon, y = lat),  colour = "blue",
    size = 1.5, alpha = .5,
    data = UC_trek, lineend = "round"
  )

Next, we look at another data set about the Cincinnati housing price. We further download the map of Cincinnati.

Cincinnatilayer <- get_googlemap(center = c(-84.50, 39.137580),
                                 zoom = 12, scale = 2, maptype ='terrain', color = 'color')

We analyze the Cincinnati housing price.

cincinnati_SFH = read.csv("data/cincinnati_SFH_sales_2016to2019.csv",header = TRUE)
ggmap(Cincinnatilayer)+
  geom_point(data=cincinnati_SFH,aes(x=Longitude, y=Latitude, color = SalePrice),size=0.5)+
  scale_color_viridis_c(limits = c(0, 500000),
                       breaks = seq(0, 500000, 100000),
                       labels = format(seq(0, 500000, 100000), scientific=F))
## Warning: Removed 11144 rows containing missing values (`geom_point()`).

ggmap(Cincinnatilayer)+
  stat_summary_2d(data=cincinnati_SFH,
                  aes(x=Longitude, y=Latitude, z = SalePrice), fun = median, alpha=0.5, bins = 40)+
  scale_fill_viridis_c(limits = c(0, 500000),
                        breaks = seq(0, 500000, 100000),
                        labels = format(seq(0, 500000, 100000), scientific=F))
## Warning: Removed 11144 rows containing non-finite values (`stat_summary2d()`).
## Warning: Removed 49 rows containing missing values (`geom_tile()`).

ggmap(Cincinnatilayer)+
  geom_point(data=cincinnati_SFH, aes(x=Longitude, y=Latitude), size=0.2, alpha = 0.1)+
  geom_bin2d(data=cincinnati_SFH, aes(x=Longitude, y=Latitude), alpha=0.5, bins = 40)+
  scale_fill_viridis_c()
## Warning: Removed 11144 rows containing non-finite values (`stat_bin2d()`).
## Warning: Removed 11144 rows containing missing values (`geom_point()`).
## Warning: Removed 49 rows containing missing values (`geom_tile()`).

3.2 Choropleth Map with ggmap

USA_lon_lat     <- c(left = -128, bottom = 23, right = -65, top =52)
USAlayer        <- get_stamenmap(USA_lon_lat, zoom = 5)
ggmap(USAlayer) +
  geom_polygon( data=state_gdp_boud,
                aes(x=long, y=lat, group=group, 
                    fill = gdp_per_capita2019), 
          color="white", size = 0.2, alpha=0.7) +
  scale_fill_viridis_c(name="State GDP Per Capita",
                       limits = c(40000, 100000),
                       breaks = seq(40000, 100000, 20000),
                       labels = format(seq(40000, 100000, 20000), scientific=F))

3.3 Vector Field on ggmap

A vector field is an assignment of a vector to each point in a space. Visualization of vector field is useful for display movement of wind or current, and also surface of a function. The vector field in the plane is usually visualized by a set of arrows of various directions and lengths (magnitude). Each arrow starts from a coordinate in a grid in the plane. We visualize the vector field of wind speed and direction for hurricane Isabel. The wind speed and direction are three dimensional. So we can create a facet visualization.

east_coast = get_googlemap(center = c(-72.50, 32.50),
                           zoom = 5, scale = 2, maptype ='terrain', color = 'color')
wind_isabel = read_csv("data/isabel.csv")
every_n <- function(x, by = 2) {
  x <- sort(x)
  x[seq(1, length(x), by = by)]
}
keepx <- every_n(unique(wind_isabel$x), by = 4)
keepy <- every_n(unique(wind_isabel$y), by = 4)
keepz <- every_n(unique(wind_isabel$z), by = 3)
# ggmap(east_coast) +
#     geom_segment(data = wind_isabel %>% 
#                         filter(x %in% keepx  &  y %in% keepy & z == min(z)) %>%
#                         mutate(speedxy = sqrt(vx^2 + vy^2)), 
#                  aes(x = x - 0.5, y = y, xend = x-0.5+vx/50, yend = y+vy/50, color = speed),
#                  arrow = arrow(length = unit(0.1,"cm")), size = 0.6) +
#     scale_color_viridis_c()# +
    #geom_path(aes(x = long, y = lat, group = group), data = usa) +
    #coord_cartesian(xlim = range(islicesub$x), ylim = range(islicesub$y))
ggmap(east_coast)+
  geom_segment(data = wind_isabel %>% 
                 filter(x %in% keepx & y %in% keepy & z %in% keepz),
               aes(x = x - 0.5, y = y + 0.25, 
                   xend = x - 0.5 + vx/50, yend = y + 0.25 + vy/50, 
                   color = speed),
               arrow = arrow(length = unit(0.1,"cm")), size = 0.5) +
  scale_color_viridis_c() +
  facet_wrap(~z)

3.4 Connection Map with ggmap

A connection map shows the connections between several positions on a map. The link between 2 locations is usually drawn using great circle: the shortest route between them. It results in a rounded line that gives a pleasant look to the map. In R, this is made possible thanks to the packages like geosphere.

We use the United Airlines’ flights as an example and generate the following visualization to show the flight routes. The airport and flight information are in airports.csv and flights.csv, obtained from International Air Transport Association (IATA). The R package geosphere provides gcIntermediate() to calculate the great circle path. Then we can use the geom_path() function to draw the path.

library(tidyverse)
library(maps)
library(geosphere)
airports <- read.csv("data/airports.csv", header = TRUE) 
flights <- read.csv("data/flights.csv", header = TRUE, as.is=TRUE)
# flight_map = get_googlemap(center = c(-110, 30),
#                            zoom = 3, scale = 2, maptype ='terrain', color = 'color')
# save(list=c("flight_map"), file="data/flight_map.RData")
load("data/flight_map.RData")
fligts_sub <- flights %>% filter(airline == "UA")  #flights of  UA
traj = data.frame()
for (j in 1:dim(fligts_sub)[1]) {        
    airport1 <- airports[airports$iata == fligts_sub[j,"airport1"],]  #iata code for j flights in airport1
    airport2 <- airports[airports$iata == fligts_sub[j,"airport2"],] #iata code for j flights in airport2
    inter <- gcIntermediate(airport1[1,c("long", "lat")], 
                            airport2[1,c("long", "lat")], n=100, addStartEnd=TRUE)  #calculate the great circles
    traj = rbind(traj, data.frame(inter, flight_num = j))
    #lines(inter, col="black", lwd=0.8)  #draw the line
}
ggmap(flight_map) +
  geom_path(data = traj,
            aes(x = lon, y = lat, group = flight_num), 
            linewidth = 0.3)

4 Facet Map

A facet map is faceting the geographic regions and draws the data according to the corresponding geographic location. The graph can intuitively compare the distribution of data in different regions.

Facet map uses the geofacet package to perform faceting according to different geographic areas, and make the relative position of each area consistent with the actual geographic location.

The details of the package are explain on their website1 and github page2.

library(tidyverse)
library(geofacet)

Dataset: We are using the election dataset in the geofacet which is the information of 2016 election.

head(election)
##     state candidate   votes       pct
## 1 Alabama   Clinton  729547 34.357946
## 2 Alabama     Trump 1318255 62.083092
## 3 Alabama     Other   75570  3.558962
## 4  Alaska   Clinton  116454 36.550871
## 5  Alaska     Trump  163387 51.281512
## 6  Alaska     Other   38767 12.167617
ggplot(election, aes(candidate, pct, fill = candidate))+
  geom_bar(stat = 'identity')+
  scale_y_continuous(limits = c(0,100),breaks = seq(0,100,30),minor_breaks = T)+
  scale_fill_manual(name = "Candidate",
                    breaks = c("Clinton", "Trump", "Other"),
                    values = c("steelblue", "red", "grey"))+
  labs(title = '2016 US Presidential Election')+
  ylab("Percentage") + xlab("") +
  facet_geo(~state, grid = us_state_grid1) +         ## facet by the geographic regions
  theme(axis.text.x = element_text(angle = 90, vjust = 0.2, hjust = 1))

In the previous step, we use facet_geo to facet the data. The position of each panel is controlled by the grid, and the position information of the regions is in the default of facet_geo function, which can be used directly as (us_state_grid1, US region). We can use get_grid_name() to view the name code of the built-in grid location information, and the specific location name.

ggplot(state_unemp, aes(year, rate)) +
  geom_line() +
  facet_geo( ~state, grid = "us_state_grid2") +
  scale_x_continuous(labels = function(x) paste0("'", substr(x, 3, 4))) +
  ylab("Unemployment Rate (%)")

head(get_grid_names(),20)
## Note: More grids are available by name as listed here: https://raw.githubusercontent.com/hafen/grid-designer/master/grid_list.json
##  [1] "us_state_grid1"               "us_state_grid2"              
##  [3] "eu_grid1"                     "aus_grid1"                   
##  [5] "sa_prov_grid1"                "gb_london_boroughs_grid"     
##  [7] "nhs_scot_grid"                "india_grid1"                 
##  [9] "india_grid2"                  "argentina_grid1"             
## [11] "br_states_grid1"              "sea_grid1"                   
## [13] "mys_grid1"                    "fr_regions_grid1"            
## [15] "de_states_grid1"              "us_or_counties_grid1"        
## [17] "us_wa_counties_grid1"         "us_in_counties_grid1"        
## [19] "us_in_central_counties_grid1" "se_counties_grid1"

We can adjust the geographic information in the grid by grid_design(). It will open the application “Grid Designer” to adjust the geographic information.

grid_design()

We can also add a URL of a map to refer the geographic information.

grid_design(data = cn_bj_districts_grid2,img ='https://upload.wikimedia.org/wikipedia/commons/thumb/c/c4/Administrative_Division_Beijing.svg/900px-Administrative_Division_Beijing.svg.png' )

Time series plot by geographic information

We are using the Scottish children’s dental health data in the geofacet package, including the region and corresponding location code, the year of the survey, and the proportion of students with no dental disease in the first grade.

ggplot(nhs_scot_dental, aes(year,percent))+
  geom_line()+
  facet_geo(~name,grid = 'nhs_scot_grid')+
  scale_y_continuous(limits = c(min(nhs_scot_dental$percent),max(nhs_scot_dental$percent)))+
  scale_x_continuous(breaks = seq(2003,2014,3))+
  labs(title = 'Percentage of P1 children with no dental disease')+
  theme_bw()

5 Cartogram Heatmap

The cartogram heatmap does not express the value by changing the shape or size of the map like cartogram. Each area cartogram heatmap in is represented by a square of the same size, and the value is expressed by color. It is widely used in the data of the states in the United States than other regions.

Package: The package statebins is used to draw the cartogram heatmap as the extension of ggplot2.

library(tidyverse)
library(statebins)
library(viridis)
library(viridisLite)

Dataset: We select the data of the cumulative number of deaths of COVID-19 in each state of the United States as of September 9, 2020.

Resource: https://coronavirus.jhu.edu/region

covid<-read.csv("data/covid.csv")
head(covid)
##   X state.abb state.name deaths
## 1 1        AL    Alabama   2277
## 2 2        AK     Alaska     42
## 3 3        AZ    Arizona   5251
## 4 4        AR   Arkansas    928
## 5 5        CA California  13983
## 6 6        CO   Colorado   1977
##preparation 
covid$state.name<-as.character(covid$state.name)  
covid$deaths<-as.numeric(covid$deaths)            

##draw the plot
statebins(covid,                                  #dataframe
          state_col="state.name",                 #the name of state
          value_col = "deaths",                   #the number of death
          round=TRUE,                             
          dark_label = "white",                   #dark label as white
          light_label = "black")+                 #light label as black
  scale_fill_viridis(option = "E",begin = 0.95,end = 0,name="Deaths")+ # the color of fill
  labs(title = "COVID-19 U.S. Cumulative Deaths by September 9,2020",    #title and caption
       caption="Source:Johns Hopkins University & Medicine, Coronavirus Resource Center")+
  theme_statebins(legend_position = "right")    
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.

Group the number of death and use discrete colors.

##preparation
covid$death[which(covid$deaths<=100)]<-1                      
#group the number of death
covid$death[which(covid$deaths>100&covid$deaths<=1000)]<-2
covid$death[which(covid$deaths>1000&covid$deaths<=5000)]<-3
covid$death[which(covid$deaths>5000&covid$deaths<=10000)]<-4
covid$death[which(covid$deaths>10000&covid$deaths<=30000)]<-5
covid$death[which(covid$deaths>30000)]<-6
covid$death<-as.factor(covid$death)                           # change them to factor

ggplot(covid,aes(state=state.name,fill=death))+               
  geom_statebins()+                                           
  scale_fill_brewer(palette = 3,                              
                    direction = -1,                           
                    name="Deaths",                            
                    limits=c("6","5","4","3","2","1"),        #order the group
                    labels=c(">30000","10001-30000",
                             "5001-10000","1001-5000",
                             "101-1000","<=100")) + # chage the label
   labs(title = "COVID-19 U.S. Cumulative Deaths by September 9,2020", #add title
        caption="Source:Johns Hopkins University & Medicine, Coronavirus Resource Center")+#caption
   theme_statebins(legend_position = "right")