Ch7P2 Visualization Types - II

In this chapter, we continue to discuss the commonly used visualization types. The following R packages are required to run the examples in this chapter.

library(tidyverse)
library(ggforce)
library(readxl)
library(sunburstR)
library(RColorBrewer)
library(grid)
library(gridExtra)
library(graphics)
library(vcd)
library(ggrepel)
library(ggsci)
library(ggtern)
library(sqldf)
library(waterfalls)

1 Visualization of Association for Discrete Data

1.1 Mosaic Plot

A mosaic plot (also known as a Marimekko diagram) is a graphical method for visualizing data from two or more qualitative variables. It is the multidimensional extension of spineplots, which graphically display the same information for only one variable. It gives an overview of the data and makes it possible to recognize relationships between different variables.
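
To make the construction concrete, the following is a minimal sketch of how tile sizes in a two-variable mosaic plot relate to the counts. It assumes the built-in Titanic contingency table from the datasets package (not the CSV file read later) and collapses it to Class by Sex. The names col_width, tile_height, and tile_area are illustrative only.

```r
# Built-in 4-way contingency table, collapsed to Class x Sex counts
tab <- margin.table(Titanic, c(1, 2))

# Width of each column: share of each Class among all people on board
col_width <- prop.table(margin.table(tab, 1))

# Height of each tile within a column: share of each Sex within that Class
tile_height <- prop.table(tab, margin = 1)

# Area = width * height, which equals the joint proportion of the cell
tile_area <- tile_height * as.vector(col_width)
round(tile_area, 3)
```

Because each area equals a joint proportion, the tiles together fill the whole plotting region.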

Dataset

We are using the Titanic dataset, which ships with base R (package datasets) and is read here in case form from data/titanic.csv. It records, for each of the 2201 passengers and crew members on the Titanic’s maiden voyage, the economic status (class), sex, age group, and survival status.

Package

R offers several ways to draw a mosaic plot: geom_rect() in ggplot2, geom_mosaic() in ggmosaic, mosaicplot() in graphics, or mosaic() in vcd.

library(tidyverse)
library(graphics)
library(vcd)

Example using the graphics package

titanic=read.csv("data/titanic.csv")
dim(titanic)

## [1] 2201    4

titanic[seq(1,dim(titanic)[1],by=50),]

##      Class    Sex   Age Survived
## 1      3rd   Male Child       No
## 51     3rd Female Child       No
## 101    1st   Male Adult       No
## 151    1st   Male Adult       No
## 201    2nd   Male Adult       No
## 251    2nd   Male Adult       No
## 301    2nd   Male Adult       No
## 351    3rd   Male Adult       No
## 401    3rd   Male Adult       No
## 451    3rd   Male Adult       No
## 501    3rd   Male Adult       No
## 551    3rd   Male Adult       No
## 601    3rd   Male Adult       No
## 651    3rd   Male Adult       No
## 701    3rd   Male Adult       No
## 751   Crew   Male Adult       No
## 801   Crew   Male Adult       No
## 851   Crew   Male Adult       No
## 901   Crew   Male Adult       No
## 951   Crew   Male Adult       No
## 1001  Crew   Male Adult       No
## 1051  Crew   Male Adult       No
## 1101  Crew   Male Adult       No
## 1151  Crew   Male Adult       No
## 1201  Crew   Male Adult       No
## 1251  Crew   Male Adult       No
## 1301  Crew   Male Adult       No
## 1351  Crew   Male Adult       No
## 1401   3rd Female Adult       No
## 1451   3rd Female Adult       No
## 1501   2nd   Male Child      Yes
## 1551   1st   Male Adult      Yes
## 1601   1st   Male Adult      Yes
## 1651   3rd   Male Adult      Yes
## 1701  Crew   Male Adult      Yes
## 1751  Crew   Male Adult      Yes
## 1801  Crew   Male Adult      Yes
## 1851  Crew   Male Adult      Yes
## 1901   1st Female Adult      Yes
## 1951   1st Female Adult      Yes
## 2001   1st Female Adult      Yes
## 2051   2nd Female Adult      Yes
## 2101   2nd Female Adult      Yes
## 2151   3rd Female Adult      Yes
## 2201  Crew Female Adult      Yes

titanic_tab=table(titanic)
titanic_tab

## , , Age = Adult, Survived = No
## 
##       Sex
## Class  Female Male
##   1st       4  118
##   2nd      13  154
##   3rd      89  387
##   Crew      3  670
## 
## , , Age = Child, Survived = No
## 
##       Sex
## Class  Female Male
##   1st       0    0
##   2nd       0    0
##   3rd      17   35
##   Crew      0    0
## 
## , , Age = Adult, Survived = Yes
## 
##       Sex
## Class  Female Male
##   1st     140   57
##   2nd      80   14
##   3rd      76   75
##   Crew     20  192
## 
## , , Age = Child, Survived = Yes
## 
##       Sex
## Class  Female Male
##   1st       1    5
##   2nd      13   11
##   3rd      14   13
##   Crew      0    0

mosaicplot(~ Class + Sex , data = titanic, 
           main = "Survival on the Titanic", color = TRUE)

mosaicplot(~ Class + Sex + Age , data = titanic, 
           main = "Survival on the Titanic", color = TRUE)

mosaicplot(~ Class + Sex + Age + Survived, data = titanic, 
           main = "Survival on the Titanic", color = TRUE)

#mosaicplot(~ Class + Sex + Survived, data = titanic)
#mosaicplot(~ Sex + Class + Survived, data = titanic)

The vcd package offers another function for mosaic plots, mosaic().

#vcd package
#data("Titanic")
#head(Titanic) # the same as titanic_tab
#mosaic(Titanic) 
mosaic(~ Sex + Age + Survived + Class, data = titanic,
  main = "Survival on the Titanic", shade = TRUE, legend = TRUE)

#assoc(Titanic, shade=TRUE, legend=TRUE)
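
With shade = TRUE, tiles are colored according to how far the observed counts depart from what an independence model would predict. As a rough illustration of that idea, and not a reproduction of vcd's internals, the sketch below computes Pearson residuals for a two-way Class by Survived table. It assumes the built-in Titanic table holds the same counts as the CSV file; the names expected and pearson are illustrative.

```r
# Two-way table of Class by Survived from the built-in dataset
tab <- margin.table(Titanic, c(1, 4))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson residuals: positive means more observations than expected
pearson <- (tab - expected) / sqrt(expected)
round(pearson, 2)
```

Cells with large positive or negative residuals are the ones that stand out in a shaded mosaic plot.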

The plots suggest that women and children were given priority in the rescue: survival was far more common among women and children than among adult men. Survival also differed by class, with first-class passengers surviving most often and third-class passengers and crew least often.
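
This reading can be checked numerically. The sketch below, which assumes the built-in Titanic table matches the CSV data, computes the share of survivors within each sex, age group, and class. The helper surv_rate() is a hypothetical name introduced for illustration.

```r
# Share of survivors within each level of one variable of the Titanic table
# (variable is a dimension index: 1 = Class, 2 = Sex, 3 = Age)
surv_rate <- function(variable) {
  tab <- margin.table(Titanic, c(variable, 4))   # variable x Survived
  prop.table(tab, margin = 1)[, "Yes"]
}

round(surv_rate(2), 3)  # by Sex
round(surv_rate(3), 3)  # by Age
round(surv_rate(1), 3)  # by Class
```

In this table, roughly 73% of females and 21% of males survived.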

Another example using the graphics package. Because mosaicplot() draws with base graphics, a ggplot2 theme() has no effect on it; the axis labels are rotated with the las argument instead.

data001 <- read.csv("data/cinema.csv")
mosaicplot(~ year + release_date, data = data001,
           shade = TRUE, las = 2, main = "cinema")

1.2 Parallel Sets

A parallel sets plot is a method for the visualization and interactive exploration of categorical data that shows data frequencies instead of individual data points. The method is based on the axis layout of parallel coordinates, with boxes representing the categories and parallelograms between the axes showing the relations between categories.

We again use the Titanic data set, read from data/titanic.csv in the previous section. It provides information about the people on board during the Titanic’s maiden voyage, including their class (economic status), sex, age group, and survival status. Note that we have to reorganize the data set before drawing the parallel sets plot.
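
The reorganization amounts to turning the contingency table into one row per combination of categories, with a count column. A minimal base-R sketch, assuming the built-in Titanic table: as.data.frame() names the count column Freq, whereas reshape2::melt() names it value.

```r
# One row per combination of Class, Sex, Age, and Survived, plus its count
titanic_long <- as.data.frame(Titanic)

str(titanic_long)
nrow(titanic_long)        # number of category combinations
sum(titanic_long$Freq)    # total number of people in the table
```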

library(ggforce) # Package
titanic_tab=table(titanic)

titanic_freq <- reshape2::melt(titanic_tab)
titanic_freq

##    Class    Sex   Age Survived value
## 1    1st Female Adult       No     4
## 2    2nd Female Adult       No    13
## 3    3rd Female Adult       No    89
## 4   Crew Female Adult       No     3
## 5    1st   Male Adult       No   118
## 6    2nd   Male Adult       No   154
## 7    3rd   Male Adult       No   387
## 8   Crew   Male Adult       No   670
## 9    1st Female Child       No     0
## 10   2nd Female Child       No     0
## 11   3rd Female Child       No    17
## 12  Crew Female Child       No     0
## 13   1st   Male Child       No     0
## 14   2nd   Male Child       No     0
## 15   3rd   Male Child       No    35
## 16  Crew   Male Child       No     0
## 17   1st Female Adult      Yes   140
## 18   2nd Female Adult      Yes    80
## 19   3rd Female Adult      Yes    76
## 20  Crew Female Adult      Yes    20
## 21   1st   Male Adult      Yes    57
## 22   2nd   Male Adult      Yes    14
## 23   3rd   Male Adult      Yes    75
## 24  Crew   Male Adult      Yes   192
## 25   1st Female Child      Yes     1
## 26   2nd Female Child      Yes    13
## 27   3rd Female Child      Yes    14
## 28  Crew Female Child      Yes     0
## 29   1st   Male Child      Yes     5
## 30   2nd   Male Child      Yes    11
## 31   3rd   Male Child      Yes    13
## 32  Crew   Male Child      Yes     0

parallel_data14 <- gather_set_data(titanic_freq, c(1,4))
parallel_data14

##    Class    Sex   Age Survived value id x    y
## 1    1st Female Adult       No     4  1 1  1st
## 2    2nd Female Adult       No    13  2 1  2nd
## 3    3rd Female Adult       No    89  3 1  3rd
## 4   Crew Female Adult       No     3  4 1 Crew
## 5    1st   Male Adult       No   118  5 1  1st
## 6    2nd   Male Adult       No   154  6 1  2nd
## 7    3rd   Male Adult       No   387  7 1  3rd
## 8   Crew   Male Adult       No   670  8 1 Crew
## 9    1st Female Child       No     0  9 1  1st
## 10   2nd Female Child       No     0 10 1  2nd
## 11   3rd Female Child       No    17 11 1  3rd
## 12  Crew Female Child       No     0 12 1 Crew
## 13   1st   Male Child       No     0 13 1  1st
## 14   2nd   Male Child       No     0 14 1  2nd
## 15   3rd   Male Child       No    35 15 1  3rd
## 16  Crew   Male Child       No     0 16 1 Crew
## 17   1st Female Adult      Yes   140 17 1  1st
## 18   2nd Female Adult      Yes    80 18 1  2nd
## 19   3rd Female Adult      Yes    76 19 1  3rd
## 20  Crew Female Adult      Yes    20 20 1 Crew
## 21   1st   Male Adult      Yes    57 21 1  1st
## 22   2nd   Male Adult      Yes    14 22 1  2nd
## 23   3rd   Male Adult      Yes    75 23 1  3rd
## 24  Crew   Male Adult      Yes   192 24 1 Crew
## 25   1st Female Child      Yes     1 25 1  1st
## 26   2nd Female Child      Yes    13 26 1  2nd
## 27   3rd Female Child      Yes    14 27 1  3rd
## 28  Crew Female Child      Yes     0 28 1 Crew
## 29   1st   Male Child      Yes     5 29 1  1st
## 30   2nd   Male Child      Yes    11 30 1  2nd
## 31   3rd   Male Child      Yes    13 31 1  3rd
## 32  Crew   Male Child      Yes     0 32 1 Crew
## 33   1st Female Adult       No     4  1 4   No
## 34   2nd Female Adult       No    13  2 4   No
## 35   3rd Female Adult       No    89  3 4   No
## 36  Crew Female Adult       No     3  4 4   No
## 37   1st   Male Adult       No   118  5 4   No
## 38   2nd   Male Adult       No   154  6 4   No
## 39   3rd   Male Adult       No   387  7 4   No
## 40  Crew   Male Adult       No   670  8 4   No
## 41   1st Female Child       No     0  9 4   No
## 42   2nd Female Child       No     0 10 4   No
## 43   3rd Female Child       No    17 11 4   No
## 44  Crew Female Child       No     0 12 4   No
## 45   1st   Male Child       No     0 13 4   No
## 46   2nd   Male Child       No     0 14 4   No
## 47   3rd   Male Child       No    35 15 4   No
## 48  Crew   Male Child       No     0 16 4   No
## 49   1st Female Adult      Yes   140 17 4  Yes
## 50   2nd Female Adult      Yes    80 18 4  Yes
## 51   3rd Female Adult      Yes    76 19 4  Yes
## 52  Crew Female Adult      Yes    20 20 4  Yes
## 53   1st   Male Adult      Yes    57 21 4  Yes
## 54   2nd   Male Adult      Yes    14 22 4  Yes
## 55   3rd   Male Adult      Yes    75 23 4  Yes
## 56  Crew   Male Adult      Yes   192 24 4  Yes
## 57   1st Female Child      Yes     1 25 4  Yes
## 58   2nd Female Child      Yes    13 26 4  Yes
## 59   3rd Female Child      Yes    14 27 4  Yes
## 60  Crew Female Child      Yes     0 28 4  Yes
## 61   1st   Male Child      Yes     5 29 4  Yes
## 62   2nd   Male Child      Yes    11 30 4  Yes
## 63   3rd   Male Child      Yes    13 31 4  Yes
## 64  Crew   Male Child      Yes     0 32 4  Yes

# x holds column indices (see the output above), so map them to variable names
g1=ggplot(parallel_data14, aes(x=factor(x, levels = 1:4, labels = c("Class", "Sex","Age","Survived")), id = id, split = y, value = value)) +
  xlab("Covariates")+
  geom_parallel_sets(aes(fill = Survived),alpha = 0.3, axis.width = 0.2) +
  geom_parallel_sets_axes(axis.width = 0.2) +
  geom_parallel_sets_labels(color = 'white',size=3)
parallel_data124 <- gather_set_data(titanic_freq, c(1,2,4))
parallel_data124

##    Class    Sex   Age Survived value id x      y
## 1    1st Female Adult       No     4  1 1    1st
## 2    2nd Female Adult       No    13  2 1    2nd
## 3    3rd Female Adult       No    89  3 1    3rd
## 4   Crew Female Adult       No     3  4 1   Crew
## 5    1st   Male Adult       No   118  5 1    1st
## 6    2nd   Male Adult       No   154  6 1    2nd
## 7    3rd   Male Adult       No   387  7 1    3rd
## 8   Crew   Male Adult       No   670  8 1   Crew
## 9    1st Female Child       No     0  9 1    1st
## 10   2nd Female Child       No     0 10 1    2nd
## 11   3rd Female Child       No    17 11 1    3rd
## 12  Crew Female Child       No     0 12 1   Crew
## 13   1st   Male Child       No     0 13 1    1st
## 14   2nd   Male Child       No     0 14 1    2nd
## 15   3rd   Male Child       No    35 15 1    3rd
## 16  Crew   Male Child       No     0 16 1   Crew
## 17   1st Female Adult      Yes   140 17 1    1st
## 18   2nd Female Adult      Yes    80 18 1    2nd
## 19   3rd Female Adult      Yes    76 19 1    3rd
## 20  Crew Female Adult      Yes    20 20 1   Crew
## 21   1st   Male Adult      Yes    57 21 1    1st
## 22   2nd   Male Adult      Yes    14 22 1    2nd
## 23   3rd   Male Adult      Yes    75 23 1    3rd
## 24  Crew   Male Adult      Yes   192 24 1   Crew
## 25   1st Female Child      Yes     1 25 1    1st
## 26   2nd Female Child      Yes    13 26 1    2nd
## 27   3rd Female Child      Yes    14 27 1    3rd
## 28  Crew Female Child      Yes     0 28 1   Crew
## 29   1st   Male Child      Yes     5 29 1    1st
## 30   2nd   Male Child      Yes    11 30 1    2nd
## 31   3rd   Male Child      Yes    13 31 1    3rd
## 32  Crew   Male Child      Yes     0 32 1   Crew
## 33   1st Female Adult       No     4  1 2 Female
## 34   2nd Female Adult       No    13  2 2 Female
## 35   3rd Female Adult       No    89  3 2 Female
## 36  Crew Female Adult       No     3  4 2 Female
## 37   1st   Male Adult       No   118  5 2   Male
## 38   2nd   Male Adult       No   154  6 2   Male
## 39   3rd   Male Adult       No   387  7 2   Male
## 40  Crew   Male Adult       No   670  8 2   Male
## 41   1st Female Child       No     0  9 2 Female
## 42   2nd Female Child       No     0 10 2 Female
## 43   3rd Female Child       No    17 11 2 Female
## 44  Crew Female Child       No     0 12 2 Female
## 45   1st   Male Child       No     0 13 2   Male
## 46   2nd   Male Child       No     0 14 2   Male
## 47   3rd   Male Child       No    35 15 2   Male
## 48  Crew   Male Child       No     0 16 2   Male
## 49   1st Female Adult      Yes   140 17 2 Female
## 50   2nd Female Adult      Yes    80 18 2 Female
## 51   3rd Female Adult      Yes    76 19 2 Female
## 52  Crew Female Adult      Yes    20 20 2 Female
## 53   1st   Male Adult      Yes    57 21 2   Male
## 54   2nd   Male Adult      Yes    14 22 2   Male
## 55   3rd   Male Adult      Yes    75 23 2   Male
## 56  Crew   Male Adult      Yes   192 24 2   Male
## 57   1st Female Child      Yes     1 25 2 Female
## 58   2nd Female Child      Yes    13 26 2 Female
## 59   3rd Female Child      Yes    14 27 2 Female
## 60  Crew Female Child      Yes     0 28 2 Female
## 61   1st   Male Child      Yes     5 29 2   Male
## 62   2nd   Male Child      Yes    11 30 2   Male
## 63   3rd   Male Child      Yes    13 31 2   Male
## 64  Crew   Male Child      Yes     0 32 2   Male
## 65   1st Female Adult       No     4  1 4     No
## 66   2nd Female Adult       No    13  2 4     No
## 67   3rd Female Adult       No    89  3 4     No
## 68  Crew Female Adult       No     3  4 4     No
## 69   1st   Male Adult       No   118  5 4     No
## 70   2nd   Male Adult       No   154  6 4     No
## 71   3rd   Male Adult       No   387  7 4     No
## 72  Crew   Male Adult       No   670  8 4     No
## 73   1st Female Child       No     0  9 4     No
## 74   2nd Female Child       No     0 10 4     No
## 75   3rd Female Child       No    17 11 4     No
## 76  Crew Female Child       No     0 12 4     No
## 77   1st   Male Child       No     0 13 4     No
## 78   2nd   Male Child       No     0 14 4     No
## 79   3rd   Male Child       No    35 15 4     No
## 80  Crew   Male Child       No     0 16 4     No
## 81   1st Female Adult      Yes   140 17 4    Yes
## 82   2nd Female Adult      Yes    80 18 4    Yes
## 83   3rd Female Adult      Yes    76 19 4    Yes
## 84  Crew Female Adult      Yes    20 20 4    Yes
## 85   1st   Male Adult      Yes    57 21 4    Yes
## 86   2nd   Male Adult      Yes    14 22 4    Yes
## 87   3rd   Male Adult      Yes    75 23 4    Yes
## 88  Crew   Male Adult      Yes   192 24 4    Yes
## 89   1st Female Child      Yes     1 25 4    Yes
## 90   2nd Female Child      Yes    13 26 4    Yes
## 91   3rd Female Child      Yes    14 27 4    Yes
## 92  Crew Female Child      Yes     0 28 4    Yes
## 93   1st   Male Child      Yes     5 29 4    Yes
## 94   2nd   Male Child      Yes    11 30 4    Yes
## 95   3rd   Male Child      Yes    13 31 4    Yes
## 96  Crew   Male Child      Yes     0 32 4    Yes

# x holds column indices (see the output above), so map them to variable names
g2=ggplot(parallel_data124, aes(x=factor(x, levels = 1:4, labels = c("Class", "Sex","Age","Survived")), id = id, split = y, value = value)) +
  xlab("Covariates") +
  geom_parallel_sets(aes(fill = Survived), alpha = 0.3, axis.width = 0.2) +
  geom_parallel_sets_axes(axis.width = 0.2) +
  geom_parallel_sets_labels(colour = 'white',size=3)
#grid.arrange(g1,g2,ncol=2)

data <- reshape2::melt(Titanic)
data <- gather_set_data(data, 1:4)

ggplot(data, aes(x, id = id, split = y, value = value)) +
  geom_parallel_sets(aes(fill = Sex), alpha = 0.3, axis.width = 0.1) +
  geom_parallel_sets_axes(axis.width = 0.1) +
  geom_parallel_sets_labels(colour = 'white')

parallel_data1234 <- gather_set_data(titanic_freq, c(1,2,3,4))
parallel_data1234

##     Class    Sex   Age Survived value id x      y
## 1     1st Female Adult       No     4  1 1    1st
## 2     2nd Female Adult       No    13  2 1    2nd
## 3     3rd Female Adult       No    89  3 1    3rd
## 4    Crew Female Adult       No     3  4 1   Crew
## 5     1st   Male Adult       No   118  5 1    1st
## 6     2nd   Male Adult       No   154  6 1    2nd
## 7     3rd   Male Adult       No   387  7 1    3rd
## 8    Crew   Male Adult       No   670  8 1   Crew
## 9     1st Female Child       No     0  9 1    1st
## 10    2nd Female Child       No     0 10 1    2nd
## 11    3rd Female Child       No    17 11 1    3rd
## 12   Crew Female Child       No     0 12 1   Crew
## 13    1st   Male Child       No     0 13 1    1st
## 14    2nd   Male Child       No     0 14 1    2nd
## 15    3rd   Male Child       No    35 15 1    3rd
## 16   Crew   Male Child       No     0 16 1   Crew
## 17    1st Female Adult      Yes   140 17 1    1st
## 18    2nd Female Adult      Yes    80 18 1    2nd
## 19    3rd Female Adult      Yes    76 19 1    3rd
## 20   Crew Female Adult      Yes    20 20 1   Crew
## 21    1st   Male Adult      Yes    57 21 1    1st
## 22    2nd   Male Adult      Yes    14 22 1    2nd
## 23    3rd   Male Adult      Yes    75 23 1    3rd
## 24   Crew   Male Adult      Yes   192 24 1   Crew
## 25    1st Female Child      Yes     1 25 1    1st
## 26    2nd Female Child      Yes    13 26 1    2nd
## 27    3rd Female Child      Yes    14 27 1    3rd
## 28   Crew Female Child      Yes     0 28 1   Crew
## 29    1st   Male Child      Yes     5 29 1    1st
## 30    2nd   Male Child      Yes    11 30 1    2nd
## 31    3rd   Male Child      Yes    13 31 1    3rd
## 32   Crew   Male Child      Yes     0 32 1   Crew
## 33    1st Female Adult       No     4  1 2 Female
## 34    2nd Female Adult       No    13  2 2 Female
## 35    3rd Female Adult       No    89  3 2 Female
## 36   Crew Female Adult       No     3  4 2 Female
## 37    1st   Male Adult       No   118  5 2   Male
## 38    2nd   Male Adult       No   154  6 2   Male
## 39    3rd   Male Adult       No   387  7 2   Male
## 40   Crew   Male Adult       No   670  8 2   Male
## 41    1st Female Child       No     0  9 2 Female
## 42    2nd Female Child       No     0 10 2 Female
## 43    3rd Female Child       No    17 11 2 Female
## 44   Crew Female Child       No     0 12 2 Female
## 45    1st   Male Child       No     0 13 2   Male
## 46    2nd   Male Child       No     0 14 2   Male
## 47    3rd   Male Child       No    35 15 2   Male
## 48   Crew   Male Child       No     0 16 2   Male
## 49    1st Female Adult      Yes   140 17 2 Female
## 50    2nd Female Adult      Yes    80 18 2 Female
## 51    3rd Female Adult      Yes    76 19 2 Female
## 52   Crew Female Adult      Yes    20 20 2 Female
## 53    1st   Male Adult      Yes    57 21 2   Male
## 54    2nd   Male Adult      Yes    14 22 2   Male
## 55    3rd   Male Adult      Yes    75 23 2   Male
## 56   Crew   Male Adult      Yes   192 24 2   Male
## 57    1st Female Child      Yes     1 25 2 Female
## 58    2nd Female Child      Yes    13 26 2 Female
## 59    3rd Female Child      Yes    14 27 2 Female
## 60   Crew Female Child      Yes     0 28 2 Female
## 61    1st   Male Child      Yes     5 29 2   Male
## 62    2nd   Male Child      Yes    11 30 2   Male
## 63    3rd   Male Child      Yes    13 31 2   Male
## 64   Crew   Male Child      Yes     0 32 2   Male
## 65    1st Female Adult       No     4  1 3  Adult
## 66    2nd Female Adult       No    13  2 3  Adult
## 67    3rd Female Adult       No    89  3 3  Adult
## 68   Crew Female Adult       No     3  4 3  Adult
## 69    1st   Male Adult       No   118  5 3  Adult
## 70    2nd   Male Adult       No   154  6 3  Adult
## 71    3rd   Male Adult       No   387  7 3  Adult
## 72   Crew   Male Adult       No   670  8 3  Adult
## 73    1st Female Child       No     0  9 3  Child
## 74    2nd Female Child       No     0 10 3  Child
## 75    3rd Female Child       No    17 11 3  Child
## 76   Crew Female Child       No     0 12 3  Child
## 77    1st   Male Child       No     0 13 3  Child
## 78    2nd   Male Child       No     0 14 3  Child
## 79    3rd   Male Child       No    35 15 3  Child
## 80   Crew   Male Child       No     0 16 3  Child
## 81    1st Female Adult      Yes   140 17 3  Adult
## 82    2nd Female Adult      Yes    80 18 3  Adult
## 83    3rd Female Adult      Yes    76 19 3  Adult
## 84   Crew Female Adult      Yes    20 20 3  Adult
## 85    1st   Male Adult      Yes    57 21 3  Adult
## 86    2nd   Male Adult      Yes    14 22 3  Adult
## 87    3rd   Male Adult      Yes    75 23 3  Adult
## 88   Crew   Male Adult      Yes   192 24 3  Adult
## 89    1st Female Child      Yes     1 25 3  Child
## 90    2nd Female Child      Yes    13 26 3  Child
## 91    3rd Female Child      Yes    14 27 3  Child
## 92   Crew Female Child      Yes     0 28 3  Child
## 93    1st   Male Child      Yes     5 29 3  Child
## 94    2nd   Male Child      Yes    11 30 3  Child
## 95    3rd   Male Child      Yes    13 31 3  Child
## 96   Crew   Male Child      Yes     0 32 3  Child
## 97    1st Female Adult       No     4  1 4     No
## 98    2nd Female Adult       No    13  2 4     No
## 99    3rd Female Adult       No    89  3 4     No
## 100  Crew Female Adult       No     3  4 4     No
## 101   1st   Male Adult       No   118  5 4     No
## 102   2nd   Male Adult       No   154  6 4     No
## 103   3rd   Male Adult       No   387  7 4     No
## 104  Crew   Male Adult       No   670  8 4     No
## 105   1st Female Child       No     0  9 4     No
## 106   2nd Female Child       No     0 10 4     No
## 107   3rd Female Child       No    17 11 4     No
## 108  Crew Female Child       No     0 12 4     No
## 109   1st   Male Child       No     0 13 4     No
## 110   2nd   Male Child       No     0 14 4     No
## 111   3rd   Male Child       No    35 15 4     No
## 112  Crew   Male Child       No     0 16 4     No
## 113   1st Female Adult      Yes   140 17 4    Yes
## 114   2nd Female Adult      Yes    80 18 4    Yes
## 115   3rd Female Adult      Yes    76 19 4    Yes
## 116  Crew Female Adult      Yes    20 20 4    Yes
## 117   1st   Male Adult      Yes    57 21 4    Yes
## 118   2nd   Male Adult      Yes    14 22 4    Yes
## 119   3rd   Male Adult      Yes    75 23 4    Yes
## 120  Crew   Male Adult      Yes   192 24 4    Yes
## 121   1st Female Child      Yes     1 25 4    Yes
## 122   2nd Female Child      Yes    13 26 4    Yes
## 123   3rd Female Child      Yes    14 27 4    Yes
## 124  Crew Female Child      Yes     0 28 4    Yes
## 125   1st   Male Child      Yes     5 29 4    Yes
## 126   2nd   Male Child      Yes    11 30 4    Yes
## 127   3rd   Male Child      Yes    13 31 4    Yes
## 128  Crew   Male Child      Yes     0 32 4    Yes

# x holds column indices (see the output above), so map them to variable names
g3=ggplot(parallel_data1234, aes(x=factor(x, levels = 1:4, labels = c("Class", "Sex","Age","Survived")), id = id, split = y, value = value)) +
  xlab("Covariates")+
  geom_parallel_sets(aes(fill = Survived), alpha = 0.3, axis.width = 0.2) +
  geom_parallel_sets_axes(axis.width = 0.2) +
  geom_parallel_sets_labels(color = 'white',size=3)
parallel_data123 <- gather_set_data(titanic_freq, c(1,2,3))
parallel_data123

##    Class    Sex   Age Survived value id x      y
## 1    1st Female Adult       No     4  1 1    1st
## 2    2nd Female Adult       No    13  2 1    2nd
## 3    3rd Female Adult       No    89  3 1    3rd
## 4   Crew Female Adult       No     3  4 1   Crew
## 5    1st   Male Adult       No   118  5 1    1st
## 6    2nd   Male Adult       No   154  6 1    2nd
## 7    3rd   Male Adult       No   387  7 1    3rd
## 8   Crew   Male Adult       No   670  8 1   Crew
## 9    1st Female Child       No     0  9 1    1st
## 10   2nd Female Child       No     0 10 1    2nd
## 11   3rd Female Child       No    17 11 1    3rd
## 12  Crew Female Child       No     0 12 1   Crew
## 13   1st   Male Child       No     0 13 1    1st
## 14   2nd   Male Child       No     0 14 1    2nd
## 15   3rd   Male Child       No    35 15 1    3rd
## 16  Crew   Male Child       No     0 16 1   Crew
## 17   1st Female Adult      Yes   140 17 1    1st
## 18   2nd Female Adult      Yes    80 18 1    2nd
## 19   3rd Female Adult      Yes    76 19 1    3rd
## 20  Crew Female Adult      Yes    20 20 1   Crew
## 21   1st   Male Adult      Yes    57 21 1    1st
## 22   2nd   Male Adult      Yes    14 22 1    2nd
## 23   3rd   Male Adult      Yes    75 23 1    3rd
## 24  Crew   Male Adult      Yes   192 24 1   Crew
## 25   1st Female Child      Yes     1 25 1    1st
## 26   2nd Female Child      Yes    13 26 1    2nd
## 27   3rd Female Child      Yes    14 27 1    3rd
## 28  Crew Female Child      Yes     0 28 1   Crew
## 29   1st   Male Child      Yes     5 29 1    1st
## 30   2nd   Male Child      Yes    11 30 1    2nd
## 31   3rd   Male Child      Yes    13 31 1    3rd
## 32  Crew   Male Child      Yes     0 32 1   Crew
## 33   1st Female Adult       No     4  1 2 Female
## 34   2nd Female Adult       No    13  2 2 Female
## 35   3rd Female Adult       No    89  3 2 Female
## 36  Crew Female Adult       No     3  4 2 Female
## 37   1st   Male Adult       No   118  5 2   Male
## 38   2nd   Male Adult       No   154  6 2   Male
## 39   3rd   Male Adult       No   387  7 2   Male
## 40  Crew   Male Adult       No   670  8 2   Male
## 41   1st Female Child       No     0  9 2 Female
## 42   2nd Female Child       No     0 10 2 Female
## 43   3rd Female Child       No    17 11 2 Female
## 44  Crew Female Child       No     0 12 2 Female
## 45   1st   Male Child       No     0 13 2   Male
## 46   2nd   Male Child       No     0 14 2   Male
## 47   3rd   Male Child       No    35 15 2   Male
## 48  Crew   Male Child       No     0 16 2   Male
## 49   1st Female Adult      Yes   140 17 2 Female
## 50   2nd Female Adult      Yes    80 18 2 Female
## 51   3rd Female Adult      Yes    76 19 2 Female
## 52  Crew Female Adult      Yes    20 20 2 Female
## 53   1st   Male Adult      Yes    57 21 2   Male
## 54   2nd   Male Adult      Yes    14 22 2   Male
## 55   3rd   Male Adult      Yes    75 23 2   Male
## 56  Crew   Male Adult      Yes   192 24 2   Male
## 57   1st Female Child      Yes     1 25 2 Female
## 58   2nd Female Child      Yes    13 26 2 Female
## 59   3rd Female Child      Yes    14 27 2 Female
## 60  Crew Female Child      Yes     0 28 2 Female
## 61   1st   Male Child      Yes     5 29 2   Male
## 62   2nd   Male Child      Yes    11 30 2   Male
## 63   3rd   Male Child      Yes    13 31 2   Male
## 64  Crew   Male Child      Yes     0 32 2   Male
## 65   1st Female Adult       No     4  1 3  Adult
## 66   2nd Female Adult       No    13  2 3  Adult
## 67   3rd Female Adult       No    89  3 3  Adult
## 68  Crew Female Adult       No     3  4 3  Adult
## 69   1st   Male Adult       No   118  5 3  Adult
## 70   2nd   Male Adult       No   154  6 3  Adult
## 71   3rd   Male Adult       No   387  7 3  Adult
## 72  Crew   Male Adult       No   670  8 3  Adult
## 73   1st Female Child       No     0  9 3  Child
## 74   2nd Female Child       No     0 10 3  Child
## 75   3rd Female Child       No    17 11 3  Child
## 76  Crew Female Child       No     0 12 3  Child
## 77   1st   Male Child       No     0 13 3  Child
## 78   2nd   Male Child       No     0 14 3  Child
## 79   3rd   Male Child       No    35 15 3  Child
## 80  Crew   Male Child       No     0 16 3  Child
## 81   1st Female Adult      Yes   140 17 3  Adult
## 82   2nd Female Adult      Yes    80 18 3  Adult
## 83   3rd Female Adult      Yes    76 19 3  Adult
## 84  Crew Female Adult      Yes    20 20 3  Adult
## 85   1st   Male Adult      Yes    57 21 3  Adult
## 86   2nd   Male Adult      Yes    14 22 3  Adult
## 87   3rd   Male Adult      Yes    75 23 3  Adult
## 88  Crew   Male Adult      Yes   192 24 3  Adult
## 89   1st Female Child      Yes     1 25 3  Child
## 90   2nd Female Child      Yes    13 26 3  Child
## 91   3rd Female Child      Yes    14 27 3  Child
## 92  Crew Female Child      Yes     0 28 3  Child
## 93   1st   Male Child      Yes     5 29 3  Child
## 94   2nd   Male Child      Yes    11 30 3  Child
## 95   3rd   Male Child      Yes    13 31 3  Child
## 96  Crew   Male Child      Yes     0 32 3  Child

# x holds column indices (see the output above), so map them to variable names
g4=ggplot(parallel_data123, aes(x=factor(x, levels = 1:4, labels = c("Class", "Sex","Age","Survived")), id = id, split = y, value = value)) +
  xlab("Covariates") +
  geom_parallel_sets(aes(fill = Survived), alpha = 0.3, axis.width = 0.2) +
  geom_parallel_sets_axes(axis.width = 0.2) +
  geom_parallel_sets_labels(colour = 'white',size=3)
grid.arrange(g3,g4,ncol=2)

Note that the data is now organized in a form that can be plotted directly. The last four columns (value, id, x, and y) are the most important. There are 32 different groups of passengers, i.e., 32 groups = 4 classes * 2 sexes * 2 age groups * 2 survival statuses, and the variable id labels these 32 groups. For each group, the variable value stores the number of passengers in it. The variable x indicates which categorical variable (axis) a row refers to, and y gives the group's category on that axis, e.g., Male or Female, Child or Adult. The 32 rows are repeated once per axis, so four times when all four categorical variables are used.
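
The stacking described above can be sketched in base R. This is only an approximation of what gather_set_data() produces, written to show the structure. It assumes the built-in Titanic table and stores variable names rather than column indices in x; the names freq, axes, and long are illustrative.

```r
# One row per group (combination of categories), with a unique id
freq <- as.data.frame(Titanic)
freq$id <- seq_len(nrow(freq))

# Stack one copy of the table per axis variable
axes <- c("Class", "Sex", "Age", "Survived")
long <- do.call(rbind, lapply(axes, function(a) {
  out <- freq
  out$x <- a                          # which axis this copy belongs to
  out$y <- as.character(freq[[a]])    # the group's category on that axis
  out
}))

dim(long)        # rows = number of groups * number of axes
table(long$x)    # each axis holds every group exactly once
```

Each id appears once per axis, which is the condition the parallel sets geoms rely on.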

The figure above describes the data and the number of passengers in each category. In addition, it displays how pairs of categorical variables interact, for example age and class, class and gender, and gender and survival.

However, what if we want to see how survival interacts with all other variables? Since the survival status is the most important variable, we can move it to another dimension, the color of the bands.

2 Visualization of Composition

2.1 Tree Map

Treemaps display hierarchical (tree-structured) data as a set of nested rectangles. Each branch of the tree is given a rectangle, which is then tiled with smaller rectangles representing subbranches. A leaf node’s rectangle has an area proportional to a specified dimension of the data. Often the leaf nodes are colored to show a separate dimension of the data.
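
As a rough illustration of "area proportional to value", the sketch below lays out one level of a treemap with a simple slice approach, cutting the unit square into vertical strips. The sales figures and the helper slice() are hypothetical and are not taken from the data set used in this section; treemap packages typically use more elaborate tiling rules.

```r
# Hypothetical sales by brand (illustrative numbers only)
sales <- c(A = 50, B = 30, C = 20)

# Slice the rectangle [x0, x1] x [y0, y1] into vertical strips whose widths
# are proportional to the values, so that the areas are proportional too
slice <- function(values, x0 = 0, x1 = 1, y0 = 0, y1 = 1) {
  breaks <- x0 + (x1 - x0) * cumsum(c(0, unname(values))) / sum(values)
  data.frame(name = names(values),
             xmin = head(breaks, -1), xmax = tail(breaks, -1),
             ymin = y0, ymax = y1)
}

rects <- slice(sales)
rects$area <- (rects$xmax - rects$xmin) * (rects$ymax - rects$ymin)
rects
```

Applying slice() again inside each strip, alternating between vertical and horizontal cuts, would give the nested rectangles for subbranches.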

Dataset: We are using the data set about Indonesia’s mobile phone market sales in the first half of 2020. The data is available at: https://www.kaggle.com/kurniakh/marketplace-data

phone<-read.csv("data/phone.csv")
dim(phone)

## [1] 361   6

phone[seq(1,nrow(phone),by=10),]

##                                   type     rmb  sold   brand    region    lnrmb
## 1                         Apple iPhone 3459.90   597   Apple not-china 8.148995
## 11                      Apple iPhone 6 1742.90  6736   Apple not-china 7.463306
## 21                     Apple iPhone XR 5584.20  8067   Apple not-china 8.627696
## 31     Asus Zenfone 4 Max Plus ZC554KL 1207.39   568    ASUS     china 7.096216
## 41              Asus Zenfone 6 ZS630KL 3535.03   503    ASUS     china 8.170477
## 51         Asus Zenfone Selfie ZD551KL 1206.00   147    ASUS     china 7.095064
## 61                  Huawei Ascend Y540  674.88   110  Huawei     china 6.514535
## 71                       Huawei nova 2 1491.49   650  Huawei     china 7.307531
## 81         Huawei P30 lite New Edition 1062.18  1304  Huawei     china 6.968079
## 91                       Infinix Hot 5  769.43   120 Infinix     china 6.645650
## 101                         Infinix S5 1010.63  3038 Infinix     china 6.918329
## 111                        Nokia 3.1 A  960.46   138   Nokia not-china 6.867412
## 121                    Oppo A31 (2020) 1222.82  1264    OPPO     china 7.108915
## 131                           Oppo A7x 2026.08    49    OPPO     china 7.613858
## 141                            Oppo F5 1304.81  9683    OPPO     china 7.173813
## 151                         Oppo Neo 7 1447.20   235    OPPO     china 7.277386
## 161                           Realme 3  961.18  1688  Realme     china 6.868162
## 171                 Samsung Galaxy A10  826.96 38036 Samsung not-china 6.717756
## 181                 Samsung Galaxy A50 1904.87 35262 Samsung not-china 7.552169
## 191                 Samsung Galaxy A70 2384.87 22884 Samsung not-china 7.776900
## 201       Samsung Galaxy Express Prime  258.08    20 Samsung not-china 5.553270
## 211      Samsung Galaxy J2 Core (2020)  584.67   470 Samsung not-china 6.371048
## 221                  Samsung Galaxy J7 1567.39   516 Samsung not-china 7.357167
## 231                 Samsung Galaxy M30 1366.03  7412 Samsung not-china 7.219664
## 241            Samsung Galaxy S10 Lite 4050.50   728 Samsung not-china 8.306596
## 251             Samsung Galaxy S7 edge 2392.92  2661 Samsung not-china 7.780270
## 261                 vivo V11 (V11 Pro) 1636.07 27259    vivo     china 7.400052
## 271                            vivo V7 1342.66   330    vivo     china 7.202408
## 281                           vivo Y53  771.79    69    vivo     china 6.648712
## 291                         vivo Z1Pro 1523.47 15567    vivo     china 7.328746
## 301               Xiaomi Mi 8 Explorer 2966.76   122  Xiaomi     china 7.995226
## 311                      Xiaomi Mi Max 2714.95   407  Xiaomi     china 7.906529
## 321                     Xiaomi Mi Play 1291.49  1970  Xiaomi     china 7.163552
## 331                Xiaomi Redmi 4 (4X)  878.08  3438  Xiaomi     china 6.777738
## 341               Xiaomi Redmi 8A Dual  313.46   829  Xiaomi     china 5.747672
## 351 Xiaomi Redmi Note 5 AI Dual Camera 1077.19  5448  Xiaomi     china 6.982111
## 361          Xiaomi Redmi Y1 (Note 5A)  987.64  3882  Xiaomi     china 6.895318

Package

library(tidyverse)
library(treemapify)

Tree map

ggplot(phone, 
       aes(area = sold,
           subgroup = region,
           subgroup2 = brand,
           subgroup3 = type,
           fill=lnrmb))+
  geom_treemap()+
  geom_treemap_subgroup3_border(color="white",size=1)+
  geom_treemap_subgroup2_border(color="red",size=2)+
  geom_treemap_subgroup_border(color="blue",size=3)+
  #geom_treemap_subgroup_text(place = "centre", grow = TRUE, alpha = 0.5, colour ="white")+
  geom_treemap_subgroup2_text(place = "bottom", grow = TRUE, alpha = 0.3, colour ="red")+
  geom_treemap_text(aes(label=type),colour = "white", place = "topleft", reflow = TRUE, size=10)+
  scale_fill_distiller(palette="Blues",name="phone\nprice\n(RMB)", breaks = log(c(250, 500, 1000, 2000, 4000, 8000, 16000)), labels = c(250, 500, 1000, 2000, 4000, 8000, 16000))

area = sold: the area of each rectangle is the sales volume of the phone model.

fill = lnrmb: the shade of the color represents the price of the phone (on a logarithmic scale). The cheaper the phone, the darker the color.

subgroup = region: brands are divided by origin into two groups, China (left of the thick blue border) and not-China. As can be seen from the figure, the Indonesian mobile phone market does not have local brands, and most of the brands are Chinese.

subgroup2 = brand: Chinese brands include Xiaomi, OPPO, vivo, Realme, etc., and non-Chinese brands include Apple, Samsung, and Nokia.

subgroup3 = type: the phone model. The most popular models in Indonesia are mainly the cheaper ones.
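The fill variable lnrmb and the legend breaks in scale_fill_distiller() are both on the log scale. The sketch below assumes that lnrmb is the natural logarithm of rmb, which the printed rows of phone are consistent with; the names rmb, lnrmb_check, legend_labels, and legend_breaks are illustrative only.

```r
# Prices (RMB) of three models, copied from the printed rows of `phone`
rmb <- c(3459.90, 1742.90, 5584.20)

# Assumption: lnrmb = log(rmb), the natural logarithm
lnrmb_check <- log(rmb)
round(lnrmb_check, 6)

# Legend: the breaks are positions on the log scale,
# while the labels show the price in plain RMB
legend_labels <- c(250, 500, 1000, 2000, 4000, 8000, 16000)
legend_breaks <- log(legend_labels)
round(exp(legend_breaks))   # back-transforming the breaks recovers the labels
```

Because the labels double at each step, the breaks are equally spaced on the log scale, which keeps the legend readable when a few models are far more expensive than the rest.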

We can also use a tree map to visualize the Titanic data set; however, it is less effective than a mosaic plot.

titanic_df=as.data.frame(table(titanic[,c("Class","Sex","Survived")]))
ggplot(titanic_df, 
       aes(area = Freq,
           subgroup = Class,
           subgroup2 = Sex,
           subgroup3 = Survived,
           label=paste("Class",Class,Sex,"Survived",Survived)))+
  geom_treemap()+
  geom_treemap_subgroup3_border(color="yellow",size=1)+
  geom_treemap_subgroup2_border(color="red",size=3)+
  geom_treemap_subgroup_border(color="blue",size=5)+
  geom_treemap_subgroup_text(place = "centre", grow = TRUE, alpha = 0.2, colour ="blue",fontface = "bold")+
  geom_treemap_subgroup2_text(place = "bottom", grow = FALSE, alpha = 0.2, colour ="red",fontface = "italic")+
  #geom_treemap_subgroup3_text(place = "top", grow = FALSE, alpha = 0.2, colour ="white",fontface = "italic")+
  geom_treemap_text(colour = "yellow", place = "topleft", reflow = FALSE,size=10)
# alternatively
titanic_df2=as.data.frame(table(titanic))
ggplot(titanic_df2, 
       aes(area = Freq,
           subgroup = Class,
           subgroup2 = Sex,
           subgroup3 = Age,
           subgroup4 = Survived,
           label=paste("Class",Class,Sex,Age,"Survived",Survived)))+
  geom_treemap()+
  geom_treemap_subgroup3_border(color="yellow",size=1)+
  geom_treemap_subgroup2_border(color="red",size=3)+
  geom_treemap_subgroup_border(color="blue",size=5)+
  geom_treemap_subgroup_text(place = "centre", grow = TRUE, alpha = 0.2, colour ="blue",fontface = "bold")+
  geom_treemap_subgroup2_text(place = "bottom", grow = FALSE, alpha = 0.2, colour ="red",fontface = "italic")+
  geom_treemap_subgroup3_text(place = "left", grow = FALSE, alpha = 0.2, colour ="yellow",fontface = "italic",size=20)+
  geom_treemap_text(colour = "white", place = "topleft", reflow = FALSE,size=10)

2.2 Sunburst Chart

A sunburst chart is typically used to visualize hierarchical data structures. A sunburst chart is also called wedge stack graph, radial hierarchy, circular bar plot, ring chart, multi-level pie chart, and radial treemap. The sunburst chart consists of an inner circle surrounded by rings of deeper hierarchy levels. The angle of each segment is either proportional to a value or divided equally under its parent node. All segments in sunburst charts may be colored according to which category or hierarchy level they belong to.

library(tidyverse)
library(readxl)
library(sunburstR)
library(RColorBrewer)

Data set: We visualize the data set on Indonesia’s mobile phone market sales in the first half of 2020 (https://www.kaggle.com/kurniakh/marketplace-data).

phone=read.csv("data/phone_sunburst.csv",header = TRUE)
phone_sun=phone[,c("PhoneModel","BrandCountry","Brand","Sold")]
phone_sun$Category=paste(phone$BrandCountry,phone$Brand,phone$PhoneModel,sep="-")
phone_sun=phone_sun[,c("Category","Sold")]
sund2b(phone_sun,
       color=colorRampPalette(brewer.pal(11,"Set3"))(35),              #add new colors
         showLabels = FALSE, 
         rootLabel = "Total Phone Sold in Indonesia 2020")
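As far as the sunburstR documentation describes, the first column is read as a path whose levels are separated by hyphens, which is why the code above pastes the three columns with sep = "-". A hyphen inside a label would then be read as an extra level. The sketch below uses base R only; the labels are taken from the printed phone data, and whether phone_sunburst.csv contains such hyphenated labels is an assumption. The helper clean() is hypothetical.

```r
# Labels taken from the printed `phone` rows (assumption: similar values
# may appear in phone_sunburst.csv)
path <- paste("not-china", "Samsung", "Samsung Galaxy A10", sep = "-")
pieces_raw <- strsplit(path, "-", fixed = TRUE)[[1]]
pieces_raw                 # "not" and "china" become two separate levels

# One possible workaround: replace hyphens inside labels before pasting
clean <- function(x) gsub("-", "_", x, fixed = TRUE)
safe_path <- paste(clean("not-china"), clean("Samsung"),
                   clean("Samsung Galaxy A10"), sep = "-")
pieces_safe <- strsplit(safe_path, "-", fixed = TRUE)[[1]]
pieces_safe                # three levels, as intended
```

Checking the label columns for the separator before pasting is a cheap way to avoid a silently distorted hierarchy.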

2.3 Ternary Plot

A ternary plot, ternary graph, triangle plot, simplex plot, Gibbs triangle or de Finetti diagram is a barycentric plot on three variables which sum to a constant. It graphically depicts the ratios of the three variables as positions in an equilateral triangle. It is used in physical chemistry, petrology, mineralogy, metallurgy, and other physical sciences to show the compositions of systems composed of three species. In population genetics, it is often called a de Finetti diagram. In game theory, it is often called a simplex plot. Ternary plots are tools for analyzing compositional data in the three-dimensional case.

library(ggtern)

We now use a simple example with two hand-picked compositions, (0.1, 0.3, 0.6) and (0.05, 0.03, 0.92), to show a ternary plot. The three lines mark the coordinates of the first point.

df=data.frame(prop1=0.1,prop2=0.3,prop3=0.6)
df2=data.frame(prop1=0.05,prop2=0.03,prop3=0.92)
df=rbind(df,df2)
g1=ggtern(data=df,mapping=aes(x=prop1,y=prop2,z=prop3))+
  geom_point(size=2)+
  geom_Tline(Tintercept=c(0.3))+
  geom_Lline(Lintercept=c(0.1))+
  geom_Rline(Rintercept=c(0.6))
g1
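The points plotted above already sum to 1. Raw three-part measurements usually do not, so they are first rescaled row by row. The following is a minimal base-R sketch of that step; the data frame raw holds hypothetical values chosen for illustration.

```r
# Hypothetical raw measurements of three parts (any positive units)
raw <- data.frame(part1 = c(2, 1, 5),
                  part2 = c(6, 1, 3),
                  part3 = c(12, 2, 2))

# Divide each row by its total so the three parts sum to 1
comp <- raw / rowSums(raw)
comp

# Every row is now a valid composition
rowSums(comp)
```

The first row becomes (0.1, 0.3, 0.6), the same composition as the first point in the example above.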

USDA textural classification chart

We use the “USDA” data set from the ggtern package. It contains the soil textural classification chart issued by the United States Department of Agriculture (USDA) in the form of a ternary diagram.

data("USDA", package = "ggtern")
# one label position per textural class: the mean of its polygon vertices
dfLabels <- plyr::ddply(USDA, "Label", function(df) {
  label <- as.character(df$Label[1])
  df$Angle <- switch(label, "Loamy Sand" = -35, 0)  # rotate only the "Loamy Sand" label
  colMeans(df[setdiff(colnames(df), "Label")])
})
f5a <- ggtern(data = USDA, mapping = aes(x = Sand, y = Clay, z = Silt)) +  # three axes: Sand, Clay, Silt
  geom_polygon(mapping = aes(fill = Label),
               alpha = 0.75, size = 0.5, color = "black") +  # one polygon per class, filled by Label, with black edges
  geom_text(data = dfLabels, mapping = aes(label = Label, angle = Angle),
            size = 2.5) +                  # class names, rotated by Angle
  theme_rgbw() +                           # ggtern theme: red, green and blue axes on a white background
  theme_showsecondary() +                  # show the secondary axis ticks
  theme_showarrows() +                     # show the axis arrows
  custom_percent("Percent") +              # add "Percent" to the axis labels
  guides(color = "none", fill = "none") +  # hide the legends
  labs(title = "USDA Textural Classification Chart",
       fill = "Textural Class",            # legend title for the fill color
       color = "Textural Class")           # legend title for the edge color

## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

print(f5a)

2.4 Pie Chart

We can use functions in the ggplot2 package to draw a pie chart. We use the variable “content_rating”, the film rating, from a data set of American movies.

library(ggsci)
cinema=read.csv("data/cinema.csv",header = T)
df <- as.data.frame(table(cinema$content_rating))
df # Generating the table of content_rating

##    Var1 Freq
## 1     G   88
## 2    OR 1136
## 3    PG  655
## 4 PG-13 1597
## 5     R 2138

df = df[order(df$Freq, decreasing = TRUE),]   # sort the ratings by frequency
Label = as.vector(df$Var1)
Label = paste(Label, "(", round(df$Freq / sum(df$Freq) * 100, 2), "%)        ", sep = "")   # rating plus its percentage
g1=ggplot(df, aes(x="", y=Freq, fill=Var1)) +   # slice size is given by Freq
  geom_bar(stat="identity",width=1) + 
  coord_polar("y")+  
  labs(x = "", y = "", title = "") +  
  scale_fill_lancet()+ 
  theme(axis.text = element_blank(),
        axis.ticks = element_blank(),
        legend.position = "right")+ 
  geom_text(aes(label = Label), position = position_stack(vjust = 0.5)) + # add the percentage labels
  ggtitle("IMDB MPAA Rating")

## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.

g2=ggplot(df,aes(x=Var1,y=Freq, fill=Var1))+
  geom_bar(stat="identity",width = 0.8)+
  scale_fill_lancet()
grid.arrange(g1,g2,ncol=2,widths = c(2, 1))

2.5 Donut Chart

library(sqldf)

Single Donut Chart

We use geom_rect() from the ggplot2 package to draw the chart as stacked rectangles, and coord_polar(theta = "y") then bends the rectangles into a ring. We can control the size of the hole by adjusting the limits of the x-axis.

The data set contains information on American universities.

m<-read.csv("data/college.csv")
summary(m[,c(5:8)])

##     region          highest_degree       control             gender         
##  Length:1269        Length:1269        Length:1269        Length:1269       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character

data1<-data.frame(category=c("Associate","Bachelor","Graduate"),#build the data frame
                  count=c(20,200,1049))
data1$fraction = data1$count / sum(data1$count)#calculate the percentage
data1$ymax = cumsum(data1$fraction)
data1$ymin = c(0, head(data1$ymax, n=-1))
data1$labelPosition <- (data1$ymax + data1$ymin) / 2 #position of the label 
data1$label <- paste0(data1$category, "\n ", data1$count,"\n","(",round(data1$fraction*100,1),"%)")
ggplot(data1, aes(ymax=ymax, ymin=ymin, xmax=4, xmin=3, fill=category)) +
  geom_rect() + #draw each category as a rectangle
  geom_text( x=1.8, aes(y=labelPosition, label=label, color=category), size=3.5) +
  scale_fill_brewer(palette=10) +#fill colors of the ring
  scale_color_brewer(palette=5) +#colors of the labels
  coord_polar(theta="y") +#bend the rectangles into a ring
  xlim(c(-1, 4)) +#leave empty space in the center of the circle
  labs(title = "Distribution of the highest degree in universities (The count and percentage)")+
  theme_void() +
  theme(legend.position = "none")

## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.
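The rectangle limits used for the single donut chart come from a cumulative sum of the fractions. A minimal sketch of that arithmetic with the three counts from the example (base R only):

```r
# Counts from the example above
count <- c(Associate = 20, Bachelor = 200, Graduate = 1049)

fraction <- count / sum(count)       # share of each category
ymax <- cumsum(fraction)             # where each slice ends
ymin <- c(0, head(ymax, n = -1))     # where each slice starts

round(cbind(fraction, ymin, ymax), 4)
```

Each slice starts where the previous one ends, and the last slice ends at 1, so the rectangles cover the full circle once coord_polar() is applied.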

Multiple Donut Chart

Now we want to show the distribution of private and public universities in different regions.

data2<-sqldf("select region,control,count(control) as control_count from m group by region,control")
data3<-sqldf("select region,count(control) as control_sum from m group by region")
data4<-merge(data2,data3,by= "region")
data4$control_percent<-data4$control_count/data4$control_sum
data4$label1<-c("Midwest"," ","Northeast"," ","South"," ","West"," ")
data4$label2<-paste(data4$control,":\n",data4$control_count,"\n(",round(data4$control_percent*100,1),"%)")

Show the data frame

head(data4)

##      region control control_count control_sum control_percent    label1
## 1   Midwest Private           246         353       0.6968839   Midwest
## 2   Midwest  Public           107         353       0.3031161          
## 3 Northeast Private           185         299       0.6187291 Northeast
## 4 Northeast  Public           114         299       0.3812709          
## 5     South Private           250         459       0.5446623     South
## 6     South  Public           209         459       0.4553377          
##                        label2
## 1 Private :\n 246 \n( 69.7 %)
## 2  Public :\n 107 \n( 30.3 %)
## 3 Private :\n 185 \n( 61.9 %)
## 4  Public :\n 114 \n( 38.1 %)
## 5 Private :\n 250 \n( 54.5 %)
## 6  Public :\n 209 \n( 45.5 %)
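The same within-region shares can be computed without SQL. The sketch below is one base-R alternative using ave(); it rebuilds the six printed rows by hand instead of reading data/college.csv, and the name counts is illustrative only.

```r
# The six rows printed above, typed in by hand
counts <- data.frame(
  region        = rep(c("Midwest", "Northeast", "South"), each = 2),
  control       = rep(c("Private", "Public"), times = 3),
  control_count = c(246, 107, 185, 114, 250, 209)
)

# ave() returns the group total for every row of that group
counts$control_sum <- ave(counts$control_count, counts$region, FUN = sum)

# Share of each control type within its region
counts$control_percent <- counts$control_count / counts$control_sum
counts
```

This avoids the separate aggregate-and-merge step, since ave() already aligns each region total with its rows.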

ggplot(data4, aes(x = region, y = control_percent, fill = control)) +
  geom_col() + # stacked bars, one per region; each becomes a ring
  geom_text(aes(label=label2),size=2.2,position=position_stack(vjust = 0.5))+
  geom_text(aes(label=label1),size=5,position = position_fill())+
  scale_x_discrete(limits = c(" "," "," ", "Midwest","Northeast","South","West")) +  #cut the center of the circle
  scale_fill_brewer(palette=3) +
  coord_polar("y")+                                                                  #change to circle
  labs(title = "Distribution of Private and Public Universities")+
  theme_void()+
  theme(legend.position = "none")

## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.

3 Visualization of Change

3.1 Slope Graph

A slope graph is a lot like a line graph, but it plots only the change between two time points, without any regard for the points in between. It is based on the idea that humans are good at interpreting changes in directions, i.e., slopes.

Package

library(tidyverse)
library(ggrepel)
library(RColorBrewer)

We use a data set from Fortune magazine that gives the revenue of the top 10 Fortune 500 companies in 2018 and 2019.

df <- read.csv("data/top10.csv",header = TRUE)
df$Name <- factor(df$Name)
df$Revenue <- df$Revenue/1000   # convert from millions to billions of dollars

q <- ggplot(df,
            aes(x = Year,
                y = Revenue,
                color = Name,
                group = Name)) +  
  geom_line(size=1) +
  geom_point(size=2) + 
  scale_x_continuous(breaks = c(2018, 2019),
                     labels = c(2018, 2019),
                     position = "top") +
 labs(title= "Revenue of the world's top 10 companies\n(in Billion Dollars)") +
  theme(aspect.ratio = 2)
q

Next, we label each line with the company name.

q <- ggplot(df,
            aes(x=Year,
                y=Revenue,
                group=Name,
                color=Name)) +  
  geom_line(size=1) +
  geom_point(size=2) +
  labs(title= "2018-2019 Revenue of\nworld's top 10 companies",
       subtitle="(in Billions)") +
  scale_x_continuous(name = "",
                     position = "top",
                     breaks = c(2018, 2019),
                     labels = c("2018", "2019"),
                     limits = c(2018, 2022)) +
  scale_y_continuous(breaks = seq(200, 600, 100),
                     labels = format(seq(200, 600, 100), scientific = FALSE),
                     limits = c(200, 600)) +
  theme(legend.position = "none",
        aspect.ratio = 1.25,
        panel.background = element_rect(fill = "white")) +
  geom_text_repel(data = df %>% filter(Year == "2019"), 
                  aes(label = Name) , 
                  hjust = "left", 
                  fontface = "bold", 
                  size = 2.5, 
                  nudge_x = 0.3, 
                  direction = "y") +
   scale_color_viridis_d()
q

3.2 Arrow Plot

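An example of this chart type is the New York Times graphic “Where the 1 Percent Have Gained the Most”. In the plot below, each arrow runs from a company's 2018 revenue to its 2019 revenue.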

df %>% 
  mutate(Name = fct_reorder(Name, Revenue, min)) %>%
  arrange(Year) %>%
  ggplot() +
  geom_path(aes(x = Revenue, y = Name),
            arrow = arrow(length=unit(0.2,"cm"), type = "closed")) +
  geom_text(aes(x = Revenue, y = Name, label = round(Revenue),
                hjust = ifelse(Year == 2018, 1.4, -0.4))) +
  geom_text(data = df %>% group_by(Name) %>% summarize(ave_Revenue = mean(Revenue)),
            aes(x = ave_Revenue, y = Name, label = Name),
            vjust = 2,
            size = 2) +
  coord_cartesian(xlim = c(200, 600)) +
  scale_x_continuous(breaks = seq(200, 600, 100),
                     labels = format(seq(200, 600, 100), scientific = FALSE))

## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.

3.3 Nightingale Rose Chart/Radial Column Chart/Radial Barplot

In 1858, nurse, statistician, and reformer Florence Nightingale published Notes on Matters Affecting the Health, Efficiency, and Hospital Administration of the British Army. Founded Chiefly on the Experience of the Late War. Presented by Request to the Secretary of State for War. This privately printed work contained a color statistical graphic entitled “Diagram of the Causes of Mortality in the Army of the East”, which showed that epidemic disease, which was responsible for more British deaths in the course of the Crimean War than battlefield wounds, could be controlled by a variety of factors including nutrition, ventilation, and shelter. The graphic, which Nightingale used to explain complex statistics simply, clearly, and persuasively, has become known as Nightingale’s rose chart.

Dataset

Periodic data

The circular layout of Nightingale’s rose chart is well suited to data that cover a recurring period, such as the months of a year. The data set we use is the number of airport passengers in Taiwan for each month of 2018.

tw=read.csv("data/taiwanairport2018.csv",header = T)
month <- factor(tw$month)
twrose=ggplot(tw, aes(x=reorder(month, monthnum), y=passengers,fill = passengers)) +  
  geom_col(width = 1, color = 'white') +  
  scale_fill_gradientn(colors = c("yellow","orange","red")) +
  coord_polar()  +
  theme_bw()+  
  theme(
    panel.grid = element_blank(),
    panel.border= element_blank(),
    axis.text.y = element_blank(),
    axis.text.x = element_blank(),
    axis.ticks = element_blank(),
    axis.title = element_blank() 
  )   +
  labs(
    title= "The number of passengers in Taiwan's airport for each month in 2018"
  )+
  guides(fill="none")+  
 geom_text(aes(label = paste(month, passengers)),color = "dark blue",vjust = "left", hjust = "outward",fontface="bold",  size = 3.5)

## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.

twrose

A Radial column Chart is simply a Bar Chart plotted on a polar coordinate system, rather than on a Cartesian one.

Package

library(tidyverse)
library(ggthemes)

Dataset

We use the file 2015.csv, which records the number of cardiovascular patients in Guilin city in 2015.

gui<-read.csv("data/2015.csv")
data1<-aggregate(gui[,(12:14)],list(gui$month),sum)
data1<-data1 %>% gather("gender","value",-c(1,2)) 
head(data1)

##   Group.1  cvd gender value
## 1       1 2399   cvdM  1391
## 2       2 2130   cvdM  1232
## 3       3 2873   cvdM  1651
## 4       4 2414   cvdM  1437
## 5       5 2522   cvdM  1436
## 6       6 2102   cvdM  1250

Draw the Radial Column Chart

ggplot(data1,aes(as.factor(Group.1),value,fill=as.factor(gender)))+
  geom_col(color="black",position=position_dodge(),width=0.5,size=0.25)+  # draw the bar plot
  coord_polar()+                                                             #change to polar system
  scale_x_discrete(limits = c(as.factor(1:12)), # the range of x
                   labels=c("Jan.","Feb.","Mar.","Apr.","May","Jun.",    # label of x
                            "Jul.","Aug.","Sep.","Oct.","Nov","Dec.")) +
  #scale_y_continuous(breaks=c(-800,0,500,1000,1500,2000),labels=c("","0","500","1000","1500","2000"))+ 
  ylim(c(-800,2000))+                                                        #y range; the negative lower limit leaves the center empty
  geom_segment(aes(x=1,y=0,xend=12,yend=0),colour="black")+
  guides(fill=guide_legend(title = NULL)) +          #remove the legend title
  labs(title = "The number of cardiovascular patients in Guilin city of 2015")+   #add the plot title
  scale_fill_discrete(labels=c("Female","Male"))+  
  theme(axis.text.x = element_text(size = 10),      
        axis.title = element_blank(),              
        axis.line = element_blank(),                
        legend.text = element_text(size = 8),       #adjust the size of the legend text 
        legend.key.size = unit(3.6,'mm'),           #adjust the size of the legend keys 
        legend.position = c(0.5,0.5),               #place the legend at the center of the plot
        plot.title = element_text(size = 12.5),     #adjust the size of the title text
        #panel.grid.major.y = element_blank(),       #remove the grid lines on the y-axis
        panel.grid.major.x = element_line(size = 0.7,linetype = "dotted")
        )

## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.

## Warning: The `size` argument of `element_line()` is deprecated as of ggplot2 3.4.0.
## ℹ Please use the `linewidth` argument instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

4 Others

4.1 Cleveland Dot Plot/Lollipop Chart

A Cleveland dot plot, proposed by William S. Cleveland and Robert McGill (https://www.jstor.org/stable/2288400), is a great alternative to a simple bar chart, particularly if you have more than a few items, in which case a bar chart can easily look cluttered. In the same amount of space, many more values can be included in a Cleveland dot plot, and it remains easy to read. A Cleveland dot plot typically plots a categorical variable against a numeric variable.

Note that even though a bar plot can visualize whatever a Cleveland dot plot visualizes, the bar plot often uses more data-ink to show the same values. The Cleveland dot plot is therefore often the more efficient choice.

mtcars_revised <- mtcars %>% 
                  mutate(name = row.names(mtcars)) %>%   # attach the car names before sorting
                  arrange(mpg) %>% 
                  mutate(name = factor(name, levels = name))
ggplot(mtcars_revised, aes(x = mpg, y = reorder(name,-mpg), label = mpg) ) +
  geom_point() +
  theme_bw() +
  theme(
    panel.grid.major.x = element_blank(),
    panel.grid.minor.x = element_blank(),
    panel.grid.major.y = element_line(colour = "grey60", linetype = "dashed")
  )
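When labels live in row names, they have to be reordered together with the values; otherwise each value ends up next to the wrong car. A base-R sketch of building the sorted label and value table (the names ord and cars_sorted are illustrative only):

```r
# Row positions from the lowest to the highest mpg
ord <- order(mtcars$mpg)

# Apply the same index to the labels and to the values
cars_sorted <- data.frame(
  name = rownames(mtcars)[ord],
  mpg  = mtcars$mpg[ord]
)

head(cars_sorted, 3)   # least efficient cars
tail(cars_sorted, 1)   # most efficient car

# Sanity check: every label still points at its own mpg value
all(cars_sorted$mpg == mtcars[cars_sorted$name, "mpg"])
```

A check like the last line is worth keeping in an analysis script, because a label mismatch produces a plot that looks plausible but is wrong.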

A lollipop chart is a simple modification of the Cleveland dot plot. In addition to the dots, a lollipop chart contains lines that tie each category to its dot, forming lollipops. A lollipop chart is great for comparing multiple categories, as it helps the reader align categories with points while keeping the amount of ink on the graphic small.

ggplot(mtcars_revised, aes(x = mpg, y = reorder(name,-mpg), label = mpg) ) +
  geom_point() +
  geom_text(nudge_x = 1.5) +
  geom_segment(aes(x = 0, xend = mpg, 
                   y = name, yend = name), color = "grey50") +
  theme_bw()

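There are many extensions of the Cleveland dot plot and the lollipop chart. For example, the plot below connects each model's average city mileage (red) and average highway mileage (blue) with a line segment.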

mpg_revised <- mpg %>% mutate(brandmodel=paste(manufacturer, model)) %>%
                  group_by(brandmodel) %>% 
                  summarize(avg_hwy=mean(hwy, na.rm=TRUE),
                            avg_cty=mean(cty, na.rm=TRUE))
ggplot(mpg_revised) +
  geom_point(aes(avg_hwy, brandmodel),col="blue") +
  geom_point(aes(avg_cty, brandmodel),col="red") +
  geom_segment(aes(x = avg_cty, xend = avg_hwy, 
                   y = brandmodel, yend = brandmodel), color = "grey50")+
  geom_text(aes(x = avg_cty, y=brandmodel, label = round(avg_cty, 1)), size = 3, hjust = 1.5) +
  geom_text(aes(x = avg_hwy, y=brandmodel, label = round(avg_hwy, 1)), size = 3, hjust = -.5)

4.2 Waterfall Plot

The waterfalls package is based on ggplot2. We use its function waterfall() to draw the plot.

We use the data set miga.csv, which provides a summary income statement compiled from quarterly statements.

Data source: MIGA Summary Income Statement from World Bank Financial Open Data

library(waterfalls)

miga<-read.csv("data/miga.csv")
miga$Item<-as.character(miga$Item)
# pass values and labels only; if .data is supplied as well, waterfall() ignores it and warns
waterfall(values =miga$Amount,labels =miga$Item,
          calc_total = TRUE, 
          total_rect_color = "steelblue2",
          total_axis_text = "summary income",
          rect_border = "white",
          fill_by_sign =TRUE)+
  coord_flip()

Income: Investment Income; Net Premium Income

Expenses: Decrease in Reserves; Administrative Expenses
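Each bar in a waterfall plot starts at the running total before the item and ends at the running total after it. The sketch below shows that arithmetic in base R. The amounts are hypothetical and are not taken from miga.csv; only the item names follow the chapter.

```r
# Item names follow the chapter; the amounts are HYPOTHETICAL
items  <- c("Investment Income", "Net Premium Income",
            "Decrease in Reserves", "Administrative Expenses")
amount <- c(40, 60, -15, -50)

bar_end   <- cumsum(amount)                # running total after each item
bar_start <- c(0, head(bar_end, n = -1))   # running total before each item

data.frame(items, amount, bar_start, bar_end)

# The final bar (calc_total = TRUE in waterfall()) shows the overall sum
total <- tail(bar_end, 1)
total
```

Positive amounts push the running total up and negative amounts pull it down, which is what fill_by_sign = TRUE distinguishes by color.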

Ch7P2 Visualization Types - Part II

Descriptive Analytics and Data Visualization

Yichen Qin (qinyn@ucmail.uc.edu), University of Cincinnati

2024-12-20
