In this chapter, we continue to discuss the commonly used visualization types. The following R packages are required to run the examples in this chapter.
library(tidyverse)
library(ggforce)
library(readxl)
library(sunburstR)
library(RColorBrewer)
library(grid)
library(gridExtra)
library(graphics)
library(vcd)
library(ggrepel)
library(ggsci)
library(ggtern)
library(sqldf)
library(waterfalls)
A mosaic plot (also known as a Marimekko diagram) is a graphical method for visualizing data from two or more qualitative variables. It is the multidimensional extension of spineplots, which graphically display the same information for only one variable. It gives an overview of the data and makes it possible to recognize relationships between different variables.
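The tile sizes in a two-variable mosaic plot can be sketched numerically. The snippet below is only an illustration using the `Titanic` table that ships with base R (an assumption; the chapter reads its own CSV file), and the names `tab`, `widths`, and `heights` are ours. Column widths follow the marginal shares of the first variable, and tile heights follow the conditional shares of the second variable within each column.

```r
# Built-in 4-way contingency table (Class x Sex x Age x Survived), datasets package
tab <- margin.table(Titanic, c(1, 2))        # collapse to Class x Sex counts

# Column widths: share of each class among everyone on board
widths <- prop.table(margin.table(tab, 1))

# Tile heights within a column: share of each sex within that class
heights <- prop.table(tab, margin = 1)

round(widths, 3)
round(heights, 3)
```

Each tile's area is then the product of its column width and its height, which equals the share of that Class and Sex combination in the whole table.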
Dataset
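We are using the Titanic dataset, read here from a local CSV file; the same counts ship with base R as the Titanic table in the datasets package. This dataset records the economic status (class), sex, age, and survival status of the 2201 passengers and crew on the Titanic’s maiden voyage, survivors as well as those who died.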
Package
There are several ways to draw a mosaic plot in R: geom_rect() in ggplot2, geom_mosaic() in ggmosaic, mosaicplot() in graphics, or mosaic() in vcd.
library(tidyverse)
library(graphics)
library(vcd)
Example using the graphics package
titanic=read.csv("data/titanic.csv")
dim(titanic)
## [1] 2201 4
titanic[seq(1,dim(titanic)[1],by=50),]
## Class Sex Age Survived
## 1 3rd Male Child No
## 51 3rd Female Child No
## 101 1st Male Adult No
## 151 1st Male Adult No
## 201 2nd Male Adult No
## 251 2nd Male Adult No
## 301 2nd Male Adult No
## 351 3rd Male Adult No
## 401 3rd Male Adult No
## 451 3rd Male Adult No
## 501 3rd Male Adult No
## 551 3rd Male Adult No
## 601 3rd Male Adult No
## 651 3rd Male Adult No
## 701 3rd Male Adult No
## 751 Crew Male Adult No
## 801 Crew Male Adult No
## 851 Crew Male Adult No
## 901 Crew Male Adult No
## 951 Crew Male Adult No
## 1001 Crew Male Adult No
## 1051 Crew Male Adult No
## 1101 Crew Male Adult No
## 1151 Crew Male Adult No
## 1201 Crew Male Adult No
## 1251 Crew Male Adult No
## 1301 Crew Male Adult No
## 1351 Crew Male Adult No
## 1401 3rd Female Adult No
## 1451 3rd Female Adult No
## 1501 2nd Male Child Yes
## 1551 1st Male Adult Yes
## 1601 1st Male Adult Yes
## 1651 3rd Male Adult Yes
## 1701 Crew Male Adult Yes
## 1751 Crew Male Adult Yes
## 1801 Crew Male Adult Yes
## 1851 Crew Male Adult Yes
## 1901 1st Female Adult Yes
## 1951 1st Female Adult Yes
## 2001 1st Female Adult Yes
## 2051 2nd Female Adult Yes
## 2101 2nd Female Adult Yes
## 2151 3rd Female Adult Yes
## 2201 Crew Female Adult Yes
titanic_tab=table(titanic)
titanic_tab
## , , Age = Adult, Survived = No
##
## Sex
## Class Female Male
## 1st 4 118
## 2nd 13 154
## 3rd 89 387
## Crew 3 670
##
## , , Age = Child, Survived = No
##
## Sex
## Class Female Male
## 1st 0 0
## 2nd 0 0
## 3rd 17 35
## Crew 0 0
##
## , , Age = Adult, Survived = Yes
##
## Sex
## Class Female Male
## 1st 140 57
## 2nd 80 14
## 3rd 76 75
## Crew 20 192
##
## , , Age = Child, Survived = Yes
##
## Sex
## Class Female Male
## 1st 1 5
## 2nd 13 11
## 3rd 14 13
## Crew 0 0
mosaicplot(~ Class + Sex , data = titanic,
main = "Survival on the Titanic", color = TRUE)
mosaicplot(~ Class + Sex + Age , data = titanic,
main = "Survival on the Titanic", color = TRUE)
mosaicplot(~ Class + Sex + Age + Survived, data = titanic,
main = "Survival on the Titanic", color = TRUE)
#mosaicplot(~ Class + Sex + Survived, data = titanic)
#mosaicplot(~ Sex + Class + Survived, data = titanic)
The vcd package provides another function for mosaic plots, mosaic().
#vcd package
#data("Titanic")
#head(Titanic) # the same as titanic_tab
#mosaic(Titanic)
mosaic(~ Sex + Age + Survived + Class, data = titanic,
main = "Survival on the Titanic", shade = TRUE, legend = TRUE)
#assoc(Titanic, shade=TRUE, legend=TRUE)
The plots suggest that women and children were given priority in this incident: survival rates are visibly higher for women and children than for adult men, and higher in first class than in third class or among the crew.
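The reading of the mosaic plots can be checked against the raw counts. The sketch below assumes the built-in `Titanic` table from the datasets package, whose dimensions are ordered Class, Sex, Age, Survived; the helper `survival_rate()` is a hypothetical name introduced here for illustration.

```r
# Survival rate = survivors / total, by the levels of one variable
# margin: 1 = Class, 2 = Sex, 3 = Age (dimension order of the Titanic table)
survival_rate <- function(margin) {
  counts <- margin.table(Titanic, c(margin, 4))  # chosen variable x Survived
  prop.table(counts, 1)[, "Yes"]                 # row-wise share of "Yes"
}

round(survival_rate(2), 3)  # by Sex
round(survival_rate(3), 3)  # by Age
round(survival_rate(1), 3)  # by Class
```

Comparing the three vectors side by side is a quick way to see which variable separates survivors from non-survivors most strongly.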
Another example using the graphics package.
data001 <-read.csv("data/cinema.csv")
# mosaicplot() draws with base graphics, so a ggplot2 theme() cannot be added to it;
# las = 2 rotates the axis labels instead
mosaicplot(~ year + release_date, data =data001,
shade = TRUE, color = TRUE, main ="cinema", las = 2)
A parallel sets plot is a method for the visualization and interactive exploration of categorical data that shows data frequencies instead of the individual data points. The method is based on the axis layout of parallel coordinates, with boxes representing the categories and parallelograms between the axes showing the relations between categories.
We are using the same Titanic data set as in the mosaic plot section (also available in base R as the Titanic table in the datasets package). This data set provides information about the people on the Titanic’s maiden voyage, including their ticket class (economic status), gender, age, and survival status. Note that we have to reorganize the data set before using the parallel sets plot.
library(ggforce) # Package
titanic_tab=table(titanic)
titanic_tab
## , , Age = Adult, Survived = No
##
## Sex
## Class Female Male
## 1st 4 118
## 2nd 13 154
## 3rd 89 387
## Crew 3 670
##
## , , Age = Child, Survived = No
##
## Sex
## Class Female Male
## 1st 0 0
## 2nd 0 0
## 3rd 17 35
## Crew 0 0
##
## , , Age = Adult, Survived = Yes
##
## Sex
## Class Female Male
## 1st 140 57
## 2nd 80 14
## 3rd 76 75
## Crew 20 192
##
## , , Age = Child, Survived = Yes
##
## Sex
## Class Female Male
## 1st 1 5
## 2nd 13 11
## 3rd 14 13
## Crew 0 0
titanic_freq <- reshape2::melt(titanic_tab)
titanic_freq
## Class Sex Age Survived value
## 1 1st Female Adult No 4
## 2 2nd Female Adult No 13
## 3 3rd Female Adult No 89
## 4 Crew Female Adult No 3
## 5 1st Male Adult No 118
## 6 2nd Male Adult No 154
## 7 3rd Male Adult No 387
## 8 Crew Male Adult No 670
## 9 1st Female Child No 0
## 10 2nd Female Child No 0
## 11 3rd Female Child No 17
## 12 Crew Female Child No 0
## 13 1st Male Child No 0
## 14 2nd Male Child No 0
## 15 3rd Male Child No 35
## 16 Crew Male Child No 0
## 17 1st Female Adult Yes 140
## 18 2nd Female Adult Yes 80
## 19 3rd Female Adult Yes 76
## 20 Crew Female Adult Yes 20
## 21 1st Male Adult Yes 57
## 22 2nd Male Adult Yes 14
## 23 3rd Male Adult Yes 75
## 24 Crew Male Adult Yes 192
## 25 1st Female Child Yes 1
## 26 2nd Female Child Yes 13
## 27 3rd Female Child Yes 14
## 28 Crew Female Child Yes 0
## 29 1st Male Child Yes 5
## 30 2nd Male Child Yes 11
## 31 3rd Male Child Yes 13
## 32 Crew Male Child Yes 0
parallel_data14 <- gather_set_data(titanic_freq, c(1,4))
parallel_data14
## Class Sex Age Survived value id x y
## 1 1st Female Adult No 4 1 1 1st
## 2 2nd Female Adult No 13 2 1 2nd
## 3 3rd Female Adult No 89 3 1 3rd
## 4 Crew Female Adult No 3 4 1 Crew
## 5 1st Male Adult No 118 5 1 1st
## 6 2nd Male Adult No 154 6 1 2nd
## 7 3rd Male Adult No 387 7 1 3rd
## 8 Crew Male Adult No 670 8 1 Crew
## 9 1st Female Child No 0 9 1 1st
## 10 2nd Female Child No 0 10 1 2nd
## 11 3rd Female Child No 17 11 1 3rd
## 12 Crew Female Child No 0 12 1 Crew
## 13 1st Male Child No 0 13 1 1st
## 14 2nd Male Child No 0 14 1 2nd
## 15 3rd Male Child No 35 15 1 3rd
## 16 Crew Male Child No 0 16 1 Crew
## 17 1st Female Adult Yes 140 17 1 1st
## 18 2nd Female Adult Yes 80 18 1 2nd
## 19 3rd Female Adult Yes 76 19 1 3rd
## 20 Crew Female Adult Yes 20 20 1 Crew
## 21 1st Male Adult Yes 57 21 1 1st
## 22 2nd Male Adult Yes 14 22 1 2nd
## 23 3rd Male Adult Yes 75 23 1 3rd
## 24 Crew Male Adult Yes 192 24 1 Crew
## 25 1st Female Child Yes 1 25 1 1st
## 26 2nd Female Child Yes 13 26 1 2nd
## 27 3rd Female Child Yes 14 27 1 3rd
## 28 Crew Female Child Yes 0 28 1 Crew
## 29 1st Male Child Yes 5 29 1 1st
## 30 2nd Male Child Yes 11 30 1 2nd
## 31 3rd Male Child Yes 13 31 1 3rd
## 32 Crew Male Child Yes 0 32 1 Crew
## 33 1st Female Adult No 4 1 4 No
## 34 2nd Female Adult No 13 2 4 No
## 35 3rd Female Adult No 89 3 4 No
## 36 Crew Female Adult No 3 4 4 No
## 37 1st Male Adult No 118 5 4 No
## 38 2nd Male Adult No 154 6 4 No
## 39 3rd Male Adult No 387 7 4 No
## 40 Crew Male Adult No 670 8 4 No
## 41 1st Female Child No 0 9 4 No
## 42 2nd Female Child No 0 10 4 No
## 43 3rd Female Child No 17 11 4 No
## 44 Crew Female Child No 0 12 4 No
## 45 1st Male Child No 0 13 4 No
## 46 2nd Male Child No 0 14 4 No
## 47 3rd Male Child No 35 15 4 No
## 48 Crew Male Child No 0 16 4 No
## 49 1st Female Adult Yes 140 17 4 Yes
## 50 2nd Female Adult Yes 80 18 4 Yes
## 51 3rd Female Adult Yes 76 19 4 Yes
## 52 Crew Female Adult Yes 20 20 4 Yes
## 53 1st Male Adult Yes 57 21 4 Yes
## 54 2nd Male Adult Yes 14 22 4 Yes
## 55 3rd Male Adult Yes 75 23 4 Yes
## 56 Crew Male Adult Yes 192 24 4 Yes
## 57 1st Female Child Yes 1 25 4 Yes
## 58 2nd Female Child Yes 13 26 4 Yes
## 59 3rd Female Child Yes 14 27 4 Yes
## 60 Crew Female Child Yes 0 28 4 Yes
## 61 1st Male Child Yes 5 29 4 Yes
## 62 2nd Male Child Yes 11 30 4 Yes
## 63 3rd Male Child Yes 13 31 4 Yes
## 64 Crew Male Child Yes 0 32 4 Yes
# In the output above, x holds column positions (1 and 4) rather than names, so map positions to names
g1=ggplot(parallel_data14, aes(x=factor(x, levels = 1:4, labels = c("Class", "Sex","Age","Survived")), id = id, split = y, value = value)) +
xlab("Covariates")+
geom_parallel_sets(aes(fill = Survived),alpha = 0.3, axis.width = 0.2) +
geom_parallel_sets_axes(axis.width = 0.2) +
geom_parallel_sets_labels(color = 'white',size=3)
parallel_data124 <- gather_set_data(titanic_freq, c(1,2,4))
parallel_data124
## Class Sex Age Survived value id x y
## 1 1st Female Adult No 4 1 1 1st
## 2 2nd Female Adult No 13 2 1 2nd
## 3 3rd Female Adult No 89 3 1 3rd
## 4 Crew Female Adult No 3 4 1 Crew
## 5 1st Male Adult No 118 5 1 1st
## 6 2nd Male Adult No 154 6 1 2nd
## 7 3rd Male Adult No 387 7 1 3rd
## 8 Crew Male Adult No 670 8 1 Crew
## 9 1st Female Child No 0 9 1 1st
## 10 2nd Female Child No 0 10 1 2nd
## 11 3rd Female Child No 17 11 1 3rd
## 12 Crew Female Child No 0 12 1 Crew
## 13 1st Male Child No 0 13 1 1st
## 14 2nd Male Child No 0 14 1 2nd
## 15 3rd Male Child No 35 15 1 3rd
## 16 Crew Male Child No 0 16 1 Crew
## 17 1st Female Adult Yes 140 17 1 1st
## 18 2nd Female Adult Yes 80 18 1 2nd
## 19 3rd Female Adult Yes 76 19 1 3rd
## 20 Crew Female Adult Yes 20 20 1 Crew
## 21 1st Male Adult Yes 57 21 1 1st
## 22 2nd Male Adult Yes 14 22 1 2nd
## 23 3rd Male Adult Yes 75 23 1 3rd
## 24 Crew Male Adult Yes 192 24 1 Crew
## 25 1st Female Child Yes 1 25 1 1st
## 26 2nd Female Child Yes 13 26 1 2nd
## 27 3rd Female Child Yes 14 27 1 3rd
## 28 Crew Female Child Yes 0 28 1 Crew
## 29 1st Male Child Yes 5 29 1 1st
## 30 2nd Male Child Yes 11 30 1 2nd
## 31 3rd Male Child Yes 13 31 1 3rd
## 32 Crew Male Child Yes 0 32 1 Crew
## 33 1st Female Adult No 4 1 2 Female
## 34 2nd Female Adult No 13 2 2 Female
## 35 3rd Female Adult No 89 3 2 Female
## 36 Crew Female Adult No 3 4 2 Female
## 37 1st Male Adult No 118 5 2 Male
## 38 2nd Male Adult No 154 6 2 Male
## 39 3rd Male Adult No 387 7 2 Male
## 40 Crew Male Adult No 670 8 2 Male
## 41 1st Female Child No 0 9 2 Female
## 42 2nd Female Child No 0 10 2 Female
## 43 3rd Female Child No 17 11 2 Female
## 44 Crew Female Child No 0 12 2 Female
## 45 1st Male Child No 0 13 2 Male
## 46 2nd Male Child No 0 14 2 Male
## 47 3rd Male Child No 35 15 2 Male
## 48 Crew Male Child No 0 16 2 Male
## 49 1st Female Adult Yes 140 17 2 Female
## 50 2nd Female Adult Yes 80 18 2 Female
## 51 3rd Female Adult Yes 76 19 2 Female
## 52 Crew Female Adult Yes 20 20 2 Female
## 53 1st Male Adult Yes 57 21 2 Male
## 54 2nd Male Adult Yes 14 22 2 Male
## 55 3rd Male Adult Yes 75 23 2 Male
## 56 Crew Male Adult Yes 192 24 2 Male
## 57 1st Female Child Yes 1 25 2 Female
## 58 2nd Female Child Yes 13 26 2 Female
## 59 3rd Female Child Yes 14 27 2 Female
## 60 Crew Female Child Yes 0 28 2 Female
## 61 1st Male Child Yes 5 29 2 Male
## 62 2nd Male Child Yes 11 30 2 Male
## 63 3rd Male Child Yes 13 31 2 Male
## 64 Crew Male Child Yes 0 32 2 Male
## 65 1st Female Adult No 4 1 4 No
## 66 2nd Female Adult No 13 2 4 No
## 67 3rd Female Adult No 89 3 4 No
## 68 Crew Female Adult No 3 4 4 No
## 69 1st Male Adult No 118 5 4 No
## 70 2nd Male Adult No 154 6 4 No
## 71 3rd Male Adult No 387 7 4 No
## 72 Crew Male Adult No 670 8 4 No
## 73 1st Female Child No 0 9 4 No
## 74 2nd Female Child No 0 10 4 No
## 75 3rd Female Child No 17 11 4 No
## 76 Crew Female Child No 0 12 4 No
## 77 1st Male Child No 0 13 4 No
## 78 2nd Male Child No 0 14 4 No
## 79 3rd Male Child No 35 15 4 No
## 80 Crew Male Child No 0 16 4 No
## 81 1st Female Adult Yes 140 17 4 Yes
## 82 2nd Female Adult Yes 80 18 4 Yes
## 83 3rd Female Adult Yes 76 19 4 Yes
## 84 Crew Female Adult Yes 20 20 4 Yes
## 85 1st Male Adult Yes 57 21 4 Yes
## 86 2nd Male Adult Yes 14 22 4 Yes
## 87 3rd Male Adult Yes 75 23 4 Yes
## 88 Crew Male Adult Yes 192 24 4 Yes
## 89 1st Female Child Yes 1 25 4 Yes
## 90 2nd Female Child Yes 13 26 4 Yes
## 91 3rd Female Child Yes 14 27 4 Yes
## 92 Crew Female Child Yes 0 28 4 Yes
## 93 1st Male Child Yes 5 29 4 Yes
## 94 2nd Male Child Yes 11 30 4 Yes
## 95 3rd Male Child Yes 13 31 4 Yes
## 96 Crew Male Child Yes 0 32 4 Yes
# In the output above, x holds column positions (1, 2, and 4) rather than names, so map positions to names
g2=ggplot(parallel_data124, aes(x=factor(x, levels = 1:4, labels = c("Class", "Sex","Age","Survived")), id = id, split = y, value = value)) +
xlab("Covariates") +
geom_parallel_sets(aes(fill = Survived), alpha = 0.3, axis.width = 0.2) +
geom_parallel_sets_axes(axis.width = 0.2) +
geom_parallel_sets_labels(colour = 'white',size=3)
#grid.arrange(g1,g2,ncol=2)
data <- reshape2::melt(Titanic)
data <- gather_set_data(data, 1:4)
ggplot(data, aes(x, id = id, split = y, value = value)) +
geom_parallel_sets(aes(fill = Sex), alpha = 0.3, axis.width = 0.1) +
geom_parallel_sets_axes(axis.width = 0.1) +
geom_parallel_sets_labels(colour = 'white')
parallel_data1234 <- gather_set_data(titanic_freq, c(1,2,3,4))
parallel_data1234
## Class Sex Age Survived value id x y
## 1 1st Female Adult No 4 1 1 1st
## 2 2nd Female Adult No 13 2 1 2nd
## 3 3rd Female Adult No 89 3 1 3rd
## 4 Crew Female Adult No 3 4 1 Crew
## 5 1st Male Adult No 118 5 1 1st
## 6 2nd Male Adult No 154 6 1 2nd
## 7 3rd Male Adult No 387 7 1 3rd
## 8 Crew Male Adult No 670 8 1 Crew
## 9 1st Female Child No 0 9 1 1st
## 10 2nd Female Child No 0 10 1 2nd
## 11 3rd Female Child No 17 11 1 3rd
## 12 Crew Female Child No 0 12 1 Crew
## 13 1st Male Child No 0 13 1 1st
## 14 2nd Male Child No 0 14 1 2nd
## 15 3rd Male Child No 35 15 1 3rd
## 16 Crew Male Child No 0 16 1 Crew
## 17 1st Female Adult Yes 140 17 1 1st
## 18 2nd Female Adult Yes 80 18 1 2nd
## 19 3rd Female Adult Yes 76 19 1 3rd
## 20 Crew Female Adult Yes 20 20 1 Crew
## 21 1st Male Adult Yes 57 21 1 1st
## 22 2nd Male Adult Yes 14 22 1 2nd
## 23 3rd Male Adult Yes 75 23 1 3rd
## 24 Crew Male Adult Yes 192 24 1 Crew
## 25 1st Female Child Yes 1 25 1 1st
## 26 2nd Female Child Yes 13 26 1 2nd
## 27 3rd Female Child Yes 14 27 1 3rd
## 28 Crew Female Child Yes 0 28 1 Crew
## 29 1st Male Child Yes 5 29 1 1st
## 30 2nd Male Child Yes 11 30 1 2nd
## 31 3rd Male Child Yes 13 31 1 3rd
## 32 Crew Male Child Yes 0 32 1 Crew
## 33 1st Female Adult No 4 1 2 Female
## 34 2nd Female Adult No 13 2 2 Female
## 35 3rd Female Adult No 89 3 2 Female
## 36 Crew Female Adult No 3 4 2 Female
## 37 1st Male Adult No 118 5 2 Male
## 38 2nd Male Adult No 154 6 2 Male
## 39 3rd Male Adult No 387 7 2 Male
## 40 Crew Male Adult No 670 8 2 Male
## 41 1st Female Child No 0 9 2 Female
## 42 2nd Female Child No 0 10 2 Female
## 43 3rd Female Child No 17 11 2 Female
## 44 Crew Female Child No 0 12 2 Female
## 45 1st Male Child No 0 13 2 Male
## 46 2nd Male Child No 0 14 2 Male
## 47 3rd Male Child No 35 15 2 Male
## 48 Crew Male Child No 0 16 2 Male
## 49 1st Female Adult Yes 140 17 2 Female
## 50 2nd Female Adult Yes 80 18 2 Female
## 51 3rd Female Adult Yes 76 19 2 Female
## 52 Crew Female Adult Yes 20 20 2 Female
## 53 1st Male Adult Yes 57 21 2 Male
## 54 2nd Male Adult Yes 14 22 2 Male
## 55 3rd Male Adult Yes 75 23 2 Male
## 56 Crew Male Adult Yes 192 24 2 Male
## 57 1st Female Child Yes 1 25 2 Female
## 58 2nd Female Child Yes 13 26 2 Female
## 59 3rd Female Child Yes 14 27 2 Female
## 60 Crew Female Child Yes 0 28 2 Female
## 61 1st Male Child Yes 5 29 2 Male
## 62 2nd Male Child Yes 11 30 2 Male
## 63 3rd Male Child Yes 13 31 2 Male
## 64 Crew Male Child Yes 0 32 2 Male
## 65 1st Female Adult No 4 1 3 Adult
## 66 2nd Female Adult No 13 2 3 Adult
## 67 3rd Female Adult No 89 3 3 Adult
## 68 Crew Female Adult No 3 4 3 Adult
## 69 1st Male Adult No 118 5 3 Adult
## 70 2nd Male Adult No 154 6 3 Adult
## 71 3rd Male Adult No 387 7 3 Adult
## 72 Crew Male Adult No 670 8 3 Adult
## 73 1st Female Child No 0 9 3 Child
## 74 2nd Female Child No 0 10 3 Child
## 75 3rd Female Child No 17 11 3 Child
## 76 Crew Female Child No 0 12 3 Child
## 77 1st Male Child No 0 13 3 Child
## 78 2nd Male Child No 0 14 3 Child
## 79 3rd Male Child No 35 15 3 Child
## 80 Crew Male Child No 0 16 3 Child
## 81 1st Female Adult Yes 140 17 3 Adult
## 82 2nd Female Adult Yes 80 18 3 Adult
## 83 3rd Female Adult Yes 76 19 3 Adult
## 84 Crew Female Adult Yes 20 20 3 Adult
## 85 1st Male Adult Yes 57 21 3 Adult
## 86 2nd Male Adult Yes 14 22 3 Adult
## 87 3rd Male Adult Yes 75 23 3 Adult
## 88 Crew Male Adult Yes 192 24 3 Adult
## 89 1st Female Child Yes 1 25 3 Child
## 90 2nd Female Child Yes 13 26 3 Child
## 91 3rd Female Child Yes 14 27 3 Child
## 92 Crew Female Child Yes 0 28 3 Child
## 93 1st Male Child Yes 5 29 3 Child
## 94 2nd Male Child Yes 11 30 3 Child
## 95 3rd Male Child Yes 13 31 3 Child
## 96 Crew Male Child Yes 0 32 3 Child
## 97 1st Female Adult No 4 1 4 No
## 98 2nd Female Adult No 13 2 4 No
## 99 3rd Female Adult No 89 3 4 No
## 100 Crew Female Adult No 3 4 4 No
## 101 1st Male Adult No 118 5 4 No
## 102 2nd Male Adult No 154 6 4 No
## 103 3rd Male Adult No 387 7 4 No
## 104 Crew Male Adult No 670 8 4 No
## 105 1st Female Child No 0 9 4 No
## 106 2nd Female Child No 0 10 4 No
## 107 3rd Female Child No 17 11 4 No
## 108 Crew Female Child No 0 12 4 No
## 109 1st Male Child No 0 13 4 No
## 110 2nd Male Child No 0 14 4 No
## 111 3rd Male Child No 35 15 4 No
## 112 Crew Male Child No 0 16 4 No
## 113 1st Female Adult Yes 140 17 4 Yes
## 114 2nd Female Adult Yes 80 18 4 Yes
## 115 3rd Female Adult Yes 76 19 4 Yes
## 116 Crew Female Adult Yes 20 20 4 Yes
## 117 1st Male Adult Yes 57 21 4 Yes
## 118 2nd Male Adult Yes 14 22 4 Yes
## 119 3rd Male Adult Yes 75 23 4 Yes
## 120 Crew Male Adult Yes 192 24 4 Yes
## 121 1st Female Child Yes 1 25 4 Yes
## 122 2nd Female Child Yes 13 26 4 Yes
## 123 3rd Female Child Yes 14 27 4 Yes
## 124 Crew Female Child Yes 0 28 4 Yes
## 125 1st Male Child Yes 5 29 4 Yes
## 126 2nd Male Child Yes 11 30 4 Yes
## 127 3rd Male Child Yes 13 31 4 Yes
## 128 Crew Male Child Yes 0 32 4 Yes
# In the output above, x holds column positions (1 to 4) rather than names, so map positions to names
g3=ggplot(parallel_data1234, aes(x=factor(x, levels = 1:4, labels = c("Class", "Sex","Age","Survived")), id = id, split = y, value = value)) +
xlab("Covariates")+
geom_parallel_sets(aes(fill = Survived), alpha = 0.3, axis.width = 0.2) +
geom_parallel_sets_axes(axis.width = 0.2) +
geom_parallel_sets_labels(color = 'white',size=3)
parallel_data123 <- gather_set_data(titanic_freq, c(1,2,3))
parallel_data123
## Class Sex Age Survived value id x y
## 1 1st Female Adult No 4 1 1 1st
## 2 2nd Female Adult No 13 2 1 2nd
## 3 3rd Female Adult No 89 3 1 3rd
## 4 Crew Female Adult No 3 4 1 Crew
## 5 1st Male Adult No 118 5 1 1st
## 6 2nd Male Adult No 154 6 1 2nd
## 7 3rd Male Adult No 387 7 1 3rd
## 8 Crew Male Adult No 670 8 1 Crew
## 9 1st Female Child No 0 9 1 1st
## 10 2nd Female Child No 0 10 1 2nd
## 11 3rd Female Child No 17 11 1 3rd
## 12 Crew Female Child No 0 12 1 Crew
## 13 1st Male Child No 0 13 1 1st
## 14 2nd Male Child No 0 14 1 2nd
## 15 3rd Male Child No 35 15 1 3rd
## 16 Crew Male Child No 0 16 1 Crew
## 17 1st Female Adult Yes 140 17 1 1st
## 18 2nd Female Adult Yes 80 18 1 2nd
## 19 3rd Female Adult Yes 76 19 1 3rd
## 20 Crew Female Adult Yes 20 20 1 Crew
## 21 1st Male Adult Yes 57 21 1 1st
## 22 2nd Male Adult Yes 14 22 1 2nd
## 23 3rd Male Adult Yes 75 23 1 3rd
## 24 Crew Male Adult Yes 192 24 1 Crew
## 25 1st Female Child Yes 1 25 1 1st
## 26 2nd Female Child Yes 13 26 1 2nd
## 27 3rd Female Child Yes 14 27 1 3rd
## 28 Crew Female Child Yes 0 28 1 Crew
## 29 1st Male Child Yes 5 29 1 1st
## 30 2nd Male Child Yes 11 30 1 2nd
## 31 3rd Male Child Yes 13 31 1 3rd
## 32 Crew Male Child Yes 0 32 1 Crew
## 33 1st Female Adult No 4 1 2 Female
## 34 2nd Female Adult No 13 2 2 Female
## 35 3rd Female Adult No 89 3 2 Female
## 36 Crew Female Adult No 3 4 2 Female
## 37 1st Male Adult No 118 5 2 Male
## 38 2nd Male Adult No 154 6 2 Male
## 39 3rd Male Adult No 387 7 2 Male
## 40 Crew Male Adult No 670 8 2 Male
## 41 1st Female Child No 0 9 2 Female
## 42 2nd Female Child No 0 10 2 Female
## 43 3rd Female Child No 17 11 2 Female
## 44 Crew Female Child No 0 12 2 Female
## 45 1st Male Child No 0 13 2 Male
## 46 2nd Male Child No 0 14 2 Male
## 47 3rd Male Child No 35 15 2 Male
## 48 Crew Male Child No 0 16 2 Male
## 49 1st Female Adult Yes 140 17 2 Female
## 50 2nd Female Adult Yes 80 18 2 Female
## 51 3rd Female Adult Yes 76 19 2 Female
## 52 Crew Female Adult Yes 20 20 2 Female
## 53 1st Male Adult Yes 57 21 2 Male
## 54 2nd Male Adult Yes 14 22 2 Male
## 55 3rd Male Adult Yes 75 23 2 Male
## 56 Crew Male Adult Yes 192 24 2 Male
## 57 1st Female Child Yes 1 25 2 Female
## 58 2nd Female Child Yes 13 26 2 Female
## 59 3rd Female Child Yes 14 27 2 Female
## 60 Crew Female Child Yes 0 28 2 Female
## 61 1st Male Child Yes 5 29 2 Male
## 62 2nd Male Child Yes 11 30 2 Male
## 63 3rd Male Child Yes 13 31 2 Male
## 64 Crew Male Child Yes 0 32 2 Male
## 65 1st Female Adult No 4 1 3 Adult
## 66 2nd Female Adult No 13 2 3 Adult
## 67 3rd Female Adult No 89 3 3 Adult
## 68 Crew Female Adult No 3 4 3 Adult
## 69 1st Male Adult No 118 5 3 Adult
## 70 2nd Male Adult No 154 6 3 Adult
## 71 3rd Male Adult No 387 7 3 Adult
## 72 Crew Male Adult No 670 8 3 Adult
## 73 1st Female Child No 0 9 3 Child
## 74 2nd Female Child No 0 10 3 Child
## 75 3rd Female Child No 17 11 3 Child
## 76 Crew Female Child No 0 12 3 Child
## 77 1st Male Child No 0 13 3 Child
## 78 2nd Male Child No 0 14 3 Child
## 79 3rd Male Child No 35 15 3 Child
## 80 Crew Male Child No 0 16 3 Child
## 81 1st Female Adult Yes 140 17 3 Adult
## 82 2nd Female Adult Yes 80 18 3 Adult
## 83 3rd Female Adult Yes 76 19 3 Adult
## 84 Crew Female Adult Yes 20 20 3 Adult
## 85 1st Male Adult Yes 57 21 3 Adult
## 86 2nd Male Adult Yes 14 22 3 Adult
## 87 3rd Male Adult Yes 75 23 3 Adult
## 88 Crew Male Adult Yes 192 24 3 Adult
## 89 1st Female Child Yes 1 25 3 Child
## 90 2nd Female Child Yes 13 26 3 Child
## 91 3rd Female Child Yes 14 27 3 Child
## 92 Crew Female Child Yes 0 28 3 Child
## 93 1st Male Child Yes 5 29 3 Child
## 94 2nd Male Child Yes 11 30 3 Child
## 95 3rd Male Child Yes 13 31 3 Child
## 96 Crew Male Child Yes 0 32 3 Child
# In the output above, x holds column positions (1 to 3) rather than names, so map positions to names;
# a factor built from names alone would turn every x into NA and fail with "id must be unique within axes"
g4=ggplot(parallel_data123, aes(x=factor(x, levels = 1:4, labels = c("Class", "Sex","Age","Survived")), id = id, split = y, value = value)) +
xlab("Covariates") +
geom_parallel_sets(aes(fill = Survived), alpha = 0.3, axis.width = 0.2) +
geom_parallel_sets_axes(axis.width = 0.2) +
geom_parallel_sets_labels(colour = 'white',size=3)
grid.arrange(g3,g4,ncol=2)
Note that the data are now organized in a form that can be plotted directly.
The last four columns are the most important part. There are 32
different groups of passengers, i.e., 32 groups = 4 classes * 2 genders * 2
ages * 2 survival statuses, and the variable id labels these 32
groups. For each group, the variable value stores the number of passengers in
that group. The variable x identifies the axis (in the output above, the column
position of Class, Sex, Age, or Survived), and y gives the category of the group
on that axis, e.g., Male or Female, Child or Adult.
The 32 rows are repeated once per selected axis, so selecting all four
categorical variables yields 4 * 32 = 128 rows.
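The reshaping described above can be sketched in base R. This is a hand-rolled illustration, not ggforce's implementation: it assumes the built-in `Titanic` table, stores the counts in a column called `Freq` (where the chapter's data uses `value`), and writes axis names rather than column positions into `x`.

```r
freq <- as.data.frame(Titanic)   # 32 rows: Class, Sex, Age, Survived, Freq
axes <- c("Class", "Sex", "Age", "Survived")

freq$id <- seq_len(nrow(freq))   # one id per passenger group

# Stack one copy of the table per axis: x names the axis, y holds the category
long <- do.call(rbind, lapply(axes, function(a) {
  block <- freq
  block$x <- a
  block$y <- as.character(freq[[a]])
  block
}))

dim(long)        # rows = number of groups * number of axes
table(long$x)    # every axis contains each id exactly once
```

Because every id occurs once on each axis, a band can be traced from one axis to the next by following its id, which is what the parallel sets geometry relies on.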
The figure above shows the number of passengers in each category. In addition, it displays how pairs of categorical variables relate to each other, for example, age and class, class and gender, and gender and survival.
However, what if we want to see how survival interacts with all other variables? Since the survival status is the most important variable, we can move it to another dimension, the color of the bands.
Treemaps display hierarchical (tree-structured) data as a set of nested rectangles. Each branch of the tree is given a rectangle, which is then tiled with smaller rectangles representing subbranches. A leaf node’s rectangle has an area proportional to a specified dimension of the data. Often the leaf nodes are colored to show a separate dimension of the data.
Dataset: We are using the data set about Indonesia’s mobile phone market sales in the first half of 2020. The data is available at: https://www.kaggle.com/kurniakh/marketplace-data
phone<-read.csv("data/phone.csv")
dim(phone)
## [1] 361 6
phone[seq(1,nrow(phone),by=10),]
## type rmb sold brand region lnrmb
## 1 Apple iPhone 3459.90 597 Apple not-china 8.148995
## 11 Apple iPhone 6 1742.90 6736 Apple not-china 7.463306
## 21 Apple iPhone XR 5584.20 8067 Apple not-china 8.627696
## 31 Asus Zenfone 4 Max Plus ZC554KL 1207.39 568 ASUS china 7.096216
## 41 Asus Zenfone 6 ZS630KL 3535.03 503 ASUS china 8.170477
## 51 Asus Zenfone Selfie ZD551KL 1206.00 147 ASUS china 7.095064
## 61 Huawei Ascend Y540 674.88 110 Huawei china 6.514535
## 71 Huawei nova 2 1491.49 650 Huawei china 7.307531
## 81 Huawei P30 lite New Edition 1062.18 1304 Huawei china 6.968079
## 91 Infinix Hot 5 769.43 120 Infinix china 6.645650
## 101 Infinix S5 1010.63 3038 Infinix china 6.918329
## 111 Nokia 3.1 A 960.46 138 Nokia not-china 6.867412
## 121 Oppo A31 (2020) 1222.82 1264 OPPO china 7.108915
## 131 Oppo A7x 2026.08 49 OPPO china 7.613858
## 141 Oppo F5 1304.81 9683 OPPO china 7.173813
## 151 Oppo Neo 7 1447.20 235 OPPO china 7.277386
## 161 Realme 3 961.18 1688 Realme china 6.868162
## 171 Samsung Galaxy A10 826.96 38036 Samsung not-china 6.717756
## 181 Samsung Galaxy A50 1904.87 35262 Samsung not-china 7.552169
## 191 Samsung Galaxy A70 2384.87 22884 Samsung not-china 7.776900
## 201 Samsung Galaxy Express Prime 258.08 20 Samsung not-china 5.553270
## 211 Samsung Galaxy J2 Core (2020) 584.67 470 Samsung not-china 6.371048
## 221 Samsung Galaxy J7 1567.39 516 Samsung not-china 7.357167
## 231 Samsung Galaxy M30 1366.03 7412 Samsung not-china 7.219664
## 241 Samsung Galaxy S10 Lite 4050.50 728 Samsung not-china 8.306596
## 251 Samsung Galaxy S7 edge 2392.92 2661 Samsung not-china 7.780270
## 261 vivo V11 (V11 Pro) 1636.07 27259 vivo china 7.400052
## 271 vivo V7 1342.66 330 vivo china 7.202408
## 281 vivo Y53 771.79 69 vivo china 6.648712
## 291 vivo Z1Pro 1523.47 15567 vivo china 7.328746
## 301 Xiaomi Mi 8 Explorer 2966.76 122 Xiaomi china 7.995226
## 311 Xiaomi Mi Max 2714.95 407 Xiaomi china 7.906529
## 321 Xiaomi Mi Play 1291.49 1970 Xiaomi china 7.163552
## 331 Xiaomi Redmi 4 (4X) 878.08 3438 Xiaomi china 6.777738
## 341 Xiaomi Redmi 8A Dual 313.46 829 Xiaomi china 5.747672
## 351 Xiaomi Redmi Note 5 AI Dual Camera 1077.19 5448 Xiaomi china 6.982111
## 361 Xiaomi Redmi Y1 (Note 5A) 987.64 3882 Xiaomi china 6.895318
Package
library(tidyverse)
library(treemapify)
Tree map
ggplot(phone,
aes(area = sold,
subgroup = region,
subgroup2 = brand,
subgroup3 = type,
fill=lnrmb))+
geom_treemap()+
geom_treemap_subgroup3_border(color="white",size=1)+
geom_treemap_subgroup2_border(color="red",size=2)+
geom_treemap_subgroup_border(color="blue",size=3)+
#geom_treemap_subgroup_text(place = "centre", grow = TRUE, alpha = 0.5, colour ="white")+
geom_treemap_subgroup2_text(place = "bottom", grow = TRUE, alpha = 0.3, colour ="red")+
geom_treemap_text(aes(label=type),colour = "white", place = "topleft", reflow = TRUE, size=10)+
scale_fill_distiller(palette="Blues",name="phone\nprice\n(RMB)", breaks = log(c(250, 500, 1000, 2000, 4000, 8000, 16000)), labels = c(250, 500, 1000, 2000, 4000, 8000, 16000))
area=sold: the area of each tile is the sales volume of the phone model.
fill=lnrmb: the shade of the color represents the logarithm of the phone’s price. With the scale used here, the cheaper the phone, the darker the color.
subgroup=region: brands are divided by origin into two groups, China (on the left, inside the thick blue subgroup border) and not-China. As can be seen from the figure, no local Indonesian brands appear; most of the brands are Chinese.
subgroup2=brand: Chinese brands include Xiaomi, OPPO, vivo, Realme, etc., and non-Chinese brands include Apple, Samsung, and Nokia.
subgroup3=type: the individual phone models. The most popular models in Indonesia are mainly the cheaper ones.
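The legend of the phone treemap is built on the log scale but labelled in RMB. The short sketch below shows why that pairing works; the variable names `prices` and `breaks` are ours, and the value 8.148995 is the lnrmb entry of the first row in the phone table above.

```r
prices <- c(250, 500, 1000, 2000, 4000, 8000, 16000)  # legend labels in RMB
breaks <- log(prices)                                  # positions on the lnrmb scale

# Each doubling of the price moves the same distance on the log scale,
# so the breaks are evenly spaced in the legend
diff(breaks)

# Back-transforming a log price recovers the price in RMB
exp(8.148995)
```

Filling by the log price keeps a handful of expensive models from stretching the color scale, while the back-transformed labels keep the legend readable.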
We can also use a treemap to visualize the Titanic data set; however, it is less effective than a mosaic plot.
titanic_df=as.data.frame(table(titanic[,c("Class","Sex","Survived")]))
ggplot(titanic_df,
aes(area = Freq,
subgroup = Class,
subgroup2 = Sex,
subgroup3 = Survived,
label=paste("Class",Class,Sex,"Survived",Survived)))+
geom_treemap()+
geom_treemap_subgroup3_border(color="yellow",size=1)+
geom_treemap_subgroup2_border(color="red",size=3)+
geom_treemap_subgroup_border(color="blue",size=5)+
geom_treemap_subgroup_text(place = "centre", grow = TRUE, alpha = 0.2, colour ="blue",fontface = "bold")+
geom_treemap_subgroup2_text(place = "bottom", grow = FALSE, alpha = 0.2, colour ="red",fontface = "italic")+
#geom_treemap_subgroup3_text(place = "top", grow = FALSE, alpha = 0.2, colour ="white",fontface = "italic")+
geom_treemap_text(colour = "yellow", place = "topleft", reflow = FALSE,size=10)
# alternatively
titanic_df2=as.data.frame(table(titanic))
ggplot(titanic_df2,
aes(area = Freq,
subgroup = Class,
subgroup2 = Sex,
subgroup3 = Age,
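subgroup4 = Survived, # treemapify documents only subgroup, subgroup2, and subgroup3, so this aesthetic is likely ignored; each tile is still one Class/Sex/Age/Survived cell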
label=paste("Class",Class,Sex,Age,"Survived",Survived)))+
geom_treemap()+
geom_treemap_subgroup3_border(color="yellow",size=1)+
geom_treemap_subgroup2_border(color="red",size=3)+
geom_treemap_subgroup_border(color="blue",size=5)+
geom_treemap_subgroup_text(place = "centre", grow = TRUE, alpha = 0.2, colour ="blue",fontface = "bold")+
geom_treemap_subgroup2_text(place = "bottom", grow = FALSE, alpha = 0.2, colour ="red",fontface = "italic")+
geom_treemap_subgroup3_text(place = "left", grow = FALSE, alpha = 0.2, colour ="yellow",fontface = "italic",size=20)+
geom_treemap_text(colour = "white", place = "topleft", reflow = FALSE,size=10)
A sunburst chart is typically used to visualize hierarchical data structures. A sunburst chart is also called wedge stack graph, radial hierarchy, circular bar plot, ring chart, multi-level pie chart, and radial treemap. The sunburst chart consists of an inner circle surrounded by rings of deeper hierarchy levels. The angle of each segment is either proportional to a value or divided equally under its parent node. All segments in sunburst charts may be colored according to which category or hierarchy level they belong to.
library(tidyverse)
library(readxl)
library(sunburstR)
library(RColorBrewer)
Data set: We visualize the data set on Indonesia’s mobile phone market sales in the first half of 2020 (https://www.kaggle.com/kurniakh/marketplace-data).
phone=read.csv("data/phone_sunburst.csv",header = TRUE)
phone_sun=phone[,c("PhoneModel","BrandCountry","Brand","Sold")]
phone_sun$Category=paste(phone$BrandCountry,phone$Brand,phone$PhoneModel,sep="-")
phone_sun=phone_sun[,c("Category","Sold")]
sund2b(phone_sun,
color=colorRampPalette(brewer.pal(11,"Set3"))(35), #add new colors
showLabels = FALSE,
rootLabel = "Total Phone Sold in Indonesia 2020")
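The call above passes the hierarchy as a single column in which the levels are joined by "-". A hyphen inside a country, brand, or model name would then presumably be read as an extra level. The following is a minimal sketch of one way to guard against that; the rows and the helper `clean_level()` are hypothetical, and only the column names follow the phone data.

```r
# Hypothetical rows; only the column names follow the chapter's phone data.
phone_demo <- data.frame(
  BrandCountry = c("China", "South Korea"),
  Brand        = c("Xiaomi", "Samsung"),
  PhoneModel   = c("Redmi Note-8", "Galaxy A51"),
  Sold         = c(120, 95)
)

# Hypothetical helper: replace hyphens inside a level so "-" only separates levels.
clean_level <- function(x) gsub("-", " ", x, fixed = TRUE)

phone_demo$Category <- paste(
  clean_level(phone_demo$BrandCountry),
  clean_level(phone_demo$Brand),
  clean_level(phone_demo$PhoneModel),
  sep = "-"
)

# Every path should now split into exactly three levels.
n_levels <- lengths(strsplit(phone_demo$Category, "-", fixed = TRUE))
print(phone_demo[, c("Category", "Sold")])
```

The resulting two-column data frame has the same shape as phone_sun, so it could be passed to sund2b() in the same way.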
A ternary plot, ternary graph, triangle plot, simplex plot, Gibbs triangle or de Finetti diagram is a barycentric plot on three variables which sum to a constant. It graphically depicts the ratios of the three variables as positions in an equilateral triangle. It is used in physical chemistry, petrology, mineralogy, metallurgy, and other physical sciences to show the compositions of systems composed of three species. In population genetics, it is often called a de Finetti diagram. In game theory, it is often called a simplex plot. Ternary plots are tools for analyzing compositional data in the three-dimensional case.
library(ggtern)
We now use a simple example with two hand-picked compositions, each summing to 1, to show a ternary plot. The three lines mark the coordinates of the first composition.
df=data.frame(prop1=0.1,prop2=0.3,prop3=0.6)
df2=data.frame(prop1=0.05,prop2=0.03,prop3=0.92)
df=rbind(df,df2)
g1=ggtern(data=df,mapping=aes(x=prop1,y=prop2,z=prop3))+
geom_point(size=2)+
geom_Tline(Tintercept=c(0.3))+
geom_Lline(Lintercept=c(0.1))+
geom_Rline(Rintercept=c(0.6))
g1
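To see how a composition becomes a position in the triangle, the barycentric coordinates can be converted to Cartesian ones by hand. This is a sketch under an assumed layout (left vertex at (0, 0), right vertex at (1, 0), top vertex at (0.5, sqrt(3)/2)); the function `ternary_to_xy()` is hypothetical and not part of ggtern, which performs its own transformation internally.

```r
# Convert a three-part composition to Cartesian coordinates.
# Assumed layout: left vertex (0, 0), right vertex (1, 0), top vertex (0.5, sqrt(3)/2).
ternary_to_xy <- function(left, top, right) {
  total <- left + top + right          # normalise so the parts sum to 1
  top   <- top / total
  right <- right / total
  data.frame(x = right + top / 2,      # weighted average of the vertex x positions
             y = top * sqrt(3) / 2)    # only the top vertex has a non-zero height
}

# The two compositions from the example, assuming prop1 = left, prop2 = top, prop3 = right.
ternary_to_xy(left = c(0.1, 0.05), top = c(0.3, 0.03), right = c(0.6, 0.92))
```

A pure component lands on its own vertex, and a composition with equal parts lands at the centre of the triangle.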
USDA textural classification chart
We are using the “USDA” data set from package ggtern which is issued by the United States Department of Agriculture (USDA) in the form of a ternary diagram.
data("USDA", package = "ggtern")
dfLabels <- plyr::ddply(USDA, "Label", function(df) {
label <- as.character(df$Label[ 1 ])
df$Angle <- switch(label, "Loamy Sand" = -35, 0)
colMeans(df[setdiff(colnames(df), "Label")])
})
f5a<-ggtern(data = USDA, mapping = aes(x = Sand, y = Clay, z = Silt))+ # three axes: Sand, Clay and Silt
geom_polygon(mapping = aes(fill = Label),
alpha = 0.75, size = 0.5, color = "black")+ # one polygon per textural class, filled by Label, with alpha 0.75 and a black outline of size 0.5
geom_text(data = dfLabels, mapping = aes(label = Label, angle = Angle),
size = 2.5) + # class names placed at the mean of each polygon's vertices, rotated by Angle
theme_rgbw() + # ggtern theme with red, green and blue axes on a white background
theme_showsecondary() + # show the secondary axis ticks
theme_showarrows() + # show the axis arrows
custom_percent("Percent") + # append "Percent" to the arrow labels
guides(color = "none", fill = "none")+
labs(title = "USDA Textural Classification Chart", # plot title
fill = "Textural Class", # legend title for the fill color
color = "Textural Class") # legend title for the outline color
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
print(f5a)
We can use the functions in the ggplot2 package to draw a pie chart. We are using the variable “content_rating”, which records the film rating, from a data set of American movies.
library(ggsci)
cinema=read.csv("data/cinema.csv",header = T)
df <- as.data.frame(table(cinema$content_rating))
df # Generating the table of content_rating
## Var1 Freq
## 1 G 88
## 2 OR 1136
## 3 PG 655
## 4 PG-13 1597
## 5 R 2138
df = df[order(df$Freq, decreasing = TRUE),] ## sort the rows by Freq, largest first
Label = as.vector(df$Var1)
Label = paste0(Label, " (", round(df$Freq / sum(df$Freq) * 100, 2), "%)")
g1=ggplot(df, aes(x="", y=Freq, fill=Var1)) + # the angle of each slice is proportional to Freq
geom_bar(stat="identity",width=1) +
coord_polar("y")+
labs(x = "", y = "", title = "") +
scale_fill_lancet()+
theme(axis.text = element_blank(),
axis.ticks = element_blank(),
legend.position = "right")+
geom_text(aes(label = Label), position = position_stack(vjust = 0.5)) + #Add percentage
ggtitle("IMDB MPAA Rating")
## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.
g2=ggplot(df,aes(x=Var1,y=Freq, fill=Var1))+
geom_bar(stat="identity",width = 0.8)+
scale_fill_lancet()
grid.arrange(g1,g2,ncol=2,widths = c(2, 1))
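The percentage labels can also be stored as columns of the data frame, which keeps each label attached to its own row regardless of later sorting. A minimal sketch, using the frequencies printed in the table above; the data frame name `rating` and the column names `Percent` and `Label` are choices made here for illustration.

```r
# Frequencies copied from the table printed above.
rating <- data.frame(
  Var1 = c("G", "OR", "PG", "PG-13", "R"),
  Freq = c(88, 1136, 655, 1597, 2138)
)

# Store the share and its label as columns so they stay aligned with each row.
rating$Percent <- round(rating$Freq / sum(rating$Freq) * 100, 2)
rating$Label   <- paste0(rating$Var1, " (", rating$Percent, "%)")

# Sorting the rows now moves the labels together with the counts.
rating[order(rating$Freq, decreasing = TRUE), ]
```

With this layout, `aes(label = Label)` would refer to a column of the plotted data rather than to a separate vector in the workspace.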
library(sqldf)
Single Donut Chart
We use the geom_rect() function from the ggplot2 package to draw each category as a rectangle; coord_polar(theta="y") then bends the rectangles into a ring. The size of the hole in the centre is controlled by the limits of the x-axis.
The data set we use contains information about American universities.
m<-read.csv("data/college.csv")
summary(m[,c(5:8)])
## region highest_degree control gender
## Length:1269 Length:1269 Length:1269 Length:1269
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
data1<-data.frame(category=c("Associate","Bachelor","Graduate"), # counts of the highest degree offered
count=c(20,200,1049))
data1$fraction = data1$count / sum(data1$count) # share of each category
data1$ymax = cumsum(data1$fraction)
data1$ymin = c(0, head(data1$ymax, n=-1))
data1$labelPosition <- (data1$ymax + data1$ymin) / 2 #position of the label
data1$label <- paste0(data1$category, "\n ", data1$count,"\n","(",round(data1$fraction*100,1),"%)")
ggplot(data1, aes(ymax=ymax, ymin=ymin, xmax=4, xmin=3, fill=category)) +
geom_rect() + # draw each category as a rectangle
geom_text( x=1.8, aes(y=labelPosition, label=label, color=category), size=3.5) +
scale_fill_brewer(palette=10) + # fill colors of the ring
scale_color_brewer(palette=5) + # colors of the labels
coord_polar(theta="y") + # bend the rectangles into a ring
xlim(c(-1, 4)) + # leave empty space in the centre of the ring
labs(title = "Distribution of the highest degree in universities (The count and percentage)")+
theme_void() +
theme(legend.position = "none")
## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.
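The counts in data1 are typed in by hand. As a sketch of how they might instead be derived from the data, the snippet below tabulates a categorical vector and computes the same ymin and ymax columns. The vector `highest_degree` is a hypothetical stand-in for m$highest_degree, and `donut` is a name chosen here.

```r
# Hypothetical stand-in for m$highest_degree from college.csv.
highest_degree <- c("Graduate", "Bachelor", "Graduate",
                    "Associate", "Graduate", "Bachelor")

# Count each category instead of typing the counts by hand.
donut <- as.data.frame(table(category = highest_degree), responseName = "count")

# Each slice spans the interval [ymin, ymax] on the unit interval.
donut$fraction <- donut$count / sum(donut$count)
donut$ymax <- cumsum(donut$fraction)
donut$ymin <- c(0, head(donut$ymax, n = -1))
donut$labelPosition <- (donut$ymax + donut$ymin) / 2
donut
```

The resulting data frame has the columns that the geom_rect() call above expects, so it could replace data1 directly.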
Multiple Donut Chart
Now we want to show the distribution of private and public universities in each region.
data2<-sqldf("select region,control,count(control) as control_count from m group by region,control")
data3<-sqldf("select region,count(control) as control_sum from m group by region")
data4<-merge(data2,data3,by= "region")
data4$control_percent<-data4$control_count/data4$control_sum
data4$label1<-c("Midwest"," ","Northeast"," ","South"," ","West"," ")
data4$label2<-paste(data4$control,":\n",data4$control_count,"\n(",round(data4$control_percent*100,1),"%)")
Show the data frame
head(data4)
## region control control_count control_sum control_percent label1
## 1 Midwest Private 246 353 0.6968839 Midwest
## 2 Midwest Public 107 353 0.3031161
## 3 Northeast Private 185 299 0.6187291 Northeast
## 4 Northeast Public 114 299 0.3812709
## 5 South Private 250 459 0.5446623 South
## 6 South Public 209 459 0.4553377
## label2
## 1 Private :\n 246 \n( 69.7 %)
## 2 Public :\n 107 \n( 30.3 %)
## 3 Private :\n 185 \n( 61.9 %)
## 4 Public :\n 114 \n( 38.1 %)
## 5 Private :\n 250 \n( 54.5 %)
## 6 Public :\n 209 \n( 45.5 %)
ggplot(data4, aes(x = region, y = control_percent, fill = control)) +
geom_col() + # stacked bars, one per region
geom_text(aes(label=label2),size=2.2,position=position_stack(vjust = 0.5))+
geom_text(aes(label=label1),size=5,position = position_fill())+
scale_x_discrete(limits = c(" "," "," ", "Midwest","Northeast","South","West")) + #cut the center of the circle
scale_fill_brewer(palette=3) +
coord_polar("y")+ #change to circle
labs(title = "Distribution of Private and Public Universities")+
theme_void()+
theme(legend.position = "none")
## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.
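The two sqldf queries and the merge compute, for every region, the share of each control type. The same quantities can be sketched in base R with table() and ave(). The data frame `m_demo` below is a hypothetical stand-in for the region and control columns of the college data.

```r
# Hypothetical stand-in for the region and control columns of college.csv.
m_demo <- data.frame(
  region  = c("Midwest", "Midwest", "Midwest", "South", "South", "West"),
  control = c("Private", "Private", "Public", "Private", "Public", "Public")
)

# Count institutions per region and control type.
# Unlike GROUP BY, table() also keeps combinations with a count of zero.
shares <- as.data.frame(table(region = m_demo$region, control = m_demo$control),
                        responseName = "control_count")

# Divide by the regional total to get the share within each region.
shares$control_sum <- ave(shares$control_count, shares$region, FUN = sum)
shares$control_percent <- shares$control_count / shares$control_sum
shares[order(shares$region), ]
```

Within each region the shares sum to 1, which is what lets every ring in the chart close completely.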
A slope graph is a lot like a line graph, but it plots only the change between two time points, without any regard for the points in between. It is based on the idea that humans are good at interpreting changes in directions, i.e., slopes.
Package
library(tidyverse)
library(ggrepel)
library(RColorBrewer)
We are using a data set from Fortune magazine that records the revenue of the top 10 Fortune 500 companies in 2018 and 2019.
df <- read.csv("data/top10.csv",header = T)
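df$Name <- factor(df$Name) # store the factor in the data frame so the plots use it
df$Revenue <- df$Revenue/1000 # convert millions to billions of dollars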
q <- ggplot(df,
aes(x = Year,
y = Revenue,
color = Name,
group = Name)) +
geom_line(size=1) +
geom_point(size=2) +
scale_x_continuous(breaks = c(2018, 2019),
labels = c(2018, 2019),
position = "top") +
labs(title= "Revenue of the world's top 10 companies\n(in Billion Dollars)") +
theme(aspect.ratio = 2)
q
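The plotting code expects long-format data, with one row per company and year. If the revenues arrive with one column per year instead, they can be stacked first. A minimal base R sketch with hypothetical company names and values:

```r
# Hypothetical revenues in billions of dollars, one column per year.
wide <- data.frame(
  Name   = c("Company A", "Company B", "Company C"),
  `2018` = c(500, 330, 240),
  `2019` = c(514, 415, 230),
  check.names = FALSE
)

# Stack the year columns so that each row is one company-year pair.
years <- c("2018", "2019")
long <- data.frame(
  Name    = rep(wide$Name, times = length(years)),
  Year    = rep(as.numeric(years), each = nrow(wide)),
  Revenue = unlist(wide[years], use.names = FALSE)
)

# The change between the two years is what each slope encodes.
change <- wide[["2019"]] - wide[["2018"]]
names(change) <- wide$Name
long
change
```

A positive change corresponds to a line that rises from left to right, and a negative change to a line that falls.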
We add the text to the slopes.
q <- ggplot(df,
aes(x=Year,
y=Revenue,
group=Name,
color=Name)) +
geom_line(size=1) +
geom_point(size=2) +
labs(title= "2018-2019 Revenue of\nworld's top 10 companies",
subtitle="(in Billions)") +
scale_x_continuous(name = "",
position = "top",
breaks = c(2018, 2019),
labels = c("2018", "2019"),
limits = c(2018, 2022)) +
scale_y_continuous(breaks = seq(200, 600, 100),
labels = format(seq(200, 600, 100), scientific = FALSE),
limits = c(200, 600)) +
theme(legend.position = "none",
aspect.ratio = 1.25,
panel.background = element_rect(fill = "white")) +
geom_text_repel(data = df %>% filter(Year == "2019"),
aes(label = Name) ,
hjust = "left",
fontface = "bold",
size = 2.5,
nudge_x = 0.3,
direction = "y") +
scale_color_viridis_d()
q
An arrow chart in the style of the New York Times’ “Where the 1 Percent Have Gained the Most”
df %>%
mutate(Name = fct_reorder(Name, Revenue, min)) %>%
arrange(Year) %>%
ggplot() +
geom_path(aes(x = Revenue, y = Name),
arrow = arrow(length=unit(0.2,"cm"), type = "closed")) +
geom_text(aes(x = Revenue, y = Name, label = round(Revenue),
hjust = ifelse(Year == 2018, 1.4, -0.4))) +
geom_text(data = df %>% group_by(Name) %>% summarize(ave_Revenue = mean(Revenue)),
aes(x = ave_Revenue, y = Name, label = Name),
vjust = 2,
size = 2) +
coord_cartesian(xlim = c(200, 600)) +
scale_x_continuous(breaks = seq(200, 600, 100),
labels = format(seq(200, 600, 100), scientific = FALSE))
## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.
In 1858 nurse, statistician, and reformer Florence Nightingale published Notes on Matters Affecting the Health, Efficiency, and Hospital Administration of the British Army. Founded Chiefly on the Experience of the Late War. Presented by Request to the Secretary of State for War. This privately printed work contained a color statistical graphic entitled “Diagram of the Causes of Mortality in the Army of the East”, which showed that epidemic disease, which was responsible for more British deaths in the course of the Crimean War than battlefield wounds, could be controlled by a variety of factors including nutrition, ventilation, and shelter. The graphic, which Nightingale used as a way to explain complex statistics simply, clearly, and persuasively, has become known as Nightingale’s Rose chart.
Dataset
Period of time
We can use the circle of a Nightingale’s rose chart to represent a period of time, such as the twelve months of a year. The data set we use is the number of airport passengers in Taiwan for each month of 2018.
tw=read.csv("data/taiwanairport2018.csv",header = T)
tw$month <- factor(tw$month) # store the factor in the data frame used by the plot
twrose=ggplot(tw, aes(x=reorder(month, monthnum), y=passengers,fill = passengers)) +
geom_col(width = 1, color = 'white') +
scale_fill_gradientn(colors = c("yellow","orange","red")) +
coord_polar() +
theme_bw()+
theme(
panel.grid = element_blank(),
panel.border= element_blank(),
axis.text.y = element_blank(),
axis.text.x = element_blank(),
axis.ticks = element_blank(),
axis.title = element_blank()
) +
labs(
title= "The number of passengers in Taiwan's airport for each month in 2018"
)+
guides(fill="none")+
geom_text(aes(label = paste(month, passengers)),color = "dark blue",vjust = "left", hjust = "outward",fontface="bold", size = 3.5)
## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.
twrose
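In a polar bar chart like the one above, the bar height becomes the radius of the wedge, so the wedge area grows with the square of the value. If area rather than radius should carry the value, one option is to plot the square root instead. The following sketch uses hypothetical monthly counts to compare the two choices.

```r
# Hypothetical monthly counts; the real values come from taiwanairport2018.csv.
rose <- data.frame(
  month      = month.abb[1:4],
  passengers = c(100, 400, 900, 1600)
)

# With equal-angle wedges, area is proportional to radius squared,
# so using sqrt(value) as the radius makes area proportional to value.
rose$radius <- sqrt(rose$passengers)

# Relative wedge areas under each choice of radius.
area_linear <- rose$passengers^2 / sum(rose$passengers^2)
area_sqrt   <- rose$radius^2 / sum(rose$radius^2)
cbind(rose, area_linear, area_sqrt)
```

In the ggplot2 code this would amount to mapping `y = sqrt(passengers)` while keeping the original numbers in the text labels. Whether to do so is a design choice.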
A radial column chart is simply a bar chart plotted on a polar coordinate system rather than on a Cartesian one.
Package
library(tidyverse)
library(ggthemes)
Dataset
We are using 2015.csv, which records the number of cardiovascular patients in Guilin city in 2015.
gui<-read.csv("data/2015.csv")
data1<-aggregate(gui[,(12:14)],list(gui$month),sum)
data1<-data1 %>% gather("gender","value",-c(1,2))
head(data1)
## Group.1 cvd gender value
## 1 1 2399 cvdM 1391
## 2 2 2130 cvdM 1232
## 3 3 2873 cvdM 1651
## 4 4 2414 cvdM 1437
## 5 5 2522 cvdM 1436
## 6 6 2102 cvdM 1250
Draw the Radial Column Chart
ggplot(data1,aes(as.factor(Group.1),value,fill=as.factor(gender)))+
geom_col(color="black",position=position_dodge(),width=0.5,size=0.25)+ # draw the bar plot
coord_polar()+ #change to polar system
scale_x_discrete(limits = c(as.factor(1:12)), # the range of x
labels=c("Jan.","Feb.","Mar.","Apr.","May","Jun.", # label of x
"Jul.","Aug.","Sep.","Oct.","Nov","Dec.")) +
#scale_y_continuous(breaks=c(-800,0,500,1000,1500,2000),labels=c("","0","500","1000","1500","2000"))+
ylim(c(-800,2000))+ #the range of y
geom_segment(aes(x=1,y=0,xend=12,yend=0),colour="black")+
guides(fill=guide_legend(title = NULL)) + #delete the title
labs(title = "The number of cardiovascular patients in Guilin city of 2015")+ #add new title
scale_fill_discrete(labels=c("Female","Male"))+
theme(axis.text.x = element_text(size = 10),
axis.title = element_blank(),
axis.line = element_blank(),
legend.text = element_text(size = 8), # adjust the size of the legend text
legend.key.size = unit(3.6,'mm'), # adjust the size of the legend keys
legend.position = c(0.5,0.5), # place the legend in the centre of the plot
plot.title = element_text(size = 12.5), # adjust the size of the title text
#panel.grid.major.y = element_blank(), # uncomment to remove the grid lines on the y-axis
panel.grid.major.x = element_line(size = 0.7,linetype = "dotted")
)
## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.
## Warning: The `size` argument of `element_line()` is deprecated as of ggplot2 3.4.0.
## ℹ Please use the `linewidth` argument instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
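The reshaping step above uses gather(), which tidyr has superseded with pivot_longer(). A sketch of the equivalent call follows, on a hypothetical data frame with the same column layout as data1 before reshaping; the column name cvdF is an assumption inferred from the legend labels.

```r
library(tidyr)

# Hypothetical monthly totals laid out like data1 before gather().
monthly <- data.frame(
  Group.1 = 1:3,
  cvd     = c(10, 20, 30),
  cvdM    = c(6, 11, 16),
  cvdF    = c(4, 9, 14)   # assumed column name for female patients
)

# Keep the first two columns and stack the rest into gender/value pairs.
long <- pivot_longer(monthly, cols = -c(1, 2),
                     names_to = "gender", values_to = "value")
long
```

The rows come out grouped by month rather than by gender, which does not affect the plot because ggplot2 groups by the mapped variables.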
The Cleveland dot plot, proposed by William S. Cleveland and Robert McGill (https://www.jstor.org/stable/2288400), is a great alternative to a simple bar chart, particularly when there are more than a few items, in which case a bar chart can easily look cluttered. In the same amount of space, many more values can be included in a Cleveland dot plot, and it remains easy to read. A Cleveland dot plot typically plots a categorical variable against a numeric variable.
Note that although a bar plot can visualize whatever a Cleveland dot plot visualizes, the bar plot usually spends more ink on the same data, so the Cleveland dot plot is often the more efficient choice.
mtcars_revised <- mtcars %>%
rownames_to_column("name") %>% # keep each car name with its own row before sorting
arrange(mpg) %>%
mutate(name = factor(name, levels = name))
ggplot(mtcars_revised, aes(x = mpg, y = reorder(name,-mpg), label = mpg) ) +
geom_point() +
theme_bw() +
theme(
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank(),
panel.grid.major.y = element_line(colour = "grey60", linetype = "dashed")
)
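Base R also offers a quick version of this chart through dotchart() in the graphics package. A minimal sketch on the same mtcars data; the name `cars_sorted` is chosen here for illustration.

```r
# Sort once so that labels and values stay paired.
cars_sorted <- mtcars[order(mtcars$mpg), ]

# dotchart() draws one dot per labelled value, from bottom to top.
dotchart(cars_sorted$mpg,
         labels = row.names(cars_sorted),
         cex = 0.7,
         xlab = "Miles per gallon")
```

Sorting the whole data frame, rather than the mpg column alone, is what keeps every car name next to its own value.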
A lollipop chart is a simple modification of the Cleveland dot plot. In addition to the dots, a lollipop chart contains lines that tie each category to its dot, forming lollipops. A lollipop chart is great for comparing multiple categories, as it helps the reader align categories with points while minimizing the amount of ink on the graphic.
ggplot(mtcars_revised, aes(x = mpg, y = reorder(name,-mpg), label = mpg) ) +
geom_point() +
geom_text(nudge_x = 1.5) +
geom_segment(aes(x = 0, xend = mpg,
y = name, yend = name), color = "grey50") +
theme_bw()
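There are many extensions of the Cleveland dot plot and the lollipop chart. The following example connects two dots per category, the average city and highway mileage of each car model, with a line segment.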
mpg_revised <- mpg %>% mutate(brandmodel=paste(manufacturer, model)) %>%
group_by(brandmodel) %>%
summarize(avg_hwy=mean(hwy, na.rm=TRUE),
avg_cty=mean(cty, na.rm=TRUE))
ggplot(mpg_revised) +
geom_point(aes(avg_hwy, brandmodel),col="blue") +
geom_point(aes(avg_cty, brandmodel),col="red") +
geom_segment(aes(x = avg_cty, xend = avg_hwy,
y = brandmodel, yend = brandmodel), color = "grey50")+
geom_text(aes(x = avg_cty, y=brandmodel, label = round(avg_cty, 1)), size = 3, hjust = 1.5) +
geom_text(aes(x = avg_hwy, y=brandmodel, label = round(avg_hwy, 1)), size = 3, hjust = -.5)
The waterfalls package is based on ggplot2. We are going to use the function waterfall() from this package to draw the plot.
We are using the data set miga.csv, which provides a summary income statement compiled from quarterly statements.
Data source: MIGA Summary Income Statement from World Bank Financial Open Data
library(waterfalls)
miga<-read.csv("data/miga.csv")
miga$Item<-as.character(miga$Item)
waterfall(.data=miga,values =miga$Amount,labels =miga$Item,
calc_total = TRUE,
total_rect_color = "steelblue2",
total_axis_text = "summary income",
rect_border = "white",
fill_by_sign =TRUE)+
coord_flip()
## Warning in waterfall(.data = miga, values = miga$Amount, labels = miga$Item, :
## .data and values and labels supplied, .data ignored
Income: Investment Income; Net Premium Income
Expenses: Decrease in Reserves; Administrative expenses
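A waterfall chart stacks each amount on the running total of the amounts before it. The positions of the bars can be sketched with cumsum() in base R. The amounts below are hypothetical and only the item names follow the list above; this is an illustration of the idea, not the internal code of the waterfalls package.

```r
# Hypothetical amounts; positive values are income, negative values are expenses.
items <- data.frame(
  Item   = c("Investment Income", "Net Premium Income",
             "Decrease in Reserves", "Administrative expenses"),
  Amount = c(40, 60, -15, -30)
)

# Each bar starts where the previous one ended.
items$end   <- cumsum(items$Amount)
items$start <- c(0, head(items$end, n = -1))

# The final running total is the height of the summary bar.
total <- tail(items$end, n = 1)
items
total
```

As the warning printed above indicates, waterfall() ignores .data when values and labels are also supplied, so passing only values and labels should give the same plot without the warning.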