In this chapter, we continue to discuss the commonly used visualization types. The following R packages are required to run the examples in this chapter.
library(tidyverse)
library(ggforce)
library(readxl)
library(sunburstR)
library(RColorBrewer)
library(grid)
library(gridExtra)
library(graphics)
library(vcd)
library(ggrepel)
library(ggsci)
library(ggtern)
library(sqldf)
library(waterfalls)
library(treemapify)
A mosaic plot (also known as a Marimekko diagram) is a graphical method for visualizing data from two or more qualitative variables. It is the multidimensional extension of spineplots, which graphically display the same information for only one variable. It gives an overview of the data and makes it possible to recognize relationships between different variables.
To demonstrate the mosaic plot, we explore the Titanic data set (originally R's built-in Titanic table, here expanded to one row per person and read from a CSV file). It records whether each passenger or crew member survived or died on the Titanic's maiden voyage, together with their economic status (ticket class), sex, and age. The visualizations show that women and children survived at much higher rates than adult men, consistent with a "women and children first" policy, and that crew members and lower-class passengers survived at lower rates than higher-class passengers.
library(tidyverse)
titanic=read_csv("data/titanic.csv")
## Rows: 2201 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Class, Sex, Age, Survived
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
titanic
## # A tibble: 2,201 × 4
## Class Sex Age Survived
## <chr> <chr> <chr> <chr>
## 1 3rd Male Child No
## 2 3rd Male Child No
## 3 3rd Male Child No
## 4 3rd Male Child No
## 5 3rd Male Child No
## 6 3rd Male Child No
## 7 3rd Male Child No
## 8 3rd Male Child No
## 9 3rd Male Child No
## 10 3rd Male Child No
## # ℹ 2,191 more rows
mosaicplot(~ Class + Survived, data = titanic,
main = "Survival on the Titanic", color = TRUE)
mosaicplot(~ Class + Sex + Survived, data = titanic,
main = "Survival on the Titanic", color = TRUE)
mosaicplot(~ Class + Sex + Age + Survived, data = titanic,
main = "Survival on the Titanic", color = TRUE)
#mosaicplot(~ Sex + Age + Class + Survived, data = titanic, color = TRUE)
R offers several ways to draw mosaic plots: geom_rect() in ggplot2, geom_mosaic() in the ggmosaic package, mosaicplot() in base graphics, and mosaic() in the vcd package. Additional examples of the mosaic plot follow.
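Of these options, geom_rect() requires computing each tile by hand, which also shows how a mosaic plot encodes the data: column widths are proportional to the class totals, tile heights to the conditional survival proportions within each class, so each tile's area is the joint proportion. A minimal sketch, assuming only dplyr and ggplot2 and using R's built-in Titanic table instead of data/titanic.csv:

```r
library(dplyr)
library(ggplot2)

# Class x Survived counts from R's built-in 4-way Titanic table
tab <- as.data.frame(margin.table(Titanic, c(1, 4)))

# Column widths: each class's share of all people on board
class_df <- tab %>%
  group_by(Class) %>%
  summarize(n = sum(Freq)) %>%
  mutate(xmax = cumsum(n) / sum(n),
         xmin = xmax - n / sum(n))

# Tile heights: P(Survived | Class), stacked within each column
mosaic_df <- tab %>%
  group_by(Class) %>%
  mutate(ymax = cumsum(Freq) / sum(Freq),
         ymin = ymax - Freq / sum(Freq)) %>%
  ungroup() %>%
  left_join(class_df, by = "Class")

p <- ggplot(mosaic_df) +
  geom_rect(aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax,
                fill = Survived), colour = "white") +
  # label each column at its midpoint
  scale_x_continuous(breaks = (class_df$xmin + class_df$xmax) / 2,
                     labels = as.character(class_df$Class)) +
  labs(x = "Class", y = "Proportion within class",
       title = "Survival on the Titanic")
print(p)
```

Dedicated functions such as mosaicplot() or vcd::mosaic() do this bookkeeping for any number of variables; the hand-built version is mainly useful when you want full ggplot2 control over a two-variable plot.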
library(vcd)
mosaic(~ Sex + Class + Age + Survived, data = titanic,
main = "Survival on the Titanic", shade = TRUE, legend = TRUE)
assoc(Titanic, shade=TRUE, legend=TRUE)
data001 <- read.csv("data/cinema.csv")
# mosaicplot() draws with base graphics, so ggplot2's theme() cannot be added to it;
# las = 2 rotates the axis labels instead
mosaicplot(~ year + release_date, data = data001,
shade = TRUE, color = TRUE, las = 2, main = "cinema")
A parallel sets plot is a method for visualizing and interactively exploring categorical data that shows data frequencies instead of individual data points. The method is based on the axis layout of parallel coordinates, with boxes representing the categories and parallelograms between the axes showing the relations between categories.
We again use the Titanic data set, which records each person's ticket class (economic status), sex, age, and survival status on the Titanic's maiden voyage. Note that we have to reorganize the data set before drawing the parallel sets plot.
library(ggforce) # provides gather_set_data() and geom_parallel_sets()
titanic=read_csv("data/titanic.csv")
## Rows: 2201 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Class, Sex, Age, Survived
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
titanic_freq = titanic %>%
count(Class, Sex, Age, Survived, .drop = FALSE)
parallel_data14 <- gather_set_data(titanic_freq, c(1,4))
head(parallel_data14)
## # A tibble: 6 × 8
## Class Sex Age Survived n id x y
## <chr> <chr> <chr> <chr> <int> <int> <chr> <chr>
## 1 1st Female Adult No 4 1 Class 1st
## 2 1st Female Adult Yes 140 2 Class 1st
## 3 1st Female Child Yes 1 3 Class 1st
## 4 1st Male Adult No 118 4 Class 1st
## 5 1st Male Adult Yes 57 5 Class 1st
## 6 1st Male Child Yes 5 6 Class 1st
ggplot(parallel_data14,
aes(x = x,
id = id,
split = y,
value = n
)) +
xlab("Covariates") +
geom_parallel_sets(alpha = 0.3, axis.width = 0.2) +
geom_parallel_sets_axes(axis.width = 0.2) +
geom_parallel_sets_labels(color = 'white',size=3)
parallel_data1234 <- gather_set_data(titanic_freq, 1:4)
ggplot(parallel_data1234, aes(x=factor(x, levels = c("Class", "Age","Sex","Survived")),
# we can control the order of the covariates in levels = xxx
id = id, split = y, value = n)) +
geom_parallel_sets(alpha = 0.3, axis.width = 0.2) +
geom_parallel_sets_axes(axis.width = 0.2) +
geom_parallel_sets_labels(color = 'white',size=3) +
xlab("Covariate")
To emphasize each covariate's association with survival, we color the bands by Survived.
ggplot(parallel_data1234, aes(x=factor(x, levels = c("Class", "Age", "Sex", "Survived")),
id = id, split = y, value = n)) +
xlab("Covariates")+
geom_parallel_sets(aes(fill = Survived), alpha = 0.3, axis.width = 0.2) +
geom_parallel_sets_axes(axis.width = 0.2) +
geom_parallel_sets_labels(color = 'white',size=3)
Note that the data are now organized so that they can be plotted directly, and the last four columns (n, id, x, y) carry the information the plot needs. There are 4 × 2 × 2 × 2 = 32 possible groups of passengers (4 classes, 2 sexes, 2 ages, 2 survival statuses), but count() keeps only the combinations that actually occur (the .drop = FALSE argument only affects factor columns, and these columns are character), so the 8 empty combinations, such as child crew members, are omitted and 24 groups remain. The variable id labels these groups, and n stores how many passengers belong to each group (it is mapped to the value aesthetic). The variables x and y give the axis (covariate) and the group's category on that axis, e.g., whether it is a male or female group, a child or adult group. gather_set_data() stacks one copy of the groups for each selected covariate, so with all four variables every group appears four times.
The figures above show the number of passengers in each category and how adjacent categorical variables relate to each other, for example, class and age, age and sex, and sex and survival.
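Whether count() keeps empty combinations depends on the column type. A sketch that rebuilds one row per person from R's built-in Titanic table (an assumption; the chapter reads data/titanic.csv) and counts the groups once with character columns and once with factor columns:

```r
library(dplyr)
library(tidyr)

# One row per person, rebuilt from the built-in Titanic table (factor columns)
titanic_long <- as.data.frame(Titanic) %>%
  uncount(Freq)

# Character columns: .drop = FALSE has no effect, so empty combinations vanish
chr_counts <- titanic_long %>%
  mutate(across(everything(), as.character)) %>%
  count(Class, Sex, Age, Survived, .drop = FALSE)

# Factor columns: .drop = FALSE keeps every level combination, including zeros
fct_counts <- titanic_long %>%
  count(Class, Sex, Age, Survived, .drop = FALSE)

nrow(chr_counts)  # non-empty groups only
nrow(fct_counts)  # all 4 x 2 x 2 x 2 combinations

# gather_set_data() stacks one copy of the counts per selected axis
sets <- ggforce::gather_set_data(chr_counts, 1:4)
nrow(sets)        # groups x 4 axes
```

Zero-count groups contribute nothing to the ribbons, so either version draws the same picture; the factor version is useful when you want the empty groups to appear explicitly in a table.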
However, what if we want to see how survival interacts with all other variables? Since the survival status is the most important variable, we can move it to another dimension, the color of the bands.
Treemaps display hierarchical (tree-structured) data as a set of nested rectangles. Each branch of the tree is given a rectangle, which is then tiled with smaller rectangles representing subbranches. A leaf node's rectangle has an area proportional to a specified dimension of the data. Often the leaf nodes are colored to show a separate dimension of the data.
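To make the nesting concrete, here is a minimal slice-and-dice layout in base R: each level splits its parent rectangle into strips whose areas are proportional to the values, alternating the split direction by level. This is a simpler algorithm than the squarified layout that treemapify uses by default; the helper slice_rect() and the toy data are hypothetical.

```r
# Hypothetical helper: split a rectangle into strips whose areas are
# proportional to `values` (one "slice" step of slice-and-dice)
slice_rect <- function(values, xmin = 0, xmax = 1, ymin = 0, ymax = 1,
                       vertical = TRUE) {
  share <- values / sum(values)
  ends <- cumsum(share)
  starts <- ends - share
  if (vertical) {
    # split along x: each strip gets a share of the width
    w <- xmax - xmin
    data.frame(xmin = xmin + starts * w, xmax = xmin + ends * w,
               ymin = ymin, ymax = ymax)
  } else {
    # split along y: each strip gets a share of the height
    h <- ymax - ymin
    data.frame(xmin = xmin, xmax = xmax,
               ymin = ymin + starts * h, ymax = ymin + ends * h)
  }
}

# Toy two-level hierarchy (made-up numbers): groups, then leaves
leaves <- data.frame(group = c("A", "A", "B", "B", "B"),
                     leaf  = c("a1", "a2", "b1", "b2", "b3"),
                     value = c(10, 30, 20, 25, 15))

# Level 1: slice the unit square vertically by group totals
group_tot <- tapply(leaves$value, leaves$group, sum)
groups <- data.frame(group = names(group_tot),
                     slice_rect(as.numeric(group_tot)))

# Level 2: dice each group's rectangle horizontally by its leaves
tiles <- do.call(rbind, lapply(seq_len(nrow(groups)), function(i) {
  g <- groups[i, ]
  sub <- leaves[leaves$group == g$group, ]
  cbind(sub, slice_rect(sub$value, g$xmin, g$xmax, g$ymin, g$ymax,
                        vertical = FALSE))
}))

# Each leaf's area equals its share of the grand total
tiles$area <- (tiles$xmax - tiles$xmin) * (tiles$ymax - tiles$ymin)
tiles
```

Slice-and-dice preserves the input order but can produce long, thin tiles; squarified layouts trade some order for tiles closer to squares, which are easier to compare and label.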
We are analyzing the data set about Indonesia’s mobile phone market sales in the first half of 2020. The data is available at: https://www.kaggle.com/kurniakh/marketplace-data
library(tidyverse)
library(treemapify)
phone<-read.csv("data/phone.csv")
head(phone)
## type rmb sold brand region lnrmb
## 1 Apple iPhone 3459.90 597 Apple not-china 8.148995
## 2 Apple iPhone 11 6919.89 6205 Apple not-china 8.842155
## 3 Apple iPhone 11 Pro 8601.13 4473 Apple not-china 9.059649
## 4 Apple iPhone 11 Pro Max 10402.09 7087 Apple not-china 9.249762
## 5 Apple iPhone 4 787.20 130 Apple not-china 6.668482
## 6 Apple iPhone 4 CDMA 945.50 8 Apple not-china 6.851714
ggplot(phone,
aes(area = sold,
subgroup = region,
subgroup2 = brand,
subgroup3 = type,
fill=rmb))+
geom_treemap()+
geom_treemap_subgroup3_border(color="white",size=1)+
geom_treemap_subgroup2_border(color="red",size=2)+
geom_treemap_subgroup_border(color="blue",size=3)+
#geom_treemap_subgroup_text(place = "centre", grow = TRUE, alpha = 0.5, colour ="white")+
geom_treemap_subgroup2_text(place = "bottom", grow = TRUE, alpha = 0.3, colour ="red")+
geom_treemap_text(aes(label=type),colour = "white", place = "topleft", reflow = TRUE, size=10)+
scale_fill_distiller(palette="Blues",
name="phone\nprice\n(RMB)",
breaks = c(250, 500, 1000, 2000, 4000, 8000, 16000), # in RMB; trans = "log10" applies the log itself
labels = c(250, 500, 1000, 2000, 4000, 8000, 16000),
trans = "log10")
area = sold: the area of each tile is the sales volume of the phone model.
fill = rmb: the shade of the color represents the phone's price on a log10 scale. The cheaper the phone, the darker the color.
subgroup = region: according to the origin of the brand, phones are divided into two groups, China and not-China, separated by the blue border. As can be seen from the figure, no Indonesian brands appear in the data, and most of the brands are Chinese.
subgroup2 = brand: Chinese brands include Xiaomi, OPPO, vivo, and realme, and non-Chinese brands include Apple, Samsung, and Nokia.
subgroup3 = type: at the level of individual models, the most popular models in Indonesia are mainly the cheaper ones.
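Because the fill scale uses trans = "log10", its breaks are given in price units and ggplot2 transforms them itself. The short base-R check below, using the chapter's break values, shows that each doubling of price becomes an equal step along the color bar, and that natural logs of the breaks are small numbers (about 5.5 to 9.7) that would be read as prices far below those shown in head(phone).

```r
# The price breaks used in the treemap legend
breaks <- c(250, 500, 1000, 2000, 4000, 8000, 16000)

# On a log10 scale each break sits at log10(break),
# so every doubling of price is one equal step on the color bar
pos <- log10(breaks)
diff(pos)      # each step equals log10(2), about 0.301

# Natural logs of the breaks: passed as breaks on a log10 scale,
# these would be interpreted as prices of roughly 5.5 to 9.7 RMB
log(breaks)
```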
We can also use a treemap to visualize the Titanic data set; however, it is less efficient than a mosaic plot.
titanic_df=as.data.frame(table(titanic[,c("Class","Sex","Survived")]))
ggplot(titanic_df,
aes(area = Freq,
subgroup = Class,
subgroup2 = Sex,
subgroup3 = Survived,
label=paste("Class",Class,Sex,"Survived",Survived)))+
geom_treemap()+
geom_treemap_subgroup3_border(color="yellow",size=1)+
geom_treemap_subgroup2_border(color="red",size=3)+
geom_treemap_subgroup_border(color="blue",size=5)+
geom_treemap_subgroup_text(place = "centre", grow = TRUE, alpha = 0.2, colour ="blue",fontface = "bold")+
geom_treemap_subgroup2_text(place = "bottom", grow = FALSE, alpha = 0.2, colour ="red",fontface = "italic")+
#geom_treemap_subgroup3_text(place = "top", grow = FALSE, alpha = 0.2, colour ="white",fontface = "italic")+
geom_treemap_text(colour = "yellow", place = "topleft", reflow = FALSE,size=10)
# alternatively
titanic_df2=as.data.frame(table(titanic))
ggplot(titanic_df2,
aes(area = Freq,
subgroup = Class,
subgroup2 = Sex,
subgroup3 = Age,
subgroup4 = Survived,
label=paste("Class",Class,Sex,Age,"Survived",Survived)))+
geom_treemap()+
geom_treemap_subgroup3_border(color="yellow",size=1)+
geom_treemap_subgroup2_border(color="red",size=3)+
geom_treemap_subgroup_border(color="blue",size=5)+
geom_treemap_subgroup_text(place = "centre", grow = TRUE, alpha = 0.2, colour ="blue",fontface = "bold")+
geom_treemap_subgroup2_text(place = "bottom", grow = FALSE, alpha = 0.2, colour ="red",fontface = "italic")+
geom_treemap_subgroup3_text(place = "left", grow = FALSE, alpha = 0.2, colour ="yellow",fontface = "italic",size=20)+
geom_treemap_text(colour = "white", place = "topleft", reflow = FALSE,size=10)
A sunburst chart is typically used to visualize hierarchical data structures. A sunburst chart is also called wedge stack graph, radial hierarchy, circular bar plot, ring chart, multi-level pie chart, and radial treemap. The sunburst chart consists of an inner circle surrounded by rings of deeper hierarchy levels. The angle of each segment is either proportional to a value or divided equally under its parent node. All segments in sunburst charts may be colored according to which category or hierarchy level they belong to.
library(tidyverse)
library(sunburstR)
library(RColorBrewer)
Data set: We visualize the data set on Indonesia’s mobile phone market sales in the first half of 2020 (https://www.kaggle.com/kurniakh/marketplace-data). Note this is a different data set from the previous treemap example.
phone=read.csv("data/phone_sunburst.csv",header = TRUE)
phone_sun=phone[,c("PhoneModel","BrandCountry","Brand","Sold")]
phone_sun$Category=paste(phone$BrandCountry,phone$Brand,phone$PhoneModel,sep="-")
phone_sun=phone_sun[,c("Category","Sold")]
sund2b(phone_sun,
colors = colorRampPalette(brewer.pal(11, "Set3"))(35), # interpolate 35 colors from the Set3 palette
showLabels = FALSE,
rootLabel = "Total Phone Sold in Indonesia 2020")
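In the sunburst above, each Category string is a path whose levels are separated by "-", and, as described earlier, the angle of a segment is proportional to its value. A sketch with made-up sales data (the data and the node_angles() helper are hypothetical) computes the angle of every segment in each ring:

```r
# Toy sales data (made-up numbers) in the "level1-level2-level3" path format
sales <- data.frame(
  Category = c("China-Xiaomi-ModelA", "China-Xiaomi-ModelB",
               "China-OPPO-ModelC", "Korea-Samsung-ModelD"),
  Sold = c(30, 10, 20, 40)
)

# Split each path into its hierarchy levels
parts <- strsplit(sales$Category, "-", fixed = TRUE)

# Angle (in degrees) of every node at a given depth: its share of the total
node_angles <- function(depth) {
  key <- vapply(parts,
                function(p) paste(p[seq_len(depth)], collapse = "-"),
                character(1))
  tapply(sales$Sold, key, sum) / sum(sales$Sold) * 360
}

node_angles(1)  # inner ring: China 216, Korea 144
node_angles(2)  # middle ring: China-OPPO 72, China-Xiaomi 144, Korea-Samsung 144
```

Each ring always sums to 360 degrees, and a child's angle never exceeds its parent's, which is what lets the rings nest.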
A ternary plot, ternary graph, triangle plot, simplex plot, Gibbs triangle or de Finetti diagram is a barycentric plot on three variables which sum to a constant. It graphically depicts the ratios of the three variables as positions in an equilateral triangle. It is used in physical chemistry, petrology, mineralogy, metallurgy, and other physical sciences to show the compositions of systems composed of three species. In population genetics, it is often called a de Finetti diagram. In game theory, it is often called a simplex plot. Ternary plots are tools for analyzing compositional data in the three-dimensional case.
library(ggtern)
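We now use a simple example with two hand-picked compositions, (0.1, 0.3, 0.6) and (0.05, 0.03, 0.92), each summing to 1, to show a ternary plot. The three reference lines pass through the first point.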
df=data.frame(prop1=0.1,prop2=0.3,prop3=0.6)
df2=data.frame(prop1=0.05,prop2=0.03,prop3=0.92)
df=rbind(df,df2)
g1=ggtern(data=df,mapping=aes(x=prop1,y=prop2,z=prop3))+
geom_point(size=2)+
geom_Tline(Tintercept=c(0.3))+
geom_Lline(Lintercept=c(0.1))+
geom_Rline(Rintercept=c(0.6))
g1
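Behind the ternary plot is a barycentric mapping from three proportions to a point in a triangle. A minimal sketch, assuming one common vertex placement (L at the bottom left, R at the bottom right, T at the top) rather than ggtern's internal code; tern_to_xy() is a hypothetical helper:

```r
# Assumed convention: L (x) vertex at (0, 0), R (z) vertex at (1, 0),
# T (y) vertex at (0.5, sqrt(3)/2); a point is the weighted average of the vertices
tern_to_xy <- function(l, t, r) {
  s <- l + t + r                  # normalise so the three parts sum to 1
  l <- l / s; t <- t / s; r <- r / s
  data.frame(x = 0.5 * t + r,     # l * 0 + t * 0.5 + r * 1
             y = sqrt(3) / 2 * t) # only the T vertex is above the base
}

# The two compositions plotted above: (prop1, prop2, prop3)
pts <- tern_to_xy(l = c(0.1, 0.05), t = c(0.3, 0.03), r = c(0.6, 0.92))
pts
```

Points with the same T proportion share the same height, which is why a line such as geom_Tline(Tintercept = 0.3) runs parallel to the side opposite the T vertex.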
USDA textural classification chart
We use the USDA data set shipped with ggtern, which describes the soil textural classification chart issued by the United States Department of Agriculture (USDA) as polygons in a ternary diagram of sand, clay, and silt.
data("USDA", package = "ggtern")
dfLabels <- plyr::ddply(USDA, "Label", function(df) {
label <- as.character(df$Label[ 1 ])
df$Angle <- switch(label, "Loamy Sand" = -35, 0)
colMeans(df[setdiff(colnames(df), "Label")])
})
f5a <- ggtern(data = USDA, mapping = aes(x = Sand, y = Clay, z = Silt)) + # three axes: Sand, Clay, Silt
geom_polygon(mapping = aes(fill = Label),
alpha = 0.75, size = 0.5, color = "black") + # one polygon per class: fill by Label, 75% opacity, black edges of width 0.5
geom_text(data = dfLabels, mapping = aes(label = Label, angle = Angle),
size = 2.5, position = "identity") + # class names at the polygon centroids, rotated by Angle; ggtern drops text layers with a nudge position
theme_rgbw() + # ggtern theme: red, green, and blue axes on a white background
theme_showsecondary() + # show the secondary axis ticks and labels
theme_showarrows() + # draw arrows along the axes
custom_percent("Percent") + # label the axes as percentages
guides(color = "none", fill = "none") +
labs(title = "USDA Textural Classification Chart", # plot title
fill = "Textural Class", # legend title for the fill color
color = "Textural Class") # legend title for the edge color
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
print(f5a)
## Ignoring unknown labels:
## • colour : "Textural Class"
## • W : "Percent"
library(tidyverse)
college = read_csv("data/college.csv")
## Rows: 1269 Columns: 17
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): name, city, state, region, highest_degree, control, gender
## dbl (10): id, admission_rate, sat_avg, undergrads, tuition, faculty_salary_a...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
college %>%
group_by(region) %>%
summarize(count = n()) %>%
arrange(count) %>%
ggplot(aes(x = 1,
y = count,
fill = region)) +
geom_bar(stat = "identity",
width = 1) +
geom_text(aes(label = paste0(count, "\n(",
round(count / sum(count) * 100, 1),
"%)")),
position = position_stack(vjust = 0.5)) +
coord_polar(theta = "y") +
theme_void()
Single Donut Chart
We use the geom_rect() function from ggplot2 to draw each category as a rectangle, and coord_polar(theta = "y") then bends the rectangles into a ring. We control the size of the hole through the x-axis: the rectangles span x from 3 to 4, and xlim(c(0, 4)) leaves the range from 0 to 3 empty in the middle.
college %>%
group_by(region) %>%
summarize(count = n(),
fraction = n()/nrow(college) ) %>%
mutate(max = cumsum(fraction),
min = cumsum(fraction) - fraction,
label_position = (max + min)/2,
label = paste0(region, "\n", count, "\n", round(fraction*100,1), "%")) %>%
ggplot(aes(ymax = max,
ymin = min,
xmax = 4,
xmin = 3,
fill = region)) +
geom_rect() +
geom_text(aes(y = label_position,
label = label),
x = 3.5) +
coord_polar(theta = "y") +
xlim(c(0, 4)) +
theme_void()
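The ring is built from cumulative fractions: each slice runs from the running total before it (min) to the running total including it (max), and its label sits at the midpoint. The same arithmetic on made-up counts (the region names and numbers are hypothetical):

```r
library(dplyr)

# Made-up category counts standing in for the college regions
counts <- data.frame(region = c("Midwest", "Northeast", "South", "West"),
                     count  = c(30, 20, 40, 10))

ring <- counts %>%
  mutate(fraction = count / sum(count),
         max = cumsum(fraction),            # end of each slice on the y scale
         min = max - fraction,              # start of each slice
         label_position = (max + min) / 2)  # midpoint for the text label
ring
```

Because the last max is 1, the slices together cover the full circle once coord_polar(theta = "y") maps the y range to angles.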
college %>%
group_by(region) %>%
summarize(count = n(),
fraction = n()/nrow(college) ) %>%
arrange(fraction) %>%
ggplot(aes(y = fraction,
x = 3,
fill = region)) +
geom_bar(stat = "identity",
width = 1) +
geom_text(aes(label = round(fraction,2)),
position = position_stack(vjust = 0.5)) +
coord_polar(theta = "y") +
xlim(c(0, 4)) +
theme_void()
Nested donut chart
college %>%
ggplot() +
geom_bar(aes(x = region,
fill = control),
position = "fill") +
geom_text(aes(x = region,
label = region),
y = 0) +
coord_polar(theta = "y") +
theme_void()
A slope graph is a lot like a line graph, but it plots only the change between two time points, without any regard for the points in between. It is based on the idea that humans are good at interpreting changes in directions, i.e., slopes.
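A slope graph needs only two values per item, so it helps to put the two time points side by side and compute the change directly. A sketch with made-up revenues for three hypothetical companies:

```r
library(dplyr)
library(tidyr)

# Made-up revenues (billions) for three hypothetical companies
rev <- data.frame(Name = rep(c("A", "B", "C"), each = 2),
                  Year = rep(c(2018, 2019), times = 3),
                  Revenue = c(500, 514, 300, 290, 250, 260))

# One row per company with both years, then the slope of each line
slopes <- rev %>%
  pivot_wider(names_from = Year, values_from = Revenue,
              names_prefix = "y") %>%
  mutate(change = y2019 - y2018,                  # rise over one year
         pct_change = 100 * change / y2018,
         direction = ifelse(change >= 0, "up", "down"))
slopes
```

The sign and size of change are exactly what the reader decodes from the line's direction and steepness in the slope graph.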
Package
library(tidyverse)
library(ggrepel)
library(RColorBrewer)
We use data from Fortune magazine on the revenue of the top 10 Fortune 500 companies in 2018 and 2019.
df <- read.csv("data/top10.csv",header = T)
df %>%
mutate(Name = factor(Name),
Revenue = Revenue/1000) %>%
ggplot(aes(x = Year,
y=Revenue,
group=Name,
color=Name)) +
geom_line(linewidth = 1) + # linewidth replaces size for lines since ggplot2 3.4.0
geom_point(size=2) +
labs(title= "2018-2019 Revenue of\nworld's top 10 companies",
subtitle="(in Billions)") +
scale_x_continuous(name = "",
position = "top",
breaks = c(2018, 2019),
labels = c("2018", "2019"),
limits = c(2018, 2020)) +
scale_y_continuous(breaks = seq(200, 600, 100),
labels = format(seq(200, 600, 100), scientific = FALSE),
limits = c(200, 600)) +
theme(legend.position = "none",
aspect.ratio = 1.25,
panel.background = element_rect(fill = "white")) +
geom_text_repel(data = df %>% mutate(Revenue = Revenue/1000) %>% filter(Year == "2019"),
aes(label = Name) ,
hjust = "left",
fontface = "bold",
size = 2.5,
nudge_x = 0.3,
direction = "y") +
scale_color_viridis_d()
New York Times’ “Where the 1 Percent Have Gained the Most”
df %>%
mutate(Name = factor(Name),
Revenue = Revenue/1000) %>%
mutate(Name = fct_reorder(Name, Revenue, min)) %>%
arrange(Year) %>%
ggplot() +
geom_path(aes(x = Revenue, y = Name),
arrow = arrow(length=unit(0.2,"cm"), type = "closed")) +
geom_text(aes(x = Revenue, y = Name, label = round(Revenue),
hjust = ifelse(Year == 2018, 1.4, -0.4))) +
geom_text(data = df %>%
mutate(Revenue = Revenue/1000) %>%
group_by(Name) %>%
summarize(ave_Revenue = mean(Revenue)),
aes(x = ave_Revenue, y = Name, label = Name),
vjust = 2,
size = 2) +
coord_cartesian(xlim = c(200, 600)) +
scale_x_continuous(breaks = seq(200, 600, 100),
labels = format(seq(200, 600, 100), scientific = FALSE)) +
theme(axis.text.y = element_blank(),
axis.ticks.y = element_blank())
In 1858, nurse, statistician, and reformer Florence Nightingale published Notes on Matters Affecting the Health, Efficiency, and Hospital Administration of the British Army. Founded Chiefly on the Experience of the Late War. Presented by Request to the Secretary of State for War. This privately printed work contained a color statistical graphic entitled "Diagram of the Causes of Mortality in the Army of the East", which showed that epidemic disease, which was responsible for more British deaths in the course of the Crimean War than battlefield wounds, could be controlled by a variety of factors including nutrition, ventilation, and shelter. The graphic, which Nightingale used as a way to explain complex statistics simply, clearly, and persuasively, has become known as Nightingale's Rose chart.
library(tidyverse)
Nightingale = read_csv("data/Nightingale.csv")
## Rows: 24 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Month
## dbl (8): Year, Army, Disease, Wounds, Other, Disease.rate, Wounds.rate, Oth...
## date (1): Date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(Nightingale)
## # A tibble: 6 × 10
## Date Month Year Army Disease Wounds Other Disease.rate Wounds.rate
## <date> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1854-04-01 Apr 1854 8571 1 0 5 1.4 0
## 2 1854-05-01 May 1854 23333 12 0 9 6.2 0
## 3 1854-06-01 Jun 1854 28333 11 0 6 4.7 0
## 4 1854-07-01 Jul 1854 28722 359 0 23 150 0
## 5 1854-08-01 Aug 1854 30246 828 1 30 328. 0.4
## 6 1854-09-01 Sep 1854 30290 788 81 70 312. 32.1
## # ℹ 1 more variable: Other.rate <dbl>
Nightingale %>%
select(Date, Month, Year, contains("rate")) %>%
pivot_longer(cols = 4:6, names_to = "Cause", values_to = "Rate") %>%
mutate(Cause = gsub(".rate", "", Cause),
period = ifelse(Date <= as.Date("1855-03-01"),
"April 1854 to March 1855",
"April 1855 to March 1856"),
Month = fct_relevel(Month,
"Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
"Jan", "Feb", "Mar", "Apr", "May", "Jun")) %>%
ggplot(aes(Month, Rate)) +
geom_col(aes(fill = Cause), width = 1, position = "identity") +
coord_polar() +
facet_wrap(~period) +
scale_fill_manual(values = c("skyblue3", "grey30", "firebrick")) +
scale_y_sqrt() +
theme_void() +
theme(axis.text.x = element_text(size = 9),
strip.text = element_text(size = 11),
legend.position = "bottom",
plot.background = element_rect(fill = alpha("cornsilk", 0.5)),
plot.margin = unit(c(10, 10, 10, 10), "pt"),
plot.title = element_text(vjust = 5)) +
ggtitle("Diagram of the Causes of Mortality in the Army in the East")
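The scale_y_sqrt() call matters because in this polar bar chart the rate sets each wedge's radius, while the reader judges the wedge's area, which grows with the square of the radius. A base-R sketch with made-up rates, assuming 12 equal monthly wedges:

```r
# A wedge with angle theta and radius r has area theta * r^2 / 2
wedge_area <- function(r, theta = 2 * pi / 12) theta * r^2 / 2  # 12 monthly wedges

rates <- c(10, 40, 160)   # made-up mortality rates

# Radius proportional to the rate: area grows with the rate squared
linear <- wedge_area(rates)
linear / linear[1]        # 1, 16, 256

# Radius proportional to sqrt(rate), as with scale_y_sqrt(): area grows with the rate
sqrt_r <- wedge_area(sqrt(rates))
sqrt_r / sqrt_r[1]        # 1, 4, 16
```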
A Cleveland dot plot proposed by William S. Cleveland and Robert McGill (https://www.jstor.org/stable/2288400) is a great alternative to a simple bar chart, particularly if you have more than a few items, in which case a bar chart can easily look cluttered. In the same amount of space, many more values can be included in the Cleveland dot plot, and it is easy to read as well. A Cleveland dot plot typically plots a categorical variable against a numeric variable.
Note that even though a bar plot can visualize whatever a Cleveland dot plot visualizes, the bar plot often uses more data-ink. Often, the Cleveland dot plot is the more efficient choice.
mtcars_revised <- mtcars %>%
rownames_to_column("name") %>% # copy the car names into a column before sorting
arrange(mpg) %>%
mutate(name = factor(name, levels = name)) # order the factor levels by mpg
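A pitfall in this kind of pipeline: inside mutate(), row.names(mtcars) refers to the global, unsorted mtcars rather than the sorted data flowing through the pipe, so names can end up on the wrong rows. Copying the row names into a column before sorting keeps each name attached to its own values; a quick check using tibble's rownames_to_column():

```r
library(dplyr)
library(tibble)

# Row names become a regular column first, so they move with their rows
checked <- mtcars %>%
  rownames_to_column("name") %>%
  arrange(mpg)

head(checked[, c("name", "mpg")], 3)  # the least fuel-efficient cars
tail(checked[, c("name", "mpg")], 1)  # the most fuel-efficient car
```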
ggplot(mtcars_revised, aes(x = mpg, y = reorder(name,-mpg), label = mpg) ) +
geom_point() +
theme_bw() +
theme(
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank(),
panel.grid.major.y = element_line(colour = "grey60", linetype = "dashed")
)
A lollipop chart is a simple modification of the Cleveland dot plot. In addition to the dots, a lollipop chart contains lines that connect each category to its dot, forming lollipops. A lollipop chart is great for comparing multiple categories, as it helps the reader align each category with its point while keeping the amount of ink on the graphic small.
ggplot(mtcars_revised, aes(x = mpg, y = reorder(name,-mpg), label = mpg) ) +
geom_point() +
geom_text(nudge_x = 1.5) +
geom_segment(aes(x = 0, xend = mpg,
y = name, yend = name), color = "grey50") +
theme_bw()
There are many extensions of the Cleveland dot plot and the lollipop chart.
mpg_revised <- mpg %>% mutate(brandmodel=paste(manufacturer, model)) %>%
group_by(brandmodel) %>%
summarize(avg_hwy=mean(hwy, na.rm=TRUE),
avg_cty=mean(cty, na.rm=TRUE))
ggplot(mpg_revised) +
geom_point(aes(avg_hwy, brandmodel),col="blue") +
geom_point(aes(avg_cty, brandmodel),col="red") +
geom_segment(aes(x = avg_cty, xend = avg_hwy,
y = brandmodel, yend = brandmodel), color = "grey50")+
geom_text(aes(x = avg_cty, y=brandmodel, label = round(avg_cty, 1)), size = 3, hjust = 1.5) +
geom_text(aes(x = avg_hwy, y=brandmodel, label = round(avg_hwy, 1)), size = 3, hjust = -.5)
The waterfalls package is based on ggplot2. We are going to use the function waterfall in the package to draw the plot.
We use the data set miga.csv, which provides a summary income statement compiled from quarterly statements.
Data source: MIGA Summary Income Statement From World Bank Financial Open Data
library(waterfalls)
miga<-read.csv("data/miga.csv")
miga$Item<-as.character(miga$Item)
waterfall(values = miga$Amount, labels = miga$Item, # values and labels are given directly, so .data is not needed
calc_total = TRUE,
total_rect_color = "steelblue2",
total_axis_text = "summary income",
rect_border = "white",
fill_by_sign = TRUE) +
coord_flip()
## Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
## ℹ Please use tidy evaluation idioms with `aes()`.
## ℹ See also `vignette("ggplot2-in-packages")` for more information.
## ℹ The deprecated feature was likely used in the waterfalls package.
## Please report the issue to the authors.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
Income items: Investment Income; Net Premium Income. Expense items: Decrease in Reserves; Administrative Expenses.
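Each floating bar in a waterfall chart runs from the previous running total to the new one, and the total bar spans from zero to the final sum. A base-R sketch using the item names above with made-up amounts (positive for income, negative for expenses):

```r
# Made-up amounts; positive = income, negative = expense
flows <- data.frame(
  item   = c("Investment Income", "Net Premium Income",
             "Decrease in Reserves", "Administrative Expenses"),
  amount = c(40, 60, -15, -35)
)

# Each bar starts where the previous one ended
flows$end   <- cumsum(flows$amount)
flows$start <- flows$end - flows$amount
flows$sign  <- ifelse(flows$amount >= 0, "increase", "decrease")
flows

# The total bar (calc_total = TRUE above) spans 0 to this value
total <- sum(flows$amount)
total
```

fill_by_sign = TRUE in the chart above colors bars by this same sign distinction.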