Network Visualization
1 Layout of Vertices Networks in 2D
- 1.1 Co-occurrence Matrix as a Network
2 Heatmap
- 2.1 unweighted heatmap
- 2.2 weighted heatmap
3 Edge Bundling
- 3.1 Edge Bundling (Pride and Prejudice)
- 3.2 Edge Bundling (flare)
4 Arc Diagram
5 Hive Diagram
6 Flow diagram: Sankey Diagram and Chord Diagram
- 6.1 Sankey Diagram
- 6.2 Chord Diagram


In this chapter, we will discuss the topic of network visualization. Below is a list of packages needed in this chapter.

pkgs <- c(
  "igraph",
  "ggraph",
  "tidygraph",
  "networkD3",
  "heatmaply",
  "dendextend",
  "circlize",
  "gridExtra",
  "tidyverse"
)
# Install only the packages that are not installed yet
missing_pkgs <- pkgs[!(pkgs %in% installed.packages()[, "Package"])]
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}

A network (or a graph) shows the relationships among a set of entities. Each entity is represented as a vertex, and the relationships/connections between vertices are represented by edges. A directed network is one whose edges have directions, e.g., the Twitter following/follower network. An undirected network is one whose edges have no directions, e.g., the Facebook friendship network. A network is called a binary network if each edge is either present or absent, i.e., two vertices are either connected or not. A network is called a weighted network if the edges carry weights that describe the strength of the connections.

A network can be denoted as \(G=(V,E)\) where \(V=\{1,...,n\}\) is the vertex set and \(E \subseteq V \times V\) is the edge set. A binary network can be efficiently represented by an \(n \times n\) adjacency matrix \(\bf A\) = \([A_{ij}]_{n \times n}\) where \(A_{ij} = 1\) if there is an edge between vertex \(i\) and vertex \(j\), and 0 otherwise. In the case of a weighted network, \(A_{ij}\) represents the weight of the edge between vertex \(i\) and vertex \(j\). In the case of a directed network, \(A_{ij}\) represents the weight of the edge from vertex \(i\) to vertex \(j\). For an undirected network, \(\bf A\) is symmetric, i.e., \(A_{ij} = A_{ji}\).
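As a small illustration of these definitions, the base-R sketch below builds the binary, weighted, and directed adjacency matrices of a made-up four-vertex network from an edge list. The edges, the weights, and object names such as `toy_edges` are hypothetical.

```r
# Toy network with n = 4 vertices; the edges and weights are made up.
n <- 4
toy_edges <- rbind(c(1, 2), c(1, 3), c(3, 4))  # each row is an edge (i, j)
w <- c(2, 5, 1)                                # weight of each edge

# Undirected, binary: A[i, j] = A[j, i] = 1 when i and j are connected
A_bin <- matrix(0, n, n)
A_bin[toy_edges] <- 1
A_bin[toy_edges[, 2:1]] <- 1

# Undirected, weighted: store the edge weight in both directions
A_wt <- matrix(0, n, n)
A_wt[toy_edges] <- w
A_wt[toy_edges[, 2:1]] <- w

# Directed, weighted: only A[i, j] (edge from i to j) is set
A_dir <- matrix(0, n, n)
A_dir[toy_edges] <- w

A_wt
```

Indexing a matrix with a two-column matrix sets one cell per row of `toy_edges`; only the directed matrix ends up asymmetric.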

1 Layout of Vertices Networks in 2D

A network of \(n\) vertices is represented by an \(n\) by \(n\) matrix, which means that a network is high-dimensional data. As discussed in the previous sections, high-dimensional data is generally much harder to visualize, and dimension reduction is often used.

One common approach is to place the vertices in a two-dimensional space and draw the edges between them. A simple planar network can be visualized effectively this way. However, as the number of vertices and edges increases, the edges may overlap and cross one another unless the layout is designed carefully, making the visualization hard to read. An ideal visualization should be aesthetically pleasing, which means:

1. vertices are evenly distributed;
2. edge crossings are minimal;
3. edges have roughly the same length;
4. the layout reflects inherent properties of the network.

Such a layout is often achieved by a force-directed algorithm.

A force-directed algorithm essentially treats each vertex as a steel ring and each edge as a spring connecting two rings (vertices), so the rings and springs form a mechanical system. We first place the rings in the 2D space according to an initial layout, and then let go of them so that the springs move the rings. The system eventually settles into a minimal-energy state, and the final locations of the rings are used as the positions of the vertices in the visualization.

There are many variations based on the idea of the force-directed algorithm. In addition to the attractive force induced by springs, another type of force, the repulsive force, is often added between vertices. The attractive forces exist between vertices connected by edges and are often modeled according to Hooke’s law. The repulsive forces exist between every pair of vertices and are often modeled according to Coulomb’s law (for electrically charged particles).

Such an algorithm is usually implemented as a simulation. In each iteration, forces are applied to the vertices, pulling them in different directions. This procedure is repeated until the layout of the vertices stabilizes. Because the initial layout is usually random, different runs may produce different layouts.
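The simulation described above can be sketched in a few lines of base R. This is a minimal spring-electrical model written for illustration, not the implementation used by igraph or ggraph: the function name `force_layout`, the constants `k_s`, `k_e`, `L`, and the cap on how far a vertex may move per step are all our own assumptions.

```r
# A minimal spring-electrical layout sketch (not igraph's or ggraph's implementation).
# Attraction along edges follows Hooke's law:   F = k_s * (d - L)
# Repulsion between every pair follows Coulomb's law:   F = k_e / d^2
force_layout <- function(A, n_iter = 500, L = 1, k_s = 0.1, k_e = 0.1,
                         step = 0.1, max_move = 0.1, seed = 1) {
  n <- nrow(A)
  set.seed(seed)
  pos <- matrix(runif(2 * n, -1, 1), nrow = n, ncol = 2)  # random initial layout
  for (iter in seq_len(n_iter)) {
    force <- matrix(0, nrow = n, ncol = 2)                 # net force on each vertex
    for (i in seq_len(n)) {
      for (j in seq_len(n)) {
        if (i == j) next
        delta <- pos[i, ] - pos[j, ]                        # points from j towards i
        d <- max(sqrt(sum(delta^2)), 1e-6)                  # avoid division by zero
        u <- delta / d                                      # unit vector
        force[i, ] <- force[i, ] + (k_e / d^2) * u          # repulsion: push i away from j
        if (A[i, j] != 0) {
          force[i, ] <- force[i, ] - k_s * (d - L) * u      # spring: pull i towards j if d > L
        }
      }
    }
    move <- step * force
    len <- sqrt(rowSums(move^2))
    move <- move * pmin(1, max_move / pmax(len, 1e-12))     # cap how far a vertex moves per step
    pos <- pos + move
  }
  pos
}

# Usage on a small made-up network: a cycle 1 - 2 - 3 - 4 - 1
A <- matrix(0, 4, 4)
A[cbind(c(1, 2, 3, 4), c(2, 3, 4, 1))] <- 1
A <- A + t(A)
pos <- force_layout(A)
pos
```

Changing `seed` changes the random initial layout and, as noted above, may lead to a different final layout; `plot(pos, asp = 1)` draws the result.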

The force-directed algorithm has many advantages:
- It works well with medium-sized networks (around 500 vertices).
- It can be extended to accommodate many other preferences.
- It is intuitive and easy to understand since it mimics a physical system.
- It can be interactive, because users can specify the initial layout or change the layout during the iterative algorithm.
- It is theoretically justified in statistics and physics.

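The disadvantages include:
- Long running time, since a naive implementation computes the repulsive force between every pair of vertices, i.e., \(O(n^2)\) work per iteration.
- The algorithm may be trapped in local minima, so the final layout can be influenced by the initial layout. As more vertices are included, the issue of local minima becomes more serious.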

We use the social network among the main characters of the novel Pride and Prejudice. In the chapter on text visualization, we demonstrated some visualizations of a text network constructed from a co-occurrence matrix, where each node is a character and each edge weight represents the number of times two characters co-occur in the same sentence. As this network represents the relationships among a group of people, it is a perfect example of a social network.

Let’s recap that network quickly.

1.1 Co-occurrence Matrix as a Network

book_cooc <- readRDS(file = "./data/book_cooc.RDS")
book_cooc[1:9,1:9]

##           mrbennet mrbingley elizabeth jane lydia mrsbennet kitty mary mrdarcy
## mrbennet        77         4         0    0     0         0     0    0       0
## mrbingley        4       189        28   23     0        14     0    0      19
## elizabeth        0        28       698   74    16        21    11    0      64
## jane             0        23        74  255     7        12     0    0      11
## lydia            0         0        16    7   131         5    13    0       0
## mrsbennet        0        14        21   12     5       140     7    0       0
## kitty            0         0        11    0    13         7    67    7       0
## mary             0         0         0    0     0         0     7   38       0
## mrdarcy          0        19        64   11     0         0     0    0     352

In the above co-occurrence matrix, each element represents the number of sentences that contain both the character in the row and the character in the column. For example, there are 74 sentences that mention both Elizabeth and Jane. Note that we show only the first few rows and columns of the matrix.
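One common way to obtain such a matrix, assuming we have a 0/1 sentence-by-character indicator matrix \(X\) (1 if the character is mentioned in the sentence), is the cross-product \(X^\top X\): entry \((i, j)\) counts the sentences that mention both characters, and the diagonal counts the sentences that mention each character. The indicator matrix below is made up, and this is a sketch rather than necessarily how `book_cooc` was built.

```r
# Hypothetical 0/1 indicator matrix: rows are sentences, columns are characters;
# X[s, c] = 1 if character c is mentioned in sentence s.
X <- matrix(c(1, 1, 0,
              1, 0, 1,
              1, 1, 1,
              0, 0, 1),
            nrow = 4, byrow = TRUE,
            dimnames = list(NULL, c("elizabeth", "jane", "mrdarcy")))

# crossprod(X) = t(X) %*% X: entry [i, j] counts sentences containing both i and j;
# the diagonal counts the sentences mentioning each character.
cooc <- crossprod(X)
cooc
```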

Now we treat each character as a vertex and the number of co-occurrences as the edge weight. We can plot the network using the igraph package. Setting diag = FALSE ignores the diagonal (self co-occurrence) entries of the matrix.

library(igraph)

## 
## Attaching package: 'igraph'

## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum

## The following object is masked from 'package:base':
## 
##     union

book_net <- graph_from_adjacency_matrix(book_cooc, mode = "undirected", diag = FALSE, weighted = TRUE)
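# Edge width proportional to the number of co-occurrences
plot(book_net, edge.width = E(book_net)$weight/5)
# Alternatives: plot(book_net) without edge widths, or drop weak edges before
# building the network, e.g. cooc_min <- 4; book_cooc[book_cooc < cooc_min] <- 0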
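Conceptually, for a symmetric matrix such as `book_cooc`, `graph_from_adjacency_matrix(..., mode = "undirected", diag = FALSE, weighted = TRUE)` turns every positive entry above the diagonal into one weighted edge. The helper `adj_to_edgelist` below is a hypothetical base-R sketch of that conversion, not igraph's code.

```r
# Convert a symmetric weighted adjacency matrix into an edge list with weights,
# skipping the diagonal (self co-occurrences) and zero entries.
adj_to_edgelist <- function(A) {
  # (row, col) positions of positive entries strictly above the diagonal
  idx <- which(upper.tri(A) & A > 0, arr.ind = TRUE)
  nm <- rownames(A)
  data.frame(from = nm[idx[, 1]],
             to = nm[idx[, 2]],
             weight = A[idx],
             stringsAsFactors = FALSE)
}

# Example on a made-up 3 x 3 co-occurrence matrix
M <- matrix(c(5, 2, 0,
              2, 4, 1,
              0, 1, 3), 3, 3,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
adj_to_edgelist(M)
```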

The R package ggraph, which builds on ggplot2, is also commonly used for network visualization.

library(ggraph)

## Loading required package: ggplot2

ggraph(book_net, layout="fr") + 
  geom_edge_link(edge_colour="black", edge_alpha=0.2) +
  geom_node_point( color="blue", size=3) +
  geom_node_text(aes(label=name), repel = TRUE, size=2, color="blue") +
  theme(
    legend.position="none",
    plot.margin=unit(rep(1,4), "cm")
  )

## Warning: Using the `size` aesthetic in this geom was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` in the `default_aes` field and elsewhere instead.

library(ggraph)
library(gridExtra)

# Draw book_net with a given layout. `edge` is the edge geom to use and
# extra arguments in ... are passed on to the layout (e.g. maxiter, focus).
plot_layout <- function(layout, title, edge = geom_edge_link, ...) {
  ggraph(book_net, layout = layout, ...) +
    edge(edge_colour = "black", edge_alpha = 0.2) +
    geom_node_point(color = "blue", size = 3) +
    geom_node_text(aes(label = name), repel = TRUE, size = 4, color = "blue") +
    ggtitle(title)
}

g1  <- plot_layout("fr", "FR")
g2  <- plot_layout("kk", "KK", maxiter = 10)
g3  <- plot_layout("drl", "DRL")
g4  <- plot_layout("lgl", "LGL")
g5  <- plot_layout("graphopt", "Graphopt")
g6  <- plot_layout("stress", "Stress")
g7  <- plot_layout("circle", "Circle")
g8  <- plot_layout("focus", "Focus", focus = 3)
g9  <- plot_layout("sphere", "Sphere")
g10 <- plot_layout("randomly", "Random")
g11 <- plot_layout("grid", "Grid")
g12 <- plot_layout("linear", "Linear", edge = geom_edge_arc)
g13 <- plot_layout("star", "Star")
grid.arrange(g1, g2, g3, g4, g5, g6, g7, g8, g9, g10, g11, g12, g13, ncol = 2)

The Fruchterman-Reingold layout ("fr") is a force-directed layout algorithm. The idea of a force-directed layout algorithm is to consider a force between any two nodes. In this algorithm, the nodes are represented by steel rings and the edges are springs between them. The attractive force is analogous to the spring force, and the repulsive force is analogous to the electrical force. The basic idea is to minimize the energy of the system by moving the nodes; the forces between them change as the nodes move.

This layout is usually useful for visualizing very large undirected networks.
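For reference, the original Fruchterman-Reingold paper (1991) uses an ideal distance \(k = C\sqrt{\text{area}/n}\), an attractive force \(f_a(d) = d^2/k\) between adjacent vertices, and a repulsive force \(f_r(d) = k^2/d\) between all pairs; it also limits each move by a "temperature" that cools over the iterations. igraph's implementation may differ in details. The sketch below (the function names are our own) shows that the two forces balance exactly at \(d = k\).

```r
# Force magnitudes from Fruchterman and Reingold (1991), as a function of the
# distance d between two vertices; k is the "ideal" edge length.
fr_k <- function(area, n, C = 1) C * sqrt(area / n)
fr_attract <- function(d, k) d^2 / k   # acts only between adjacent vertices
fr_repulse <- function(d, k) k^2 / d   # acts between every pair of vertices

k <- fr_k(area = 100, n = 4)           # ideal distance for 4 vertices in a 10 x 10 frame
d <- c(2.5, 5, 10)
rbind(attract = fr_attract(d, k), repulse = fr_repulse(d, k))
```

Closer than \(k\), repulsion dominates and pushes the vertices apart; farther than \(k\), attraction dominates and pulls adjacent vertices together.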

The Kamada-Kawai layout ("kk") is another force-directed algorithm that performs very well for connected graphs, but it gives poor results for unconnected ones. Due to
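One way to see why Kamada-Kawai struggles with unconnected graphs: in the original formulation (Kamada and Kawai, 1989), each pair of vertices is joined by a spring whose ideal length is proportional to their graph-theoretic (shortest-path) distance \(d_{ij}\), and the layout minimizes \(E = \sum_{i<j} \frac{1}{2} k_{ij}(\lVert p_i - p_j\rVert - L d_{ij})^2\) with \(k_{ij} = K/d_{ij}^2\). If two vertices lie in different components, \(d_{ij} = \infty\) and the energy is undefined. The base-R sketch below uses our own helper names and does not reproduce igraph's `layout_with_kk`; it computes \(d_{ij}\) with Floyd-Warshall and evaluates the energy.

```r
# All-pairs shortest-path (hop) distances with the Floyd-Warshall algorithm
shortest_paths <- function(A) {
  D <- ifelse(A != 0, 1, Inf)                  # one hop along an edge, Inf if not adjacent
  diag(D) <- 0
  for (k in seq_len(nrow(A))) {
    D <- pmin(D, outer(D[, k], D[k, ], "+"))   # allow paths through vertex k
  }
  D
}

# Kamada-Kawai energy of a layout pos (n x 2): springs of ideal length L * d_ij
# and stiffness K / d_ij^2, where d_ij is the graph distance
kk_energy <- function(pos, D, L = 1, K = 1) {
  n <- nrow(pos)
  E <- 0
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      dist_ij <- sqrt(sum((pos[i, ] - pos[j, ])^2))
      E <- E + 0.5 * (K / D[i, j]^2) * (dist_ij - L * D[i, j])^2
    }
  }
  E
}

# Path 1 - 2 - 3 drawn on a straight line: every distance matches, so the energy is 0
A <- matrix(0, 3, 3)
A[cbind(c(1, 2, 2, 3), c(2, 1, 3, 2))] <- 1
D <- shortest_paths(A)
kk_energy(rbind(c(0, 0), c(1, 0), c(2, 0)), D)                        # 0

# Add an isolated vertex 4: d_14 is Inf and the energy is no longer defined
A2 <- matrix(0, 4, 4)
A2[1:3, 1:3] <- A
kk_energy(rbind(c(0, 0), c(1, 0), c(2, 0), c(0, 1)), shortest_paths(A2))  # NaN
```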

The DrL layout is another force-directed graph layout toolbox focused on real-world large-scale graphs, developed by Shawn Martin and colleagues at Sandia National Laboratories.

The sphere layout places the vertices (approximately) uniformly on the surface of a sphere, so it is a 3D layout. Because the vertex positions do not depend on the edges, the location of each vertex stays fixed, which makes it easy to compare networks on the same vertices, e.g., the friendships among the same students in two different years.

Alternatively, since the nodes are the same, the two networks can be drawn in one plot. Comparing several layouts side by side, as in the grid above, also helps to decide which one makes the most sense.

There is no such thing as “the best layout algorithm”, since different algorithms have been optimized for different scenarios. It is often helpful to experiment with several of them and choose the one that best suits the network at hand.

There are other layouts available in ggraph; more details can be found here: https://www.rdocumentation.org/packages/igraph/versions/0.7.1/topics/layout

2 Heatmap

Another way to visualize a network is to visualize its adjacency matrix directly as a heatmap, where the color of cell \((i, j)\) encodes \(A_{ij}\).

library(heatmaply)

## Loading required package: plotly

## 
## Attaching package: 'plotly'

## The following object is masked from 'package:ggplot2':
## 
##     last_plot

## The following object is masked from 'package:igraph':
## 
##     groups

## The following object is masked from 'package:stats':
## 
##     filter

## The following object is masked from 'package:graphics':
## 
##     layout

## Loading required package: viridis

## Loading required package: viridisLite


## 
## Attaching package: 'heatmaply'

## The following object is masked from 'package:igraph':
## 
##     normalize

2.1 unweighted heatmap

heatmaply(book_cooc,
        dendrogram = "both",
        xlab = "", ylab = "", 
        main = "",
        scale = "none",
        margins = c(60,100,40,20),
        grid_color = "white",
        grid_width = 0.0000000001,
        titleX = FALSE,
        hide_colorbar = TRUE,
        branches_lwd = 0.1,
        label_names = c("Name", "With:", "Value"),
        fontsize_row = 7, fontsize_col = 7,
        labCol = colnames(book_cooc),
        labRow = rownames(book_cooc),
        heatmap_layers = theme(axis.line=element_blank())
        )


2.2 weighted heatmap

heatmaply(book_cooc, 
      dendrogram = "none",
      xlab = "", ylab = "", 
      main = "",
      scale = "column",
      margins = c(60,100,40,20),
      grid_color = "white",
      grid_width = 0.00001,
      titleX = FALSE,
      hide_colorbar = TRUE,
      branches_lwd = 0.1,
      label_names = c("From", "To:", "Value"),
      fontsize_row = 7, fontsize_col = 7,
      labCol = colnames(book_cooc),
      labRow = rownames(book_cooc),
      heatmap_layers = theme(axis.line=element_blank())
      )
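heatmaply's `scale = "column"` centres and scales each column, i.e., turns each column into z-scores, so that columns with large counts do not dominate the colour range. Base R's `scale()` performs this standardisation; the small matrix below is made up.

```r
# Column z-score scaling: each column is centred at 0 and divided by its
# standard deviation. The values are made up for illustration.
M <- matrix(c(1, 2, 3,
              10, 20, 60), nrow = 3,
            dimnames = list(c("a", "b", "c"), c("x", "y")))
Z <- scale(M)   # base R: centre and scale each column
round(Z, 3)
```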

We can also visualize the matrix using ggraph's "matrix" layout.

library(tidygraph)

## 
## Attaching package: 'tidygraph'

## The following object is masked from 'package:igraph':
## 
##     groups

## The following object is masked from 'package:stats':
## 
##     filter

ggraph(book_cooc, 'matrix', sort.by = node_rank_leafsort()) + 
  geom_edge_tile(mirror = TRUE) + 
  coord_fixed()

ggraph(book_cooc, 'matrix', sort.by = node_rank_leafsort()) + 
  geom_edge_point() + 
  coord_fixed()

ggraph(book_cooc, 'matrix', sort.by = node_rank_leafsort()) + 
  geom_edge_bend() + 
  coord_fixed()

This visualization is straightforward. Moreover, many network layouts suffer from poor scalability: edges eventually begin to overlap to the extent that the plot becomes unintelligible. Visualizing the network as a matrix avoids overlapping edges completely.

At the same time, this visualization shows a very different pattern compared to the node-link (topology) plot. Besides, the node order now has a big influence on the look of the plot.
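To see why node order matters, and how it is usually chosen, the sketch below (made-up data, base R) hides two groups behind an alternating order and then reorders rows and columns by a hierarchical clustering. This is roughly what heatmaply's dendrograms and `node_rank_leafsort()` do, although their exact algorithms differ.

```r
# A made-up symmetric co-occurrence matrix with two groups hidden by the row order
A <- matrix(0, 6, 6, dimnames = list(letters[1:6], letters[1:6]))
grp1 <- c("a", "c", "e")
grp2 <- c("b", "d", "f")
A[grp1, grp1] <- 5
A[grp2, grp2] <- 5
diag(A) <- 0

hc <- hclust(dist(A), method = "average")  # cluster rows by similarity
ord <- hc$order
A_sorted <- A[ord, ord]                    # same network, new node order
A_sorted                                   # now block-diagonal
```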

3 Edge Bundling

One question: remember that we visualized the clusters in the text visualization chapter. Can we visualize the clustering information and the network connections in one plot?

Short answer: we can use edge bundling. Edge bundling allows us to visualize adjacency relations between entities organized in a hierarchy. The idea is to bundle the adjacency edges together to reduce the clutter usually observed in complex networks.
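The bundling works by routing each edge along the hierarchy: an edge between two leaves follows the path from the first leaf up to their lowest common ancestor and back down to the second leaf, and a smooth curve is then drawn using the nodes on that path as control points. This is the idea of hierarchical edge bundling, and ggraph's `geom_conn_bundle()` draws such curves. The base-R sketch below computes that path for a made-up hierarchy stored as a parent vector; all names are hypothetical.

```r
# Hypothetical hierarchy stored as a parent vector: parent["x"] is x's parent
parent <- c(root = NA, g1 = "root", g2 = "root",
            a = "g1", b = "g1", c = "g2", d = "g2")

# Path from a node up to the root
path_to_root <- function(node, parent) {
  path <- node
  while (!is.na(parent[[node]])) {
    node <- parent[[node]]
    path <- c(path, node)
  }
  path
}

# Route an edge between two leaves through their lowest common ancestor (LCA);
# the nodes on this path serve as control points for the bundled curve.
bundle_path <- function(from, to, parent) {
  up_from <- path_to_root(from, parent)
  up_to <- path_to_root(to, parent)
  lca <- up_from[up_from %in% up_to][1]            # first shared ancestor
  c(up_from[seq_len(match(lca, up_from))],         # from -> ... -> lca
    rev(up_to[seq_len(match(lca, up_to) - 1)]))    # ... -> to
}

bundle_path("a", "c", parent)   # "a" "g1" "root" "g2" "c"
bundle_path("a", "b", parent)   # "a" "g1" "b"
```

Edges within the same cluster stay short, while edges between clusters all pass near the root, which is what makes them bundle together.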

3.1 Edge Bundling (Pride and Prejudice)

book_cluster <- readRDS(file = "./data/book_cluster.RDS")
den_hc <- as.dendrogram(book_cluster)
  
ggraph(den_hc, layout = 'dendrogram', circular = TRUE) + 
  geom_edge_link(alpha=0.8) +
  geom_node_text(aes(x = x, y=y, filter = leaf, label=label), size=4, alpha=1) +
  coord_fixed() +
  theme_void() +
  theme(
    legend.position="none",
    plot.margin=unit(c(0,0,0,0),"cm"),
  ) +
  expand_limits(x = c(-1.2, 1.2), y = c(-1.2, 1.2))

library(dendextend)


## 
## Attaching package: 'dendextend'

## The following object is masked from 'package:stats':
## 
##     cutree

book_edge = as_edgelist(book_net)
# The connection object must refer to the ids of the leaves:
from=match(book_edge[,1],get_nodes_attr(den_hc,"label"))
to=match(book_edge[,2],get_nodes_attr(den_hc,"label"))

# Make the plot
ggraph(den_hc,layout='dendrogram',circular=TRUE)+ 
  geom_edge_link(alpha=0.3) +
  geom_conn_bundle(data=get_con(from=from,to=to),alpha= 0.8, colour="#69b3a2") + 
  geom_node_text(aes(x = x, y=y, filter = leaf, label=label), size=4, alpha=1) +
  coord_fixed() +
  theme_void() +
  theme(
    legend.position="none",
    plot.margin=unit(c(0,0,0,0),"cm"),
  ) +
  expand_limits(x = c(-1.2, 1.2), y = c(-1.2, 1.2))

3.2 Edge Bundling (flare)

edges=flare$edges
head(edges)

##                      from                                           to
## 1 flare.analytics.cluster flare.analytics.cluster.AgglomerativeCluster
## 2 flare.analytics.cluster   flare.analytics.cluster.CommunityStructure
## 3 flare.analytics.cluster  flare.analytics.cluster.HierarchicalCluster
## 4 flare.analytics.cluster            flare.analytics.cluster.MergeEdge
## 5   flare.analytics.graph  flare.analytics.graph.BetweennessCentrality
## 6   flare.analytics.graph           flare.analytics.graph.LinkDistance

vertices=flare$vertices%>%arrange(name)%>%mutate(name=factor(name,name))
head(vertices)

##                                           name size            shortName
## 1                                        flare    0                flare
## 2                              flare.analytics    0            analytics
## 3                      flare.analytics.cluster    0              cluster
## 4 flare.analytics.cluster.AgglomerativeCluster 3938 AgglomerativeCluster
## 5   flare.analytics.cluster.CommunityStructure 3812   CommunityStructure
## 6  flare.analytics.cluster.HierarchicalCluster 6714  HierarchicalCluster

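# Preparation to draw labels properly:
vertices$id=NA
# Leaves are the vertices that never appear as a parent ("from") in the edge list
myleaves=which(is.na(match(vertices$name,edges$from)))
nleaves=length(myleaves)

# Number the leaves 1, ..., nleaves and turn each number into an angle around the circle
vertices$id[myleaves]=seq_len(nleaves)
vertices$angle=90-360*vertices$id/nleaves
# Leaves with angle < -90 (the second half around the circle): flip the label by
# 180 degrees and right-justify it so that the text is not upside down
vertices$hjust=ifelse(vertices$angle < -90, 1,0)
vertices$angle=ifelse(vertices$angle < -90,vertices$angle+180,vertices$angle)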
head(vertices)

##                                           name size            shortName id
## 1                                        flare    0                flare NA
## 2                              flare.analytics    0            analytics NA
## 3                      flare.analytics.cluster    0              cluster NA
## 4 flare.analytics.cluster.AgglomerativeCluster 3938 AgglomerativeCluster  1
## 5   flare.analytics.cluster.CommunityStructure 3812   CommunityStructure  2
## 6  flare.analytics.cluster.HierarchicalCluster 6714  HierarchicalCluster  3
##      angle hjust
## 1       NA    NA
## 2       NA    NA
## 3       NA    NA
## 4 88.36364     0
## 5 86.72727     0
## 6 85.09091     0
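The label angles in the output can be checked by hand. Leaf \(i\) of \(n\) gets angle \(90 - 360\,i/n\); the first rows (88.36 and 86.73 degrees for ids 1 and 2) imply \(360/n \approx 1.636\), i.e., \(n = 220\) leaves. The helper below (a hypothetical name) repeats the computation, including the 180-degree flip for angles below -90.

```r
# Label angle and justification for leaf i of n placed around a circle
label_angle <- function(i, n) {
  angle <- 90 - 360 * i / n
  flip <- angle < -90                      # these labels would be upside down
  data.frame(angle = ifelse(flip, angle + 180, angle),
             hjust = ifelse(flip, 1, 0))
}

label_angle(c(1, 2, 150), n = 220)
```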

# Build a network object from this dataset:
mygraph=graph_from_data_frame(edges,vertices=vertices)

First, the clustering alone, drawn as a circular dendrogram.

# Basic dendrogram
ggraph(mygraph,layout='dendrogram',circular=TRUE)+ 
    # ggraph edge geoms take `width`; `size` is ignored with a warning
    geom_edge_link(width=0.4,alpha=0.1)+
    geom_node_text(aes(x=x*1.01,y=y*1.01,filter=leaf,label=shortName,angle=angle-90,hjust=hjust),size=1.5,alpha=0.5) +
    coord_fixed() +
    theme_void() +
    theme(
      legend.position="none",
      plot.margin=unit(c(0,0,0,0),"cm"),
    ) +
    expand_limits(x=c(-1.2, 1.2),y=c(-1.2, 1.2))

Next, the network with the clustering information: each import connection is bundled along the hierarchy.

connections=flare$imports
# The connection object must refer to the ids of the leaves:
from=match(connections$from,vertices$name)
to=match(connections$to,vertices$name)

# Make the plot
ggraph(mygraph,layout='dendrogram',circular=TRUE)+ 
    geom_conn_bundle(data=get_con(from=from,to=to),alpha= 0.1, colour="#69b3a2") + 
    geom_node_text(aes(x=x*1.01,y=y*1.01,filter=leaf,label=shortName,angle = angle-90,hjust=hjust),size=1.5,alpha=1) +
    coord_fixed()+
    theme_void()+
    theme(
      legend.position="none",
      plot.margin=unit(c(0,0,0,0),"cm"),
    ) +
    expand_limits(x = c(-1.2, 1.2), y = c(-1.2, 1.2))

4 Arc Diagram

In arc diagrams, vertices are displayed along a single axis and links are represented by arcs. Compared to the 2D layouts introduced previously, an arc diagram displays the label of each vertex clearly, which is often difficult in other 2D layouts. Another merit of the arc diagram is that it can make use of clustering information if the node order is chosen wisely, e.g., by placing vertices of the same cluster next to each other.
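To make “chosen wisely” concrete: assuming all arcs are drawn on the same side of the axis, two arcs cross exactly when their endpoints interleave. The hypothetical helper below counts crossings for a given node order; the same two edges cross under one order and not under another.

```r
# Count crossings between arcs drawn on one side of the axis.
# edges: two-column character matrix of node names; ord: node order on the axis.
count_arc_crossings <- function(edges, ord) {
  p <- matrix(match(edges, ord), ncol = 2)   # axis positions of both endpoints
  lo <- pmin(p[, 1], p[, 2])
  hi <- pmax(p[, 1], p[, 2])
  m <- nrow(p)
  if (m < 2) return(0)
  n_cross <- 0
  for (e in seq_len(m - 1)) {
    for (f in (e + 1):m) {
      # two arcs cross iff their spans interleave: lo_e < lo_f < hi_e < hi_f (or vice versa)
      if ((lo[e] < lo[f] && lo[f] < hi[e] && hi[e] < hi[f]) ||
          (lo[f] < lo[e] && lo[e] < hi[f] && hi[f] < hi[e])) {
        n_cross <- n_cross + 1
      }
    }
  }
  n_cross
}

toy_arcs <- rbind(c("a", "c"), c("b", "d"))
count_arc_crossings(toy_arcs, ord = c("a", "b", "c", "d"))   # 1
count_arc_crossings(toy_arcs, ord = c("a", "c", "b", "d"))   # 0
```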

# Arc diagram: arc width scales with the co-occurrence weight
ggraph(book_net, layout = "linear") +
  geom_edge_arc(aes(width = weight/60, color = factor(from)), alpha = 0.6, show.legend = FALSE) +
  geom_node_text(aes(label = name), repel = FALSE, size = 4.5, angle = 320)

5 Hive Diagram

An extension of arc diagrams is the hive plot, where the nodes are laid out along multiple axes instead of a single one-dimensional axis. This can help reveal more complex clusters (if the nodes represent connected people, imagine for example laying out nodes along axes of both “income” and “ethnicity”).

Here’s an example of a hive plot on the Pride and Prejudice network, with the characters placed on axes according to age group:

graph <- as_tbl_graph(book_net) %>% 
  mutate(degree = centrality_degree())
age=c("old","young","young","young","young",
          "old","kid","kid","young","young","young",
          "young","old","young","young","old","old")
ggraph(graph, 'hive', axis = age) + 
  geom_edge_hive(colour = 12,label_colour = 2) + 
  geom_axis_hive(aes(colour = age), size = 2, label = FALSE) + 
  geom_node_label(aes(label=name),repel=F, size=2.5) + 
  coord_fixed()

Hive plots are particularly useful for graphs with many nodes and edges that look like a dense “hairball” in traditional graph layouts. For example, we can plot a hive diagram of the highschool dataset (included in ggraph), placing students on three axes according to their number of friends (in-degree):

highschool_graph <- as_tbl_graph(highschool) %>% 
  mutate(degree = centrality_degree())

highschool_graph <- highschool_graph %>% 
  mutate(friends = ifelse(
    centrality_degree(mode = 'in') < 5, 'few',
    ifelse(centrality_degree(mode = 'in') >= 15, 'many', 'medium')
  ))
ggraph(highschool_graph, 'hive', axis = friends, sort.by = degree) + 
  geom_edge_hive(aes(colour = factor(year))) + 
  geom_axis_hive(aes(colour = friends), size = 2, label = FALSE) + 
  coord_fixed()

Please note that connections between nodes on the same axis are ignored in the visualization.
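The axis assignment above is a binning of the in-degree. The base-R sketch below mirrors those cut-offs with `cut()` and flags which edges of a made-up edge list connect nodes on the same axis, i.e., the edges that the hive plot leaves out. The degrees and edges are invented for illustration.

```r
# Assign nodes to hive axes by in-degree, with the same cut-offs as above:
# fewer than 5 -> "few", 5 to 14 -> "medium", 15 or more -> "many".
hive_axis <- function(indegree) {
  cut(indegree, breaks = c(-Inf, 5, 15, Inf), right = FALSE,
      labels = c("few", "medium", "many"))
}

# Made-up in-degrees for six nodes and a made-up edge list between them
toy_indeg <- c(2, 7, 15, 4, 20, 9)
toy_axis <- hive_axis(toy_indeg)
toy_edges <- data.frame(from = c(1, 1, 2, 3, 4), to = c(4, 2, 6, 5, 3))

# Edges whose two endpoints sit on the same axis are not drawn in the hive plot
same_axis <- toy_axis[toy_edges$from] == toy_axis[toy_edges$to]
toy_edges[same_axis, ]
```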

6 Flow diagram: Sankey Diagram and Chord Diagram

Flow diagram is a collective term for a diagram representing a flow or a set of dynamic relationships in a system. Some flow diagrams are very helpful for visualizing networks or network-like datasets.

We will use the 1960-1970 population migration data, which record the number of people migrating from one region to another. The data come from this publication: https://onlinelibrary.wiley.com/doi/abs/10.1111/imre.12327

6.1 Sankey Diagram

Sankey diagrams are a type of flow diagram in which the width of the arrows is proportional to the flow rate.

Sankey diagrams can also visualize energy accounts, material flow accounts on a regional or national level, and cost breakdowns. They emphasize the major transfers or flows within a system and help locate the most important contributions to a flow. They often show conserved quantities within defined system boundaries.

We will use our old friend, the R package networkD3, here.

# Load dataset from github
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/13_AdjacencyDirectedWeighted.csv", header=TRUE)
# Package
library(networkD3)
library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

## ✔ tibble  3.1.6     ✔ dplyr   1.0.8
## ✔ tidyr   1.2.0     ✔ stringr 1.4.0
## ✔ readr   2.1.2     ✔ forcats 0.5.1
## ✔ purrr   0.3.4

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::as_data_frame() masks tibble::as_data_frame(), igraph::as_data_frame()
## ✖ dplyr::combine()       masks gridExtra::combine()
## ✖ purrr::compose()       masks igraph::compose()
## ✖ tidyr::crossing()      masks igraph::crossing()
## ✖ dplyr::filter()        masks tidygraph::filter(), plotly::filter(), stats::filter()
## ✖ dplyr::groups()        masks tidygraph::groups(), plotly::groups(), igraph::groups()
## ✖ dplyr::lag()           masks stats::lag()
## ✖ purrr::simplify()      masks igraph::simplify()

# Convert the adjacency matrix to long format: one row per non-zero flow
data_long <- data %>%
  rownames_to_column %>%
  gather(key = 'key', value = 'value', -rowname) %>%
  filter(value > 0)
colnames(data_long) <- c("source", "target", "value")
# Append a space to target names so that a region appears as two distinct nodes,
# one on the source side and one on the target side
data_long$target <- paste(data_long$target, " ", sep="")

# From these flows we need to create a node data frame: it lists every entity involved in the flow
nodes <- data.frame(name=c(as.character(data_long$source), as.character(data_long$target)) %>% unique())

# With networkD3, connections must be given as zero-based node ids, not as the names used in data_long, so we reformat them
data_long$IDsource=match(data_long$source, nodes$name)-1 
data_long$IDtarget=match(data_long$target, nodes$name)-1

# prepare colour scale
ColourScal ='d3.scaleOrdinal() .range(["#FDE725FF","#B4DE2CFF","#6DCD59FF","#35B779FF","#1F9E89FF","#26828EFF","#31688EFF","#3E4A89FF","#482878FF","#440154FF"])'

# Make the Network
sankeyNetwork(Links = data_long, Nodes = nodes,
                     Source = "IDsource", Target = "IDtarget",
                     Value = "value", NodeID = "name", 
                     sinksRight=FALSE, colourScale=ColourScal, nodeWidth=40, fontSize=13, nodePadding=20)
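Two details of the code above are easy to miss. networkD3 expects nodes to be referenced by zero-based ids (JavaScript arrays start at 0), hence the `- 1` after `match()`; and the trailing space added to target names turns a region that appears on both sides of the diagram into two distinct nodes, presumably so that a flow within the same region does not become a loop. A toy version with made-up flows (all `toy_` names are hypothetical):

```r
# Made-up migration flows
toy_flows <- data.frame(source = c("Africa", "Europe", "Africa"),
                        target = c("Europe", "Africa", "Africa"),
                        value  = c(10, 4, 2),
                        stringsAsFactors = FALSE)

# Without the trailing space, "Africa" -> "Africa" would loop back to the same node
toy_flows$target <- paste0(toy_flows$target, " ")

toy_nodes <- data.frame(name = unique(c(toy_flows$source, toy_flows$target)),
                        stringsAsFactors = FALSE)

# Zero-based ids: the first node in toy_nodes has id 0
toy_flows$IDsource <- match(toy_flows$source, toy_nodes$name) - 1
toy_flows$IDtarget <- match(toy_flows$target, toy_nodes$name) - 1
toy_flows
```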

For a non-interactive Sankey plot, one can use the R package riverplot.

6.2 Chord Diagram

A chord diagram is a graphical method of displaying the inter-relationships between data in a matrix. The data are arranged radially around a circle with the relationships between the data points typically drawn as arcs connecting the data.

The format can be aesthetically pleasing, making it a popular choice in the world of data visualization.

The primary use of chord diagrams is to show the flows or connections between several entities (called nodes). Each entity is represented by a fragment (often colored or patterned) along the circumference of the circle. Arcs are drawn between entities to show flows (and exchanges in economics). The thickness of an arc is proportional to the magnitude of the flow.
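In a directional chord diagram such as the one drawn with circlize's `chordDiagram()` below, each link's width at its ends is proportional to its flow, so each sector's length is proportional to the total flow leaving plus entering that entity. A base-R sketch of these sector sizes for a made-up flow matrix:

```r
# Made-up flows between three regions: M[i, j] = flow from region i to region j
M <- matrix(c(0, 5, 3,
              2, 0, 1,
              4, 1, 0), nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

out_flow <- rowSums(M)          # total leaving each region
in_flow  <- colSums(M)          # total arriving at each region
total    <- out_flow + in_flow  # relative length of each region's sector
round(total / sum(total), 3)    # share of the circumference (gaps ignored)
```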

We will use the R package circlize here:

library(circlize)


## 
## Attaching package: 'circlize'

## The following object is masked from 'package:igraph':
## 
##     degree

# short names
colnames(data) <- c("Africa", "East Asia", "Europe", "Latin Ame.",   "North Ame.",   "Oceania", "South Asia", "South East Asia", "Soviet Union", "West.Asia")
rownames(data) <- colnames(data)

# I need a long format
data_long <- data %>%
  rownames_to_column %>%
  gather(key = 'key', value = 'value', -rowname)

# parameters
circos.clear()
circos.par(start.degree = 90, gap.degree = 4, track.margin = c(-0.1, 0.1), points.overflow.warning = FALSE)
par(mar = rep(0, 4))

# color palette
mycolor <- viridis(10, alpha = 1, begin = 0, end = 1, option = "D")
mycolor <- mycolor[sample(1:10)]



# Base plot
chordDiagram(
  x = data_long, 
  grid.col = mycolor,
  transparency = 0.25,
  directional = 1,
  direction.type = c("arrows", "diffHeight"), 
  diffHeight  = -0.04,
  annotationTrack = "grid", 
  annotationTrackHeight = c(0.05, 0.1),
  link.arr.type = "big.arrow", 
  link.sort = TRUE, 
  link.largest.ontop = TRUE)

# Add text and axis
circos.trackPlotRegion(
  track.index = 1, 
  bg.border = NA, 
  panel.fun = function(x, y) {
    
    xlim = get.cell.meta.data("xlim")
    sector.index = get.cell.meta.data("sector.index")
    
    # Add names to the sector. 
    circos.text(
      x = mean(xlim), 
      y = 3.2, 
      labels = sector.index, 
      facing = "bending", 
      cex = 0.8
      )

    # Add graduation on axis
    circos.axis(
      h = "top", 
      major.at = seq(from = 0, to = xlim[2], by = ifelse(test = xlim[2]>10, yes = 2, no = 1)), 
      minor.ticks = 1, 
      major.tick.length = 0.5,
      labels.niceFacing = FALSE)
  }
)

Ch11 Network Visualization

Descriptive Analytics and Data Visualization

Yichen Qin (qinyn@ucmail.uc.edu), University of Cincinnati

2023-01-19