fig_3_spice

When Venn diagrams are not enough – Visualizing overlapping data with Social Network Analysis in R

I recently thought about ways to visualize medications and their co-occurences in a group of children. As long as you want to visualize up to  4 different medications you can simply use Venn diagrams. There is a very nice R-package to generate these kind of graphics for you (for a  description see: Chen and Boutros, 2011). But this is of little help here.

The problem I faced involved 29 different medications and 50 children. So my data was stored in a table with 29 columns – one for each medication – and 50 rows – one for each child, so that the cells indicate whether or not the child took the medication.

M <- matrix(sample(0:1, 1450, replace=TRUE, prob=c(0.9,0.1)), nc=29)

The Solution – Social Network Analysis

There are a several R-packages to analyze and visualize social network data – I will focus on “igraph” in this post. The problem I had was that I was not – and probably I am still not –  familiar with the concepts and nomenclature of this field. The key to using the data described above in terms of network analysis was understanding that such data is called an affiliation matrix, where individuals are affiliated with certain events. As “igraph” likes adjacency matrices, where every column and row represents a different node – in our case a medication. The diagonal gives the number of times a medication was given (more information can be found on Daizaburo Shizuka site).

We transform an affilition matrix into an adjacency matrix in R simply by:

adj=M%*%t(M)

Now we can make a first bare-minimum plot:

require(igraph)
g=graph.adjacency(adj,mode=”undirected”, weighted=TRUE,diag=FALSE)
summary(g)
plot(g, main=”The bare minimum”)

 

Adding information and spicing it up a notch

In all likelihood You want to add at least three kinds of  information:

  1. Labels for the nodes
  2. Size of the nodes to represent the total number of events, aka medications
  3. Size of the links to represent the overlap between medications

name<-sample(c(LETTERS, letters, 1:99), 29, replace=TRUE)
number<-diag(adj)*5+5
width<-(E(g)$weight/2)+1
plot(g, main=”A little more information”, vertex.size=number,vertex.label=name,edge.width=width)

 

The “igraph” package lets you adopt quite a few parameters so you should consult with the manual. I only changed some of the colors, layout, fonts, etc.

plot(g, main=”Spice it up a notch”, vertex.size=number, vertex.label=name, edge.width=width, layout=layout.lgl, vertex.color=”red”, edge.color=”darkgrey”, vertex.label.family =”sans”, vertex.label.color=”black”)

 


Here is just the code:

?View Code RSPLUS
require(igraph)
setwd("~/Desktop/")
 
# Generate example data
M <- matrix(sample(0:1, 1450, replace=TRUE, prob=c(0.9,0.1)), nc=29)
 
# Transform matrices
adj=M%*%t(M)
 
# Make a simple plot
g<-graph.adjacency(adj,mode="undirected", weighted=TRUE,diag=FALSE)
summary(g)
plot(g, main="The bare minimum")
 
# Add more information
name<-sample(c(LETTERS, letters, 1:99), 29, replace=TRUE)
number<-diag(adj)*5+5
width<-(E(g)$weight/2)+1
 
plot(g, main="A little more information", vertex.size=number,vertex.label=name,edge.width=width)
 
# Adjust some plotting parameters
plot(g, main="Spice it up a notch", vertex.size=number, vertex.label=name, edge.width=width, layout=layout.lgl, vertex.color="red", edge.color="darkgrey", vertex.label.family ="sans", vertex.label.color="black")

5 thoughts on “When Venn diagrams are not enough – Visualizing overlapping data with Social Network Analysis in R”

  1. Instead of defining a relationship between drugs as a single co-occurrence, you might consider estimating the conditional independence structure with log linear graphical models. This will give you simpler, more interpret-able visualizations.

    A useful book by the people who developed the gRim package for R. A free preview of a relevant chapter of the book. See section 2.4 Model Selection, for an example of the kinds of thing this approach is good for.

  2. My personal preference would be to calculate a correlation matrix and plug it into the heatmap() function. With the network diagram, especially one with so many edges, there’s no practical way to actually investigate the connections between individual nodes. With the heatmap/dendrogram approach, your eyes are able to quickly pick up important relationships between variables.

    1. I should amend my comment to say, for small networks this approach makes sense if you want to see two variables (i.e. node and edge weights). Can’t do that with the heatmap approach (afaik).

  3. Hi,

    just found a small typo. The correct adjacency matrix for medications is generated with:

    adj=t(M)%*%M.

    The one in the script, i.e. adj=M%*%t(M), generates a matrix for the participants.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>