Category Archives: Visualization / Visualisierungen

Evolution of a logistic regression

In my last post I showed how one can easily summarize the outcome of a logistic regression. Here I want to show how this really depends on the data-points that are used to estimate the model. Taking a cue from the evolution of a correlation I have plotted the estimated Odds Ratios (ORs) depending on the number of included participants. The result is bad news for those working with small (< 750 participants) data-sets.

evolution

 

“eval_reg” Function to estimate model parameters for subsets of data 

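# model: a fitted glm object that stores its data in model$data
eval_reg <- function(model) {
  mod  <- model
  # shuffle the rows so that participants are added in random order
  dat  <- mod$data[sample(nrow(mod$data)), ]
  vars <- names(coef(mod))
  est  <- data.frame(matrix(nrow = nrow(dat), ncol = length(vars)))
  pb   <- txtProgressBar(min = 50, max = nrow(dat), style = 3)

  # refit the model on the first i participants, starting with 50
  for (i in 50:nrow(dat)) {
    boot_mod <- try(update(mod, data = dat[1:i, ]), silent = TRUE)
    # the row stays NA if the model could not be fitted
    if (!inherits(boot_mod, "try-error")) {
      try(est[i, ] <- exp(coef(boot_mod)), silent = TRUE)
    }
    setTxtProgressBar(pb, i)
  }
  close(pb)
  est$mod_nr <- seq_len(nrow(dat))
  names(est) <- c(vars, "mod_nr")
  return(est)
}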

Because the order of the data is randomized, you can run it again and again and arrive at an even deeper mistrust, although some of the resulting permutations will look like they stabilize earlier. To make a particular run reproducible, set the random-number seed first.
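The idea can be tried without the original data. The following is a minimal sketch with simulated data; the variable names, the sample size of 400 and the assumed true effect are illustration only and not taken from the post. It refits a logistic regression on growing subsets and records the odds ratio of a single predictor.

```r
set.seed(1)

# simulated data: one predictor with an assumed true OR of exp(0.7), about 2.01
n   <- 400
x   <- rnorm(n)
y   <- rbinom(n, 1, plogis(-0.5 + 0.7 * x))
sim <- data.frame(y = y, x = x)

# refit the model on the first 50, 100, ..., 400 rows
sizes   <- seq(50, n, by = 50)
or_by_n <- sapply(sizes, function(k) {
  fit <- glm(y ~ x, family = binomial, data = sim[1:k, ])
  exp(coef(fit))[["x"]]
})

# estimated OR per subset size
print(data.frame(n = sizes, OR = round(or_by_n, 2)))
```

Changing the seed changes how quickly the estimate settles, which is the point made above.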

Run and plot the development

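library(ggplot2)
library(reshape)

set.seed(29012001)

# gp_mod: a fitted logistic regression (glm); it is not defined in this post
mod_eval<-eval_reg(gp_mod)

# long format: one row per model size and coefficient
tmp<-melt(mod_eval,id='mod_nr')
tmp2<-tmp[tmp$variable!='(Intercept)',]

# tick marks for the log-scaled axis (no zero, no duplicates)
ticks<-c(seq(.1, 1, by =.1), seq(2, 10, by =1), seq(20, 100, by =10))

ggplot(tmp2, aes(y=value, x = mod_nr, color = variable)) +
  geom_line() +
  geom_hline(yintercept=1, linetype=2) +
  labs(title = 'Evolution of logistic regression', y = 'OR', x = 'number of participants') +
  scale_y_log10(breaks=ticks, labels = as.character(ticks)) +
  theme_bw()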

Update 29-01-2013:

I added my definition of the ticks on the log-scale. The packages needed are ggplot2 and reshape.

Plotting Odds Ratios (aka a forest plot) with ggplot2

Hi,

If, like me, you work in medical research, you have to plot the results of multiple logistic regressions every once in a while. As I have not yet found a great solution for making these plots, I have put together the following short script. Do not expect too much; it is more of a reminder to my future self than some mind-boggling new invention. The code can be found below; the resulting figure looks like this:

fig_1_odds

Here comes the code. It takes the model and optionally a title as an input and generates the above plot.

 

 

# Requires ggplot2; x is a fitted glm, title an optional plot title
plot_odds<-function(x, title = NULL){
  # odds ratios with confidence limits, intercept dropped
  tmp<-data.frame(cbind(exp(coef(x)), exp(confint(x))))
  odds<-tmp[-1,]
  names(odds)<-c('OR', 'lower', 'upper')
  odds$vars<-row.names(odds)
  # tick marks for the log-scaled axis
  ticks<-c(seq(.1, 1, by =.1), seq(2, 10, by =1), seq(20, 100, by =10))

  ggplot(odds, aes(y= OR, x = reorder(vars, OR))) +
    geom_point() +
    geom_errorbar(aes(ymin=lower, ymax=upper), width=.2) +
    scale_y_log10(breaks=ticks, labels = ticks) +
    geom_hline(yintercept = 1, linetype=2) +
    coord_flip() +
    labs(title = title, x = 'Variables', y = 'OR') +
    theme_bw()
}
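To see what the function works with, here is a small sketch that builds the same kind of odds-ratio table from simulated data. The data set, the variable names (age, smoker, event) and the effect sizes are made up for illustration. The sketch uses Wald intervals from confint.default(), which is a deliberate swap: plot_odds() itself calls confint().

```r
set.seed(123)

# simulated data with two hypothetical predictors
n   <- 300
sim <- data.frame(age = rnorm(n), smoker = rbinom(n, 1, 0.4))
sim$event <- rbinom(n, 1, plogis(-1 + 0.5 * sim$age + 0.8 * sim$smoker))

fit <- glm(event ~ age + smoker, family = binomial, data = sim)

# odds ratios with Wald confidence limits, intercept dropped
odds <- data.frame(cbind(exp(coef(fit)), exp(confint.default(fit))))[-1, ]
names(odds) <- c("OR", "lower", "upper")
odds$vars <- row.names(odds)
print(odds)

# plot_odds(fit, title = "Simulated example")  # needs ggplot2 and the function above
```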

 

P.S. I know about ggplot2's “annotation_logticks”, but it messed up my graphics; also, it is not very often that ORs span more than three orders of magnitude. If they do, consider playing with ggplot2's function or updating the line beginning with “ticks <- ” in the above example.
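If the odds ratios do span more orders of magnitude, the tick vector can be generated instead of typed. This is only a sketch; log_ticks is a hypothetical helper name, not part of ggplot2.

```r
# ticks at 1..9 times each power of ten, plus the closing power of ten
log_ticks <- function(min_exp, max_exp) {
  ticks <- as.vector(outer(1:9, 10^(min_exp:max_exp)))
  # signif() removes floating point noise such as 0.30000000000000004
  signif(c(ticks, 10^(max_exp + 1)), 1)
}

ticks <- log_ticks(-1, 1)   # 0.1, 0.2, ..., 0.9, 1, 2, ..., 90, 100
print(ticks)
```

Calling log_ticks(-2, 2) would cover 0.01 to 1000 in the same way.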

Update 29-01-2013: I replaced the nasty typographic quotation marks (”), as they resulted in some nasty copy-paste errors…

fig_3_spice

When Venn diagrams are not enough – Visualizing overlapping data with Social Network Analysis in R

I recently thought about ways to visualize medications and their co-occurrences in a group of children. As long as you want to visualize up to 4 different medications you can simply use Venn diagrams. There is a very nice R-package to generate this kind of graphic for you (for a description see: Chen and Boutros, 2011). But this is of little help here.

The problem I faced involved 29 different medications and 50 children. So my data was stored in a table with 29 columns – one for each medication – and 50 rows – one for each child, so that the cells indicate whether or not the child took the medication.

M <- matrix(sample(0:1, 1450, replace=TRUE, prob=c(0.9,0.1)), nc=29)

The Solution – Social Network Analysis

There are several R-packages to analyze and visualize social network data; I will focus on “igraph” in this post. The problem I had was that I was not, and probably still am not, familiar with the concepts and nomenclature of this field. The key to using the data described above in terms of network analysis was understanding that such data is called an affiliation matrix, where individuals are affiliated with certain events. “igraph”, however, likes adjacency matrices, where every column and row represents a different node, in our case a medication. Each off-diagonal cell gives the number of children who received both medications, and the diagonal gives the number of times a medication was given (more information can be found on Daizaburo Shizuka's site).

With the medications in the columns of M, we transform the affiliation matrix into a medication-by-medication adjacency matrix in R simply by:

adj=t(M)%*%M
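A tiny worked example may make the direction of the multiplication clearer. The four children and three medications below are made up for illustration. For a matrix with children in rows and medications in columns, t(M) %*% M counts co-occurrences between medications, whereas M %*% t(M) would count shared medications between children.

```r
# hypothetical affiliation matrix: rows are children, columns are medications
M_small <- matrix(c(1, 1, 0,
                    1, 0, 0,
                    0, 1, 1,
                    1, 1, 0),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(paste0("child", 1:4), c("medA", "medB", "medC")))

# medication-by-medication matrix:
# diagonal = children per medication, off-diagonal = children taking both
co <- t(M_small) %*% M_small
print(co)

# child-by-child matrix, for comparison
print(dim(M_small %*% t(M_small)))
```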

Now we can make a first bare-minimum plot:

require(igraph)
g=graph.adjacency(adj,mode="undirected", weighted=TRUE,diag=FALSE)
summary(g)
plot(g, main="The bare minimum")

 

Adding information and spicing it up a notch

In all likelihood you want to add at least three kinds of information:

  1. Labels for the nodes
  2. Size of the nodes to represent the total number of events, aka medications
  3. Size of the links to represent the overlap between medications

name<-sample(c(LETTERS, letters, 1:99), 29, replace=TRUE)
number<-diag(adj)*5+5
width<-(E(g)$weight/2)+1
plot(g, main="A little more information", vertex.size=number,vertex.label=name,edge.width=width)

 

The “igraph” package lets you adjust quite a few parameters, so you should consult the manual. I only changed some of the colors, the layout, the fonts, etc.

plot(g, main="Spice it up a notch", vertex.size=number, vertex.label=name, edge.width=width, layout=layout.lgl, vertex.color="red", edge.color="darkgrey", vertex.label.family ="sans", vertex.label.color="black")

 


Here is just the code:

require(igraph)
setwd("~/Desktop/")
 
# Generate example data
M <- matrix(sample(0:1, 1450, replace=TRUE, prob=c(0.9,0.1)), nc=29)
 
# Transform matrices
adj=t(M)%*%M
 
# Make a simple plot
g<-graph.adjacency(adj,mode="undirected", weighted=TRUE,diag=FALSE)
summary(g)
plot(g, main="The bare minimum")
 
# Add more information
name<-sample(c(LETTERS, letters, 1:99), 29, replace=TRUE)
number<-diag(adj)*5+5
width<-(E(g)$weight/2)+1
 
plot(g, main="A little more information", vertex.size=number,vertex.label=name,edge.width=width)
 
# Adjust some plotting parameters
plot(g, main="Spice it up a notch", vertex.size=number, vertex.label=name, edge.width=width, layout=layout.lgl, vertex.color="red", edge.color="darkgrey", vertex.label.family ="sans", vertex.label.color="black")

Visualizing GIS data with R and Open Street Map

In this post I want to share with you some code to use OpenStreetMap maps as a backdrop for a data visualization. We will use the RgoogleMaps package for R. In the following I will show you how to make this graph.



1. Download the map

I wanted to take a closer look at an area around my former neighborhood, which is in Bochum, Germany.

lat_c<-51.47393
lon_c<-7.22667
bb<-qbbox(lat = c(lat_c[1]+0.01, lat_c[1]-0.01), lon = c(lon_c[1]+0.03, lon_c[1]-0.03))

Once this is done, you can download the corresponding Openstreetmap tile with the following line.

OSM.map<-GetMap.OSM(lonR=bb$lonR, latR=bb$latR, scale = 20000, destfile="bochum.png")

2. Add some points to the graphic

Now your second step will most likely be adding points to the map. I chose the following two.

lat <- c(51.47393, 51.479021)
lon <- c(7.22667, 7.222526)
val <- c(10, 100)

As the R-package was mainly built for Google Maps, the coordinates need to be adjusted by hand. I wrote the following functions, which rescale latitude and longitude using the corner coordinates of the downloaded map's bounding box.

lat_adj<-function(lat, map){(map$BBOX$ll[1]-lat)/(map$BBOX$ll[1]-map$BBOX$ur[1])}
lon_adj<-function(lon, map){(map$BBOX$ll[2]-lon)/(map$BBOX$ll[2]-map$BBOX$ur[2])}

Now you can add some points to the map. If you want them to mean anything it may be handy to specify an alpha-level and change some aspects of the points, e.g. size, color, alpha corresponding to some variable of interest.

PlotOnStaticMap(OSM.map, lat = lat_adj(lat, OSM.map), lon = lon_adj(lon, OSM.map), col=rgb(200,val,0,85,maxColorValue=255),pch=16,cex=4)
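The two adjustment functions are a plain linear rescaling, which can be checked without downloading a map. In this sketch the bounding box values and the helper name rescale01 are assumptions for illustration; the real limits come from the downloaded map object. The last lines show how a variable can drive the color while the alpha value stays fixed.

```r
# position of x between lower and upper, as a fraction from 0 to 1
rescale01 <- function(x, lower, upper) (x - lower) / (upper - lower)

# hypothetical bounding box around the points
lat_range <- c(51.46, 51.49)
lon_range <- c(7.19, 7.26)

lat <- c(51.47393, 51.479021)
lon <- c(7.22667, 7.222526)

rel_lat <- rescale01(lat, lat_range[1], lat_range[2])
rel_lon <- rescale01(lon, lon_range[1], lon_range[2])
print(cbind(rel_lat, rel_lon))

# semi-transparent colors: green channel from val, alpha fixed at 85 of 255
val  <- c(10, 100)
cols <- rgb(200, val, 0, 85, maxColorValue = 255)
print(cols)
```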

Here is the full code:

require(RgoogleMaps)
 
#define the part of the world you want to plot. Here the area around my former home.
lat_c<-51.47393
lon_c<-7.22667
bb<-qbbox(lat = c(lat_c[1]+0.01, lat_c[1]-0.01), lon = c(lon_c[1]+0.03, lon_c[1]-0.03))
 
# download the tile from OSM
OSM.map<-GetMap.OSM(lonR=bb$lonR, latR=bb$latR, scale = 20000, destfile="bochum.png")
image(OSM.map)
#Add some coordinates
lat<- c(51.47393, 51.479021)
lon<- c(7.22667, 7.222526)
val <- c(0, 255)
 
#function to adjust the coordinates
lat_adj<-function(lat, map){(map$BBOX$ll[1]-lat)/(map$BBOX$ll[1]-map$BBOX$ur[1])}
lon_adj<-function(lon, map){(map$BBOX$ll[2]-lon)/(map$BBOX$ll[2]-map$BBOX$ur[2])}
 
PlotOnStaticMap(OSM.map, lat = lat_adj(lat, OSM.map), lon = lon_adj(lon, OSM.map), col=rgb(255,0, val,90,maxColorValue=255),pch=16,cex=4)
 
dev.print(jpeg,"test.jpeg", width=1204, height=644, units="px")

The four steps to publication-grade graphics in R

For many, the main reason to use R is to generate really good-looking or at least informative graphics. However, while it is easy to find information on how to make an individual plot, it can take some time to find out how to get it out into the world. Here is my four-step program for turning your plot into a graphic file.

In the following I will use my present favorite plot from here as an example.

 

1. Set your options

R allows you to set many general options for your plots, e.g. the margins and whether or not a box should be drawn around the plot, most of which are described in the documentation here.

My favorites are:

  • mfrow: To combine several plots into one (not necessary for the example).
  • mar: To control the margins of the plot (not necessary for the example).
  • las: To rotate the axis-labels (not necessary for the example).
par(mar=c(2,0,2,2))
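Because par() changes the settings for all following plots, it can be handy to keep the old values and restore them afterwards. The sketch below shows that pattern with all three options from the list; the values and the two example plots are arbitrary, and pdf(NULL) is only used so that nothing is written to disk.

```r
pdf(NULL)   # throw-away graphics device, no file is created

# par() returns the previous values of the options it changes
old <- par(mfrow = c(1, 2), mar = c(4, 4, 2, 1), las = 1)

plot(1:10, main = "left panel")
hist(rnorm(100), main = "right panel")

par(old)                  # restore the previous settings
restored <- par("mfrow")  # back to a single panel
invisible(dev.off())
```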

2. Make your plot

Well this part is the most heterogeneous, just take a peek at the gallery to get some inspiration, or dive into ggplot2 for a very comprehensive graphic-framework that also helps you to add legends.

pie(c(1,1), labels="", col=c("black", "white"), main="Your options according to Yoda", init.angle=90)

3. Add a legend

R has a built-in function to add legends. The full documentation can be found here. The options I use almost every time are:

  • x,y: To tell R where to put the legend. Usually I use the name of the location (e.g. “topleft”) instead of x and y-coordinates.
  • legend: To add some descriptions for the colors/line-types/shadings.
  • fill: To select the colors or alternatively “lty” for the line-type
  • bty: To get rid of the box around the legend
legend("right", c("do", "do not", "try"), fill=c("black", "white", "gold"), bty="n", cex=1.4)

4. Save it to a file

R has various options to save files, as documented here. I most often save them as png, as the file-size for tiffs is extremely large at the same quality. The call below allows you to set the following options:

  • filename: well, something with a “.png” at the end
  • width and height: To control the size of the image.
  • units: To set the unit in which width and height are given (e.g. “in” for inches).
  • res (resolution): To set the resolution. Journals love images with at least 300 dpi.
  • bg: To get a non-transparent background simply use “white”.
dev.print(png, "yoda.png", width=8, height=6, units="in", res=300, bg="white")
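How width, height, units and res work together is simple arithmetic: the size in inches times the resolution gives the size in pixels. A short sketch using the values from the call above (the variable names are only for illustration):

```r
width_in  <- 8     # inches
height_in <- 6     # inches
res_dpi   <- 300   # dots per inch

# pixel dimensions of the resulting file
size_px <- c(width = width_in, height = height_in) * res_dpi
print(size_px)
```

So the same plot saved with res=600 would have twice as many pixels in each direction while keeping its physical size.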

Enjoy the complete script

 

par(mar=c(2,0,2,2))
pie(c(1,1), labels="", col=c("black", "white"), main="Your options according to Yoda", init.angle=90)
legend("right", c("do", "do not", "try"), fill=c("black", "white", "gold"), bty="n", cex=1.4)
dev.print(png, "yoda.png", width=8, height=6, units="in", res=300, bg="white")

Visualizing small-scale paired data – combining boxplots, stripcharts, and confidence-intervals in R

Sometimes when working with small paired data-sets it is nice to see/show all the data in a structured form. For example, when looking at pre-post comparisons, connected dots are a natural way to visualize which data-points belong together. In R this can easily be combined with boxplots expressing the overall distribution of the data. This also has the advantage of being more true to non-normal data, which is not correctly represented by means +/- 95% CI. I have not come across a good tutorial on how to make such a plot (although the right-hand plot borrows heavily from this post on the excellent R mailing list), so in this post you will find the code to generate such a graph in R.
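As a rough idea of what such a plot involves, here is a minimal base-R sketch with simulated pre-post data. It is not the code from the full post; the data, the colors and the sample size of 12 are made up for illustration.

```r
set.seed(42)

# simulated paired measurements for 12 participants
pre  <- rnorm(12, mean = 10, sd = 2)
post <- pre + rnorm(12, mean = 1, sd = 1)

pdf(NULL)   # throw-away device; use png() or similar to keep the plot

# boxplots for the overall distribution
boxplot(pre, post, names = c("pre", "post"), col = "grey90",
        ylim = range(c(pre, post)), outline = FALSE)
# individual data points on top of the boxes
stripchart(list(pre, post), vertical = TRUE, pch = 16, add = TRUE, at = 1:2)
# lines connecting each participant's two measurements
segments(x0 = 1, y0 = pre, x1 = 2, y1 = post, col = "darkgrey")

invisible(dev.off())

# 95% confidence interval of the mean paired difference
ci <- t.test(post, pre, paired = TRUE)$conf.int
print(ci)
```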

Continue reading

Visualizing data with Biplot

Biplot visualizes the distribution of data within a two-dimensional coordinate system. In contrast to the usual visualization, Biplot shows how many data points are located at the same spot. Additionally, a filter variable can be created in order to change the color of certain values. Biplot can be used on any operating system that supports Java.

Krankenhausbetten_NRW

Visualizing geographic data with R

Much of the data we collect is tied to a location. In R such data can be displayed very nicely with the package sp. All you need is:

  1. Map material: that is, descriptions of the boundaries. A very good source for this is the GADM database of Global Administrative Areas. For Germany it offers a map of all districts (Kreise), stored as an R data object. Nice, isn't it?
  2. Data with a local reference. A lot of it is available in the Genesis database of the Statistikportal, the joint data service of the German statistical offices. I decided to simply plot the number of hospital beds. Here is the result:

 

 

Continue reading

Goodbye Powerpoint – Slides with LaTeX and beamer

A couple of weeks ago I gave my first presentation made with LaTeX. Preparing it took somewhat longer, and especially as the deadline was approaching I was very close to switching back to PowerPoint. But now I am glad I didn’t, because I rediscovered the ABC of what I like about the LaTeX approach.

  1. Automaticity: Once you learn LaTeX, you don’t have to make a reference list or table of contents by hand.
  2. Beauty: The beamer themes just look really, really good. They give the right amount of information on every slide, and leave ample space for your contents.
  3. Compatibility: As you are presenting PDFs, you can be sure that they will look similar across different computers.

Here is a minimal example of a presentation with table of contents, several slides, animation, images, references, and umlauts. Of course, I am not the first to use the beamer class. I particularly like this introductory site because it has many examples for the different themes, and has both an English and a German version. You will find many more if you just google for specific commands (e.g. “\usetheme{Warsaw}”). In any case, I hope you find the example useful; all files necessary to adapt it to your own needs can be downloaded here.


Reproducible Research = LaTeX + R + Jabref

The concept of reproducible research, with its core idea of being able to reproduce all figures, tables, and results in a manuscript, is fascinating. The best way to implement this is by using R in combination with LaTeX. However, it takes a while to get everything into place. There is some information about Sweave on the authors’ webpage and several others give great examples (here, and here). However, these do not include citations, and as someone who needs umlauts I also spent some time finding out how to include these. In the following I describe my final set of files that can be used to write small and not so small tutorials in statistics, all of which look like this example. Any comments are much appreciated:

Overview

Stuff I installed: TeXShop and JabRef. For both I set the standard encoding to UTF-8.

Four files need to be in the same folder (“/temp/”):

  1. the master-file that contains the text, R-code-chunks, and citations
  2. the bibtex-file that contains the references in bibtex-format
  3. Sweave.sty
  4. the R script that runs the Sweave command and compiles the resulting LaTeX document using pdflatex.

You can download the whole folder here.
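To give an impression of what the fourth file does, here is a minimal sketch. It is not the script from the download: the file name master.Rnw, its tiny content and the use of a temporary folder are assumptions for illustration. Sweave() turns the master file into a .tex file; the last, commented-out step needs a working LaTeX installation.

```r
# write a tiny hypothetical master file into a temporary folder
rnw <- file.path(tempdir(), "master.Rnw")
writeLines(c("\\documentclass{article}",
             "\\begin{document}",
             "The mean of 1, 2 and 3 is:",
             "<<>>=",
             "mean(c(1, 2, 3))",
             "@",
             "\\end{document}"), rnw)

# Sweave() writes its output into the working directory
old_wd <- setwd(tempdir())
Sweave(rnw, encoding = "UTF-8")
setwd(old_wd)

tex_file <- file.path(tempdir(), "master.tex")
print(file.exists(tex_file))

# compile to PDF (requires LaTeX, e.g. pdflatex, on the system):
# tools::texi2pdf(tex_file)
```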

Continue reading

Inkscape – the final graphic program, is it?

We have covered quite a lot of graphic programs already, but one type of graphic software is still lacking. Today it is all about scalable vector graphics. These are used quite a bit on the web as they allow high-quality images with only a limited file-size. Instead of specifying each and every pixel in a picture, vector graphics specify where on the image a line or a shape should be placed. Inkscape is the free and open source software to do this.

An Open Source vector graphics editor, with capabilities similar to Illustrator, CorelDraw, or Xara X, using the W3C standard Scalable Vector Graphics (SVG) file format. (from Inkscape.org)

In contrast to these programs, Inkscape is free and works on all operating systems.

Cons:

  • Inkscape is still in beta. However, last time I tried it this was no problem.

look at all the pictures – Irfanview

By now we have already covered a couple of graphic programs to edit pictures. For all of you who work on Windows, what is lacking now is a simple file-viewer to have a look at the results. That’s exactly what IrfanView does best. It has all the basic features of picture viewers:

  • It opens many formats, e.g. those tiffs that Windows struggles with
  • It has a thumbnail viewer to browse through your files
  • Most importantly, it has batch-conversion features to rename, resize or filter large amounts of pictures.

So a task such as “resize these 800 pictures to a maximal height or width of 400px”, which can take a typical student assistant a couple of hours, can be accomplished in minutes. If you are the student assistant, you probably know what to do with these saved hours (hehe). If you are the one who wanted to get the job done, you will have to think of more intellectually challenging jobs. Sorry for that.

Cons:

  • it is windows only
  • batch-conversion sometimes does not work without administrator privileges.