Tag Archives: Datenanalyse

Plotting Odds Ratios (aka a forest plot) with ggplot2

Hi,

If, like me, you work in medical research, you have to plot the results of multiple logistic regressions every once in a while. As I have not yet found a great solution for making these plots, I have put together the following short script. Do not expect too much; it's more of a reminder to my future self than some mind-boggling new invention. The code can be found below. The resulting figure looks like this:

[Figure: forest plot of odds ratios with confidence intervals (fig_1_odds)]

Here comes the code. It takes the model and optionally a title as an input and generates the above plot.

 

 

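library(ggplot2)

# Plot odds ratios with confidence intervals for a fitted logistic regression
plot_odds <- function(x, title = NULL) {
  # Exponentiate coefficients and confidence limits to get odds ratios
  tmp <- data.frame(cbind(exp(coef(x)), exp(confint(x))))
  odds <- tmp[-1, ]  # drop the intercept
  names(odds) <- c("OR", "lower", "upper")
  odds$vars <- row.names(odds)
  # Axis ticks from 0.1 to 100 (0 is left out: it cannot be shown on a log scale)
  ticks <- c(seq(0.1, 1, by = 0.1), seq(2, 10, by = 1), seq(20, 100, by = 10))

  ggplot(odds, aes(y = OR, x = reorder(vars, OR))) +
    geom_point() +
    geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.2) +
    scale_y_log10(breaks = ticks, labels = ticks) +
    geom_hline(yintercept = 1, linetype = 2) +  # reference line at OR = 1
    coord_flip() +
    labs(title = title, x = "Variables", y = "OR") +
    theme_bw()
}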
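To see what plot_odds() works with before anything is drawn, here is a small sketch that builds the same odds-ratio table by hand. The data are simulated and the variable names (age, smoker, event) are made up for illustration; only base R is assumed.

```r
# Simulated data: all names and effect sizes are illustrative assumptions
set.seed(2013)
n <- 300
d <- data.frame(age = rnorm(n), smoker = rbinom(n, 1, 0.4))
d$event <- rbinom(n, 1, plogis(-0.5 + 0.6 * d$age + 0.9 * d$smoker))

# Logistic regression
fit <- glm(event ~ age + smoker, family = binomial, data = d)

# Odds ratios with confidence limits, intercept dropped
odds <- data.frame(exp(cbind(coef(fit), confint(fit))))[-1, ]
names(odds) <- c("OR", "lower", "upper")
odds
```

Each row is one predictor; an interval that excludes 1 corresponds to an error bar that does not cross the dashed line in the plot.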

 

P.S. I know about ggplot2's annotation_logticks(), but it messed up my graphics; also, it is not very often that ORs span more than three orders of magnitude. If they do, consider playing with ggplot2's function or updating the line beginning with "ticks <-" in the above example.

Update 29-01-2013: I replaced the nasty curly quotes (”) as they resulted in some nasty copy-paste errors.

Dump MySQL to CSV using R

Prompted by a related post on one of my favorite Python lists, I remembered that I wrote a similar snippet some time ago.

So if you want to dump your whole MySQL database to CSV files, you can recycle the following code:

library(RMySQL)
m <- MySQL()
summary(m)
# Connection details are placeholders; the socket path is only needed if you are using MAMP
con <- dbConnect(m, dbname = "YOURDB", host = "localhost", port = 8889,
                 username = "YOURUSER", password = "YOURPASS",
                 unix.socket = "/Applications/MAMP/tmp/mysql/mysql.sock")
tables <- dbListTables(con)

# Write every table to its own CSV file, named after the table
for (tab in tables) {
  temp <- dbReadTable(con, tab)
  write.csv(temp, paste0(tab, ".csv"), row.names = FALSE)
}
dbDisconnect(con)

This is also a great way to try out my new source-code plugin (WP-CodeBox).
Enjoy!

Visualizing small-scale paired data – combining boxplots, stripcharts, and confidence-intervals in R

Sometimes when working with small paired data sets it is nice to see/show all the data in a structured form. For example, when looking at pre-post comparisons, connected dots are a natural way to visualize which data points belong together. In R this can easily be combined with boxplots expressing the overall distribution of the data. This also has the advantage of being more faithful to non-normal data that is not correctly represented by means +/- 95% CI. I have not come across a good tutorial on how to make such a plot (although the right-hand plot borrows heavily from this post on the excellent R mailing list), so in the post you will find the code to generate such a graph in R.
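As a rough idea of what such a plot involves, here is a minimal base-graphics sketch. It is not the code from the full post; the pre/post values are simulated and all names are assumptions.

```r
# Simulated paired data (illustrative only)
set.seed(1)
pre  <- rnorm(12, mean = 50, sd = 8)
post <- pre + rnorm(12, mean = 4, sd = 5)

# Boxplots show the overall distribution at each time point
boxplot(list(pre = pre, post = post), border = "grey40", ylab = "Score")

# Overlay the individual observations on top of the boxes
stripchart(list(pre, post), vertical = TRUE, add = TRUE, pch = 16)

# Connect each pre value with its post value
segments(x0 = 1, y0 = pre, x1 = 2, y1 = post, col = "grey60")

# 95% confidence interval of the mean paired difference
ci <- t.test(post, pre, paired = TRUE)$conf.int
ci
```

The boxes sit at x = 1 and x = 2, which is why the connecting segments run between those two positions.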


RStudio the missing link between your brain and statistics

RStudio is a graphical user interface for R. Or, as the developers put it:

RStudio™ is a new integrated development environment (IDE) for R. RStudio combines an intuitive user interface with powerful coding tools to help you get the most out of R.

 

While there have been a few projects (e.g. RCommander, RkWard, JaguaR), RStudio is the first I will probably integrate into my workflow. The Mac GUI I work with is already great and has some essential features like syntax highlighting out of the box, but I will recommend RStudio to anyone considering starting to work with R, and to anyone else asking me about statistics.

I just want to highlight two features which can change the learning curve for R.

1. Getting data into R. RStudio has a nice import-dataset feature that can be used to read text files, something that can be really frustrating in the beginning.

2. Navigating the data. By just clicking a [...] I really hope that this will stay a read-only feature, because everything else is simply not the way to go.

Cons:

  • Umlauts are not yet integrated, but it seems like a matter of time with these guys.

Simulation studies in R – Reproducing MacCallum et al. (2002) “Effects of variable dichotomization”

I recently came across an excellent paper, “On the Practice of Dichotomization of Quantitative Variables” by MacCallum and colleagues (2002). As I use ANOVAs a lot in my research, it really got me thinking about the whole issue. Even though I have no great idea for an innovative simulation study, you might have one. If you read through this post, you will notice that it's really simple, at least the technical part.

I will only explain the two-variable scenario, but the setup is basically the same for more complex setups, e.g. with two independent variables. Let's start with their small numerical example before turning to the simulation study.
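To give a feel for the technical part, here is a minimal sketch of the core idea: simulate two correlated variables, split one at its median, and compare the correlations. The sample size, the correlation, and all variable names are my own assumptions for illustration, not values taken from the paper.

```r
set.seed(42)    # arbitrary seed for reproducibility
n   <- 10000    # sample size (assumption)
rho <- 0.40     # population correlation between x and y (assumption)

# Draw x and y so that cor(x, y) is rho in the population
x <- rnorm(n)
y <- rho * x + sqrt(1 - rho^2) * rnorm(n)

# Dichotomize x with a median split
x_split <- as.numeric(x > median(x))

# Correlation before and after dichotomization
r_full  <- cor(x, y)
r_split <- cor(x_split, y)
c(full = r_full, split = r_split, ratio = r_split / r_full)
```

With normally distributed data the median split is expected to shrink the correlation to roughly 80% of its original size, which is the loss of information a simulation study can then vary across conditions.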


Calculating Intercoder Reliability with ReCal

After categorizing data, it is sensible to calculate intercoder reliability. This can easily be done using the web service ReCal.

The only action required to use this service is to upload a CSV-file containing the codes. There are three different variants of the program: one for two coders with data at nominal scale level, one for three or more coders with data at nominal scale level, and one for any number of coders with data at ordinal, interval, or ratio scale level. The program is described in the following article:

Freelon, Deen G. (2010). ReCal: Intercoder Reliability Calculation as a Web Service. International Journal of Internet Science, 5(1), 20-33.


Visualizing geographic data with R


Much of the data one collects is tied to a location. In R, such data can be displayed very nicely with the sp package. All you need is:

  1. Map material, that is, descriptions of the boundaries. A very good database for this is the GADM database of Global Administrative Areas. For Germany it offers a map of all districts (Kreise), stored as an R data object. Nice, isn't it?
  2. Data with a local reference. A lot of it can be found in the Genesis database of the Statistikportal, the joint data service of the statistical offices. I decided to simply plot the number of hospital beds. Here is the result:

 

 


G*Power 3

Once in a while one has to determine the power of an experiment, the effect size, or the adequate sample size. Nearly all statistical programs allow you to determine the effect size of an experiment and/or its power. But why fire up SPSS when a small tool would do as well, if not better?

G*Power 3 offers different types of statistical power analysis. Typically one uses a-priori or post-hoc analyses. G*Power 3 is available for Mac OS X 10.4 and Windows XP/Vista. It has been described in

Faul, F., Erdfelder, E., Lang, A., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2), 175-191.

It can be downloaded here.
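For readers who want to cross-check the two typical analyses in R, here is a small sketch using power.t.test() from the stats package. This is not G*Power itself; the effect size, alpha, and sample sizes are assumptions chosen for illustration.

```r
# A-priori: how many participants per group are needed to detect a
# medium effect (d = 0.5) with 80% power in a two-sample t-test?
a_priori <- power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80)
ceiling(a_priori$n)   # participants per group, rounded up

# Post-hoc: what power did a study with 20 participants per group have
# for the same effect size?
post_hoc <- power.t.test(n = 20, delta = 0.5, sd = 1, sig.level = 0.05)
post_hoc$power
```

The a-priori call leaves n unspecified and solves for it; the post-hoc call leaves power unspecified instead.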

All together now – Confirmatory Factor Analysis in R

Describing multivariate data is not easy, especially if you think that statisticians have not developed any new tools since ANOVA and principal component analysis (PCA). For social and experimental scientists the most important new technique is structural equation modeling, which combines measurement models (which replace reliability analysis and PCA) and structural models (which replace ANOVAs or regressions).

At present, three R packages provide the functionality to estimate structural equation models.

  • sem: The first package to provide the ability to fit structural equation models in R.
  • OpenMx: Has a large number of active developers, draws upon well-established code for fitting the models (Mx), can fit non-standard models, and is the first to announce version 1.0.
  • lavaan: Aims at a very easy-to-use implementation of SEM that also incorporates advanced techniques (e.g. Full Information Maximum Likelihood Estimation, and multiple-group confirmatory factor analysis).
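As an impression of how compact a confirmatory factor analysis can look, here is a minimal sketch with lavaan. It assumes the lavaan package is installed and uses the HolzingerSwineford1939 example data that ships with it; the three-factor structure is the package's standard tutorial model, not one from this post.

```r
library(lavaan)

# Measurement model: three latent factors, each measured by three indicators
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

# Fit the CFA and print estimates together with fit measures
fit <- cfa(model, data = HolzingerSwineford1939)
summary(fit, fit.measures = TRUE)
```

The "=~" operator reads as "is measured by", so each line of the model string defines one latent variable.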


Visualizing data – R

By now, most people should have understood that R is an extremely powerful program for calculating all kinds of statistics, though someone still has to implement Bowker's test. What might be less well known is that it has many extremely flexible built-in functions and well-supported add-on packages for generating graphics.

Sheldon might say:

“R uses the world's most powerful graphics chip – your imagination”


R: Reading in and checking data

For beginners, getting data into R before it can be manipulated is often a big problem. Below I present my favorite commands for this. They are not very elegant, but they are efficient.

If anyone knows further elegant steps: leave a comment!

The following uses the package foreign as well as the functions setwd(), require(), read.table(), read.spss(), names(), str(), summary(), unique(), table(), and attach().
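A minimal sketch of how some of these functions fit together is shown here. The small text file is written on the fly so the example is self-contained; its contents and column names are made up.

```r
# Create a tiny whitespace-separated example file (made-up data)
path <- tempfile(fileext = ".txt")
writeLines(c("id group score",
             "1 a 12",
             "2 b 15",
             "3 a 9",
             "4 b 20"), path)

# Read the file; the first line holds the column names
dat <- read.table(path, header = TRUE, stringsAsFactors = TRUE)

# Check what was read
names(dat)         # column names
str(dat)           # type of each column
summary(dat)       # quick descriptive overview
unique(dat$group)  # distinct values of a variable
table(dat$group)   # frequency of each value
```

Running these checks right after import catches most problems early, such as numbers read as text or unexpected category labels.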


R: Linear mixed effects models

Introduction
Linear mixed effects modeling is a fine thing, especially when, as is so often the case in psycholinguistic experiments, you have to analyze complex data. These models are described very well in:

Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59(4), 390-412.

The following introduction requires the packages languageR and foreign and uses the functions read.spss(), lmer(), and pvals.fnc().
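To illustrate what a model with crossed random effects for subjects and items looks like, here is a minimal sketch on simulated reaction times. It loads lme4 directly (the package that provides lmer()), leaves out p-values, and every number and variable name in it is an assumption for illustration.

```r
library(lme4)

# Simulated design: every subject responds to every item (crossed factors)
set.seed(123)
n_subj <- 20
n_item <- 16
d <- expand.grid(subject = factor(1:n_subj), item = factor(1:n_item))

# Condition varies between items, coded -0.5 / 0.5
d$cond <- ifelse(as.integer(d$item) %% 2 == 0, 0.5, -0.5)

# Random intercepts for subjects and items plus residual noise
subj_eff <- rnorm(n_subj, sd = 40)
item_eff <- rnorm(n_item, sd = 20)
d$rt <- 600 + 30 * d$cond +
  subj_eff[as.integer(d$subject)] +
  item_eff[as.integer(d$item)] +
  rnorm(nrow(d), sd = 50)

# Fixed effect of condition, random intercepts for subjects and items
fit <- lmer(rt ~ cond + (1 | subject) + (1 | item), data = d)
summary(fit)
```

The two "(1 | ...)" terms are what make the random effects crossed: each observation belongs to one subject and one item at the same time.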
