Hi folks,

Not that we have covered all the open-source projects worth mentioning yet. However, before I catch enough breath for the next longer post, I wanted to share some thoughts on file names. As long as you start and finish your work in one sweep, there isn’t much need for a systematic approach. However, if you, like probably everyone reading this blog, are working on projects that span more than a day (think manuscripts, or the data from your latest study), you need a system to keep track of versions and changes.

If all goes well, you can tell stories with file names.

How I name my own files

Filenames consist of four parts:

  1. A title: e.g. “science_paper”

  2. A version-number, e.g. “_3”

  3. A short description of what has changed, e.g. “_made_up_data”

  4. The ending indicating the file-type, e.g. “.doc”

So the final name is: “science_paper_3_made_up_data.doc”.

This way I can quickly find the most up-to-date file by sorting the file names, and I know what I did last. Some colleagues use the date instead of an integer to keep track of the latest version. In my opinion integers work better, because they also give you a sense of how often you have worked on a specific file. If the number goes up into the 20s, it may be a good idea to reflect on what is going wrong. However, if you really have to use the date, I suggest the yyyy-mm-dd format, because it also lets you quickly find the latest file by sorting.
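A minimal Python sketch of the four-part scheme, assuming the version is the first all-digit part after the title. One caveat when sorting: plain alphabetical order puts “_10” before “_2”, so the sketch compares versions as integers (zero-padding such as “_03” would be an alternative). The helper names are hypothetical.

```python
import re

# title_version_description.ext, e.g. science_paper_3_made_up_data.doc
NAME_RE = re.compile(r"^(?P<title>.+?)_(?P<version>\d+)_(?P<desc>.+)\.(?P<ext>[^.]+)$")


def make_name(title, version, description, ext):
    """Assemble a file name from the four parts."""
    return f"{title}_{version}_{description}.{ext}"


def latest_version(filenames):
    """Return the file with the highest version number (None if nothing matches).

    Versions are compared as integers, so version 10 beats version 9 even though
    "science_paper_10..." sorts before "science_paper_9..." alphabetically.
    """
    best, best_version = None, -1
    for name in filenames:
        match = NAME_RE.match(name)
        if match and int(match["version"]) > best_version:
            best, best_version = name, int(match["version"])
    return best


def latest_by_date(filenames):
    """With yyyy-mm-dd dates instead of integers, plain sorting is enough.

    Assumes all files share the same title prefix.
    """
    return max(filenames) if filenames else None
```

For example, `latest_version(["science_paper_9_typos.doc", "science_paper_10_new_intro.doc"])` picks the version-10 file, while a plain alphabetical sort would put the version-9 file last.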

How I like working with others on files

The major problem when working with others is not so much a clash of file-naming conventions as ignorance of how important it is not to mix up files. So the first thing is to agree on keeping to a system. This way you also decrease the likelihood of being called anal behind your back.

I usually add a fifth part to the file name after the version number: the author’s initials.

  1. A title: e.g. “science_paper”

  2. A version-number, e.g. “_3”

  3. Author, e.g. “_gh”

  4. A short description of what has changed, e.g. “_made_up_data”

  5. The ending indicating the file-type, e.g. “.doc”

So the final name is: “science_paper_3_gh_made_up_data.doc”.

Again there is a small question about the version numbers, specifically who is in charge of changing them. The two options are:

  • Everyone gives a new number.

  • Only the lead-author / lead-analyst gives new numbers.

I tend to prefer the latter, especially in the final stages of a paper when you want to approve a specific final version for submission. But this only works as long as everyone agrees to comment only once on a specific version.
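With author initials in the name, a short script can check the agreement that everyone comments only once on a given version. This is a hypothetical sketch, assuming the initials are the part right after the all-digit version number and contain no underscore; the function names are made up for illustration.

```python
import re
from collections import Counter

# title_version_author_description.ext, e.g. science_paper_3_gh_made_up_data.doc
NAME_RE = re.compile(
    r"^(?P<title>.+?)_(?P<version>\d+)_(?P<author>[^_]+)_(?P<desc>.+)\.(?P<ext>[^.]+)$"
)


def parse_name(filename):
    """Split a five-part file name into its parts (None if it does not match)."""
    match = NAME_RE.match(filename)
    if match is None:
        return None
    parts = match.groupdict()
    parts["version"] = int(parts["version"])
    return parts


def repeated_comments(filenames):
    """Return the (version, author) pairs that occur more than once."""
    counts = Counter()
    for name in filenames:
        parts = parse_name(name)
        if parts is not None:
            counts[(parts["version"], parts["author"])] += 1
    return sorted(pair for pair, n in counts.items() if n > 1)
```

An empty result from `repeated_comments()` means nobody has sent two commented files for the same version.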

Cheerio, and sorry again for being too German.

[de] Sometimes you want to play an audio file to the participants of an online study. However, this does not always work perfectly with common survey software. Sometimes you would also like to store the audio files on your own server and just link to them from the survey software. This is where the Flash MP3 Player can help.

There are several versions of this player. So far the mini version has been enough for me; how to use it to play an audio file (and thereby embed it in an online survey) is described here.

Pro: It is really easy, and the description and examples are very helpful. The code generator on the website is another good aid.

Contra: From my point of view there are no cons.

[/de] [en] Sometimes you would like to play an audio file to your participants in an online survey. But not every type of audio works perfectly in standard online survey software, and sometimes you would just like to link to the audio files on your own server. Here you could use the Flash MP3 Player.

There are different versions of the player available. For my tasks, even the mini player has been sufficient so far. How to present audio online with this player (and thus embed it in an online survey) is described here.

Pro: It is really easy and the given examples are very helpful. The code generator on the website is an additional helpful tool.

Con: From my point of view there are no cons. [/en]

In this post I want to share with you some code for using OpenStreetMap maps as a backdrop for a data visualization. We will use the RgoogleMaps package for R. In the following I will show you how to make this graph.

1. Download the map

I wanted to take a closer look at an area around my former neighborhood, which is in Bochum, Germany.

lat_c<-51.47393
lon_c<-7.22667
bb<-qbbox(lat = c(lat_c[1]+0.01, lat_c[1]-0.01), lon = c(lon_c[1]+0.03, lon_c[1]-0.03))

Once this is done, you can download the corresponding OpenStreetMap tile with the following line.

OSM.map<-GetMap.OSM(lonR=bb$lonR, latR=bb$latR, scale = 20000, destfile="bochum.png")

2. Add some points to the graphic

Your second step will most likely be adding points to the map. I chose the following two:

lat <- c(51.47393, 51.479021)
lon <- c(7.22667, 7.222526)
val <- c(10, 100)

As the R package was mainly built for Google Maps, the coordinates need to be adjusted by hand. I wrote the following functions, which take the minimum and maximum values from the bounding box of the downloaded map.

lat_adj<-function(lat, map){(map$BBOX$ll[1]-lat)/(map$BBOX$ll[1]-map$BBOX$ur[1])}
lon_adj<-function(lon, map){(map$BBOX$ll[2]-lon)/(map$BBOX$ll[2]-map$BBOX$ur[2])}
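The two helpers express a point’s position as a fraction of the map’s bounding box: 0 at the lower-left corner (ll) and 1 at the upper-right corner (ur), since (ll - x)/(ll - ur) equals (x - ll)/(ur - ll). A Python sketch of the same arithmetic, assuming the bounding box is given as (lat, lon) pairs like map$BBOX$ll and map$BBOX$ur; the function names are hypothetical:

```python
def rel_position(value, lower, upper):
    """Fraction of the way from lower to upper (0 at lower, 1 at upper).

    Same arithmetic as the R helpers: (ll - value) / (ll - ur).
    """
    return (lower - value) / (lower - upper)


def adjust(lat, lon, bbox_ll, bbox_ur):
    """Map a (lat, lon) point into bounding-box-relative coordinates.

    bbox_ll and bbox_ur are (lat, lon) tuples, mirroring map$BBOX$ll and map$BBOX$ur.
    """
    return (
        rel_position(lat, bbox_ll[0], bbox_ur[0]),
        rel_position(lon, bbox_ll[1], bbox_ur[1]),
    )
```

With an illustrative box of ±0.01° latitude and ±0.03° longitude around the centre used above, the centre point itself maps to roughly (0.5, 0.5).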

Now you can add the points to the map. If you want them to convey anything, it helps to specify an alpha (transparency) level and to vary some aspect of the points, e.g. size, color or alpha, according to a variable of interest.

PlotOnStaticMap(OSM.map, lat = lat_adj(lat, OSM.map), lon = lon_adj(lon, OSM.map), col=rgb(200,val,0,85,maxColorValue=255),pch=16,cex=4)
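The col argument packs the transparency into the color itself: with maxColorValue = 255, rgb(200, val, 0, 85) yields a hex string of the form #RRGGBBAA, where an alpha of 85 out of 255 means about one third opacity. A Python sketch of that encoding, meant to mirror (not replace) R’s rgb() for integer channel values; the function name is hypothetical:

```python
def rgb_hex(red, green, blue, alpha=None):
    """Encode 0-255 channel values as #RRGGBB, or #RRGGBBAA when alpha is given."""
    channels = [red, green, blue] if alpha is None else [red, green, blue, alpha]
    for value in channels:
        if not (isinstance(value, int) and 0 <= value <= 255):
            raise ValueError(f"channel value {value!r} is not an integer in 0..255")
    # Two uppercase hex digits per channel, as R prints them.
    return "#" + "".join(f"{value:02X}" for value in channels)
```

For the two points above, `[rgb_hex(200, v, 0, 85) for v in (10, 100)]` gives `['#C80A0055', '#C8640055']`: the green channel carries the variable of interest while the alpha stays fixed.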

Here is the full code:

require(RgoogleMaps)

#define the part of the world you want to plot. Here the area around my former home.
lat_c<-51.47393
lon_c<-7.22667
bb<-qbbox(lat = c(lat_c[1]+0.01, lat_c[1]-0.01), lon = c(lon_c[1]+0.03, lon_c[1]-0.03))

# download the tile from OSM
OSM.map<-GetMap.OSM(lonR=bb$lonR, latR=bb$latR, scale = 20000, destfile="bochum.png")
#Add some coordinates
lat<- c(51.47393, 51.479021)
lon<- c(7.22667, 7.222526)
val <- c(0, 255)

#function to adjust the coordinates
lat_adj<-function(lat, map){(map$BBOX$ll[1]-lat)/(map$BBOX$ll[1]-map$BBOX$ur[1])}
lon_adj<-function(lon, map){(map$BBOX$ll[2]-lon)/(map$BBOX$ll[2]-map$BBOX$ur[2])}

PlotOnStaticMap(OSM.map, lat = lat_adj(lat, OSM.map), lon = lon_adj(lon, OSM.map), col=rgb(255,0, val,90,maxColorValue=255),pch=16,cex=4)

dev.print(jpeg,"test.jpeg", width=1204, height=644, units="px")

For many, the main reason to use R is to generate really good-looking, or at least informative, graphics. However, while it is easy to find information on how to make an individual plot, it can take some time to find out how to get plots out into the world. Here is my four-step program for turning your plot into a graphics file.

In the following I will use my present favorite plot from here as an example.

1. Set your options

R allows you to set many general options for your plots, e.g. the margins and whether or not a box should be drawn around the plot; most of them are documented here.

My favorites are:

  • mfrow: To combine several plots into one (not necessary for the example).
  • mar: To control the margins of the plot (not necessary for the example).
  • las: To rotate the axis labels (not necessary for the example).

par(mar=c(2,0,2,2))

2. Make your plot

Well, this part is the most heterogeneous: just take a peek at the gallery for some inspiration, or dive into ggplot2, a very comprehensive graphics framework that also helps you add legends.

pie(c(1,1), labels="", col=c("black", "white"), main="Your options according to Yoda", init.angle=90)

3. Add a legend

R has a built-in function to add legends, legend(). The full documentation can be found here. The options I use almost every time are:

  • x, y: To tell R where to put the legend. Usually I use a keyword for the location (e.g. “topleft”) instead of x and y coordinates.

  • legend: To add some descriptions for the colors/line-types/shadings.

  • fill: To select the fill colors (or, alternatively, lty for the line types).

  • bty: To get rid of the box around the legend

legend("right", c("do", "do not", "try"), fill=c("black", "white", "gold"), bty="n", cex=1.4)

4. Save it to a file

R has various options to save plots to files, as documented here. I most often save them as PNG, as TIFF files are much larger at the same quality. The png device lets you set the following options:

  • filename: well, something with “.png” at the end.

  • width and height: To control the size of the image.

  • units: To specify the unit of width and height (“px”, “in”, “cm” or “mm”).

  • res: To set the resolution in dpi. Journals love images with at least 300 dpi.

  • bg: For a non-transparent background, simply use “white”.

dev.print(png, "yoda.png", width=8, height=6, units="in", res=300, bg="white")
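With units = "in", the pixel size of the saved file is simply the size in inches times res. A tiny Python check of that arithmetic for the call above (8 × 6 inches at 300 dpi), using a hypothetical helper name:

```python
def pixel_size(width_in, height_in, dpi):
    """Pixel dimensions of an image of width_in x height_in inches at dpi."""
    return round(width_in * dpi), round(height_in * dpi)


# The dev.print() call above: 8 x 6 inches at res = 300.
width_px, height_px = pixel_size(8, 6, 300)
print(width_px, height_px)  # 2400 1800
```

Working backwards is just as easy: a figure that must be 2400 pixels wide at 300 dpi needs width = 8 with units = "in".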

Enjoy the complete script

par(mar=c(2,0,2,2))
pie(c(1,1), labels="", col=c("black", "white"), main="Your options according to Yoda", init.angle=90)
legend("right", c("do", "do not", "try"), fill=c("black", "white", "gold"), bty="n", cex=1.4)
dev.print(png, "yoda.png", width=8, height=6, units="in", res=300, bg="white")

I started working with EndNote about 8 years ago as part of my first job at university. A big chunk of my work was sorting and filing papers for my professor. Even though it was a great way to become familiar with all the different topics, I always dreamed about all the other things I could have done during those hours. However, as long as the gold standard for time-efficient referencing was querying the Social Sciences Citation Index from within EndNote (and only if you had a pricey subscription), these remained dreams.

With Zotero these dreams have finally come true. Zotero is a plugin for Firefox that helps you organize your academic literature. It can be installed even when you do not have administrator privileges, e.g. on your work PC. I will not talk about all the nice features of Zotero, such as importing existing databases, quickly adding references while browsing, or sharing libraries with your colleagues, workmates and friends.

Today I want to focus on an add-on, Zotpress, that allows you to insert your whole library, a collection, or papers tagged with a specific keyword into your WordPress blog. The advantages are:

  • You only have to update one library, which you can also use to write your next paper.

  • As you can choose to show only papers by specific authors, you can easily manage and generate overlapping citation lists for a whole workgroup whose members specialize in different topics.

How to use Zotpress

To use it, you first need Zotero with some references in your library, and a WordPress blog.

1. Install zotpress

Just go to the “plugin” menu in your dashboard and search for “zotpress”. Follow the white rabbit and you are done!

2. Connect to your Zotero library

This is the only step that took me some time, as I could not figure out how to generate a “public key”. This can be done right here on the Zotero website, where you can also see your userID. These two pieces of information then have to be entered on the Zotpress plugin page in WordPress: click on Zotpress and then “Zotero Accounts”. And you’re done!

3. Add a bibliography to a post

There are many ways to add a bibliography to a post. You can either use in-text citations, which will also be correctly formatted according to APA, or you can use the Zotpress shortcode to insert a bibliography at a specific place.

The easiest way to test your setup is to just put a whole collection into your post. To find out the ID of a specific collection, go to the “citations” tab in the Zotpress plugin preferences. If you enter the following line in square brackets, it will print your collection (in this case my 2011 publications) sorted by date of publication. You can also show only five articles from the collection (by adding limit="5"), or only a specific year (by adding year="2010").

zotpress collection="COLLECTION_ID" sortby="date" year="2011"

[zotpress collection="8S2F7Z2Q" sortby="date" year = "2011"]
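If you insert many such lists, the shortcode can also be assembled programmatically. A hypothetical Python helper, restricted to the parameters mentioned above (collection, sortby, year, limit); it always writes straight double quotes, because curly quotes like the ones a blog editor tends to insert will most likely not be recognised as attribute delimiters:

```python
def zotpress_shortcode(collection, sortby="date", year=None, limit=None):
    """Build a [zotpress ...] shortcode string with straight double quotes."""
    attrs = {"collection": collection, "sortby": sortby}
    if year is not None:
        attrs["year"] = year
    if limit is not None:
        attrs["limit"] = limit
    body = " ".join(f'{key}="{value}"' for key, value in attrs.items())
    return f"[zotpress {body}]"
```

For example, `zotpress_shortcode("COLLECTION_ID", year=2011, limit=5)` returns `[zotpress collection="COLLECTION_ID" sortby="date" year="2011" limit="5"]`.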

After working more seriously with simulations, I noticed that my previous setup needed some updates. The three most notable are:

  • It is very handy to call the different scenarios explicitly instead of using nested loops.

  • Storing intermediate results in separate files obviates the need to rerun an almost finished but crashed analysis, and it clearly separates data generation from analysis.

  • Using all available cores can speed up processing, but may make the simulation non-reproducible.

So here is my new simulation-study skeleton, which consists of five parts:

  1. Preamble: Load all the packages and functions that are required.

  2. Simulation function: This is the part that will most likely be much more complicated in your case. Define the steps that will be repeated for the different scenarios. The parameters of this function will be filled in from the scenarios.

  3. Scenario description: Explicitly list the range of values that should be passed to the simulation function.

  4. Run the analysis: Here you pass all the scenario descriptions to your simulation function, either on one core or on all available cores. In any case you should set a random seed to make the simulation reproducible.

  5. Analyze the outputs: Not shown here, but you probably

Here is the complete script:

# 1. Preamble
setwd("c:/temp/")
require(doSNOW)  # also loads foreach and snow
require(rlecuyer)  # parallel random-number streams used by clusterSetupRNG

# 2. Simulation-function
sim_fun<-function(a,b,c){
results<-matrix(NA, 1000,4)
for(i in 1:1000){
#a=1; b=2;c=3;i=1
results[i,1:3]<-cbind(a, b, c)
results[i,4]<-mean(rnorm(100))#THIS MAY BE MORE COMPLEX FOR YOU HEHE!
}
# paste0() avoids the spaces that paste() would put into the file name
write.csv(results, file=paste0(a, "_", b, "_", c, "_res.csv"), row.names=FALSE)
}

# 3. Scenario-Description
a<-seq(10, 100, 20)
b<-seq(20, 100, 30)
c<-seq(30, 200, 40)
scenarios<-expand.grid(a, b, c)

# 4.a Run the analysis on one core
set.seed(29012001)
for(i in 1:length(scenarios[,1])){sim_fun(scenarios[i,1], scenarios[i,2], scenarios[i,3])}


# 4.b Run the analysis on all available cores
cluster<-makeCluster(4, type = "SOCK")
clusterSetupRNG(cluster, seed = 29012001) 
registerDoSNOW(cluster)

foreach(i= 1:length(scenarios[,1])) %dopar% {sim_fun(scenarios[i,1], scenarios[i,2], scenarios[i,3])}

# compare the time
system.time(
for(i in 1:length(scenarios[,1])){sim_fun(scenarios[i,1], scenarios[i,2], scenarios[i,3])}
)

system.time(
foreach(i= 1:length(scenarios[,1])) %dopar% {sim_fun(scenarios[i,1], scenarios[i,2], scenarios[i,3])}
)

# shut down the worker processes when you are done
stopCluster(cluster)
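For comparison, the same five-part skeleton can be sketched in Python. This is not a translation of the doSNOW setup: instead of L’Ecuyer random-number streams it swaps in a simpler technique, deriving a separate seed from each scenario’s parameter values, so the results stay the same no matter how many processes are used or in which order scenarios run. The “mean of 100 normal draws” model is the same placeholder as in the R script, and all function names are illustrative.

```python
import csv
import itertools
import os
import random
from concurrent.futures import ProcessPoolExecutor

BASE_SEED = 29012001  # same seed value as the R script


# 2. Simulation function: one scenario in, one CSV file out.
def sim_fun(a, b, c, out_dir, n_reps=1000, skip_existing=True):
    path = os.path.join(out_dir, f"{a}_{b}_{c}_res.csv")
    if skip_existing and os.path.exists(path):
        return path  # finished in an earlier (possibly crashed) run
    # Seed from the scenario values, so the draws do not depend on which
    # worker runs the scenario or in which order scenarios are processed.
    rng = random.Random(f"{BASE_SEED}-{a}-{b}-{c}")
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["a", "b", "c", "mean"])
        for _ in range(n_reps):
            draws = [rng.gauss(0.0, 1.0) for _ in range(100)]  # placeholder model
            writer.writerow([a, b, c, sum(draws) / len(draws)])
    return path


# 3. Scenario description, mirroring the seq() calls in the R script.
def make_scenarios():
    a = range(10, 101, 20)  # 10, 30, 50, 70, 90
    b = range(20, 101, 30)  # 20, 50, 80
    c = range(30, 201, 40)  # 30, 70, 110, 150, 190
    return list(itertools.product(a, b, c))


# 4. Run all scenarios, sequentially or on several processes.
def run_all(scenarios, out_dir, workers=1, n_reps=1000):
    os.makedirs(out_dir, exist_ok=True)
    if workers == 1:
        return [sim_fun(a, b, c, out_dir, n_reps) for a, b, c in scenarios]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(sim_fun, a, b, c, out_dir, n_reps) for a, b, c in scenarios]
        return [future.result() for future in futures]
```

On macOS and Windows, call `run_all(make_scenarios(), "sim_results", workers=4)` from inside an `if __name__ == "__main__":` block, because ProcessPoolExecutor starts fresh interpreters there. The scenario order differs from expand.grid(), but with per-scenario seeds that does not matter.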

There are also other tutorials on how to run simulations in R. The one I liked most was Roger Koenker’s “A simple protocol for simulations in R” (accessible here), which relies more heavily on R’s built-in features to solve some of these problems.

I started offloading some of the heavy lifting and the more time-consuming jobs to my server. While I am more and more comfortable using the ssh terminal, the major problem was that scripts only kept running as long as I was logged in, which goes completely against the whole idea of offloading work to the server. After some googling I found the small tool “screen”, which is available for Linux. A more comprehensive introduction can be found here, but after installing it I only used three commands: “screen” to start a session, “screen -ls” to list the open terminals after reconnecting, and “screen -r XXX” to reconnect to a running terminal.

Installation

With my Linux (Ubuntu) server, all I had to do to install it was a simple

$ sudo apt-get install screen

Use

After that you can start screen by simply typing:

$ screen

Now start your time-consuming job and just close the terminal whenever you can’t stand waiting any longer (or detach explicitly with Ctrl-a d). Whenever you want to have a look again, just connect via ssh and get a list of open screens:

$ screen -ls

This gave me:

3467.pts-0.MYSERVER (09/14/2011 08:59:46 PM) (Detached)
2238.pts-0.MYSERVER (09/14/2011 08:45:13 PM) (Attached)
2 Sockets in /var/run/screen/S-gerhi.

To reconnect to the first of these still-running terminals, just enter:

$ screen -r 3467.pts-0.MYSERVER
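When many jobs are running, it can help to pick out the detached sessions automatically. A Python sketch that parses `screen -ls` output, assuming the line format shown above (session name, an optional date in parentheses, then the state in parentheses); the function names are hypothetical, and the exact output may differ between screen versions:

```python
import re
import subprocess

# e.g. "3467.pts-0.MYSERVER (09/14/2011 08:59:46 PM) (Detached)"; the date part is optional
SESSION_RE = re.compile(r"^\s*(\S+)\s+(?:\([^)]*\)\s+)?\((Attached|Detached)\)")


def detached_sessions(listing):
    """Return the names of detached sessions in `screen -ls` output."""
    names = []
    for line in listing.splitlines():
        match = SESSION_RE.match(line)
        if match and match.group(2) == "Detached":
            names.append(match.group(1))
    return names


def list_screens():
    """Run `screen -ls` and return its text output (needs screen installed)."""
    # screen -ls may exit with a non-zero status even when it lists sessions,
    # so the return code is deliberately not checked here.
    result = subprocess.run(["screen", "-ls"], capture_output=True, text=True)
    return result.stdout
```

On the server, `detached_sessions(list_screens())` would return names such as `3467.pts-0.MYSERVER`, which can be passed straight to `screen -r`.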

[de]Dropbox is a web service that keeps your own files at hand wherever you are and lets you share them easily with others. Dropbox provides a network file system for synchronizing files between different computers and users, and thereby also provides an online backup.

To use it, you have to install the Dropbox client on your computer. This creates a new folder on your computer (the Dropbox) in which you can store files. You can then access these files from other computers on which you install Dropbox, and they are also available at any time via the Dropbox website. Overall, the user interface is very pleasant.

Pro

  • Additional folders can be created inside the Dropbox and shared as you like with friends or colleagues (who also need to have Dropbox installed). This lets you exchange files or work on joint projects.

  • The data are synchronized via a server that also stores them, so not all computers have to be running at the same time. The data are simply updated as soon as an associated computer goes online.

  • Dropbox lets you set the bandwidth manually, so it does not take up your entire internet connection.

  • Two predefined folders, Photos and Public, make it possible to share files with people who do not use Dropbox.

  • Dropbox works with the following operating systems: Windows, Mac, Linux, iPad, iPhone, Android and BlackBerry.

Cons

  • The free version includes an initial storage capacity of only 2 GB; however, the storage space can be increased by referring new members or through other promotions (currently, for example, by verifying that you are a member of a university via a university e-mail address).

  • On the server side, files are encrypted with AES-256. Users cannot create their own keys for this encryption. However, it is possible, for example, to synchronize files encrypted with TrueCrypt via the Dropbox service.

[/de]

[en]Dropbox is a web-based service that gives you access to your files from different places and lets you share them with others. It synchronizes files between different computers and users, and also serves as an online data backup.

To use Dropbox, you have to install the Dropbox client. Afterwards, a new folder (the Dropbox) is created in which you can store your files. You can access these files from other computers (on which you have installed Dropbox) and retrieve them from the Dropbox website.

Pro

  • You can create new folders in the Dropbox and share them with friends or colleagues (who have installed Dropbox themselves). Then you can share and exchange data or work on projects together.

  • The file synchronization runs via a server, which also stores the data. So the involved computers don’t have to be running at the same time. The data are updated as soon as an involved computer goes online.

  • Dropbox lets you adjust the bandwidth manually, so it won’t take up your whole internet connection.

  • Through the predefined folders “Photos” and “Public” you can share files with people who are not using Dropbox.

  • Dropbox works with the following operating systems: Windows, Mac, Linux, iPad, iPhone, Android and BlackBerry.

Cons

  • The free version comes with an initial storage capacity of 2 GB. But by referring new members or through other promotions (like proving that you are a member of a university with your university email address) you can increase your storage capacity.

  • The server encrypts the files with AES-256. Users cannot supply their own keys. But you can, for example, synchronize files with Dropbox that are encrypted with TrueCrypt. [/en]

A related post on one of my favorite Python lists reminded me that I wrote a similar snippet some time ago.

So if you want to dump your whole MySQL database to csv-files you can recycle the following code:

require(RMySQL)
m<-MySQL()
summary(m)
con<-dbConnect(m, dbname = "YOURDB", host="localhost", port=8889, username="YOURUSER", password="YOURPASS", unix.socket="/Applications/MAMP/tmp/mysql/mysql.sock") # in case you are using MAMP
tables<-dbListTables(con)

# write each table to its own comma-separated file, e.g. mytable.csv
for (i in seq_along(tables)){
temp<-dbReadTable(con, tables[i])
write.csv(temp, paste0(tables[i], ".csv"), row.names=FALSE)}

dbDisconnect(con)

This is also a great opportunity to use my new source-code plugin (WP-CodeBox). Enjoy!
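Since the idea came from a Python list, here is a rough Python counterpart. To keep it self-contained it uses the standard-library sqlite3 module instead of MySQL, which is a swap, not the setup above; with a MySQL driver that follows the Python DB-API the loop would look much the same, but the table list would come from SHOW TABLES instead of sqlite_master. It writes one comma-separated file per table, with a header row; the function name is illustrative.

```python
import csv
import os
import sqlite3


def dump_tables(conn, out_dir):
    """Write every user table of an SQLite connection to <out_dir>/<table>.csv."""
    os.makedirs(out_dir, exist_ok=True)
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    paths = []
    for table in tables:
        # Quote the identifier so unusual table names do not break the query.
        quoted = '"' + table.replace('"', '""') + '"'
        cursor = conn.execute(f"SELECT * FROM {quoted}")
        path = os.path.join(out_dir, f"{table}.csv")
        with open(path, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow([col[0] for col in cursor.description])
            writer.writerows(cursor)
        paths.append(path)
    return paths
```

Usage is a one-liner, e.g. `dump_tables(sqlite3.connect("mydata.db"), "csv_dump")`, with the database file name standing in for your own.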

[en]This is a short introduction to packaging Python scripts into an easy-to-distribute Mac app.

Essentially I followed two tutorials: the first on how to turn your Python script into a self-contained app, the second on how to make compressed disk images for the Mac. The final result as well as the files necessary to reproduce the steps can be downloaded here. A detailed description can be found in the remainder of this post.

0. Make a nice script

I used my previous Python script that batch-queries Google Maps to gather distances and travel times between a start and an end address. I added two GUIs: one that asks for the input file and one that asks for the location where the results should be stored.

1. Use py2app to compile the binary

The package py2app (install/upgrade with “easy_install -U py2app”) automatically gathers all necessary packages and builds an app. It is used in two steps:

1.1. Make a setup.py file:

This can be done with the following command:

py2applet --make-setup Batchtraveller.py

As mine resulted in errors, I had to set the option argv_emulation to False:
"""
This is a setup.py script generated by py2applet

Usage:
    python setup.py py2app
"""

from setuptools import setup

APP = ['Batchtraveller.py']
DATA_FILES = []
OPTIONS = {'argv_emulation': False}  # set to False because of the errors mentioned above

setup(
    app=APP,
    data_files=DATA_FILES,
    options={'py2app': OPTIONS},
    setup_requires=['py2app'],
)

1.2. Build the app:

To build the app (with a nice icon), the following command was used:

python setup.py py2app --iconfile browser.icns

This results in two new folders, “build” and “dist”, the latter of which contains Batchtraveller.app.

2. Make a “standard” compressed *.dmg

Instead of just giving the application away, I wanted to make a compressed disk image that contains the app, the licence information and a shortcut to the Applications folder. So here is what I did.

  1. I made a new disk image of roughly the same size as the files, using the Mac Disk Utility.

  2. Then I copied the *.app and licence.txt files into the disk image, and made a shortcut to the Applications directory.

  3. I used Disk Utility again to convert the disk image to a compressed dmg.[/en]
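The Disk Utility steps can also be scripted. A minimal Python sketch, assuming macOS’s command-line hdiutil tool (which the steps above do not use) and the dist/ layout produced by py2app; the staging-folder and volume names are illustrative. It copies the app and licence into a staging folder, adds an Applications symlink as the shortcut, and lets hdiutil create a compressed (UDZO) image in one step instead of the create-then-convert route:

```python
import os
import shutil
import subprocess


def stage_dmg_folder(app_path, licence_path, staging_dir):
    """Copy the app and licence into staging_dir and add an Applications link.

    staging_dir must not already contain a copy of the app.
    """
    os.makedirs(staging_dir, exist_ok=True)
    target = os.path.join(staging_dir, os.path.basename(app_path))
    shutil.copytree(app_path, target, symlinks=True)  # keep symlinks inside the bundle
    shutil.copy2(licence_path, staging_dir)
    link = os.path.join(staging_dir, "Applications")
    if not os.path.lexists(link):
        os.symlink("/Applications", link)  # the "shortcut" to the Applications folder


def hdiutil_command(staging_dir, volume_name, dmg_path):
    """Arguments for a zlib-compressed (UDZO) image built from a folder."""
    return ["hdiutil", "create", "-volname", volume_name,
            "-srcfolder", staging_dir, "-ov", "-format", "UDZO", dmg_path]


def build_dmg(app_path, licence_path, staging_dir, volume_name, dmg_path):
    """Stage the files, then build the compressed image (macOS only)."""
    stage_dmg_folder(app_path, licence_path, staging_dir)
    subprocess.run(hdiutil_command(staging_dir, volume_name, dmg_path), check=True)
```

On a Mac, something like `build_dmg("dist/Batchtraveller.app", "licence.txt", "dmg_staging", "Batchtraveller", "Batchtraveller.dmg")` run from the project folder would replace the three manual steps.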

[de]This is a short introduction to how Python scripts can be packaged as a distributable Mac app.

Essentially, I followed two tutorials: the first on how to turn a Python script into a self-contained app, and the second on how to create compressed disk images for the Mac. The results, as well as the files needed to reproduce the steps, can be downloaded here. A detailed description can be found in the rest of this post.

0. Make a nice script

I used my previous Python script, which batch-queries Google Maps to gather distances and travel times between a start and a destination. I added two GUIs: one asks for the input file and one for the location where the results should be stored.

1. Use py2app to compile the binary

The package py2app (install/upgrade with “easy_install -U py2app”) automatically gathers all necessary packages and builds an app. It works in two steps:

1.1. Create a setup.py file:

This can be done with the following command:

py2applet --make-setup Batchtraveller.py

As mine resulted in errors, I had to set the option argv_emulation to False:

"""
This is a setup.py script generated by py2applet

Usage:
    python setup.py py2app
"""

from setuptools import setup

APP = ['Batchtraveller.py']
DATA_FILES = []
OPTIONS = {'argv_emulation': False}  # set to False because of the errors mentioned above

setup(
    app=APP,
    data_files=DATA_FILES,
    options={'py2app': OPTIONS},
    setup_requires=['py2app'],
)

1.2. Build the app:

To build the app (with a nice icon), the following command was used:

python setup.py py2app --iconfile browser.icns

This resulted in two new folders, “build” and “dist”; the latter contains Batchtraveller.app.

2. Make a “standard” compressed *.dmg

Instead of simply handing out the app, I wanted to make a compressed disk image containing the app, the licence information and a shortcut to the Applications folder. Here is what I did:

  1. I created a new disk image of roughly the same size as the files, using the Mac Disk Utility.

  2. Then I copied the *.app and licence.txt files into the disk image and created a shortcut to the Applications folder.

  3. I used Disk Utility again to convert the disk image to a compressed dmg.[/de]