Addendum: add a title to grid.arrange()

Coming back to the example of how to get an overview of differences between two types across a number of different sites or treatments: I forgot to add a title. You may also want to specify the number of rows or columns.

Since do.call() takes a function and a single list of arguments, we can’t just pass extra arguments next to the list, or add them to it with +. None of these work:

do.call(grid.arrange, p, main = "This does not work!")
do.call(grid.arrange, p + ggtitle(main = "This does not work either!"))
do.call(grid.arrange, p + opts(title = "And this does not work, either"))

And of course, we can’t title the grid in the loop: only the single plots can get a title there. Now, this is more like it:

do.call(grid.arrange, c(p, ncol = 3, main = "Compare between types, facetted by sites"))

Neat. The trick is to concatenate the list object p (our plots) with parameters like ncol, nrow and main, which makes the layout easy to tweak. R is smart enough to append the additional elements $ncol: num 3 and $main: chr "Compare between types, facetted by sites" to the list object we created in the loop. do.call() then hands the unnamed elements (the plots) to grid.arrange() as grobs, and the named elements as ordinary arguments. (In gridExtra 2.0.0 and later, the title argument is called top instead of main.) There we go: we added a title to our plot and arranged it in three columns.
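What do.call() does with such a mixed list can be tried out without any plots at all. This is only a minimal sketch in base R, with paste() standing in for grid.arrange(); the object name args is made up for the illustration:

```r
## three unnamed elements plus one named element, built with c()
args <- c(list(1, 2, 3), sep = "-")
str(args)   # a list of 4; only the last element has a name ("sep")

## do.call() turns this into paste(1, 2, 3, sep = "-"):
## unnamed elements become positional arguments, named ones named arguments
result <- do.call(paste, args)
result
```

The same mechanism is why ncol and the title end up as arguments of grid.arrange() while the plots end up in its "..." part.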

Plotting multivariate data: comparing groups among groups

(This is the first post I write in markdown+knitr, so please don’t mind any glitches, e.g. in the inline-R comments.)

Plotting can be a good way to get an idea about the data. Now, imagine you have got a multivariate dataset with a lot of variables. You can, e.g., do a simple boxplot of each variable. But what if you want several plots to compare variables between several groups?

What annoyed me for some time was that I had, e.g., two types of plots at several sites. That means two factors which could be used to group the data. It is a scenario that actually comes up a lot in the life sciences: you want to compare groups of samples from different pools, at a glance.

To be reproducible, let’s first construct such a dataset with, say, 120 measurements (or plots). They belong to two types (a, b) and were sampled at four different sites (A,B,C,D). You can also think of treatments in different groups, if you are more of an experimental person. Or of habitat type on four different continents, if you are coming from a macroecological background.

rows <- 120
cols <- 25
## Type: this is just one way to create a random factor with 2 levels
## Site: this is another way, here with 4 levels; levels = 0:3 keeps
## all four levels even if one value happens never to be drawn
test <- data.frame(Type = factor(round(runif(rows, 1, 2))),
                   Site = factor(rbinom(rows, 3, 0.3), levels = 0:3))
## rename the levels for convenience; we don't want meaningless "numbers" as labels
levels(test[, 1]) <- c("a", "b")
levels(test[, 2]) <- c("A", "B", "C", "D")
for (i in 1:cols) {
    ## name the variables from a to y, i.e. letters[1:25]
    varname <- paste("var_", letters[i], sep = "")
    ## fill the named column with random data around 10
    test[, varname] <- rnorm(rows, 10)
}
## Now, let's look at our dataframe's dimensions:
dim(test)
## [1] 120  27

Next, we construct a vector of the variables we want to plot. This way, you can always choose which part of a larger dataset you really need. We also construct an empty list to store our plots in. (I keep forgetting how to do that, that’s part of the reason for me writing this now.)
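The two lines this paragraph describes are the ones that appear in the full chunk further down. To keep the snippet runnable on its own, it starts with a simplified stand-in for the test data frame; skip that part if you have already built test as shown above.

```r
## simplified stand-in for `test`: 2 factor columns plus 25 numeric ones
test <- data.frame(Type = factor(rep(c("a", "b"), 60)),
                   Site = factor(rep(c("A", "B", "C", "D"), each = 30)))
for (i in 1:25) {
    test[, paste("var_", letters[i], sep = "")] <- rnorm(120, 10)
}

## names of all columns after the two factors, i.e. the variables to plot
vars <- names(test)[3:length(test)]

## an empty list with one slot per plot
p <- vector("list", length(vars))
```

To plot only a subset, shorten vars, e.g. vars[1:6]; the list p then shrinks with it.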

In the loop, the i-th variable is added as a plot at the i-th position of the list, and we plot the variable against the factor ‘Type’. So, we end up with 25 plots in the list, each containing two boxplots (a, b) in a single panel. You might have noticed that the loop uses vars[i]. This would not work with aes(x = Type, y = vars[i]), because aes() would treat vars[i] as the value to plot rather than as the name of a column. aes_string() is our friend here. It takes the column names as strings, so mind the quotation marks around any column name you type in directly, such as "Type". Now, for the second factor, the ‘Site’: we add facets to the plot! + facet_grid() does the trick.

for (i in seq_along(vars)) {
    p[[i]] <- ggplot(data = test, aes_string(x = "Type", y = vars[i])) + geom_boxplot() + 
        facet_grid(. ~ Site)
}
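As far as I can tell, aes_string() has been deprecated in more recent ggplot2 releases (3.0.0 onward). A possible alternative, sketched here under that assumption, is lapply() together with the .data pronoun inside aes(). The small data frame is only a stand-in for test, and the name p2 is made up so it does not overwrite the list from the loop:

```r
library(ggplot2)

## small stand-in for the `test` data frame, for illustration only
test <- data.frame(Type = factor(rep(c("a", "b"), 20)),
                   Site = factor(rep(c("A", "B", "C", "D"), each = 10)),
                   var_a = rnorm(40, 10),
                   var_b = rnorm(40, 10))
vars <- names(test)[3:length(test)]

## one plot per variable; .data[[v]] looks the column up by its name
p2 <- lapply(vars, function(v) {
    ggplot(data = test, aes(x = Type, y = .data[[v]])) +
        geom_boxplot() +
        facet_grid(. ~ Site) +
        ylab(v)   # label the y axis explicitly with the column name
})
```

lapply() is used instead of a for loop on purpose: every call of the function gets its own v, so each plot keeps pointing at its own column when it is finally drawn.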

Still, if we print it now, we end up overplotting the four pairs of boxplots 25 times, each plot replacing the one before. A simple print(p) will not do. Instead, we use something out of the box of the gridExtra package. And that’s the final magic.

library(gridExtra)
do.call(grid.arrange, p)
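In gridExtra 2.0.0 and later, the list can also be handed over directly through the grobs argument, which makes do.call() optional. The following is a sketch assuming such a version is installed; the tiny data frame d and the two plots are stand-ins, not the data from this post:

```r
library(ggplot2)
library(gridExtra)

## tiny stand-in data set, for illustration only
d <- data.frame(Type = factor(rep(c("a", "b"), 20)),
                var_a = rnorm(40, 10),
                var_b = rnorm(40, 10))
p <- list(ggplot(d, aes(x = Type, y = var_a)) + geom_boxplot(),
          ggplot(d, aes(x = Type, y = var_b)) + geom_boxplot())

## the list goes into grobs =; layout and title are ordinary arguments
## (top is the title argument in gridExtra >= 2.0.0)
g <- grid.arrange(grobs = p, ncol = 2, top = "Compare between types")
```

grid.arrange() draws the grid and invisibly returns it, so g can be kept around and drawn again later.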

Personally, I find it embarrassing that I keep forgetting to use aes_string() in combination with facet_grid(), but the real McCoy was to find the do.call(grid.arrange) part and combine it with a list of plots. I used to plot with viewports, which is very helpful if you know exactly how you want to present your stuff. But if you just want to get an overview of relations between two plot types across several sites (or treatments, or whatever), the outline above is much more flexible, I think. Of course, the gridded plot is not beautiful and publication-ready, but hey, we want to have a short look at a massive amount of data, nothing more!

Just to give you the whole chunk for copy-pasting:

## create data
rows <- 120
cols <- 25
test <- data.frame(Type = factor(round(runif(rows, 1, 2))), Site = factor(rbinom(rows, 
    3, 0.3), levels = 0:3))  # levels = 0:3 keeps all four levels even if one is never drawn
levels(test[, 1]) <- c("a", "b")
levels(test[, 2]) <- c("A", "B", "C", "D")
for (i in 1:cols) {
    varname <- paste("var_", letters[i], sep = "")
    test[, varname] <- rnorm(rows, 10)
}
vars <- names(test)[3:length(test)]

## plot
library(ggplot2)
library(gridExtra)
p <- vector("list", length(vars))
for (i in seq_along(vars)) {
    p[[i]] <- (ggplot(data = test, aes_string(x = "Type", y = vars[i])) + geom_boxplot() + 
        facet_grid(. ~ Site))
}
do.call(grid.arrange, p)

(via: stackexchange 1 on multiple plots and gridExtra, stackexchange 2 on aes_string, stackexchange on aes_string considering for-loops and apply)

If you’ve got any suggestions on how to do this more elegantly, please submit them, as we don’t have a Disqus-forum (yet).
