Follow posts tagged #google form in seconds.
Sign upGoogle Forms Multi-response Separator
Previously: Using Google Drive to make Survey Forms and importing answers into R
I noticed that if you make a question with multiple responses then that responder’s response to that question will be a concatenation of responses. Not very useful for data analysis.
Suppose two items on the survey ask to select which Apple & Microsoft products the responder has used in the past 6 months. When you import the responses into R using that importer code you might see responses 1. “iPad, iPhone” and 2. “iPod Touch, iPhone, iMac” in the Apple column, and 1. “Xbox, Surface” and 2. “Zune” in the Microsoft column.
So we run
x <- separate(survey.data, vars = c("Apple", "Microsoft"))
Which would output a list with two components, each of which is a data frame with indicator variable for each possible response (using the first 5 characters). If we combine these two components into one data frame using cbind we might see:
The code for the separator function is:
separate <- function(x, vars) {
# x : data frame vars : vector of column names
temp <- list()
for (i in 1:length(vars)) {
temp[[i]] <- sapply(x[, vars[i]], function(y) {
strsplit(as.character(y), ", ")
})
}
lvls <- lapply(temp, function(y) {
unname(sapply(substr(unique(unlist(y)), 1, 5), function(z) {
if (is.na(z))
"NA" else z
}))
})
n.lvls <- sapply(lvls, length)
VARS <- list()
for (i in 1:length(vars)) {
OBS <- as.data.frame(matrix(0, nrow = length(temp[[i]]), ncol = n.lvls[i]))
names(OBS) <- lvls[[i]]
for (j in seq(along = temp[[i]])) {
if (!(is.na(temp[[i]][[j]])[1])) {
for (k in seq(along = temp[[i]][[j]])) {
OBS[j, substr(temp[[i]][[j]][k], 1, 5)] <- 1
}
} else {
OBS[j, ] <- rep(NA, n.lvls[[i]])
}
}
which.na <- which(lvls[[i]] == "NA")
names(OBS) <- paste(vars[[i]], lvls[[i]], sep = ".")
VARS[[i]] <- OBS[, -which.na]
}
names(VARS) <- vars
return(VARS)
}
I admit that the approach I’ve taken above is inefficient but it works.
NOTE: If one of the possible responses contains commas then multiple columns will be created for that response. So, for example, if the responder can check “Online (Amazon, eBay)” along with “In-store (Best Buy, Frys)” then we will see the following columns (which have the same number of 1s and 0s): Onlin, eBay, In-st, Frys. This is because the function uses “, ” to separate a response into multiple possible responses. This is unavoidable so be careful.
Using Google Drive to make Survey Forms and importing answers into R
I remember when Google Docs first launched. I was still in high school and I immediately became a Google evangelist. I told everyone to start using this wonderful new cloud-based service. I don’t think the term ‘cloud-based’ even existed at the time, although it’s more likely that I was simply not aware of its existence. Since then the service has grown substantially. It includes a lot more features, a significantly better UI, and it even lets people design surveys!
It’s even easy to make a survey with branching questions. See:

The form can accessed here. Feel free to submit a response. This is the form we’ll be using in this example. (Responses so far.)
The results of such surveys are stored as Spreadsheets on Google Drive (formerly Google Docs & Spreadsheets). But what if we want to access all those answers in R and perform some EDA or analysis? And what if we don’t want to go to the Google Drive page and download the results as a CSV manually? Let me show you how we can achieve all of this with some R code after a quick-and-easy initial setup.
I assume you’ve designed the form and may or may not have responses already. Open it up and you should see the spreadsheet. Click on File -> Publish to the web. Then click Start Publishing and pick CSV under Get a link to the published data
Copy the link. You will use this as the filename.
filename <- "https://docs.google.com/spreadsheet/pub?key=0ApOyZxZwgCv6dC1uUUVVbl9ieEJSQjhMQWpGZUxuYUE&output=csv"
By default, R has some issues downloading files through https; so we need to use a package called RCurl.
# Checks if RCurl is installed. If not, installs it.
if (sum(installed.packages()[, 1] == "RCurl") < 1) install.packages("RCurl")
require(RCurl)
Then we need to actually download the survey results. This is done through the following script:
txt = tryCatch(getURL(filename), error = function(e) {
getURL(filename, ssl.verifypeer = FALSE)
})
tc = textConnection(txt) # Opens a connection.
survey.results <- read.csv(tc, header = T, stringsAsFactors = TRUE, na.strings = "")
close(tc) # Closes the connection.
rm(txt, tc, filename) # Cleans up the workspace.
You’ll actually get really long column names as well as a column of the dates and times of the submissions. Let’s clean this up this really quick:
# remember to modify data frame name to your needs
statisticians <- survey.results[, -1]
# remember to modify as appropriate
names(statisticians) <- c("software", "used.ggplot2", "role")
Here’s what the data looks like:
Thanks for reading! Enjoy :)