1. Loading data

Download a set of qPCR experiment results here. The file is a zip archive that you should extract into a subfolder of your project.

For example, if you downloaded the file into your project folder, you can run the following command:

unzip("pcr.zip", exdir = "data/pcr") # will create a data/pcr subfolder and extract the files

These qPCR results were obtained from 2 different samples, each replicated 5 times. They are stored in 10 different files. A filename looks like “mlc1_1.csv”, where the first number is the sample id and the second the replicate id.

Read in the qPCR results

Identify the file format

  1. Using a text editor or RStudio, try to identify how the flat file has been encoded.
  2. Try to import the file mlc1_1.csv using the read_delim() function.
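As a minimal sketch of the inspect-then-import workflow, the snippet below writes a small made-up file to a temporary folder, looks at its raw lines and imports it with read_delim(). The semicolon delimiter, the file name and the column names (gene, ct) are assumptions chosen for illustration only; the real mlc1_1.csv may well be formatted differently.

```r
library(readr)

# Hypothetical file: semicolon-delimited, columns invented for illustration
demo_file <- file.path(tempdir(), "demo_1_1.csv")
write_lines(c("gene;ct", "geneA;21.4", "geneB;25.1"), demo_file)

# Look at the first raw lines to spot the delimiter and the header
read_lines(demo_file, n_max = 3)

# Import using the delimiter identified above
demo <- read_delim(demo_file, delim = ";", show_col_types = FALSE)
demo
```

Reading the raw lines first avoids guessing: once the delimiter is known, it goes into the delim argument.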

Import multiple files

  1. Create a vector named pcr_files containing the paths to all 10 data files using the list.files() function, adjusting the full.names argument accordingly (see ?list.files if you need help).

  2. Now use map() from the purrr package to import all files.
    • What is the type of the output? (You might want to use glimpse().)
    • Are you able to identify from which file each element has been imported?
  3. map() names each output element according to the names found in the input vector. Use set_names() to keep this information.
    • (Supplementary question) Remove the path and extension from the filename using basename() and tools::file_path_sans_ext().

  4. Getting a single tibble out of all files would be much handier. Instead of further transforming this output, we will use another member of the map() family of functions that immediately creates the desired output. Replace your call to map() in your previous code with this alternative to get a data frame directly.

  5. Are you still able to identify the different samples and replicates? You probably need to adjust the appropriate argument in your mapping function (have a look at the help page).
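One possible way to chain these steps is sketched below, on two made-up files written to a temporary folder instead of the real data/pcr folder. The file names, the comma-delimited layout and the columns (gene, ct) are hypothetical. map_dfr() is used here as the row-binding variant; in recent purrr versions it is superseded by map() followed by list_rbind(), and it needs dplyr to be installed.

```r
library(purrr)
library(readr)

# Hypothetical stand-in for data/pcr: two small files in a temporary folder
demo_dir <- file.path(tempdir(), "pcr_demo")
dir.create(demo_dir, showWarnings = FALSE)
write_csv(data.frame(gene = c("geneA", "geneB"), ct = c(21.4, 25.1)),
          file.path(demo_dir, "demo_1_1.csv"))
write_csv(data.frame(gene = c("geneA", "geneB"), ct = c(21.9, 24.8)),
          file.path(demo_dir, "demo_1_2.csv"))

# full.names = TRUE returns paths that can be passed directly to the reader
pcr_files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Name each path by its file name, without folder and extension
pcr_files <- set_names(pcr_files, tools::file_path_sans_ext(basename(pcr_files)))

# A small wrapper around read_csv() that silences the column type message
read_quietly <- function(f) read_csv(f, show_col_types = FALSE)

# map() returns a named list with one tibble per file
pcr_list <- map(pcr_files, read_quietly)

# map_dfr() row-binds the tibbles; .id stores the element names in a column
pcr <- map_dfr(pcr_files, read_quietly, .id = "file")
pcr
```

Because the input vector is named, the file column keeps track of the origin of every row, which is what allows samples and replicates to be told apart afterwards.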

Rearrange the data and save multiple files

Now that we have read our data into a single data frame, we would like to group the measurements for each individual gene and store them as separate .csv files.

  1. First, nest the data so that the measurements associated with each gene appear in their own tibble.
  2. Create a folder to store the files.

Tip

# Create a new folder inside the data folder to store the output files.
# The path is relative to your project or R Markdown folder.
dir.create(file.path("data", "by_gene"), showWarnings = FALSE)
# `showWarnings = FALSE` suppresses the warning that the folder already exists,
# which would otherwise appear each time you knit or re-run the chunk

Tip

In this tutorial we stored our input files in the data/pcr subfolder relative to the project path. To build a platform-independent path (Windows uses \ while Linux and macOS use /), we can use the function file.path(), which inserts a path separator that works on the current platform.

  3. Create a new column containing the path to the target file.
    • First create a filename using paste0().
    • Then combine the folder path with the filename using file.path().
  4. Now we should use a function able to write .csv files and map it to the appropriate vector(s). If read_csv() is able to read .csv files, which readr function would be able to write such a file?
    • Is writing a file to disk a side effect? What would be the appropriate purrr function?
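The whole sequence (nest, build the target paths, write one file per gene) could look like the sketch below. It runs on a small invented tibble and writes into a temporary folder instead of data/by_gene; the column names (file, gene, ct) are assumptions, so adapt them to the columns found in the imported data.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(readr)

# Hypothetical combined data; the column names are invented for illustration
pcr <- tibble(
  file = rep(c("demo_1_1", "demo_1_2"), each = 2),
  gene = rep(c("geneA", "geneB"), times = 2),
  ct   = c(21.4, 25.1, 21.9, 24.8)
)

# Temporary output folder, so the sketch does not touch the project
out_dir <- file.path(tempdir(), "by_gene_demo")
dir.create(out_dir, showWarnings = FALSE)

by_gene <- pcr %>%
  nest(data = -gene) %>%                                   # one row and one tibble per gene
  mutate(path = file.path(out_dir, paste0(gene, ".csv")))  # target file for each gene

# walk2() calls write_csv(data, path) row by row, for its side effect only
walk2(by_gene$data, by_gene$path, write_csv)
```

walk2() is chosen over map2() because nothing useful is returned: the point of the call is the files that appear on disk.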

2. Loading data and handling untidy data

Download the Excel file called sizes.xls. The file contains measurements (width, height and depth) of 5 different samples before and after a treatment. We would like to calculate the volume of each sample before and after the treatment.

Load the content of the excel file

First, load the measurements into a data frame using the tools provided by the purrr package.

  • What function would you like to repeat?
  • How do you generate your starting vector?
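A sketch of one common pattern, under the assumption that the workbook spreads its data over several sheets (the layout of sizes.xls is not described here, so this is a guess). The example workbook datasets.xlsx shipped with the readxl package is used as a stand-in: the sheet names form the starting vector and read_excel() is the function being repeated.

```r
library(readxl)
library(purrr)

# Example workbook shipped with readxl, used as a stand-in for sizes.xls
path <- readxl_example("datasets.xlsx")

# The starting vector: the sheet names, named by themselves
sheets <- set_names(excel_sheets(path))

# Repeat read_excel() once per sheet; the result is a named list of tibbles
tables <- map(sheets, function(s) read_excel(path, sheet = s))
names(tables)
```

When all sheets share the same columns, swapping map() for a row-binding variant with an .id argument yields a single data frame that records the sheet of origin.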

Calculate the volume

It turns out that the size (\(width \times height \times depth\)) was not entered in a tidy form. Extract the different values in order to calculate the volume.

Warning

We already saw the tidyr::separate() function, which might help you get what you want. Here we would like to stick to a purrr approach. Think about how many elements you would like to provide and how many elements you would like to get.

Tip

  • You might want to use base::strsplit() or stringr::str_split()
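The idea can be sketched on two invented size strings. The " x " separator is an assumption, so check how the values are actually written in the file before reusing it. Each string goes in as one element and comes out as three numbers, which are then reduced to a single volume.

```r
library(purrr)

# Hypothetical entries; the " x " separator is an assumption
size <- c("2 x 3 x 4", "1.5 x 2 x 10")

# One string in, three numbers out: split, then convert each piece
dims <- map(strsplit(size, " x ", fixed = TRUE), as.numeric)

# Three numbers in, one number out: multiply them
volume <- map_dbl(dims, prod)
volume
```

map() keeps the three dimensions of each sample together in a list element, and map_dbl() guarantees that exactly one number per sample comes back.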