Vertically merge dataframes in R from a list
3.1 years ago
ellieuk ▴ 40

Hi there,

I have some coverage stats and I want to merge them together. However, I want to merge them regardless of how many samples I have. Let's say in one batch I have 5 samples (A, B, C, D and E). Their corresponding files are (A.cov, B.cov, C.cov, D.cov and E.cov).

I first load these into R

library(tidyverse)
library(magrittr)  # %<>% is not attached by tidyverse; dplyr already is

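 # load in the coverage stats files
 coverage_stats = list.files(pattern = "\\.cov$")  # pattern is a regex, not a glob
 # optional: also create one object per file (A.cov, B.cov, ...) in the workspace
 for (i in seq_along(coverage_stats)) assign(coverage_stats[i], read.table(coverage_stats[i]))
 # the list is what gets transformed and merged later
 cov_stats_files = lapply(coverage_stats, read.table)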

All is good here. This is the structure of the new dataframes (all the same) - this is A.cov:

structure(list(V1 = c("sample", "total_reads", "mapped_to_target_reads", 
"percentage", "mapped_to_target_reads_plus_150bp", "percentage"
), V2 = c("A", "56402158", "45018562", "79.82", "56664165", 
"100.46")), row.names = c(NA, 6L), class = "data.frame")

This is B.cov so you have another for good measure:

structure(list(V1 = c("sample", "total_reads", "mapped_to_target_reads", 
"percentage", "mapped_to_target_reads_plus_150bp", "percentage"
), V2 = c("B", "56402458", "45018555", "80.82", "5666416", 
"98")), row.names = c(NA, 6L), class = "data.frame")

I want to transform the tables into a nicer format:

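transform_tables <- function(x) {
  x %<>% t()                             # transpose: one row per sample
  x <- as.data.frame(x)
  x %<>% setNames(as.character(x[1, ]))  # first row holds the column names
  x[-1, , drop = FALSE]                  # drop the header row and return the result
}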

cov_stats_files <- lapply(cov_stats_files, transform_tables)
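To make the target format concrete, here is a minimal sketch of what the transformation does to A.cov, written in base R only so it runs without magrittr. The names `A`, `transform_base` and `A_wide` are made up for this illustration and are not part of the original code.

```r
# A.cov as given above
A <- data.frame(
  V1 = c("sample", "total_reads", "mapped_to_target_reads",
         "percentage", "mapped_to_target_reads_plus_150bp", "percentage"),
  V2 = c("A", "56402158", "45018562", "79.82", "56664165", "100.46")
)

# Base-R equivalent of transform_tables (hypothetical helper name)
transform_base <- function(x) {
  m <- t(x)                                    # 2 x 6 character matrix
  out <- as.data.frame(m[-1, , drop = FALSE])  # keep only the row of values
  names(out) <- m[1, ]                         # first row supplies the column names
  out
}

A_wide <- transform_base(A)
dim(A_wide)    # one row per sample, one column per statistic
names(A_wide)  # note that "percentage" appears twice
```

The duplicated "percentage" column name comes straight from the input files; renaming one of them (for example with make.unique) may be worth doing before further processing.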

Now I have my tables in the format I want. I now want to bind all the tables vertically (like an rbind, but without explicitly naming the objects). I want to use the list cov_stats_files to do this, since each batch will have a different number of samples. This is where I'm stuck! I don't know how to iterate through the list and bind the dataframes together...

Would appreciate any help pls! E

3.1 years ago
Ram 43k

You should be able to do that with Reduce(rbind, cov_stats_files), I think.

EDIT:

Use do.call(rbind, cov_stats_files) instead. It will be much faster: rbind accepts any number of data frames at once, so do.call can pass the entire list in a single call instead of combining the list's members two at a time (which is what Reduce does).
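A minimal sketch of the difference, using two toy one-row data frames with values taken from the question. The names `a`, `b`, `cov_list`, `merged` and `merged_reduce` are made up for this illustration.

```r
# Toy stand-ins for the transformed per-sample tables
a <- data.frame(sample = "A", total_reads = "56402158",
                mapped_to_target_reads = "45018562")
b <- data.frame(sample = "B", total_reads = "56402458",
                mapped_to_target_reads = "45018555")
cov_list <- list(a, b)

# One call, equivalent to rbind(cov_list[[1]], cov_list[[2]], ...)
merged <- do.call(rbind, cov_list)

# Pairwise, equivalent to rbind(rbind(a, b), ...); the growing result is
# copied again at every step, which is why this gets slow for long lists
merged_reduce <- Reduce(rbind, cov_list)

nrow(merged)   # one row per sample in the list
merged$sample
```

Both approaches should give the same table; the list can hold any number of samples, so nothing needs to change between batches.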


So easy! Thank you so much! Works beautifully! :-)


Courtesy of rpolicastro: rbind is vectorized, so you can use do.call instead of Reduce, as rbind doesn't need to be supplied its arguments two at a time.

do.call(rbind, cov_stats_files)

will be faster than Reduce
