combining quantification (featureCounts) result files into a single dataset
1
1
Entering edit mode
4.7 years ago
HK ▴ 40

Hey All,

I have almost 160 output files in a folder from the featurecounts (quantification of RNA-Seq) and now i want to put that in one datframe to be use for DESeq.

The format of the fetaure counts result is havinf 7 columns:

Geneid   Chr    Start   End Strand  Length  Sample1

What is need is a dataframe with 1st column as geneid and rest taking the 7th column from r=each file

Geneid   Sample1   Sample 2  Sample 3    Sample4 .......................

The code that i have used is :

library(limma)
library(edgeR)
myfiles=list.files(pattern = "*.txt" )  #reading files from the directory having 160 samples.
x <- readDGE(files, columns=c(1,7))
counts=as.data.frame(x$counts)

But i get the error

" Error in read.table(file = file, header = header, sep = sep, quote = quote,  : 
  more columns than column names" after running x <- readDGE(files, columns=c(1,7))
featurecounts matix combine • 8.1k views
ADD COMMENT
1
Entering edit mode

FeatureCounts can take multiple alignment files as input.

featureCounts [options] -o counts.txt file1.bam file2.bam file3.bam
ADD REPLY
0
Entering edit mode

@sej i already have the result from the featurecounts and have multiple txt files, i need to combine all the txt files in one dataframe.

ADD REPLY
0
Entering edit mode

You can merge multiple dataframes by rownames assuming that unique geneids are defined as rownames. https://stackoverflow.com/questions/7739578/merge-data-frames-based-on-rownames-in-r

ADD REPLY
0
Entering edit mode

While you could do what @Sej has already suggested, rerunning featureCounts with the right order of BAM files (so you should not need to mess with the matrix later) will be much easier to get a single matrix file.

ADD REPLY
0
Entering edit mode

If you have run featurecounts on each BAM file separately and you would like to merge the counts from all individual files, check this article for merging the counts from all individual files https://www.reneshbedre.com/blog/featurecounts-matrix.html

ADD REPLY
2
Entering edit mode
4.7 years ago

Below is the function I tend to use to read in multiple featureCounts outputs (one per sample):

DESeqDataSetFromFeatureCounts <- function (sampleTable, directory = ".", design, ignoreRank = FALSE, ...) 
{
    if (missing(design)) 
        stop("design is missing")
    l <- lapply(as.character(sampleTable[, 2]), function(fn) read.table(file.path(directory, fn), skip=2))
    if (!all(sapply(l, function(a) all(a$V1 == l[[1]]$V1)))) 
        stop("Gene IDs (first column) differ between files.")
    tbl <- sapply(l, function(a) a$V7)
    colnames(tbl) <- sampleTable[, 1]
    rownames(tbl) <- l[[1]]$V1
    rownames(sampleTable) <- sampleTable[, 1]
    dds <- DESeqDataSetFromMatrix(countData = tbl, colData = sampleTable[, 
        -(1:2), drop = FALSE], design = design, ignoreRank, ...)
    return(dds)
}

You'll note that it's essentially identical to DESeqDataSetFromHTSeqCount() or whatever that's called, just tweaked slightly to work with featureCounts output. You'll need a sampleTable, just as with the standard functions built into DESeq2.

ADD COMMENT
0
Entering edit mode

Thanks @Devon. Can you please explain ths "You'll need a sampleTable, just as with the standard functions built into DESeq2." Should i just use this list.files() function and have all the .txt files saved in myfiles variable . And then pass in the function?? myfiles=list.files(pattern = "*.txt" )

ADD REPLY
0
Entering edit mode
help(DESeqDataSetFromHTSeqCount)
ADD REPLY

Login before adding your answer.

Traffic: 1759 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6