Question: featureCounts output changes the underscores in sample names to dots
wangdp123 (Oxford) wrote, 8 months ago:

Hi there,

With the latest version of the Rsubread package, I ran featureCounts on a few samples, but the output has changed the underscores in the sample names to dots, which is very inconvenient.

For example,

the original bam file name: sample_1.bam

the new sample name: sample.1.bam

Is there any way to avoid this conversion?

Many thanks,

Tom


featureCounts() in Rsubread does not generate BAM files; it takes them as input.

You are probably using the align() function from the same package. If so, check the output_file argument: change the default output_file = paste(readfile1,"subread",output_format,sep=".") to output_file = paste(readfile1,"subread",output_format,sep="_").

Comment written 8 months ago by Haci

The OP is using featureCounts and the issue is real: it replaces underscores with dots. The column names in the count matrix are taken from the BAM file names. I will move this to a comment.

Comment written 8 months ago by ATpoint

Gordon Smyth (Australia) wrote, 8 months ago:

No, you can't avoid the conversion. featureCounts is trying to protect the names from systems that can't handle punctuation in variable names, but I agree it is not necessary here.

Of course you had to input the file names to featureCounts in the first place:

fc <- featureCounts(files, ...)

so you can easily put them back at the end:

colnames(fc$counts) <- files

I personally like to use

colnames(fc$counts) <- limma::removeExt(basename(files))
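
To see the renaming step in isolation, here is a minimal sketch that runs without Rsubread. The file paths and the small matrix standing in for fc$counts are made up for illustration, and tools::file_path_sans_ext() from base R is used here as a stand-in for limma::removeExt(); it relies on the columns being in the same order as the input files.

```r
# Hypothetical input paths; in a real run these are the BAM files
# passed to featureCounts().
files <- c("data/sample_1.bam", "data/sample_2.bam")

# Stand-in for fc$counts: column names mimic the altered names
# described in this thread (underscores replaced by dots).
counts <- matrix(1:4, nrow = 2,
                 dimnames = list(c("geneA", "geneB"),
                                 c("sample.1.bam", "sample.2.bam")))

# Names are restored by position, so check that the number of columns
# matches the number of input files before overwriting anything.
stopifnot(ncol(counts) == length(files))

# Drop the directory and the extension, keeping the underscores.
colnames(counts) <- tools::file_path_sans_ext(basename(files))

print(colnames(counts))
```

With real data, the same two lines (the check and the assignment) would be applied to fc$counts directly.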

ATpoint (Germany) wrote, 8 months ago:

Why don't you simply grep the sample names from disk and then put them as colnames back into the output? Something like this:

# "pattern" is a regular expression: anchor it so only names ending in .bam match
tmp.files <- list.files(path = "~/Desktop/", pattern = "\\.bam$", full.names = TRUE)

# Keep only the file name, dropping the directory part
tmp.names <- basename(tmp.files)

countmatrix <- featureCounts(files = tmp.files,
                             annot.ext = "~/Desktop/test.saf")

# Put the original file names back as column names
colnames(countmatrix$counts) <- tmp.names
colnames(countmatrix$stat)   <- c("Status", tmp.names)
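
One detail worth checking in the file-listing step: the pattern argument of list.files() is a regular expression, so an unescaped dot matches any character and an unanchored pattern can also pick up index files. A small sketch with made-up file names (hypothetical, not from the thread) shows the difference using grepl(), which applies the same kind of matching:

```r
# Hypothetical directory contents.
candidates <- c("sample_1.bam", "sample_1.bam.bai", "notes_bam.txt")

# Unescaped, unanchored pattern: "." matches any single character.
loose <- grepl(".bam", candidates)

# Escaped dot and end-of-string anchor: only names ending in ".bam".
strict <- grepl("\\.bam$", candidates)

print(data.frame(file = candidates, loose = loose, strict = strict))
```

Only the first file passes the strict pattern, which keeps the file list and the restored column names aligned one to one.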