Question

featureCounts isoform vs gene summarization

1

Entering edit mode

9.5 years ago

lkmklsmn ▴ 970

I am somewhat new to RNAseq data and I have been using featureCounts from the subread package to summarize reads/fragments across genomic features (genes, transcripts).

In particular I am curious about what parameter choice you guys use regarding overlap. The default setting ignores reads which overlap more than one feature. However, when summarizing at the isoform level (e.g. UCSCid), this choice will ignore all reads mapping into exons shared between isoforms and lead to very low reads. At the isoform level it seems to be the better choice to use the non-default setting -O counting reads overlapping features for each feature. In a subsequent step one could choose the highest expressed isoform to represent a given gene.

At the gene level I think the default setting makes more sense. Here, however, you will sum reads across all isoforms, inflating the count of reads of any "true" single isoform or RNA species.

So what option do you usually use?

Summarize reads across isoforms -> choose highest expressed to represent gene
Summarize reads across genes

I just wanted to get a feeling for what others are doing regarding this choice.

featureCounts RNAseq • 8.9k views

ADD COMMENT • link updated 2.2 years ago by Ram 43k • written 9.5 years ago by lkmklsmn ▴ 970

0

Entering edit mode

The only way to get meaningful counts for isoforms is with an expectation-maximization method (e.g., Express or Flux Capacitor). There's no way around that.

ADD REPLY • link 9.5 years ago by Devon Ryan 104k

Ram · Answer 1 · 2014-11-01

I think of counting at two levels:

Gene level
Exon level

I do not think that simple counting at the isoform level using tools like featureCounts is likely to be practically very useful in human genomes for the reasons noted in the question. Isoform deconvolution/expression estimation is not an easy problem and recent literature suggests that, while there are some pretty good tools, there is much need for improvement.

Ram · Answer 2 · 2014-11-01

0

Entering edit mode

9.5 years ago

EagleEye 7.5k

I do not think using featurecount for isoform level is an good idea. You can try the tools discussed in this post

ADD COMMENT • link updated 2.2 years ago by Ram 43k • written 9.5 years ago by EagleEye 7.5k

Ram · Answer 3 · 2015-12-07

0

Entering edit mode

8.4 years ago

Kirill Tsyganov ▴ 370

In featureCounts you can select "the range" to check against. For example you can set "the range" to be transcript coordinates. You can then you group by option to bin your reads into transcripts. You'd need to use -t transcript and -g transcript_id. I think (not too sure) you will also need to specify -f option to tell featureCounts to summarise you reads based on transcript level..

p.s I haven't actually tried that.. I did use those option to counts reads per exons.. but in theory I don't see why that wouldn't work on the transcript level..

ADD COMMENT • link updated 4.4 years ago by Ram 43k • written 8.4 years ago by Kirill Tsyganov ▴ 370

3

Entering edit mode

I would be very hesitant to recommend that to anyone. What featureCounts will end up doing is either (A) ignoring alignments because they overlap multiple isoforms, thereby killing your statistical power or (B) counting a given alignment toward multiple isoforms, thereby screwing up the statistics. This is actually a great use-case for Salmon, which didn't yet exist when this question was posted.

ADD REPLY • link updated 4.4 years ago by Ram 43k • written 8.4 years ago by Devon Ryan 104k

0

Entering edit mode

Yes, this is exactly what I thought afterwards :) I did think I should count read towards both isoforms, but didn't know how to proceed from that.. I'll have look at Salmon sounds interesting.

ADD REPLY • link updated 4.4 years ago by Ram 43k • written 8.4 years ago by Kirill Tsyganov ▴ 370

0

Entering edit mode

Salmon can handle multimapping issues? TELL ME MORE!

ADD REPLY • link 2.2 years ago by O.rka ▴ 710

0

Entering edit mode

Read the paper, that's kind of the point of salmon.

ADD REPLY • link 2.2 years ago by Devon Ryan 104k