Question: featureCounts isoform vs gene summarization
4.8 years ago by
lkmklsmn890
United States
lkmklsmn890 wrote:

I am fairly new to RNA-seq data and have been using featureCounts from the Subread package to summarize reads/fragments across genomic features (genes, transcripts).

In particular, I am curious which overlap setting people use. By default, featureCounts ignores reads that overlap more than one feature. When summarizing at the isoform level (e.g. by UCSC ID), this default discards every read that maps to an exon shared between isoforms, which leads to very low read counts. At the isoform level it therefore seems better to use the non-default `-O` option, which counts a read that overlaps several features once for each of them. In a subsequent step one could choose the most highly expressed isoform to represent a given gene.
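To make the two overlap behaviours concrete, here is a minimal toy model in Python. It is not featureCounts itself: the isoform names, exon coordinates, and reads are invented for illustration, and `allow_multi_overlap` is a hypothetical parameter that only mimics the effect of `-O`.

```python
# Toy model of overlap assignment; this is NOT featureCounts itself.
# Two hypothetical isoforms of one gene that share their first exon.
features = {
    "isoform_A": [(100, 200), (300, 400)],
    "isoform_B": [(100, 200), (500, 600)],
}
# Hypothetical read alignments as (start, end) intervals.
reads = [(120, 170), (150, 190), (320, 380), (510, 560)]


def overlaps(read, exons):
    """Return True if the read overlaps any exon by at least one base."""
    start, end = read
    return any(start <= exon_end and end >= exon_start
               for exon_start, exon_end in exons)


def count_reads(reads, features, allow_multi_overlap):
    """Count reads per feature, with or without multi-overlap assignment."""
    counts = {name: 0 for name in features}
    for read in reads:
        hits = [name for name, exons in features.items()
                if overlaps(read, exons)]
        # Default-like mode: only reads hitting exactly one feature count.
        # -O-like mode: a read counts once for every feature it hits.
        if len(hits) == 1 or (allow_multi_overlap and hits):
            for name in hits:
                counts[name] += 1
    return counts


default_counts = count_reads(reads, features, allow_multi_overlap=False)
multi_counts = count_reads(reads, features, allow_multi_overlap=True)
print(default_counts)  # {'isoform_A': 1, 'isoform_B': 1}
print(multi_counts)    # {'isoform_A': 3, 'isoform_B': 3}
```

In this toy case the default-like mode drops both reads in the shared exon, while the `-O`-like mode reports six assignments from only four reads.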

At the gene level I think the default setting makes more sense. Here, however, you will sum reads across all isoforms, inflating the count of reads of any "true" single isoform or RNA species.  

So what option do you usually use?  

1) Summarize reads across isoforms -> choose highest expressed to represent gene  

2) Summarize reads across genes  
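Option 1 could be sketched roughly as follows. The count table and the transcript-to-gene mapping are made-up placeholders, and `top_isoform_per_gene` is a hypothetical helper, not part of any package.

```python
# Hypothetical per-isoform counts, e.g. from an isoform-level summarization.
isoform_counts = {"tx1": 120, "tx2": 45, "tx3": 10, "tx4": 300}

# Hypothetical transcript-to-gene mapping.
tx_to_gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB", "tx4": "geneB"}


def top_isoform_per_gene(isoform_counts, tx_to_gene):
    """Return {gene: (transcript, count)} for the highest-count isoform."""
    best = {}
    for tx, count in isoform_counts.items():
        gene = tx_to_gene[tx]
        # Keep the first isoform seen for a gene unless a later one is higher.
        if gene not in best or count > best[gene][1]:
            best[gene] = (tx, count)
    return best


representatives = top_isoform_per_gene(isoform_counts, tx_to_gene)
print(representatives)
# {'geneA': ('tx1', 120), 'geneB': ('tx4', 300)}
```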

I just wanted to get a feeling for what others are doing regarding this choice.  

featurecounts rnaseq • 4.6k views
ADD COMMENTlink modified 3.7 years ago by Kirill260 • written 4.8 years ago by lkmklsmn890

The only way to get meaningful counts for isoforms is with an expectation-maximization method (e.g., eXpress or Flux Capacitor). There's no way around that.
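For intuition only, here is a heavily simplified sketch of the expectation-maximization idea. It is not how eXpress or Flux Capacitor are implemented: it ignores transcript length, fragment bias, and alignment quality, and the compatibility classes and read numbers are invented.

```python
def em_isoform_abundance(compat_classes, transcripts, n_iter=200):
    """Estimate the fraction of reads originating from each transcript.

    compat_classes: list of (set of compatible transcripts, read count).
    Simplification: transcript lengths are ignored entirely.
    """
    # Start from a uniform guess.
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    total_reads = sum(n for _, n in compat_classes)
    for _ in range(n_iter):
        expected = {t: 0.0 for t in transcripts}
        # E-step: split each class's reads in proportion to current theta.
        for compatible, n in compat_classes:
            denom = sum(theta[t] for t in compatible)
            if denom == 0:
                continue
            for t in compatible:
                expected[t] += n * theta[t] / denom
        # M-step: re-estimate proportions from the expected counts.
        theta = {t: expected[t] / total_reads for t in transcripts}
    return theta


# Hypothetical data: 60 reads fall in a shared exon, 30 are unique to
# isoform A, and 10 are unique to isoform B.
classes = [({"A", "B"}, 60), ({"A"}, 30), ({"B"}, 10)]
theta = em_isoform_abundance(classes, ["A", "B"])
print({t: round(v, 3) for t, v in theta.items()})
# {'A': 0.75, 'B': 0.25}
```

The shared reads end up split 3:1, following the evidence from the uniquely assignable reads, instead of being dropped or counted twice.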

ADD REPLYlink written 4.8 years ago by Devon Ryan91k
4.8 years ago by
Sean Davis25k
National Institutes of Health, Bethesda, MD
Sean Davis25k wrote:

I think of counting at two levels:

  1. Gene level
  2. Exon level

I do not think that simple counting at the isoform level using tools like featureCounts is likely to be practically very useful in human genomes for the reasons noted in the question.  Isoform deconvolution/expression estimation is not an easy problem and recent literature suggests that, while there are some pretty good tools, there is much need for improvement.

ADD COMMENTlink written 4.8 years ago by Sean Davis25k
4.8 years ago by
EagleEye6.4k
Sweden
EagleEye6.4k wrote:
I do not think using featureCounts at the isoform level is a good idea. You can try the tools discussed in this post: A: How to determine alternative splicing read counts
ADD COMMENTlink written 4.8 years ago by EagleEye6.4k
3.7 years ago by
Kirill260
Australia
Kirill260 wrote:

In featureCounts you can select "the range" to check against. For example, you can set "the range" to be transcript coordinates. You can then use the group-by option to bin your reads into transcripts. You'd need to use `-t transcript` and `-g transcript_id`. I think (not too sure) you will also need to specify the `-f` option to tell featureCounts to summarise your reads at the transcript level.

P.S. I haven't actually tried that. I did use those options to count reads per exon, but in theory I don't see why it wouldn't work at the transcript level.
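An untested sketch of how that command line could be assembled from Python. The file names are placeholders, `build_featurecounts_cmd` is a hypothetical helper, and the sketch assumes the GTF actually contains `transcript` rows with a `transcript_id` attribute. The `-f` flag is left out because, as far as I understand, it switches counting to individual features rather than groups, and each transcript row is already its own group here; treat that as an assumption to check against the Subread documentation.

```python
import shlex


def build_featurecounts_cmd(annotation_gtf, bam_files, output_path,
                            feature_type="transcript",
                            group_attribute="transcript_id",
                            allow_multi_overlap=False):
    """Assemble a featureCounts argument list without running it."""
    cmd = [
        "featureCounts",
        "-a", annotation_gtf,    # annotation file (GTF)
        "-o", output_path,       # output count table
        "-t", feature_type,      # feature type rows to use from the GTF
        "-g", group_attribute,   # attribute used to group features
    ]
    if allow_multi_overlap:
        cmd.append("-O")         # count reads overlapping several features
    cmd.extend(bam_files)
    return cmd


# Placeholder file names, for illustration only.
cmd = build_featurecounts_cmd(
    "annotation.gtf", ["sample1.bam"], "transcript_counts.txt"
)
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once featureCounts is installed. The caveats raised elsewhere in this thread about isoform-level counting still apply to whatever it produces.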

ADD COMMENTlink written 3.7 years ago by Kirill260

I would be very hesitant to recommend that to anyone. What featureCounts will end up doing is either (A) ignoring alignments because they overlap multiple isoforms, thereby killing your statistical power or (B) counting a given alignment toward multiple isoforms, thereby screwing up the statistics. This is actually a great use-case for Salmon, which didn't yet exist when this question was posted.
 

ADD REPLYlink written 3.7 years ago by Devon Ryan91k

Yes, this is exactly what I thought afterwards :) I did think I should count a read towards both isoforms, but didn't know how to proceed from there. I'll have a look at Salmon; it sounds interesting.

ADD REPLYlink written 3.7 years ago by Kirill260