Question

FeatureCounts summarizing at exon level

5.8 years ago

yamar ▴ 30

Hi,

I'm trying to run featureCounts to summarize read counts at the exon level (i.e., the feature level) in order to assess the expression of the different isoforms of a gene. However, I'm unsure which parameters to use for this purpose, for example how to handle multi-mapping reads, and I'm curious how other people deal with this.

I'm using Subread v1.5.0 on paired-end RNA-seq data as follows:

featureCounts -p -s 1 -T 5 -f -t exon -g exon_id -a genes.gtf -o counts_output.txt myRNAseqBam.bam

There are additional options such as:

"-B" (only fragments that have both ends successfully aligned will be considered for summarization)
"-O" (reads (or fragments if -p is specified) will be allowed to be assigned to more than one matched meta-feature (or feature if -f is specified))
"-M" (multi-mapping reads/fragments will be counted)

I'm wondering whether people specify these options or not (in the case of paired-end RNA-seq data).

Is there any best practice for RNA-seq data? Naturally it depends on what you want to do, but here we can assume exon-level counting in order to quantify expressed isoforms. Also, what is the difference between "-O" and "-M"?

rna-seq featurecounts exon • 7.5k views

updated 5.8 years ago by dbpzdbpz ▴ 210 • written 5.8 years ago by yamar ▴ 30

Hello yamar,

Please use the formatting bar (especially the code option) to present your post better. genomax has done it for you this time.

Thank you!

5.8 years ago by lakhujanivijay 5.8k

Thank you for the info! I will remember to do so from now on.

5.8 years ago by yamar ▴ 30

score 5 · Answer 1 · 2018-06-20

The "-B" option is not necessary for exon-level counting, but it may slightly improve the accuracy of the analysis by excluding read pairs that look suspiciously low-quality, i.e., pairs in which only one end was aligned.
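To make the "-B" criterion concrete, here is a minimal Python sketch of a "both ends aligned" check based on the standard SAM FLAG bits. The function name `both_ends_aligned` is hypothetical, and this only illustrates the idea; it is not how featureCounts implements the filter internally.

```python
# SAM FLAG bits, as defined in the SAM specification.
PAIRED = 0x1          # read is part of a pair
READ_UNMAPPED = 0x4   # this read is unmapped
MATE_UNMAPPED = 0x8   # the mate is unmapped


def both_ends_aligned(flag: int) -> bool:
    """Return True if the record belongs to a pair in which both ends are mapped.

    Hypothetical helper for illustration; not part of featureCounts.
    """
    return (
        bool(flag & PAIRED)
        and not (flag & READ_UNMAPPED)
        and not (flag & MATE_UNMAPPED)
    )


print(both_ends_aligned(99))  # True: paired, both ends mapped
print(both_ends_aligned(73))  # False: paired, but the mate is unmapped
```

Records for which this returns False are roughly the ones "-B" would leave out of the summarization.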

The "-O" option is not necessary if your aligner is not junction-aware, i.e., it does not report exon-exon junctions in the CIGAR strings of your mapping results. However, if your aligner is junction-aware (for example Subjunc, TopHat or STAR), then you HAVE to use the "-O" option, or you will lose all the reads or read pairs that overlap with multiple exons.
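A small Python sketch may help show why junction-aware alignments end up overlapping several exons. It splits an alignment into reference blocks at the CIGAR "N" (skipped region) operation and lists the exons each alignment touches. The exon coordinates and the helper names `aligned_blocks` and `overlapping_exons` are made up for illustration; this is a toy model, not featureCounts code.

```python
import re


def aligned_blocks(pos, cigar):
    """Split an alignment into 1-based inclusive reference blocks at 'N' operations."""
    blocks = []
    start = cur = pos
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(length)
        if op in "M=XD":      # operations that consume the reference
            cur += n
        elif op == "N":       # skipped region, e.g. an intron
            if cur > start:
                blocks.append((start, cur - 1))
            cur += n
            start = cur
        # I, S, H and P do not consume the reference
    if cur > start:
        blocks.append((start, cur - 1))
    return blocks


def overlapping_exons(blocks, exons):
    """Return the names of exons that overlap at least one aligned block."""
    return [
        name
        for name, (e_start, e_end) in exons.items()
        if any(b_start <= e_end and b_end >= e_start for b_start, b_end in blocks)
    ]


# Hypothetical exon coordinates, for illustration only.
exons = {"exon1": (80, 149), "exon2": (1150, 1300)}

spliced = aligned_blocks(100, "50M1000N50M")   # junction-aware alignment
clipped = aligned_blocks(100, "50M50S")        # junction-unaware alignment, soft-clipped

print(spliced, overlapping_exons(spliced, exons))
print(clipped, overlapping_exons(clipped, exons))
```

The spliced alignment touches both exons, so at the exon level it is an ambiguous read unless "-O" is given; the soft-clipped one touches only exon1.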

The "-O" option deals with reads or read pairs that overlap with multiple exons or genes. Say a read is mapped to only one location, but 3 exons all overlap with this location: then you have to use "-O" to have this read counted (it contributes one count to each of the 3 exons). If you don't use "-O", this read will be assigned to no exon because of the ambiguity.
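The 3-exon example above can be written as a toy Python model. Each read is represented simply by the list of exons it overlaps; the names `assign_read` and `count_reads` are hypothetical, and the logic only mirrors the behaviour described here rather than the actual featureCounts implementation.

```python
from collections import Counter


def assign_read(overlapping_exons, allow_multi_overlap):
    """Return the exons a read is assigned to.

    allow_multi_overlap plays the role of "-O" in this toy model.
    """
    if len(overlapping_exons) == 1 or (allow_multi_overlap and overlapping_exons):
        return list(overlapping_exons)
    return []  # no overlap, or ambiguous without "-O"


def count_reads(reads, allow_multi_overlap):
    """Tally per-exon counts over reads given as lists of overlapping exons."""
    counts = Counter()
    for exons in reads:
        counts.update(assign_read(exons, allow_multi_overlap))
    return counts


# One unambiguous read and one read overlapping three exons.
reads = [["exonA"], ["exonA", "exonB", "exonC"]]

print(count_reads(reads, allow_multi_overlap=False))
print(count_reads(reads, allow_multi_overlap=True))
```

Without the flag only the unambiguous read is counted; with it, the second read adds one count to each of the three exons.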

The "-M" option deals with reads or read pairs that can be mapped to multiple locations. Some aligners have options to report many mapping locations for a read or read pair. If you don't use the "-M" option, then all the reads that have 2 or more mapping locations are not counted at all, even if each of the mapping locations overlaps with only one exon.
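For contrast with "-O", here is a toy Python model of the "-M" behaviour. It assumes each alignment record carries the number of reported locations for its read (as in the SAM NH tag) and that, when multi-mappers are counted, every reported alignment contributes one count. The name `count_alignments` and the example data are hypothetical; treat this as a sketch of the idea, not the tool's internals.

```python
from collections import Counter


def count_alignments(alignments, count_multi_mappers):
    """Tally per-exon counts from (exon, nh) alignment records.

    nh is the number of reported mapping locations for the read;
    count_multi_mappers plays the role of "-M" in this toy model.
    """
    counts = Counter()
    for exon, nh in alignments:
        if nh > 1 and not count_multi_mappers:
            continue  # multi-mapping read: skipped entirely without "-M"
        counts[exon] += 1
    return counts


# One uniquely mapped read on exonA, and one read reported at two locations
# (exonB and exonC), each location overlapping exactly one exon.
alignments = [("exonA", 1), ("exonB", 2), ("exonC", 2)]

print(count_alignments(alignments, count_multi_mappers=False))
print(count_alignments(alignments, count_multi_mappers=True))
```

So "-O" is about one mapping location overlapping several features, while "-M" is about one read having several mapping locations.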