Question: FeatureCounts summarizing at exon level
10 months ago, yamar wrote:

Hi,

I'm trying to run featureCounts to summarize read counts at the exon level (i.e. the feature level) to assess the expression of the different isoforms of a gene. However, I'm unsure which parameters to use for this purpose, for example how to handle multi-mapping reads, and I'm curious how other people deal with this.

I'm using Subread v1.5.0 on paired-end RNA-seq data as follows:

featureCounts -p -s 1 -T 5 -f -t exon -g exon_id -a genes.gtf -o counts_output.txt myRNAseqBam.bam

There are additional options such as:

  • "-B" (only fragments that have both ends successfully aligned will be considered for summarization)
  • "-O" (reads, or fragments if -p is specified, will be allowed to be assigned to more than one matched meta-feature, or feature if -f is specified)
  • "-M" (multi-mapping reads/fragments will be counted)

and I'm wondering whether people specify these options or not (in the case of paired-end RNA-seq data).

Is there a best practice for RNA-seq data? Naturally it depends on what you want to do, but here we can assume exon-level counting for the quantification of expressed isoforms. Also, what is the difference between "-O" and "-M"?

Tags: exon, rna-seq, featurecounts

Hello yamar,

Please use the formatting bar (especially the code option) to present your post better. genomax has done it for you this time.

Thank you!

Reply written 10 months ago by Vijay Lakhujani

Thank you for the info! I will remember to do so from now on.

Reply written 10 months ago by yamar

10 months ago, dbpzdbpz (Australia) wrote:

The "-B" option is not necessary for exon-level counting, but it can improve the accuracy of the analysis (perhaps only slightly) by excluding read pairs in which only one end was aligned, which are suspiciously low-quality.

The "-O" option is not necessary if your aligner is not junction-aware, that is, if it does not report exon-exon junctions in the CIGAR strings of your mapping results. However, if your aligner is junction-aware (for example Subjunc, TopHat or STAR), then you HAVE to use the "-O" option, or you will lose all the reads or read pairs that overlap with multiple exons.

The "-O" option deals with reads or read pairs that overlap with multiple exons or genes. Say a read is mapped to only one location, but there are 3 exons that all overlap with this location; then you have to use "-O" to have this read counted (it contributes one count to each of the 3 exons). If you don't use "-O", this read will be assigned to no exon because of the ambiguity.
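
To make the 3-exon case concrete, here is a toy sketch of the rule described above. It is not featureCounts itself: the function name assign_read, the exon IDs and the coordinates are all made up for illustration, and it only models a single read interval against a small exon table.

```shell
# Toy model only (hypothetical helper, made-up coordinates).
# Usage: assign_read READ_START READ_END ALLOW_OVERLAP(0|1) < exons.tsv
# Input columns: exon_id <TAB> start <TAB> end (1-based, inclusive).
assign_read() {
  awk -F'\t' -v rs="$1" -v re="$2" -v allow="$3" '
    # An exon is hit when the read and exon intervals share at least one base.
    $2 <= re && $3 >= rs { hits[++n] = $1 }
    END {
      if (n == 0)              { print "Unassigned_NoFeatures"; exit }
      if (n > 1 && allow == 0) { print "Unassigned_Ambiguity";  exit }
      # One hit, or several hits with overlaps allowed: one count per exon.
      for (i = 1; i <= n; i++) print hits[i] "\t1"
    }'
}

# Three overlapping exons and one read at 185-195 that falls inside all of them.
printf 'E1\t100\t200\nE2\t150\t300\nE3\t180\t250\n' | assign_read 185 195 0  # like running without -O
printf 'E1\t100\t200\nE2\t150\t300\nE3\t180\t250\n' | assign_read 185 195 1  # like running with -O
```

The first call reports the read as ambiguous; the second gives one count to each of E1, E2 and E3.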

The "-M" option deals with reads or read pairs that can be mapped to multiple locations. Some aligners have options to report many mapping locations for a read or read pair. If you don't use the "-M" option, then all the reads that have 2 or more mapping locations are not counted at all, even if each mapping location overlaps with only one exon.
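
A quick way to see how much "-O" and "-M" matter for your own data is to look at the .summary file that featureCounts writes next to the count table (counts_output.txt.summary for the command in the question). It has rows such as Assigned, Unassigned_Ambiguity and Unassigned_MultiMapping. The helper below is only a sketch: the function name summarize_assignment is made up, and it assumes a single BAM file, i.e. a two-column tab-separated summary.

```shell
# Sketch: print each status from a featureCounts .summary file together with
# its share of all reads/fragments. Assumes two tab-separated columns.
summarize_assignment() {
  awk -F'\t' '
    NR == 1 { next }                       # skip the "Status <bam>" header row
    { count[$1] = $2; total += $2; order[++n] = $1 }
    END {
      if (total == 0) exit                 # nothing to report
      for (i = 1; i <= n; i++)
        printf "%s\t%d\t%.1f%%\n", order[i], count[order[i]], 100 * count[order[i]] / total
    }' "$1"
}

# Example call, using the output name from the question:
# summarize_assignment counts_output.txt.summary
```

A large Unassigned_Ambiguity share suggests that "-O" will change the result a lot, and a large Unassigned_MultiMapping share does the same for "-M".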


Great! Thank you very much for the nice clarifications. Yes, we aligned the data with TopHat, and when running without "-O" I ended up with very few assigned reads at the exon level, so I'm running with the "-O" option now.

Reply written 9 months ago by yamar