Question: FeatureCounts summarizing at exon level
14 months ago by
yamar10
yamar10 wrote:

Hi,

I'm trying to run featureCounts to summarize read counts at the exon level (i.e. the feature level) in order to assess the expression of the different isoforms of a gene. However, I'm unsure which parameters to use for this purpose, such as how to handle multi-mapping reads, and I'm curious how other people deal with this.

I'm using Subread v1.5.0 on paired-end RNA-seq data as follows:

featureCounts -p -s 1 -T 5 -f -t exon -g exon_id -a genes.gtf -o counts_output.txt myRNAseqBam.bam

There are additional options such as:

  • "-B" (only fragments that have both ends successfully aligned will be considered for summarization)
  • "-O" (reads, or fragments if -p is specified, will be allowed to be assigned to more than one matched meta-feature, or feature if -f is specified)
  • "-M" (multi-mapping reads/fragments will be counted)

I'm wondering whether people specify these options for paired-end RNA-seq data.

Is there any best practice for RNA-seq data? Naturally it depends on what you want to do, but here we can assume exon-level counting in order to quantify expressed isoforms. Also, what is the difference between "-O" and "-M"?

exon rna-seq featurecounts • 1.7k views
modified 14 months ago by dbpzdbpz • written 14 months ago by yamar

Hello yamar,

Please use the formatting bar (especially the code option) to present your post better. genomax has done it for you this time.

Thank you!

written 14 months ago by lakhujanivijay

Thank you for the info! I will remember to do so from now on.

written 14 months ago by yamar
14 months ago by
dbpzdbpz100
Australia
dbpzdbpz100 wrote:

The "-B" option is not necessary for exon-level counting, but it can improve the accuracy of the analysis (perhaps only slightly) by excluding read pairs that are suspiciously low-quality because only one end aligned.

The "-O" option is not necessary if your aligner is not junction-aware, that is, if it does not report exon-exon junctions in the CIGAR strings of your mapping results. However, if your aligner is junction-aware (for example Subjunc, TopHat or STAR), then you HAVE to use the "-O" option, or you will lose all the reads or read pairs that overlap multiple exons.

The "-O" option deals with reads or read pairs that overlap multiple exons or genes. Say a read is mapped to only one location, but 3 exons all overlap that location; then you have to use "-O" to have this read counted (it contributes one count to each of the 3 exons). If you don't use "-O", this read will not be assigned to any exon because of the ambiguity.

The "-M" option deals with reads or read pairs that can be mapped to multiple locations. Some aligners have options to report many mapping locations for a read or read pair. If you don't use the "-M" option, then reads that have 2 or more mapping locations are not counted at all, even if each of the mapping locations overlaps only one exon.
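
To make the difference between "-O" and "-M" concrete, here is a small toy model in Python. It is only a sketch of the rules described above, not featureCounts' actual implementation: it assumes a single chromosome, ignores strand, read pairs and spliced alignments, and the exon coordinates, read names and the `assign` function are all made up for illustration.

```python
from collections import Counter

def assign(alignments, exons, allow_overlap=False, count_multimappers=False):
    """Toy exon-level counting (hypothetical helper, not the featureCounts API).

    alignments: read id -> list of (start, end) mapping locations
    exons:      exon id -> (start, end)
    allow_overlap mimics "-O"; count_multimappers mimics "-M".
    """
    counts = Counter()
    for read_id, locations in alignments.items():
        # Without "-M", a read with 2 or more mapping locations is skipped.
        if len(locations) > 1 and not count_multimappers:
            continue
        for start, end in locations:
            # Exons whose interval overlaps this mapping location.
            hits = [exon for exon, (s, e) in exons.items()
                    if start <= e and s <= end]
            # Without "-O", a location hitting several exons is ambiguous.
            if len(hits) == 1 or (hits and allow_overlap):
                for exon in hits:
                    counts[exon] += 1
    return dict(counts)

# Made-up annotation: E1, E2 and E3 overlap each other; E4 and E5 stand alone.
exons = {"E1": (100, 200), "E2": (150, 250), "E3": (180, 300),
         "E4": (400, 500), "E5": (900, 1000)}

# Made-up reads.
alignments = {
    "r1": [(185, 195)],              # one location, overlaps E1, E2 and E3
    "r2": [(410, 450)],              # one location, overlaps only E4
    "r3": [(420, 460), (910, 950)],  # two locations: one in E4, one in E5
}

print("default:", assign(alignments, exons))
print("-O     :", assign(alignments, exons, allow_overlap=True))
print("-M     :", assign(alignments, exons, count_multimappers=True))
print("-O -M  :", assign(alignments, exons, True, True))
```

In this toy data, r1 is only counted when "-O" is on, and r3 is only counted when "-M" is on, regardless of the other option.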

modified 14 months ago • written 14 months ago by dbpzdbpz

Great! Thank you very much for the nice clarifications. Yes, we aligned the data with TopHat, and when running without "-O" I ended up with very few assigned reads at the exon level, so I'm running with the "-O" option now.
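
For anyone scripting this, the command from the question with "-O" added could be assembled along these lines. This is a minimal sketch: the `featurecounts_cmd` helper and its keyword names are hypothetical, the file names are the ones from the original post, and only flags already mentioned in this thread are used.

```python
import shlex

def featurecounts_cmd(bam, annotation="genes.gtf", output="counts_output.txt",
                      both_ends=False, allow_overlap=False,
                      count_multimappers=False):
    """Build the exon-level featureCounts call from this thread as an argv list.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    cmd = ["featureCounts", "-p", "-s", "1", "-T", "5",
           "-f", "-t", "exon", "-g", "exon_id"]
    if both_ends:
        cmd.append("-B")   # require both ends of a fragment to be aligned
    if allow_overlap:
        cmd.append("-O")   # allow assignment to more than one feature
    if count_multimappers:
        cmd.append("-M")   # count multi-mapping reads/fragments
    cmd += ["-a", annotation, "-o", output, bam]
    return cmd

cmd = featurecounts_cmd("myRNAseqBam.bam", allow_overlap=True)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where featureCounts is installed.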

written 13 months ago by yamar