Question

featureCounts for paired bam files and dealing with multimapping reads?

1

Entering edit mode

6.5 years ago

bioinfo8 ▴ 230

Hi,

I have multiple paired-end bam files from rnaseq experiments (hisat2 aligned). I extracted properly-paired reads, sorted, indexed and ran featureCounts using the following command (as per http://bioinf.wehi.edu.au/featureCounts/):

featureCounts -p -t exon -g gene_id -a species.gtf -o bam.featureCounts *.bam

While it is still running, I can see the following in the log file:

||                 Threads : 1                                                ||
||                   Level : meta-feature level                               ||
||              Paired-end : yes                                              ||
||         Strand specific : no                                               ||
||      Multimapping reads : not counted                                      ||
|| Multi-overlapping reads : not counted                                      ||
||   Min overlapping bases : 1                                                ||
||                                                                            ||
||          Chimeric reads : counted                                          ||
||        Both ends mapped : not required                                     ||

I am confused regarding whether I should count multimapping reads or not and how it would effect the differential gene expression analysis? I have gone through A: Dealing with multimapping reads in featureCounts post, but would like to know more opinion.

Also the userguide says: "Due to the mapping ambiguity, it is recommended that multi-mapping reads should be excluded from read counting (default behavior of featureCounts program) to produce as accurate counts as possible."

Thanks!

featureCounts paired multimapping reads RNA-Seq • 10k views

ADD COMMENT • link updated 6.5 years ago by Devon Ryan 104k • written 6.5 years ago by bioinfo8 ▴ 230

score 4 · Accepted Answer · 2017-11-01

4

Entering edit mode

6.5 years ago

Devon Ryan 104k

You should not count multimapping alignments; the default featureCounts settings are appropriate.

ADD COMMENT • link 6.5 years ago by Devon Ryan 104k

0

Entering edit mode

A read from a gene can map to both the parent gene and as well as to a similar pseudogene. Removing ambiguous reads will under represent the parent gene even though it was expressed and counting them will over-represent the pseudogene. What do we do in such a situation?