featureCounts for paired bam files and dealing with multimapping reads?
1
0
Entering edit mode
4.7 years ago
bioinfo8 ▴ 180

Hi,

I have multiple paired-end bam files from rnaseq experiments (hisat2 aligned). I extracted properly-paired reads, sorted, indexed and ran featureCounts using the following command (as per http://bioinf.wehi.edu.au/featureCounts/):

featureCounts -p -t exon -g gene_id -a species.gtf -o bam.featureCounts *.bam

While it is still running, I can see the following in the log file:

||                 Threads : 1                                                ||
||                   Level : meta-feature level                               ||
||              Paired-end : yes                                              ||
||         Strand specific : no                                               ||
||      Multimapping reads : not counted                                      ||
|| Multi-overlapping reads : not counted                                      ||
||   Min overlapping bases : 1                                                ||
||                                                                            ||
||          Chimeric reads : counted                                          ||
||        Both ends mapped : not required                                     ||

I am confused regarding whether I should count multimapping reads or not and how it would effect the differential gene expression analysis? I have gone through A: Dealing with multimapping reads in featureCounts post, but would like to know more opinion.

Also the userguide says: "Due to the mapping ambiguity, it is recommended that multi-mapping reads should be excluded from read counting (default behavior of featureCounts program) to produce as accurate counts as possible."

Thanks!

featureCounts paired multimapping reads RNA-Seq • 8.7k views
ADD COMMENT
4
Entering edit mode
4.7 years ago

You should not count multimapping alignments; the default featureCounts settings are appropriate.

ADD COMMENT
0
Entering edit mode

A read from a gene can map to both the parent gene and as well as to a similar pseudogene. Removing ambiguous reads will under represent the parent gene even though it was expressed and counting them will over-represent the pseudogene. What do we do in such a situation?

ADD REPLY
1
Entering edit mode

We still ignore multimappers if your tool needs integers.

ADD REPLY

Login before adding your answer.

Traffic: 1993 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6