Hi,
I have multiple paired-end bam files from rnaseq experiments (hisat2 aligned). I extracted properly-paired reads, sorted, indexed and ran featureCounts using the following command (as per http://bioinf.wehi.edu.au/featureCounts/):
featureCounts -p -t exon -g gene_id -a species.gtf -o bam.featureCounts *.bam
While it is still running, I can see the following in the log file:
||                 Threads : 1                                                ||
||                   Level : meta-feature level                               ||
||              Paired-end : yes                                              ||
||         Strand specific : no                                               ||
||      Multimapping reads : not counted                                      ||
|| Multi-overlapping reads : not counted                                      ||
||   Min overlapping bases : 1                                                ||
||                                                                            ||
||          Chimeric reads : counted                                          ||
||        Both ends mapped : not required                                     ||
I am confused regarding whether I should count multimapping reads or not and how it would effect the differential gene expression analysis? I have gone through A: Dealing with multimapping reads in featureCounts post, but would like to know more opinion.
Also the userguide says: "Due to the mapping ambiguity, it is recommended that multi-mapping reads should be excluded from read counting (default behavior of featureCounts program) to produce as accurate counts as possible."
Thanks!
A read from a gene can map to both the parent gene and as well as to a similar pseudogene. Removing ambiguous reads will under represent the parent gene even though it was expressed and counting them will over-represent the pseudogene. What do we do in such a situation?
We still ignore multimappers if your tool needs integers.