I aligned some paired-end RNA-seq data using STAR and then used featureCounts on the output .bam to assign them to genes. However I'm really confused as the output of featureCounts shows that I have more fragments (read-pairs) than the number of total input reads (I'm assuming also read-pairs/fragments) in the STAR output.
STAR output:
Number of input reads | 45661688
Average input read length | 199
UNIQUE READS:
Uniquely mapped reads number | 41983291
Uniquely mapped reads % | 91.94%
featureCounts output:
|| Paired-end reads are included. ||
|| Assign fragments (read pairs) to features... ||
|| ||
|| WARNING: reads from the same pair were found not adjacent to each ||
|| other in the input (due to read sorting by location or ||
|| reporting of multi-mapping read pairs). ||
|| ||
|| Read re-ordering is performed. ||
|| ||
|| Total fragments : 49756690 ||
|| Successfully assigned fragments : 36826889 (74.0%) ||
45661688 vs 49756690... where have these extra ~4 million reads come from? Also, aren't multimapped reads excluded from the .bam STAR output anyway so really it is 41983291 vs 49756690? Anyone any ideas?? Thanks!
By default maximum number of multiple alignments allowed for a read is set to 10 in
STAR
. If a read exceeds this number there is no alignment output. If you did not change the value of--outFilterMultimapNmax
then you should have reads that map 10 times or less in your data.Ok if I understand right then featureCounts is counting multi mapped fragments each as unique fragments (with different genomic co-ords) even though they are the same original read-pair? Would you recommend leaving the STAR default as 10 for aligning to the genome? On a side note I've also been reading a bit about mmquant, would you recommend using that instead of feature counts for assigning features? I have one dataset (single-end) that had 55% assigned reads to features with featureCount as many were lost due to multi-mapping. I'm new to all this so for me 55% seems very low?...