Hi,
I do not know whether anyone has already asked a similar question or run into a similar problem.
I have RNA-seq data (Illumina, 75 bp reads). When I align the reads to the zebrafish genome with bwa and count the reads with featureCounts, I get the following summary:
Sample-1
Assigned 9265562
Unassigned_Ambiguity 147537
Unassigned_MultiMapping 0
Unassigned_NoFeatures 3884814
Unassigned_Unmapped 265481
Unassigned_MappingQuality 0
Unassigned_FragmentLength 0
Unassigned_Chimera 0
Unassigned_Secondary 0
Unassigned_Nonjunction 0
Unassigned_Duplicate 0
When I align with TopHat2 and count the reads with featureCounts, I get the following summary:
Status accepted_hits.bam
Assigned 197566
Unassigned_Ambiguity 3850
Unassigned_MultiMapping 114795
Unassigned_NoFeatures 54359
Unassigned_Unmapped 0
Unassigned_MappingQuality 0
Unassigned_FragmentLength 0
Unassigned_Chimera 0
Unassigned_Secondary 0
Unassigned_Nonjunction 0
Unassigned_Duplicate 0
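To put the two summaries on the same scale, here is a minimal Python sketch that parses both blocks and reports the total number of counted alignments and the assigned fraction for each. The helper names `parse_summary` and `assigned_fraction` are made up for this post and are not part of featureCounts; the numbers are copied from the summaries above.

```python
def parse_summary(text):
    """Parse a featureCounts summary block into a {status: count} dict."""
    counts = {}
    for line in text.strip().splitlines():
        parts = line.split()
        # Skip header lines such as "Sample-1" or "Status accepted_hits.bam".
        if len(parts) == 2 and parts[1].isdigit():
            counts[parts[0]] = int(parts[1])
    return counts


def assigned_fraction(counts):
    """Fraction of all counted alignments that were assigned to a feature."""
    return counts["Assigned"] / sum(counts.values())


BWA_SUMMARY = """
Sample-1
Assigned 9265562
Unassigned_Ambiguity 147537
Unassigned_MultiMapping 0
Unassigned_NoFeatures 3884814
Unassigned_Unmapped 265481
"""

TOPHAT_SUMMARY = """
Status accepted_hits.bam
Assigned 197566
Unassigned_Ambiguity 3850
Unassigned_MultiMapping 114795
Unassigned_NoFeatures 54359
Unassigned_Unmapped 0
"""

bwa = parse_summary(BWA_SUMMARY)
tophat = parse_summary(TOPHAT_SUMMARY)

for name, counts in (("bwa", bwa), ("TopHat2", tophat)):
    total = sum(counts.values())
    print(f"{name}: total={total}, assigned={assigned_fraction(counts):.1%}")
```

The rows that are zero in both summaries are left out of the strings for brevity; they do not change the totals.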
The alignment summary in align_summary.txt is:
Reads:
Input : 344347
Mapped : 278609 (80.9% of input)
of these: 22834 ( 8.2%) have multiple alignments (2 have >20)
80.9% overall read mapping rate.
Why is there such a big difference between the two alignments? Moreover, the TopHat2 run also reports the following summary in bowtie.left_kept_reads.log:
16839634 reads; of these:
16839634 (100.00%) were unpaired; of these:
5058477 (30.04%) aligned 0 times
10330251 (61.34%) aligned exactly 1 time
1450906 (8.62%) aligned >1 times
69.96% overall alignment rate
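As a sanity check on the numbers I posted, the following Python sketch recomputes the reported rates and compares the read totals across the log files. It is only arithmetic on the values quoted above, and the variable names are my own. It does not explain the difference, but it makes it easier to see that the files describe very different numbers of reads.

```python
# Values from bowtie.left_kept_reads.log
bowtie_reads = 16839634
bowtie_unaligned = 5058477
bowtie_unique = 10330251
bowtie_multi = 1450906

# Values from align_summary.txt
tophat_input = 344347
tophat_mapped = 278609

# Non-zero rows of the featureCounts summary for accepted_hits.bam
featurecounts_tophat_total = 197566 + 3850 + 114795 + 54359

# The three bowtie categories should add up to the number of reads.
assert bowtie_unaligned + bowtie_unique + bowtie_multi == bowtie_reads

bowtie_rate = (bowtie_unique + bowtie_multi) / bowtie_reads
tophat_rate = tophat_mapped / tophat_input
# How many times larger the bowtie log input is than the align_summary input.
input_ratio = bowtie_reads / tophat_input

print(f"bowtie log alignment rate:     {bowtie_rate:.2%}")
print(f"align_summary mapping rate:    {tophat_rate:.1%}")
print(f"bowtie reads / TopHat2 input:  {input_ratio:.1f}x")
print(f"featureCounts total (TopHat2): {featurecounts_tophat_total}")
```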
Thanks in advance
ilyas