Question

Number of aligned reads differs between TopHat2 and BWA

8.5 years ago

m.i.cosacak • 0

Hi,

I do not know whether anyone has asked a similar question or run into a similar problem.

I have RNA-seq data (Illumina, 75 bp reads). When I align the reads to the zebrafish genome with bwa and count them with featureCounts, I get the following summary:

                            Sample-1
Assigned                    9265562
Unassigned_Ambiguity        147537
Unassigned_MultiMapping     0
Unassigned_NoFeatures       3884814
Unassigned_Unmapped         265481
Unassigned_MappingQuality   0
Unassigned_FragmentLength   0
Unassigned_Chimera          0
Unassigned_Secondary        0
Unassigned_Nonjunction      0
Unassigned_Duplicate        0

When I align with TopHat2 and count the reads with featureCounts, I get the following summary:

Status                      accepted_hits.bam
Assigned                    197566
Unassigned_Ambiguity        3850
Unassigned_MultiMapping     114795
Unassigned_NoFeatures       54359
Unassigned_Unmapped         0
Unassigned_MappingQuality   0
Unassigned_FragmentLength   0
Unassigned_Chimera          0
Unassigned_Secondary        0
Unassigned_Nonjunction      0
Unassigned_Duplicate        0

The alignment summary in align_summary.txt is:

Reads:
          Input     :    344347
           Mapped   :    278609 (80.9% of input)
            of these:     22834 ( 8.2%) have multiple alignments (2 have >20)
80.9% overall read mapping rate.

Why is there such a big difference between the two alignments? Moreover, the TopHat run also produced the following summary in bowtie.left_kept_reads.log:

16839634 reads; of these:
  16839634 (100.00%) were unpaired; of these:
    5058477 (30.04%) aligned 0 times
    10330251 (61.34%) aligned exactly 1 time
    1450906 (8.62%) aligned >1 times
69.96% overall alignment rate
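
To put the two featureCounts summaries side by side, here is a minimal Python sketch that totals each one and computes the assigned fraction. The counts are copied from the summaries above (zero-valued categories are left out); the names `bwa`, `tophat` and `summarize` are only illustrative.

```python
# Counts copied from the two featureCounts summaries in this post.
# Categories that are zero in both summaries are omitted.
bwa = {
    "Assigned": 9265562,
    "Unassigned_Ambiguity": 147537,
    "Unassigned_MultiMapping": 0,
    "Unassigned_NoFeatures": 3884814,
    "Unassigned_Unmapped": 265481,
}
tophat = {
    "Assigned": 197566,
    "Unassigned_Ambiguity": 3850,
    "Unassigned_MultiMapping": 114795,
    "Unassigned_NoFeatures": 54359,
    "Unassigned_Unmapped": 0,
}


def summarize(counts):
    """Return (total records counted, fraction of them assigned)."""
    total = sum(counts.values())
    return total, counts["Assigned"] / total


bwa_total, bwa_frac = summarize(bwa)
tophat_total, tophat_frac = summarize(tophat)

print(f"bwa:    {bwa_total} records, {bwa_frac:.1%} assigned")
print(f"tophat: {tophat_total} records, {tophat_frac:.1%} assigned")
print(f"ratio of totals: {bwa_total / tophat_total:.1f}")
```

The totals come out at 13,563,394 records for the bwa run and 370,570 for the TopHat2 run, so the gap is already present in the number of records reaching featureCounts, not only in how they are assigned.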

Thanks in advance

ilyas


score 0 · Answer 1 · 2015-11-12

8.5 years ago

Antonio R. Franco ★ 5.1k

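If you used the genome as the reference, remember that it contains introns, whereas RNA-seq reads come from spliced transcripts.

BWA is not splice-aware, so a read that spans an exon-exon junction does not match the genome as one contiguous stretch.

TopHat uses Bowtie for an initial mapping; after that, it splits the reads that failed to map and tries to map the pieces again in a second round. This means that TopHat is splice-aware.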
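
As a toy illustration of the point above (not TopHat's actual algorithm), the Python sketch below builds a made-up genome with two exons and one intron, takes a read that spans the junction, and shows that a contiguous exact match fails while a split match succeeds. All sequences and the function names `unspliced_hit` and `split_hit` are invented for the example, and only exact matching is used.

```python
# Toy genome: two exons separated by an intron (all sequences are made up).
exon1 = "ACGTACGTAGCTAGCTAGGA"
intron = "GT" + "A" * 16 + "AG"
exon2 = "TTGACCATGGCATGCCATAA"
genome = exon1 + intron + exon2

# A read from the spliced transcript: last 10 bases of exon1 + first 10 of exon2.
read = exon1[-10:] + exon2[:10]


def unspliced_hit(read, genome):
    """Position of a contiguous exact match, or -1 if there is none."""
    return genome.find(read)


def split_hit(read, genome, min_anchor=5):
    """Try each split point and look for both pieces in order on the genome.

    Returns (left_pos, right_pos, gap) for the first split that works,
    where gap is the number of genomic bases skipped between the pieces,
    or None if no split works.
    """
    for cut in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:cut], read[cut:]
        left_pos = genome.find(left)
        if left_pos == -1:
            continue
        left_end = left_pos + len(left)
        right_pos = genome.find(right, left_end)
        if right_pos != -1:
            return left_pos, right_pos, right_pos - left_end
    return None


print("contiguous match:", unspliced_hit(read, genome))
print("split match:", split_hit(read, genome))
```

Here the contiguous search reports no hit, while the split search places the two halves of the read on either side of the intron, with a gap equal to the intron length.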
