TopHat alignment on Trinity Transcripts giving a lot of unmapped results.
6.8 years ago
kanika.151 ▴ 130

I ran a de novo assembly of my data with Trinity, then aligned the reads to the assembled .fasta file with TopHat, without supplying an annotation file. The unmapped.bam files are larger than I expected, ranging from 100 MB to 600 MB across conditions. I have 6 different conditions, and the data is paired-end.

My question: is it normal to get such a high number of unmapped reads?

One of the align_summary.txt files:

Left reads:
          Input     :  12382431
           Mapped   :  11331326 (91.5% of input)
            of these:  10276265 (90.7%) have multiple alignments (312919 have >20)
Right reads:
          Input     :  12382431
           Mapped   :  11346906 (91.6% of input)
            of these:  10290863 (90.7%) have multiple alignments (312928 have >20)

Aligned pairs:  11146003
     of these:  10125490 (90.8%) have multiple alignments
                   65963 ( 0.6%) are discordant alignments
89.5% concordant pair alignment rate.


Should I be concerned?

Unmapped tophat Trinity

Don't go by the file size. Check what percentage of reads are unmapped. According to the align_summary, about 91.5% of reads mapped back to the assembled transcriptome, so only around 8.5% are unmapped.
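To make that check repeatable across all six conditions, here is a minimal Python sketch that reads the text of an align_summary.txt and reports the unmapped fraction. The function names (`parse_align_summary`, `unmapped_fraction`) are made up for illustration, and the parser assumes the paired-end layout shown in the question (a "Left reads:" and a "Right reads:" block, each with "Input" and "Mapped" lines); it is not part of TopHat.

```python
import re

def parse_align_summary(text):
    """Collect Input/Mapped counts per mate from align_summary text."""
    stats = {}
    section = None
    for line in text.splitlines():
        line = line.strip()
        header = re.match(r"^(Left|Right) reads:$", line)
        if header:
            section = header.group(1).lower()
            stats[section] = {}
            continue
        field = re.match(r"^(Input|Mapped)\s*:\s*(\d+)", line)
        if field and section:
            stats[section][field.group(1).lower()] = int(field.group(2))
    return stats

def unmapped_fraction(stats):
    """Fraction of all input reads (both mates) that did not map."""
    total_input = sum(s["input"] for s in stats.values())
    total_mapped = sum(s["mapped"] for s in stats.values())
    return 1.0 - total_mapped / total_input

# Counts copied from the summary posted in the question.
SUMMARY = """Left reads:
Input     :  12382431
Mapped   :  11331326 (91.5% of input)
Right reads:
Input     :  12382431
Mapped   :  11346906 (91.6% of input)
"""

stats = parse_align_summary(SUMMARY)
print(f"unmapped: {unmapped_fraction(stats):.1%}")  # prints "unmapped: 8.4%"
```

In practice you would read each condition's align_summary.txt from disk and pass its contents to `parse_align_summary` instead of the inline string.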

Note: because you are aligning to a transcriptome, which may contain several assembled transcripts for the same gene (redundancy), you get more multi-mapped reads.


The Trinity assembly results in a factor of 3 in my case, so I was expecting some of the data to be unmapped. But 15-20% of the data not aligning raised some flags.


There may be better ways, but I would just BLAST a few of the unmapped reads and see what they are.
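One way to pull out a few reads for that, sketched in Python under some assumptions: the unmapped reads are available as SAM text lines (for example the output of `samtools view unmapped.bam`), and a small random sample is enough for a first look. The function name `sample_sam_to_fasta` is hypothetical; it uses reservoir sampling so the whole file never has to be held in memory.

```python
import random

def sample_sam_to_fasta(sam_lines, n=20, seed=0):
    """Reservoir-sample up to n reads from SAM text lines; return FASTA text."""
    rng = random.Random(seed)
    reservoir = []
    seen = 0
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        if seq == "*":                    # sequence not stored
            continue
        # SAM flag 0x40 marks the first mate, 0x80 the second mate.
        mate = "/1" if flag & 0x40 else "/2" if flag & 0x80 else ""
        record = f">{name}{mate}\n{seq}"
        seen += 1
        if len(reservoir) < n:
            reservoir.append(record)
        else:
            j = rng.randrange(seen)       # keep each read with probability n/seen
            if j < n:
                reservoir[j] = record
    return "\n".join(reservoir) + ("\n" if reservoir else "")

# Tiny made-up example records, only to show the input and output shape.
example = [
    "@HD\tVN:1.0\n",
    "r1\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
    "r2\t141\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII\n",
]
print(sample_sam_to_fasta(example, n=5), end="")
```

Passing `sys.stdin` as `sam_lines` would let the function be fed directly from a `samtools view` pipe; the resulting FASTA can then be pasted into a BLAST search.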