Same alignment but different output with TopHat2, any ideas?
8.7 years ago
tiago211287 ★ 1.4k

A year ago I started learning bioinformatics from scratch, with zero background in Unix systems. At that time, I ran some alignments with TopHat2 on RNA-seq data from Mus musculus heart.

Two or three months ago I accidentally erased my entire home directory on my cluster account (a very stupid move: I created a file named $HOME, and you can imagine what happened).

I only had a backup of the raw reads, not of the alignments. All the files documenting what I did were erased too, and I had nothing on paper. The only thing that survived was the samstat output from the alignments.

I got annoyed, but not super angry, because realigning everything would take only 16 hours. Then I became pretty confused, because the output was different: as you can see in the two samstat output files, the MAPQ values are now different.

Using the most recent genome or the same one I used before makes no difference.

I am not sure of the exact TopHat command line I used, but I am using pretty much the same thing: 0 mismatches, the same GTF file, --coverage-search, the same genome, and everything else at default.

But what caught my attention was the strange samstat output in the Base Quality Distributions. (EDIT: As you can see in the first plot, samstat reported 95% of reads with MAPQ > 30 for my first alignment, but only 85% for the second.)
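To check the "MAPQ > 30" percentages independently of samstat, a minimal sketch like the one below could be run on SAM text (for example, lines produced by `samtools view accepted_hits.bam`). It assumes plain tab-separated SAM records, counts alignment records rather than distinct reads (a multi-mapped read contributes several records), and skips unmapped records; the function name `mapq_fraction_above` is hypothetical, and samstat's own binning may not match this exact cutoff.

```python
def mapq_fraction_above(sam_lines, threshold=30):
    """Return the fraction of mapped SAM records with MAPQ > threshold.

    Expects plain SAM text lines (e.g. from `samtools view`). Header lines
    (starting with '@') and unmapped records (FLAG bit 0x4) are skipped.
    """
    mapped = 0
    above = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x4:  # unmapped: MAPQ carries no placement information
            continue
        mapped += 1
        if int(fields[4]) > threshold:  # column 5 of a SAM record is MAPQ
            above += 1
    return above / mapped if mapped else 0.0


# Tiny hand-made example (not real data): one high-MAPQ record,
# one low-MAPQ record, and one unmapped record that is ignored.
example = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t100\t50\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r2\t16\tchr1\t200\t3\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]
print(mapq_fraction_above(example))  # 0.5
```

Running it on the old and the new accepted_hits.bam would show whether the 95% vs. 85% difference holds when both files are measured the same way.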

In the second plot (Base quality distributions), the distributions are the same for all bases in the first alignment, and completely different in the second.

These two samstat outputs were generated using the same accepted_hits.bam.

Does anyone have any idea what could be happening?

[1] Old vs. new alignment samstat output

http://s28.postimg.org/wunusnl71/Untitled.png

[2] Old vs. new alignment samstat output

http://s16.postimg.org/rqs9id1cl/Untitled2.png

Thank you


You have two plots generated with different tools, neither of which is particularly informative, and they display different things.

You need to spend more time formulating your question so that it says more than "strange output". What does that even mean?


I edited the post to try to be clearer. Both plots were generated with samstat from the same accepted_hits.bam. I am just trying to figure out why all bases have the same MAPQ number in the second plot. I need to reproduce this error so I can move on.

8.7 years ago
h.mon 35k

You are not repeating what you've done before.

Check the last line of your first figure: the total number of reads differs between the two files.
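One way to compare the two runs directly, rather than through the plots, is to tally the records in each BAM (again read as SAM text, e.g. via `samtools view`). The sketch below is a hypothetical helper, `summarize_sam`, that counts total records, distinct read names, records flagged unmapped (FLAG bit 0x4) and secondary alignments (FLAG bit 0x100). It assumes standard SAM FLAG semantics; with paired-end data both mates share a name, so `distinct_reads` then counts fragments.

```python
def summarize_sam(sam_lines):
    """Tally SAM records: total, unmapped (0x4), secondary (0x100), distinct names."""
    counts = {"records": 0, "unmapped": 0, "secondary": 0}
    names = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank and header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        counts["records"] += 1
        names.add(qname)
        if flag & 0x4:
            counts["unmapped"] += 1
        if flag & 0x100:
            counts["secondary"] += 1
    counts["distinct_reads"] = len(names)
    return counts


# Hand-made example (not real data): read r1 has a primary and a
# secondary alignment, read r2 is flagged unmapped.
records = [
    "r1\t0\tchr1\t100\t50\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r1\t256\tchr2\t500\t0\t10M\t*\t0\t0\t*\t*",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]
print(summarize_sam(records))
# {'records': 3, 'unmapped': 1, 'secondary': 1, 'distinct_reads': 2}
```

If the old and new files differ in `distinct_reads` or `unmapped`, the runs were not set up the same way, whatever the plots look like.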

P.S.: you should post your $HOME mistake here; I've done something similar in the past but didn't have the guts to admit it in public.


I think it's because a considerable number of reads now go to unmapped.bam, which did not happen before.

Thank you for your suggestion. I just did this.


Also, I just noticed that the axis scales are different, which may be what is causing this effect. But can you explain why there are unmapped reads in accepted_hits.bam?


If you do not have the exact commands you executed before, and (maybe?) if you are using a different version of TopHat / Bowtie, you should not overthink the issue: there are too many unknowns. In any case, watching the lecture mentioned in this post seems like a good idea.
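Since the root problem here is not having the exact commands and versions, a small habit that helps is appending every command to a log file before running it. The sketch below is a hypothetical helper, `log_command`, using only the Python standard library (`shlex.join` needs Python 3.8+). The `note` field is meant for hand-copied tool versions; the arguments in the usage comment are placeholders, not the original poster's command.

```python
import datetime
import shlex


def log_command(argv, log_path="commands.log", note=""):
    """Append a timestamped, shell-quoted copy of `argv` to `log_path`.

    `note` is free text, e.g. the TopHat and Bowtie version strings
    copied in by hand. Returns the line that was written.
    """
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    line = f"{stamp}\t{shlex.join(argv)}\t{note}\n"
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(line)
    return line


# Illustrative usage (placeholder arguments, not the original command):
# log_command(["tophat2", "-o", "run_2", "genome_index", "reads.fq"],
#             note="TopHat/Bowtie versions go here")
```

Keeping this log outside $HOME (or under version control) would also have protected it from the accidental deletion described above.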


Thank you for the lecture. Yes, I don't have my original commands, and yes, maybe I am using a different version of TopHat/Bowtie.
