How To Tell How Well A Sam File Was Mapped
5
2
Entering edit mode
12.1 years ago
Applet ▴ 150

I'm comparing using STAMPY vs BWA to map some reads to a reference genome.

Now that I've generated two sam files how do I tell how well the mapping did?

bwa mapping sam samtools • 8.5k views
ADD COMMENT
4
Entering edit mode
12.1 years ago
Swbarnes2 ★ 1.6k

Convert the .sam to .bam with samtools view. .sams are huge. samtools flagstat will take a .bam, and tell you how many reads mapped total.

ADD COMMENT
1
Entering edit mode

I think that's probably true. You could use awk to filter out reads that have mapping quality of 0 from your .sam. I'm not sure that you'd expect a great deal of difference in how many reads have mapping quality of 0 between two aligners, since that's mostly about what your reference sequence is, and how long your reads are.

ADD REPLY
0
Entering edit mode

I fear samtools flagstat counts reads mapping multiple times not unique. Correct me, if I'm wrong.

ADD REPLY
0
Entering edit mode

Well...even a small difference that (might) happen, are differences, right? ;)

ADD REPLY
0
Entering edit mode

Well...even small differences that (might) happen, are differences, right? ;)

ADD REPLY
0
Entering edit mode

Quick Question: Does bam file need to be sorted in order to get the correct flagstat file?

ADD REPLY
0
Entering edit mode

The sorting does not have an effect on flagstat, I do not believe.

ADD REPLY
0
Entering edit mode

I would recommend you to check it. Run flagstat on both files and compare the results.

ADD REPLY
0
Entering edit mode

Just confirmed.BAM file does not need to be sorted for flagstat!

ADD REPLY
3
Entering edit mode
ADD COMMENT
2
Entering edit mode
12.1 years ago

e.g. you can count the mapped reads and use the percentage of mapped reads as an indicator of how well the mapping tool performed.

ADD COMMENT
2
Entering edit mode
12.1 years ago

In order to assure, you don't have an M somewhere else (read-ID, mitochondrial chromosome, ...), or even reads mapping to multiple positions in the genome (both would basically destroy your statistics), I recommend to do something like this:

perl -F'\t' -ane 'if($F[5]=~/M/){print "$F[0]\n";};' file.sam | sort | uniq | wc

short explanation: if the 6th column contains a 'M' (a match exists), print the first column (the read-ID). then sort and uniq the read-IDs, to assure you count multiple mapped reads only once. and then the word-count as ever.

ADD COMMENT
0
Entering edit mode
12.1 years ago

Use grep to find reads that match to the reference. Generally speaking, only lines with matches contain 'M' at position 6 in each line . Use cut -f 6 file.sam| grep M | wc -l

ADD COMMENT
0
Entering edit mode

In order to assure, you don't have an M somewhere else (read-ID, mitochondrial chromosome, ...), or even reads mapping to multiple positions in the genome (both would basically destroy your statistics), I recommend to do something like this:

perl -F't' -ane 'if($F[5]=~/M/){print "$F[0]n";};' file.sam | sort | uniq | wc -l

short explanation: if the 6th column contains a 'M' (a match exists), print the first column (the read-ID). then sort and uniq the read-IDs, to assure you count multiple mapped reads only once. and then the word-count as ever.

ADD REPLY
0
Entering edit mode

In order to assure, you don't have an M somewhere else (read-ID, mitochondrial chromosome, ...), or even reads mapping to multiple positions in the genome (both would basically destroy your statistics), I recommend to do something like this:

perl -F'\t' -ane 'if($F[5]=~/M/){print "$F[0]\n";};' file.sam | sort | uniq | wc -l

short explanation: if the 6th column contains a 'M' (a match exists), print the first column (the read-ID). then sort and uniq the read-IDs, to assure you count multiple mapped reads only once. and then the word-count as ever.

ADD REPLY
0
Entering edit mode

That doesn't work if you have an M somewhere else, e.g. in your read name

ADD REPLY
0
Entering edit mode

@Christof thanks. Edited the response

ADD REPLY

Login before adding your answer.

Traffic: 1907 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6