Question: Samtools Idxstats
7.4 years ago by
Cambridge, MA
KCC (3.9k) wrote:

I am reading the output from samtools idxstats. The samtools website says: "The output is TAB delimited with each line consisting of reference sequence name, sequence length, # mapped reads and # unmapped reads."
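To make the four-column layout concrete, here is a minimal Python sketch of reading that kind of output. The sample text and the helper name `parse_idxstats` are made up for illustration and are not part of samtools. The last line of real idxstats output uses `*` as the reference name for reads that have no reference at all, so the sketch includes such a row.

```python
import csv
import io

# Hypothetical idxstats-style text with made-up numbers (not real output).
# Columns: reference name, sequence length, # mapped reads, # unmapped reads.
SAMPLE = "chr1\t1000\t50\t2\nchr2\t800\t30\t0\n*\t0\t0\t7\n"


def parse_idxstats(text):
    """Parse TAB-delimited idxstats-style text into a list of dicts."""
    rows = []
    for name, length, mapped, unmapped in csv.reader(io.StringIO(text), delimiter="\t"):
        rows.append(
            {
                "name": name,
                "length": int(length),
                "mapped": int(mapped),
                "unmapped": int(unmapped),
            }
        )
    return rows


rows = parse_idxstats(SAMPLE)

# Named references that nevertheless report unmapped reads: the case asked about here.
with_unmapped = [r["name"] for r in rows if r["name"] != "*" and r["unmapped"] > 0]
print(with_unmapped)
```

In practice the text would come from running `samtools idxstats` on an indexed BAM file rather than from a hard-coded string.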

My input to samtools is a BAM file that I generated from a SAM file produced by bwa, which was used to align some reads to the reference genome.

I am trying to understand why a chromosome can have an unmapped read, i.e., how can a read be unmapped and yet assigned to a chromosome?

7.4 years ago by
Swbarnes2 (1.4k) wrote:

A few reasons. For one, bwa concatenates all the reference sequences together before aligning. So if a read hangs off the end of one sequence onto the next, it's given the appropriate mapping position, and the unmapped flag is also set, as a sign that something is off about the alignment.

Second, the SAM spec calls for an unmapped read to be given the chromosome and position of its mapped partner. This is so that when you sort the reads by chromosome and position, the unmapped read sorts next to its mapped mate. Again, the 4 flag (0x4) tells you that the read really is unmapped. The SAM spec says that if the 4 flag is set, you can't believe the chromosome, position, CIGAR string, mapping quality, or anything else in the .sam entry.
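To illustrate the second point, here is a rough Python sketch that tallies reads per reference name using only the 0x4 bit of the FLAG field. The SAM records and the function name `count_by_reference` are invented for the example, and this is not how samtools itself derives the numbers (it reads them from the index). It only shows how an unmapped read that carries its mate's chromosome ends up in that chromosome's unmapped column.

```python
from collections import defaultdict

UNMAPPED = 0x4  # SAM FLAG bit: the read itself is unmapped

# Made-up SAM records (11 mandatory fields each), for illustration only.
SAM_LINES = [
    # r1: both mates mapped to chr1 (flags 99 and 147)
    "r1\t99\tchr1\t100\t60\t10M\t=\t300\t210\tACGTACGTAC\tIIIIIIIIII",
    "r1\t147\tchr1\t300\t60\t10M\t=\t100\t-210\tTTGACCGTAA\tIIIIIIIIII",
    # r2: first mate mapped (73); second mate unmapped (133) but given
    # the chromosome and position of its mapped partner
    "r2\t73\tchr1\t500\t60\t10M\t=\t500\t0\tCCGGTTAACC\tIIIIIIIIII",
    "r2\t133\tchr1\t500\t0\t*\t=\t500\t0\tGGGGCCCCAA\tIIIIIIIIII",
    # r3: neither mate mapped (77 and 141), so no reference name at all
    "r3\t77\t*\t0\t0\t*\t*\t0\t0\tAAAAAAAAAA\tIIIIIIIIII",
    "r3\t141\t*\t0\t0\t*\t*\t0\t0\tTTTTTTTTTT\tIIIIIIIIII",
]


def count_by_reference(sam_lines):
    """Return {rname: [mapped, unmapped]}, trusting only the 0x4 flag bit."""
    counts = defaultdict(lambda: [0, 0])
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]
        counts[rname][1 if flag & UNMAPPED else 0] += 1
    return dict(counts)


counts = count_by_reference(SAM_LINES)
for rname, (mapped, unmapped) in counts.items():
    print(rname, mapped, unmapped, sep="\t")
```

Here the flag-133 read is counted as unmapped under chr1 because of its RNAME field, even though it never aligned anywhere; only the fully unmapped pair lands under `*`.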
