I am reading the output from samtools idxstats. The samtools website says: "The output is TAB delimited with each line consisting of reference sequence name, sequence length, # mapped reads and # unmapped reads."
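To make the four-column format concrete, here is a minimal Python sketch that parses idxstats-style text and lists the references that have unmapped reads assigned to them. The `IdxStat` and `parse_idxstats` names are hypothetical, and the example text is made up for illustration, not taken from a real run. The sketch only assumes the TAB-delimited layout quoted above. The "*" row stands for unmapped reads that have no reference name at all.

```python
from typing import NamedTuple

class IdxStat(NamedTuple):
    # One row of idxstats output: name, length, # mapped, # unmapped
    name: str
    length: int
    mapped: int
    unmapped: int

def parse_idxstats(text: str) -> list:
    """Parse TAB-delimited idxstats-style text into IdxStat rows."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, length, mapped, unmapped = line.split("\t")
        rows.append(IdxStat(name, int(length), int(mapped), int(unmapped)))
    return rows

# Hypothetical example text, not real data.
example = "chr1\t248956422\t1200\t7\nchr2\t242193529\t950\t3\n*\t0\t0\t42\n"
stats = parse_idxstats(example)

# References (other than "*") that still have unmapped reads placed on them
for s in stats:
    if s.name != "*" and s.unmapped:
        print(f"{s.name}: {s.unmapped} unmapped reads placed here")
```

With the made-up input this prints lines for chr1 and chr2. Those are exactly the "unmapped but assigned to a chromosome" reads the question is about.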
My input to samtools is a BAM file that I generated from a SAM file produced by bwa, which I used to align some reads to the reference genome.
I am trying to understand why a chromosome can have unmapped reads, i.e. how can a read be unmapped and yet be assigned to a chromosome?
There are a few reasons. First, bwa concatenates all the reference sequences together before aligning. So if a read hangs off the end of one sequence onto the next, it is given the appropriate mapping position, but the unmapped flag is also set as a sign that something is off about the alignment.
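The boundary problem can be illustrated with a minimal Python sketch. It is a simplified model under stated assumptions, not bwa's actual code: the references are laid end to end in one coordinate space, and an alignment position in that space is converted back to a (reference, position) pair. The function names `build_offsets` and `locate` and the toy reference lengths are hypothetical. If the alignment runs past the end of the reference its start falls in, it "hangs off" onto the next one.

```python
import bisect

def build_offsets(lengths):
    """Start offset of each reference in the concatenated coordinate space."""
    offsets = []
    total = 0
    for n in lengths:
        offsets.append(total)
        total += n
    return offsets

def locate(offsets, lengths, names, start, aln_len):
    """Map an alignment [start, start + aln_len) in concatenated coordinates
    back to (reference name, 0-based position, crosses_boundary)."""
    i = bisect.bisect_right(offsets, start) - 1  # reference containing start
    pos = start - offsets[i]
    crosses = pos + aln_len > lengths[i]  # runs past the end of reference i
    return names[i], pos, crosses

# Hypothetical toy references: chr1 is 1000 bp, chr2 is 800 bp.
names = ["chr1", "chr2"]
lengths = [1000, 800]
offsets = build_offsets(lengths)

print(locate(offsets, lengths, names, 950, 100))   # starts near the end of chr1
print(locate(offsets, lengths, names, 1200, 100))  # lies entirely within chr2
```

The first read starts at chr1:950 but its last 50 bases fall in chr2. In a setup like this, the read can still be reported at chr1 with the unmapped flag set.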
Second, the SAM spec calls for an unmapped read to be given the chromosome and position of its mapped mate. This is so that when you sort the reads by chromosome and position, the unmapped read sorts next to its mapped mate. Again, the 4 flag (0x4) tells you that the read really is unmapped. The SAM spec says that if the 4 flag is set, you can't rely on the chromosome, position, CIGAR string, mapping quality, or the other alignment-related fields of the SAM entry.
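Here is a minimal Python sketch of the "check the 0x4 flag first" rule. It reads the first four mandatory SAM columns (QNAME, FLAG, RNAME, POS) and treats RNAME/POS only as a placement hint when the read is unmapped. The `describe` function and the two example records are hypothetical. FLAG 73 is paired + mate unmapped + first in pair, and FLAG 133 is paired + unmapped + second in pair. The unmapped mate carries its partner's chr1:100 placement, as described above.

```python
UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped

def describe(sam_line: str) -> str:
    """Classify one SAM record by its FLAG, not by RNAME/POS."""
    fields = sam_line.rstrip("\n").split("\t")
    qname, flag, rname, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
    if flag & UNMAPPED:
        if rname != "*":
            # Placed (e.g. at the mate's position) purely so it sorts nearby
            return f"{qname}: unmapped; {rname}:{pos} is only a placement for sorting"
        return f"{qname}: unmapped, unplaced"
    return f"{qname}: mapped to {rname}:{pos}"

# Hypothetical read pair: mate 1 mapped, mate 2 unmapped but placed at chr1:100.
mate1 = "r1\t73\tchr1\t100\t60\t4M\t=\t100\t0\tACGT\tIIII"
mate2 = "r1\t133\tchr1\t100\t0\t*\t=\t100\t0\tACGT\tIIII"
print(describe(mate1))
print(describe(mate2))
```

Records like mate2 are what idxstats counts in the "# unmapped reads" column for chr1.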
“a read hangs off of one sequence onto the next”: what is the meaning of this sentence?
Do you mean a read that covers part of chr1 (for example) and part of chr2? Can such a read exist? How does that happen?