Difference Between Mapped And Unmapped Reads
2
1
12.1 years ago
Ric ▴ 190

Hello,

I aligned paired-end files with BWA. BWA's SAM file contains all reads, whether they hit the reference or not. With the following command I kept only the reads that hit the reference.

$ bwa sampe <reference.fasta> F3.sai F5.sai F3.fastq F5.fastq | \
awk '{if (substr($1,1,1)=="@") {print $0} else {if ($3!="*") {print $0}}}' > aln_hit_only.sam

Now I am confused about what mapped and unmapped reads mean, because running

$ samtools idxstats aln_hit_only.bam

gives me counts for both mapped and unmapped reads. I would expect unmapped reads to be reads that do not hit the reference.

What is the difference between mapped and unmapped reads?

Thank you in advance.

samtools bwa paired • 18k views
7
12.1 years ago

Just a short comment:

Whether a segment was mapped can be checked much more easily, since this information is stored in the bitwise FLAG (column 2 of a SAM record).

To get the number of all mapped entries:

samtools view -S -F0x4 aln_complete.sam | wc -l

To get the number of unmapped reads:

samtools view -S -f0x4 aln_complete.sam | wc -l
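
The 0x4 test itself can be illustrated without samtools. This is only a minimal sketch, assuming a plain-text, tab-separated SAM file; the function names `count_mapped` and `count_unmapped` are hypothetical helpers, not part of samtools or BWA.

```shell
# Hypothetical helpers: count SAM records by the 0x4 (unmapped) FLAG bit.
# Header lines start with "@" and are skipped.
# int($2 / 4) % 2 extracts the bit with value 4 from the FLAG in column 2.
count_unmapped() {
    awk -F'\t' '!/^@/ && int($2 / 4) % 2 == 1 { n++ } END { print n + 0 }' "$1"
}

count_mapped() {
    awk -F'\t' '!/^@/ && int($2 / 4) % 2 == 0 { n++ } END { print n + 0 }' "$1"
}
```

For example, `count_unmapped aln_complete.sam` should report the same number as the `-f0x4` pipeline above, since both look only at the FLAG column.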
0

More efficient: samtools view -c -F0x4 aln_complete.sam

1

Does it work with SAM files? I think only with BAM. But you're right, it's more efficient. One more comment: for the mapped reads, one should deduplicate reads that map to multiple loci, otherwise the sum of mapped and unmapped might be greater than the total number of reads: "samtools view -S -F0x4 aln_complete.sam | cut -f1 | sort | uniq | wc -l"

1

More efficient? If you have already indexed the bam file, idxstats is far far more efficient. view has to scan through every read.

In answer to the original question: idxstats outputs "chrom chrom_size mapped_reads unmapped_reads". Unmapped reads that have a mapped mate are assigned to the same chrom as that mate. Unmapped reads with no mate or an unmapped mate are assigned to chrom "*".
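
To turn the per-chromosome idxstats table into two totals, something like the following could be used. It is a sketch that assumes the four tab-separated columns described above arrive on standard input; `sum_idxstats` is a hypothetical helper name.

```shell
# Hypothetical helper: total the mapped (column 3) and unmapped (column 4)
# counts from `samtools idxstats` output read on standard input.
sum_idxstats() {
    awk -F'\t' '{ mapped += $3; unmapped += $4 }
                END { printf "mapped=%d unmapped=%d\n", mapped, unmapped }'
}
```

Usage would be along the lines of `samtools idxstats aln_hit_only.bam | sum_idxstats`. A non-zero unmapped total on a named chromosome then reflects unmapped reads placed next to their mapped mates.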

4
12.1 years ago
Swbarnes2 ★ 1.6k

Unmapped reads are given the mapping coordinates of their mapped mate. It's in the samtools specs, and that's what bwa does. Feature, not bug.

So your awk statement won't do what you want it to do. You have to rely on the binary flag.

samtools view will filter a .bam based on the binary flag, so use that. Reads with flag bit 4 (0x4) set are technically unmapped, regardless of any other information in the line, such as a mapping coordinate or a CIGAR string.
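
As a rough illustration of a flag-based replacement for the awk filter in the question, the sketch below keeps header lines and drops every record whose 0x4 bit is set, whatever column 3 says. It assumes a tab-separated SAM file; `keep_mapped` is a hypothetical helper name.

```shell
# Hypothetical helper: print header lines plus every record whose
# 0x4 (unmapped) FLAG bit is unset. Column 3 (RNAME) is deliberately
# ignored, because an unmapped read can carry its mate's reference name.
keep_mapped() {
    awk -F'\t' '/^@/ || int($2 / 4) % 2 == 0' "$1"
}
```

For example, `keep_mapped aln_complete.sam > aln_hit_only.sam`. With samtools, `samtools view -h -F 0x4` should give the same selection.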

Also, specific to bwa, if your read hangs off of one reference sequence onto another, it will be given an appropriate mapping position, based on where the read starts, but the unmapped flag will also be set. Feature, not bug.
