Hello,
I aligned paired-end files with BWA. BWA's SAM file contains all reads, whether they hit the reference or not. With the following command I kept only the reads that hit the reference:
$ bwa sampe <reference.fasta> F3.sai F5.sai F3.fastq F5.fastq | \
awk '{if (substr($1,1,1)=="@") {print $0} else {if ($3!="*") {print $0}}}' > aln_hit_only.sam
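The RNAME test (`$3 != "*"`) keeps any record that has a chromosome name, and that includes an unmapped read which BWA placed on the chromosome of its mapped mate. The sketch below compares that filter with one on the 0x4 ("segment unmapped") bit of the FLAG field. It assumes bash and a POSIX awk; the records are made up, and `toy_sam`, `by_rname` and `by_flag` are hypothetical helper names. With a recent samtools, `samtools view -h -F 4` should do the same flag-based filtering.

```shell
# Made-up paired-end records (one header line + 6 alignments):
#   pairA: both mates mapped to chr1
#   pairB: mate 1 mapped; mate 2 unmapped (flag 133 has 0x4) but placed on chr1
#   pairC: both mates unmapped, RNAME is "*"
toy_sam() {
  printf '@SQ\tSN:chr1\tLN:1000\n'
  printf 'pairA\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII\n'
  printf 'pairA\t147\tchr1\t200\t60\t4M\t=\t100\t-104\tACGT\tIIII\n'
  printf 'pairB\t73\tchr1\t300\t60\t4M\t=\t300\t0\tACGT\tIIII\n'
  printf 'pairB\t133\tchr1\t300\t0\t*\t=\t300\t0\tACGT\tIIII\n'
  printf 'pairC\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n'
  printf 'pairC\t141\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n'
}

# Same logic as the original filter: keep headers and records whose RNAME is not "*".
by_rname() {
  awk -F'\t' '/^@/ || $3 != "*"'
}

# Flag-based filter: keep headers and records without the 0x4 (unmapped) bit.
# int($2 / 4) % 2 extracts bit 0x4 using plain POSIX awk arithmetic.
by_flag() {
  awk -F'\t' '/^@/ || int($2 / 4) % 2 == 0'
}

echo "RNAME filter keeps $(toy_sam | by_rname | grep -vc '^@') records"
echo "flag filter keeps  $(toy_sam | by_flag | grep -vc '^@') records"
```

The RNAME filter keeps 4 records, because the unmapped mate of pairB survives; the flag filter keeps 3.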
Now I am confused about what mapped and unmapped reads mean, because running
$ samtools idxstats aln_hit_only.bam
gives me counts of both mapped and unmapped reads. I expected unmapped reads to be the reads that do not hit the reference, and my filter should already have removed those.
What is the difference between mapped and unmapped reads?
Thank you in advance.
More efficient:
samtools view -c -F0x4 aln_complete.sam
Does it work with SAM files? I thought it only worked with BAM (older samtools versions need -S to read SAM input; samtools 1.x detects the format automatically). But you're right, it's more efficient. One more comment: when counting mapped reads, reads mapped to multiple loci should be counted only once, otherwise the sum of mapped and unmapped reads can be greater than the total number of reads:
$ samtools view -S -F0x4 aln_complete.sam | cut -f1 | sort | uniq | wc -l
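One caveat: both mates of a pair share the same read name (QNAME), so counting unique names counts templates with at least one mapped mate, not individual reads. A common alternative is to drop secondary (0x100) and supplementary (0x800) records instead, e.g. `samtools view -c -F 0x904`. As far as I know, BWA's aln/sampe mode reports alternative hits in the XA tag rather than as extra records, so this mainly matters for other aligners. The sketch below contrasts the counts on made-up records. It assumes bash and a POSIX awk, and `toy_sam` and `count_sam` are hypothetical helpers; the samtools commands in the comments are the equivalents I would expect, not something this script calls.

```shell
# Made-up paired-end records; pairA mate 1 also has a secondary (0x100) alignment.
toy_sam() {
  printf '@SQ\tSN:chr1\tLN:1000\n'
  printf 'pairA\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII\n'
  printf 'pairA\t147\tchr1\t200\t60\t4M\t=\t100\t-104\tACGT\tIIII\n'
  printf 'pairA\t355\tchr1\t500\t0\t4M\t=\t200\t0\tACGT\tIIII\n'
  printf 'pairB\t73\tchr1\t300\t60\t4M\t=\t300\t0\tACGT\tIIII\n'
  printf 'pairB\t133\tchr1\t300\t0\t*\t=\t300\t0\tACGT\tIIII\n'
  printf 'pairC\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n'
  printf 'pairC\t141\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n'
}

# count_sam MODE: count alignment records read from stdin.
#   mapped_records   -> like "samtools view -c -F 0x4"
#   primary_mapped   -> like "samtools view -c -F 0x904"
#   primary_unmapped -> like "samtools view -c -f 0x4 -F 0x900"
#   mapped_qnames    -> like "cut -f1 | sort | uniq | wc -l" on mapped records
count_sam() {
  awk -F'\t' -v mode="$1" '
    function bit(f, b) { return int(f / b) % 2 }
    /^@/ { next }
    {
      unmapped = bit($2, 4)
      extra = bit($2, 256) || bit($2, 2048)   # secondary or supplementary
      if (mode == "mapped_records" && !unmapped) n++
      else if (mode == "primary_mapped" && !unmapped && !extra) n++
      else if (mode == "primary_unmapped" && unmapped && !extra) n++
      else if (mode == "mapped_qnames" && !unmapped && !seen[$1]++) n++
    }
    END { print n + 0 }'
}

for m in mapped_records primary_mapped primary_unmapped mapped_qnames; do
  echo "$m: $(toy_sam | count_sam "$m")"
done
```

Here there are 6 reads in total. The primary counts (3 mapped + 3 unmapped) add up to 6, while counting all mapped records (4) overshoots, and counting mapped read names (2) counts pairs rather than reads.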
More efficient? If you have already indexed the BAM file, idxstats is far more efficient: it reads the per-reference counts stored in the index, whereas view has to scan through every read.
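idxstats needs a coordinate-sorted, indexed BAM (in samtools 1.x, for example `samtools sort -o aln.sorted.bam aln.bam` followed by `samtools index aln.sorted.bam`). Genome-wide totals are then just the sums of the mapped and unmapped columns. A minimal sketch, assuming bash and awk; `fake_idxstats` is a hypothetical stand-in with made-up numbers so the snippet runs without samtools.

```shell
# Stand-in for "samtools idxstats" output (made-up numbers), tab-separated:
# reference name, reference length, mapped count, unmapped count.
fake_idxstats() {
  printf 'chr1\t1000\t120\t5\n'
  printf 'chr2\t800\t80\t3\n'
  printf '*\t0\t0\t42\n'
}

# Sum columns 3 and 4 over all rows, including the "*" row.
sum_idxstats() {
  awk -F'\t' '{ mapped += $3; unmapped += $4 }
    END { printf "mapped=%d unmapped=%d\n", mapped, unmapped }'
}

fake_idxstats | sum_idxstats
# With a real sorted, indexed BAM: samtools idxstats aln.sorted.bam | sum_idxstats
```

For the stand-in data this prints `mapped=200 unmapped=50`.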
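In answer to the original question: idxstats outputs "chrom chrom_size mapped_reads unmapped_reads". Unmapped reads whose mate is mapped are assigned to the mate's chromosome, so they show up in that chromosome's unmapped count. Unmapped reads with no mate, or with an unmapped mate, are assigned to chrom "*". This is why aln_hit_only.bam still reports unmapped reads: the awk filter on $3 != "*" keeps unmapped reads that were placed on their mate's chromosome.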
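To see where each unmapped read ends up, the sketch below tallies records per RNAME into mapped and unmapped using the 0x4 flag bit. It only imitates the columns idxstats reports (the length column is left out); idxstats itself reads the counts from the index. It assumes bash and a POSIX awk, with made-up records and hypothetical helper names `toy_sam` and `tally_like_idxstats`.

```shell
# Made-up records: pairB mate 2 is unmapped but placed on chr1 next to its mate;
# both mates of pairC are unmapped and have RNAME "*".
toy_sam() {
  printf '@SQ\tSN:chr1\tLN:1000\n'
  printf 'pairA\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII\n'
  printf 'pairA\t147\tchr1\t200\t60\t4M\t=\t100\t-104\tACGT\tIIII\n'
  printf 'pairB\t73\tchr1\t300\t60\t4M\t=\t300\t0\tACGT\tIIII\n'
  printf 'pairB\t133\tchr1\t300\t0\t*\t=\t300\t0\tACGT\tIIII\n'
  printf 'pairC\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n'
  printf 'pairC\t141\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n'
}

# Print "RNAME<TAB>mapped<TAB>unmapped", one line per RNAME, sorted in C locale.
tally_like_idxstats() {
  awk -F'\t' '
    /^@/ { next }
    {
      ref[$3] = 1
      if (int($2 / 4) % 2) unmapped[$3]++; else mapped[$3]++
    }
    END { for (r in ref) printf "%s\t%d\t%d\n", r, mapped[r], unmapped[r] }' |
  LC_ALL=C sort
}

toy_sam | tally_like_idxstats
```

chr1 gets 3 mapped and 1 unmapped (the placed mate of pairB), while "*" gets 0 mapped and 2 unmapped.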