Hello! I've mapped illumina paired-end reads to a reference genome using bwa mem2. When I Samtools flagstat the output.sam, it looks like the mapping was successful. It shows reads for read1 and read2, mapped reads and properly paired reads, but once I use Samtools view to convert this sam to a bam file, the flagstat loses all of read1 and only displays reads for read2. Is the issue coming from the mapping or is this because of something in samtools view? Thank you for any solutions or suggestions. I've been stuck here for a while, unfortunatley.
This is the command I ran to map:
bwa2 mem -t 30 -S -o NaB5.sam -R "@RG\tID:NaB5\tSM:NaB5\tPL:NovaSeqX\tLB:Tpase\tPU:NaB5\tDT:2024" $REF $FWD $REV
This is the samtools view command I ran to convert the sam to a bam file:
samtools view NaB5.sam --threads 30 --reference $REF -F 0x2564 -b -u -o NaB5_view.bam
Output from samtools flagstat of NaB5.sam:
8824404 + 0 in total (QC-passed reads + QC-failed reads)
77352872 + 0 primary
0 + 0 secondary
1471532 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
78439918 + 0 mapped (99.51% : N/A)
76968386 + 0 primary mapped (99.50% : N/A)
77352872 + 0 paired in sequencing
38676436 + 0 read1
38676436 + 0 read2
66553836 + 0 properly paired (86.04% : N/A)
76940004 + 0 with itself and mate mapped
28382 + 0 singletons (0.04% : N/A)
8149260 + 0 with mate mapped to a different chr
596028 + 0 with mate mapped to a different chr (mapQ>=5)
Output from samtools flagstat of NaB5_view.bam:
19622212 + 0 in total (QC-passed reads + QC-failed reads)
19269948 + 0 primary
0 + 0 secondary
352264 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
19622212 + 0 mapped (100.00% : N/A)
19269948 + 0 primary mapped (100.00% : N/A)
19269948 + 0 paired in sequencing
0 + 0 read1
19269948 + 0 read2
16658923 + 0 properly paired (86.45% : N/A)
19263149 + 0 with itself and mate mapped
6799 + 0 singletons (0.04% : N/A)
2045972 + 0 with mate mapped to a different chr
147809 + 0 with mate mapped to a different chr (mapQ>=5)
You may want to remove that property from the flag. I would also double check you want to filter out plus strand reads, i.e. "mate reverse strand" was also checked. This is not a usual filtering property.
The pipeline I'm using later uses fixmate. Is this supposed to ressolve the removal of read1 in samtools view or would it just be best to use -F 1316 in samtools view? This is my pipeline I'm using but when I ran this as is, my files were empty:
1) replacing 0x2564 with 1316 would solve your problem (but how about supplementary alignments ? see https://broadinstitute.github.io/picard/explain-flags.html )
2) samtools view between "bwa" and "samtools sort" is now useless
3) you could use "samtools collate' instead of 'samtools sort' before "fixmate'
4) you have 6 tools using '$CPUs' at the same time.
5) check you're using
set -o pipefail