Hi! Thank you for taking the time to read this post 😊 I am currently in the process of aligning my trimmed reads to a reference genome. For that, I am using bwa alignment. After the alignment was completed, I ran flagstats (samtools) to get a sense of the % of alignment to the reference genome. This is the command line I used for that:
bwa mem -P -M Gac-HiC_revised_genome_assembly.fa MUI004_output_forward_paired.fq MUI004_output_reverse_paired.fq > bwa_alignment_P_MUI004.sam
And when running flagstats, this was my output
52248219 + 0 in total (QC-passed reads + QC-failed reads) 1105309 + 0 secondary 0 + 0 supplementary 0 + 0 duplicates 27363819 + 0 mapped (52.37% : N/A) 51142910 + 0 paired in sequencing 25571455 + 0 read1 25571455 + 0 read2 0 + 0 properly paired (0.00% : N/A) 20761748 + 0 with itself and mate mapped 5496762 + 0 singletons (10.75% : N/A) 3606028 + 0 with mate mapped to a different chr 1105347 + 0 with mate mapped to a different chr (mapQ>=5)*
I saw that other people here have posted about this 0.0% properly paired issue, and they suggested to remove -P out of the command line. Once I removed P, the new output was this:
52248219 + 0 in total (QC-passed reads + QC-failed reads) 1105309 + 0 secondary 0 + 0 supplementary 0 + 0 duplicates 37815854 + 0 mapped (72.38% : N/A) 51142910 + 0 paired in sequencing 25571455 + 0 read1 25571455 + 0 read2 32281574 + 0 properly paired (63.12% : N/A) 34482372 + 0 with itself and mate mapped 2228173 + 0 singletons (4.36% : N/A) 1994818 + 0 with mate mapped to a different chr 939713 + 0 with mate mapped to a different chr (mapQ>=5)*
My questions are:
- What does properly paired means in this context?
- Why removing “-P” caused a change from 0 to 63.12% ?
- Are these values “good”? I am aligning trimmed DNA reads from one species to the reference genome of a second species (although they are closely related)… should my percentage of properly aligned paired then be higher? Or my results are within what is expected for a situation such as this one? Is there a command that I could use to increase the number of properly aligned pairs?
Beforehand, I appreciate your help in this regard. I am new to using shell commands and bioinformatics, so any help is greatly appreciated!