Hi! Thank you for taking the time to read this post 😊 I am currently in the process of aligning my trimmed reads to a reference genome. For that, I am using bwa alignment. After the alignment was completed, I ran flagstats (samtools) to get a sense of the % of alignment to the reference genome. This is the command line I used for that:
bwa mem -P -M Gac-HiC_revised_genome_assembly.fa MUI004_output_forward_paired.fq MUI004_output_reverse_paired.fq > bwa_alignment_P_MUI004.sam
And when running flagstats, this was my output
52248219 + 0 in total (QC-passed reads + QC-failed reads)
1105309 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
27363819 + 0 mapped (52.37% : N/A)
51142910 + 0 paired in sequencing
25571455 + 0 read1
25571455 + 0 read2
0 + 0 properly paired (0.00% : N/A)
20761748 + 0 with itself and mate mapped
5496762 + 0 singletons (10.75% : N/A)
3606028 + 0 with mate mapped to a different chr
1105347 + 0 with mate mapped to a different chr (mapQ>=5)*
I saw that other people here have posted about this 0.0% properly paired issue, and they suggested to remove -P out of the command line. Once I removed P, the new output was this:
52248219 + 0 in total (QC-passed reads + QC-failed reads)
1105309 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
37815854 + 0 mapped (72.38% : N/A)
51142910 + 0 paired in sequencing
25571455 + 0 read1
25571455 + 0 read2
32281574 + 0 properly paired (63.12% : N/A)
34482372 + 0 with itself and mate mapped
2228173 + 0 singletons (4.36% : N/A)
1994818 + 0 with mate mapped to a different chr
939713 + 0 with mate mapped to a different chr (mapQ>=5)*
My questions are:
- What does properly paired means in this context?
- Why removing “-P” caused a change from 0 to 63.12% ?
- Are these values “good”? I am aligning trimmed DNA reads from one species to the reference genome of a second species (although they are closely related)… should my percentage of properly aligned paired then be higher? Or my results are within what is expected for a situation such as this one? Is there a command that I could use to increase the number of properly aligned pairs?
Beforehand, I appreciate your help in this regard. I am new to using shell commands and bioinformatics, so any help is greatly appreciated!
I did trim the reads using Trimmomatic and It kept over 95% of the pairs, so I don't know if this is due to the trimming. More over, as I mentioned in the second portion of my post, once that I removed -P from the bwa parameters, it did change from 0% to 63.12% of pairs properly aligned ... I guess that a fundamental part of all of this is that I don't quite understand what "properly paired" means in this context
95% of the pairs might mean that your read1 and read2 files are out of sync. Though if lots of your reasd are properly paired, that's likely not your problem. But I think ATpoint has the right answer here.