I currently have ONT R10.4.1 simplex reads corrected with Dorado, and I am trying to decide which aligner is more suitable for them: minimap2 or winnowmap.
I plan to perform the following analyses after mapping:
ONT read phasing, using an existing VCF generated from short reads (from the same sample) and the hg38 reference.
Assembly polishing.
Structural variant calling.
I know that minimap2 already provides the lr:hq and lr:hqae presets specifically designed for ONT R10.4.1 data. Winnowmap, on the other hand, currently seems to lack parameters that would reproduce the lr:hq preset (see the issue discussing the lr:hq preset).
However, I’m not sure whether minimap2’s latest version has improved performance in repetitive regions compared to previous releases.
I’d really appreciate hearing others’ experiences or opinions:
Which aligner would you recommend for ONT R10.4.1 Dorado-corrected reads, especially for downstream phasing and SV detection?
Thank you very much for your time and suggestions!
Good summary. I have never tried winnowmap, despite its apparent advantages. If you have very repeat-rich regions, such as centromeres, then it may be useful. However, my gut feeling is that ONT sequencing has moved on considerably again in the last two years, so Winnowmap's error model might no longer be appropriate. Perhaps ask the authors? Certainly, the more modern minimap2 presets would appear to be beneficial for the latest ONT data.
If this is just one alignment, though, why not do both and let the community know about your experiences?
Thank you for your reply. I’m currently running alignments with both aligners, but I wanted to ask whether anyone has already done a similar systematic comparison.
Actually, I’m not quite sure about the best way to evaluate the performance differences between aligners. I was thinking of comparing metrics such as the NM tag, samtools flagstat, or MAPQ values.
Could you please give me some advice or suggestions on how to properly compare the results?
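Aggregate counts (flagstat, mean NM, MAPQ) hide where the aligners actually disagree. One complementary approach is a per-read comparison: for each read, check whether both aligners put its primary alignment at the same locus and how confident each one is. The Python sketch below assumes both BAMs have been dumped to SAM text with primary records only (e.g. `samtools view -F 0x900 aln.bam > primary.sam`); the function names, the 1 kb position tolerance and the MAPQ cut-off of 5 are illustrative choices, not established thresholds.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# (reference name, 1-based leftmost position, MAPQ, is_reverse)
Placement = Tuple[str, int, int, bool]


def primary_placements(sam_lines: Iterable[str]) -> Dict[str, Placement]:
    """Map read name -> primary placement from SAM text lines.

    Header lines, unmapped reads (flag 0x4) and secondary/supplementary
    records (flags 0x100/0x800) are skipped, so each read appears at most once.
    """
    placements: Dict[str, Placement] = {}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x4 or flag & 0x900:
            continue
        placements[fields[0]] = (fields[2], int(fields[3]), int(fields[4]),
                                 bool(flag & 0x10))
    return placements


def compare(a: Dict[str, Placement], b: Dict[str, Placement],
            max_dist: int = 1000, mapq_min: int = 5) -> Counter:
    """Bin reads by whether two aligners agree on the primary locus.

    max_dist (bp) and mapq_min are illustrative thresholds, not standards.
    """
    bins: Counter = Counter()
    for name in a.keys() | b.keys():
        pa, pb = a.get(name), b.get(name)
        if pa is None or pb is None:
            # mapped (as primary) by one aligner but not the other
            bins["mapped_by_one_only"] += 1
        elif pa[0] == pb[0] and pa[3] == pb[3] and abs(pa[1] - pb[1]) <= max_dist:
            # same chromosome, same strand, nearby leftmost position
            bins["concordant"] += 1
        elif min(pa[2], pb[2]) >= mapq_min:
            # different loci, and both aligners are confident
            bins["discordant_both_confident"] += 1
        else:
            bins["discordant_low_mapq"] += 1
    return bins
```

Hypothetical usage: `compare(primary_placements(open("mm2.primary.sam")), primary_placements(open("wm.primary.sam")))`. Reads in the `discordant_both_confident` bin are the most informative ones to inspect in a genome browser, since the aligners disagree with high confidence (for example in segmental duplications, or for split reads where each aligner chose a different segment as primary). The dictionaries keep every mapped read in memory, which can need a lot of RAM at ~4.9 million reads.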
Here is a very brief summary of the results from the latest versions of minimap2 and winnowmap (samtools flagstat output, followed by samtools stats run with -F 0x900, i.e. on primary alignments only).
It seems that winnowmap produces fewer secondary alignments (583,606 vs 1,109,654) and supplementary alignments (628,714 vs 863,145), as well as fewer mismatches overall (NM-based error rate 0.88% vs 1.03%).
Any suggestions or thoughts on how to interpret this difference?
Minimap2 (minimap2 -ax lr:hq -t 96)
6877353 0 total (QC-passed reads + QC-failed reads)
4904554 0 primary
1109654 0 secondary
863145 0 supplementary
0 0 duplicates
0 0 primary duplicates
6876784 0 mapped
99.99% N/A mapped %
4903985 0 primary mapped
99.99% N/A primary mapped %
0 0 paired in sequencing
0 0 read1
0 0 read2
0 0 properly paired
N/A N/A properly paired %
0 0 with itself and mate mapped
0 0 singletons
N/A N/A singletons %
0 0 with mate mapped to a different chr
0 0 with mate mapped to a different chr (mapQ>=5)
# This file was produced by samtools stats (1.21+htslib-1.21) and can be plotted using plot-bamstats
# This file contains statistics for all reads.
# The command line was: stats -F 0x900 -@ 64 preprocessed_correct_minimap2.bam
# CHK, Checksum [2]Read Names [3]Sequences [4]Qualities
# CHK, CRC32 of reads which passed filtering followed by addition (32bit overflow)
CHK 85705490 7aed57ce ed2579dd
# Summary Numbers. Use `grep ^SN | cut -f 2-` to extract this part.
SN raw total sequences: 6877353 # excluding supplementary and secondary reads
SN filtered sequences: 1972799
SN sequences: 4904554
SN is sorted: 1
SN 1st fragments: 4904554
SN last fragments: 0
SN reads mapped: 4903985
SN reads mapped and paired: 0 # paired-end technology bit set + both mates mapped
SN reads unmapped: 569
SN reads properly paired: 0 # proper-pair bit set
SN reads paired: 0 # paired-end technology bit set
SN reads duplicated: 0 # PCR or optical duplicate bit set
SN reads MQ0: 37300 # mapped and MQ=0
SN reads QC failed: 0
SN non-primary alignments: 0
SN supplementary alignments: 0
SN total length: 181205124161 # ignores clipping
SN total first fragment length: 181205124161 # ignores clipping
SN total last fragment length: 0 # ignores clipping
SN bases mapped: 181204951593 # ignores clipping
SN bases mapped (cigar): 177052477463 # more accurate
SN bases trimmed: 0
SN bases duplicated: 0
SN mismatches: 1820182430 # from NM fields
SN error rate: 1.028047e-02 # mismatches / bases mapped (cigar)
SN average length: 36946
SN average first fragment length: 36946
SN average last fragment length: 0
SN maximum length: 417259
SN maximum first fragment length: 417259
SN maximum last fragment length: 0
SN average quality: 255.0
SN insert size average: 0.0
SN insert size standard deviation: 0.0
SN inward oriented pairs: 0
SN outward oriented pairs: 0
SN pairs with other orientation: 0
SN pairs on different chromosomes: 0
SN percentage of properly paired reads (%): 0.0
Winnowmap
6116874 0 total (QC-passed reads + QC-failed reads)
4904554 0 primary
583606 0 secondary
628714 0 supplementary
0 0 duplicates
0 0 primary duplicates
6116058 0 mapped
99.99% N/A mapped %
4903738 0 primary mapped
99.98% N/A primary mapped %
0 0 paired in sequencing
0 0 read1
0 0 read2
0 0 properly paired
N/A N/A properly paired %
0 0 with itself and mate mapped
0 0 singletons
N/A N/A singletons %
0 0 with mate mapped to a different chr
0 0 with mate mapped to a different chr (mapQ>=5)
# This file was produced by samtools stats (1.21+htslib-1.21) and can be plotted using plot-bamstats
# This file contains statistics for all reads.
# The command line was: stats -F 0x900 -@ 36 preprocessed_correct_winnowmap.bam
# CHK, Checksum [2]Read Names [3]Sequences [4]Qualities
# CHK, CRC32 of reads which passed filtering followed by addition (32bit overflow)
CHK 85705490 9c76eb9f ed2579dd
# Summary Numbers. Use `grep ^SN | cut -f 2-` to extract this part.
SN raw total sequences: 6116874 # excluding supplementary and secondary reads
SN filtered sequences: 1212320
SN sequences: 4904554
SN is sorted: 1
SN 1st fragments: 4904554
SN last fragments: 0
SN reads mapped: 4903738
SN reads mapped and paired: 0 # paired-end technology bit set + both mates mapped
SN reads unmapped: 816
SN reads properly paired: 0 # proper-pair bit set
SN reads paired: 0 # paired-end technology bit set
SN reads duplicated: 0 # PCR or optical duplicate bit set
SN reads MQ0: 35758 # mapped and MQ=0
SN reads QC failed: 0
SN non-primary alignments: 0
SN supplementary alignments: 0
SN total length: 181205124161 # ignores clipping
SN total first fragment length: 181205124161 # ignores clipping
SN total last fragment length: 0 # ignores clipping
SN bases mapped: 181205022091 # ignores clipping
SN bases mapped (cigar): 177279689107 # more accurate
SN bases trimmed: 0
SN bases duplicated: 0
SN mismatches: 1558339665 # from NM fields
SN error rate: 8.790289e-03 # mismatches / bases mapped (cigar)
SN average length: 36946
SN average first fragment length: 36946
SN average last fragment length: 0
SN maximum length: 417259
SN maximum first fragment length: 417259
SN maximum last fragment length: 0
SN average quality: 255.0
SN insert size average: 0.0
SN insert size standard deviation: 0.0
SN inward oriented pairs: 0
SN outward oriented pairs: 0
SN pairs with other orientation: 0
SN pairs on different chromosomes: 0
SN percentage of properly paired reads (%): 0.0
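Because both runs contain the same 4,904,554 primary reads, the raw counts can be turned into per-read and per-base rates that are directly comparable. The Python sketch below types in the posted flagstat counts by hand and parses the `SN` lines, assuming the tab-separated layout that `samtools stats` writes (the tabs appear collapsed to spaces in the paste above). It also checks that `filtered sequences` equals secondary + supplementary, which confirms that the `-F 0x900` statistics describe primary alignments only; the function and variable names are hypothetical.

```python
from typing import Dict


def parse_sn(text: str) -> Dict[str, float]:
    """Parse samtools stats 'SN<TAB>key:<TAB>value[<TAB># comment]' lines."""
    values: Dict[str, float] = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 3 and fields[0] == "SN":
            values[fields[1].rstrip(":")] = float(fields[2])
    return values


def summarise(primary: int, secondary: int, supplementary: int,
              sn: Dict[str, float]) -> Dict[str, float]:
    """Per-read / per-base rates comparable between aligners on the same reads."""
    # With `samtools stats -F 0x900`, secondary + supplementary records are the
    # ones filtered out, so the SN numbers describe primary alignments only.
    if sn["filtered sequences"] != secondary + supplementary:
        raise ValueError("SN block does not look like primary-only stats")
    return {
        "secondary_per_primary": secondary / primary,
        "supplementary_per_primary": supplementary / primary,
        "mq0_fraction_of_mapped": sn["reads MQ0"] / sn["reads mapped"],
        "nm_per_aligned_base": sn["mismatches"] / sn["bases mapped (cigar)"],
        "aligned_fraction_of_bases": sn["bases mapped (cigar)"] / sn["total length"],
    }


# Numbers copied from the post (flagstat: primary, secondary, supplementary).
MINIMAP2_FLAGSTAT = (4904554, 1109654, 863145)
WINNOWMAP_FLAGSTAT = (4904554, 583606, 628714)
MINIMAP2_SN = (
    "SN\tfiltered sequences:\t1972799\n"
    "SN\treads mapped:\t4903985\n"
    "SN\treads MQ0:\t37300\t# mapped and MQ=0\n"
    "SN\ttotal length:\t181205124161\t# ignores clipping\n"
    "SN\tbases mapped (cigar):\t177052477463\t# more accurate\n"
    "SN\tmismatches:\t1820182430\t# from NM fields\n"
)
WINNOWMAP_SN = (
    "SN\tfiltered sequences:\t1212320\n"
    "SN\treads mapped:\t4903738\n"
    "SN\treads MQ0:\t35758\t# mapped and MQ=0\n"
    "SN\ttotal length:\t181205124161\t# ignores clipping\n"
    "SN\tbases mapped (cigar):\t177279689107\t# more accurate\n"
    "SN\tmismatches:\t1558339665\t# from NM fields\n"
)

for name, counts, sn_text in [("minimap2", MINIMAP2_FLAGSTAT, MINIMAP2_SN),
                              ("winnowmap", WINNOWMAP_FLAGSTAT, WINNOWMAP_SN)]:
    rates = summarise(*counts, parse_sn(sn_text))
    print(name, {k: round(v, 5) for k, v in rates.items()})
```

On these numbers, winnowmap's primary alignments both cover slightly more read bases (about 97.8% vs 97.7% of all bases fall inside CIGAR-aligned segments) and carry fewer NM edits per aligned base, while producing roughly half as many secondary alignments per read. Note that NM also depends on each aligner's scoring parameters, so a lower error rate on its own does not show that the placements are more correct; a per-read placement comparison is a useful complement.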