Question: How to evaluate sequence alignment (e.g. number of indels) of ONT data after data preprocessing
gravatar for BCArg
14 months ago by
BCArg60 wrote:

We are sequencing a bacterial genome with a Gridion machine from ONT. As already expected, the error rate was quite high and I noticed lots of insertions and deletions compared to the reference genome.

Although I reckon the sequencing and the mapping to the reference genome both went well, I was wondering if 'polishing' the fastq files could improve the mapping stats. For instance, I checked in this post that the quality of the first 40-50 nucleotides in the reads tend to be low. Also I wanted to evaluate if/ to which extent selecting reads of of higher quality (e.g. 12 (phred scale), which is actually the median of reads quality) would enhance alignment.

I am now wondering how could I evaluate the mapping of the reads after polishing/ filtering the fastq files described above. I initially checked the alignment with tablet, but I am more after a quantitative (other than visual) assessment.

So far I have used samtools flagstat which gave me:

7014141 + 0 in total (QC-passed reads + QC-failed reads)
781294 + 0 secondary
409167 + 0 supplementary
0 + 0 duplicates
6836429 + 0 mapped (97.47% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

I guess the percentage of mapped reads (97.47%) can be useful, but it is already a high mapping rate and I am not really expecting that trimming the first nucleotides will increase the mapping rate (please correct me in case I am interpreting the samtools flagstat output wrongly).

I also found a tool called Qualimap, though it appears to be computationally very expensive and the command line tool does not appear to work on linux.

Has anyone already carried out this analysis i.e. how can one assess the improvement in the mapping to the reference genome after polishing the fastq files?

Sequence alignment was done with minimap2 and indexing, sorting with samtools

sequencing next-gen alignment • 388 views
ADD COMMENTlink modified 14 months ago by colindaven2.3k • written 14 months ago by BCArg60

I have found this tool called MUMmer which has a function called dnadiff. I think I can then export the consensus sequence from the alignment using the 'raw' reads and that of the 'polished reads' and compare each of them to the reference in order to check if there was an improvement. Any suggestion?

ADD REPLYlink modified 14 months ago • written 14 months ago by BCArg60
gravatar for colindaven
14 months ago by
Hannover Medical School
colindaven2.3k wrote:
  • qualimap might not be modified and appropriate for ONT data
  • If you want to easily correct the ONT reads do an assembly with Canu, it will output corrected reads as part of the assembly process if all goes well
  • 97% is already a great alignment rate
  • mummer is good but does not deliver SAM output AFAIK
  • there is a new tool called FiltLong which might help you filter reads if you like
  • if you want more stats on the alignment try

    samtools stats



to get a nice report

ADD COMMENTlink modified 14 months ago • written 14 months ago by colindaven2.3k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 915 users visited in the last hour