Question: How to evaluate sequence alignment (e.g. number of indels) of ONT data after data preprocessing
BCArg (UAntwerpen) wrote, 4 months ago:

We are sequencing a bacterial genome on a GridION machine from ONT. As expected, the error rate was quite high, and I noticed lots of insertions and deletions compared to the reference genome.

Although I reckon the sequencing and the mapping to the reference genome both went well, I was wondering whether 'polishing' the fastq files could improve the mapping stats. For instance, I read in another post that the quality of the first 40-50 nucleotides of the reads tends to be low. I also wanted to evaluate whether, and to what extent, selecting reads of higher quality (e.g. Phred 12 or above, which is actually the median read quality) would improve the alignment.
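The trimming and filtering described above could be sketched roughly as follows. This is only a minimal illustration, not a replacement for a dedicated tool: the function names, the `head_trim=50` and `min_mean_q=12.0` defaults, and the assumption of Phred+33 four-line FASTQ records are all choices made here for the example. The mean quality is averaged over error probabilities rather than over raw Phred values, which is one common convention.

```python
import math


def mean_phred(quality_string, offset=33):
    """Mean read quality, averaged in error-probability space (Phred+33 assumed)."""
    probs = [10 ** (-(ord(char) - offset) / 10) for char in quality_string]
    return -10 * math.log10(sum(probs) / len(probs))


def read_fastq(handle):
    """Minimal parser for 4-line FASTQ records; yields (name, seq, qual)."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            break
        seq = handle.readline().rstrip()
        handle.readline()  # the '+' separator line
        qual = handle.readline().rstrip()
        yield header[1:], seq, qual


def trim_and_filter(records, head_trim=50, min_mean_q=12.0):
    """Drop the first head_trim bases, then keep reads whose mean quality passes."""
    for name, seq, qual in records:
        seq, qual = seq[head_trim:], qual[head_trim:]
        if seq and mean_phred(qual) >= min_mean_q:
            yield name, seq, qual
```

The surviving records can be written back out as FASTQ and re-aligned, so that the raw and filtered runs are compared with exactly the same mapping settings.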

I am now wondering how I could evaluate the mapping of the reads after polishing/filtering the fastq files as described above. I initially checked the alignment with Tablet, but I am after a quantitative (rather than visual) assessment.

So far I have used samtools flagstat which gave me:

7014141 + 0 in total (QC-passed reads + QC-failed reads)
781294 + 0 secondary
409167 + 0 supplementary
0 + 0 duplicates
6836429 + 0 mapped (97.47% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

I guess the percentage of mapped reads (97.47%) can be useful, but it is already high, and I do not really expect trimming the first nucleotides to increase it (please correct me if I am misreading the samtools flagstat output).
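One caveat when reading the flagstat output above: both the total and the mapped counts include secondary and supplementary alignment records, so the 97.47% is a rate per record rather than per read. A small sketch of how the per-read (primary) rate could be derived from the same text is given below; the function name and the returned dictionary keys are made up for this example. With the numbers above it gives 5645968 mapped out of 5823680 primary reads, about 96.95%.

```python
import re


def primary_mapping_rate(flagstat_text):
    """Derive primary-read counts from the text printed by samtools flagstat."""
    counts = {}
    for line in flagstat_text.splitlines():
        match = re.match(r"(\d+) \+ \d+ (.+)", line.strip())
        if not match:
            continue
        passed, label = int(match.group(1)), match.group(2)
        if label.startswith("in total"):
            counts["total"] = passed
        elif label == "secondary":
            counts["secondary"] = passed
        elif label == "supplementary":
            counts["supplementary"] = passed
        elif label.startswith("mapped ("):
            counts["mapped"] = passed
    # Secondary and supplementary records are mapped by definition,
    # so they are subtracted from both the total and the mapped count.
    extra = counts["secondary"] + counts["supplementary"]
    primary = counts["total"] - extra
    primary_mapped = counts["mapped"] - extra
    return {
        "primary": primary,
        "primary_mapped": primary_mapped,
        "rate": primary_mapped / primary,
    }
```

Only the QC-passed column (the number before the `+`) is used here, which matches the output above where the QC-failed column is always 0.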

I also found a tool called Qualimap, though it appears to be computationally very expensive, and the command-line tool does not appear to work on Linux.

Has anyone already carried out this kind of analysis? That is, how can one assess the improvement in the mapping to the reference genome after polishing the fastq files?

Sequence alignment was done with minimap2; sorting and indexing were done with samtools.
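Since the alignments are in SAM/BAM format, one way to get a number for the indels is to count the `I` and `D` operations in the CIGAR strings, and to use the `NM` edit-distance tag (when the aligner writes it) to estimate mismatches. The following is a rough sketch under those assumptions; the function names and the keys of the returned counter are invented for illustration.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def indel_stats(sam_lines):
    """Sum indel and mismatch counts over SAM text records."""
    totals = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag, cigar = int(fields[1]), fields[5]
        # Skip unmapped (0x4) and secondary (0x100) records; supplementary
        # records are kept because they cover distinct parts of a read.
        if flag & 0x4 or flag & 0x100 or cigar == "*":
            continue
        ins_bases = del_bases = 0
        for length, op in CIGAR_OP.findall(cigar):
            length = int(length)
            if op in "M=X":
                totals["aligned_bases"] += length
            elif op == "I":
                totals["insertion_events"] += 1
                ins_bases += length
            elif op == "D":
                totals["deletion_events"] += 1
                del_bases += length
        totals["inserted_bases"] += ins_bases
        totals["deleted_bases"] += del_bases
        # NM is the edit distance: mismatches + inserted + deleted bases.
        for tag in fields[11:]:
            if tag.startswith("NM:i:"):
                totals["mismatches"] += int(tag[5:]) - ins_bases - del_bases
        totals["records"] += 1
    return totals


def per_100_aligned(totals, key):
    """Express a count as a rate per 100 aligned (M/=/X) bases."""
    aligned = totals["aligned_bases"]
    return 100.0 * totals[key] / aligned if aligned else 0.0
```

The input could be the text produced by `samtools view` on the sorted BAM. Running it once on the alignment of the raw reads and once on the alignment of the filtered reads gives indel rates that can be compared directly.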


I have found a tool called MUMmer, which includes a utility called dnadiff. I think I could export the consensus sequence from the alignment of the 'raw' reads and from that of the 'polished' reads, and then compare each of them to the reference to check whether there was an improvement. Any suggestions?

(comment written 4 months ago by BCArg)
colindaven (Hannover Medical School) wrote, 3 months ago:
  • Qualimap might not have been adapted for ONT data, so it may not be appropriate here
  • If you want an easy way to correct the ONT reads, do an assembly with Canu; it will output corrected reads as part of the assembly process if all goes well
  • 97% is already a great alignment rate
  • MUMmer is good but does not deliver SAM output AFAIK
  • there is a new tool called Filtlong which might help you filter reads if you like
  • if you want more stats on the alignment, try

    samtools stats

then

    multiqc

to get a nice report
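To compare the raw and the filtered runs numerically rather than by eye, the text written by `samtools stats` could also be parsed directly. The sketch below assumes the usual tab-separated layout, with `SN` summary lines and `ID` lines holding indel length, number of insertions and number of deletions; the function names and the two added keys (`insertion events`, `deletion events`) are made up for this example. As far as I understand, the `error rate` field is derived from the NM tags, so it should reflect indels as well as mismatches.

```python
def parse_samtools_stats(text):
    """Collect numeric SN summary fields and total indel counts from stats text."""
    summary = {}
    insertions = deletions = 0
    for line in text.splitlines():
        fields = line.split("\t")
        if fields[0] == "SN" and len(fields) >= 3:
            key = fields[1].rstrip(":")
            try:
                summary[key] = float(fields[2])
            except ValueError:
                pass  # ignore non-numeric summary values
        elif fields[0] == "ID" and len(fields) >= 4:
            # ID columns: indel length, number of insertions, number of deletions
            insertions += int(fields[2])
            deletions += int(fields[3])
    summary["insertion events"] = insertions
    summary["deletion events"] = deletions
    return summary


def compare_runs(before, after, keys):
    """Return {key: (before, after, after - before)} for the requested fields."""
    return {key: (before[key], after[key], after[key] - before[key]) for key in keys}
```

For example, `compare_runs(raw, filtered, ["reads mapped", "error rate", "insertion events", "deletion events"])` would show whether filtering lowered the error rate and how many mapped reads it cost.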
