SNP calling ONT sequenced files
9 days ago
blur ▴ 280

Hi,

I have ONT files (FASTQ) and want to call SNPs from them. I have read some papers saying ONT reads are too error-prone; is that still true?

Also, I have tried doing this in CLC Workbench and couldn't: the FASTQ files were too big, and the company's support could not help. My sequenced isolates are bacterial, with 3-6 Mb genomes, so the large chromosomal contig is usually 3-6 million bases long.

Is there a good and reliable tool for calling SNPs from ONT data? Would it make sense to break the contigs into smaller portions, or would that hurt the reliability of the results?

Thanks in advance

What is the median length of raw reads and how many do you have? Has the data been basecalled with "high" or "super" accuracy?

I have tried doing this on CLC workbench and couldn't - the fastq files were too big

If you have excessive coverage (> 50x), you could subsample the fastq data and see if CLC is able to work with that.
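To answer the questions above (read count, median length, coverage) and to subsample, a minimal pure-Python sketch could look like this. It assumes plain 4-line FASTQ records and a genome size you supply yourself; the function names are hypothetical, and dedicated subsampling tools will be much faster on multi-gigabyte files. Reads are kept independently at random, so the final coverage is only approximately the target.

```python
import random
from statistics import median
from typing import Dict, Iterable, Iterator, Tuple

FastqRecord = Tuple[str, str, str]  # (header, sequence, quality)


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Parse plain 4-line FASTQ records from any iterable of lines (e.g. an open file)."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header.strip():
            continue  # tolerate blank lines between records
        try:
            seq = next(it).rstrip("\n")
            next(it)  # '+' separator line, ignored
            qual = next(it).rstrip("\n")
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header}") from None
        yield header, seq, qual


def read_stats(records: Iterable[FastqRecord], genome_size: int) -> Dict[str, float]:
    """Read count, median read length, total bases and estimated mean coverage."""
    lengths = [len(seq) for _, seq, _ in records]
    total = sum(lengths)
    return {
        "reads": len(lengths),
        "median_length": median(lengths) if lengths else 0,
        "total_bases": total,
        "coverage": total / genome_size,
    }


def subsample_fraction(coverage: float, target: float) -> float:
    """Fraction of reads to keep to go from `coverage` down to roughly `target`."""
    return min(1.0, target / coverage) if coverage > 0 else 1.0


def subsample(records: Iterable[FastqRecord], fraction: float,
              seed: int = 42) -> Iterator[FastqRecord]:
    """Keep each read independently with probability `fraction` (seeded, so reproducible)."""
    rng = random.Random(seed)
    for rec in records:
        if rng.random() < fraction:
            yield rec


def to_fastq(rec: FastqRecord) -> str:
    """Format one record back into 4-line FASTQ text."""
    header, seq, qual = rec
    return f"{header}\n{seq}\n+\n{qual}\n"
```

For a gzipped file, pass `gzip.open("reads.fq.gz", "rt")` (hypothetical path) as the line source: one pass through `read_stats` to get the coverage, then a second pass through `subsample` with `subsample_fraction(coverage, 50)`, writing each kept record with `to_fastq`.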

9 days ago
cfos4698 ★ 1.2k

My personal preference is clair3. I'm unsure whether CLC failed because your compute resources are too limited or because of restrictions within CLC Workbench itself (e.g. a RAM limit). Either way, if you use clair3 you will most likely need to run it from the command line. A simple workflow that you could expand on: QC/trim the reads with something like nanoq, map them to your chosen reference genome with minimap2, then call variants with clair3.
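The three steps above can be sketched as a small Python driver that only assembles the command lines and runs them in order. This is a sketch, not a tested pipeline: the file names, the length/quality cut-offs (1000 bp, Q10) and `model_path` are placeholder assumptions, the Clair3 model must match your basecalling model and chemistry, and every flag should be checked against `--help` for the versions you have installed. The samtools sort/index steps are added because clair3 expects a sorted, indexed BAM and an indexed reference. `--haploid_precise` is Clair3's experimental haploid mode, which seems a reasonable fit for bacteria but is optional.

```python
import subprocess
from pathlib import Path
from typing import List


def build_commands(reads: str, ref: str, model_path: str, outdir: str,
                   threads: int = 8, min_len: int = 1000, min_qual: int = 10,
                   haploid: bool = True) -> List[List[str]]:
    """Return the nanoq -> minimap2 -> samtools -> clair3 steps as argument lists."""
    filtered = f"{outdir}/reads.filtered.fastq.gz"
    sam = f"{outdir}/aln.sam"
    bam = f"{outdir}/aln.sorted.bam"
    clair3 = [
        "run_clair3.sh",
        f"--bam_fn={bam}",
        f"--ref_fn={ref}",
        f"--threads={threads}",
        "--platform=ont",
        f"--model_path={model_path}",  # placeholder: must match basecaller model/chemistry
        f"--output={outdir}/clair3",
        "--include_all_ctgs",  # by default clair3 restricts calling to chr1-22/X/Y-style names
    ]
    if haploid:
        clair3.append("--haploid_precise")  # experimental haploid calling mode
    return [
        # 1. Filter reads by minimum length (-l) and minimum mean read quality (-q).
        ["nanoq", "-i", reads, "-l", str(min_len), "-q", str(min_qual), "-o", filtered],
        # 2. Map with minimap2's ONT preset, writing SAM to a file.
        ["minimap2", "-t", str(threads), "-ax", "map-ont", "-o", sam, ref, filtered],
        # 3. Sort and index the alignments, and index the reference.
        ["samtools", "sort", "-@", str(threads), "-o", bam, sam],
        ["samtools", "index", bam],
        ["samtools", "faidx", ref],
        # 4. Call variants.
        clair3,
    ]


def run(commands: List[List[str]], outdir: str) -> None:
    """Create the output directory, then run each step, stopping at the first failure."""
    Path(outdir).mkdir(parents=True, exist_ok=True)
    for cmd in commands:
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes it easy to print and inspect the exact commands (or paste them into a shell script) before anything runs.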

9 days ago

The quality of modern ONT reads (SUP basecalled, and certainly not fast basecalled) should be fine these days, especially for bacterial SNP calling.

Have a look at deepvariant (resource-intensive) or longshot (simple) via bioconda to get started with SNP calling on long reads.

10 hours ago
Момчил ▴ 10

Yes, ONT reads still have higher error rates than Illumina short reads, but the situation has improved significantly in recent years. You can check the following resources:

https://nanoporetech.com/news/news-new-nanopore-sequencing-chemistry-developers-hands-set-deliver-q20-99-raw-read

https://nanoporetech.com/platform/accuracy

https://pmc.ncbi.nlm.nih.gov/articles/PMC11594029

So, if your sequencing data is recent (produced in 2024–25), you can use it for SNP calling. This should be done carefully, of course, as calling very low-frequency variants can still be challenging.
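The "Q20" in the first link is a Phred quality score, Q = -10 log10(P), where P is the per-base error probability, so Q20 means a 1% error rate, i.e. 99% accuracy. The short sketch below converts between the two and estimates errors per read. It assumes every base in a read has the same quality, which is a simplification; real ONT errors are also not uniformly distributed along the read.

```python
import math


def phred_to_error(q: float) -> float:
    """Per-base error probability for Phred quality Q: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Inverse conversion: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def expected_errors(read_length: int, q: float) -> float:
    """Expected erroneous bases in a read, assuming every base has quality Q."""
    return read_length * phred_to_error(q)


for q in (10, 20, 30):
    p = phred_to_error(q)
    print(f"Q{q}: error {p:.3f}, accuracy {100 * (1 - p):.1f}%, "
          f"~{expected_errors(10_000, q):.0f} errors per 10 kb read")
```

Individual reads therefore still carry errors, which is why SNPs are called from many overlapping reads rather than from any single one, and why a genuine variant present in only a small fraction of reads is hard to separate from background errors.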

Please take this part of my comment as coming from a bioinformatics enthusiast.

Regarding the issue with the CLC software, my best guess is that you were running a workflow/pipeline containing a tool optimized for short Illumina reads, and the ONT reads exceeded its limits and were therefore rejected.

Note that there are other CLC tools optimized for long ONT and PacBio reads, covering steps such as importing, read mapping, structural variant calling (variants longer than 35 bp), and others.

For SNP calling in particular, once the mapping has been produced with these dedicated tools, the existing variant callers can be used. Their parameters would need to be adjusted to reflect the coverage profile of each sequencing technology.

I am writing this part of my comment as a member of the QIAGEN Digital Insights Support team. If you would like to discuss SNP calling with ONT data in CLC further, please write to ts-bioinformatics@qiagen.com. We will be happy to help.
