How to fix ONT long reads: "ERROR: Invalid char" while reading a fastq.gz file
Asked 17 months ago by ben@f

Hi biostars,

I was assembling a eukaryotic genome of about 1.1 Gb using the Flye assembler and a set of long reads produced with Oxford Nanopore Technologies (ONT) sequencing.

First, I successfully ran Flye on one of the six FASTQ files I received (these ONT raw reads come from the same individual but from different library runs). Because the coverage per file is low (around 10X), I then concatenated all the FASTQ files into one.
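Concatenating gzipped FASTQ files does not require decompressing them first: the gzip format allows several compressed members to follow one another in one file, and standard gzip readers decode them in sequence. Below is a minimal Python sketch of the byte-level equivalent of `cat` (the function name `concat_gzip_files` and the file names are hypothetical). It copies a corrupt input just as faithfully as a good one, so any damage in one run ends up in the combined file.

```python
import shutil


def concat_gzip_files(inputs, output):
    """Byte-level concatenation of gzip files (like `cat a.gz b.gz > out.gz`).

    Each input becomes one gzip member of the output; gzip readers
    decompress the members one after another. Nothing is validated here.
    """
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                # Stream the raw compressed bytes without decompressing.
                shutil.copyfileobj(src, out)
```

Usage with hypothetical names: `concat_gzip_files(sorted(pathlib.Path(".").glob("run*.fastq.gz")), "all_reads.fastq.gz")`.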

Then I tried to run the de novo assembly again with Flye on the combined file, but Flye reports that my reads appear to be truncated or to contain unknown characters.

Command used to run the Flye pipeline:

    flye -t 40 -g 1.1g --nano-raw tlongsreads.fastq.gz -o flye_ont_60x

NB: I tried other assemblers (Canu and wtdbg2), and both run fine on the same concatenated file.

This is the error log from Flye:

> INFO: Starting Flye 2.9.1-b1780
> INFO: >>>STAGE: configure
> INFO: Configuring run
> WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
> ERROR: Invalid char while reading /path/to/ont/fly_run1_ont/tlongsreads.fastq.gz
> ERROR: Pipeline aborted

I searched the issues in the Flye GitHub repository, but I could not figure out how to fix this problem.

Is there a way to fix these errors in the raw reads, for example by deleting blank lines or whitespace, or by removing the invalid characters?
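If the goal is to drop damaged reads rather than repair them, one option is to keep only records that pass a few structural checks. This is a minimal Python sketch, not Flye's own validation: the accepted alphabet (A, C, G, T, N), the printable-ASCII quality range, and the function names `read_records` and `clean_fastq` are assumptions. Deleting characters inside a read would put it out of step with its quality string, so the sketch drops whole records instead. A truncated or corrupt gzip stream still raises an exception (`EOFError` or `OSError`), and if a record is missing a line, the records after it fall out of step and are dropped too, so in that case the log is mainly useful for locating the damage.

```python
import gzip
import sys

# Assumed alphabet for sequence lines (a choice for this sketch, not Flye's rule).
ALLOWED = set("ACGTN")


def read_records(lines):
    """Yield 4-line groups, skipping blank lines between records.

    Lines missing at the end of the file come back as None.
    """
    for line in lines:
        header = line.strip()
        if not header:
            continue  # stray blank line between records
        rest = [next(lines, None) for _ in range(3)]
        yield [header] + [r.strip() if r is not None else None for r in rest]


def clean_fastq(in_path, out_path, log=sys.stderr):
    """Write well-formed records to out_path; report and drop the others."""
    kept = dropped = 0
    # Non-ASCII bytes are decoded as U+FFFD so they fail the checks below.
    with gzip.open(in_path, "rt", encoding="ascii", errors="replace") as src, \
            gzip.open(out_path, "wt", encoding="ascii") as dst:
        for record in read_records(iter(src)):
            header, seq, plus, qual = record
            ok = (
                None not in record
                and header.startswith("@")
                and header.isascii()
                and plus.startswith("+")
                and len(seq) > 0
                and set(seq.upper()) <= ALLOWED
                and len(seq) == len(qual)
                and all("!" <= c <= "~" for c in qual)
            )
            if ok:
                # The '+' line is written bare; its optional copy of the ID is dropped.
                dst.write(f"{header}\n{seq.upper()}\n+\n{qual}\n")
                kept += 1
            else:
                dropped += 1
                print(f"dropping malformed record starting {header[:60]!r}", file=log)
    return kept, dropped
```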

Any guidance or support would be much appreciated.

Thanks in advance

long-reads assembly Fastq • 1.1k views
Answer, 17 months ago:

Try:

  • decompressing (gunzip) and recompressing (gzip) the file again: do you get any errors?
  • using grep to find non-ACGT characters in the decompressed FASTQ
  • checking the end of the file with tail -n 100 x.fastq, since that is where a truncated file is most likely to be broken
  • checking that each sequence line has a quality line of the same length
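The checks above can be combined into one pass over the file. This is a minimal Python sketch, assuming plain 4-line FASTQ records (no wrapped sequence lines); the function name `check_fastq` is hypothetical. It reports the line number of the first bad records, counts characters other than uppercase A/C/G/T in sequence lines (so `N` and lowercase bases show up too), and notes a decompression failure or an incomplete final record, which is what a truncated file usually looks like.

```python
import gzip
import zlib
from collections import Counter


def check_fastq(path, max_reports=10):
    """Scan a plain or gzipped FASTQ file and print the first problems found."""
    opener = gzip.open if path.endswith(".gz") else open
    odd_chars = Counter()  # characters in sequence lines other than A, C, G, T
    problems = []
    n_lines = 0
    record = []
    try:
        with opener(path, "rt", errors="replace") as fh:
            for n_lines, line in enumerate(fh, start=1):
                record.append(line.rstrip("\r\n"))
                if len(record) < 4:
                    continue
                header, seq, plus, qual = record
                record = []
                odd_chars.update(c for c in seq if c not in "ACGT")
                start = n_lines - 3  # line number of this record's header
                if not header.startswith("@") or not plus.startswith("+"):
                    problems.append(f"line {start}: record does not start with '@'/'+' lines")
                elif len(seq) != len(qual):
                    problems.append(
                        f"line {start}: sequence length {len(seq)} != quality length {len(qual)}"
                    )
    except (EOFError, OSError, zlib.error) as exc:
        # A truncated or corrupt gzip stream fails here.
        problems.append(f"decompression failed after line {n_lines}: {exc}")
    if record:
        problems.append(f"file ends with an incomplete record ({len(record)} of 4 lines)")
    print(f"{path}: {n_lines} lines, non-ACGT characters in sequences: {dict(odd_chars)}")
    for msg in problems[:max_reports]:
        print("  " + msg)
    return problems, odd_chars
```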

You could also convert the FASTQ file to FASTA and try assembling again.
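Flye accepts FASTA as well as FASTQ input, so dropping the quality lines is a cheap experiment. Below is a minimal Python sketch of the conversion, assuming strict 4-line FASTQ records with no wrapped lines or blank lines; the function name `fastq_to_fasta` is hypothetical. Existing tools such as `seqtk seq -a` do the same job.

```python
import gzip


def fastq_to_fasta(fastq_path, fasta_path):
    """Convert 4-line FASTQ records to FASTA; return the number of reads written."""
    opener = gzip.open if fastq_path.endswith(".gz") else open
    n = 0
    with opener(fastq_path, "rt") as src, open(fasta_path, "w") as dst:
        for i, line in enumerate(src):
            line = line.rstrip("\r\n")
            if i % 4 == 0:
                # '@read_id ...' becomes '>read_id ...'
                dst.write(">" + line[1:] + "\n")
            elif i % 4 == 1:
                dst.write(line + "\n")
                n += 1
            # Lines 3 and 4 of each record ('+' and qualities) are skipped.
    return n
```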


colindaven, thank you so much for the quick reply. I tried your suggestions and found that one of the original raw read files was corrupted. Since I have sufficient coverage (more than 43X), I simply removed it from the concatenated file, and the assembler now works fine.
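Checking each input file separately, before concatenating, is a quick way to find the corrupted one. Below is a minimal Python sketch that does roughly what `gzip -t` does, plus a line count that should be a multiple of 4 for plain 4-line FASTQ. The function names `gzip_fastq_ok` and `split_good_bad` are hypothetical, and the sketch requires Python 3.8+.

```python
import gzip
import zlib


def gzip_fastq_ok(path, chunk_size=1 << 20):
    """Fully decompress one .fastq.gz file; return (ok, message)."""
    n_lines = 0
    last = b"\n"
    try:
        with gzip.open(path, "rb") as fh:
            while chunk := fh.read(chunk_size):
                n_lines += chunk.count(b"\n")
                last = chunk[-1:]
    except (EOFError, OSError, zlib.error) as exc:
        # Truncated or otherwise corrupt gzip data ends up here.
        return False, f"decompression error: {exc}"
    if last != b"\n":
        n_lines += 1  # final line without a trailing newline
    if n_lines % 4:
        return False, f"{n_lines} lines, not a multiple of 4"
    return True, f"{n_lines // 4} records"


def split_good_bad(paths):
    """Partition input files into (path, message) lists of good and bad files."""
    good, bad = [], []
    for p in paths:
        ok, msg = gzip_fastq_ok(p)
        (good if ok else bad).append((p, msg))
    return good, bad
```

The files in the good list can then be concatenated as before, for example with `cat`.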
