DiscovarDenovo - fastq files should be interlaced
7.2 years ago
Kenny ▴ 30

Hi all,

I have an Illumina TruSeq long-read FASTQ file, and I would like to run DiscovarDeNovo to assemble the genome.

Here's my code:

DiscovarDeNovo READS=TSSLR-BDA11_LongRead.fastq.gz OUT_DIR=./TSSLR_DISCOVAR

But I got this message:

The file 
TSSLR-BDA11_LongRead.fastq.gz
should be interlaced and hence have an even number of entries.  It does not.

Wondering why?

Kenny

Assembly discovar fastq interlaced • 2.8k views
7.2 years ago
maxwhjohn1988 ▴ 130

Sounds like you have an odd number of reads in your input fastq file. Discovar De Novo is designed to work with paired-end data (with a specific read length), and an interleaved paired-end file always contains an even number of reads. I'm not too familiar with TruSeq, but the "LongRead" in the file name suggests that Discovar De Novo might not be the best tool for this assembly.

Did you start with paired-end reads and trim them in some way? That could have removed some mates, which would leave you with an odd number of reads (it shouldn't happen, but some trimming algorithms do this unless you tell them not to). I often hear that Discovar De Novo works best with completely untrimmed reads: a lot of people have told me to leave in adapter sequences, low-quality bases, and short reads. So if you have trimmed your reads in some way, try an assembly with the raw reads and see whether you get the same error.

Maybe irrelevant if you only have the one read file, but just because I think it's a cool tool - SeqTK is also able to do interleaving of fastq files. https://github.com/lh3/seqtk . I'm not advocating its use instead of the BBMap suite - BBMap is awesome and fantastic and I rate it very highly indeed, just throwing some love to SeqTK ;)


I think when you provide Discovar De Novo with a single read file, it expects that the only reason you would do this is because the file in question is an interleaved read file.


You're right. My fastq file has an odd number of reads :(

zcat TSSLR-BDA11_LongRead.fastq.gz | echo $((`wc -l`/4))
198625
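
For anyone who wants to run the same check on other files, here is a rough sketch that wraps the count above and reports whether it is even. The function names `count_fastq_reads` and `check_interleaved_parity` are made up for illustration, and the sketch assumes plain 4-line FASTQ records (no wrapped sequence or quality lines). An even count is necessary for an interleaved file, but it does not prove the reads are actually paired.

```shell
# Sketch only: count FASTQ records in a gzipped file.
# Assumes every record is exactly 4 lines (header, sequence, '+', quality).
count_fastq_reads() {
    lines=$(gzip -dc "$1" | wc -l)
    echo $(( lines / 4 ))
}

# Report whether the record count is compatible with interleaved pairs.
# An even count is required for interleaving but does not guarantee pairing.
check_interleaved_parity() {
    n=$(count_fastq_reads "$1")
    if [ $(( n % 2 )) -eq 0 ]; then
        echo "$n reads: even, could be interleaved pairs"
    else
        echo "$n reads: odd, cannot be complete interleaved pairs"
    fi
}
```

Usage would be `check_interleaved_parity TSSLR-BDA11_LongRead.fastq.gz`; with the 198625 reads counted above, it takes the odd branch.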
7.2 years ago

You need to interleave your R1 and R2 files.

Try bbmap for this.

 reformat.sh | grep inter
 Description:  Reformats reads to change ASCII quality encoding, interleaving, file format, or compression format.
 **If input is paired and there is only one output file, it will be written interleaved.**
 int=f                   (interleaved) Determines whether INPUT file is considered interleaved.
 verifyinterleaved=f     (vint) sets 'vpair' to true and 'interleaved' to true.
 addslash=f              Append ' /1' and ' /2' to read names, if not already present.  Please include the flag 'int=t' if the reads are interleaved.
 addcolon=f              Append ' 1:' and ' 2:' to read names, if not already present.  Please include the flag 'int=t' if the reads are interleaved.
 ihist=<file>            Insert size histograms.  Requires paired reads interleaved in sam file.
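
To make clear what "interleaved" means here: the output alternates one record from R1 with its mate from R2. The sketch below does this with plain awk, as an illustration rather than a replacement for BBMap or SeqTK. The function name `interleave_fastq` and the file names in the usage note are hypothetical, and it assumes uncompressed, 4-line FASTQ records with both files in the same read order. It does not check that read names match.

```shell
# Sketch only: interleave two uncompressed FASTQ files record by record.
# $1 = R1 file, $2 = R2 file; both must list mates in the same order.
interleave_fastq() {
    awk -v r2="$2" '{
        print                          # pass through the current R1 line
        if (NR % 4 == 0) {             # an R1 record just ended
            for (i = 0; i < 4; i++) {  # emit the next 4 lines (one record) from R2
                if ((getline line < r2) > 0) print line
            }
        }
    }' "$1"
}
```

For example, `interleave_fastq R1.fastq R2.fastq > interleaved.fastq` (hypothetical file names). If R2 is shorter than R1, the trailing R1 records are written without mates, so it is worth comparing the read counts of the two inputs first.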