Question: Optimal minimum read length of PE HiSeq reads for de novo assembly
4.2 years ago, Lesley Sitter (Netherlands) wrote:

Hi everyone,

I have a paired-end HiSeq dataset, obtained from a sort of RADseq experiment, which I want to use for a de novo assembly. I have trimmed the reads using TrimGalore, but this left me with a bunch of reads only 20 bp long (the default minimum read length in TrimGalore).

I don't know whether these reads are too short to be useful downstream. Such short reads lower the estimated optimal k-mer size and will probably reduce the quality of the assembly. I was wondering if there is some method for deciding on the optimal minimum read length.

Thanks in advance,

With kind regards,
Lesley

4.2 years ago, dylan.storey (United States) wrote:

If most of your reads are that short, your assembly is going to be bad, no question about it. Since this is RADseq data, though, I don't think you're overly interested in getting long contigs. I would suggest running your assembler over a range of k-mer settings to find the configuration that works best. Something like this should get you started (I'm assuming you're working on a Linux/Unix distribution with a bash-like shell; My_assembler and its options are placeholders for your actual assembler):

 

for k in {10..100}; do ./My_assembler --kmer_setting "$k" --output_directory_setting "$k/"; done

 

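Changing the integers in the curly brackets ({low..high}, with two dots) lets you adjust the range. In bash 4 and later, {low..high..step} also sets a step size, which is handy because many de Bruijn graph assemblers expect odd k-mer sizes.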

Then you simply need to decide how you want to assess the quality of your outputs (contig N50, total assembled length, and so on) and pick the best one.
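As one possible way to compare the runs, here is a minimal sketch that computes the contig N50 of a FASTA file with awk and sort. The function name n50 and the contigs.fa file name in the usage comment are assumptions for illustration, not something your assembler necessarily produces, and N50 on its own is only a rough quality indicator.

```shell
#!/usr/bin/env bash
# n50 FASTA_FILE
# Prints the N50 contig length: the length L such that contigs of
# length >= L together cover at least half of the total assembly.
n50() {
    awk '
        /^>/ { if (len) print len; len = 0; next }  # header: flush previous contig
             { len += length($0) }                  # sequence line: accumulate length
        END  { if (len) print len }                 # flush the last contig
    ' "$1" | sort -rn | awk '
        { lens[NR] = $1; total += $1 }
        END {
            for (i = 1; i <= NR; i++) {
                running += lens[i]
                if (running * 2 >= total) { print lens[i]; exit }
            }
        }
    '
}

# Hypothetical usage, assuming each k-mer run wrote its contigs to <k>/contigs.fa:
# for dir in */; do printf '%s\t%s\n' "${dir%/}" "$(n50 "${dir}contigs.fa")"; done
```

Sorting the resulting table by the second column gives a quick ranking of the k-mer settings, which you can then cross-check against other criteria such as the number of contigs.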


Thanks!

I hoped that there was some sort of tool for it, but this works just as well! 

Muchos Gracias :D

Reply written 4.1 years ago by Lesley Sitter
4.2 years ago, Brian Bushnell (Walnut Creek, USA) wrote:

Reads shorter than the k-mer length used in assembly will be ignored, at least for contig-building. They could still be used for scaffolding. If you are going to do scaffolding, I would throw away reads shorter than about 35 bp to reduce spurious joins in low-complexity or repeat areas.
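A length cutoff like this can be applied to already-trimmed paired files with a few lines of bash and awk. The sketch below is only an illustration: the function name filter_pairs and the file names in the usage comment are made up, and it assumes uncompressed FASTQ with exactly four lines per record, no tab characters in the headers, and both files listing the pairs in the same order. A pair is kept only if both mates reach the minimum length, so the two outputs stay in sync. If you re-run the trimming anyway, raising Trim Galore's --length option is probably the simpler route.

```shell
#!/usr/bin/env bash
# filter_pairs MIN_LEN R1_IN R2_IN R1_OUT R2_OUT
# Keeps a read pair only if BOTH mates are at least MIN_LEN bases long.
filter_pairs() {
    local min=$1 r1=$2 r2=$3 out1=$4 out2=$5
    : > "$out1"; : > "$out2"    # create/empty the outputs even if nothing passes
    # Join each 4-line record into one tab-separated line, then put the
    # matching R1 and R2 records side by side (fields 1-4 and 5-8).
    paste <(paste - - - - < "$r1") <(paste - - - - < "$r2") |
    awk -F'\t' -v min="$min" -v o1="$out1" -v o2="$out2" '
        length($2) >= min && length($6) >= min {
            printf "%s\n%s\n%s\n%s\n", $1, $2, $3, $4 > o1
            printf "%s\n%s\n%s\n%s\n", $5, $6, $7, $8 > o2
        }
    '
}

# Hypothetical usage with the ~35 bp cutoff suggested above:
# filter_pairs 35 reads_1.fq reads_2.fq reads_1.min35.fq reads_2.min35.fq
```

Running the assembly on a couple of different cutoffs and comparing the results is a cheap way to check whether the threshold matters for your data.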
