Question

Optimal minimum read length for PE HiSeq reads in de novo assembly

0

10.2 years ago

Lesley Sitter ▴ 610

Hi everyone,

I have a paired-end HiSeq dataset from a RADseq-like experiment that I want to use for a de novo assembly. I trimmed the reads with TrimGalore, but this left me with a number of reads that are only 20 bp long (the default minimum read length in TrimGalore).

I don't know whether these reads are too short to use downstream. Such short reads lower the optimal k-mer estimates and will probably reduce the quality of the assembly. Is there a method for deciding the optimal minimum read length?

Thanks in advance

With kind regards,
Lesley

denovo-assembly PE-reads HiSeq read-length • 3.0k views

ADD COMMENT • link updated 2.4 years ago by Ram 45k • written 10.2 years ago by Lesley Sitter ▴ 610

Ram · Answer 1 · 2015-05-01

1

10.2 years ago

dylan.storey ▴ 60

If most of your reads are that short, your assembly is going to be bad, no question about it. Since this is RADseq data, though, I don't think you're overly interested in getting long contigs. I would suggest running your assembler with multiple parameter settings to find the configuration that works best. Something like this should get you started (I'm assuming you're working on a Linux/Unix distribution with a bash-like shell):

for k in {10..100}; do ./My_assembler --kmer_setting "$k" --output_directory_setting "$k/"; done

Changing the integers in the curly brackets ({low..high}, with two dots) will allow you to adjust the range.

Then you simply need to figure out how you want to assess your outputs for quality and pick the best one.
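One common way to compare the runs is contig N50, although it is only one possible metric and a higher value is not automatically better for RADseq loci, which are short by design. The sketch below assumes each run wrote its contigs as a FASTA file; the function names `contig_lengths` and `n50` and the file name `contigs.fa` are made up for illustration and are not part of any assembler.

```shell
# Print one line per contig: the sequence length of each FASTA record.
contig_lengths() {
  awk '/^>/ { if (seen) print len; seen = 1; len = 0; next }
       { gsub(/[[:space:]]/, ""); len += length($0) }
       END { if (seen) print len }' "$1"
}

# Print the N50: the length of the contig at which the running sum,
# taken from the longest contig downwards, reaches half the total length.
n50() {
  contig_lengths "$1" | sort -rn | awk '
    { lens[NR] = $1; total += $1 }
    END {
      for (i = 1; i <= NR; i++) {
        running += lens[i]
        if (running * 2 >= total) { print lens[i]; exit }
      }
    }'
}

# Hypothetical usage, assuming every run left its contigs in <k>/contigs.fa:
#   for k in {10..100}; do echo "$k $(n50 "$k/contigs.fa")"; done
```

It is probably worth looking at the number of contigs and the total assembled length next to the N50, since a single number can hide a fragmented or collapsed assembly.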

ADD COMMENT • link updated 2.4 years ago by Ram 45k • written 10.2 years ago by dylan.storey ▴ 60

0

Thanks!

I hoped that there was some sort of tool for it, but this works just as well!

Muchos Gracias :D

ADD REPLY • link updated 2.4 years ago by Ram 45k • written 10.2 years ago by Lesley Sitter ▴ 610

Ram · Answer 2 · 2015-05-01

0

10.2 years ago

Brian Bushnell 20k

Reads shorter than the k-mer length used in assembly will be ignored, at least for contig-building. They could still be used for scaffolding. I would throw away reads shorter than about 35 bp if you are going to do scaffolding, to reduce spurious joins in low-complexity or repeat areas.
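A length cutoff like this can be applied to already-trimmed files with a few lines of awk. The sketch below is only an illustration: the function name `filter_pairs` and all file names are made up, it assumes plain four-line FASTQ records with both files in the same read order, and it drops the whole pair when either mate is too short (an assumption made here to keep the two files in sync, not something stated above).

```shell
# filter_pairs MIN R1 R2 OUT1 OUT2
# Keep only pairs in which both mates are at least MIN bases long.
filter_pairs() {
  : > "$4"; : > "$5"    # make sure both outputs exist even if nothing passes
  awk -v min="$1" -v r2="$3" -v o1="$4" -v o2="$5" '
    {
      slot = (NR - 1) % 4                          # 0 header, 1 sequence, 2 plus, 3 quality
      a[slot] = $0                                 # line from the R1 record
      if ((getline line < r2) > 0) b[slot] = line  # matching line from R2
      if (slot == 3 && length(a[1]) >= min && length(b[1]) >= min) {
        print a[0] "\n" a[1] "\n" a[2] "\n" a[3] > o1
        print b[0] "\n" b[1] "\n" b[2] "\n" b[3] > o2
      }
    }' "$2"
}

# Hypothetical usage:
#   filter_pairs 35 trimmed_R1.fq trimmed_R2.fq filtered_R1.fq filtered_R2.fq
```

In practice it is usually simpler to set the minimum length during trimming itself; TrimGalore exposes it through its `--length` option, which is the 20 bp default mentioned in the question.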

ADD COMMENT • link updated 2.4 years ago by Ram 45k • written 10.2 years ago by Brian Bushnell 20k