7.1 years ago
Lesley Sitter

Hi everyone,

I have a paired-end HiSeq dataset from a RADseq-like experiment that I want to use for a de novo assembly. I have trimmed the reads with TrimGalore, but this left me with a number of reads only 20 bp long (TrimGalore's default minimum read length).

I don't know whether reads this short are usable downstream. They pull down the estimated optimal k-mer size and will probably reduce the quality of the assembly. Is there a method for deciding on the optimal minimum read length?

With kind regards,
Lesley

7.1 years ago
dylan.storey

If most of your reads are that short, your assembly is going to be bad, no question about it. Since it is RADseq data, though, I don't think you're overly interested in getting long contigs. I would suggest running your assembler over a range of parameters to find the configuration that works best. Something like this should get you started (I'm assuming you're working on a Linux/Unix system with a bash-like shell):

for k in {10..100}; do ./My_assembler --kmer_setting "$k" --output_directory_setting "$k/"; done

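Changing the integers in the curly brackets ({low..high}) adjusts the range. `My_assembler` and its flags are placeholders for your assembler's real executable and options. Many de Bruijn graph assemblers accept only odd values of k; in that case {11..99..2} (bash 4 or later) steps through odd values only.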
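Before settling on a range, it can help to check how many trimmed reads can contribute to the graph at all: a read of length L yields L - k + 1 k-mers, so a read shorter than k yields none. A minimal Python sketch, assuming you have already pulled the read lengths out of your FASTQ files (the example lengths below are made up for illustration):

```python
from __future__ import annotations


def kmers_per_read(read_len: int, k: int) -> int:
    """Number of k-mers a read of length read_len contributes (0 if shorter than k)."""
    return max(0, read_len - k + 1)


def usable_fraction(read_lengths: list[int], k: int) -> float:
    """Fraction of reads long enough to contribute at least one k-mer."""
    if not read_lengths:
        return 0.0
    usable = sum(1 for n in read_lengths if kmers_per_read(n, k) > 0)
    return usable / len(read_lengths)


if __name__ == "__main__":
    # Hypothetical post-trimming read lengths; replace with your own data.
    lengths = [20, 20, 35, 60, 90, 100, 100, 100]
    for k in range(21, 100, 10):
        print(k, round(usable_fraction(lengths, k), 3))
```

If the usable fraction collapses above some k, there is little point sweeping beyond it.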

Then you simply need to figure out how you want to assess your outputs for quality and pick the best one.
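One common, if crude, way to compare the runs is contig N50. A minimal sketch, assuming each run writes its contigs to a FASTA file (the file name in the usage note is hypothetical; check what your assembler actually produces). For RAD loci, where long contigs are not the goal, the number of contigs recovered may matter as much as N50.

```python
from __future__ import annotations


def n50(contig_lengths: list[int]) -> int:
    """Length L such that contigs of length >= L cover at least half the assembly."""
    if not contig_lengths:
        return 0
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # not reached for non-empty input


def read_fasta_lengths(path: str) -> list[int]:
    """Collect sequence lengths from a (possibly multi-line) FASTA file."""
    lengths: list[int] = []
    current = 0
    in_record = False
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if in_record:
                    lengths.append(current)
                current = 0
                in_record = True
            elif line:
                current += len(line)
    if in_record:
        lengths.append(current)
    return lengths
```

Usage, assuming the loop above wrote each run to a directory named after k: `n50(read_fasta_lengths(f"{k}/contigs.fa"))` for each k, then compare alongside `len(...)` of the same list.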


Thanks!

I hoped that there was some sort of tool for it, but this works just as well!

Many thanks :D

7.1 years ago

Reads shorter than the k-mer length used in assembly will be ignored, at least for contig building. They could still be used for scaffolding. I would throw away reads shorter than about 35 bp if you are going to do scaffolding, to reduce spurious joins in low-complexity or repetitive regions.
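Re-running TrimGalore with a larger `--length` cutoff is the simplest way to apply such a threshold. If you would rather filter the already-trimmed files, here is a minimal Python sketch, assuming plain four-line FASTQ records with mates in the same order in both files. It drops a pair when either mate is below the cutoff, so the two output files stay in sync; keeping orphaned mates as single-end reads would be an alternative design.

```python
from __future__ import annotations

import io
from typing import Iterator, TextIO, Tuple

FastqRecord = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield (header, sequence, plus, quality) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def filter_pairs(r1: TextIO, r2: TextIO, out1: TextIO, out2: TextIO,
                 min_len: int = 35) -> Tuple[int, int]:
    """Keep a pair only if BOTH mates are >= min_len; return (kept, dropped)."""
    kept = dropped = 0
    for rec1, rec2 in zip(read_fastq(r1), read_fastq(r2)):
        if len(rec1[1]) >= min_len and len(rec2[1]) >= min_len:
            out1.write("\n".join(rec1) + "\n")
            out2.write("\n".join(rec2) + "\n")
            kept += 1
        else:
            dropped += 1
    return kept, dropped


if __name__ == "__main__":
    # Tiny in-memory demo; use open(...) on your real R1/R2 files instead.
    r1 = io.StringIO("@a/1\nACGTACGTAC\n+\nIIIIIIIIII\n@b/1\nACG\n+\nIII\n")
    r2 = io.StringIO("@a/2\nTTTTTTTTTT\n+\nIIIIIIIIII\n@b/2\nGGGGGGGGGG\n+\nIIIIIIIIII\n")
    o1, o2 = io.StringIO(), io.StringIO()
    print(filter_pairs(r1, r2, o1, o2, min_len=5))
```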