De novo assembly
6.2 years ago

I am interested in de novo assembly of Illumina reads belonging to an insect with a genome size of about 300 Mbp. Can anyone recommend an assembler that I should use? Is there a manual?

assembly sequencing • 2.5k views

Best de novo assembler for insect genome?

Minia is supposed to be a good choice if you have access to limited compute resources.


Please provide some more details on your data: coverage, whether the reads are actually genomic, paired-end insert size, etc.


Hi,

When you want to perform an assembly, a few parameters should be considered: 1) library type (SE/PE/mate-pair), 2) insert size, 3) read length, 4) sequencing platform, and 5) quality of your raw reads.

If you have low-coverage data, choose an assembler that works well with low coverage.

You can use k-mer-based de Bruijn graph assemblers such as Velvet or SOAPdenovo, which are popular and robust and tend to give good N50 statistics.

All are command-line tools and simple to use.
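Since N50 comes up whenever assemblers are compared, here is a minimal sketch of how it is usually computed: sort the contig lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name `n50` and the toy contig lengths are only for illustration and do not come from any particular assembler.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("no contig lengths given")
    ordered = sorted(lengths, reverse=True)  # longest contig first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to >= 50% defines N50.
        if running >= half_total:
            return length


# Toy example (made-up lengths): total = 290, half = 145.
contigs = [100, 80, 50, 30, 20, 10]
print(n50(contigs))  # prints 80, because 100 + 80 = 180 >= 145
```

Keep in mind that N50 only measures contiguity; a misassembled genome can still have a high N50, so it is worth checking gene content as well.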


Thank you very much. I have not received the data yet, so I don't know the error rate. The reads are 150 bp PE at 100X coverage, from a non-human sample sequenced on an Illumina HiSeq platform.
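For a rough idea of how much data these numbers imply, the arithmetic can be sketched as below. It assumes the 300 Mbp genome size estimate from the question is accurate and that coverage is simply total sequenced bases divided by genome size; the function name `reads_needed` is made up for this example.

```python
def reads_needed(genome_size_bp, coverage, read_length_bp):
    """Return (total_bases, reads, read_pairs) needed for a target coverage."""
    total_bases = genome_size_bp * coverage
    # Ceiling division so a partial read still counts as one read.
    reads = -(-total_bases // read_length_bp)
    pairs = -(-reads // 2)
    return total_bases, reads, pairs


# Values from this thread: ~300 Mbp genome, 100X coverage, 150 bp reads.
total_bases, reads, pairs = reads_needed(300_000_000, 100, 150)
print(f"{total_bases / 1e9:.0f} Gbp, {reads / 1e6:.0f} M reads, {pairs / 1e6:.0f} M pairs")
# prints: 30 Gbp, 200 M reads, 100 M pairs
```

So 100X on this genome is on the order of 30 Gbp of sequence, which is worth knowing before choosing an assembler, since memory use tends to grow with both genome size and data volume.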

6.2 years ago
flo24 ▴ 20

You usually need to run a few different assemblers and see what works best with your data. If you have 2x150 bp reads from a single PCR-free library based on gel-free fragment selection, you could try DISCOVAR de novo (DDN). Although the DDN authors recommend 250-base reads, reads as short as 150 bases may work. Other options include SPAdes, SGA, ABySS 2, Meraculous2, and MaSuRCA.

In my experience you can get medium-sized insect genome assemblies with good gene content and contiguity by correcting reads with BFC, assembling contigs with SPAdes (turning off its error correction module, BayesHammer), scaffolding with SGA, and fixing errors with Pilon. If your insect genome is actually much larger than 300 Mbp, using SPAdes is probably not a good idea. Platanus is another option, especially if the genome is highly heterozygous, although in my experience you get very poor results with a single paired-end library; you would need reads from at least one mate-pair library. ALLPATHS-LG is another alternative if the paired-end reads overlap and you have at least one mate-pair library. If you are able to sequence long reads, you could also try a hybrid assembly with SPAdes or other assemblers.
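The BFC, SPAdes and Pilon steps described above could be wired together roughly as in the sketch below, which only builds the command lines as strings and does not run anything. Everything here is an assumption for illustration, not the exact commands used for the assemblies mentioned: the file names, thread count, `pilon.jar` location, the use of a single interleaved FASTQ, and the read-to-assembly BAM (which would need to be produced, sorted and indexed separately). The SGA scaffolding step is left out because it involves several sub-commands. Please check each tool's manual before relying on these flags.

```python
import shlex


def build_commands(interleaved_fastq, genome_size="300m", threads=16,
                   outdir="spades_out", assembly_for_pilon=None,
                   bam="reads_vs_assembly.bam"):
    """Build (but do not run) example command lines for the three steps."""
    corrected = "corrected.fq.gz"  # hypothetical output file name

    # 1) Error-correct reads with BFC; -s is the approximate genome size.
    bfc = (f"bfc -s {genome_size} -t {threads} "
           f"{shlex.quote(interleaved_fastq)} | gzip -1 > {corrected}")

    # 2) Assemble with SPAdes; --only-assembler skips its own read
    #    error correction (BayesHammer), since BFC already did that.
    spades = shlex.join([
        "spades.py", "--only-assembler", "--12", corrected,
        "-t", str(threads), "-o", outdir,
    ])

    # 3) Polish with Pilon. In the workflow above this would be the
    #    SGA-scaffolded assembly; here it defaults to the SPAdes contigs.
    assembly = assembly_for_pilon or f"{outdir}/contigs.fasta"
    pilon = shlex.join([
        "java", "-jar", "pilon.jar", "--genome", assembly,
        "--frags", bam, "--output", "polished",
    ])
    return {"bfc": bfc, "spades": spades, "pilon": pilon}


for step, command in build_commands("reads_interleaved.fq.gz").items():
    print(f"{step}: {command}")
```

Printing the commands first and running them by hand makes it easier to swap one assembler for another when comparing results.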

6.2 years ago
brs1111 ▴ 10

SPAdes (http://bioinf.spbau.ru/spades) might be a better option.
