Question: Best strategy for de novo assembly with Illumina reads?
Picasa wrote:

Hello,

I have paired-end and mate-pair reads from Illumina. My expected genome size is about 1 Gb. I'm not a bioinformatician and I'm trying to figure out how to assemble my data.

1) Can you recommend any assembly software?

2) For paired-end data, is it worth merging the reads with third-party software, or is that done by the assembler?

Written 3.1 years ago by Picasa • modified 3.0 years ago by Shyam

I used SPAdes and have had good results with it.

Reply written 3.0 years ago by midox

SPAdes is great, but it is designed for bacterial assemblies, which is probably not what you have given the genome size.

Reply written 3.0 years ago by igor

I am trying to use SPAdes with a bacterial genome, and I also have paired-end and mate-pair reads. I cannot find out how to adapt the mate-pair reads, since SPAdes only accepts "high quality reads" (I have the same problem with IonTorrent mate-pairs). How are you doing this? Thanks.

Reply written 3.0 years ago by alexandra.auz
igor wrote:

There was a large project, Assemblathon, that published a thorough evaluation of different assemblers: http://gigascience.biomedcentral.com/articles/10.1186/2047-217X-2-10

They used three different species with 1.0-1.6 Gb genomes, so it's especially relevant in your case.

Written 3.1 years ago by igor

Thanks for the paper.

Reply written 3.1 years ago by Picasa
Buffo wrote:

Trinity or IDBA-UD work really well for Illumina reads.

Written 3.1 years ago by Buffo
onspotproductions wrote:

I have used the BBMap toolset to get some initial information about the paired-end data, and then used those parameters to run the data through Trinity to produce the final assembly. I would also recommend running the result through TransDecoder to find ORFs.

Written 3.1 years ago by onspotproductions
Shyam wrote:

You can use the ABySS assembler with both mate-pair and paired-end reads. The recent wheat genome survey sequences were assembled with it. You can also use SOAPdenovo; I have seen 4 Gb genomes assembled with it. You need to try assemblies at several k-mer sizes to find the sweet spot for your data.
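One common way to compare assemblies built at different k-mer sizes is contig N50. Here is a minimal Python sketch of that comparison. It assumes you have already collected the contig lengths for each k; the function names and the example lengths are made up for illustration, and N50 is only one of several metrics you would normally look at.

```python
from typing import Dict, Iterable, List


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly


def best_k(assemblies: Dict[int, List[int]]) -> int:
    """Pick the k whose assembly has the highest N50 (ties go to the smaller k)."""
    return max(sorted(assemblies), key=lambda k: n50(assemblies[k]))


# Hypothetical contig lengths per k-mer size, not real results.
example_assemblies = {
    31: [500, 400, 300, 200, 100],
    51: [900, 300, 200, 100],
    71: [600, 500, 200, 100, 100],
}
chosen_k = best_k(example_assemblies)
```

In practice you would read the lengths from each assembler's contig FASTA and also check total assembly size and contig count before settling on a k.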

2) For paired-end data, is it worth merging the reads with third-party software, or is that done by the assembler?

I assume you mean merging the forward and reverse reads by their overlap. The two programs I mentioned take two separate files for the forward and reverse reads, so you don't need to merge them.
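For anyone unsure what "merging by overlap" means, here is a rough Python sketch of the idea. It is a toy under stated assumptions: it only accepts exact overlaps, ignores base qualities, and the function names and `min_overlap` default are my own choices. Real read mergers tolerate mismatches and use quality scores.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(forward: str, reverse: str, min_overlap: int = 10) -> Optional[str]:
    """Merge a read pair on the longest exact overlap between the end of the
    forward read and the start of the reverse-complemented reverse read.
    Returns None if no overlap of at least min_overlap bases is found."""
    rev_rc = reverse_complement(reverse)
    max_len = min(len(forward), len(rev_rc))
    # Try the longest candidate overlap first, never going below 1 base.
    for overlap in range(max_len, max(min_overlap, 1) - 1, -1):
        if forward[-overlap:] == rev_rc[:overlap]:
            return forward + rev_rc[overlap:]
    return None


# Hypothetical 25 bp fragment sequenced with 18 bp reads from each end.
fragment = "ACGTTGCAAGGCTTAACCGGTTAGC"
read_1 = fragment[:18]
read_2 = reverse_complement(fragment[-18:])
merged = merge_pair(read_1, read_2)
```

Merging only works when the fragment is shorter than the two read lengths combined, so it does not apply to mate-pair libraries with large inserts.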

Written 3.0 years ago by Shyam