Question: De Novo Assembly/Scaffolding Pipeline(s) for a Eukaryote Genome: Single-End Illumina Reads
8.5 years ago by
Danville, PA
Rm7.8k wrote:

I have single-end (SE) Illumina reads, about 76 bp long, at probably around 30X coverage, for a Drosophila-related species. After quality-filtering the reads, I tried de novo assembly with the ABySS and IDBA assemblers and got contigs with an N50 of around 800 to 1200 bp, depending on the k-mer value used (25-60). Is that terrible? I am having difficulty building super-contigs (scaffolds) because the reads are single-end, so I have no paired-end information. Please suggest possible de novo assembly pipeline(s) for building scaffolds from the SE data I have.

Treat this as a follow-up to the previous question asked on Biostar.

modified 8.4 years ago by Senthilkumar90 • written 8.5 years ago by Rm7.8k
8.5 years ago by
Bach550 wrote:

Without paired-end data you are, sorry to say, out of luck when it comes to reliable scaffolding. You simply cannot build "scaffolds" if your data is not paired-end.

What you can do is place the contigs you have on the genome of some closely related species via simple MUMmer searches (or BLAST, MegaBLAST, or even read aligners if that seems appropriate to you). Be very aware that this placement cannot be trusted to reflect the organism you sequenced: you will not be able to say with confidence that the contig order you get out of it is the order in your organism.
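As a rough illustration of the placement idea above, here is a minimal sketch that turns per-contig best hits into a tentative order along the reference. The `(contig_id, ref_name, ref_start)` tuples and the function name `order_contigs` are hypothetical: they stand in for whatever you extract from your MUMmer or BLAST output, and no real output format is parsed here. The resulting order reflects the reference, not necessarily your organism.

```python
from collections import defaultdict


def order_contigs(hits):
    """Group contigs by reference sequence and sort them by alignment start.

    hits: iterable of (contig_id, ref_name, ref_start) tuples, assumed to
    hold one best hit per contig (a hypothetical, pre-parsed input).
    """
    by_ref = defaultdict(list)
    for contig_id, ref_name, ref_start in hits:
        by_ref[ref_name].append((ref_start, contig_id))
    # Sort each reference's contigs by where their best hit starts.
    return {ref: [cid for _, cid in sorted(placed)]
            for ref, placed in by_ref.items()}


# Toy hits, invented for illustration only.
example_hits = [
    ("contig3", "chr2L", 5000),
    ("contig1", "chr2L", 100),
    ("contig2", "chrX", 40),
]
tentative_order = order_contigs(example_hits)
```

Contigs with no hit, or with several equally good hits, are simply not handled by this sketch and would need a decision of their own.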

And regarding an N50 of ~1 kb: it is terrible. One can use this kind of data to go on a gene-fishing expedition in prokaryotes, or perhaps even for targeted sequence analysis in higher eukaryotes, but for everything else I think it's just junk (sorry to be so blunt).
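For readers comparing assemblies across k-mer values, a minimal sketch of how N50 is usually computed from a list of contig lengths: sort the contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the assembly size. The function name and the toy lengths are made up for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0


# Toy contig lengths, invented for illustration only.
example_lengths = [2000, 1500, 1000, 800, 700]
example_n50 = n50(example_lengths)  # total is 6000; 2000 + 1500 passes 3000
```

Note that N50 depends on which contigs are counted, so a minimum-length cutoff applied by one assembler and not another can make the numbers hard to compare.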


Thanks for your suggestions, @Bach and @ketil. The SE reads were originally generated with the intention of mapping them onto D. melanogaster as a reference. Now we are trying to see whether we can get a de novo assembly out of them, but it looks like a very challenging task without PE/mate pairs.

Reply written 8.5 years ago by Rm7.8k
8.5 years ago by
Ketil3.9k wrote:

Of the de Bruijn assemblers, I got the best results with the commercial CLC, way better than SOAP or ABySS. De Bruijn assembly is very sensitive to filtering, so make sure you aggressively remove low-quality reads. You might also want to try Celera, which is more difficult to get running but tends to give the best results in many cases.
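To make "aggressively remove low-quality reads" concrete, here is a minimal sketch of a mean-quality filter over 4-line FASTQ records. Everything specific in it is an assumption: the Phred+33 offset (older Illumina pipelines used Phred+64, so check your data), the threshold of 20, and the function names. A dedicated trimming tool would normally do this job; the sketch only shows the idea.

```python
def mean_quality(quality_line, offset=33):
    """Mean Phred score of one FASTQ quality string (offset is an assumption)."""
    scores = [ord(ch) - offset for ch in quality_line]
    return sum(scores) / len(scores) if scores else 0.0


def filter_fastq(lines, min_mean_q=20, offset=33):
    """Yield 4-line FASTQ records whose mean quality is at least min_mean_q."""
    records = [line.rstrip("\n") for line in lines]
    # Step through complete 4-line records; a trailing partial record is ignored.
    for i in range(0, len(records) - 3, 4):
        header, seq, plus, qual = records[i:i + 4]
        if mean_quality(qual, offset) >= min_mean_q:
            yield header, seq, plus, qual


# Toy reads, invented for illustration: 'I' is Q40 and '#' is Q2 under Phred+33.
example_reads = ["@r1", "ACGT", "+", "IIII",
                 "@r2", "ACGT", "+", "####"]
kept = list(filter_fastq(example_reads))
```

The same functions accept an open file handle in place of the list, since they only iterate over lines.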

8.5 years ago by
Oslo, Norway
lexnederbragt1.2k wrote:

Since you have a reference genome (it seems), you could try MAIA:

Using this tool you could (in principle; I have no personal experience) run different assembly programs and 'merge' their assemblies. MAIA uses the reference to find start and end points in the merged 'contig graphs'.

8.5 years ago by
Queen Mary University London
Yannick Wurm2.3k wrote:

I've had very good experience with SOAPdenovo for a eukaryotic genome (fire ant, with a ~500 Mb genome). It performed light-years better than ABySS or Velvet. Obviously we had paired data, but I think you should give it a shot (it's free). Have you removed duplicate reads?
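On the duplicate-read question, a minimal sketch of exact-duplicate removal on read sequences, keeping the first copy of each. The function name is made up, and the approach is deliberately naive: it ignores reverse complements and quality values, and with single-end data identical sequences can also come from independent fragments, so this may discard some genuine coverage.

```python
def drop_exact_duplicates(sequences):
    """Keep the first occurrence of each read sequence, preserving input order."""
    seen = set()
    unique = []
    for seq in sequences:
        if seq not in seen:
            seen.add(seq)
            unique.append(seq)
    return unique


# Toy reads, invented for illustration only.
example_seqs = ["ACGT", "TTGA", "ACGT", "GGCC", "TTGA"]
deduplicated = drop_exact_duplicates(example_seqs)
```

For a full read set the `seen` set holds every distinct sequence in memory, which is worth keeping in mind at 30X coverage of a Drosophila-sized genome.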

Also, you could try reducing the complexity of your assembly by separating your dataset into subsets (maybe chromosomes or smaller regions that are syntenic between closely related Drosophilae). But given how cheap sequencing would be... it's probably not worth your time.

7.3 years ago by
Senthilkumar90 wrote:

Hi guys, I'm new to this forum.

Could anyone help me with the procedure for single-end read assembly (SOAP, de Bruijn) and point me to publications on the same?
