De Novo Assembly/Scaffolding Pipeline(s) For A Eukaryote Genome: Single-End Illumina Reads
4
11.4 years ago
Rm 8.1k

I have single-end (SE) Illumina reads, about 76 bp long, at probably around 30X coverage, for a Drosophila-related species. After quality-filtering the reads, I tried de novo assembly with the ABySS and IDBA assemblers and got contigs with an N50 of around 800 to 1,200 bp, depending on the k-mer value used (25-60). Is that terrible? I am having difficulty building super-contigs (scaffolds) because the reads are single-end, so I have no paired-end information. Please suggest possible de novo assembly pipeline(s) for building scaffolds with the SE data in hand.
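For context on the k range above, here is a rough sketch of how the expected k-mer coverage drops as k grows. It assumes error-free reads of uniform length, and the function name `kmer_coverage` is made up for illustration; the inputs (30X, 76 bp, k between 25 and 60) are the figures quoted in the question.

```python
def kmer_coverage(base_coverage: float, read_length: int, k: int) -> float:
    """Expected k-mer coverage, assuming error-free reads of one fixed length.

    A read of length L contains L - k + 1 k-mers, so base coverage C
    translates to a k-mer coverage of C * (L - k + 1) / L.
    """
    if not 0 < k <= read_length:
        raise ValueError("k must be between 1 and the read length")
    kmers_per_read = read_length - k + 1
    return base_coverage * kmers_per_read / read_length


if __name__ == "__main__":
    # Figures from the question: ~30X coverage, 76 bp reads, k in 25-60.
    for k in (25, 41, 60):
        print(k, round(kmer_coverage(30, 76, k), 1))
```

Under these assumptions, 30X base coverage corresponds to roughly 20.5X k-mer coverage at k=25 but only about 6.7X at k=60, so the upper end of the k range leaves little depth for a de Bruijn assembler to work with.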

Treat this as a follow-up to the previous question asked on Biostar.

scaffolding read assembly next-gen sequencing • 6.3k views
0

Hi all, I'm new to this forum.

Could anyone help me with the procedure for single-end read assembly (SOAPdenovo, a de Bruijn graph assembler) and point me to publications related to it?

2
11.4 years ago
Bach ▴ 550

Without paired-end data you are, sorry to say, out of luck when it comes to reliable scaffolding. You simply cannot build "scaffolds" if your data is not paired-end.

What you can do is place the contigs you have against the genome of a closely related species via simple MUMmer searches (or BLAST or MegaBLAST, or even read aligners if that seems appropriate to you). Be very aware that this placement will not necessarily reflect the organism you sequenced: you will not be able to say with confidence that the contig order you get out of it is the order in your organism.
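A minimal sketch of that placement step, assuming the alignment hits have already been parsed out of whatever tool was used. The `Hit` record and `order_contigs` function are hypothetical names for illustration, not part of MUMmer or BLAST, and keeping only the longest hit per contig is just one simple way to resolve multiple hits.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    contig: str       # contig name from the assembly
    ref_seq: str      # reference sequence (e.g. chromosome arm) it aligned to
    ref_start: int    # start of the alignment on the reference
    aligned_len: int  # alignment length, used to pick the best hit


def order_contigs(hits):
    """Keep the longest hit per contig, then sort contigs along each reference sequence."""
    best = {}
    for hit in hits:
        current = best.get(hit.contig)
        if current is None or hit.aligned_len > current.aligned_len:
            best[hit.contig] = hit

    layout = defaultdict(list)
    for hit in sorted(best.values(), key=lambda h: (h.ref_seq, h.ref_start)):
        layout[hit.ref_seq].append(hit.contig)
    return dict(layout)
```

The resulting order is the order in the reference, not necessarily in the sequenced organism, so any rearrangement between the two species is invisible to this approach.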

And regarding an N50 of ~1 kb: it is terrible. One can use this kind of data to go on a gene fishing expedition in prokaryotes, or perhaps even for targeted sequence analysis in higher eukaryotes, but for everything else I think it's just junk (sorry to be so blunt).
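For anyone unsure how the N50 figure is derived, a short sketch using the usual definition: the length of the contig at which the running total, counted from the longest contig down, first reaches half of the total assembly length. The function name `n50` is my own.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total reaches half of the summed length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contig lengths given")
```

Note that N50 is computed from the assembly itself, so it says nothing about how much of the genome the contigs actually cover.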

0

Thanks for your suggestions, @Bach and @ketil. The SE reads were originally generated with the intention of mapping them onto D. melanogaster as a reference. Now we are trying to see whether we can get a de novo assembly out of them, but it looks like a very challenging task without PE/mate-pair data.

1
11.4 years ago
Ketil 4.1k

Of the de Bruijn assemblers, I got the best results with the commercial CLC assembler, way better than SOAPdenovo or ABySS. De Bruijn assembly is very sensitive to filtering, so make sure you aggressively remove low-quality reads. You might also want to try Celera, which is more difficult to get running but tends to give the best results in many cases.
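As an illustration of what such filtering can look like, here is a minimal sketch that drops reads whose mean Phred score falls below a cutoff. The function names, the `(name, sequence, quality)` tuple layout and the cutoff of 20 are all assumptions for the example, not a recommendation from any particular tool. The default offset of 33 is also an assumption: older Illumina pipelines wrote Phred+64, so check your files.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (offset 33 assumed by default)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_reads(records, min_mean_q: float = 20.0, offset: int = 33):
    """Yield only the (name, sequence, quality) records that pass the cutoff.

    Reads with an empty quality string are dropped as well.
    """
    for name, seq, qual in records:
        if qual and mean_phred(qual, offset) >= min_mean_q:
            yield name, seq, qual
```

A mean-quality cutoff is the bluntest option; trimming low-quality 3' ends before filtering usually keeps more data, at the cost of variable read lengths.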

1
11.4 years ago
lexnederbragt ★ 1.3k

Since you have a reference genome (it seems), you could try MAIA: http://bioinformatics.oxfordjournals.org/content/26/18/i433.short

Using this tool you could (in principle; I have no personal experience with it) run different assembly programs and 'merge' their results. MAIA uses the reference to find the start and end points in the merged 'contig graphs'.

1
11.4 years ago
Yannick Wurm ★ 2.4k

I've had very good experience with SOAPdenovo for a eukaryotic genome (fire ant, with a ~500 Mb genome). It performed light-years better than ABySS or Velvet. Obviously we had paired data, but I think you should give it a shot (it's free). Have you removed duplicate reads?
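For single-end data, a simple way to do this is to collapse reads with identical sequences. The sketch below assumes reads are available as `(name, sequence, quality)` tuples and that the set of distinct sequences fits in memory; the function name is made up, and it does not treat reverse complements as duplicates.

```python
def remove_exact_duplicates(records):
    """Yield the first record seen for each distinct read sequence."""
    seen = set()
    for name, seq, qual in records:
        if seq not in seen:
            seen.add(seq)
            yield name, seq, qual
```

Keeping the first occurrence is arbitrary; keeping the copy with the highest quality would be a reasonable alternative.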

Also, you could try reducing the complexity of your assembly by separating your dataset into subsets (maybe chromosomes or smaller regions that are syntenic between closely related Drosophila species). But given how cheap sequencing would be, it's probably not worth your time.
