Question: Smaller Assembled Genome Size Than Expected
Rahul Sharma wrote:

Dear all, I am assembling a 40 Mb genome with an expected coverage of 181x. I am using 76 bp Illumina reads with an insert size of 200 bp (SD 20 bp). I tried Velvet for these assemblies; 86-99% of the reads were used, with an N50 of 80 kb (k-mer settings 21,55,2). The strange thing is that every assembly comes out at only 19 Mb. The whole genome was covered during library preparation. What could be the reason for this? Is it due to repeat elements, given that some of my NODEs have more than 5000x coverage? I would appreciate your suggestions.

Thanks in advance, Rahul

Modified 5.2 years ago by Adrian Pelin • written 7.1 years ago by Rahul Sharma

I think there's a good chance that some elements are over-covered. Try reducing the input you assemble (e.g. from 181x to 40x) and see whether you get the same result. Also, check this: http://www.illumina.com/Documents/products/technotes/technote_denovo_assembly_ecoli.pdf

Written 7.1 years ago by Schrodinger'S Cat
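The downsampling suggested above (181x to 40x) could be sketched as follows. This is a minimal illustration, not a tool from the thread: the function names, file paths, and the fixed seed are all assumptions, and it expects uncompressed 4-line FASTQ files with mates in the same order in both files.

```python
import random


def subsample_fraction(current_cov, target_cov):
    """Fraction of read pairs to keep to go from current_cov to target_cov."""
    if current_cov <= 0 or target_cov <= 0:
        raise ValueError("coverage values must be positive")
    return min(1.0, target_cov / current_cov)


def fastq_records(handle):
    """Yield 4-line FASTQ records as tuples of raw lines."""
    while True:
        record = tuple(handle.readline() for _ in range(4))
        if not record[0]:  # empty string means end of file
            return
        yield record


def subsample_pairs(r1_path, r2_path, out1_path, out2_path, fraction, seed=42):
    """Keep each read pair with probability `fraction`; return pairs kept.

    Both mates of a pair are kept or dropped together so the output
    files stay synchronised.
    """
    rng = random.Random(seed)
    kept = 0
    with open(r1_path) as r1, open(r2_path) as r2, \
            open(out1_path, "w") as o1, open(out2_path, "w") as o2:
        for rec1, rec2 in zip(fastq_records(r1), fastq_records(r2)):
            if rng.random() < fraction:
                o1.writelines(rec1)
                o2.writelines(rec2)
                kept += 1
    return kept
```

For the numbers in the question, `subsample_fraction(181, 40)` gives roughly 0.22, so about one pair in five would be kept. If the 19 Mb result persists at 40x, excess coverage alone is probably not the explanation.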
Casey Bergman wrote:

Yes, collapsed repeats can lead to a smaller than expected assembly size. See Myers et al. (2000) for a good discussion of how to detect collapsed repeat contigs. If this is the case, then you have a very repetitive genome on your hands.

Also, have you confirmed that your observed sequencing throughput matches your expected throughput? You can check this by reference mapping against a single-copy locus that was previously isolated from your species of interest. If the library or sequencing run was poor, you may have lower coverage than you think, which could lead to a partial assembly, although at the coverage you are talking about this seems unlikely.

Written 7.1 years ago by Casey Bergman
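The throughput check described above comes down to simple arithmetic, sketched here. The function names are hypothetical, and the read count in the usage note is back-calculated from the figures in the question rather than reported by the poster. The k-mer coverage relation is the one given in the Velvet manual, Ck = C * (L - k + 1) / L.

```python
def base_coverage(n_reads, read_len, genome_size):
    """Expected per-base coverage from read count, read length and genome size."""
    return n_reads * read_len / genome_size


def kmer_coverage(base_cov, read_len, k):
    """Convert per-base coverage to k-mer coverage (Velvet manual relation)."""
    if k > read_len:
        raise ValueError("k must not exceed the read length")
    return base_cov * (read_len - k + 1) / read_len


def implied_genome_size(n_reads, read_len, observed_single_copy_cov):
    """Genome size implied by the coverage observed at a single-copy locus."""
    return n_reads * read_len / observed_single_copy_cov
```

As an illustration, about 95.3 million 76 bp reads would give 181x over 40 Mb. If mapping to a single-copy locus showed close to that depth, the throughput matches expectation and the missing sequence is more likely collapsed or absent than under-sampled. A much higher observed depth would instead point to a smaller genome, or to a sample that does not contain all of it.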

Many thanks for your valuable comments, I will do some analysis and will get back again :)

Written 7.1 years ago by Rahul Sharma
Francois Olivier Hébert wrote:

It is indeed possible. It is strange, though, that you successfully assemble so many reads and still get such a small genome size. Have you tried to BLAST your unassembled reads against a database containing only repeated elements (e.g. Repbase)?

It also depends on how you obtained your reads. Maybe the whole genome isn't in your sample: even if the genome contains many repeated elements, they should be present in multiple copies, and you wouldn't expect to assemble almost 100% of the reads, because a large set of near-identical reads wouldn't assemble.

Written 7.1 years ago by Francois Olivier Hébert

Many thanks, Francois, for your valuable comments. I will do some analysis and get back soon.

Written 7.1 years ago by Rahul Sharma
Ahdf-Lell-Kocks wrote:

Many assemblers don't handle repetitive regions well and collapse them, which can lead to assemblies smaller than the expected genome size.

Written 7.1 years ago by Ahdf-Lell-Kocks
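One rough way to gauge how much sequence has been collapsed is to compare each contig's coverage with the typical single-copy coverage. The sketch below is an assumption-laden illustration, not a method from the thread: it parses Velvet-style `contigs.fa` headers (`>NODE_<id>_length_<n>_cov_<c>`), takes the length-weighted median coverage as the single-copy level, and scales each contig by its estimated copy number. Note that Velvet reports lengths in k-mers and coverage as k-mer coverage, so the result is approximate.

```python
import re

# Velvet contig header, e.g. ">NODE_12_length_3456_cov_23.456789"
HEADER_RE = re.compile(r"^>NODE_(\d+)_length_(\d+)_cov_([\d.]+)")


def parse_velvet_headers(lines):
    """Return (length, coverage) tuples from Velvet FASTA header lines."""
    contigs = []
    for line in lines:
        match = HEADER_RE.match(line.strip())
        if match:
            contigs.append((int(match.group(2)), float(match.group(3))))
    return contigs


def weighted_median_coverage(contigs):
    """Coverage at which half of the assembled length has been passed."""
    ordered = sorted(contigs, key=lambda c: c[1])
    half = sum(length for length, _ in ordered) / 2
    running = 0
    for length, cov in ordered:
        running += length
        if running >= half:
            return cov
    raise ValueError("no contigs given")


def copy_corrected_size(contigs, single_copy_cov=None):
    """Sum contig lengths, counting each contig once per estimated copy."""
    if single_copy_cov is None:
        single_copy_cov = weighted_median_coverage(contigs)
    total = 0
    for length, cov in contigs:
        copies = max(1, round(cov / single_copy_cov))
        total += length * copies
    return total
```

If the copy-corrected total moves from 19 Mb towards 40 Mb, collapsed repeats are a plausible explanation; contigs at several thousand x, as mentioned in the question, would dominate that correction.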
Adrian Pelin wrote:

I suggest trying the SPAdes assembler. It lets you use multiple k-mer sizes and merges the results from all of them into a final assembly.

Some regions will benefit from lower k-mer sizes, others from higher ones. Try k=23,33,43,53,65 (SPAdes only accepts odd k values).

Adrian

Written 5.2 years ago by Adrian Pelin
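A multi-k SPAdes run of the kind suggested above could be set up as in this sketch. The helper names and file paths are hypothetical; the `spades.py` options used (`-1`, `-2`, `-k`, `-o`) are the standard ones for a paired-end library. The checks reflect SPAdes requiring odd k values below 128, plus the assumption that each k should be shorter than the read length.

```python
def validate_kmers(kmers, read_len):
    """Return sorted unique k values, rejecting ones SPAdes cannot use."""
    ks = sorted(set(kmers))
    if not ks:
        raise ValueError("at least one k value is required")
    for k in ks:
        if k % 2 == 0:
            raise ValueError("k=%d is even; SPAdes needs odd k values" % k)
        if k >= 128:
            raise ValueError("k=%d is too large; SPAdes needs k < 128" % k)
        if k >= read_len:
            raise ValueError("k=%d is not shorter than the reads" % k)
    return ks


def spades_command(r1, r2, out_dir, kmers, read_len):
    """Build the argument list for a paired-end, multi-k SPAdes run."""
    ks = validate_kmers(kmers, read_len)
    return [
        "spades.py",
        "-1", r1,                       # forward reads
        "-2", r2,                       # reverse reads
        "-k", ",".join(map(str, ks)),   # comma-separated k list
        "-o", out_dir,                  # output directory
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where SPAdes is installed. With 76 bp reads, every k in the list has to stay below 76.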