Question: De novo genome assembly
16 months ago, pezhmansafdari (Finland) wrote:

Hi,

I have been trying to assemble a genome that, based on flow cytometry, should be between 1.2 and 1.5 Gb. I have Illumina (more than 100x), PacBio (50x), 10x, and Hi-C reads. The best assembly we have obtained so far is 450 Mb and 93 percent complete. It seems that the assembly collapses despite having data from all the newest sequencing technologies. We have tried Canu, MaSuRCA, FALCON, Supernova, and miniasm. I should mention that flow cytometry suggests the genome is diploid, but bioinformatic analysis suggests both diploid and tetraploid. So my question is: does anyone have any idea why the genome assembly collapses, and is there any assembly software I might not be aware of that can handle all these different types of reads and produce a better assembly? Any suggestions and directions are greatly appreciated.

All the best, Pezhman Safdari

Tags: assembly, genome

Genome size estimated by flow cytometry is an approximation, as is ploidy. The assembly also depends on the complexity of the genome at the sequence level: even with 100x coverage from every technology, there is no guarantee of obtaining a complete genome sequence.

written 16 months ago by Buffo

Polyploidy will certainly influence your assembly result. Have you analysed the data to see whether the genome might indeed be polyploid, and if so, how did you do it?

An interesting and quite straightforward approach to estimating genome size (and, to some extent, ploidy) is to make k-mer frequency plots. A useful website for this is GenomeScope.
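To make the k-mer idea concrete, here is a minimal Python sketch of how such a histogram is built and how a rough size estimate falls out of it. The function names, the `min_depth` cutoff, and the "highest peak" rule are my own assumptions for illustration; this is not how GenomeScope works internally (it fits a mixture model), and for real data you would count k-mers with a dedicated tool rather than in pure Python.

```python
from collections import Counter

# Translation table used to build reverse complements.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_histogram(reads, k):
    """Map each multiplicity to the number of distinct canonical k-mers seen that often."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Skip k-mers containing N or any other non-ACGT symbol.
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return Counter(counts.values())


def estimate_genome_size(histogram, min_depth=2):
    """Naive estimate: total k-mers divided by the depth of the highest peak.

    min_depth is an assumed cutoff that discards low-multiplicity k-mers,
    which are mostly sequencing errors. Returns (size_estimate, peak_depth).
    """
    usable = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(usable, key=lambda depth: usable[depth])
    total_kmers = sum(depth * n for depth, n in usable.items())
    return total_kmers / peak_depth, peak_depth


if __name__ == "__main__":
    # Hand-made histogram: an error spike at depth 1, a main peak at 20,
    # and a smaller peak at twice that depth (e.g. duplicated sequence).
    toy = {1: 1000, 20: 50, 40: 5}
    size, peak = estimate_genome_size(toy)
    print(f"peak depth {peak}, estimated size {size:.0f} k-mers")
```

One caveat with the "highest peak" rule: in a highly heterozygous or polyploid genome the tallest peak can be the heterozygous one at half (or a quarter of) the homozygous depth, which inflates the estimate accordingly. Seeing two or more peaks at 1x, 2x, 4x depth in the plot is exactly the kind of signal that hints at ploidy and heterozygosity.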

If it turns out to be polyploid, go and have a look in the literature to see how other people have tackled this (e.g. the cotton, soybean, and wheat genomes). I know there is also specific software for assembling highly heterozygous genomes; I think Platanus is one of them, and FALCON-Unzip (off the top of my head, to be honest).

written 16 months ago by lieven.sterck

What do you mean by 93 percent complete (BUSCO complete? BUSCO complete + fragmented)?
One of the best approaches nowadays is PacBio + Hi-C with FALCON-Phase. You can then correct the assembly with the Illumina reads. Otherwise, did you try scaffolding your PacBio assembly with your 10x data?

written 16 months ago by Juke34
16 months ago, Sam (Canada) wrote:

Have you tried this combination: correct the PacBio reads with Canu, run a hybrid assembly (corrected PacBio + Illumina) with MaSuRCA, and polish with Pilon if necessary?

There is also a concept called meta-assembly, which uses the assemblies from multiple assemblers to create the best one. Follow the link for more info: https://www.nature.com/articles/s41588-018-0110-3#Sec3
