Question: Constructing a reference-based draft genome
Posted 7 months ago by ugurcabuk:

Hi all,

I have been working for a while on obtaining a draft genome of a bacterium. I chose a closely related bacterium as a reference to reorder my contigs and concatenate them using ABACAS. This reference was selected based on a phylogenetic tree. However, the problem is that the genome size of the reference organism and the size of my assembly are different.
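
The reordering step described above can be sketched roughly as follows. This is a minimal illustration of reference-guided ordering, not ABACAS's actual implementation: it assumes each contig's start coordinate and strand on the reference are already known from an alignment run beforehand, and the `Placement` record, the `build_pseudomolecule` function and the fixed-length N spacer are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    """Hypothetical record: where a contig's best alignment lands on the reference."""
    name: str
    ref_start: int  # start coordinate of the alignment on the reference
    strand: str     # "+" if the contig aligns forward, "-" if reversed


_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse-complement a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_pseudomolecule(contigs: dict[str, str],
                         placements: list[Placement],
                         gap_len: int = 100) -> tuple[str, list[str]]:
    """Order placed contigs along the reference, orient them, and join them with Ns.

    Returns the gapped pseudomolecule plus the names of contigs that had no
    placement, so those are reported instead of being silently lost.
    """
    pieces = []
    for pl in sorted(placements, key=lambda pl: pl.ref_start):
        seq = contigs[pl.name]
        # Contigs that align to the reverse strand are flipped before joining.
        pieces.append(revcomp(seq) if pl.strand == "-" else seq)
    placed = {pl.name for pl in placements}
    unplaced = [name for name in contigs if name not in placed]
    return ("N" * gap_len).join(pieces), unplaced
```

In this sketch every junction adds `gap_len` Ns, so the joined sequence is always longer than the sum of the placed contigs. Some tools size each gap from the distance between alignments on the reference instead of using a fixed spacer, but either way the extra length is made of Ns, not new sequence.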

After doing this, I visualized the circular genome of the species, but the size of the assembly increased, as expected, and some gaps (runs of Ns) appeared. Therefore, I am a bit confused about whether I am on the right track. What do you think? Is this approach correct? Or, since the reference genome is bigger than my assembly, could I remove the gaps from the draft genome in the final step? Does that make sense?
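
One way to check where the extra length comes from is to count the N runs in the ordered sequence. The sketch below uses a hypothetical `gap_report` helper and assumes the pseudomolecule has already been read into a plain string; it reports the gapped and ungapped lengths.

```python
import re


def gap_report(seq: str) -> dict[str, int]:
    """Summarize runs of N (gaps) in a gapped sequence such as a pseudomolecule."""
    gap_lengths = [m.end() - m.start() for m in re.finditer(r"[Nn]+", seq)]
    gap_bases = sum(gap_lengths)
    return {
        "total_length": len(seq),
        "gap_count": len(gap_lengths),
        "gap_bases": gap_bases,
        "ungapped_length": len(seq) - gap_bases,  # real sequence only
    }
```

If the contigs themselves contain no Ns, `ungapped_length` equals the summed contig length, i.e. the gaps add no sequence information. Deleting them would restore that original total, but it would also join neighbouring contigs end to end, creating junctions that the reads do not support.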


Tags: next-gen, assembly, genome

If you are missing sequence in your data, then there is simply no way to create it. If you must have a closed circular genome, you may need to think about creating a net library or using a different technique (e.g. nanopore long reads) to retrieve the missing data. Have you run a program to estimate how complete your current assembly is?

Comment written 7 months ago by GenoMax

Thanks for the quick reply, GenoMax. Right! Maybe the safest approach is to get long reads to polish it. Unfortunately, in my situation I can only apply bioinformatics approaches.

I evaluated my assembly using BUSCO and got good results (over 99%). Can you explain a bit more about creating a net library? What do you mean?
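
For reference, BUSCO classifies each of the lineage's n marker genes as complete single-copy (S), complete duplicated (D), fragmented (F) or missing (M), and the headline completeness is C = S + D, expressed as a percentage of n. A result "over 99%" usually refers to C. The hypothetical `busco_summary` function below only illustrates that arithmetic and mirrors the layout of BUSCO's one-line summary; it assumes you already have the four counts. Note that this measures gene-space completeness, not whether the genome is closed into one sequence.

```python
def busco_summary(single: int, duplicated: int, fragmented: int, missing: int) -> str:
    """Turn BUSCO category counts into a C/S/D/F/M percentage string."""
    n = single + duplicated + fragmented + missing
    if n == 0:
        raise ValueError("no BUSCO genes counted")

    def pct(count: int) -> str:
        # Percentage of all n marker genes, one decimal place.
        return f"{100 * count / n:.1f}%"

    complete = single + duplicated  # C = S + D
    return (f"C:{pct(complete)}[S:{pct(single)},D:{pct(duplicated)}],"
            f"F:{pct(fragmented)},M:{pct(missing)},n:{n}")
```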

I just edited my comment: they are not the same species, just closely related to each other. So I don't think I am missing sequences in my data or assembly. I expect a new species, but since I had many contigs, I couldn't find any way to get a draft genome from the assembly other than this approach.
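
A draft genome does not have to be a single sequence; a set of contigs can be described directly with contiguity statistics alongside the BUSCO result. A common one is N50: the length L such that contigs of length L or longer cover at least half of the total assembly. The sketch below assumes the contig lengths are already in a list; the `n50` name is illustrative.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a list of contig lengths."""
    if not lengths:
        raise ValueError("empty assembly")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half of the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for a non-empty list")
```

For example, contigs of 100, 80, 50, 30 and 20 kb total 280 kb; the two longest already cover 180 kb (more than half), so N50 is 80 kb.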

Reply written 7 months ago by ugurcabuk (modified 7 months ago)

If BUSCO analysis indicates a relatively complete genome, then you could go forward with what you have if you are not able to close the genome completely. Bioinformatics approaches are only as good as the data at hand, and it sounds like you have already got the most out of your data.

Reply written 7 months ago by GenoMax