This is my first time assembling a diploid genome, and I need help understanding the MASURCA output.
I ran MASURCA on a Candida genome, which is supposed to have a haploid genome size of around 15 Mb. MASURCA gave:
ESTIMATED_GENOME_SIZE.txt: 32 Mb
Ploidy.txt = 1
Total length = 15 Mb
How should I interpret the estimated genome size and total length results? And why did I get ploidy = 1? Is my genome diploid or haploid according to this result?
Since the total assembly length matches the expected haploid genome size (about 15 Mb), I wouldn't worry too much. Besides, this is a known issue; see this MASURCA ticket on GitHub.
If the genome is very heterozygous, it looks to the assembler like one genome of double the size, rather than two similar copies of the same genome.
You can confirm that easily by aligning the reads you used for the assembly back to the assembly. You should be able to see at least some heterozygous differences directly in the alignment.
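As a rough illustration of what "heterozygous differences" would look like once you have per-site read counts from the alignment, here is a minimal Python sketch. The input format, the depth cutoff, and the allele-fraction window are all assumptions made for the example, not MASURCA output or a recommended threshold; the counts are invented and would in practice come from a pileup of your aligned reads.

```python
def is_heterozygous(ref_count, alt_count, min_depth=10, low=0.25, high=0.75):
    """Call a site heterozygous if the alternate allele fraction is near 0.5.

    min_depth, low and high are illustrative assumptions, not tuned values.
    """
    depth = ref_count + alt_count
    if depth < min_depth:
        # Too few reads to say anything about this site.
        return False
    alt_fraction = alt_count / depth
    return low <= alt_fraction <= high


def heterozygous_sites(counts):
    """Return the sorted positions whose read counts look heterozygous.

    counts maps position -> (reads supporting the assembly base,
                             reads supporting a different base).
    """
    return [pos for pos, (ref, alt) in sorted(counts.items())
            if is_heterozygous(ref, alt)]


# Hypothetical counts for four positions on one contig.
example_counts = {
    101: (40, 0),   # all reads agree with the assembly
    250: (22, 19),  # roughly half the reads disagree
    512: (3, 2),    # too shallow to call
    900: (1, 38),   # nearly all reads disagree (more like an assembly error)
}

print(heterozygous_sites(example_counts))
```

In a diploid sample you would expect many sites like position 250 spread across the contigs, while a truly haploid sample should show almost none.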
Thank you. I understand the ploidy part now.
My genome is supposed to be a diploid Candida albicans, and the genome size of diploid Candida albicans is around 29 Mb. But MASURCA reports a total length of around 15 Mb, so I am a bit confused. Does MASURCA give statistics for only the haploid genome of our sample? And what do "total length" and "estimated genome size" stand for?
Assemblers usually produce a haploid consensus, which is why most C. albicans assemblies have a size of around 15 Mb.
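To put the two numbers from the question side by side, here is a small Python sketch of the arithmetic. The function name is made up for illustration, and reading a ratio near 2 as "diploid sample, haploid consensus" is an interpretation under the assumption that the size estimate counted both haplotypes separately, not something MASURCA itself reports.

```python
def size_ratio(estimated_genome_size_mb, assembly_length_mb):
    """Ratio of the estimated genome size to the assembled total length."""
    return estimated_genome_size_mb / assembly_length_mb


# Values taken from the question: 32 Mb estimated, 15 Mb assembled.
ratio = size_ratio(32, 15)
print(f"estimated / assembled = {ratio:.2f}")
```

A ratio close to 2 is what you would expect if the estimate covers both haplotypes while the assembly collapses them into a single consensus.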