This is my first time assembling a diploid genome, and I need help understanding the MaSuRCA output.
I ran MaSuRCA on a Candida genome, which is expected to have a haploid genome size of around 15 Mb. MaSuRCA reported:
ESTIMATED_GENOME_SIZE.txt: 32 Mb
Ploidy.txt = 1
Total length = 15 Mb
How should I interpret the estimated genome size and the total length? And why did I get ploidy = 1? According to these results, is my genome diploid or haploid?
Since the total assembly length matches the expected haploid genome size, I wouldn't worry too much. Besides, this is a known issue; see this MaSuRCA ticket on GitHub.
If the genome is very heterozygous, it looks to the assembler like one genome of double the size, rather than two similar copies of the same genome.
You can check this easily by aligning the reads you used for the assembly back to the assembly. You should see at least some heterozygous differences directly in the alignment.
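
If you go one step further and call variants on that alignment, the check can be made a bit more quantitative. The snippet below is only a rough sketch: it assumes you have already aligned the reads and produced a single-sample VCF with a variant caller of your choice run in diploid mode, and the function name `count_genotypes` and the example records are made up for illustration. It simply tallies heterozygous versus homozygous-alternate genotype calls.

```python
def count_genotypes(vcf_lines):
    """Count heterozygous and homozygous-alternate calls for the first sample.

    Assumes a single-sample VCF whose FORMAT column contains a GT field.
    """
    counts = {"het": 0, "hom_alt": 0}
    for line in vcf_lines:
        # Skip blank lines and header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no sample column present
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue
        genotype = fields[9].split(":")[format_keys.index("GT")]
        # Treat phased (|) and unphased (/) genotypes the same way.
        alleles = genotype.replace("|", "/").split("/")
        if "." in alleles or len(alleles) < 2:
            continue  # missing or haploid call
        if len(set(alleles)) > 1:
            counts["het"] += 1
        elif alleles[0] != "0":
            counts["hom_alt"] += 1
    return counts


# Made-up records, only to show the expected input shape.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1",
    "contig1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:30",
    "contig1\t250\t.\tC\tT\t60\tPASS\t.\tGT:DP\t1/1:28",
    "contig2\t75\t.\tG\tA\t45\tPASS\t.\tGT:DP\t0|1:33",
    "contig2\t90\t.\tT\tC\t20\tPASS\t.\tGT:DP\t./.:2",
]
print(count_genotypes(example))  # {'het': 2, 'hom_alt': 1}
```

On real data you would pass an open file handle instead of the example list. Many heterozygous calls spread across the contigs would be consistent with a diploid sample whose two haplotypes were collapsed into one consensus assembly; very few would point more towards a haploid or highly homozygous sample.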