I am assembling a diploid fungal genome (Illumina PE 100 bp reads, 200-300x coverage). I believe the genome is ~13 Mb, but the assemblies I get are always 22-24 Mb. I have tried Velvet and SOAPdenovo with multiple parameter sets. Interestingly, when the scaffolds from one assembly are aligned against each other, around 1/3 of the genome aligns with 80-90% identity. We even sequenced a library with an additional insert size, but the results are similar.
How can I decide whether these scaffolds are duplicated regions or heterozygous alleles?
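
For concreteness, here is a minimal sketch of how the self-alignment fraction mentioned above could be measured. It assumes the scaffolds were aligned against themselves with BLAST and written in the standard 12-column tabular format (`-outfmt 6`); the aligner, the function names, the 500 bp length cutoff, and the toy hits are illustrative assumptions, not the exact procedure used for the numbers in this post.

```python
from collections import defaultdict


def merged_length(intervals):
    """Total bases covered by 1-based inclusive intervals, overlaps counted once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            # Close the previous merged block (if any) and open a new one.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def self_aligned_bases(blast_lines, min_ident=80.0, max_ident=90.0, min_len=500):
    """Count query bases that hit a *different* scaffold within the identity band.

    Expects BLAST tabular columns: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore.
    """
    per_query = defaultdict(list)
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        qseqid, sseqid = fields[0], fields[1]
        pident, length = float(fields[2]), int(fields[3])
        qstart, qend = int(fields[6]), int(fields[7])
        if qseqid == sseqid:  # skip a scaffold hitting itself
            continue
        if length < min_len or not (min_ident <= pident <= max_ident):
            continue
        per_query[qseqid].append((min(qstart, qend), max(qstart, qend)))
    return sum(merged_length(iv) for iv in per_query.values())


# Toy example (made-up hits, hypothetical scaffold names).
hits = [
    "scafA\tscafB\t85.0\t1000\t150\t0\t1\t1000\t1\t1000\t0.0\t900",
    "scafA\tscafB\t88.0\t600\t72\t0\t801\t1400\t2001\t2600\t0.0\t500",
    "scafA\tscafA\t100.0\t5000\t0\t0\t1\t5000\t1\t5000\t0.0\t9000",
    "scafB\tscafA\t85.0\t1000\t150\t0\t1\t1000\t1\t1000\t0.0\t900",
    "scafC\tscafA\t99.0\t2000\t20\t0\t1\t2000\t1\t2000\t0.0\t3000",
]
assembly_size = 24000  # toy total scaffold length in bp
fraction = self_aligned_bases(hits) / assembly_size
print(f"{fraction:.2f} of the assembly aligns to another scaffold at 80-90% identity")
```

Because every scaffold is used as a query, both members of an aligned pair are counted, so the result is the share of the total assembly that sits in such pairs.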