Here is my strange situation: I assembled the genome of the same sample (haploid source) with two different methods. The assembly sizes from the two methods are as follows:
Method 1 = 900 Mb (5400 contigs > 10 kb)
Method 2 = 500 Mb (7000 contigs > 10 kb)
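For anyone who wants to compare the two assemblies on the same footing, here is a minimal sketch of how such summary numbers could be computed directly from the FASTA files. It is not necessarily how the figures above were produced. The function names are made up for illustration, ">10kb" is assumed to mean strictly more than 10,000 bp, and N50 is added only as an extra point of comparison.

```python
from typing import Dict, Iterable, List


def contig_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of every record in FASTA-formatted lines."""
    lengths: List[int] = []
    current = None  # length of the record being read, None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for n in sorted(lengths, reverse=True):
        running += n
        if running * 2 >= total:
            return n
    return 0  # empty input


def summarize(lengths: List[int], min_len: int = 10_000) -> Dict[str, int]:
    """Total size, contig counts and N50; min_len is the assumed 10 kb cutoff."""
    kept = [n for n in lengths if n > min_len]
    return {
        "total_bp": sum(lengths),
        "n_contigs": len(lengths),
        "n_over_min": len(kept),
        "bp_over_min": sum(kept),
        "n50": n50(lengths),
    }


def summarize_file(path: str, min_len: int = 10_000) -> Dict[str, int]:
    """Convenience wrapper for an uncompressed FASTA file on disk."""
    with open(path) as handle:
        return summarize(contig_lengths(handle), min_len)
```

Calling `summarize_file` on each assembly (for example on hypothetical files `method1.fasta` and `method2.fasta`) gives numbers that can be compared side by side. The file is read line by line, so memory use stays small even for a 900 Mb assembly.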
I suspected duplication in Method 1 and checked completeness with BUSCO. Surprisingly, both methods gave similar completeness values, with no sign of duplication in Method 1. So I am very curious to know where the extra 400 Mb is coming from. To find out, I am trying to align the two assemblies against each other and visualize the alignments, but because of the large file sizes most of the tools I tried are failing. For instance, I tried:
minidot - installation fails with an error, even after repeated attempts
LASTZ alignment -> MAF -> AliTV - fails at the alignment step itself
MUMmer/nucmer - reports that the given length exceeds the allowed limit (it still fails even though I am using the 64-bit version)
LAST - generates a MAF file of around 300 GB, which no downstream application can read
Gepard - hangs
I feel like I have hit a dead end. Kindly let me know how to handle this situation. I am very curious to know where these extra sequences come from!
Thanks in advance.