I have a PacBio HiFi assembly (1.1 Gb) that was phased into two haplotypes. Initial analyses, such as dot plots, revealed structural differences between the two haplotypes. I sent the two haplotypes, along with the original PacBio assembly, to Dovetail Genomics for scaffolding with Hi-C data, but they only returned scaffolded assemblies of the two haplotypes. When I asked why they did not scaffold the original PacBio assembly, they said:
"If scaffolding with Hi-C data on the PacBio assembly is done, then it will have a high duplication relative to what is expected for a pseudo-haploid assembly. This will confound the scaffolding process, as the mapping quality will be very poor."
Because my ultimate goal is to perform GWAS and genome-scan analyses, I wanted a single scaffolded assembly. They suggested concatenating the two haplotype assemblies into one and then doing SNP calling and subsequent analyses.
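As far as I understand the suggestion, the concatenation itself would just mean writing both haplotype FASTA files into one file with distinct sequence names. Below is a minimal Python sketch of that reading; the function name, the file names, and the `hap1_`/`hap2_` prefixes are my own hypothetical placeholders, not anything Dovetail provided. It does nothing about the structural differences between the haplotypes, which is the part I am unsure about.

```python
def concatenate_haplotypes(fasta_paths, prefixes, out_path):
    """Copy every record from each input FASTA into out_path.

    Each sequence name gets the matching prefix so that scaffolds with
    the same name in the two haplotypes do not collide.
    Returns the number of records written.
    """
    n_records = 0
    with open(out_path, "w") as out:
        for path, prefix in zip(fasta_paths, prefixes):
            with open(path) as fasta:
                for line in fasta:
                    line = line.rstrip("\n")
                    if not line:
                        continue  # skip blank lines
                    if line.startswith(">"):
                        # Header line: insert the haplotype prefix after '>'
                        out.write(">" + prefix + line[1:] + "\n")
                        n_records += 1
                    else:
                        # Sequence line: copy unchanged
                        out.write(line + "\n")
    return n_records


# Hypothetical usage (file names are placeholders):
# concatenate_haplotypes(
#     ["hap1.scaffolds.fa", "hap2.scaffolds.fa"],
#     ["hap1_", "hap2_"],
#     "combined.fa",
# )
```

Prefixing the names at least keeps track of which haplotype each scaffold came from after the files are merged.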
My question is: how can I concatenate the two haplotype assemblies into one when there are structural differences between them? I did not find any previous studies that did this. Does it make sense to do so? Any feedback would be appreciated. Thank you.