INTRO: I have a genome assembly built from linked reads (10X Genomics) plus ONT long reads. The initial assembly was done in Supernova; gap filling and scaffolding were done in PBJelly. The work was outsourced, and the company did not provide any information about the gaps.
PROBLEM: The gaps (runs of Ns) in the assembly vary in size from 10 to 100,000 Ns. For the submission I need to state how the gap sizes were estimated and which number stands for an unknown gap size.
- Is there any general procedure/rule for estimation of gap size during this kind of assembly?
- I am especially wondering about gaps with round numbers like 10, 100, 5000, 100000, etc. What do these stand for? Do they represent a known size, or are they placeholders for an unknown gap size?
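
For reference, here is a minimal sketch of how the gap sizes can be tabulated from the assembly FASTA, so it is clear which lengths recur. The function name `gap_length_counts` is my own, and it assumes gaps are written as runs of `N` or `n` in a plain (uncompressed) FASTA. My guess is that a length repeated many times is more likely a fixed placeholder than a measured distance, but that is only an assumption, not something confirmed by Supernova or PBJelly.

```python
import re
from collections import Counter


def gap_length_counts(fasta_lines):
    """Return a Counter mapping gap length (run of N/n) to number of gaps.

    Sequence lines of each record are joined before searching, so a gap
    that is wrapped across several FASTA lines is counted once.
    """
    counts = Counter()
    chunks = []  # sequence lines of the record currently being read

    def flush():
        # Search the complete record for runs of N and tally their lengths.
        seq = "".join(chunks)
        for match in re.finditer(r"[Nn]+", seq):
            counts[len(match.group())] += 1
        chunks.clear()

    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            flush()  # a new header closes the previous record
        elif line:
            chunks.append(line)
    flush()  # close the last record
    return counts
```

It can be run on a file with `with open("assembly.fasta") as handle: counts = gap_length_counts(handle)` (the file name is a placeholder), and `counts.most_common()` then lists the most frequent gap lengths first. Each scaffold is held in memory while it is scanned, which should be fine for typical scaffold sizes.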
NOTE: Asking the company is not really an option, as the analysis was outsourced three years ago and the company is not very responsive these days.
Thanks a lot in advance, Milos