How to choose the best assembly based on the number of Ns in the sequence, and how to find telomere and centromere markers
3 months ago
Théo • 0


I have 3 FASTA files of a fungal genome assembly, produced by 3 different assembler tools. My FASTA files contain some N characters, which represent the many transposable elements in my genomes.

I want to choose the best assembly among the 3 files based on the number of Ns.

Is there a rule saying that the assembly with the fewest Ns is the best?

Is there a cutoff value for the number of Ns?

I also have another question: how can I identify centromeres and telomeres if they are masked because all repeated regions are Ns?

Do I need to check the RepeatMasker options?

Thanks for your answers.

3 months ago
liorglic ▴ 870

There are several measures for the quality of an assembly, e.g. contiguity (N50, N90), total assembly size, BUSCO score, and also the % of Ns in the assembly. If all other stats are similar, then generally the assembly with the lowest % of Ns should be favored.
You can try a tool such as QUAST, which will calculate many assembly stats for you, so you can easily compare your results.
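
If you just want a quick look at the numbers mentioned above before running a full tool, here is a minimal Python sketch that computes total size, % of Ns, N50 and N90 from a FASTA file. The function names (`read_fasta`, `nx`, `assembly_stats`) are made up for this example, and it assumes a plain, uncompressed FASTA. It is not how QUAST works internally, and QUAST applies its own filters and definitions, so its numbers may differ slightly.

```python
from typing import Dict, Iterable, Iterator, List


def read_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield each sequence (headers dropped) from FASTA-formatted lines."""
    chunks: List[str] = []
    seen_header = False
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if seen_header:
                yield "".join(chunks)
            chunks = []
            seen_header = True
        elif seen_header:
            chunks.append(line)
    if seen_header:
        yield "".join(chunks)


def nx(lengths: List[int], fraction: float) -> int:
    """Length of the contig at which the cumulative size, going from the
    longest contig down, first reaches `fraction` of the total size."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= total * fraction:
            return length
    return 0


def assembly_stats(seqs: Iterable[str]) -> Dict[str, float]:
    """Basic stats for one assembly; upper- and lower-case n both count."""
    seqs = list(seqs)
    lengths = [len(s) for s in seqs]
    total = sum(lengths)
    n_count = sum(s.upper().count("N") for s in seqs)
    return {
        "contigs": len(seqs),
        "total_length": total,
        "n_count": n_count,
        "pct_n": 100.0 * n_count / total if total else 0.0,
        "N50": nx(lengths, 0.5),
        "N90": nx(lengths, 0.9),
    }
```

To use it, open each of your files and pass the file handle in, e.g. `with open("assembly1.fasta") as fh: print(assembly_stats(read_fasta(fh)))` (the file name is a placeholder). Comparing the three dictionaries side by side gives the same kind of overview described above, although it does not replace BUSCO or a proper QUAST report.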


