Hi,
I am looking for a better explanation of N50 in genome assembly. As I understand it, N50 is the length of the contigs that cover 50% of the genome. Am I right?
Also, say I have two tools that give an N50 of 500 and 1000 respectively: which of them would be the better tool?
Thanks.

Your understanding of N50 is roughly correct.

The way you calculate N50 is: you order your contigs from largest to smallest, then take the cumulative sum of their lengths until you reach at least 50% of the total assembly length. The number of contigs you needed to get there is the L50, and the length of the last (shortest) contig you added is the N50.
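
The steps above can be sketched in a few lines of Python. This is only an illustration: the function name `n50_l50` and the example contig lengths are made up, and the threshold is taken as "at least 50%" of the assembly.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths."""
    # Order contigs from largest to smallest.
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        # Stop once the cumulative sum reaches half of the assembly.
        # Integer comparison avoids floating-point rounding.
        if 2 * running >= total:
            return length, count
    raise ValueError("no contigs given")


# Made-up assembly: total length 300, so the halfway point is 150.
example = [100, 80, 60, 40, 20]
n50, l50 = n50_l50(example)
print(n50, l50)  # 80 2
```

Here the two largest contigs (100 + 80 = 180) are enough to pass 150, so L50 is 2 and N50 is 80.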

Intuitively one would go for the assembly with the higher N50 (1000 in this case), but N50 alone is not a good measure of performance; total assembled size and other metrics also matter (NG50 might help a little here).

OP's definition of N50 is incorrect. N50 is the length of the shortest contig such that it, together with all contigs of the same length or longer, covers at least 50% of the assembly.

Thanks for your reply. I came across a definition that called N50 a "weighted median statistic". What does this mean? Is N50 also described as the number of contigs above the median contig?

Well, in essence it is indeed something like a weighted median: a median of the contig lengths in which each contig is weighted by its own length.
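
One way to see the "weighted median" idea is to write down, for every base in the assembly, the length of the contig it sits in, and take the median of that list. The snippet below is a rough sketch with made-up contig lengths; `per_base_median` is a hypothetical helper name, and it uses `statistics.median_high` so that the result is always an actual contig length.

```python
import statistics


def per_base_median(contig_lengths):
    """Median contig length when every base counts once."""
    # A contig of length n contributes n copies of the value n.
    per_base = [n for n in contig_lengths for _ in range(n)]
    return statistics.median_high(per_base)


example = [100, 80, 60, 40, 20]  # made-up contig lengths

print(statistics.median(example))  # 60: plain median, every contig counts once
print(per_base_median(example))    # 80: length-weighted median, i.e. the N50
```

The plain median treats a 20 bp contig and a 100 bp contig equally, while the weighted version asks which contig length a randomly picked base typically falls in.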

No, that would be L50: the number of contigs that together represent 50% of the assembly (N50 is the actual length of the L50-th contig).

The way you calculate N50 is: you order your contigs from largest to smallest, then take the cumulative sum of their lengths until you reach at least 50% of your assembly. The number of contigs you needed to get there is the L50, and the length of the last contig you added is the N50.

Keep in mind that this is relative to the actual assembly size, not the estimated genome size (that would be NG50 and LG50).
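
To make the difference concrete, here is a small sketch of NG50/LG50. The walk over the contigs is the same as for N50, only the target is half of the estimated genome size instead of half of the assembly. The function name `ng50_lg50`, the contig lengths and the genome sizes are all invented for illustration.

```python
def ng50_lg50(contig_lengths, genome_size):
    """Return (NG50, LG50), or None if the assembly is too small."""
    lengths = sorted(contig_lengths, reverse=True)
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        # The target is half of the ESTIMATED genome size,
        # not half of the assembly itself.
        if 2 * running >= genome_size:
            return length, count
    # The assembly covers less than half of the estimated genome.
    return None


example = [100, 80, 60, 40, 20]  # made-up assembly, total length 300

print(ng50_lg50(example, genome_size=300))   # (80, 2), same as N50/L50
print(ng50_lg50(example, genome_size=400))   # (60, 3)
print(ng50_lg50(example, genome_size=1000))  # None
```

When the assembly is smaller than the estimated genome, NG50 comes out lower than N50, and it is undefined if the assembly does not even reach half of the genome size.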

An interesting read

What is Wrong with N50? How can we make it better?