Genome assembly N50
5.9 years ago

Hi, I am looking for a better explanation of N50 in genome assembly. As I understand it, N50 is the length of the contigs that cover 50% of the genome. Am I right? Also, if I have two tools that give N50 values of 500 and 1000 respectively, which of them would be the better tool? Thanks.

Assembly assembly tools N50 • 14k views
5.9 years ago

Your definition/understanding of N50 is somewhat correct indeed.

The way you calculate N50 is: you order your contigs from largest to smallest, then you take the cumulative sum of the contig lengths until you reach at least 50% of the total assembly length. The number of contigs you needed to get there is the L50, and the length of the last (shortest) contig you added is the N50.
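
The procedure above can be sketched in a few lines of Python. This is only a minimal illustration: the function name `n50_l50` and the contig lengths are made up for the example, and it assumes you already have the contig lengths as plain integers.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths (hypothetical helper)."""
    # Order contigs from largest to smallest.
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        # Stop at the first contig that brings us to at least half the assembly.
        if running >= half_total:
            return length, count
    raise ValueError("no contigs given")


# Made-up contig lengths, total = 300, so the halfway point is 150.
lengths = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = n50_l50(lengths)
print(n50, l50)  # 70 2
```

Here 80 + 70 = 150 already reaches half of the assembly, so two contigs are needed (L50 = 2) and the shorter of them gives N50 = 70.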

Intuitively one should go for the assembly with the highest N50 (1000 in this case), but N50 alone is not a good measure of performance; the total assembled size and other metrics also matter (NG50 might help a little here).
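
To illustrate why N50 alone can mislead, here is a small sketch with invented numbers: two hypothetical assemblies with N50 of 1000 and 500, and an assumed genome size of 10,000. The helper `nx50` is not from any library; it returns N50 when no reference size is given and NG50 when one is.

```python
def nx50(contig_lengths, reference_total=None):
    """N50 if reference_total is None, otherwise NG50 (hypothetical helper)."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths) if reference_total is None else reference_total
    running = 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            return length
    return None  # the contigs never reach half of the reference size


genome_size = 10_000           # assumed estimate, for illustration only
tool_a = [1000, 1000, 500]     # few long contigs, only 2,500 bases in total
tool_b = [500] * 16            # shorter contigs, 8,000 bases in total

print(nx50(tool_a), nx50(tool_a, genome_size))  # 1000 None
print(nx50(tool_b), nx50(tool_b, genome_size))  # 500 500
```

In this toy case the assembly with the higher N50 covers only a quarter of the assumed genome, so its NG50 is undefined, while the lower-N50 assembly still has an NG50 of 500.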


OP's definition of N50 is incorrect. N50 is the length of the shortest contig that, together with all the contigs of the assembly that are the same length or longer, covers at least 50% of the genome assembly.


Indeed, that was an assumption I made that might not have been totally clear, so I elaborated on it.


Thanks for your reply. I came across a definition that described N50 as a "weighted median statistic". What does this mean? Is N50 also described as the number of contigs above the median contig?


Well, in essence it is indeed something like a weighted median statistic.

No, that would be L50: the number of contigs representing 50% of the assembly (where N50 is the actual length of the L50 contig).

The way you calculate N50 is: you order your contigs from largest to smallest, then you take the cumulative sum of the contig lengths until you reach at least 50% of the total assembly length. The number of contigs you needed to get there is the L50, and the length of the last (shortest) contig you added is the N50.

Keep in mind that this is relative to the actual assembly size, not the estimated genome size (that would be NG50 and LG50).
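
One way to see the "weighted median" idea: instead of taking the median over contigs, take it over bases, with each base labelled by the length of the contig it sits in. The sketch below does this by brute force on made-up lengths. The function name is hypothetical, and expanding every base like this is only sensible for toy data.

```python
import statistics


def n50_as_weighted_median(contig_lengths):
    """N50 computed as a per-base median (toy illustration, hypothetical name)."""
    per_base = []
    for length in contig_lengths:
        # Each contig contributes one entry per base, so long contigs weigh more.
        per_base.extend([length] * length)
    per_base.sort(reverse=True)
    # Position of the base at which half of all bases have been counted.
    middle = (len(per_base) + 1) // 2 - 1
    return per_base[middle]


lengths = [80, 70, 50, 40, 30, 20, 10]  # made-up contig lengths
print(statistics.median(lengths))       # 40, the plain median contig length
print(n50_as_weighted_median(lengths))  # 70, the length-weighted median (N50)
```

The plain median treats every contig equally, whereas the weighted version asks which contig length the "middle base" of the assembly falls into.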


No contig is L50. Wikipedia puts it well:

L50 count is defined as the smallest number of contigs whose length sum produces N50
