Question: Genome assembly N50
Inquisitive8995 wrote (15 months ago):

Hi, I am looking for a better explanation of N50 in genome assembly. As I understand it, N50 is the length of the contigs that cover 50% of the genome. Am I right? Also, if I have two tools that give an N50 of 500 and 1000 respectively, which of them is the better tool? Thanks.

Tags: assembly, tools, n50

An interesting read

What is Wrong with N50? How can we make it better?

(comment by lakhujanivijay, 15 months ago)
lieven.sterck (VIB, Ghent, Belgium) wrote (15 months ago):

Your understanding of N50 is partly correct.

The way to calculate N50 is: order your contigs from largest to smallest, then take the cumulative sum of their lengths until it reaches at least 50% of the total assembly length. The number of contigs summed at that point is L50, and the length of the last contig added is N50.
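A minimal Python sketch of this procedure, assuming contig lengths are given as a plain list of integers. The function name `n50_l50` and the example lengths are made up for illustration, and the threshold uses "at least 50%", the usual convention.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig lengths (hypothetical helper).

    Contigs are sorted longest first and their lengths summed until the
    running total reaches at least half of the total assembly length.
    L50 is how many contigs were summed; N50 is the length of the last one.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        # 2 * running >= total means "running >= 50% of total" without floats.
        if 2 * running >= total:
            return length, count
    # Not reached for non-negative lengths: the full sum always covers half.
    raise AssertionError("unreachable")


contigs = [80, 70, 50, 40, 30, 20, 10]  # made-up lengths, 300 bp in total
print(n50_l50(contigs))  # (70, 2): 80 + 70 = 150 bp is half of 300 bp
```

The input order does not matter, because the function sorts the lengths itself.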

Intuitively, you would pick the assembly with the higher N50 (1000 in this case), but N50 alone is not a good measure of assembly quality: the total assembled size and other metrics also matter (NG50 can help a little here).
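To see why a higher N50 alone does not make one tool better, here is a minimal Python sketch with made-up contig lengths (not output from any real assembler). `assembly_b` is simply `assembly_a` with its short contigs discarded: its N50 goes up even though far fewer bases are assembled.

```python
def n50(lengths):
    """N50 of a list of contig lengths (hypothetical helper)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # reached at least 50% of the assembly
            return length
    raise ValueError("need at least one contig length")


# Made-up example: assembly_b is assembly_a with every contig < 500 bp dropped.
assembly_a = [1000, 800, 600, 400, 300, 300, 200, 200, 100, 100]
assembly_b = [length for length in assembly_a if length >= 500]

print(n50(assembly_a), sum(assembly_a))  # 600 4000
print(n50(assembly_b), sum(assembly_b))  # 800 2400
```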


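The OP's definition of N50 is incorrect. N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least 50% of the genome assembly; equivalently, it is the largest length L such that all contigs of length L or longer together cover at least 50% of the assembly.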
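The same definitions written out in symbols (our own notation, not from the thread), with contig lengths sorted so that $\ell_{(1)} \ge \ell_{(2)} \ge \dots \ge \ell_{(n)}$ and $T$ the total assembly length:

```latex
T = \sum_{i=1}^{n} \ell_{(i)}, \qquad
L50 = \min\Big\{ k \;:\; \sum_{i=1}^{k} \ell_{(i)} \ge \tfrac{T}{2} \Big\}, \qquad
N50 = \ell_{(L50)} = \max\Big\{ \ell_{(j)} \;:\; \sum_{i:\,\ell_{(i)} \ge \ell_{(j)}} \ell_{(i)} \ge \tfrac{T}{2} \Big\}
```

Replacing $T$ with an estimated genome size gives NG50 and LG50.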

(comment by 5heikki, 15 months ago)

Indeed, that was an assumption I made that might not have been entirely clear, so I elaborated on it.

(reply by lieven.sterck, 15 months ago)

Thanks for your reply. I came across a definition that describes N50 as a "weighted median statistic". What does this mean? Is N50 also described as the number of contigs above the median contig?

(reply by Inquisitive8995, 15 months ago)

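Yes, in essence it is a weighted median statistic: each contig is weighted by its length, so N50 is the length of the contig containing the middle base when all assembled bases are ordered by the length of their contig (longest first), rather than the plain median of the contig lengths.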
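A small Python sketch (helper names and example lengths are hypothetical) that checks this reading on toy data: it expands every contig into one entry per base, takes the value at the middle base, and compares it with the cumulative-sum N50. For the made-up contigs below, the plain median contig length is 40 and three contigs are longer than it, whereas N50 is 70 and L50 is 2. So "number of contigs above the median contig" is not the same thing as L50.

```python
import statistics


def n50(lengths):
    """N50 via the cumulative-sum procedure (hypothetical helper)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("need at least one contig length")


def length_weighted_median(lengths):
    """Give every assembled base the length of its contig, order the bases
    longest-contig first, and return the value at the middle base
    (position ceil(total / 2)). Expands per base, so toy data only."""
    per_base = sorted((length for length in lengths for _ in range(length)), reverse=True)
    middle = (len(per_base) + 1) // 2  # ceil(total / 2), 1-based
    return per_base[middle - 1]


contigs = [80, 70, 50, 40, 30, 20, 10]  # made-up lengths, 300 bp in total
print(n50(contigs), length_weighted_median(contigs))  # 70 70
print(statistics.median(contigs))  # 40: plain median of contig lengths
```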

No, that would be L50: the number of contigs needed to cover 50% of the assembly (N50 is the length of the L50-th, i.e. the last, of those contigs).


Keep in mind that N50 and L50 are computed relative to the total length of the assembly, not the estimated genome size; measured against the genome size, the equivalent statistics are NG50 and LG50.
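A minimal Python sketch of NG50/LG50, assuming you supply your own genome size estimate as `genome_size` (a hypothetical parameter; the lengths and sizes below are made up). Because the threshold is fixed by the genome size rather than by the assembly total, dropping short contigs no longer inflates the statistic, and it is undefined (here `None`) when the contigs cover less than half of the genome.

```python
def ng50_lg50(lengths, genome_size):
    """Return (NG50, LG50), or None if the contigs cover less than half of
    genome_size. genome_size is an estimate supplied by the user."""
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if 2 * running >= genome_size:  # reached 50% of the genome estimate
            return length, count
    return None


# Made-up example: assembly_b is assembly_a without its contigs < 500 bp,
# and 4000 bp is an assumed genome size estimate.
assembly_a = [1000, 800, 600, 400, 300, 300, 200, 200, 100, 100]
assembly_b = [length for length in assembly_a if length >= 500]

print(ng50_lg50(assembly_a, 4000))  # (600, 3)
print(ng50_lg50(assembly_b, 4000))  # (600, 3), although its N50 is 800
print(ng50_lg50(assembly_b, 5000))  # None: 2400 bp < half of 5000 bp
```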

(reply by lieven.sterck, 15 months ago)

No contig is L50. Wikipedia puts it well:

L50 count is defined as the smallest number of contigs whose length sum produces N50

(reply by 5heikki, 15 months ago)
Powered by Biostar version 2.3.0