I am looking for a better explanation of N50 in genome assembly. As I understand it, N50 is the length of the contigs which cover 50% of the genome. Am I right?
Also, say for example I have 2 tools which give N50 values of 500 and 1000 respectively. Which of these would be the better tool?
Your understanding of N50 is roughly correct, with one caveat: the 50% refers to the total length of the assembly, not to the genome.
To calculate N50, order your contigs from largest to smallest, then add up their lengths cumulatively until the running total reaches 50% of the total assembly length. The number of contigs you need to get to 50% is the L50, and the length of the last (smallest) contig you added is the N50.
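The calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name `n50_l50` and the example contig lengths are made up, and it assumes the threshold counts as reached when the running total is at least half of the assembly (some tools use a strict "greater than" instead).

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths.

    Hypothetical helper for illustration; assumes the 50% threshold
    is met when the running total is >= half the assembly length.
    """
    # Order contigs from largest to smallest.
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2

    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half:
            # 'length' is the N50, 'count' is the L50.
            return length, count

    raise ValueError("no contigs given")


# Made-up example: total length 300, so the threshold is 150.
# 80 + 70 = 150 reaches it after two contigs.
print(n50_l50([80, 70, 50, 40, 30, 20, 10]))  # (70, 2)
```

Here the N50 is 70 (the length of the second contig) and the L50 is 2 (the number of contigs it took to get there).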
Intuitively one would go for the assembly with the higher N50 (1000 in this case), but N50 alone is not a good measure of performance. Other metrics, such as the total assembled size, matter as well (NG50, which measures against the estimated genome size rather than the assembly size, might help here a little).
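To see why N50 alone can mislead, here is a small Python sketch with invented numbers. The function name `ng50`, the two toy assemblies and the genome size of 10,000 are all assumptions for the sake of the example; NG50 is computed like N50, except that the target is half of the estimated genome size instead of half of the assembly length.

```python
def ng50(contig_lengths, genome_size):
    """Return NG50, or None if the assembly covers less than half
    of the (estimated) genome size. Hypothetical helper."""
    target = genome_size / 2
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return None


# Invented toy data, assuming an estimated genome size of 10,000.
genome_size = 10_000
assembly_a = [1000, 1000]   # N50 = 1000, but only 2,000 assembled
assembly_b = [500] * 20     # N50 = 500, and 10,000 assembled

print(ng50(assembly_a, genome_size))  # None
print(ng50(assembly_b, genome_size))  # 500
```

In this toy case the assembly with the higher N50 covers only a fifth of the genome, so its NG50 is undefined, while the assembly with the lower N50 reconstructs the full expected size.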