How To Interpret Abyss Stats
11.7 years ago
lin.barnum ▴ 230

I ran ABySS on a dataset and obtained the following stats. What do the various columns mean?

n        n:200   n:N50   min  N80  N50   N20   max      sum
8540100  682221  169533  200  288  513   1064  24662    313.4e6  a-unitigs.fa
7186547  447932  89602   200  512  984   2259  688516   352.9e6  a-contigs.fa
7126108  387493  71262   200  584  1208  2810  1552973  352.1e6  a-scaffolds.fa

n is the total number of sequences in the FASTA file, which I verified. N50, N80, and so on are the corresponding Nxx lengths. But what does the 'sum' column indicate, and what is n:N50?

assembly • 6.7k views
11.7 years ago

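n:N50 is the number of contigs that are at least as long as the N50. The N50 itself is "defined as the length N for which 50% of all bases in the sequences are in a sequence of length L < N", according to https://www.broad.harvard.edu/crd/wiki/index.php/N50. Put the other way round, sequences of length N or longer hold the remaining half of the bases.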
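To make the definition concrete, here is a minimal sketch of how an Nxx value and the matching n:N50 count could be computed from a list of contig lengths. The function names `nxx` and `count_at_least` and the toy lengths are made up for illustration; this is not ABySS's own code, and it assumes the common convention that Nxx is the length N such that contigs of length N or more hold at least xx% of the bases.

```python
def nxx(lengths, percent):
    """Return the length N such that sequences of length >= N
    hold at least `percent`% of all bases (N50 when percent=50)."""
    ordered = sorted(lengths, reverse=True)  # longest first
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids floating-point rounding issues.
        if running * 100 >= percent * total:
            return length
    raise ValueError("no lengths given")


def count_at_least(lengths, cutoff):
    """Count sequences whose length is at least `cutoff` (e.g. n:N50)."""
    return sum(1 for length in lengths if length >= cutoff)


# Toy data, not taken from the assembly above.
toy = [100, 200, 300, 400, 500]
n50 = nxx(toy, 50)
n80 = nxx(toy, 80)
n20 = nxx(toy, 20)
n_n50 = count_at_least(toy, n50)
print(n80, n50, n20, n_n50)
```

Note that N80 comes out smaller than N50, which in turn is smaller than N20, the same ordering as in the table in the question.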

As for the "sum" column, I am not sure; the name is ambiguous. I searched online without much success. I also checked the paper in which the assembler was first described (http://genome.cshlp.org/content/19/6/1117.full); it presents such statistics but does not make clear what "sum" means. You could look into it in more detail or e-mail the author of the assembler, who can probably answer your question.

Could the "sum" column be the number of reads that were successfully assembled? That would make sense, but I am not certain.

The number exceeds the total number of reads given to it, so it can't be that.

11.7 years ago
lin.barnum ▴ 230

I took a deeper look into it. It is the number of bases found in contigs/scaffolds of at least 200 bp in length (or whatever minimum length you have chosen), which is also why the min column reads 200.

11.7 years ago

Well, in Simpson et al. 2009 (http://genome.cshlp.org/content/19/6/1117.full), the "sum" column is in Gbp, so my other guess is that it is the total number of nucleotides (bp). I'm not sure which contig category it covers, though; it might be only the n:N50 category. You could check by summing the contig lengths in each category you have (all contigs, n:N50, n:200) and seeing which total matches the number in the "sum" column. I hope it helps!
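The check suggested here can be sketched in a few lines of Python. The helpers `fasta_lengths` and `summarize` are hypothetical names, the 200 bp cutoff is simply taken from the n:200 and min columns in the question, and the FASTA parsing is deliberately minimal (plain `>` headers, no validation), so treat it as a rough sketch rather than a replacement for the assembler's own stats tool.

```python
import io


def fasta_lengths(handle):
    """Yield the length of each sequence in a FASTA stream."""
    length = None  # None until the first header line is seen
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif length is not None:
            length += len(line)  # sequences may span several lines
    if length is not None:
        yield length


def summarize(lengths, min_len=200):
    """Total bases over all sequences and over those of at least min_len."""
    lengths = list(lengths)
    kept = [x for x in lengths if x >= min_len]
    return {
        "n": len(lengths),          # all sequences
        "n_min": len(kept),         # sequences of at least min_len
        "min": min(kept, default=0),
        "max": max(kept, default=0),
        "sum": sum(kept),           # bases in the kept sequences
        "sum_all": sum(lengths),    # bases in every sequence
    }


# Toy FASTA with records of 250, 150 and 300 bp (the last one wrapped).
toy_fasta = (
    ">a\n" + "A" * 250 + "\n"
    ">b\n" + "C" * 150 + "\n"
    ">c\n" + "G" * 200 + "\n" + "T" * 100 + "\n"
)
stats = summarize(fasta_lengths(io.StringIO(toy_fasta)))
print(stats)
```

On real data you would pass an open file handle for, say, a-contigs.fa in place of the `io.StringIO` object, then compare "sum" and "sum_all" with the value in the "sum" column to see which category it matches.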
