Question: How do I access genome coverage using SPADES?
3.8 years ago by
fhsantanna440
Brazil
fhsantanna440 wrote:

I have assembled bacterial genomes using SPAdes. I am about to submit them to GenBank, but I need to know the coverage of each assembly. Should I report the raw read coverage or the final coverage after filtering? If the latter, how do I get these values from the SPAdes log file?

myposts coverage spades genome • 5.0k views
ADD COMMENTlink modified 7 months ago by DanielC80 • written 3.8 years ago by fhsantanna440

Hello All,

This post was useful, thanks! I got the result below by running bbmap.sh on my data. Could you please tell me how to interpret the coverage here? The average coverage is 209.654; what does this mean? I would really appreciate your input.

Genome:                 1
Key Length:             13
Max Indel:              16000
Minimum Score Ratio:    0.56
Mapping Mode:           normal
Reads Used:             1821574 (553015125 bases)

Mapping:                1648.545 seconds.
Reads/sec:              1104.96
kBases/sec:             335.46


Read 1 data:            pct reads       num reads       pct bases          num bases

mapped:                  99.2931%         1808698        99.2358%          548788917
unambiguous:             80.4977%         1466326        83.8068%          463464066
ambiguous:               18.7954%          342372        15.4290%           85324851
low-Q discards:           0.0000%               0         0.0000%                  0

perfect best site:        4.4512%           81082         2.4810%           13720350
semiperfect site:        17.7655%          323611        14.4467%           79892367

Match Rate:                   NA               NA        10.2835%          499947774
Error Rate:              90.2312%         1639609        88.7859%         4316473287
Sub Rate:                28.4581%          517117         0.0276%            1340991
Del Rate:                68.6627%         1247684        88.7119%         4312877825
Ins Rate:                53.1733%          966223         0.0464%            2254471
N Rate:                  57.9214%         1052501         0.9307%           45245681

Reads:                                  1821574
Mapped reads:                           1529439
Mapped bases:                           408501909
Ref scaffolds:                          9236
Ref bases:                              1948456

Percent mapped:                         83.963
Percent proper pairs:                   0.000
Average coverage:                       209.654
Standard deviation:                     645.173
Percent scaffolds with any coverage:    76.09
Percent of reference bases covered:     77.89

Thanks!

ADD REPLYlink written 7 months ago by DanielC80
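
One way to read the "Average coverage" figure: the numbers in the output above are consistent with it being the number of mapped bases divided by the number of reference bases, i.e. on average each base of the assembly is covered by about 210 read bases. This is an interpretation of the posted numbers, not a statement taken from the BBMap documentation. A minimal check of the arithmetic, using only values from the output above:

```python
# Values copied from the BBMap output posted above.
mapped_bases = 408501909   # "Mapped bases"
ref_bases = 1948456        # "Ref bases"

# Assumption: average coverage = mapped bases / reference bases.
average_coverage = mapped_bases / ref_bases

print(round(average_coverage, 3))  # 209.654, matching "Average coverage"
```

The large standard deviation (645.173) and the fact that only 77.89% of reference bases are covered suggest the coverage is quite uneven across the 9236 scaffolds, so the average alone may not describe the assembly well.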

I tried to produce the same output file with: bbmap.sh in=reads.fq ref=contigs.fa covstats=covstats.txt. However, my bbmap.sh does not recognise covstats as a parameter. Would you mind posting the BBMap version you are using and the exact command you ran?

Thanks, Chiara

ADD REPLYlink written 3 months ago by mariachiaracascarano10
3.8 years ago by
Walnut Creek, USA
Brian Bushnell16k wrote:

The best way to calculate coverage is by mapping, not by looking at the assembler's logs.  For example, with BBMap:

bbmap.sh in=reads.fq ref=contigs.fa covstats=covstats.txt

That will print a message like this:

Average coverage:                       278.50
Percent scaffolds with any coverage:    100.00
Percent of reference bases covered:     99.98

...in addition to creating "covstats.txt", which lists the coverage statistics for each individual scaffold.  The reads you use for mapping should be the same ones you fed into SPAdes.

ADD COMMENTlink written 3.8 years ago by Brian Bushnell16k
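
To get a single assembly-wide number out of the per-scaffold file, one option is a length-weighted mean of the per-scaffold coverage. The sketch below assumes covstats.txt is tab-separated with a header row containing columns named `Avg_fold` and `Length`; check the header of your own file, since column names may differ between BBMap versions. The sample rows are made up for illustration.

```python
import csv
import io


def weighted_mean_coverage(covstats_text):
    """Length-weighted mean of per-scaffold coverage.

    Assumes a tab-separated table whose header has the columns
    'Avg_fold' (mean coverage) and 'Length' (scaffold length).
    """
    reader = csv.DictReader(io.StringIO(covstats_text), delimiter="\t")
    covered_bases = 0.0
    total_length = 0
    for row in reader:
        length = int(row["Length"])
        covered_bases += float(row["Avg_fold"]) * length
        total_length += length
    return covered_bases / total_length if total_length else 0.0


# Hypothetical two-scaffold table, not real BBMap output.
sample = (
    "#ID\tAvg_fold\tLength\n"
    "scaffold_1\t300.0\t1000\n"
    "scaffold_2\t100.0\t3000\n"
)
print(weighted_mean_coverage(sample))
```

For a real run you would pass the contents of covstats.txt instead of `sample`, for example `weighted_mean_coverage(open("covstats.txt").read())`. Weighting by length keeps many short scaffolds from dominating the average.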

Thanks! BBMap was very helpful!

ADD REPLYlink written 5 months ago by Prakki Rama2.3k
2.2 years ago by
faisal.akbar510 wrote:

You could search for "Average coverage" in the spades.log file.

ADD COMMENTlink written 2.2 years ago by faisal.akbar510
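
A small sketch of that search, for anyone who prefers a script to a text editor. It only prints the lines that contain the phrase (case-insensitively) and makes no assumption about how SPAdes formats them; the sample text is a placeholder, not real SPAdes output.

```python
import re


def find_coverage_lines(log_text, phrase="average coverage"):
    """Return every line of log_text containing phrase, ignoring case."""
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    return [line.strip() for line in log_text.splitlines() if pattern.search(line)]


# Placeholder text standing in for the contents of spades.log.
sample_log = "first line\nexample: Average coverage placeholder\nlast line\n"
for hit in find_coverage_lines(sample_log):
    print(hit)
```

With a real log you would read the file first, for example `find_coverage_lines(open("spades.log").read())`.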
2.2 years ago by
shenwei3564.7k
China
shenwei3564.7k wrote:

The contig headers also contain the length and coverage of each contig, from which you can compute the average coverage.

$ grep '^>'  contigs.fasta | awk -F _  'BEGIN {OFS="\t"} {print $0,$4,$6}' | more
>NODE_1_length_766747_cov_499.885       766747  499.885
>NODE_2_length_581296_cov_457.579       581296  457.579
>NODE_3_length_399441_cov_525.578       399441  525.578
ADD COMMENTlink written 2.2 years ago by shenwei3564.7k
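
To turn those per-contig values into one number, a length-weighted mean is a reasonable choice. The sketch below assumes headers of the form `>NODE_<n>_length_<len>_cov_<cov>`, as in the output above, and uses those three headers as input. Note that, as the replies to this answer point out, the `cov` field is k-mer coverage, so the result is not read coverage.

```python
import re

# Matches headers such as ">NODE_1_length_766747_cov_499.885".
HEADER_RE = re.compile(r"^>NODE_(\d+)_length_(\d+)_cov_([\d.]+)")


def weighted_kmer_coverage(lines):
    """Length-weighted mean of the cov field over all matching headers."""
    total_length = 0
    weighted_sum = 0.0
    for line in lines:
        match = HEADER_RE.match(line)
        if match is None:
            continue  # sequence line or unrecognised header
        length = int(match.group(2))
        coverage = float(match.group(3))
        total_length += length
        weighted_sum += coverage * length
    return weighted_sum / total_length if total_length else 0.0


headers = [
    ">NODE_1_length_766747_cov_499.885",
    ">NODE_2_length_581296_cov_457.579",
    ">NODE_3_length_399441_cov_525.578",
]
print(round(weighted_kmer_coverage(headers), 2))
```

For a whole assembly you would pass the open file instead of the list, for example `weighted_kmer_coverage(open("contigs.fasta"))`.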

I am not 100% sure, but I think this is the k-mer coverage and not the read coverage. See also "Confusion about the kmer coverage" and http://seqanswers.com/forums/showthread.php?t=6887

ADD REPLYlink modified 2.0 years ago • written 2.0 years ago by cedric.laczny50

K-mer-based assemblers like SPAdes generally annotate contigs with k-mer coverage. For read coverage, you need to map the reads against the assembly.

ADD REPLYlink written 2.0 years ago by Brian Bushnell16k
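
If you only need a rough estimate and cannot map the reads, the relation usually quoted for k-mer assemblers (for example in the Velvet manual) is Ck = C * (L - k + 1) / L, where Ck is k-mer coverage, C is read coverage, L is read length and k is the k-mer size. The sketch below inverts it. It assumes all reads have the same length and that you know which k the `cov` value refers to; the numbers in the example are hypothetical. Mapping, as recommended above, remains the more reliable approach.

```python
def kmer_to_read_coverage(kmer_cov, read_len, k):
    """Estimate read coverage from k-mer coverage.

    Inverts Ck = C * (L - k + 1) / L, assuming uniform read length L.
    """
    if not 0 < k <= read_len:
        raise ValueError("k must be between 1 and the read length")
    return kmer_cov * read_len / (read_len - k + 1)


# Hypothetical values: k-mer coverage 100, 150 bp reads, k = 127.
print(kmer_to_read_coverage(100.0, 150, 127))
```

The correction grows quickly as k approaches the read length, so small errors in the assumed read length can change the estimate considerably.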

Thanks for your explanation.

ADD REPLYlink written 2.0 years ago by shenwei3564.7k