Question

How do I access genome coverage using SPADES?

1

8.7 years ago

fhsantanna ▴ 610

I have assembled bacterial genomes using SPAdes. I am now going to submit them to GenBank, but I need to know the coverage of each assembly. Should I provide the raw read coverage or the filtered final coverage? If the latter, how do I get these values from the SPAdes log file?

SPADES genome coverage • 12k views

ADD COMMENT • link updated 19 months ago by Ram 43k • written 8.7 years ago by fhsantanna ▴ 610

0

Hello All,

This post was useful, thanks! I got the result below by running bbmap.sh on my data. Could you please tell me how to interpret the coverage here? The average coverage is 209.654; what does this mean? I would really appreciate your input.

Genome:                 1
Key Length:             13
Max Indel:              16000
Minimum Score Ratio:    0.56
Mapping Mode:           normal
Reads Used:             1821574 (553015125 bases)

Mapping:                1648.545 seconds.
Reads/sec:              1104.96
kBases/sec:             335.46


Read 1 data:            pct reads       num reads       pct bases          num bases

mapped:                  99.2931%         1808698        99.2358%          548788917
unambiguous:             80.4977%         1466326        83.8068%          463464066
ambiguous:               18.7954%          342372        15.4290%           85324851
low-Q discards:           0.0000%               0         0.0000%                  0

perfect best site:        4.4512%           81082         2.4810%           13720350
semiperfect site:        17.7655%          323611        14.4467%           79892367

Match Rate:                   NA               NA        10.2835%          499947774
Error Rate:              90.2312%         1639609        88.7859%         4316473287
Sub Rate:                28.4581%          517117         0.0276%            1340991
Del Rate:                68.6627%         1247684        88.7119%         4312877825
Ins Rate:                53.1733%          966223         0.0464%            2254471
N Rate:                  57.9214%         1052501         0.9307%           45245681

Reads:                                  1821574
Mapped reads:                           1529439
Mapped bases:                           408501909
Ref scaffolds:                          9236
Ref bases:                              1948456

Percent mapped:                         83.963
Percent proper pairs:                   0.000
Average coverage:                       209.654
Standard deviation:                     645.173
Percent scaffolds with any coverage:    76.09
Percent of reference bases covered:     77.89

Thanks!

ADD REPLY • link 5.5 years ago by DanielC ▴ 170
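
As a rough sanity check on the summary above: the reported average coverage is consistent with "Mapped bases" divided by "Ref bases", i.e. the mean number of mapped read bases stacked on each reference base. The helper below is only an illustrative sketch (`avg_cov` is a made-up name, not part of BBMap), using the two figures copied from that output.

```shell
# Hypothetical helper: mean depth = mapped bases / reference bases.
# Not a BBMap command; it only redoes the arithmetic from the summary.
avg_cov() {
    # $1 = mapped bases, $2 = reference bases
    awk -v mapped="$1" -v ref="$2" 'BEGIN {
        if (ref <= 0) exit 1          # avoid dividing by zero
        printf "%.3f\n", mapped / ref
    }'
}

# Figures taken from the output above (Mapped bases, Ref bases)
avg_cov 408501909 1948456    # prints 209.654
```

The standard deviation (645.173) is much larger than the mean, and only 77.89% of reference bases are covered, so the depth is far from uniform across the 9236 scaffolds; the single average should be read with that in mind.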

0

I tried to produce the same output file with:

bbmap.sh in=reads.fq ref=contigs.fa covstats=covstats.txt

However, my bbmap.sh does not recognise covstats as a parameter. Would you mind posting the BBMap version you are using and the exact command you ran?

Thanks, Chiara

ADD REPLY • link 5.2 years ago by mariachiaracascarano ▴ 10

1

7.1 years ago

faisal.akbar5 ▴ 10

You could search for "Average coverage" throughout the spades.log file.

ADD COMMENT • link updated 19 months ago by Ram 43k • written 7.1 years ago by faisal.akbar5 ▴ 10

1

7.1 years ago

shenwei356 8.5k

The contigs also have length and coverage information by which you can compute the average coverage.

$ grep '^>'  contigs.fasta | awk -F _  'BEGIN {OFS="\t"} {print $0,$4,$6}' | more
>NODE_1_length_766747_cov_499.885       766747  499.885
>NODE_2_length_581296_cov_457.579       581296  457.579
>NODE_3_length_399441_cov_525.578       399441  525.578
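
To turn those per-contig values into one number, a length-weighted mean is a reasonable choice, so that short contigs do not dominate. The following is a minimal sketch, assuming headers of the form `>NODE_<n>_length_<len>_cov_<cov>` as shown above; `weighted_header_cov` is a name invented for this example. Note that, as the replies point out, the `cov` value in the header is k-mer coverage, not read coverage.

```shell
# Length-weighted mean of the cov_ values in SPAdes contig headers.
# Assumes the 4th and 6th underscore-separated fields are length and cov.
weighted_header_cov() {
    grep '^>' "$1" | awk -F _ '
        { total_len += $4; weighted += $4 * $6 }
        END { if (total_len > 0) printf "%.3f\n", weighted / total_len }'
}

# Usage: weighted_header_cov contigs.fasta
```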

ADD COMMENT • link updated 19 months ago by Ram 43k • written 7.1 years ago by shenwei356 8.5k

2

I am not 100% sure, but I think this is the k-mer coverage and not the read coverage. See also "Confusion about the kmer coverage" and http://seqanswers.com/forums/showthread.php?t=6887

ADD REPLY • link 6.9 years ago by cedric.laczny ▴ 50

1

K-mer-based assemblers like SPAdes generally annotate contigs with k-mer coverage. For read coverage, you need to map the reads against the assembly.
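
If mapping is not an option, a rough estimate can be derived with the relation given in the Velvet manual, Ck = C * (L - k + 1) / L, where Ck is k-mer coverage, C is read coverage, L is read length and k is the k-mer size. The sketch below simply inverts that formula; `kmer_to_read_cov` is a hypothetical helper, and the read length and k you pass in are your own values (for SPAdes, presumably the largest k used in the run). It ignores sequencing errors and variable read lengths, so treat the result as an approximation only.

```shell
# Approximate read coverage from k-mer coverage:
#   C = Ck * L / (L - k + 1)
# Hypothetical helper; arguments are supplied by the user.
kmer_to_read_cov() {
    # $1 = k-mer coverage, $2 = mean read length, $3 = k-mer size
    awk -v ck="$1" -v len="$2" -v k="$3" 'BEGIN {
        if (len - k + 1 <= 0) exit 1   # k must not exceed the read length
        printf "%.3f\n", ck * len / (len - k + 1)
    }'
}

# Example: k-mer coverage 100, 150 bp reads, k = 51
kmer_to_read_cov 100 150 51    # prints 150.000
```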

ADD REPLY • link 6.9 years ago by Brian Bushnell 20k

0

Thanks for your explanation.

ADD REPLY • link 6.9 years ago by shenwei356 8.5k

Ram · Accepted Answer · 2015-08-19

7

8.7 years ago

Brian Bushnell 20k

The best way to calculate coverage is by mapping, not by looking at the assembler's logs. For example, with BBMap:

bbmap.sh in=reads.fq ref=contigs.fa covstats=covstats.txt

That will print a message like this:

Average coverage:                       278.50
Percent scaffolds with any coverage:    100.00
Percent of reference bases covered:     99.98

...in addition to creating covstats.txt, which lists the coverage statistics for each individual scaffold. The reads you use for mapping should be the ones you fed into SPAdes.
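
If you want to recompute a single figure from covstats.txt yourself (for example after dropping short scaffolds), a length-weighted mean of the per-scaffold values is one way to do it. This is a sketch under an assumption: that covstats.txt is tab-separated with a header row containing columns named `Avg_fold` and `Length`. The script looks the columns up by name rather than position and prints nothing if they are missing; `covstats_mean_cov` is a name made up for this example.

```shell
# Length-weighted mean coverage from a BBMap covstats file.
# Assumes a tab-separated header row with "Avg_fold" and "Length" columns.
covstats_mean_cov() {
    awk -F '\t' '
        NR == 1 {
            # Locate the two columns by name in the header line
            for (i = 1; i <= NF; i++) {
                if ($i == "Avg_fold") fold = i
                if ($i == "Length")   len  = i
            }
            next
        }
        fold && len { total += $len; weighted += $fold * $len }
        END { if (total > 0) printf "%.3f\n", weighted / total }
    ' "$1"
}

# Usage: covstats_mean_cov covstats.txt
```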

ADD COMMENT • link updated 19 months ago by Ram 43k • written 8.7 years ago by Brian Bushnell 20k

0

Thx! BBMAP was very helpful!

ADD REPLY • link 5.3 years ago by Prakki Rama ★ 2.7k