Question: How To Calculate Coverage
5.8 years ago by
HG1.1k
Germany
HG1.1k wrote:

Hi all, I have been given FASTQ files for 50 E. coli genomes, but I don't know much about the sequencing background. All I know is that they are Illumina reads with an average length of 250 bp. I now want to check the coverage of each genome. I calculated it like this:

coverage = (read count * read length) / total genome size

where read count = (wc -l xyz.fastq) / 4

read length = 250 bp

total genome size = 5.2 million bp

Can anyone please tell me whether I am doing this the right way, or whether my calculation is wrong?

fastq coverage • 31k views
modified 5.8 years ago by r.follador60 • written 5.8 years ago by HG1.1k
5.8 years ago by
Devon Ryan92k
Freiburg, Germany
Devon Ryan92k wrote:

(N.B., I took the liberty of editing your question a bit such that it now bears some semblance of grammatical correctness.)

That will give you only an idealized average coverage. In reality, not all of the reads will map (and you might trim adapter contamination and such off before mapping anyway). So, just align everything first (or assemble it or whatever you intend to do with the reads) and then get the actual coverage from that.
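To make the idealized estimate concrete, here is a minimal Python sketch of the formula from the question. It sums the actual read lengths instead of assuming every read is exactly 250 bp. The function names (`fastq_stats`, `idealized_coverage`, `open_fastq`) are made up for illustration, and the code assumes plain 4-line FASTQ records (no wrapped sequence lines), which is the same assumption that `wc -l` divided by 4 makes.

```python
import gzip
from typing import Iterable, Tuple


def fastq_stats(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (read_count, total_bases) for 4-line FASTQ records."""
    read_count = 0
    total_bases = 0
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the second line of each record is the sequence
            read_count += 1
            total_bases += len(line.strip())
    return read_count, total_bases


def idealized_coverage(total_bases: int, genome_size: int) -> float:
    """Upper-bound average coverage: sequenced bases / genome size."""
    return total_bases / genome_size


def open_fastq(path: str):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)
```

For one sample this could be called as `fastq_stats(open_fastq("xyz.fastq"))`, passing the resulting base count and a genome size of 5,200,000 to `idealized_coverage`. The result is still only the idealized figure described above, not the coverage you would measure after alignment.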

For determining coverage from a BAM file, see "tools to calculate average coverage for a bam file?" and the replies.

modified 5.8 years ago • written 5.8 years ago by Devon Ryan92k

Thank you for your suggestion. I did all the assemblies with SPAdes and all the mapping with SMALT, so I have draft genomes as well as .bam files. Could you please suggest the next step?

written 5.8 years ago by HG1.1k

See the "tools to calculate average coverage for a bam file?" thread for a whole bunch of options.

written 5.8 years ago by Devon Ryan92k
5.8 years ago by
r.follador60
Switzerland
r.follador60 wrote:

Your formula is good enough for a very rough estimate of (an upper bound on) the average coverage. However, several factors will decrease this value:

  • not all reads in the FASTQ file will be aligned (e.g. low-quality reads, or reads coming from plasmids or genes not present in the reference sequence)
  • the full length (250 bp) of a read will not always be aligned: depending on the alignment software, reads can be clipped where the base quality is too low

To get a good estimate of the coverage, you need to do the actual alignment and use software that counts the number of aligned bases (such as GATK, see http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_coverage_DepthOfCoverage.html).
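As one possible way to count aligned bases, here is a minimal Python sketch that summarizes per-position depth. It does not use GATK. It assumes you have already written a three-column, tab-separated depth table (reference name, position, depth), for example with `samtools depth -a sample.bam > sample.depth.txt`. The function name `depth_summary` and the file names are made up for illustration.

```python
from typing import Iterable, Tuple


def depth_summary(lines: Iterable[str]) -> Tuple[float, float]:
    """Return (mean_depth, fraction_of_positions_with_depth_above_zero).

    Each input line is expected to be: reference<TAB>position<TAB>depth
    """
    positions = 0
    depth_sum = 0
    covered = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        _ref, _pos, depth = line.split("\t")[:3]
        d = int(depth)
        positions += 1
        depth_sum += d
        if d > 0:
            covered += 1
    if positions == 0:
        raise ValueError("no depth records found")
    return depth_sum / positions, covered / positions
```

With a depth file on disk this could be called as `depth_summary(open("sample.depth.txt"))`. The `-a` option matters here: it makes samtools report positions with zero depth as well, so the mean is taken over every reference position and not only over the covered ones.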

written 5.8 years ago by r.follador60