How To Calculate Coverage
10.9 years ago
HG ★ 1.2k

Hi all, I have been given FASTQ files for 50 E. coli genomes, but I don't know much about the sequencing background. All I know is that they are Illumina reads with an average length of 250 bp. I want to check the coverage of each genome. This is what I did:

coverage = (read count * read length) / total genome size

where

read count = (line count from wc -l xyz.fastq) / 4

read length = 250 bp

total genome size = 5.2 million bp

Can anyone tell me whether this is the right way to do it, or is my calculation wrong?
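The calculation above can be sketched in Python as follows. This is only an illustration, assuming an uncompressed FASTQ with exactly four lines per record (no wrapped sequences); the function names and the tiny in-memory example are made up for this sketch. Summing the actual sequence lengths avoids relying on the nominal 250 bp figure.

```python
import io


def fastq_stats(handle):
    """Return (read_count, total_bases) for a 4-line-per-record FASTQ stream."""
    read_count = 0
    total_bases = 0
    for index, line in enumerate(handle):
        # Line 2 of every 4-line record (index 1, 5, 9, ...) is the sequence.
        if index % 4 == 1:
            read_count += 1
            total_bases += len(line.strip())
    return read_count, total_bases


def estimated_coverage(read_count, read_length, genome_size):
    """Idealized average coverage: (read count * read length) / genome size."""
    return read_count * read_length / genome_size


# Tiny in-memory example with two 4 bp reads (hypothetical data).
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n")
reads, bases = fastq_stats(example)
print(reads, bases)

# With the numbers from the question and a hypothetical one million reads:
print(estimated_coverage(1_000_000, 250, 5_200_000))
```

For a real file, the same function could be called on `open("xyz.fastq")`, and `bases / 5_200_000` gives the estimate directly from the observed read lengths.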

fastq coverage • 74k views

How can I determine the coverage using software?

10.9 years ago

(N.B., I took the liberty of editing your question a bit such that it now bears some semblance of grammatical correctness.)

That will give you only an idealized average coverage. In reality, not all of the reads will map (and you might trim adapter contamination and such off before mapping anyway). So, just align everything first (or assemble it or whatever you intend to do with the reads) and then get the actual coverage from that.

For determining coverage from a BAM file, see the thread "tools to calculate average coverage for a bam file?" and its replies.


Thank you for your suggestion. I did all the assembly with SPAdes and all the mapping with SMALT, so I have a draft genome as well as a .bam file. Could you please suggest the next step?


See the "tools to calculate average coverage for a bam file?" thread for a whole bunch of options.

10.9 years ago
r.follador ▴ 90

Your formula is good enough for a very rough estimate of (the upper bound of) the average coverage. However, several factors will decrease this value:

  • not all reads in the FASTQ file will be aligned (e.g. low-quality reads, or reads coming from plasmids or genes not present in the reference sequence)
  • reads will not always be aligned over their entire length (250 bp), because, depending on the alignment software, reads can be clipped where the base quality is too low
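As a rough illustration, the two effects listed above can be folded into the original formula. The sketch below is hypothetical: `mapped_fraction` and `mean_aligned_length` are names introduced here, and their values have to come from the actual alignment (they cannot be known from the FASTQ alone). With `mapped_fraction=1.0` and the full read length it reduces to the upper bound from the question.

```python
def adjusted_coverage(read_count, mapped_fraction, mean_aligned_length, genome_size):
    """Average coverage after accounting for unmapped reads and clipping.

    mapped_fraction: share of reads that aligned, between 0 and 1 (assumed input).
    mean_aligned_length: mean number of aligned bases per mapped read (assumed input).
    """
    if not 0.0 <= mapped_fraction <= 1.0:
        raise ValueError("mapped_fraction must be between 0 and 1")
    aligned_bases = read_count * mapped_fraction * mean_aligned_length
    return aligned_bases / genome_size


# Hypothetical numbers: one million reads, 90% mapped, 200 bp aligned on average.
upper_bound = adjusted_coverage(1_000_000, 1.0, 250, 5_200_000)
realistic = adjusted_coverage(1_000_000, 0.9, 200, 5_200_000)
print(upper_bound, realistic)
```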

To get a good estimate of the coverage, you need to do the actual alignment and use a tool that counts the number of aligned bases (such as GATK's DepthOfCoverage, see http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_coverage_DepthOfCoverage.html)
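One way to count aligned bases without GATK, offered here as an alternative sketch and not as the method recommended above, is to write per-position depth with `samtools depth -a aln.bam > depth.txt` and average it. The file names are placeholders. The code assumes the three tab-separated columns that `samtools depth` prints (reference name, position, depth), and that `-a` was used so zero-depth positions are included.

```python
def mean_depth_and_breadth(lines, min_depth=1):
    """Return (mean depth, fraction of positions with depth >= min_depth).

    `lines` is an iterable of 'reference<TAB>position<TAB>depth' strings.
    """
    total_depth = 0
    positions = 0
    covered = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        _reference, _position, depth = line.split("\t")[:3]
        depth = int(depth)
        total_depth += depth
        positions += 1
        if depth >= min_depth:
            covered += 1
    if positions == 0:
        return 0.0, 0.0
    return total_depth / positions, covered / positions


# Small hypothetical example: four positions, one of them uncovered.
example = ["contig1\t1\t10", "contig1\t2\t0", "contig1\t3\t20", "contig1\t4\t10"]
print(mean_depth_and_breadth(example))
```

For a real depth file, pass the open file handle, e.g. `with open("depth.txt") as fh: mean_depth_and_breadth(fh)`. Reporting the covered fraction alongside the mean helps spot regions the reads never reached.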
