Question: How To Calculate The Coverage Of A Specific Interval Based On Data From A Bam File
gravatar for
6.8 years ago by
rehma.ar220 wrote:

Dear All!

i want to calculate the coverage of a specific fragment from a bam file. i am using this script right now

samtools view -u infile.bam chromosome:76437781-76439092 | samtools pileup - -cf ref.fasta | awk '{if ($2 >= 76437781 || $2 <= 76439092) print }' | awk '{ count++ ; SUM += $8 } END { if (count==0)print SUM  "\t" count "\t" "0";else print "\t" SUM "\t" count "\t" SUM/count }' > out.file

out put is something like that:

coverage 16418 1501 10.938

"76437781-76439092" i am giving these coordinates manually in the shell. i know i can have variables in shell but i want to use a bed file to give the coordinates and put it in the loop to get the coverage of all my fragments in one file. i think i have to do this in perl,this looks quite difficult to me to use three files 1)bam file 2)refrence fasta 3)bed file with coordinates can i do this in perl. or any other easier way.

thanks in advance

perl samtools awk • 5.4k views
ADD COMMENTlink modified 6.8 years ago by Sukhdeep Singh9.8k • written 6.8 years ago by rehma.ar220
gravatar for Sukhdeep Singh
6.8 years ago by
Sukhdeep Singh9.8k
Sukhdeep Singh9.8k wrote:

I calculate coverage using genomeCoverageBed.

So, if your file has read positions of which coverage has to be calculated in a file like

cat reads

and you want to calculate coverage from a bam (mapped reads) file, you can use this pipe, I just made and tested.

while read line; 
do samtools view -b a.bam $line | \
bamToBed -i stdin | \
genomeCoverageBed -bg -i stdin -g ~/src/useFul/ucsc/genomeIndex/mm9.chrom ;done < reads > coverage

We are reading the

  • chromosome co-ordinates line by line,
  • pulling the respective data out of bam file,
  • converting them to a bed file and
  • calculating the coverage of that.

You need to have samtools, bedtools installed for it. You also need the organism's genome co-ordinates and the bam file needs to be indexed. Index using samtools index file.bam


ADD COMMENTlink modified 6.8 years ago by Obi Griffith17k • written 6.8 years ago by Sukhdeep Singh9.8k

sorry if i did not conveyed myself properly. i got your point but you are getting the coverage while i want to calculate the average coverage of the whole fragment between the coordinates as you can see in the question i am using awk '{ count++ ; SUM += $8 to do this. that i think you can not do with genomeCoverageBed.

ADD REPLYlink written 6.8 years ago by rehma.ar220

Check this post, might help you, in that case your script is fine, just reformat a bit using loop or try in R.

ADD REPLYlink written 6.8 years ago by Sukhdeep Singh9.8k
gravatar for Johan
6.8 years ago by
Johan850 wrote:

The DepthOfCoverage Walker in the gatk supports this. Here's a usage example:

java -jar GenomeAnalysisTK.jar -T DepthOfCoverage -R [reference fasta] -I [bam file] -L [bed file]
ADD COMMENTlink written 6.8 years ago by Johan850

Hi Johan, I am having trouble using this over a provided interval file. Although it runs fine if I omit the -L file. The error that I get is: A USER ERROR has occurred (version 1.3-20-g447e9bf):

<h5>ERROR The invalid arguments or inputs must be corrected before the GATK can proceed</h5> <h5>ERROR Please do not post this error to the GATK forum</h5> <h5>ERROR</h5> <h5>ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.</h5> <h5>ERROR Visit our wiki for extensive documentation</h5> <h5>ERROR Visit our forum to view answers to commonly asked questions</h5> <h5>ERROR</h5> <h5>ERROR MESSAGE: File associated with name /Users/Desktop//myinterval.bed is malformed: Interval file could not be parsed in any supported format. caused by BED files must be parsed through Tribble; parsing them as intervals through the GATK engine is no longer supported</h5>

The format of the file is: chr1:XXXX-XXXXXX also tried 1:XXXXX-XXXXXX

Do you know what the problem might be? Thanks!

ADD REPLYlink written 6.4 years ago by Mamta440

You seem to be using a very old version of the GATK - 1.3-20-g447e9bf. The current version is 2.3-9, try downloading and running with the latest version and see if it works. You can download it from here:

ADD REPLYlink written 6.4 years ago by Johan850
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1286 users visited in the last hour