How To Calculate The Coverage Of A Specific Interval Based On Data From A Bam File
2
1
Entering edit mode
11.6 years ago
rehma.ar ▴ 290

Dear All!

i want to calculate the coverage of a specific fragment from a bam file. i am using this script right now

samtools view -u infile.bam chromosome:76437781-76439092 | samtools pileup - -cf ref.fasta | awk '{if ($2 >= 76437781 || $2 <= 76439092) print }' | awk '{ count++ ; SUM += $8 } END { if (count==0)print SUM  "\t" count "\t" "0";else print "\t" SUM "\t" count "\t" SUM/count }' > out.file

out put is something like that:

coverage 16418 1501 10.938

"76437781-76439092" i am giving these coordinates manually in the shell. i know i can have variables in shell but i want to use a bed file to give the coordinates and put it in the loop to get the coverage of all my fragments in one file. i think i have to do this in perl,this looks quite difficult to me to use three files 1)bam file 2)refrence fasta 3)bed file with coordinates can i do this in perl. or any other easier way.

thanks in advance

samtools awk perl • 8.4k views
ADD COMMENT
7
Entering edit mode
11.6 years ago

I calculate coverage using genomeCoverageBed.

So, if your file has read positions of which coverage has to be calculated in a file like

cat reads
chr1:3000239-3001041
chr1:3001239-3003041

and you want to calculate coverage from a bam (mapped reads) file, you can use this pipe, I just made and tested.

while read line; 
do samtools view -b a.bam $line | \
bamToBed -i stdin | \
genomeCoverageBed -bg -i stdin -g ~/src/useFul/ucsc/genomeIndex/mm9.chrom ;done < reads > coverage

We are reading the

  • chromosome co-ordinates line by line,
  • pulling the respective data out of bam file,
  • converting them to a bed file and
  • calculating the coverage of that.

You need to have samtools, bedtools installed for it. You also need the organism's genome co-ordinates and the bam file needs to be indexed. Index using samtools index file.bam

Cheers

ADD COMMENT
0
Entering edit mode

sorry if i did not conveyed myself properly. i got your point but you are getting the coverage while i want to calculate the average coverage of the whole fragment between the coordinates as you can see in the question i am using awk '{ count++ ; SUM += $8 to do this. that i think you can not do with genomeCoverageBed.

ADD REPLY
0
Entering edit mode

Check this post, might help you, in that case your script is fine, just reformat a bit using loop or try in R. http://www.biostars.org/post/show/5165/tools-to-calculate-average-coverage-for-a-bam-file/

ADD REPLY
2
Entering edit mode
11.6 years ago
Johan ▴ 890

The DepthOfCoverage Walker in the gatk supports this. Here's a usage example:

java -jar GenomeAnalysisTK.jar -T DepthOfCoverage -R [reference fasta] -I [bam file] -L [bed file]
ADD COMMENT
0
Entering edit mode

Hi Johan, I am having trouble using this over a provided interval file. Although it runs fine if I omit the -L file. The error that I get is: A USER ERROR has occurred (version 1.3-20-g447e9bf):

<h5>ERROR The invalid arguments or inputs must be corrected before the GATK can proceed</h5> <h5>ERROR Please do not post this error to the GATK forum</h5> <h5>ERROR</h5> <h5>ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.</h5> <h5>ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki</h5> <h5>ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa</h5> <h5>ERROR</h5> <h5>ERROR MESSAGE: File associated with name /Users/Desktop//myinterval.bed is malformed: Interval file could not be parsed in any supported format. caused by BED files must be parsed through Tribble; parsing them as intervals through the GATK engine is no longer supported</h5>

The format of the file is: chr1:XXXX-XXXXXX also tried 1:XXXXX-XXXXXX

Do you know what the problem might be? Thanks!

ADD REPLY
0
Entering edit mode

You seem to be using a very old version of the GATK - 1.3-20-g447e9bf. The current version is 2.3-9, try downloading and running with the latest version and see if it works. You can download it from here: http://www.broadinstitute.org/gatk/download

ADD REPLY

Login before adding your answer.

Traffic: 2790 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6