Question: How To Calculate The Coverage Of A Specific Interval Based On Data From A Bam File
1
gravatar for rehma.ar
7.2 years ago by
rehma.ar230
rehma.ar230 wrote:

Dear All!

i want to calculate the coverage of a specific fragment from a bam file. i am using this script right now

samtools view -u infile.bam chromosome:76437781-76439092 | samtools pileup - -cf ref.fasta | awk '{if ($2 >= 76437781 || $2 <= 76439092) print }' | awk '{ count++ ; SUM += $8 } END { if (count==0)print SUM  "\t" count "\t" "0";else print "\t" SUM "\t" count "\t" SUM/count }' > out.file

out put is something like that:

coverage 16418 1501 10.938

"76437781-76439092" i am giving these coordinates manually in the shell. i know i can have variables in shell but i want to use a bed file to give the coordinates and put it in the loop to get the coverage of all my fragments in one file. i think i have to do this in perl,this looks quite difficult to me to use three files 1)bam file 2)refrence fasta 3)bed file with coordinates can i do this in perl. or any other easier way.

thanks in advance

perl samtools awk • 5.6k views
ADD COMMENTlink modified 7.2 years ago by Sukhdeep Singh9.9k • written 7.2 years ago by rehma.ar230
7
gravatar for Sukhdeep Singh
7.2 years ago by
Sukhdeep Singh9.9k
Netherlands
Sukhdeep Singh9.9k wrote:

I calculate coverage using genomeCoverageBed.

So, if your file has read positions of which coverage has to be calculated in a file like

cat reads
chr1:3000239-3001041
chr1:3001239-3003041

and you want to calculate coverage from a bam (mapped reads) file, you can use this pipe, I just made and tested.

while read line; 
do samtools view -b a.bam $line | \
bamToBed -i stdin | \
genomeCoverageBed -bg -i stdin -g ~/src/useFul/ucsc/genomeIndex/mm9.chrom ;done < reads > coverage

We are reading the

  • chromosome co-ordinates line by line,
  • pulling the respective data out of bam file,
  • converting them to a bed file and
  • calculating the coverage of that.

You need to have samtools, bedtools installed for it. You also need the organism's genome co-ordinates and the bam file needs to be indexed. Index using samtools index file.bam

Cheers

ADD COMMENTlink modified 7.2 years ago by Obi Griffith18k • written 7.2 years ago by Sukhdeep Singh9.9k

sorry if i did not conveyed myself properly. i got your point but you are getting the coverage while i want to calculate the average coverage of the whole fragment between the coordinates as you can see in the question i am using awk '{ count++ ; SUM += $8 to do this. that i think you can not do with genomeCoverageBed.

ADD REPLYlink written 7.2 years ago by rehma.ar230

Check this post, might help you, in that case your script is fine, just reformat a bit using loop or try in R. http://www.biostars.org/post/show/5165/tools-to-calculate-average-coverage-for-a-bam-file/

ADD REPLYlink written 7.2 years ago by Sukhdeep Singh9.9k
2
gravatar for Johan
7.2 years ago by
Johan860
Sweden
Johan860 wrote:

The DepthOfCoverage Walker in the gatk supports this. Here's a usage example:

java -jar GenomeAnalysisTK.jar -T DepthOfCoverage -R [reference fasta] -I [bam file] -L [bed file]
ADD COMMENTlink written 7.2 years ago by Johan860

Hi Johan, I am having trouble using this over a provided interval file. Although it runs fine if I omit the -L file. The error that I get is: A USER ERROR has occurred (version 1.3-20-g447e9bf):

<h5>ERROR The invalid arguments or inputs must be corrected before the GATK can proceed</h5> <h5>ERROR Please do not post this error to the GATK forum</h5> <h5>ERROR</h5> <h5>ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.</h5> <h5>ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki</h5> <h5>ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa</h5> <h5>ERROR</h5> <h5>ERROR MESSAGE: File associated with name /Users/Desktop//myinterval.bed is malformed: Interval file could not be parsed in any supported format. caused by BED files must be parsed through Tribble; parsing them as intervals through the GATK engine is no longer supported</h5>

The format of the file is: chr1:XXXX-XXXXXX also tried 1:XXXXX-XXXXXX

Do you know what the problem might be? Thanks!

ADD REPLYlink written 6.8 years ago by Mamta450

You seem to be using a very old version of the GATK - 1.3-20-g447e9bf. The current version is 2.3-9, try downloading and running with the latest version and see if it works. You can download it from here: http://www.broadinstitute.org/gatk/download

ADD REPLYlink written 6.8 years ago by Johan860
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1115 users visited in the last hour