Finding High Coverage Regions In A Bam File?
3
4
Entering edit mode
12.8 years ago

I would like to have a list of regions that are in the top X% coverage in a bam file, centered at the peak of coverage for a window of size W. For example, something like:

~/find_top_X_perc -perc 10 -window 250 file.bam

That will produce a list of regions ready to be used on samtools tview:

1:1000000-1000250
2:2000250-2000300
[...]

Any suggestions?

bam • 6.7k views
ADD COMMENT
14
Entering edit mode
12.8 years ago

i would go with R & Bioconductor:

library("Rsamtools")
library("rtracklayer")
rbga<-readBamGappedAlignments("myBamfile.bam")
bamCov<-coverage(rbga)
#quantile returns a list of the ith % for each chromosome, you could weight the mean properly if you are ambitious
cutoff<-mean(quantile(bamCov,.9))
hotslices<-slice(bamCov,lower=cutoff)
hotwindows<-resize(hotslices,width=250,fix="center")
dummy<-file("mybed.bed")
export.bed(hotwindows,dummy)
ADD COMMENT
2
Entering edit mode

This is neat - declarative rather than imperative programming - I need to learn the NGS packages for R

ADD REPLY
0
Entering edit mode

you could actually combine lines 3-9 into one line but it would be unreadable

ADD REPLY
3
Entering edit mode
12.8 years ago
Ketil 4.1k

As it happens, I recently wrote a small program to output coverage in user-defined buckets from a BAM file. Currently you need to get it via darcs

darcs get http://malde.org/~ketil/biohaskell/bamcov

And compile it yourself (e.g. per instructions here: http://biohaskell.org/Installation)

The program outputs five columns per bucket: chromosome, position, and paired, orphan, and link coverage (the latter being paired reads mapping to different chromosomes).

ADD COMMENT
0
Entering edit mode

I tried to install bamcov, but failed. As a reference, here are the installation steps: [?] sudo apt-get install darcs sudo apt-get install cabal-install darcs get http://malde.org/~ketil/biohaskell/bamcov cabal update cabal install [...] cabal: Error: some packages failed to install: bamcov-0 depends on seqloc-0.2.1 which failed to install. samtools-0.1.2 depends on seqloc-0.2.1 which failed to install. seqloc-0.2.1 failed during the configure step. The exception was: ExitFailure 1 [?]

ADD REPLY
0
Entering edit mode

Basically, that shoulda worked. You'd need to check out why seqloc failed, it should be in the output somewhere. (The seqloc library is used by the samtools bindings, so this is an unavoidable requirement, I'm afraid.)

ADD REPLY
1
Entering edit mode
12.8 years ago
Zhidkov ▴ 600

You can start with turning your bam file to pileup, perform summary statistics on coverage column (to establish what is high coverage in current bam), then extract coordinates position with high coverage, then again run the samtools this time extract reads which correspond to high coverage regions.

Ilia

ADD COMMENT

Login before adding your answer.

Traffic: 1951 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6