How To Compute The Number Of Reads In Bam That Overlap With A Region? The Overlap Must Be 100%.
3
2
Entering edit mode
9.8 years ago
siyu ▴ 150

how to compute the number of reads in BAM that overlap with a region? the overlap must be 100%.

reads overlap bam • 4.2k views
ADD COMMENT
0
Entering edit mode

overlap must be 100 % ! overlap must be 100 % ! http://commons.wikimedia.org/wiki/File:Female_animal_trainer_and_leopard,_c1906.jpg overlap must be 100 % ! hum... sorry... :-) have a look at GATK depth of coverage.

ADD REPLY
4
Entering edit mode
9.8 years ago

It is unclear from your question whether you want a read to be contained entirely within a region, or a region contained entirely within a read, when counting reads.

But that's okay! You can use the --count operator in BEDOPS bedmap with BEDOPS bam2bed to efficiently calculate either result, specifying different overlap criteria in either case.

First, use BEDOPS sort-bed to make sure your input regions are ready for use with bedmap:

$ sort-bed unsortedRegions.bed > sortedRegions.bed

Preparing input with sort-bed allows BEDOPS tools to work fast and have a very low memory profile, compared with alternatives.

We're now ready to search for reads that overlap regions.

Let's examine the first overlap scenario: Use the --fraction-map operator with bedmap to enforce the overlap criterion that a BAM read (a "mapped" element) must overlap the region by 100%, i.e. that a read is contained entirely within a region-of-interest:

$ bam2bed < reads.bam \
    | bedmap --echo --count --fraction-map 1.0 sortedRegions.bed - \
    > answer.bed

The format of answer.bed will be:

[ region-1 ] | [ number of BAM reads entirely within region-1 ]
[ region-2 ] | [ number of BAM reads entirely within region-2 ]
...
[ region-N ] | [ number of BAM reads entirely within region-N ]

Now let's discuss the second overlap case. If, instead, you want the region-of-interest (a "reference" element) to be contained entirely within the read, then use the --fraction-ref operator:

$ bam2bed < reads.bam \
    | bedmap --echo --count --fraction-ref 1.0 sortedRegions.bed - \
    > answer.bed

The format of answer.bed will be:

[ region-1 ] | [ # BAM reads that region-1 is entirely contained within ]
[ region-2 ] | [ # BAM reads that region-2 is entirely contained within ]
...
[ region-N ] | [ # BAM reads that region-N is entirely contained within ]

You can give any fraction from 0 to 1, inclusive, to --fraction-map and --fraction-ref, depending on the stringency of overlap that you need.

Other overlap criteria are available, which can be useful, depending on your analysis.

ADD COMMENT
3
Entering edit mode
9.8 years ago

You can use bedtools :

something like this (maybe someone can correct me, I'm not sure of the params):

intersectBed -abam in.bam -b region.gtf -c -f 1

ADD COMMENT
0
Entering edit mode

I think this can only find B overlap A with 100%,its A 's 100%, not B'100%

ADD REPLY
0
Entering edit mode

Do you want to find reads that entirely contain a small region, or reads that are entirely within a region larger than the read?

ADD REPLY
0
Entering edit mode
9.8 years ago

You can also try htseq

ADD COMMENT

Login before adding your answer.

Traffic: 2416 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6