Find count of reads in BAM file that fall within BED region
1
2
Entering edit mode
2.9 years ago
KH ▴ 80

I have a set of BAM files from a bwa alignment, as well as a BED file of "target regions" I am interested in.

What I want to know is the count of the reads in the BAM file which overlap with any of the region in the BED file. Essentially, each read in the BAM will have an associated TRUE or FALSE denoting if it overlaps with any of the BED regions.

I can't seem to find the exact tool to use within bedtools, since bedtools multicov returns the total reads which overlap within each BED region, which ends up overcounting since an aligned BAM read can overlap with multiple BED regions.

bedtools intersect -a BAM -b BED -u technically gets me there, but in this case it generates a new bam file with only the reads in the BED regions, which I then have to count with samtools view -c. Since the BAM files are large, this is time-consuming and I was wondering if there is a faster way to do this.

bedtools BAM overlap BED • 5.3k views
ADD COMMENT
0
Entering edit mode

count of the reads in the BAM file which overlap with any of the region in the BED file

See the answers in this thread: Extract Reads From A Bam File That Fall Within A Given Region (samtools view -L BED_file accepts a BED file of regions also).

each read in the BAM will have an associated TRUE or FALSE denoting if it overlaps with any of the BED regions.

Not sure if that would need a custom solution.

ADD REPLY
0
Entering edit mode

Essentially, each read in the BAM will have an associated TRUE or FALSE denoting if it overlaps with any of the BED regions.

hum... what are you trying to do at the end ?

ADD REPLY
0
Entering edit mode

I probably should have clarified the purpose of this, which might make it easier to understand.

We have taken genomic DNA and isolated certain regions out (essentially exons). We then took this DNA enriched for exons and sequenced it.

My question is: how many of our sequencing reads fall within these exon regions? In other words, how efficient is the enrichment for these exon regions?

So what I want is a single number for each BAM file which states how many of the mapped reads overlapped with an exon (which I have as a BED file).

Currently the way I am doing this is:
bedtools intersect -a BAM -b BED -u | samtools view -c
Which works but is very slow.

For reference:
From bedtools intersect documentation, the -u flag will:
"Write original A entry once if any overlaps found in B. In other words, just report the fact at least one overlap was found in B."

ADD REPLY
1
Entering edit mode
2.9 years ago

how many of our sequencing reads fall within these exon regions? In other words, how efficient is the enrichment for these exon regions?

samtools view -c -q 30 -F 3844 in.bam > total.txt
samtools view -c -q 30 -F 3844 -L capture.bed in.bam > in_target.txt
ADD COMMENT

Login before adding your answer.

Traffic: 2721 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6