How Many Reads In A Bam File Are Aligned To An Arbitrary Region
1
12
Entering edit mode
11.0 years ago

I have BAM files from an Illumina GA experiment. I'd like to know how many of my aligned reads fall in a particular region of a given chromosome. It's a bonus if I can filter this by alignment quality. I've checked Picard and SAMtools and don't see an obvious canned command. It looks like I can use PySam to code this myself (e.g. by adapting the example in the

next-gen sequencing sequence • 8.0k views
16
Entering edit mode
11.0 years ago

Assuming you've sorted and indexed your bam file with "samtools sort" and samtools index", you can pull out reads from a given region as follows (for example from chr1:1000000-2000000)

samtools view <BAM> chr1:1000000-2000000 | wc -l


Now, if you also want to limit on mapping quality, use the -q option (here I am asking for MAPQ >= 10).

samtools view -q 10 <BAM> chr1:1000000-2000000 | wc -l


If you need to look for reads overlapping many regions, use BEDTools.

intersectBed -abam <BAM> -b regions.bed > reads.in.regions.bam