How Many Reads In A Bam File Are Aligned To a Specific Region
5
0
Entering edit mode
6.2 years ago
h.rokny • 0

I have BAM files from Hi-C experiment. I'd like to know how many of my aligned reads fall in a particular region of a given chromosome (i have a bed file that contains list of regions).

bed file:

chr1 1000 2000  
chr1 3000 4000   
chrY 1000000 2000000

Thanks

alignment next-gen • 7.7k views
ADD COMMENT
3
Entering edit mode
6.2 years ago
igor 13k

You can use bedtools coverage -a regions.bed -b reads.bam

ADD COMMENT
0
Entering edit mode

I was trying to solve the same problem with Bedtools, version 2.26.0.

I am using Bedtools coverage option to calculate coverage of my aligned reads in .bam format over 2kb windows of chromosome. So, I created a bed file for the chromosome separated in 2kb windows. The bed file has only 3 columns:chr1 0 2000; chr1 2000 4000... and so on. I am running the bedtools coverage option with -a genome.bed -b reads.bam. When I try to upload it to UCSC it tells me ''Error line 1 of custom track: Expecting + or - in strand'' How can I solve this?

The output of bedtools is like this:

chr1 0 2000 0 0 2000 0.0000000
chr1 2000 4000 0 0 2000 0.0000000
ADD REPLY
2
Entering edit mode
6.2 years ago

samtools view -L can take a .bed file as input, and output reads that overlap from the .bam. samtools view -L -c will just return the number of reads.

ADD COMMENT
2
Entering edit mode
6.2 years ago

An R code version using GenomicRanges. Not super fast but will do the job and let you keep working with the information in R.

library(GenomicAlignments)
library(rtracklayer)

## import bam file
bamfile <- readGAlignments('sample.bam') ## assuming single end otherwise use readGAlignmentPairs

## import bed file with regions of interest
bedfile <- import.bed('regions.bed')

## count number of overlaps for each element of the bedfile object
overlap.counts <- countOverlaps(bedfile,bamfile)
ADD COMMENT
1
Entering edit mode
6.2 years ago
EagleEye 7.4k

featureCounts is very simple to use and it is fast,

http://bioinf.wehi.edu.au/featureCounts/

Make your bed file look like SAF file as mentioned in above link.

ADD COMMENT
0
Entering edit mode

If you can make a Gff with your regions of interest (pretty simple and fast), I think like EagleEye => Faster solution is using FeatureCounts.

ADD REPLY
0
Entering edit mode
6.2 years ago
vakul.mohanty ▴ 260

pysam's (python module)fetch can be used to extract and count reads(or do just about anything) for a specific region in a BAM file. I find this a very flexible way to parsing BAM files.

ADD COMMENT

Login before adding your answer.

Traffic: 1531 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6