Question: Filter Bam File Based On Coverage
6.7 years ago by
United States
Abhi1.5k wrote:

Hi Guys

Is there a tool that will filter out read pairs if either read in the pair maps to a location with coverage < N?

In theory, I can run samtools pileup, find the positions at which coverage is < N, then read the BAM file again and exclude the reads that map to those low-coverage positions on each chromosome.

This can be memory intensive in cases where coverage is low, and if I find a read in a low-coverage region, I am not sure how to make sure its mate is excluded as well.

Thanks! -Abhi

bam samtools sam • 4.4k views
written 6.7 years ago by Abhi1.5k

You could use the samtools depth tool and then create a BED file from the output. This BED file could be used to filter your BAM. There must be an easier way?
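A minimal sketch of the depth-to-BED step, assuming the three-column text output of `samtools depth` (chromosome, 1-based position, depth). The function name `depth_to_bed` and the threshold parameter `min_depth` are made up for illustration and are not part of samtools.

```python
def depth_to_bed(depth_lines, min_depth):
    """Merge consecutive positions with depth >= min_depth into intervals.

    depth_lines: iterable of 'chrom<TAB>pos<TAB>depth' lines (pos is 1-based).
    If more depth columns are present (several BAMs), only the first is used.
    Returns (chrom, start, end) tuples in BED convention:
    0-based start, end exclusive.
    """
    intervals = []
    cur_chrom, cur_start, cur_end = None, None, None
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        chrom, pos, depth = line.split("\t")[:3]
        pos, depth = int(pos), int(depth)
        if depth < min_depth:
            continue  # position is below the threshold, so it breaks the run
        start = pos - 1  # convert 1-based position to 0-based BED start
        if chrom == cur_chrom and start == cur_end:
            cur_end = pos  # extends the current run by one base
        else:
            if cur_chrom is not None:
                intervals.append((cur_chrom, cur_start, cur_end))
            cur_chrom, cur_start, cur_end = chrom, start, pos
    if cur_chrom is not None:
        intervals.append((cur_chrom, cur_start, cur_end))
    return intervals


if __name__ == "__main__":
    example = ["chr1\t1\t5", "chr1\t2\t6", "chr1\t3\t1", "chr1\t4\t7"]
    for chrom, start, end in depth_to_bed(example, min_depth=5):
        print(f"{chrom}\t{start}\t{end}")
```

The input is read line by line, so only the current run is held in memory. The printed lines can be saved as a BED file for the filtering step.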

written 6.7 years ago by Zev.Kronenberg11k

What is the workflow that requires this manipulation? Just curious.

written 6.7 years ago by Sean Davis25k

I just want to get rid of any mapping artifacts and noise. We expect regions of interest to have high coverage. This is actually the same transcript start/end data points, and I would like to remove reads in regions with < N coverage. Maybe I should also think about physical coverage.

written 6.7 years ago by Abhi1.5k

Abhi, how do you read a BAM file and exclude the reads that are mapped to positions with low coverage?

written 15 months ago by cobalym70

First, you have to detect those low-coverage regions. Then, you have to filter your BAM file with those regions. I would suggest you open a new question rather than commenting on a previous related one.
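The two steps can be sketched so that a mate is dropped together with its partner. This is only an illustration on plain tuples, not on a real BAM: it assumes each read is given as (name, chrom, start, end) with 0-based, end-exclusive coordinates, and relies on the fact that both reads of a pair share the same read name (QNAME) in SAM/BAM. The function names and the `low_cov` dictionary layout are hypothetical.

```python
def overlaps(start, end, intervals):
    """True if the half-open range [start, end) overlaps any interval."""
    return any(start < i_end and i_start < end for i_start, i_end in intervals)


def drop_low_coverage_pairs(reads, low_cov):
    """Remove every read whose pair touches a low-coverage region.

    reads:   iterable of (name, chrom, start, end) tuples.
    low_cov: dict mapping chrom -> list of (start, end) low-coverage intervals.
    """
    reads = list(reads)
    # Pass 1: collect the names of reads that hit a low-coverage interval.
    flagged = {
        name
        for name, chrom, start, end in reads
        if overlaps(start, end, low_cov.get(chrom, []))
    }
    # Pass 2: drop every read carrying a flagged name, which removes the mate too.
    return [read for read in reads if read[0] not in flagged]


if __name__ == "__main__":
    reads = [
        ("pair1", "chr1", 0, 50),
        ("pair1", "chr1", 200, 250),   # overlaps the low-coverage interval
        ("pair2", "chr1", 300, 350),
        ("pair2", "chr1", 500, 550),
    ]
    low_cov = {"chr1": [(210, 220)]}
    for read in drop_low_coverage_pairs(reads, low_cov):
        print(read)
```

Only the set of flagged names has to be kept between the two passes, not the reads themselves, so with a real BAM the file could be streamed twice instead of being held in a list as it is here.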

written 15 months ago by Jorge Amigo11k
6.7 years ago by
Jorge Amigo11k
Santiago de Compostela, Spain
Jorge Amigo11k wrote:

I think that the idea behind this question is to generate a reduced BAM file based on coverage, leaving only the reads and regions that would actually be useful for whatever downstream analysis Abhi may want to perform. I can only foresee 2 major sets of applications for this idea, which in my opinion shouldn't be addressed this way: a) you want to simplify a later coverage calculation, or b) you want to work only on regions with coverage above a certain threshold. In either case, the usual procedure is to work directly with the entire BAM, setting filters/thresholds in the analysis itself. The tools that were designed to deal with BAM files are indeed optimized to apply such filters/thresholds when needed.

I could only understand doing such extra work if you intend to run intensive processing on that BAM file later, in which case a significantly smaller file would save disk space and therefore reduce run times. If you still want to go for such a filtering process, the easiest thing I can think of would be a first pass that generates a BED file with the regions of coverage above the desired threshold (bedtools' coverageBed should do the work, and it'll also be very fast), and then a second pass that filters the BAM file with those regions (samtools should do the work, and again that should be very fast).
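As a rough sketch of how the two passes might be wired together, the helper below only builds the command lines and does not run them. It swaps in `bedtools genomecov -ibam ... -bg` for the first pass (an assumption, not the coverageBed call mentioned above) and uses `samtools view -b -L` for the second. The file names and the function name `two_pass_commands` are placeholders.

```python
def two_pass_commands(bam_in, bed_regions, bam_out):
    """Return the argument lists for a two-pass coverage filter.

    Pass 1 prints per-interval coverage in bedGraph format on stdout; the
    intervals at or above the chosen threshold still have to be selected
    from that output and written to bed_regions by a separate step.
    Pass 2 keeps only alignments overlapping the intervals in bed_regions.
    """
    coverage_cmd = ["bedtools", "genomecov", "-ibam", bam_in, "-bg"]
    filter_cmd = ["samtools", "view", "-b", "-L", bed_regions, "-o", bam_out, bam_in]
    return coverage_cmd, filter_cmd


if __name__ == "__main__":
    # Placeholder file names; pass each list to subprocess.run to execute it.
    for cmd in two_pass_commands("input.bam", "high_cov.bed", "filtered.bam"):
        print(" ".join(cmd))
```

Note that `-L` selects alignments one by one by overlap with the BED regions, so a mate lying outside the regions is not kept or dropped together with its partner.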

written 6.7 years ago by Jorge Amigo11k