Question: Trying To Identify Low Coverage Regions (Bases) In My Sample From A Bed And Bam File.
pfeifferr21500 wrote:

I have a BED file of my targets (start and stop coordinates of each exon of my genes of interest). I have a BAM file generated from my Ion PGM run. I am trying to identify any location in my BED file where I have less than 20x coverage so I can fill in these regions with traditional Sanger sequencing.

Basically, if any bases within the exon are covered at less than 20x, I want to know which. (I have heard 20x is a good threshold for germline variants, agreed?)

Also, if I add a descriptive column to the BED file containing a “gene-exon” label, having that label included in the output would be even more helpful.

I have been reading posts online for days, and I am lost here. Can anybody help, please?



dariober11k wrote:

Something along these lines, using bedtools, might help, assuming your BAM file is already coordinate-sorted.

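## For speed, sort the file of target regions if not already sorted:
sort -k1,1 -k2,2n -k3,3n targets.bed > targets2.bed

## -g expects a genome file (chromosome name and length, tab-separated), not a FASTA.
## -sorted needs both inputs in the same chromosome order; drop it if bedtools
## complains about sort order.
genomeCoverageBed -bga -ibam myreads.bam -g genome.txt \
| intersectBed -a - -b targets2.bed -sorted \
| awk '$4 < 20 {print $0}' > lowcov.bed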

lowcov.bed will be a BED file of the intervals within your targets that have coverage <20x, with the depth in column 4.
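
To carry a “gene-exon” label through to the output, one option is to put the label in column 4 of the targets file and add -wb to intersectBed, which appends the matching target record to each output line. This is only a sketch: it assumes a 4-column targets2.bed, so the depth stays in column 4 and the label ends up in column 8. The function name low_cov_named and the sample rows are made up for illustration.

```shell
# Hypothetical helper: filters the 8-column output of
#   genomeCoverageBed -bga -ibam myreads.bam \
#   | intersectBed -a - -b targets2.bed -wb
# Assumed columns: 1-3 = covered interval clipped to the exon, 4 = depth,
# 5-7 = the original exon, 8 = "gene-exon" label from targets2.bed.
low_cov_named() {
    min_depth="${1:-20}"   # depth threshold, defaults to 20
    awk -v min="$min_depth" 'BEGIN { OFS = "\t" }
        $4 < min { print $1, $2, $3, $4, $8 }'
}

# Made-up rows standing in for real bedtools output:
printf 'chr1\t100\t150\t35\tchr1\t100\t300\tGENE1-exon1\n'  > example.tsv
printf 'chr1\t150\t180\t12\tchr1\t100\t300\tGENE1-exon1\n' >> example.tsv
low_cov_named 20 < example.tsv
```

With real data you would pipe the intersectBed -wb output straight into low_cov_named instead of reading example.tsv. In the example only the second row is kept, since its depth (12) is below 20.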
