Trying To Identify Low Coverage Regions (Bases) In My Sample From A Bed And Bam File.
10.1 years ago

I have a BED file of my targets (start and stop coordinates of each exon of my genes of interest) and a BAM file generated from my Ion PGM run. I am trying to identify every position within the BED targets where I have less than 20x coverage, so I can fill in these regions with traditional Sanger sequencing.

Basically, if any bases within the exon are covered at less than 20x, I want to know which. (I have heard 20x is a good threshold for germline variants, agreed?)

Also, if I add a descriptive column to the BED file containing “gene-exon”, it would be even more helpful if that label could be included in the output.

I have been reading posts online for days, and I am lost here. Can anybody help, please?

Thanks

Pfeifferr

10.1 years ago

Something along these lines, using bedtools, might help. This assumes your BAM file is already coordinate-sorted.

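## For speed, sort the file of target regions if not already sorted:
sort -k1,1 -k2,2n -k3,3n targets.bed > targets2.bed

## -bga reports depth for every interval, including zero-coverage ones.
## With -ibam the chromosome sizes come from the BAM header; if your bedtools
## version still asks for -g, give it a chromosome-sizes file (e.g. genome.fasta.fai),
## not the FASTA itself.
## -sorted needs both inputs in the same chromosome order; drop it if
## intersectBed complains about ordering.
genomeCoverageBed -bga -ibam myreads.bam \
| intersectBed -a - -b targets2.bed -sorted \
| awk '$4 < 20 {print $0}' > lowcov.bed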

lowcov.bed will be a BED-style file (chromosome, start, end, depth) of the intervals inside your targets with coverage <20x.
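To carry a “gene-exon” label through, one option is to put the label in column 4 of the targets file and add -wb to intersectBed, which appends the matching target entry to each output line. The sketch below assumes that layout (8 columns in total) and uses awk to report, per label, how many bases fall below the threshold and where. The function name summarize_lowcov, the file names, and the toy rows are made up for illustration; the toy file only stands in for real bedtools output.

```shell
#!/bin/sh
# Assumed upstream step (not run here): the pipeline above with -wb added, e.g.
#   genomeCoverageBed -bga -ibam myreads.bam \
#   | intersectBed -a - -b targets2.bed -wb \
#   | awk '$4 < 20' > lowcov_named.bed
# Assumed columns when targets2.bed has 4 columns:
#   1-3 low-coverage interval, 4 depth, 5-7 target interval, 8 gene-exon label

# Hypothetical helper: per label, total bases below threshold plus the intervals.
summarize_lowcov() {
    awk -F'\t' -v OFS='\t' '
        {
            # BED coordinates are 0-based, half-open, so length = end - start
            bases[$8] += $3 - $2
            spans[$8] = spans[$8] (spans[$8] ? "," : "") $1 ":" $2 "-" $3 "(" $4 "x)"
        }
        END { for (name in bases) print name, bases[name], spans[name] }
    ' "$1" | sort -k1,1
}

# Toy data standing in for lowcov_named.bed (all values invented).
printf 'chr1\t100\t110\t5\tchr1\t50\t200\tGENEA-exon2\n'   > toy_lowcov_named.bed
printf 'chr1\t150\t160\t0\tchr1\t50\t200\tGENEA-exon2\n'  >> toy_lowcov_named.bed
printf 'chr2\t300\t305\t19\tchr2\t290\t400\tGENEB-exon4\n' >> toy_lowcov_named.bed

summarize_lowcov toy_lowcov_named.bed
```

Each output line is one labelled target: the label, the number of bases under 20x, and a comma-separated list of the low-coverage intervals with their depth, which should be enough to pick Sanger amplicons.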
