Calculate the percentage of genomic region covered from the BED file.
9 months ago
adarsh_pp ▴ 40

Hello,

I have BED files for multiple exome capture kits. Is there a method to calculate what portion/percentage of a gene's region each capture kit covers? That way I could compare the kits on specific genes. I have loaded these BED files in IGV, and visually they cover the gene of interest, but is there a way to quantify this numerically?

Thank you

NGS sequencing genomics exome genes • 807 views

Aside from the programs/answers linked in @Pierre's answer, mosdepth (LINK) is the fastest way to do this.

Note: Are you asking what portion of each gene the BED file covers, i.e. you are not asking about coverage from BAM alignments?


Not from the BAM files, but from the BED files. I had the same question myself: another person asked me for this coverage from the BED files, and I was not able to give an answer.


You would need to do some custom coding to figure that out.
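As a rough idea of what that custom code could look like, here is a minimal Python sketch. It assumes the intervals have already been parsed from the BED files into tuples with 0-based, half-open coordinates, and that each gene is given as a single interval (pass exon intervals instead if you only care about coding regions). The function names `merge` and `fraction_covered` and the toy coordinates are made up for illustration.

```python
from collections import defaultdict


def merge(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def fraction_covered(genes, kit):
    """Return {gene_name: fraction of the gene's bases overlapped by the kit}.

    genes: list of (chrom, start, end, name)
    kit:   list of (chrom, start, end)
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in kit:
        by_chrom[chrom].append((start, end))
    # Merge first so that overlapping kit intervals are not counted twice.
    merged = {chrom: merge(ivs) for chrom, ivs in by_chrom.items()}

    result = {}
    for chrom, g_start, g_end, name in genes:
        covered = 0
        for start, end in merged.get(chrom, []):
            covered += max(0, min(end, g_end) - max(start, g_start))
        length = g_end - g_start
        result[name] = covered / length if length > 0 else 0.0
    return result


# Hypothetical toy data, not taken from any real capture kit.
genes = [("chr5", 100, 200, "gene1"), ("chr5", 300, 400, "gene2")]
kit1 = [("chr5", 90, 150), ("chr5", 140, 190), ("chr5", 300, 310)]

fractions = fraction_covered(genes, kit1)
print(fractions)  # {'gene1': 0.9, 'gene2': 0.1}
```

The inner loop scans every kit interval on the chromosome for each gene, which is fine for a handful of genes but would want an interval index or a sorted sweep for whole-genome comparisons.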


"I have multiple exome capture kit bed files." / "coverage from BED"

These are just interval files; you can't get sequencing coverage from them.

You can diff them and annotate the intervals, but I don't think this kind of comparison would give you insightful results. Will you be comparing the probe intervals? I ask because most of the target BED files are just the exon intervals. You can add padding to the probe intervals and compare them, but you can't really know which probes work better without the sequencing data.
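For what it's worth, the padding-and-diff idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: intervals are 0-based, half-open `(start, end)` tuples on a single chromosome, padded ends are not clipped to the chromosome length, and the helper names (`pad`, `merge`, `total_bases`, `shared_bases`) and coordinates are hypothetical.

```python
def pad(intervals, padding):
    """Extend each interval by `padding` bases on both sides (start clamped at 0)."""
    return [(max(0, start - padding), end + padding) for start, end in intervals]


def merge(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def total_bases(intervals):
    """Number of distinct bases covered by the intervals."""
    return sum(end - start for start, end in merge(intervals))


def shared_bases(a, b):
    """Number of bases covered by both interval sets (two-pointer sweep)."""
    a, b = merge(a), merge(b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            shared += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared


# Hypothetical probe intervals for two kits on one chromosome.
kit_a = [(100, 200), (300, 400)]
kit_b = [(150, 250), (390, 500)]

for padding in (0, 50):
    pa, pb = pad(kit_a, padding), pad(kit_b, padding)
    both = shared_bases(pa, pb)
    print(padding, both, total_bases(pa) - both, total_bases(pb) - both)
```

Each printed row gives the padding, the bases shared by both kits, and the bases unique to each kit. As said above, this only tells you where the designs differ, not which kit captures better.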


I could be off on what you're trying to accomplish, but it sounds like bedtools should be able to do this.

If, as your question asks, you want to calculate the percentage of a genomic region covered by a BED file, you can use bedtools annotate for this. You provide the regions for which you want the fraction covered (-i), then supply the files that will be "covering" these regions (-files).

Maybe:

bedtools annotate -i genes.bed -files exome_kit1.bed exome_kit2.bed exome_kit3.bed

Expected output (roughly)

chr   start  end  name   exome_kit1  exome_kit2  exome_kit3
chr5  100    200  gene1  0.9         0.9         0.1
chr5  300    400  gene2  0.1         0.1         0.9

In this hypothetical, exome_kit1 and exome_kit2 each cover 90% of gene1 while exome_kit3 has poor overlap (10%), but the situation is reversed for gene2.
