Question: GC Bias in Sequence Capture - Analysis & Quantification
gravatar for jnowacki
24 months ago by
jnowacki70 wrote:

I have a bed file & alignment file (bam). I'd like to know if the reads contained inside the bam are biased towards GC rich regions.


Picard CollectGcBiasMetrics does a great job with whole genome sequencing but doesn't take bed files as input.

Picard HsMetrics can tell me GC % & coverage per region but not all regions are the same size.

I thought about taking the original bedfile and creating a new bed file of sliding 100 base pair windows with bedtools makewindows and then analyzing with Picard HsMetrics. The problem with that is if I use a window of 100 bp and step of 10 bp then large capture regions will have more windows than small capture regions.

100 bp region ----> 1 window of 100 base pairs, 1x coverage in new bed file
1000 bp region ----> 90 windows of 100 base pairs. 10x the size but 90x as many windows.

If I pad things out then there will be windows that have 0 depth of coverage by the bam file and these will be clustered around areas with few base pairs (small capture regions).

ADD COMMENTlink modified 23 months ago by Biostar ♦♦ 20 • written 24 months ago by jnowacki70

Hello jnowacki ,

could you please explain what you hope to find out and why the output of HsMetrics isn't suitable for your needs?


fin swimmer

ADD REPLYlink written 24 months ago by finswimmer14k

I'm trying to detect GC bias. The problem with the normal HsMetrics output is not all of the bed regions are the same size so any binning/graphing will heavily weight the results towards whatever GC content is contained in average of the smaller regions.

ADD REPLYlink written 24 months ago by jnowacki70
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1393 users visited in the last hour