Question: GC Bias in Sequence Capture - Analysis & Quantification
jnowacki wrote:

QUESTION
I have a BED file of capture regions and an alignment file (BAM). I'd like to know whether the reads in the BAM are biased towards GC-rich regions.

RESEARCH

Picard CollectGcBiasMetrics does a great job with whole genome sequencing but doesn't take bed files as input.

Picard HsMetrics can tell me GC % and coverage per region, but the regions are not all the same size, so the per-region values do not carry equal weight.

I thought about taking the original BED file, creating a new BED file of sliding 100 base pair windows with bedtools makewindows, and then analyzing that with Picard HsMetrics. The problem is that with a window of 100 bp and a step of 10 bp, large capture regions get disproportionately more windows than small capture regions.

For example: ORIGINAL BED FILE ----> NEW BED FILE
100 bp region ----> 1 window of 100 base pairs, so each base is covered 1x in the new BED file
1000 bp region ----> 91 full windows of 100 base pairs. 10x the size but about 90x as many windows.
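The window arithmetic can be checked with a small sketch. It counts only full-length windows, (length - window) / step + 1, which comes out at 91 for the 1000 bp case. The function name `full_windows` is made up for illustration, and a real windowing tool may additionally emit shorter trailing windows at the end of each region, which this sketch ignores.

```python
def full_windows(region_len, window=100, step=10):
    """Count the full-length sliding windows that fit inside one region.

    Hypothetical helper for illustration only; it ignores any truncated
    windows a tool might emit at the end of a region.
    """
    if region_len < window:
        return 0
    return (region_len - window) // step + 1


if __name__ == "__main__":
    # Compare a small and a large capture region.
    for length in (100, 1000):
        print(f"{length} bp region -> {full_windows(length)} full windows")
```

The count grows with (length - window), not with length, which is why small regions are under-represented relative to their size.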

If I pad the regions out, some windows will have 0 depth of coverage in the BAM file, and these will cluster around the small capture regions, which contain few base pairs.

written 15 months ago by jnowacki

Hello jnowacki ,

could you please explain what you hope to find out and why the output of HsMetrics isn't suitable for your needs?

Thanks.

fin swimmer

written 15 months ago by finswimmer

I'm trying to detect GC bias. The problem with the normal HsMetrics output is that the BED regions are not all the same size, so any binning or graphing that counts each region once will be weighted heavily towards whatever GC content the smaller regions happen to have on average.
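One way to sidestep the unequal region sizes might be to weight each region by its length when binning, so that every bin reflects bases rather than region counts. The following is a minimal sketch under stated assumptions, not Picard's method: it assumes the per-region GC %, mean coverage and length have already been extracted into tuples (for example from a per-target coverage table), and the name `gc_bias_by_bin` and the 5 % bin width are hypothetical choices.

```python
from collections import defaultdict


def gc_bias_by_bin(targets, bin_width=5):
    """Length-weighted coverage per GC bin, normalised to the overall mean.

    targets: iterable of (gc_percent, mean_coverage, length_bp) tuples
             (assumed input layout, not a format any tool is known to emit).
    Returns {bin_start_percent: (bases_in_bin, normalised_coverage)}.
    """
    bases = defaultdict(int)      # total bases per GC bin
    cov_sum = defaultdict(float)  # sum of (mean coverage * length) per bin
    last_bin = 100 // bin_width - 1

    for gc, mean_cov, length in targets:
        # Clamp so that exactly 100 % GC falls into the last bin.
        b = min(int(gc // bin_width), last_bin)
        bases[b] += length
        cov_sum[b] += mean_cov * length

    total_bases = sum(bases.values())
    total_cov = sum(cov_sum.values())
    if total_bases == 0 or total_cov == 0:
        return {}

    overall = total_cov / total_bases  # base-weighted mean coverage
    return {
        b * bin_width: (bases[b], (cov_sum[b] / bases[b]) / overall)
        for b in sorted(bases)
    }


if __name__ == "__main__":
    # Made-up example values, purely to show the call shape.
    example = [(40.0, 100.0, 100), (42.0, 100.0, 900), (60.0, 50.0, 1000)]
    for gc_bin, (n_bases, norm_cov) in gc_bias_by_bin(example).items():
        print(f"GC {gc_bin}-{gc_bin + 5}%: {n_bases} bp, normalised coverage {norm_cov:.2f}")
```

A normalised coverage near 1.0 in every bin would suggest little GC bias; bins that sit well above or below 1.0, and that contain a reasonable number of bases, would point to over- or under-represented GC ranges.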

written 15 months ago by jnowacki