I have a bed file & alignment file (bam). I'd like to know if the reads contained inside the bam are biased towards GC rich regions.
Picard CollectGcBiasMetrics does a great job with whole genome sequencing but doesn't take bed files as input.
Picard HsMetrics can tell me GC % & coverage per region but not all regions are the same size.
I thought about taking the original bedfile and creating a new bed file of sliding 100 base pair windows with bedtools makewindows and then analyzing with Picard HsMetrics. The problem with that is if I use a window of 100 bp and step of 10 bp then large capture regions will have more windows than small capture regions.
ORIGINAL BED FILE ----> NEW BED FILE
100 bp region ----> 1 window of 100 base pairs, 1x coverage in new bed file
1000 bp region ----> 90 windows of 100 base pairs. 10x the size but 90x as many windows.
If I pad things out then there will be windows that have 0 depth of coverage by the bam file and these will be clustered around areas with few base pairs (small capture regions).