Hi Biostars,
I have multiple murine whole-genome sequencing samples that show a non-uniform read count distribution along the genome, all following the same pattern (example attached). Has anyone seen this pattern in read count distribution before, and does anyone have an idea what could be causing it? (I don't see any patterns in the standard QC parameters.)
Thanks for any input on this!
Update on GC bias:
GC content was proposed as a reason, so I plotted the mean GC content of the reference genome alongside the read count distribution. There is no clear correlation; there may be a mild trend of lower coverage in high-GC regions, but it is very subtle. So I guess GC bias is not the (major) cause of the fluctuating coverage. Does anyone have a different idea?
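To put a number on the GC vs. coverage comparison rather than judging it by eye, something like the following could be used. This is only a minimal sketch: it assumes the reference sequence of one chromosome is available as a string and that read counts have already been computed for consecutive, non-overlapping windows of the same size. The function names and the toy data are made up for illustration and are not the analysis behind the plot above.

```python
from math import sqrt

def gc_fraction(seq):
    """GC fraction among unambiguous bases (A/C/G/T); None if the window has none (e.g. all N)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if gc + at else None

def windowed_gc(seq, window):
    """GC fraction for consecutive non-overlapping windows of length `window`."""
    return [gc_fraction(seq[i:i + window]) for i in range(0, len(seq), window)]

def pearson(xs, ys):
    """Pearson correlation coefficient; None if undefined (too few points or zero variance)."""
    n = len(xs)
    if n < 2:
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None
    return cov / sqrt(vx * vy)

def gc_coverage_correlation(seq, counts, window):
    """Correlate per-window GC with per-window read counts, skipping windows without called bases.

    `counts` is assumed to hold one read count per window, in the same order as the windows.
    """
    pairs = [(g, c) for g, c in zip(windowed_gc(seq, window), counts) if g is not None]
    return pearson([g for g, _ in pairs], [c for _, c in pairs])

# Toy data (hypothetical): four 4-bp windows, the last one all N and therefore skipped.
toy_seq = "GGCC" + "ATAT" + "GCAT" + "NNNN"
toy_counts = [10, 30, 20, 5]
r = gc_coverage_correlation(toy_seq, toy_counts, window=4)
```

A coefficient near zero would support the impression that GC is not the main driver, while a clearly negative value would point to lower coverage in high-GC windows. Pearson only captures linear trends, and GC bias is often non-linear (coverage dropping at both GC extremes), so a scatter plot of GC against read count per window is still worth a look.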
I had a quick look at the overall GC content, which looked OK at first glance, but I'll dive deeper into GC. Thanks for the hint!