RRBS fastq - biased per base sequence content
14 months ago
I have 3 batches of raw RRBS fastq files. When I ran FastQC as a quality check, I found that the per base sequence content of the fastq files in one of the batches was different from the others.

[Figure: FastQC per base sequence content plots, batch 1 (left) and batch 2 (right)]

All samples have high quality bases. In the plot on the left, the A:T and C:G ratios are almost identical, while the plot on the right shows more A's and C's than their complementary bases.

Even though the bisulfite reaction converts unmethylated C's to T's, I would expect complementary bases to be present in equal proportions, because the reads were produced by PCR amplification, which still generates C's opposite G's in the template. But I'm not sure I understand this correctly.
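To compare the batches numerically rather than by eye, the per base sequence content can be recomputed directly from the reads. The following is only a minimal sketch, not how FastQC does it internally: the function name `per_base_content` is made up for illustration, and it assumes plain 4-line FASTQ records (no wrapped sequence lines).

```python
from collections import Counter
import io


def per_base_content(fastq_handle, max_len=None):
    """Return one dict per read position with the fraction of A, C, G, T, N.

    Assumes each record is exactly 4 lines, so the sequence is the 2nd line
    of every group of 4.
    """
    counts = []  # counts[pos] is a Counter of bases seen at that position
    for i, line in enumerate(fastq_handle):
        if i % 4 != 1:  # keep only the sequence line of each record
            continue
        seq = line.strip().upper()
        if max_len is not None:
            seq = seq[:max_len]
        while len(counts) < len(seq):
            counts.append(Counter())
        for pos, base in enumerate(seq):
            counts[pos][base] += 1

    fractions = []
    for c in counts:
        total = sum(c.values())
        fractions.append({b: c[b] / total for b in "ACGTN"})
    return fractions


if __name__ == "__main__":
    # Two toy reads, purely illustrative (not real data).
    demo = "@r1\nCGGT\n+\nIIII\n@r2\nTGGA\n+\nIIII\n"
    for pos, frac in enumerate(per_base_content(io.StringIO(demo)), start=1):
        # Print A vs T and C vs G side by side to spot an imbalance.
        print(pos, "A-T:", frac["A"] - frac["T"], "C-G:", frac["C"] - frac["G"])
```

For a gzipped file, the handle could be opened with `gzip.open(path, "rt")`, where `path` is whichever fastq file is being checked. Running this on one file per batch and comparing the A-T and C-G differences by position would show where along the read the imbalance appears.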

Does anyone have an idea of the typical base composition in RRBS data? Thank you!

