Question: GC content in bisulfite converted library
4.1 years ago by
I am confused about bisulfite converted library GC content.

The FastQC per base sequence content plot looks like this:

(%G has decreased and %A has increased, compared to reference genome).
1. Shouldn't %C have decreased instead of %G?

Bismark reports that >90% of Cs in CHG and CHH context were methylated; however, people from the wet lab say that only CpG methylation is possible in this organism.

2. Given such a result (methylation in CHH and CHG), can we speculate that something went wrong with the bisulfite conversion?

3. Bismark/BS Seeker2 maps only those reads that have non-converted Cs (this is why we get a high CH methylation percentage). What could be the reason that reads with converted Cs don't map?

It looks as if reads have been (reverse) complemented.

This is what we think too. If this is the data as we received it (Ion Torrent), is it possible that something got messed up at the base calling stage?

I have no experience with Ion Torrent, but I don't see why base calling would complement the reads. Are you sure this is a "standard" bisulfite library?
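
To make the reverse-complement idea above concrete, here is a minimal sketch. It assumes a made-up reference fragment and a fully unmethylated genome (every C converts to T); the helper names are hypothetical and not taken from Bismark or any other tool. It only shows that a converted read loses its Cs, while the reverse complement of that same read loses its Gs and gains As, which is the pattern described in the question.

```python
# Hypothetical helpers for illustration only; not part of any aligner.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def bisulfite_convert(seq):
    """In-silico conversion assuming every cytosine is unmethylated (C -> T)."""
    return seq.replace("C", "T")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase ACGT string."""
    return seq.translate(_COMPLEMENT)[::-1]


def base_fractions(seq):
    """Fraction of each base in the sequence."""
    return {base: seq.count(base) / len(seq) for base in "ACGT"}


# Made-up reference fragment, not real data.
reference = "ACGTTGCAAGCTCCGATGGACTAG"

converted = bisulfite_convert(reference)   # what a standard read should look like
flipped = reverse_complement(converted)    # the same read after reverse complementing

print("converted:", base_fractions(converted))  # C is depleted, T is elevated
print("flipped:  ", base_fractions(flipped))    # G is depleted, A is elevated
```

If the real reads look like `flipped` rather than `converted`, that is at least consistent with the reads (or the library strand) being complemented somewhere upstream of FastQC.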

4.1 years ago by
Devon Ryan (86k)
Freiburg, Germany
That's really, really strange. In all of my datasets the C percentage falls toward 0, causing T to jump to near 50%. Is this some sort of targeted BS-seq dataset? Did you run other samples at the same time, and did they produce similar results?

In general, the C and T percentages should be pushed away from 25% by the bisulfite conversion, and the G and A percentages should still be around 25%. Not seeing that (and also seeing >90% CHH methylation when that's not expected) suggests pretty strongly to me that something either went very wrong during bisulfite conversion or the reads were treated in a very strange way prior to running FastQC. If you can confirm that no one monkeyed with the reads, then I would suggest being very hesitant in trusting this dataset.
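
A quick way to check this on the raw reads, independent of FastQC, is to tally the overall base composition yourself. The sketch below is only an illustration under stated assumptions: it expects plain four-line FASTQ records (no wrapped sequence lines), the function names are hypothetical, and the 10% cutoff for calling a base "depleted" is an arbitrary choice, not an established threshold.

```python
import io
from collections import Counter


def base_composition(fastq_handle):
    """Overall A/C/G/T fractions from a four-line-per-record FASTQ stream."""
    counts = Counter()
    for i, line in enumerate(fastq_handle):
        if i % 4 == 1:  # the sequence is the second line of each record
            counts.update(line.strip().upper())
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        raise ValueError("no A/C/G/T bases found")
    return {base: counts[base] / total for base in "ACGT"}


def conversion_signature(fractions, depleted_below=0.10):
    """Rough label for which base looks depleted (cutoff is an assumption)."""
    c_low = fractions["C"] < depleted_below
    g_low = fractions["G"] < depleted_below
    if c_low and not g_low:
        return "C-depleted"  # what a standard bisulfite read should show
    if g_low and not c_low:
        return "G-depleted"  # what a complemented read would show
    return "unclear"


# Two made-up reads standing in for a real file.
demo_fastq = io.StringIO(
    "@r1\nTTGATGGTTAAGGTAT\n+\nIIIIIIIIIIIIIIII\n"
    "@r2\nATTGGATGTTAGTGAT\n+\nIIIIIIIIIIIIIIII\n"
)
fractions = base_composition(demo_fastq)
print(fractions)
print(conversion_signature(fractions))
```

For a real file you would pass `open("reads.fastq")` (a placeholder name) instead of the `StringIO` object. A "G-depleted" or "unclear" label on a library that is supposed to be standard is a reason to ask how the reads were handled before they reached you.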

