Hello all, I am processing reads from BS/oxBS (bisulfite/oxidative bisulfite) sequencing runs and have observed some contamination from short (~60 bp) reads. I suspect these are the spike-in sequences, because the spike-ins are also 60 bp long. I added them as controls during library prep to estimate how efficient the oxidation and bisulfite conversion were. I would like to remove these reads before alignment. The problem is that the spike-in reads are also bisulfite converted, at different positions and to different extents.
For example, the SQ6hmC spike-in is: TACGATCACGGCGAATCCGATCGAATCAGTCAAGCGCTTTACGAAGTGCGACAGCCTTAG. Within this sequence, some Cs are unmethylated, some are methylated (5mC), and some are hydroxymethylated (5hmC). After the BS reaction, all unmethylated Cs are converted to Ts. After the oxBS reaction, both unmethylated Cs and 5hmC are converted to Ts.
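To make these conversion rules concrete, here is a minimal Python sketch that predicts the ideal BS and oxBS read for SQ6hmC. The positions in `HYPOTHETICAL_MODS` are made-up placeholders (the real 5mC/5hmC positions are only in the attached picture). The sketch also assumes complete conversion and reads from the original top strand. The helper names `convert` and `collapse_c_to_t` are illustrative and do not come from any tool.

```python
# Sketch: predict the SQ6hmC spike-in read after ideal BS and oxBS conversion.
# ASSUMPTIONS: the modification positions below are HYPOTHETICAL placeholders,
# conversion is complete, and reads come from the original top strand
# (reverse-strand reads would show G->A instead of C->T).

SPIKE_SQ6HMC = "TACGATCACGGCGAATCCGATCGAATCAGTCAAGCGCTTTACGAAGTGCGACAGCCTTAG"

# HYPOTHETICAL map: 0-based index of a C -> "5mC" or "5hmC".
# Any C not listed here is treated as unmodified.
HYPOTHETICAL_MODS = {2: "5mC", 8: "5hmC", 21: "5hmC", 34: "5mC"}

# Sanity check: every hypothetical modification must sit on a C.
for pos in HYPOTHETICAL_MODS:
    assert SPIKE_SQ6HMC[pos] == "C", f"position {pos} is not a C"


def convert(seq, mods, protocol):
    """Return the expected read after complete, ideal conversion.

    protocol="BS":   unmodified C -> T; 5mC and 5hmC stay C.
    protocol="oxBS": unmodified C and 5hmC -> T; only 5mC stays C.
    """
    if protocol == "BS":
        protected = {"5mC", "5hmC"}
    elif protocol == "oxBS":
        protected = {"5mC"}
    else:
        raise ValueError(f"unknown protocol: {protocol!r}")
    out = []
    for i, base in enumerate(seq):
        # mods.get(i) is None for unmodified Cs, which are never protected.
        if base == "C" and mods.get(i) not in protected:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def collapse_c_to_t(seq):
    """Three-letter view: turn every C into T so conversion differences vanish."""
    return seq.replace("C", "T")


if __name__ == "__main__":
    bs_read = convert(SPIKE_SQ6HMC, HYPOTHETICAL_MODS, "BS")
    oxbs_read = convert(SPIKE_SQ6HMC, HYPOTHETICAL_MODS, "oxBS")
    print("reference:", SPIKE_SQ6HMC)
    print("BS read:  ", bs_read)
    print("oxBS read:", oxbs_read)
    # Reference, BS read and oxBS read all agree once every C is collapsed to T.
    print(collapse_c_to_t(bs_read) == collapse_c_to_t(oxbs_read)
          == collapse_c_to_t(SPIKE_SQ6HMC))
```

Because any C in a spike-in read may show up as either C or T, the read differs from the reference sequence in a protocol- and modification-dependent way. After collapsing every C to T, however, all variants become identical. This three-letter view is the same idea bisulfite aligners use.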
I've attached a picture of all the spike-in sequences. Green = 5hmC; red = 5mC; grey = unmodified C.
What would be the best way to remove these spike-in reads? Thank you!