Hello all :)
I have a lot of bisulphite-seq data with really good coverage across the genome, and it seems a waste not to use that data to call SNPs (since a lot of reads in WGBS are non-informative).
The problem is that, since unmethylated Cs are converted to Ts by bisulphite treatment, GATK will probably not only call a lot of converted Ts as SNPs, but will also attempt realignment and other steps to re-map reads that are actually already well mapped (via Bismark). I fear it could throw off GATK completely.
Why not use BisSNP - the tool for calling SNPs in bisulphite data? Well I will do that too, but it appears that BisSNP does not call indels or do unified calling like GATK does. Ideally I would like to convert all C -> T changes (het and homo) on the forward strand and ALL G -> A changes on the reverse strand, to the reference C or G, and just accept that I will never be able to detect SNPs of that kind.
Does a tool exist that will help me change all my C -> T / G -> A conversions to C/G in the BAM file? Are there other/better ways to do what I'm trying to do?
If I have to write a script that walks through the BAM and updates the SEQ to match the reference at each converted position, I will do so and post it here. I will also post a comparison of BisSNP vs GATK once it's all done :)
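
To make the idea concrete, here is a rough sketch of the per-read step such a script would need. It is only an illustration, not a tested pipeline. The function name `unconvert_read` and its arguments are made up for this example. It assumes you already have the read's SEQ in reference orientation, the (query position, reference position) pairs for the alignment, and a flag saying whether the read shows C -> T or G -> A conversions relative to the reference. As far as I know Bismark records that in the XG tag as CT or GA, but please check that against your own BAM files.

```python
def unconvert_read(read_seq, aligned_pairs, ref_seq, genome_conversion):
    """Reset putative bisulphite conversions in one read to the reference base.

    read_seq          -- read bases as stored in SEQ (reference orientation)
    aligned_pairs     -- iterable of (query_pos, ref_pos); either can be None
                         at insertions, deletions or clipped bases
    ref_seq           -- reference sequence, indexable by ref_pos
    genome_conversion -- "CT" or "GA" (hypothetical flag, e.g. taken from
                         a strand tag written by the aligner)
    """
    if genome_conversion == "CT":
        ref_base, converted_base = "C", "T"
    elif genome_conversion == "GA":
        ref_base, converted_base = "G", "A"
    else:
        raise ValueError("genome_conversion must be 'CT' or 'GA'")

    bases = list(read_seq)
    for query_pos, ref_pos in aligned_pairs:
        # Skip indels and clipped bases: there is no reference base to compare.
        if query_pos is None or ref_pos is None:
            continue
        # Only touch positions where the reference has C (or G) and the read
        # shows the base that bisulphite conversion would produce.
        if (ref_seq[ref_pos].upper() == ref_base
                and bases[query_pos].upper() == converted_base):
            bases[query_pos] = ref_base
    return "".join(bases)


if __name__ == "__main__":
    ref = "ACGTCCGA"
    read = "ATGTTCAA"
    pairs = [(i, i) for i in range(len(read))]
    print(unconvert_read(read, pairs, ref, "CT"))
    print(unconvert_read(read, pairs, ref, "GA"))
```

Note that this deliberately throws away any real C -> T or G -> A SNP on the affected strand, which is the trade-off described above. Base qualities and any MD/NM tags would also need to be kept consistent when writing the modified reads back out.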