How can I calculate bisulfite conversion rate from RRBS data?
1
1
Entering edit mode
6.2 years ago
Tori ▴ 90

I have RRBS data and I am asked to calculate bisulfite conversion ratio. I have aligned the data using Bismark. How would I be able to calculate bisulfite conversion ratio using the aligned data.?

RRBS • 5.1k views
0
Entering edit mode

Do you sequence unmethylated phage DNA along with your sample? In our lab, that's what we use to calculate the BS conversion rate since all methylation values within the sample are unknown.

4
Entering edit mode
6.2 years ago

Normally you would use a PhiX spike-in to determine this. This step was likely done, but you may or may not have the resulting PhiX reads in your fastq file. This would be the first thing to check on. If you do have them, then align the reads to the PhiX genome and get the conversion efficiency from that (it'll be 1-methylation percent).

0
Entering edit mode

How can I check if fastq files have PhiX reads?

0
Entering edit mode

The absolute simplest method is to just ask whomever did the sequencing. You could also just map the reads to the PhiX sequence. You can find links to the sequence here: What Is The Refseq Id For Illumina Phix Control?

0
Entering edit mode

I mapped the fastq files to Illumina PhiX and got the follow result. How can I know conversion efficiency from these figures?

31403068 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
753492 + 0 mapped (2.40%:-nan%)
31403068 + 0 paired in sequencing
0 + 0 properly paired (0.00%:-nan%)
753492 + 0 with itself and mate mapped
0 + 0 singletons (0.00%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
0
Entering edit mode

What tool did you use for the mapping? Many (bismark, bison, etc.) produce a summary at the end where the approximate methylation statistics are printed. Otherwise, you can use PileOMeth (it's in my github repo) to get per-CpG metrics and then awk to or R to get the average from that.

0
Entering edit mode

Sorry for my stupidity; I used BWA to align the reads. After reading your comment I realize that I should have used Bismark or similar.

From the lab people I got to know that they have used 10-15% PhiX control.

Now I have aligned the redas to PhiX using Bismark. Following is the alignment summary. How do I know the conversion rate? What is logic behind mapping reads to PhiX to know conversion ratio?

Final Cytosine Methylation Report
=================================
Total number of C's analysed:    6050559

Total methylated C's in CpG context:    1289702
Total methylated C's in CHG context:    1229962
Total methylated C's in CHH context:    3491603
Total methylated C's in Unknown context:    73

Total unmethylated C's in CpG context:    8174
Total unmethylated C's in CHG context:    6845
Total unmethylated C's in CHH context:    24273
Total unmethylated C's in Unknown context:    11

C methylated in CpG context:    99.4%
C methylated in CHG context:    99.4%
C methylated in CHH context:    99.3%
C methylated in Unknown context (CN or CHN):    86.9%

0
Entering edit mode

I don't use PhiX, but does it have an expected methylation level? If so, the difference between that and your observed methylation level would be your BS conversion rate

0
Entering edit mode

Hmm, so sometimes the PhiX that's spiked in is unmethylated, so you can then calculate things easily from that. Yours seems to be methylated (or the conversion failed completely, but that's unlikely), so this is apparently not useful then. In general, you want to spike in something that's not methylated (e.g. a PCR product or plasmid) and then use that to determine the conversion efficiency, since ideally it'll show ~0% methylation when mapped.

0
Entering edit mode

Thanks for your comment. I got to know that I should map to Lambda genome which seems to be standard way to calculate conversion efficiency. Do you guys have experience with lambda genome?

1
Entering edit mode

D'oh, I meant lambda above instead of PhiX, mea culpa! You can process against the lambda genome (I don't have a link handy for the fasta file) the same way as I had mentioned above for PhiX. In resulting "C methylated in CpG context:" line should then indicate 100% minus the conversion efficiency (e.g., 99.2% conversion will result in 0.8% C methylated in CpG context).

0
Entering edit mode

I just mapped to the phage lambda genome using fasta here http://www.ncbi.nlm.nih.gov/nuccore/J02459.

0
Entering edit mode

most of my samples have a spike-in lambda unmethylated DNA. I mapped them to lambda genome and calculated the efficiency. I do not have spike-in for one sample. I have to check the conversion efficiency for the red unmethylated C (see below) introduced at the 3’ when end-repair was done. for my bismark pipeline, trim_galore will remove this unmehtylated C if there is adaptor contamination. I read it here http://www.bioinformatics.babraham.ac.uk/projects/bismark/RRBS_Guide.pdf

How do I calculate it? I need to take the fastqs and trim-off the adaptors, but not the last 2 bases at 3', map with bismark, and check how many Ts are at the end of the each read? Is there any script to do so?

Thanks, Ming