Question: How can I calculate bisulfite conversion rate from RRBS data?
1
gravatar for Tori
4.9 years ago by
Tori90
Finland
Tori90 wrote:

I have RRBS data and I am asked to calculate bisulfite conversion ratio. I have aligned the data using Bismark. How would I be able to calculate bisulfite conversion ratio using the aligned data.?

rrbs • 4.3k views
ADD COMMENTlink modified 4.9 years ago by Devon Ryan94k • written 4.9 years ago by Tori90

Do you sequence unmethylated phage DNA along with your sample? In our lab, that's what we use to calculate the BS conversion rate since all methylation values within the sample are unknown. 

ADD REPLYlink written 4.9 years ago by Jautis290
4
gravatar for Devon Ryan
4.9 years ago by
Devon Ryan94k
Freiburg, Germany
Devon Ryan94k wrote:

Normally you would use a PhiX spike-in to determine this. This step was likely done, but you may or may not have the resulting PhiX reads in your fastq file. This would be the first thing to check on. If you do have them, then align the reads to the PhiX genome and get the conversion efficiency from that (it'll be 1-methylation percent).

ADD COMMENTlink written 4.9 years ago by Devon Ryan94k

How can I check if fastq files have PhiX reads?

ADD REPLYlink written 4.9 years ago by Tori90

The absolute simplest method is to just ask whomever did the sequencing. You could also just map the reads to the PhiX sequence. You can find links to the sequence here: What Is The Refseq Id For Illumina Phix Control?

ADD REPLYlink written 4.9 years ago by Devon Ryan94k

I mapped the fastq files to Illumina PhiX and got the follow result. How can I know conversion efficiency from these figures?

31403068 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
753492 + 0 mapped (2.40%:-nan%)
31403068 + 0 paired in sequencing
15701534 + 0 read1
15701534 + 0 read2
0 + 0 properly paired (0.00%:-nan%)
753492 + 0 with itself and mate mapped
0 + 0 singletons (0.00%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
ADD REPLYlink written 4.9 years ago by Tori90

What tool did you use for the mapping? Many (bismark, bison, etc.) produce a summary at the end where the approximate methylation statistics are printed. Otherwise, you can use PileOMeth (it's in my github repo) to get per-CpG metrics and then awk to or R to get the average from that.

ADD REPLYlink written 4.9 years ago by Devon Ryan94k

Sorry for my stupidity; I used BWA to align the reads. After reading your comment I realize that I should have used Bismark or similar.

From the lab people I got to know that they have used 10-15% PhiX control.

Now I have aligned the redas to PhiX using Bismark. Following is the alignment summary. How do I know the conversion rate? What is logic behind mapping reads to PhiX to know conversion ratio?

Final Cytosine Methylation Report
=================================
Total number of C's analysed:    6050559

Total methylated C's in CpG context:    1289702
Total methylated C's in CHG context:    1229962
Total methylated C's in CHH context:    3491603
Total methylated C's in Unknown context:    73

Total unmethylated C's in CpG context:    8174
Total unmethylated C's in CHG context:    6845
Total unmethylated C's in CHH context:    24273
Total unmethylated C's in Unknown context:    11

C methylated in CpG context:    99.4%
C methylated in CHG context:    99.4%
C methylated in CHH context:    99.3%
C methylated in Unknown context (CN or CHN):    86.9%

 

ADD REPLYlink written 4.9 years ago by Tori90

I don't use PhiX, but does it have an expected methylation level? If so, the difference between that and your observed methylation level would be your BS conversion rate

ADD REPLYlink written 4.9 years ago by Jautis290

Hmm, so sometimes the PhiX that's spiked in is unmethylated, so you can then calculate things easily from that. Yours seems to be methylated (or the conversion failed completely, but that's unlikely), so this is apparently not useful then. In general, you want to spike in something that's not methylated (e.g. a PCR product or plasmid) and then use that to determine the conversion efficiency, since ideally it'll show ~0% methylation when mapped.

ADD REPLYlink written 4.9 years ago by Devon Ryan94k

Thanks for your comment. I got to know that I should map to Lambda genome which seems to be standard way to calculate conversion efficiency. Do you guys have experience with lambda genome?

ADD REPLYlink written 4.9 years ago by Tori90
1

D'oh, I meant lambda above instead of PhiX, mea culpa! You can process against the lambda genome (I don't have a link handy for the fasta file) the same way as I had mentioned above for PhiX. In resulting "C methylated in CpG context:" line should then indicate 100% minus the conversion efficiency (e.g., 99.2% conversion will result in 0.8% C methylated in CpG context).

ADD REPLYlink written 4.9 years ago by Devon Ryan94k

I just mapped to the phage lambda genome using fasta here http://www.ncbi.nlm.nih.gov/nuccore/J02459.

ADD REPLYlink written 3.8 years ago by Ming Tang2.5k

most of my samples have a spike-in lambda unmethylated DNA. I mapped them to lambda genome and calculated the efficiency. I do not have spike-in for one sample. I have to check the conversion efficiency for the red unmethylated C (see below) introduced at the 3’ when end-repair was done. for my bismark pipeline, trim_galore will remove this unmehtylated C if there is adaptor contamination. I read it here http://www.bioinformatics.babraham.ac.uk/projects/bismark/RRBS_Guide.pdf

[url=http://postimg.org/image/mrch0rpfv/][img]http://s33.postimg.org/mrch0rpfv/Screenshot_2016_05_23_10_14_20.jpg[/img][/url]

How do I calculate it? I need to take the fastqs and trim-off the adaptors, but not the last 2 bases at 3', map with bismark, and check how many Ts are at the end of the each read? Is there any script to do so?

Thanks, Ming

ADD REPLYlink modified 3.8 years ago • written 3.8 years ago by Ming Tang2.5k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1048 users visited in the last hour