Question: How to analyze methylation around splice junctions
gravatar for thefirstrealace
4.3 years ago by
thefirstrealace30 wrote:


i have the following problem. I need to compute and visualize cytosine methylation levels around splice junctions. I already used TopHat to determine splice junctions. Therefor i used RNA-Seq reads (obtained from spleen cell) downloaded at Roadmap Epigenomics Project  and the hg19 as reference genome. I also found the methylation data obtained from the same spleen sample at this homepage. They are provided raw reads (SRR files) and as processed data in bed and wig format. I would prefer to extract the processed methylation data out of the bed/wig file instead of using the raw data, so i would like to know:

  1. Is there any good tool to extract the methylation data out of the bed/wig file and map it to the regions +200 nt and -200 nt left and right of the splice junctions  previously computed with TopHat?
  2. Is there a good tool for visualization?
  3. Which tool can i use for statistical analysis? Maybe R Package(s)?

I hope someone can help me here :)

Best regards

ADD COMMENTlink modified 4.3 years ago by Alex Reynolds29k • written 4.3 years ago by thefirstrealace30

Hi, can you share your workflow for this problem if you have figure it out? I want to do a very similar thing.

ADD REPLYlink modified 9 weeks ago by RamRS25k • written 4.1 years ago by lrh0070


the workflow actually turns out to be very simple, but depending on your hardware it might be very time-consuming. I did the following steps:

  1. Download RNA-Seq data in SRA-format from GEO (i took data from spleen)
  2. Extract reads with the SRA toolkit (take care of library type!!! single-end or paired-end)
  3. Download reference genome from UCSC as 2bit version (in my case human genome hg19)
  4. Convert reference genome to FASTA format with twoBitToFa (available at UCSC)
  5. Index the reference genome with SAMtools
  6. Map the RNA-Seq reads to the reference with TopHat (it uses the SAM index for that)
  7. Download BS-Seq data from the very same sample (if possible) from GEO
  8. If methylation data is given in WIG format, convert to BED first
  9. Now you can use BEDOPS to map methylation data to TopHat's splice junctions like described by Alex Reynolds below
  10. Use R or something else to extract data, do statistical analysis ...

In addition to step 9 and 10 i also implemented a python script (using BioPython) to extract additional data at the splice junctions, like GC and CpG ratio, FPKM (obtained with a tool called Cufflinks) and some more properties

ADD REPLYlink modified 9 weeks ago by RamRS25k • written 4.1 years ago by thefirstrealace30
gravatar for Alex Reynolds
4.3 years ago by
Alex Reynolds29k
Seattle, WA USA
Alex Reynolds29k wrote:

You could perhaps use BEDOPS to do some of this.

If you only have Wiggle data, you could use convert2bed (wig2bed) to convert Wiggle to BED:

$ wig2bed < methylation.wig > methylation.bed

You could use bedops --range to expand the range of sorted, BED-formatted splice junctions, and then pipe this to bedmap to map methylation signal to the padded junctions:

$ bedops --range 200 --everything junctions.bed \
    | bedmap --echo --echo-map-score --delim '\t' - methylation.bed \
    > padded_junctions_with_signals_from_overlapping_methylation_regions.bed

Once you have signal data, you could read this into R with read.table() or similar to calculate mean, variance, etc. per padded junction. You could use R to make histograms of signal or summary statistics over all junctions, etc.

ADD COMMENTlink modified 9 weeks ago by RamRS25k • written 4.3 years ago by Alex Reynolds29k

Thank you so much, that sounds like a very good plan. I will try BEDOPS immediately.

For the subsequent visualization with R, there is one more thing that im not 100% sure about. I need to digitize the methylation levels in 20 bp bins around the splice junctions (within range +/- 200 bp). Is the methylation level within a bin defined as [methylated C's] / [methylated C's + unmethylated C's] ?

ADD REPLYlink written 4.3 years ago by thefirstrealace30

Hi Alex, can you specify how to split the methylation data in each region into bins and plot them in R? It would be greatly useful to me. Thanks.

ADD REPLYlink modified 9 weeks ago by RamRS25k • written 4.1 years ago by lrh0070
gravatar for Devon Ryan
4.3 years ago by
Devon Ryan94k
Freiburg, Germany
Devon Ryan94k wrote:

Since you didn't get a reply the last time you posted more or less the same thing the likely answer is that you'll have to do a bit of coding.

1. You might be able to use deepTools for this by converting to bigWig. Feeding the splice sites as an appropriate BED file into computeMatrix should allow plotting the profile, which is what I presume you want.

2. If (1) doesn't suffice for you you'll have to code together the extraction part and plot things in R.

3. You need to define "statistical analysis" before anyone can help you. There are too many possibilities for what that could mean.

ADD COMMENTlink written 4.3 years ago by Devon Ryan94k

Yes that is absolutely true, question 3 was a little bit unclear in this context. What i meant was the "normal" statistic calculations like mean, variance etc...

Nevertheless i thank you very much for helping me. For now, i will try the solution from Alex Reynolds (the post above) since i already installed BEDOPS on my PC. At the moment it seems to be the easiest way to solve my task.

Best regards

ADD REPLYlink written 4.3 years ago by thefirstrealace30
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1462 users visited in the last hour