How can I compute DNA methylation levels around splice junctions?
8.5 years ago

Hello everybody,

I am still new to the field of computational epigenetics, so I need some help with the following task(s):

I study applied bioinformatics, and for my master's thesis I need to compute methylation levels around splice junctions. I need to output them in a format that I have never seen before. I did some research on the format, but I couldn't find anything about it. The format seems to be similar to FASTA, but instead of a sequence (after the header line starting with >), it provides methylation levels in a tab-separated manner, and I honestly don't know what DSQ stands for. A small part of a methylation track is given below:

>chr1:142346773:142346881:+@chr1:142380702:142380810:+@chr1:142404277:142404426:+_expu=400_expd=200_bsz=20_part=0
DSQ    18.5594    18.5594    18.5594    8.22605    18.5594    31.9349    36.4521    36.4521    33.8659    18.5594    8.22605    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
>chr1:58852214:58852582:+@chr1:58878691:58878806:+@chr1:58880759:58881091:+_expu=400_expd=200_bsz=20_part=0
DSQ    0    0    0    0    0    0    0    0    4.50575    ...
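One possible reading of the records above, offered only as a guess since the format is undocumented: the header holds several regions joined by `@`, each written as `chrom:start:end:strand`, followed by `_key=value` parameters, and the data line holds a label (`DSQ`) followed by numeric values. If `expu` and `expd` are flank sizes and `bsz` is the bin size (all assumptions), a record with `expu=400`, `expd=200`, `bsz=20` would be expected to carry (400 + 200) / 20 = 30 values. The function names `parse_header` and `parse_values` are hypothetical, not part of any tool.

```python
import re

# Matches trailing "_key=value" parameters such as "_expu=400".
PARAM_RE = re.compile(r"_([A-Za-z]+)=([^_]+)")


def parse_header(header):
    """Split a header line into regions and integer parameters.

    Assumes the layout seen in the example: regions joined by '@',
    each 'chrom:start:end:strand', then '_key=value' parameters.
    """
    text = header.lstrip(">").strip()
    matches = list(PARAM_RE.finditer(text))
    region_part = text[:matches[0].start()] if matches else text
    regions = []
    for item in region_part.split("@"):
        # rsplit keeps chromosome names intact even if they contain ':'
        chrom, start, end, strand = item.rsplit(":", 3)
        regions.append((chrom, int(start), int(end), strand))
    params = {m.group(1): int(m.group(2)) for m in matches}
    return regions, params


def parse_values(line):
    """Return (label, values) from a whitespace-separated data line."""
    fields = line.split()
    return fields[0], [float(x) for x in fields[1:]]
```

Comparing `(params["expu"] + params["expd"]) // params["bsz"]` with the number of parsed values for each record would be a quick way to check whether this reading holds.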

This format is recognized by a newly developed flexible self-organizing map for DNA methylation analysis (or other digitized epigenetic signals). The paper describing the software is freely accessible here. Unfortunately, besides the paper, the authors provide only a 3-page quick-start manual, which doesn't say much about the format shown above. Maybe someone here has seen this format before and can explain its anatomy to me.

What I have done so far:

  1. I downloaded RNA-Seq runs from a human spleen sample provided by the NIH Roadmap Epigenomics Project. The GEO accession is GSM1010976.
  2. I used the TopHat splice junction mapper to determine the splice junctions, with hg19 as the reference genome.

I need to compute:

  • The methylation levels in a window from 200 nt to the left (-200 nt) to 200 nt to the right (+200 nt) of each of these splice junctions
  • I need them in 20 nt intervals. The DSQ values in the example above represent the (normalized?) methylation levels within a 20 nt bin
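The binning step described above could be sketched roughly as follows. This is a minimal illustration, not the method used by the software: it assumes per-position methylation levels are already available as a dictionary for one chromosome, uses a half-open window [junction - 200, junction + 200), averages the levels in each 20 nt bin, and writes 0 for bins without data (whether the zeros in the example mean "unmethylated" or "no data" is unknown). The names `binned_levels` and `format_dsq_line` are hypothetical.

```python
def binned_levels(meth, junction, flank=200, bin_size=20, empty=0.0):
    """Mean methylation level per bin around one splice junction.

    meth: dict mapping genomic position -> methylation level (one chromosome)
    junction: genomic coordinate of the splice junction
    Returns 2 * flank // bin_size values, ordered left to right.
    """
    window_start = junction - flank
    n_bins = (2 * flank) // bin_size
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for pos, level in meth.items():
        offset = pos - window_start
        if 0 <= offset < 2 * flank:          # position lies inside the window
            b = offset // bin_size
            sums[b] += level
            counts[b] += 1
    # Bins without any covered position get the placeholder value `empty`.
    return [s / c if c else empty for s, c in zip(sums, counts)]


def format_dsq_line(values, label="DSQ"):
    """Tab-separated data line: label followed by the binned values."""
    return "\t".join([label] + [f"{v:g}" for v in values])
```

With the default settings this yields 400 / 20 = 20 values per junction. For large inputs, iterating over every position for every junction would be slow, so a sorted or indexed lookup would be the more practical choice.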

I also found data from a whole-genome BS-Seq experiment that was done on the same spleen sample. The GEO accession is GSM983652. I considered the following possibilities:

  1. If I understand correctly, the provided wig file already contains methylation data. If that is the case, I would like to use the existing methylation data. Is there a tool to extract methylation data from a wig file? As I said before, I need the cytosine methylation levels near splice junctions, and I need them exported in the format shown above.
  2. If option 1 doesn't work, which tool should I use to analyse the provided BS-Seq data? And again: How can I export them in the format shown above?
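For option 1, reading the values out of a plain-text wig file does not necessarily require a dedicated tool. Below is a minimal sketch of a reader for the two standard wiggle layouts (`variableStep` and `fixedStep`), assuming the file is uncompressed text and that its values really are per-position methylation levels, which I have not verified for this accession. The function name `iter_wig` is hypothetical.

```python
def iter_wig(lines):
    """Yield (chrom, position, span, value) from wiggle-format lines.

    Handles 'variableStep' and 'fixedStep' declarations; positions are
    1-based, as in the wiggle format. Track, browser and comment lines
    are skipped.
    """
    chrom = mode = None
    pos = step = None
    span = 1
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if fields[0] in ("variableStep", "fixedStep"):
            mode = fields[0]
            opts = dict(f.split("=", 1) for f in fields[1:])
            chrom = opts["chrom"]
            span = int(opts.get("span", 1))
            if mode == "fixedStep":
                pos = int(opts["start"])
                step = int(opts.get("step", 1))
            continue
        if mode == "variableStep":
            # Data line: "<position> <value>"
            yield chrom, int(fields[0]), span, float(fields[1])
        elif mode == "fixedStep":
            # Data line: "<value>"; position advances by `step`
            yield chrom, pos, span, float(fields[0])
            pos += step
```

Something like `with open("sample.wig") as fh: for chrom, pos, span, value in iter_wig(fh): ...` (the file name is a placeholder) would then collect the values per chromosome for the binning step.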

I hope that somebody can help me with these tasks.

Best regards

thefirstrealace

BS-Seq bed wig DNA-methylation splice-junctions • 2.1k views