I am still new to the field of computational epigenetics, so I need some help with the following task(s):
I study applied bioinformatics, and for my master's thesis I need to compute methylation levels around splice junctions. I need to output them in a format that I have never seen before. I did some research on the format but couldn't find anything about it. It seems to be similar to FASTA, but instead of a sequence (after the header starting with ">"), it provides methylation levels in a tab-separated manner, and I honestly don't know what DSQ stands for. A small part of a methylation track is given below:
DSQ 18.5594 18.5594 18.5594 8.22605 18.5594 31.9349 36.4521 36.4521 33.8659 18.5594 8.22605 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
DSQ 0 0 0 0 0 0 0 0 4.50575 ...
This format is recognized by a newly developed flexible self-organizing map for DNA methylation analysis (or other digitized epigenetic signals). The paper describing the software is freely accessible here. Unfortunately, besides the paper, the authors provide only a 3-page quick-start manual, which doesn't say much about the format shown above. Maybe someone here has seen this format before and can explain its anatomy to me.
What I have done so far:
- I downloaded RNA-Seq runs from a human spleen sample provided by the NIH Roadmap Epigenomics Project. The GEO accession is GSM1010976.
- I used the TopHat splice junction mapper to determine splice junctions, with hg19 as the reference genome.
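For reference, here is a minimal sketch of how the junction positions could be read from TopHat's `junctions.bed` output. It assumes the usual two-block BED12 layout, in which each block is the read overhang on one side of the junction, so the intron lies between the end of the first block and the start of the last one. The function names are hypothetical and coordinates are kept 0-based, half-open as in BED.

```python
def parse_junction_line(line):
    """Return (chrom, intron_start, intron_end, strand) for one BED12 junction line.

    Assumes TopHat-style junctions.bed: two blocks whose sizes are the
    overhangs flanking the intron. Coordinates are 0-based, half-open.
    """
    fields = line.rstrip("\n").split("\t")
    chrom = fields[0]
    chrom_start = int(fields[1])
    chrom_end = int(fields[2])
    strand = fields[5]
    # blockSizes is a comma-separated list, sometimes with a trailing comma
    block_sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
    intron_start = chrom_start + block_sizes[0]   # first intronic base
    intron_end = chrom_end - block_sizes[-1]      # one past the last intronic base
    return chrom, intron_start, intron_end, strand


def read_junctions(path):
    """Yield parsed junctions from a junctions.bed file, skipping the track line."""
    with open(path) as handle:
        for line in handle:
            if not line.strip() or line.startswith("track"):
                continue
            yield parse_junction_line(line)
```

Each junction then gives two positions of interest, `intron_start` and `intron_end`, around which a window can be placed.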
I need to compute:
- The methylation levels in the range from -200 nt to +200 nt, i.e. 200 nt to the left and 200 nt to the right of each splice junction
- I need them in 20 nt intervals. The DSQ values in the example above represent the (normalized?) methylation levels within a 20 nt bin
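The binning described above could be sketched as follows. This is only an illustration under several assumptions: per-base methylation values are available as a dictionary keyed by 0-based position, a bin's value is the mean of the covered positions in it, bins without any data are reported as 0, and an output line is simply `DSQ` followed by tab-separated values as in the example. The real normalization and the full file layout expected by the software are unknown to me, and `bin_means` and `format_dsq_line` are hypothetical names.

```python
def bin_means(values_by_pos, center, flank=200, bin_size=20):
    """Average per-base values in consecutive bins spanning [center-flank, center+flank).

    values_by_pos maps 0-based positions to methylation levels; positions
    without data are ignored, and empty bins are reported as 0.0 (assumption).
    """
    bins = []
    for bin_start in range(center - flank, center + flank, bin_size):
        covered = [
            values_by_pos[pos]
            for pos in range(bin_start, bin_start + bin_size)
            if pos in values_by_pos
        ]
        bins.append(sum(covered) / len(covered) if covered else 0.0)
    return bins


def format_dsq_line(values):
    """Format values as one 'DSQ' line, tab-separated (layout guessed from the example)."""
    return "DSQ\t" + "\t".join(format(v, "g") for v in values)


# Example: data only in the first 20 nt of the window around position 1000
levels = {pos: 18.5594 for pos in range(800, 820)}
print(format_dsq_line(bin_means(levels, 1000)))
```

With the defaults this yields 20 bins per junction side-by-side window (10 upstream and 10 downstream of `center`).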
I also found the data of a whole-genome BS-Seq experiment that was done for the same spleen sample. The GEO accession is GSM983652. I considered the following possibilities:
- If I understand correctly, the provided wig file already contains methylation data. If that is the case, I would like to use the existing methylation data. Is there a tool to extract methylation data from a wig file? As I said before, I need the cytosine methylation levels near splice junctions, and I need them exported in the format shown above.
- If option 1 doesn't work, which tool should I use to analyse the provided BS-Seq data? And again: how can I export the results in the format shown above?
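For the first option, a plain-text wig file can be read without a dedicated tool. Below is a minimal sketch of a parser for the two standard wig layouts (`variableStep` and `fixedStep`), assuming an uncompressed text file rather than bigWig. It converts the 1-based wig positions to 0-based ones and stores every base in a dictionary, which is fine for a small region but would need filtering to the windows of interest for a whole genome. `parse_wig` is a hypothetical name, and whether the values in the GEO file are fractions, percentages, or something else is not something this code can tell.

```python
def parse_wig(lines):
    """Parse wig text lines into {chrom: {0-based position: value}}."""
    data = {}
    mode = chrom = None
    pos = step = None
    span = 1
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        if line.startswith(("variableStep", "fixedStep")):
            # Declaration line, e.g. "fixedStep chrom=chr1 start=101 step=20 span=20"
            parts = line.split()
            mode = parts[0]
            params = dict(p.split("=", 1) for p in parts[1:])
            chrom = params["chrom"]
            span = int(params.get("span", 1))
            if mode == "fixedStep":
                pos = int(params["start"])
                step = int(params.get("step", 1))
            continue
        if mode is None:
            raise ValueError("data line before any step declaration: " + line)
        if mode == "variableStep":
            start_text, value_text = line.split()
            start, value = int(start_text), float(value_text)
        else:
            start, value = pos, float(line)
            pos += step
        track = data.setdefault(chrom, {})
        for offset in range(span):
            track[start - 1 + offset] = value  # wig positions are 1-based
    return data


# Usage (hypothetical file name):
# with open("methylation.wig") as handle:
#     methylation = parse_wig(handle)
```

The resulting per-chromosome dictionaries can then be queried for the positions around each splice junction.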
I hope that somebody can help me with these tasks.