Hi, I'm a bioinformatics intern and I have a data set of differentially methylated genes. For each entry I have the index number, gene name, level of methylation, etc.
I have also been given the transcription start site (TSS) as a coordinate, the strand (+ or -), a CpG ID (the CpG island ID) and a column called 'Dist2CPG', which I believe is the distance to the CpG island.
I don't know whether 'Dist2CPG' is the distance between the CpG island and the TSS, or the distance between the methylation site and the CpG island. I would be grateful if someone could clarify this and explain its significance.
Thank you for your help h.mon!
Do you know if I would be able to find out whether the site is upstream or downstream of the CpG island? I have been given a column titled 'strand' which tells me whether it is on the plus or minus strand. Do I need to use that information, or do I always assume it's downstream of that CpG island?
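To make the question concrete, here is a minimal Python sketch of how I imagine the strand would come into it. It assumes I had the site's coordinate and the island's start and end coordinates on the reference (+) strand, which my table does not give me directly; the function and parameter names are hypothetical. It uses the usual convention that upstream and downstream are defined relative to the direction of transcription, so the answer flips on the minus strand.

```python
def relative_to_island(site_pos, island_start, island_end, strand):
    """Classify a site as upstream of, inside, or downstream of a CpG island.

    All coordinates are assumed to be reference (+ strand) coordinates with
    island_start <= island_end. These inputs are hypothetical; they are not
    columns from the data set described above.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if island_start <= site_pos <= island_end:
        return "inside"
    # True when the site has a smaller coordinate than the whole island.
    site_before_island = site_pos < island_start
    if strand == "+":
        # Plus strand: transcription runs toward larger coordinates,
        # so smaller coordinates are upstream.
        return "upstream" if site_before_island else "downstream"
    # Minus strand: transcription runs toward smaller coordinates,
    # so the relationship is reversed.
    return "downstream" if site_before_island else "upstream"


print(relative_to_island(100, 200, 300, "+"))
print(relative_to_island(100, 200, 300, "-"))
```

If this is the right idea, then the same coordinates would give opposite answers depending on the 'strand' column, so I could not simply assume downstream.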