How to find CpGs within a certain distance from a gene of interest?
2.2 years ago
sswang25 ▴ 20

I am very new to bioinformatics and have never handled DNA methylation data before. I have Illumina EPIC array data and I am looking for specific CpG probes that are within 2.5 megabases of the start of specific genes of interest. I am most interested in CpG probes in the promoter regions of these genes. I want to create a list of these CpG probes based on their location and use only them to test for differentially methylated probes between my two groups of patients.

I have no idea how to do this, so any help would be appreciated.

CpG methylation DNA location genes • 1.8k views

I managed to get the chromosome number and the start position and end position of all of my genes of interest:

[image: table of gene coordinates]

I have the chromosome number and MAPINFO (i.e. the location) of my CpG sites:

[image: table of CpG probe locations]

How do I go about finding out which CpG probes are situated within the start/end positions of the genes I have identified? I can't figure out how to code this selection in R, as the chromosome column has to match and the MAPINFO value, I assume, has to be between the start and end values.

Create two GRanges objects, one for your gene coordinates and one for the probes. Beware: first you need to convert your chromosome columns to "chr1", "chr2", ... and not "1", "2", "3", so that both objects use the same chromosome names. After this:

library(GenomicRanges)
probesRanges = GRanges(seqnames=probes$CHR, ranges=IRanges(start=probes$MAPINFO, end=probes$MAPINFO+1))
# genes$start and genes$end are assumed names for your gene start/end columns
genesRanges = GRanges(seqnames=genes$CHR, ranges=IRanges(start=genes$start, end=genes$end))

Then use the findOverlaps() function from the GenomicRanges package: findOverlaps(probesRanges, genesRanges)
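
To make the overlap step concrete, here is a minimal sketch on toy data. The data frames, probe IDs, gene names and the columns `probe`, `gene`, `start` and `end` are all made up for illustration, so swap in your own. It also assumes that "within 2.5 Mb" can be approximated with the `maxgap` argument of findOverlaps(), which measures the gap from the edge of the gene range rather than strictly from the gene start.

```r
library(GenomicRanges)

# Toy inputs; all IDs, positions and column names are hypothetical.
probes <- data.frame(
  probe   = c("cg_a", "cg_b", "cg_c"),
  CHR     = c("chr1", "chr1", "chr2"),
  MAPINFO = c(1500, 9000000, 500)
)
genes <- data.frame(
  gene  = c("GENE1", "GENE2"),
  CHR   = c("chr1", "chr2"),
  start = c(2000, 400),
  end   = c(5000, 900)
)

# One single-base range per probe, one range per gene.
probesRanges <- GRanges(seqnames = probes$CHR,
                        ranges = IRanges(start = probes$MAPINFO, width = 1))
genesRanges <- GRanges(seqnames = genes$CHR,
                       ranges = IRanges(start = genes$start, end = genes$end))

# maxgap also counts probes lying up to `window` bp outside the gene range.
window <- 2500000L
hits <- findOverlaps(probesRanges, genesRanges, maxgap = window)

# Translate the hit indices back into probe/gene names.
result <- data.frame(
  probe = probes$probe[queryHits(hits)],
  gene  = genes$gene[subjectHits(hits)]
)
print(result)
```

With `maxgap` left at its default, only probes that fall inside a gene range are returned, which matches the plain start/end question above.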

2.2 years ago
Basti ★ 2.0k

You can start by listing the coordinates of the regions you are interested in on genome assembly hg19. Then load the Illumina EPIC annotation from Bioconductor:

library(IlluminaHumanMethylationEPICanno.ilm10b4.hg19)
data("IlluminaHumanMethylationEPICanno.ilm10b4.hg19")
# Locations holds the chromosome and position of each probe; row names are probe IDs
data(Locations)
locEPIC = data.frame(Locations)

Finally, you just have to retrieve the CpGs whose coordinates fall within the list of gene coordinates you identified.
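
That last step can be sketched in base R as follows. This is only an illustration: the toy `locEPIC` below stands in for the real table and assumes it has `chr` and `pos` columns with probe IDs as row names, and the `genes` data frame with its `gene`, `chr` and `start` columns is hypothetical. It keeps probes within 2.5 Mb of each gene's start coordinate; for genes on the minus strand the transcription start is the end coordinate, so adjust accordingly.

```r
# Toy stand-in for locEPIC; probe IDs and positions are made up.
locEPIC <- data.frame(
  chr = c("chr1", "chr1", "chr2", "chr2"),
  pos = c(1500, 9000000, 500, 4000000),
  row.names = c("cg_a", "cg_b", "cg_c", "cg_d"),
  stringsAsFactors = FALSE
)

# Hypothetical genes of interest with their start coordinates.
genes <- data.frame(
  gene  = c("GENE1", "GENE2"),
  chr   = c("chr1", "chr2"),
  start = c(2000, 400),
  stringsAsFactors = FALSE
)

window <- 2.5e6  # 2.5 megabases

# For each gene: same chromosome AND position within `window` of the gene start.
selected <- lapply(seq_len(nrow(genes)), function(i) {
  keep <- locEPIC$chr == genes$chr[i] &
    abs(locEPIC$pos - genes$start[i]) <= window
  rownames(locEPIC)[keep]
})
names(selected) <- genes$gene

# Flat, de-duplicated list of probe IDs to carry into the differential analysis.
probe_ids <- unique(unlist(selected, use.names = FALSE))
print(probe_ids)
```

The resulting `probe_ids` vector can then be used to subset the rows of a beta-value or M-value matrix before testing.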

Hi, thank you very much. So in essence I make a dataframe from the genomic ranges of the areas of the genome I'm interested in. Using the dataframe, I can get the CpG probe names in these regions from the EPIC manifest?

Is there a way to find out which CpG probes are in CpG islands, promoter regions, etc. from the EPIC manifest?

Yes, this is how I would do the selection. For annotation, several sources are available, but I personally use the one that ships with the ChAMP package:

library(ChAMP)
data(probe.features.epic)
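
Once an annotation table is loaded, picking promoter or island probes is a simple filter. The sketch below uses a toy table and assumes the annotation has a `feature` column (with values such as "TSS200", "TSS1500", "Body") and a `cgi` column (with values such as "island", "shore", "opensea"), with probe IDs as row names. Check colnames() and head() on the real object first, since these names and labels are assumptions here, as is treating TSS200/TSS1500 as "promoter".

```r
# Toy annotation table; column names and labels are assumptions (see above).
probe.features <- data.frame(
  feature = c("TSS200", "Body", "TSS1500", "IGR"),
  cgi     = c("island", "opensea", "shore", "opensea"),
  row.names = c("cg_a", "cg_b", "cg_c", "cg_d"),
  stringsAsFactors = FALSE
)

# Probes annotated close to a transcription start site.
promoter_like <- c("TSS200", "TSS1500")
promoter_probes <- rownames(probe.features)[probe.features$feature %in% promoter_like]

# Probes annotated as lying inside a CpG island.
island_probes <- rownames(probe.features)[probe.features$cgi == "island"]

print(promoter_probes)
print(island_probes)
```

Intersecting `promoter_probes` with a location-based probe list, e.g. with intersect(), gives promoter probes near the genes of interest.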

Thanks very much! I will try both of these solutions out.
