Question: getNearestGene in R package FDb.InfiniumMethylation.hg19
0
gravatar for jz6002
4.6 years ago by
jz600210
Canada
jz600210 wrote:

I am using R package FDb.InfiniumMethylation.hg19 try to get the related gene information to some cpg sites. But I found a problem. For example, I have 3 cpg sites: cg10631684, cg13611636, cg15448829. I try to find the nearest gene to them (or which gene they are in). So I used following R code:

library(FDb.InfiniumMethylation.hg19)
hm450=get450k()
cpg = c("cg10631684", "cg13611636","cg15448829")
probes=hm450[cpg]
getNearestGene(probes)

I can get position informtion from object probes

cg10631684 starts at 113795897, ends at 113795898 (chr12).
cg13611636 starts at 40753851, ends at 40753852 (chr8).
cg15448829 starts at 119634634, ends at 119634635 (chr8).

For getNearestGene(probes), I can have

                 queryHits  subjectHits distance    nearestGeneSymbol
cg10631684  1   6463    472 PLBD2
cg13611636  11  21341   0   ZMAT4
cg15448829  10  16004   0   SAMD12-AS1

So cg10631684 is not in any gene, the nearest gene to it is PLBD2. cg13611636 is in the gene ZMAT4. ,cg15448829 is in the gene SAMD12-AS1.

Then, I did something different. I downloaded all the gene information from biomart(http://grch37.ensembl.org/biomart/martview/) create a bed file. I also created a bed file for these 3 cpg sites. I used command intersectBed try to see if these two cpg sites overlapping with any genes. I found:

cg10631684 is in gene SLC8B1. cg13611636 is in the gene ZMAT4. cg15448829 is not in any gene.

So only 1 cpg site has the same result. Other 2 are not consistent. I checked SLC8B1 starts at 113797298 ends at 113736564 (chr12). It seems cg10631684 should be in gene SLC8B1. For gene SAMD12-AS1, I cannot find it in biomart.

So my questions here are:

  1. Is my understanding of function getNearestGene wrong? How come it considers cg10631684 is not in gene SLC8B1?
  2. The biomart doesn't contain all the gene? (I am using hg19, so I used GRCH37)

Thank you.

R • 1.8k views
ADD COMMENTlink written 4.6 years ago by jz600210

A couple things:

  1. You may be using different annotation databases. What annotation does getNearestGene use?
  2. I would try and use the bedtools closest function a try and see what results you obtain with this. Bedtools intersect doesn't seem to be the appropriate method here.
  3. View your regions in IGV, this may answer a couple of questions.
ADD REPLYlink modified 4.6 years ago • written 4.6 years ago by Sinji3.0k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1975 users visited in the last hour