Question: How to snps map gene in R (without R packages)
1
gravatar for mmh7272
12 months ago by
mmh727250
mmh727250 wrote:

Hi I am a beginner bioinformatic i am searching "how to snp mapping gene?" there are so many R packages for example (i.e, biomaRT, SNPlist .. ) but i have a snplist and gene list I wanna SNPs mapping gene list using R so i wanna get result file

Snp list:

Chr POS 
1 2
1 3
2 4
2 7
3 5
3 13
4 13

-gene list-

Chr start end gene id
1 2 3 a
2 3 5 b
2 6 10 c
3 4 10 d
3 11 17 e
4 14 17 f

I wanna get like below result

Chr POS gene id
1 2 a
1 3 a
2 4 b
2 7 c
3 5 d
3 13 e
4 13 Na
snp R gene • 448 views
ADD COMMENTlink modified 12 months ago by zx87549.1k • written 12 months ago by mmh727250
3
gravatar for Kevin Blighe
12 months ago by
Kevin Blighe56k
Kevin Blighe56k wrote:

Hey, on forums like this, it is usually a good idea to provide code that can produce your input data. Then, people will have less time to spend in tackling your problem.

To do what you want, you should use GenomicRanges:

create your data

SNPs <- data.frame(
  Chr = c(1,1,2,2,3,3,4),
  POS = c(2,3,4,7,5,13,13))
SNPs
  Chr POS
1   1   2
2   1   3
3   2   4
4   2   7
5   3   5
6   3  13
7   4  13

genes <- data.frame(
  Chr = c(1,2,2,3,3,4),
  start = c(2,3,6,4,11,14),
  end = c(3,5,10,10,17,17),
  id = c('a','b','c','d','e','f'))
genes
  Chr start end id
1   1     2   3  a
2   2     3   5  b
3   2     6  10  c
4   3     4  10  d
5   3    11  17  e
6   4    14  17  f

convert the data to GRanges objects

require(GenomicRanges)
grSNPs <- makeGRangesFromDataFrame(
  SNPs,
  seqnames.field = 'Chr',
  start.field = 'POS',
  end.field = 'POS',
  keep.extra.columns = TRUE)
grgenes <- makeGRangesFromDataFrame(
  genes,
  seqnames.field = 'Chr',
  start.field = 'start',
  end.field = 'end',
  keep.extra.columns = TRUE)

find overlaps and save the indices that overlapped

qHits <- queryHits(findOverlaps(query = grSNPs, subject = grgenes, type = 'within'))
subHits <- subjectHits(findOverlaps(query = grSNPs, subject = grgenes, type = 'within'))

use indices to produce the original data for the overlaps

overlaps <- data.frame(SNPs[qHits,], genes[subHits,'id'])
colnames(overlaps) <- c('Chr','POS','id')

now find those that didn't overlap

missing <- SNPs[-qHits,]
nmissing <- nrow(missing)
missing <- data.frame(missing, rep(NA, nmissing))
colnames(missing) <- c('Chr','POS','id')

rbind overlapping regions and non-overlapping

finalresult <- rbind(overlaps, missing)
finalresult
  Chr POS   id
1   1   2    a
2   1   3    a
3   2   4    b
4   2   7    c
5   3   5    d
6   3  13    e
7   4  13 <NA>
ADD COMMENTlink modified 12 months ago • written 12 months ago by Kevin Blighe56k
2
gravatar for zx8754
12 months ago by
zx87549.1k
London
zx87549.1k wrote:

Using data.table:

# example data (partly stolen from Kevin's answer)
SNPs <- data.table(
  Chr = c(1,1,2,2,3,3,4),
  start = c(2,3,4,7,5,13,13),
  end = c(2,3,4,7,5,13,13),
  key = c("Chr", "start", "end"))

genes <- data.table(
  Chr = c(1,2,2,3,3,4),
  start = c(2,3,6,4,11,14),
  end = c(3,5,10,10,17,17),
  id = c('a','b','c','d','e','f'),
  key = c("Chr", "start", "end"))

library(data.table)

foverlaps(SNPs, genes)
#    Chr start end   id i.start i.end
# 1:   1     2   3    a       2     2
# 2:   1     2   3    a       3     3
# 3:   2     3   5    b       4     4
# 4:   2     6  10    c       7     7
# 5:   3     4  10    d       5     5
# 6:   3    11  17    e      13    13
# 7:   4    NA  NA <NA>      13    13
ADD COMMENTlink written 12 months ago by zx87549.1k

thank you very~ much for ur answer!

ADD REPLYlink written 12 months ago by mmh727250
1

Please up-vote any answers that have helped.

ADD REPLYlink written 12 months ago by Kevin Blighe56k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1778 users visited in the last hour