How to get RSIDs of indels from GWAS summary statistics
0
0
Entering edit mode
3.2 years ago

Hi,

I have a GWAS summary statistics file with SNPs and indels positions, and I'd like to get their rsIDs. My .txt file has the following format:

CHR        BP             A1           A2
 8        101592213       T            C
 8        106973048       A            G
 8        108690829       T            G
 8        103128181       D            I
 8        104098842       I            D
...         ...           ...        ...

where CHR is the specific chromosome, BP is the chromosome start position of variants, A1 and A2 are reference and alternative alleles. For indels, the .txt file reports D (=deletion) and I(=insertion) symbols for A1 and A2.


Once in R,I created a new dataframe by adding a new column with end positions, which are the same of the start positions. So, the dataframe has the following aspect:

    CHR     chr_start       chr_end                     A1     A2
     8        101592213     101592213                    T     C
     8        106973048     106973048                    A     G
     8        108690829     108690829                    T     G
     8        103128181     103128181                    D     I               
     8        104098842     104098842                    I     D 
 ...           ...           ...                        ...   ...

Then, I run the following commands in R:

###commands to install biomaRt
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("biomaRt")

#loading biomaRt
library(biomaRt)

#creating mart object to search in ensembl
snp_mart <- useMart(biomart="ENSEMBL_MART_SNP", host="grch37.ensembl.org", 
                    path="/biomart/martservice", dataset="hsapiens_snp")


#creating initial storage vector
results <- c()

###loop for getting rsIDs
for (snp in 1:dim(ds1)[1]) { 
  ###ds1 is the object name of the SNP file


  temp <- getBM(attributes = c('refsnp_id', 'allele', 'chrom_start'), 
                filters = c('chr_name', 'start', 'end'), 
                values = list(ds1[snp,1], ds1[snp, 2], ds1[snp, 3]), 
                mart = snp_mart)

  ds1[snp, 4] <- snp 
  ds1[snp, 5] <- temp[snp, 1]
  # Store SNP file index row next to each answer

  results <- rbind(results, temp)

}

The output of this script is the following:

refsnp_id         allele          chrom_start
rs62513865         C/T          101592213
rs79643588         G/A          106973048
rs17396518        T/C/G         108690829
...                 ...             ...

This script seems that does not work for indels of my .txt file. I'm supposing it does not work since I created the end positions column, in which end positions are the same of start positions also for indels, and this should not be correct. However, I did it because I found as possible filters for my getBM function 'chr_name', 'start' and 'end', where 'end' is known for SNPs, since end and start positions are identical, but I have no further information about indels position, apart from start positions.


Could someone help me in finding rsIDs for indels by improving getBM function in R, or by using a different strategy?

Thank u!!!

R gene snp • 1.0k views
ADD COMMENT

Login before adding your answer.

Traffic: 2108 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6