Question: SNP annotation with R and SnpEff
0
gravatar for wpierrick
16 months ago by
wpierrick60
wpierrick60 wrote:

Hello everyone, My question comes in 2 part.

First, I have a R data frame of 1.3M human SNPs (all with a RS number) and would like to:

  • Update their genomic position (the position I have is based on an old hapmap reference, it would be nice to have it with Hg38 coordinates)
  • Annotate their effect (as in the CONTEXT column in the GWAS catalog, including for example : 3_prime_UTR_variant 5_prime_UTR_variant, downstream_gene_variant, intergenic_variant, intron_variant, missense_variant, non_coding_transcript_exon_variant , regulatory_region_variant and a few other categories)

I looked at many R packages, including rsnps and BSgenome, but none of them was able to extract dbSNP information (only rsnps could extract their position but on a very small subset). I am aware SnpEff could (maybe, not sure) work with rsID but I would like to stick with R for that part.


Second question, I am using SnpEff on a vcf-type file of SNP list, and I would like to get the annotation on only the coding variants and for them whether if it is Synonymous or Missense/LoF for the non synonymous.

I ran SnpEff and got a shitload of extra information and it's hard to make sense of it. So I used the filters -no-downstream -no-intergenic -no-intron -no-upstream -no-utr -no EffectType (low). Fore some reason, the output still contained a lot of the stuff I wasn't interested about, apparently the filters weren't the good ones.

I anybody can help me with those questions, that would be great! I am starting to run a bit dry on the answers Google can tell me.

Many thanks,

snp • 1.2k views
ADD COMMENTlink written 16 months ago by wpierrick60
1

R code:

rsid=c("rs123","rs150")
library(biomaRt)
ensembl_snp=useMart("ENSEMBL_MART_SNP", dataset="hsapiens_snp")
getBM(attributes=c("refsnp_source",'refsnp_id','chr_name','chrom_start','chrom_end',"consequence_type_tv", "clinical_significance"), filters = 'snp_filter', values = rsid,mart = ensembl_snp)

output:

> getBM(attributes=c("refsnp_source",'refsnp_id','chr_name','chrom_start','chrom_end',"consequence_type_tv", "clinical_significance"), filters = 'snp_filter', values = rsid,mart = ensembl_snp)

 refsnp_source refsnp_id chr_name chrom_start chrom_end consequence_type_tv
1         dbSNP     rs123        7    24926827  24926827      intron_variant
2         dbSNP     rs150        7    24971033  24971033      intron_variant
  clinical_significance
1                    NA
2                    NA

coordinates seem to match with dbSNP records.

ADD REPLYlink modified 16 months ago • written 16 months ago by cpad011211k

Thanks a lot!!! That worked well and made my day. It is weird that I have multiple entries for each SNP effect, I guess it depends on versions of the genome/multiple adds to the dbSNP database, but I'll clean that up and keep only one of the many predicted effects. Thanks a lot again.

ADD REPLYlink written 16 months ago by wpierrick60
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 793 users visited in the last hour