Question: SNP annotation with R and SnpEff
2
gravatar for wpierrick
21 months ago by
wpierrick80
wpierrick80 wrote:

Hello everyone,

My question comes in 2 part.

First, I have a R data frame of 1.3M human SNPs (all with a RS number) and would like to:

  • Update their genomic position (the position I have is based on an old hapmap reference, it would be nice to have it with Hg38 coordinates)
  • Annotate their effect (as in the CONTEXT column in the GWAS catalog, including for example : 3_prime_UTR_variant 5_prime_UTR_variant, downstream_gene_variant, intergenic_variant, intron_variant, missense_variant, non_coding_transcript_exon_variant , regulatory_region_variant and a few other categories)

I looked at many R packages, including rsnps and BSgenome, but none of them was able to extract dbSNP information (only rsnps could extract their position but on a very small subset). I am aware SnpEff could (maybe, not sure) work with rsID but I would like to stick with R for that part.


Second question, I am using SnpEff on a vcf-type file of SNP list, and I would like to get the annotation on only the coding variants and for them whether if it is Synonymous or Missense/LoF for the non synonymous.

I ran SnpEff and got a shitload of extra information and it's hard to make sense of it. So I used the filters -no-downstream -no-intergenic -no-intron -no-upstream -no-utr -no EffectType (low). Fore some reason, the output still contained a lot of the stuff I wasn't interested about, apparently the filters weren't the good ones.

I anybody can help me with those questions, that would be great! I am starting to run a bit dry on the answers Google can tell me.

Many thanks,

snp snpeff R • 1.6k views
ADD COMMENTlink modified 3 months ago by zx87547.9k • written 21 months ago by wpierrick80

Did you find a solution? I have to convert 8milion rs# to CHR:position, and I'm looking for a solution.

ADD REPLYlink written 11 weeks ago by bigfoot0
2
gravatar for cpad0112
21 months ago by
cpad011211k
India
cpad011211k wrote:

R code:

rsid=c("rs123","rs150")
library(biomaRt)
ensembl_snp=useMart("ENSEMBL_MART_SNP", dataset="hsapiens_snp")
getBM(attributes=c("refsnp_source",'refsnp_id','chr_name','chrom_start','chrom_end',"consequence_type_tv", "clinical_significance"), filters = 'snp_filter', values = rsid,mart = ensembl_snp)

output:

> getBM(attributes=c("refsnp_source",'refsnp_id','chr_name','chrom_start','chrom_end',"consequence_type_tv", "clinical_significance"), filters = 'snp_filter', values = rsid,mart = ensembl_snp)

 refsnp_source refsnp_id chr_name chrom_start chrom_end consequence_type_tv
1         dbSNP     rs123        7    24926827  24926827      intron_variant
2         dbSNP     rs150        7    24971033  24971033      intron_variant
  clinical_significance
1                    NA
2                    NA

coordinates seem to match with dbSNP records.

ADD COMMENTlink modified 21 months ago • written 21 months ago by cpad011211k

Thanks a lot!!! That worked well and made my day. It is weird that I have multiple entries for each SNP effect, I guess it depends on versions of the genome/multiple adds to the dbSNP database, but I'll clean that up and keep only one of the many predicted effects. Thanks a lot again.

ADD REPLYlink written 21 months ago by wpierrick80

No. It doesn't depend on the genome/multiple adds to the dbSNP database. Calculated effect is dependent on the gene and number of transcripts at the SNP position. Multiple effects for a single gene is expected if the gene has multiple transcripts and/or the position has overlapping genes. @ wpierrick

ADD REPLYlink written 11 weeks ago by cpad011211k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1727 users visited in the last hour