My question comes in two parts.
First, I have an R data frame of 1.3M human SNPs (all with an rsID) and would like to:
- Update their genomic positions (the positions I have are based on an old HapMap reference; it would be nice to have them in hg38 coordinates)
- Annotate their effect (as in the CONTEXT column of the GWAS Catalog, which includes, for example, 3_prime_UTR_variant, 5_prime_UTR_variant, downstream_gene_variant, intergenic_variant, intron_variant, missense_variant, non_coding_transcript_exon_variant, regulatory_region_variant and a few other categories)
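One possible route for both tasks is the Ensembl REST API, sketched here in Python even though the question asks for R. As far as I know, a POST to `https://rest.ensembl.org/variation/homo_sapiens` with a JSON body `{"ids": [...]}` returns, for each rsID, its `mappings` (each with `assembly_name`, `seq_region_name` and `start`) and a `most_severe_consequence` given as a Sequence Ontology term, the same kind of term as the CONTEXT values listed above. The batch size of 200 and the helper names (`chunks`, `fetch_batch`, `parse_response`) are my own assumptions, not part of any package; the same HTTP call can be made from R with `httr::POST` and `jsonlite`.

```python
import json
import urllib.request

# Ensembl REST endpoint for batch variant lookups (POST with {"ids": [...]}).
ENSEMBL_VARIATION_URL = "https://rest.ensembl.org/variation/homo_sapiens"
# Assumed per-request id limit; check the current Ensembl REST documentation.
BATCH_SIZE = 200


def chunks(ids, size=BATCH_SIZE):
    """Yield successive slices of at most `size` rsIDs."""
    for i in range(0, len(ids), size):
        yield ids[i:i + size]


def fetch_batch(ids):
    """POST one batch of rsIDs to Ensembl and return the decoded JSON dict."""
    body = json.dumps({"ids": list(ids)}).encode("utf-8")
    req = urllib.request.Request(
        ENSEMBL_VARIATION_URL,
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def parse_response(response, assembly="GRCh38"):
    """Turn an Ensembl variation response into (rsid, chrom, pos, consequence) rows.

    Only mappings on `assembly` are kept; rsIDs missing from the response
    (e.g. not found by Ensembl) simply produce no rows.
    """
    rows = []
    for rsid, record in response.items():
        consequence = record.get("most_severe_consequence")
        for mapping in record.get("mappings", []):
            if mapping.get("assembly_name") == assembly:
                rows.append(
                    (rsid, mapping["seq_region_name"], mapping["start"], consequence)
                )
    return rows
```

For 1.3M rsIDs that is about 6,500 requests, so a loop such as `for batch in chunks(rsids): rows.extend(parse_response(fetch_batch(batch)))` with a short `time.sleep` between calls should help stay within Ensembl's rate limits. Keeping the network call and the parsing in separate functions makes it easy to cache raw responses and re-parse them without re-querying.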
I looked at many R packages, including rsnps and BSgenome, but none of them could extract dbSNP information (rsnps could retrieve positions, but only for a very small subset). I am aware that SnpEff might work with rsIDs (I am not sure), but I would prefer to stick with R for this part.
Second, I am running SnpEff on a VCF file of my SNP list, and I would like annotations for the coding variants only and, for each of them, whether it is synonymous or, if non-synonymous, missense/LoF.
I ran SnpEff and got a huge amount of extra information that is hard to make sense of. So I used the filters -no-downstream -no-intergenic -no-intron -no-upstream -no-utr -no EffectType (low). For some reason, the output still contained a lot of annotations I wasn't interested in; apparently these weren't the right filters.
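Rather than relying only on the `-no-*` options, one alternative is to keep SnpEff's full output and post-filter the `ANN` INFO field, whose pipe-separated entries start with Allele, Annotation (effect), Annotation_Impact, Gene_Name, Gene_ID, Feature_Type, Feature_ID. SnpEff writes one entry per transcript and joins multiple effects with `&`, so the sketch below splits on both. The grouping of effect terms into Synonymous / Missense / LoF is my own assumption for SNPs and should be adjusted; the function names are hypothetical.

```python
# Assumed grouping of SnpEff (Sequence Ontology) effect terms for SNPs.
SYNONYMOUS = {"synonymous_variant", "stop_retained_variant", "start_retained_variant"}
MISSENSE = {"missense_variant"}
LOF = {
    "stop_gained",
    "stop_lost",
    "start_lost",
    "frameshift_variant",
    "splice_acceptor_variant",
    "splice_donor_variant",
}


def classify_effect(effect):
    """Map one (possibly '&'-joined) SnpEff effect string to a bucket, or None."""
    terms = set(effect.split("&"))
    if terms & LOF:
        return "LoF"
    if terms & MISSENSE:
        return "Missense"
    if terms & SYNONYMOUS:
        return "Synonymous"
    return None  # non-coding or otherwise not of interest


def coding_annotations(vcf_line):
    """Return (gene, feature_id, effect, bucket) tuples for one VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    info = fields[7]  # INFO is the 8th VCF column
    ann = next((kv[4:] for kv in info.split(";") if kv.startswith("ANN=")), None)
    if ann is None:
        return []
    results = []
    for entry in ann.split(","):  # one entry per allele/transcript
        parts = entry.split("|")
        effect, gene, feature_id = parts[1], parts[3], parts[6]
        bucket = classify_effect(effect)
        if bucket is not None:
            results.append((gene, feature_id, effect, bucket))
    return results
```

To apply it to a whole file, iterate over the lines that do not start with `#` and collect the results. SnpSift's `filter` command, which ships with SnpEff, can also select on `ANN` fields; its documentation gives the exact expression syntax.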
If anybody can help me with these questions, that would be great! I am starting to run out of answers Google can give me.