Hello, I have imputed and annotated a VCF file, and now I want to keep only the exonic SNPs. How can I find out whether a gene name (or rs ID, or position) is exonic?
How did you "annotate" it? If you ran a programme like the VEP then the information you need is in the variant consequences.
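For example, if VEP was run with VCF output, each record carries a CSQ INFO field whose sub-field order is given in the `##INFO=<ID=CSQ,...>` header ("Format: Allele|Consequence|IMPACT|..."). Below is a minimal R sketch that checks whether any transcript block in one CSQ value has an exonic consequence. The function name `is_exonic_csq` and the choice of which Sequence Ontology terms count as "exonic" are my own assumptions, not part of VEP; adjust the term lists to your definition (e.g. drop the UTR terms if you only want protein-coding changes).

```r
# SO consequence terms treated as exonic here (assumption: edit to taste).
coding_terms <- c(
  "missense_variant", "synonymous_variant", "stop_gained", "stop_lost",
  "start_lost", "stop_retained_variant", "start_retained_variant",
  "frameshift_variant", "inframe_insertion", "inframe_deletion",
  "protein_altering_variant", "incomplete_terminal_codon_variant",
  "coding_sequence_variant")
other_exonic_terms <- c("5_prime_UTR_variant", "3_prime_UTR_variant",
  "non_coding_transcript_exon_variant")

# csq:        one variant's CSQ value (transcript blocks separated by ",",
#             sub-fields by "|", multiple consequences by "&")
# csq_format: sub-field names copied from the CSQ header's "Format:" string
is_exonic_csq <- function(csq, csq_format,
                          terms = c(coding_terms, other_exonic_terms)) {
  cons_idx <- match("Consequence", csq_format)
  if (is.na(cons_idx)) stop("no 'Consequence' field in the CSQ format")
  blocks <- strsplit(csq, ",", fixed = TRUE)[[1]]
  # Collect every consequence term from every transcript block
  consequences <- unlist(lapply(blocks, function(b) {
    fields <- strsplit(b, "|", fixed = TRUE)[[1]]
    strsplit(fields[cons_idx], "&", fixed = TRUE)[[1]]
  }))
  any(consequences %in% terms)
}

# Toy usage with a shortened, hypothetical CSQ format
fmt <- c("Allele", "Consequence", "IMPACT", "SYMBOL")
is_exonic_csq("A|missense_variant|MODERATE|GENE1", fmt)
# [1] TRUE
is_exonic_csq(paste0("T|intron_variant&NMD_transcript_variant|MODIFIER|GENE2,",
                     "T|upstream_gene_variant|MODIFIER|GENE3"), fmt)
# [1] FALSE
```

This uses the loose rule "exonic in at least one transcript"; if you need the variant to be exonic in every transcript, replace `any()` with a per-block check.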
Get the coordinates of the rs ID (or of the gene) and intersect them with an annotation file (GTF). Please use the search function; this has been asked many times before.
A: how to get intronic and intergenic sequences based on gff file?
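A minimal base-R sketch of that intersection, assuming a GTF whose exon rows have "exon" in column 3 and 1-based, inclusive start/end in columns 4 and 5 (the GTF convention; VCF POS is also 1-based, so no offset is needed). The helper names `read_gtf_exons` and `is_exonic_pos` are hypothetical. Chromosome names ("1" vs "chr1") and the genome build (GRCh37 vs GRCh38) must match between your VCF and the GTF. For a whole VCF, `bedtools intersect` or `GenomicRanges::findOverlaps` will be much faster than this per-variant loop.

```r
# Read exon coordinates from a GTF file (tab-separated, "#" comment lines)
read_gtf_exons <- function(path) {
  gtf <- read.delim(path, header = FALSE, comment.char = "#", quote = "",
                    colClasses = "character")
  keep <- gtf$V3 == "exon"
  data.frame(chr = gtf$V1[keep],
             start = as.integer(gtf$V4[keep]),
             end = as.integer(gtf$V5[keep]),
             stringsAsFactors = FALSE)
}

# TRUE for each (chr, pos) that falls inside at least one exon
is_exonic_pos <- function(chr, pos, exons) {
  vapply(seq_along(pos), function(i) {
    ex <- exons[exons$chr == chr[i], ]
    any(ex$start <= pos[i] & pos[i] <= ex$end)
  }, logical(1))
}

# Toy usage with made-up exon coordinates
exons <- data.frame(chr = c("1", "1", "2"), start = c(100L, 500L, 50L),
                    end = c(200L, 600L, 80L), stringsAsFactors = FALSE)
is_exonic_pos(c("1", "1", "2", "3"), c(150, 300, 80, 10), exons)
# [1]  TRUE FALSE  TRUE FALSE
```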
Another answer via biomaRt:
snps <- c("rs6025", "rs424964", "rs199473684")
library(biomaRt)
# Connect to the Ensembl variation (SNP) mart for human
ensembl <- useMart("ENSEMBL_MART_SNP", dataset = "hsapiens_snp")
# Look up the rs IDs; trim `attributes` down to what you actually need
out <- getBM(
  attributes = c("refsnp_id", "chr_name", "chrom_start", "chrom_end",
    "allele", "mapweight", "validated", "allele_1", "minor_allele",
    "minor_allele_freq", "minor_allele_count", "clinical_significance",
    "synonym_name", "ensembl_gene_stable_id", "consequence_type_tv"),
  filters = "snp_filter",
  values = snps,
  mart = ensembl,
  uniqueRows = TRUE)
This will return a lot of information, some of which you don't need for your purpose (so remove what you don't need from the `attributes` parameter). You can infer whether an rs ID is exonic in various ways, one being the final column, consequence_type_tv:
unique(out[,c("refsnp_id","ensembl_gene_stable_id", "consequence_type_tv")])
     refsnp_id ensembl_gene_stable_id           consequence_type_tv
1  rs199473684        ENSG00000257529                intron_variant
2  rs199473684        ENSG00000102393           3_prime_UTR_variant
3  rs199473684        ENSG00000102393        NMD_transcript_variant
4  rs199473684        ENSG00000102393                intron_variant
5  rs199473684        ENSG00000102393 non_coding_transcript_variant
6  rs199473684                LRG_672                intron_variant
43    rs424964        ENSG00000257636 non_coding_transcript_variant
44    rs424964        ENSG00000257636                intron_variant
49      rs6025        ENSG00000198734              missense_variant
50      rs6025                LRG_553              missense_variant
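To turn that table into a yes/no per rs ID, one option is to match consequence_type_tv against a list of Sequence Ontology terms that lie within exons. A minimal sketch, assuming `out` comes from the getBM() call above; here it is rebuilt from the rows printed above so the sketch runs offline (skip that part when you have the real `out`). Which terms count as "exonic" is a judgement call; the split into `coding_terms` and `other_exonic_terms` is my own assumption.

```r
# Offline stand-in for `out`, copied from the rows printed above
out <- data.frame(
  refsnp_id = rep(c("rs199473684", "rs424964", "rs6025"), times = c(6, 2, 2)),
  ensembl_gene_stable_id = c("ENSG00000257529", rep("ENSG00000102393", 4),
    "LRG_672", rep("ENSG00000257636", 2), "ENSG00000198734", "LRG_553"),
  consequence_type_tv = c("intron_variant", "3_prime_UTR_variant",
    "NMD_transcript_variant", "intron_variant", "non_coding_transcript_variant",
    "intron_variant", "non_coding_transcript_variant", "intron_variant",
    "missense_variant", "missense_variant"),
  stringsAsFactors = FALSE)

coding_terms <- c("missense_variant", "synonymous_variant", "stop_gained",
  "stop_lost", "start_lost", "frameshift_variant", "inframe_insertion",
  "inframe_deletion", "protein_altering_variant", "coding_sequence_variant")
other_exonic_terms <- c("5_prime_UTR_variant", "3_prime_UTR_variant",
  "non_coding_transcript_exon_variant")

# rs IDs with at least one exonic consequence (coding or UTR/non-coding exon)
exonic_any <- unique(out$refsnp_id[
  out$consequence_type_tv %in% c(coding_terms, other_exonic_terms)])
# rs IDs with at least one protein-coding consequence
coding_only <- unique(out$refsnp_id[out$consequence_type_tv %in% coding_terms])

exonic_any   # [1] "rs199473684" "rs6025"
coding_only  # [1] "rs6025"
```

Note that rs199473684 is a 3' UTR variant in one ENSG00000102393 transcript but intronic in others, so "exonic in any row" is the loose definition; non_coding_transcript_variant on its own does not imply an exon, which is why rs424964 is not flagged.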
See also: A: How to retrieve Gene name from SNP ID using biomaRt
Kevin
Thank you for all your suggestions; I will go through them.