Question: How to convert SNP genome positions to variant identifiers and genome annotations
Tim wrote (4.4 years ago):

Hi Biostars,

I would like to learn how to convert genome positions (e.g., Chr6: 467841) into other useful identifiers and annotations. For example, I use vcftools to extract only the SNPs in ".012" format, which also writes the site locations (i.e., genome positions) to a ".012.pos" file. I use the following command:

vcftools --vcf xxx.vcf --out SNP --remove-indels --012

This creates "SNP.012", which contains only the values 0, 1, and 2, and "SNP.012.pos", which contains the site locations, like:

Chr1    2673
Chr1    2695
Chr1    2696 
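As a starting point for any join, the ".012.pos" file can be read into a list of (chromosome, position) keys. The following is a minimal Python sketch rather than a Bioconductor solution; the function name `read_positions` is hypothetical, and it assumes only the two whitespace-separated columns shown above.

```python
import io

def read_positions(handle):
    """Parse a '.012.pos' file: one 'chromosome position' pair per line."""
    sites = []
    for line in handle:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        sites.append((fields[0], int(fields[1])))
    return sites

# The three example sites from the post, supplied as an in-memory file.
example = io.StringIO("Chr1\t2673\nChr1\t2695\nChr1\t2696\n")
sites = read_positions(example)
print(sites)
```

With a real file, `open("SNP.012.pos")` would take the place of the `io.StringIO` object.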

I would like to match these site locations (i.e., genome positions) to variant identifiers and genome annotations. I have had some success loading a gff3 file (e.g., a genome annotation downloaded from NCBI) and doing left/right joins in R, but that seems somewhat ad hoc. I tried Bioconductor packages (GenomicRanges, GenomicFeatures, biomaRt) but couldn't find an efficient, fast, best-practice approach. FYI, I prefer working in R/Bioconductor.




harold.smith.tarheel (United States) wrote (4.4 years ago):

Why not run one of the available variant annotation tools, such as ANNOVAR or SnpEff, on the original VCF? They report each variant relative to known features and also classify mutations in coding sequences (synonymous, missense, nonsense, splicing), which is impossible from your SNP.012.pos file because it lacks the nucleotide change. You can always filter the output to keep only SNPs.
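If the position list is still needed downstream, the variant identifier and the nucleotide change can be recovered by joining the positions back to the original VCF on (CHROM, POS). The sketch below is only an illustration in Python: it relies on the standard fixed VCF columns (CHROM, POS, ID, REF, ALT), the helper name `index_vcf` is hypothetical, and the toy records are made up. It also assumes one record per position; if a position occurs more than once, the last record wins.

```python
import io

def index_vcf(handle):
    """Map (chrom, pos) -> (id, ref, alt) using the first five VCF columns."""
    index = {}
    for line in handle:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete VCF record
        chrom, pos, variant_id, ref, alt = fields[:5]
        index[(chrom, int(pos))] = (variant_id, ref, alt)
    return index

# Toy VCF content (made-up records, for illustration only).
vcf_text = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "Chr1\t2673\tsnp_example_1\tA\tG\t.\tPASS\t.\n"
    "Chr1\t2695\t.\tC\tT\t.\tPASS\t.\n"
)
index = index_vcf(io.StringIO(vcf_text))

# Sites as they would appear in the '.012.pos' file.
sites = [("Chr1", 2673), ("Chr1", 2695), ("Chr1", 2696)]
missing = (None, None, None)
annotated = [(c, p) + index.get((c, p), missing) for c, p in sites]
for row in annotated:
    print(row)
```

An ID of "." means the VCF itself carries no identifier for that site, so an external resource would still be needed to name it.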


I had to analyze the genotype matrix ("012" format) in R and identify the "important" SNPs. I feel there must be a straightforward way of going from a site location (genome position) to variant identifiers, gene IDs, and/or known annotations. In other words, given a list of site locations (like Chr1 2673), what's the best way of getting annotations from RefSeq, Ensembl, and similar sources (downloaded in gff3 or gtf format, or accessed via an API)? Any help would be appreciated!

Thanks for the great suggestions. I will look more into ANNOVAR and SnpEff.
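For the GFF3 route described in this reply, the underlying operation is an overlap lookup: for each site, find the features whose 1-based, inclusive [start, end] interval contains the position. The following is a minimal Python sketch of that idea, not a recommended pipeline; the function names and the toy annotation are hypothetical. It assumes the chromosome names in the annotation match those in the position file, which may not hold for a downloaded annotation, in which case a renaming step would be needed first. In R, `findOverlaps` from GenomicRanges performs the analogous lookup.

```python
import bisect
import io
from collections import defaultdict

def load_features(handle, feature_type="gene"):
    """Read GFF3 rows of one type into per-chromosome lists sorted by start."""
    by_chrom = defaultdict(list)
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        # Column 9 holds 'key=value' pairs separated by ';'.
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        by_chrom[cols[0]].append((int(cols[3]), int(cols[4]), attrs.get("ID", "")))
    for feats in by_chrom.values():
        feats.sort()
    return by_chrom

def features_at(by_chrom, chrom, pos):
    """Return IDs of features whose [start, end] interval contains pos."""
    feats = by_chrom.get(chrom, [])
    # Only features with start <= pos can contain pos.
    hi = bisect.bisect_right(feats, (pos, float("inf"), ""))
    return [fid for start, end, fid in feats[:hi] if end >= pos]

# Toy annotation (made-up features, for illustration only).
gff_text = (
    "##gff-version 3\n"
    "Chr1\ttoy\tgene\t2000\t2690\t.\t+\t.\tID=geneA\n"
    "Chr1\ttoy\tgene\t2680\t3000\t.\t-\t.\tID=geneB\n"
    "Chr1\ttoy\tmRNA\t2000\t2690\t.\t+\t.\tID=mrnaA;Parent=geneA\n"
)
genes = load_features(io.StringIO(gff_text))
for chrom, pos in [("Chr1", 2673), ("Chr1", 2695), ("Chr1", 2696)]:
    print(chrom, pos, features_at(genes, chrom, pos))
```

The candidate scan is linear in the number of features that start at or before the site, which is fine for a sketch; a large annotation would be better served by an interval tree or by the range-based tools already mentioned in the thread.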

Reply written 4.4 years ago by Tim