Mapping named variants to rsIDs, coordinates, or haplotypes?
7.8 years ago
Rm 8.2k

I'm looking for a tool that maps variants (say, from a VCF) to "named variants" (for example, UGT1A1*28 or DPYD*3), or that maps named variants to variant coordinates or rsIDs.

Biobase PGMD supports this, but I believe it requires a subscription. Similar question: Which Databases Carry Named Gene Variants Like Apoe4

vcf variants • 2.8k views
7.8 years ago
Ying W ★ 4.2k

Several variant annotation tools will identify variants in genes and also annotate known variants (from dbSNP, with rsIDs). These tools include SnpEff, ANNOVAR, and VEP. The rsID is just a dbSNP identifier for a variant, assigned by NIH; rare variants often won't have one.


@Ying: My question is more specific: how do I link an rsID (rs8175347, or 2:234668881-234668882) to the "named variant" UGT1A1*28?


Your "named variant" is not a standard notation that I am aware of.

VEP (mentioned above) will produce HGVS notations from either rsIDs or coordinates+alleles, but you have a little work to do.

Here's an example command using the VEP command line tool:

echo "rs8175347" | perl variant_effect_predictor.pl -cache -force -hgvs -symbol -o stdout -fields SYMBOL,HGVSc -pick | grep -v '^##'
#SYMBOL HGVSc
UGT1A6  ENST00000305139.9:c.862-6799_862-6798[7]TA


I've used --pick to choose one of the many alternative transcripts of UGT1A6. Note that the HGVS notation is given relative to a specific transcript so that the variant can be resolved unambiguously; notations given against a gene name alone are difficult to compare across different systems or transcript sets.
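If it helps, here is a minimal Python sketch of splitting that kind of two-column output into gene symbol, transcript, and the change relative to that transcript. The function and class names (parse_vep_rows, HgvsRow) are made up for illustration and are not part of VEP; the sketch assumes whitespace-separated SYMBOL and HGVSc columns, as in the output above.

```python
from typing import Iterable, Iterator, NamedTuple


class HgvsRow(NamedTuple):
    """One parsed row: gene symbol, transcript ID, and change on that transcript."""
    symbol: str
    transcript: str
    change: str


def parse_vep_rows(lines: Iterable[str]) -> Iterator[HgvsRow]:
    """Yield parsed rows from 'SYMBOL HGVSc' output, skipping headers and blanks."""
    for line in lines:
        line = line.strip()
        # Header lines (both '##' metadata and the '#SYMBOL' column line) start with '#'.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 2:
            continue
        symbol, hgvsc = fields
        # HGVSc has the form '<transcript>:<change>'; rows without ':' are skipped.
        transcript, sep, change = hgvsc.partition(":")
        if not sep:
            continue
        yield HgvsRow(symbol, transcript, change)


example = [
    "#SYMBOL HGVSc",
    "UGT1A6  ENST00000305139.9:c.862-6799_862-6798[7]TA",
]
rows = list(parse_vep_rows(example))
print(rows[0])
```

Keeping the transcript ID as its own field makes it harder to accidentally compare changes that were described against different transcripts.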


The [GeneName]*[Number] format that the OP refers to as a "named variant" is the way that different alleles of a gene are designated. An allele can be a combination of specific variants/rsIDs. For example, the entry for DPYD*3 can be found here, and if you click on "download translation table" or on the entry itself, you will find that allele *3 is defined by having 1897delG. After some googling, I found that this application might be able to get you the info you are looking for: it has DPYD*3's haplotype ID and rs numbers in a plain-text database, although it does not have info on UGT1A1*28. There also seems to be a limited number of genes with "named variants", so maybe it's possible to just download them all somehow.
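To make the idea concrete, here is a rough Python sketch of a hand-curated lookup table from star alleles to their defining variants. It contains only the two alleles mentioned in this thread; the names (STAR_ALLELES, Definition, lookup) are hypothetical, the UGT1A1*28 entry uses the rsID and region quoted by the OP (genome build not stated), and a real table would need to be filled from a curated source such as the translation tables mentioned above.

```python
from typing import NamedTuple, Optional


class Definition(NamedTuple):
    """What is known about the variant that defines a star allele."""
    rsid: Optional[str]    # dbSNP identifier, if one was given
    region: Optional[str]  # 'chrom:start-end' as quoted in the thread (build not stated)
    change: Optional[str]  # change as written in the source table


# Hand-curated from this thread only; extend from a curated source in practice.
STAR_ALLELES = {
    ("UGT1A1", "28"): Definition("rs8175347", "2:234668881-234668882", None),
    ("DPYD", "3"): Definition(None, None, "1897delG"),
}


def lookup(name: str) -> Optional[Definition]:
    """Return the definition for a name like 'UGT1A1*28', or None if unknown."""
    gene, sep, allele = name.partition("*")
    if not sep:
        raise ValueError(f"not a star-allele name: {name!r}")
    return STAR_ALLELES.get((gene.strip(), allele.strip()))


print(lookup("UGT1A1*28"))
print(lookup("DPYD*3"))
```

Since an allele can be defined by several variants, a fuller version would map each key to a list of definitions rather than a single one.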


Thanks @Ying W: Yeah, I am manually linking the named variants from PharmGKB. Let me explore the tool you suggested.


Thanks @EnsemblWill: My inputs are a list of named variants like UGT1A1*28, which I then need to map to rsIDs or coordinates...


Ah OK, sorry, I misunderstood the direction of the conversion.

VEP can also take HGVS as input, but this named variant does not look like an HGVS name to me. Anyone know what this convention is and how it might be parsed?
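On the parsing side, a minimal Python sketch is below, assuming the names follow the [GeneName]*[Number] pattern described earlier in the thread. The regular expression and the function name parse_star_allele are my own illustration, not an established grammar; allowing one trailing letter after the number (as in a name like DPYD*2A) is an extra assumption.

```python
import re
from typing import Optional, Tuple

# Assumed pattern: gene symbol, '*', allele number, optional single letter suffix.
STAR_RE = re.compile(
    r"\s*(?P<gene>[A-Za-z][A-Za-z0-9-]*)\s*\*\s*(?P<allele>\d+[A-Za-z]?)\s*"
)


def parse_star_allele(text: str) -> Optional[Tuple[str, str]]:
    """Split 'UGT1A1*28' into ('UGT1A1', '28'); return None if it does not match."""
    match = STAR_RE.fullmatch(text)
    if match is None:
        return None
    return match.group("gene"), match.group("allele")


for name in ["UGT1A1*28", "DPYD*3", "rs8175347"]:
    print(name, "->", parse_star_allele(name))
```

This only splits the name into gene and allele; turning that pair into an rsID or coordinates still needs a lookup against a curated table.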