I am attempting to insert variant modifications observed by Annovar into protein sequences that I retrieved from the UCSC file knownGeneTxPep. The variant positions, relative to the start of each transcript, were taken from Annovar. Here is my question:
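As a rough illustration of "inserting" a variant into a protein sequence, here is a minimal sketch that applies a single amino-acid substitution written in p. notation (e.g. `p.L1253P`). It assumes positions are 1-based and that only simple missense changes are handled. The function name `apply_missense` is hypothetical, and it refuses to edit a sequence whose reference residue does not match:

```python
import re

# Hypothetical pattern for a single-residue change such as "p.L1253P":
# reference residue, 1-based position, alternate residue (X or * for stop).
PROTEIN_CHANGE = re.compile(r"^p\.([A-Z*])(\d+)([A-Z*])$")


def apply_missense(peptide: str, change: str) -> str:
    """Return a copy of `peptide` with one amino-acid substitution applied.

    Raises ValueError if the change is not a simple substitution, if the
    position lies outside the sequence, or if the reference residue does not
    match (which suggests the change was annotated against another isoform).
    """
    match = PROTEIN_CHANGE.match(change)
    if match is None:
        raise ValueError(f"unsupported protein change: {change!r}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    if pos < 1 or pos > len(peptide):
        raise ValueError(f"position {pos} outside sequence of length {len(peptide)}")
    idx = pos - 1  # convert 1-based protein position to 0-based index
    if peptide[idx] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {peptide[idx]}")
    return peptide[:idx] + alt + peptide[idx + 1:]


print(apply_missense("MKTLLV", "p.L4P"))  # MKTPLV
```

Failing loudly on a reference mismatch is deliberate: silently overwriting a T with the alternate residue would hide exactly the kind of transcript/isoform confusion described in this thread.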
When I map peptide IDs (say uc010nwy.3) to transcript IDs (say NM_0010757) using kgXref, the mapping is not 1:1. There are more peptide IDs than transcript IDs, meaning that multiple peptide IDs map to a single transcript ID. This confounds what I am trying to do, because when Annovar reports a variant in transcript X, I don't know which peptide sequence to alter.
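One way to see the size of the ambiguity is to group UCSC kgIDs by RefSeq accession. This is a sketch assuming the hg19 kgXref column order (kgID, mRNA, spID, spDisplayID, geneSymbol, refseq, protAcc, description, rfamAcc, tRnaName); check the table schema for your assembly before relying on the column indices. The IDs in the usage lines are made up:

```python
from collections import defaultdict

# Assumed kgXref layout (hg19): column 0 is kgID, column 5 is the RefSeq accession.
KGID_COL = 0
REFSEQ_COL = 5


def refseq_to_kgids(lines):
    """Group UCSC kgIDs by RefSeq accession from tab-separated kgXref rows."""
    groups = defaultdict(list)
    for line in lines:
        row = line.rstrip("\n").split("\t")
        if len(row) <= REFSEQ_COL or not row[REFSEQ_COL]:
            continue  # skip short rows and kgIDs without a RefSeq link
        groups[row[REFSEQ_COL]].append(row[KGID_COL])
    return dict(groups)


def ambiguous(groups):
    """Keep only the RefSeq accessions linked to more than one kgID."""
    return {acc: ids for acc, ids in groups.items() if len(ids) > 1}


# Hypothetical rows; with a real download you would pass an open file instead.
rows = [
    "ucA.1\tmA\tP1\tX_HUMAN\tGENE1\tNM_000001\tNP_1\tdesc\t\t",
    "ucB.1\tmB\tP1\tX_HUMAN\tGENE1\tNM_000001\tNP_1\tdesc\t\t",
    "ucC.1\tmC\tP2\tY_HUMAN\tGENE2\tNM_000002\tNP_2\tdesc\t\t",
]
print(ambiguous(refseq_to_kgids(rows)))  # {'NM_000001': ['ucA.1', 'ucB.1']}
```

With a local copy of the table, `with open("kgXref.txt") as fh: groups = refseq_to_kgids(fh)` would build the same mapping; the filename here is only an example.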
I'm not certain I'm using the correct files for this task, and I haven't been able to find any documentation. Any help would be great.
This seems unnecessarily complicated to me. Shouldn't Annovar tell you the protein change caused by your variant? What exactly is the information you have, and what is the information you want?
I've narrowed things down a little. The problem seems to be with the Annovar entries that have more than one transcript associated with a variant. For instance, this entry seems to be correct:
By correct I mean that when I use kgXref to get the UniProt ID that corresponds to NM_022828, its protein sequence has an L at position 1253.
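The "correct" check above (does the sequence carry the expected residue at the reported position?) can also be used to narrow down which of several candidate peptides a transcript's variant belongs to. This is a minimal sketch assuming the peptide file has the ID in the first tab-separated column and the sequence in the second; the helper names and the toy sequences are hypothetical:

```python
def load_peptides(lines):
    """Build a kgID -> sequence table from two-column, tab-separated lines."""
    table = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            table[fields[0]] = fields[1]
    return table


def matching_peptides(peptides, kgids, pos, ref):
    """Return the kgIDs whose sequence has residue `ref` at 1-based `pos`.

    kgIDs missing from `peptides`, or too short to reach `pos`, are skipped.
    """
    hits = []
    for kgid in kgids:
        seq = peptides.get(kgid)
        if seq is not None and 1 <= pos <= len(seq) and seq[pos - 1] == ref:
            hits.append(kgid)
    return hits


peptides = load_peptides(["ucA.1\tMLLT\n", "ucB.1\tMTLT\n"])
print(matching_peptides(peptides, ["ucA.1", "ucB.1"], 2, "L"))  # ['ucA.1']
```

If none of the candidates match, that is a hint that the position was computed against a different transcript than the one being looked up.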
On the other hand, consider an entry with more than one RefSeq ID in the Annovar output (the variant affects multiple transcripts), for instance:
When I use kgXref to pull out the protein sequence associated with NM_001199291, there is no L at position 131, but a T instead.
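One thing worth ruling out with multi-transcript entries is applying a protein position reported for one transcript to another transcript's sequence, since the same variant can sit at different positions in different isoforms. This sketch assumes the Annovar AAChange field is a comma-separated list of `gene:transcript:exon:cDNA change:protein change` entries; verify that against your own output. It keeps each transcript's own (ref, position, alt) so each one can be checked against its own sequence. The example field is hypothetical:

```python
import re

# Simple substitutions only, e.g. "p.L131P" (X or * for stop codons).
PROTEIN_CHANGE = re.compile(r"^p\.([A-Z*])(\d+)([A-Z*])$")


def per_transcript_changes(aachange):
    """Map each transcript accession to its own (ref, position, alt) tuple."""
    changes = {}
    for entry in aachange.split(","):
        fields = entry.split(":")
        if len(fields) < 5:
            continue  # entry without a protein-level change
        match = PROTEIN_CHANGE.match(fields[4])
        if match is None:
            continue  # frameshifts and other non-substitution changes
        ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
        changes[fields[1]] = (ref, pos, alt)
    return changes


field = "GENEA:NM_000001:exon2:c.T11C:p.L4P,GENEA:NM_000002:exon3:c.T392C:p.L131P"
print(per_transcript_changes(field))
# {'NM_000001': ('L', 4, 'P'), 'NM_000002': ('L', 131, 'P')}
```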