Question: How to link UCSC peptide to transcripts
4.4 years ago, jacobsen.jeremy (United States) wrote:

I am trying to insert variants reported by Annovar into protein sequences that I retrieved from the UCSC file knownGeneTxPep. The variant positions, counted from the start of a transcript, come from Annovar. Here is my question:

When I map peptide IDs (say uc010nwy.3) to transcript IDs (say NM_0010757) using "kgXref", the mapping is not 1:1. There are more peptide IDs than transcript IDs, so a single transcript ID can correspond to several peptide IDs. This confounds what I am trying to do, because I don't know which peptide sequence to alter when Annovar says that a variant was observed in transcript X.
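One way to see exactly where the mapping stops being 1:1 is to group the kgXref rows by RefSeq accession and list the accessions that carry more than one UCSC ID. The following is only a minimal sketch: it assumes a tab-separated kgXref dump with the UCSC ID in the first column and the RefSeq accession in the sixth (check this against the schema of your assembly), and the function names and demo rows are made up for illustration.

```python
from collections import defaultdict

def refseq_to_kg_ids(lines, kg_col=0, refseq_col=5):
    """Group UCSC IDs by RefSeq accession.

    `lines` is any iterable of tab-separated kgXref rows.
    The column indices are assumptions; verify them for your download.
    """
    mapping = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= max(kg_col, refseq_col):
            continue  # skip short or malformed rows
        kg_id, refseq = fields[kg_col], fields[refseq_col]
        if refseq:  # rows without a RefSeq accession are ignored
            mapping[refseq].append(kg_id)
    return dict(mapping)

def ambiguous_refseqs(mapping):
    """Return only the RefSeq accessions linked to more than one UCSC ID."""
    return {acc: ids for acc, ids in mapping.items() if len(ids) > 1}

# Made-up rows (not real UCSC data), laid out with the assumed column order.
DEMO_ROWS = [
    "\t".join(["ucDemoA.1", "mrna1", "", "", "GENE1", "NM_DEMO_1", "", "demo"]),
    "\t".join(["ucDemoB.1", "mrna2", "", "", "GENE1", "NM_DEMO_1", "", "demo"]),
    "\t".join(["ucDemoC.1", "mrna3", "", "", "GENE2", "NM_DEMO_2", "", "demo"]),
]

if __name__ == "__main__":
    mapping = refseq_to_kg_ids(DEMO_ROWS)
    for acc, ids in ambiguous_refseqs(mapping).items():
        print(acc, "->", ", ".join(ids))
```

With a real file, passing the open file handle as `lines` works the same way, since the function only iterates over rows.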

I'm not certain I'm using the correct files for the task and I've been unable to find any documentation.  Any help would be great.

Thanks,

Jeremy

ucsc snp rna-seq forum • 1.3k views

This seems unnecessarily complicated to me. Shouldn't Annovar tell you what protein change your variant causes? What exactly is the information you have, and what is the information you want?

written 4.4 years ago by Bert Overduin

I've narrowed things down a little.  The problem seems to be with the Annovar entries that have more than one transcript associated with a variant.  For instance, this entry seems to be correct:

line64679    nonsynonymous SNV    YTHDC2:NM_022828:exon26:c.C3757G:p.L1253V,    5    112920108    112920108    C    G

By correct I mean that when I use kgXref to get the UniProt ID that corresponds to NM_022828, there is an L at position 1253.

 

On the other hand, when there is more than one RefSeq ID in the Annovar output (the variant affects multiple transcripts), for instance:

line64929    nonsynonymous SNV    HSD17B4:NM_001199291:exon7:c.T392A:p.L131Q,HSD17B4:NM_000414:exon6:c.G317A:p.R106H,HSD17B4:NM_001199292:exon5:c.G263A:p.R88H,    5    118811533    118811533    G    A

Now, when I use kgXref to pull out the protein sequence associated with NM_001199291, there is no L at position 131, but rather a T.
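The check described above (does the candidate protein really carry the reference residue at the reported position?) can be automated and used to pick among several candidate peptides. This is a rough sketch, not Annovar's own logic: it assumes the amino-acid change field has the `GENE:TRANSCRIPT:exonN:c.X:p.X` layout seen in the two lines quoted above and handles only single-residue substitutions; the helper names and the toy peptide sequences are hypothetical.

```python
import re

# Matches entries such as "HSD17B4:NM_000414:exon6:c.G317A:p.R106H".
AA_CHANGE = re.compile(
    r"^(?P<gene>[^:]+):(?P<tx>[^:]+):exon\d+:c\.[^:]+:"
    r"p\.(?P<ref>[A-Z*])(?P<pos>\d+)(?P<alt>[A-Z*])$"
)

def parse_aa_changes(field):
    """Split a comma-separated change field into (transcript, ref, pos, alt)."""
    changes = []
    for entry in field.strip().strip(",").split(","):
        match = AA_CHANGE.match(entry)
        if match:  # entries in any other layout are skipped
            changes.append(
                (match["tx"], match["ref"], int(match["pos"]), match["alt"])
            )
    return changes

def peptides_matching_reference(ref, pos, peptides):
    """Return IDs of peptides that have residue `ref` at 1-based position `pos`."""
    return [
        pep_id
        for pep_id, seq in peptides.items()
        if 0 < pos <= len(seq) and seq[pos - 1] == ref
    ]

def apply_substitution(seq, ref, pos, alt):
    """Substitute one residue, refusing to edit a sequence that disagrees."""
    if not (0 < pos <= len(seq)) or seq[pos - 1] != ref:
        raise ValueError(f"expected {ref} at position {pos}")
    return seq[: pos - 1] + alt + seq[pos:]

if __name__ == "__main__":
    field = ("HSD17B4:NM_001199291:exon7:c.T392A:p.L131Q,"
             "HSD17B4:NM_000414:exon6:c.G317A:p.R106H,"
             "HSD17B4:NM_001199292:exon5:c.G263A:p.R88H,")
    for tx, ref, pos, alt in parse_aa_changes(field):
        print(tx, ref, pos, alt)

    # Toy candidates (made-up sequences) for a hypothetical change L3Q.
    candidates = {"pepA": "MKL", "pepB": "MKT"}
    for pep_id in peptides_matching_reference("L", 3, candidates):
        print(pep_id, apply_substitution(candidates[pep_id], "L", 3, "Q"))
```

If no candidate peptide passes the check for a transcript, that mismatch is worth reporting rather than editing the sequence anyway.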

 

written 4.4 years ago by jacobsen.jeremy
4.4 years ago, jacobsen.jeremy (United States) wrote:

Hmm

 
