Question: Annovar - RNA level variant positions
jacobsen.jeremy (United States) wrote, 4.9 years ago:

I ran Annovar on GATK output after inserting a column for the end position (following the Annovar tutorial on preparing input files). The command I am using is annotate_variation.pl -out gatk -build hg19 example/gatkfile humandb/ -dbtype knownGene, so that I get UCSC transcript annotations.

For simple insertions/deletions, I can pull out the protein sequence from hg_19knownPep to check whether the protein-level change (for instance G952A) is correct. I wrote code to do this for all non-synonymous SNVs, and the Annovar annotations are correct for all of them.
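A minimal sketch of that kind of check, assuming the protein change is written in single-residue form such as G952A (optionally with an HGVS-style p. prefix) and that the peptide sequence for the transcript has already been read into a Python string. The function name check_protein_change is hypothetical, not part of Annovar.

```python
import re

# Single-residue protein change such as "G952A" or "p.G952A":
# reference residue, 1-based position, alternate residue ("*" = stop).
PROTEIN_CHANGE = re.compile(r"(?:p\.)?([A-Z*])(\d+)([A-Z*])")


def check_protein_change(protein_seq: str, change: str) -> bool:
    """Return True if the reference residue named in `change` appears at
    that position of `protein_seq`.

    Protein positions in the annotation are 1-based, so position N is
    protein_seq[N - 1] in Python's 0-based indexing.
    """
    m = PROTEIN_CHANGE.fullmatch(change)
    if m is None:
        raise ValueError(f"unsupported protein change: {change!r}")
    ref, pos = m.group(1), int(m.group(2))
    if not 1 <= pos <= len(protein_seq):
        return False  # position falls outside the sequence
    return protein_seq[pos - 1] == ref


print(check_protein_change("MAGK", "p.G3A"))  # True
```

Only the reference residue is checked; the alternate residue cannot be verified against the reference protein.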

However, this is not the case at the RNA level. For instance, take this Annovar entry:

frameshift substitution    NBPF8:uc031pny.1:exon2:c.116_116delinsGAA,    chr1    144615250    144615250    G    GAA
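The transcript field in that entry packs gene, UCSC transcript ID, exon number, and the coding change together, with comma-separated entries when several transcripts are affected. Below is a minimal parser sketch that assumes only the positional forms seen in this thread (c.116_116delinsGAA, c.428_430G); other notations raise ValueError. The names parse_aachange and CodingChange are hypothetical. Per HGVS, c. positions count from c.1, the first base of the coding sequence.

```python
import re
from typing import List, NamedTuple


class CodingChange(NamedTuple):
    gene: str
    transcript: str
    exon: int
    c_start: int  # 1-based, counted from c.1 (first base of the CDS)
    c_end: int
    change: str   # text after the positions, e.g. "delinsGAA" or "G"


# c. notation with one position or a start_end range, e.g. "c.116_116delinsGAA".
C_NOTATION = re.compile(r"c\.(\d+)(?:_(\d+))?(.*)")


def parse_aachange(field: str) -> List[CodingChange]:
    """Parse a comma-separated gene:transcript:exonN:c.change field."""
    changes = []
    for entry in field.strip().split(","):
        if not entry:
            continue  # the field ends with a trailing comma
        gene, transcript, exon_label, c_part = entry.split(":")[:4]
        if not exon_label.startswith("exon"):
            raise ValueError(f"unexpected exon label: {exon_label!r}")
        m = C_NOTATION.fullmatch(c_part)
        if m is None:
            raise ValueError(f"unsupported c. notation: {c_part!r}")
        start = int(m.group(1))
        end = int(m.group(2)) if m.group(2) else start
        changes.append(CodingChange(gene, transcript, int(exon_label[len("exon"):]),
                                    start, end, m.group(3)))
    return changes


print(parse_aachange("NBPF8:uc031pny.1:exon2:c.116_116delinsGAA,"))
```

Any p. part after the fourth colon-separated field is ignored.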

When I retrieve the mRNA sequence for uc031pny.1 from the Annovar hg19 reference file, I notice the following, which is confusing me:

1) In IGV (forward strand), the base at chr1:144615250 is G, as expected.

2) But in the mRNA sequence from Annovar's knownGeneMrna file, the nucleotide at position 116 is T.

This is consistent across the substitutions and deletions in the output. I think I'm misinterpreting something, but I'm not sure what. Any help would be excellent.

Thanks,

Jeremy

Stoploss25 wrote:

It may be that 116 is the position counted from the start of the exon (in this case exon 2), not from the start of the transcript.

jacobsen.jeremy wrote:

So I took a close look at another example:

line23941    frameshift substitution    CDK18:uc009xbm.1:exon6:c.428_430G,    chr1    205495889    205495891    GCT    G

Since the variant is reported in exon 6, I calculated the offset contributed by the first 5 exons of the transcript. The values (exon end, exon start, exon length) were taken from Annovar's knownGene reference:

exonEnd      exonStart    length
205492753    205492610    143
205493485    205493359    126
205494323    205494266    57
205495307    205495192    115
205495589    205495494    95

total offset: 536

If 428 were counted from the start of exon 6, the position from the start of the transcript would be 536 + 428 = 964. However, according to knownGeneMrna, the actual position from the start is 565. Also, according to IGV, the deletion lies between exons 7 and 8, not in exon 6.
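The offset arithmetic above can be reproduced with a short sketch. It relies on the UCSC knownGene convention that exonStarts are 0-based and exonEnds are exclusive (half-open intervals), so each exon length is simply end minus start. The coordinates are copied from the table above; the helper name exon_lengths is hypothetical. Note that this only reproduces the exon-relative hypothesis being tested, not a correct c.-to-mRNA mapping.

```python
from typing import List


def exon_lengths(exon_starts: List[int], exon_ends: List[int]) -> List[int]:
    """Exon lengths from knownGene-style coordinates.

    exonStarts are 0-based and exonEnds exclusive (half-open), so the
    length of each exon is end - start with no +1 correction.
    """
    return [end - start for start, end in zip(exon_starts, exon_ends)]


# Exons 1-5 of uc009xbm.1, taken from the table above.
starts = [205492610, 205493359, 205494266, 205495192, 205495494]
ends = [205492753, 205493485, 205494323, 205495307, 205495589]

lengths = exon_lengths(starts, ends)
offset = sum(lengths)
print(lengths, offset)  # [143, 126, 57, 115, 95] 536

# Hypothesis under test: c.428 counted from the start of exon 6.
print(offset + 428)     # 964
```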

jacobsen.jeremy wrote:

It turns out that the mRNA sequence assigned to each UCSC accession number includes the UTRs. The 428 in c.428_430G is the 428th nucleotide from the start of the CDS (coding sequence), not from the start of the transcript or of the exon.
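Given that finding, a c. position can be converted to a position in the full transcript sequence by adding the length of the 5' UTR. Below is a sketch that assumes knownGene-style fields (strand, cdsStart, cdsEnd, exonStarts, exonEnds) have already been parsed into Python values, and that the knownGeneMrna sequence is the spliced exon sequence of the same record. The function names are hypothetical. Only positions inside the CDS are handled; UTR or intronic offsets such as c.-5 or c.100+2 are not.

```python
from typing import List


def five_prime_utr_length(strand: str, cds_start: int, cds_end: int,
                          exon_starts: List[int], exon_ends: List[int]) -> int:
    """Count exonic bases on the 5' side of the CDS.

    Coordinates follow the UCSC knownGene convention: cdsStart and exonStarts
    are 0-based, cdsEnd and exonEnds are exclusive (half-open intervals).
    """
    total = 0
    for start, end in zip(exon_starts, exon_ends):
        if strand == "+":
            # 5' UTR bases lie below cdsStart: [start, min(end, cdsStart))
            total += max(0, min(end, cds_start) - start)
        elif strand == "-":
            # On the minus strand the 5' UTR lies above cdsEnd: [max(start, cdsEnd), end)
            total += max(0, end - max(start, cds_end))
        else:
            raise ValueError(f"unknown strand: {strand!r}")
    return total


def cds_to_mrna_position(c_pos: int, strand: str, cds_start: int, cds_end: int,
                         exon_starts: List[int], exon_ends: List[int]) -> int:
    """Map a coding position c.N (N >= 1) to a 1-based position in the
    spliced transcript (5' UTR + CDS + 3' UTR)."""
    if c_pos < 1:
        raise ValueError("only c. positions inside the CDS (c.1 or later) are handled")
    return five_prime_utr_length(strand, cds_start, cds_end, exon_starts, exon_ends) + c_pos


# Toy two-exon plus-strand transcript: exons [0, 10) and [20, 30), CDS starts at 22.
print(cds_to_mrna_position(1, "+", 22, 28, [0, 20], [10, 30]))  # 13
```

Under this model, the CDK18 observation that c.428 sits at mRNA position 565 would correspond to a 5' UTR of 565 - 428 = 137 nt.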
