I am analyzing somatic mutations and ran two annotation programs, ANNOVAR and Oncotator.
I annotated my mutations with ANNOVAR's gene-based annotation, which uses the RefSeq gene set.
From this annotation, I was able to separate my mutations into nonsynonymous and all others for downstream analysis. To get more information, I then ran another annotation tool, Oncotator.
After running it, I could also annotate my mutations with a variant classification (missense, nonsense, etc.).
However, the two tools disagree on some of my mutations, as in the example below.
As you can see, a mutation in the MUC3A gene was annotated as nonsynonymous by ANNOVAR using RefSeq genes, while Oncotator annotated it as IGR (intergenic?).
MUC3A U01 7 100550909 C A IGR nonsynonymous
Aside from this, there are also some other mismatches (for example, stopgain in ANNOVAR but nonsense_mutation in Oncotator).
I was confused and tried to figure out why these mismatches happen. My conclusion is that Oncotator uses GENCODE (Version 19 - July 2013 freeze, GRCh37 - Ensembl 74) as its reference transcript set, while the annotation I used in ANNOVAR is RefSeq. I think these different transcript sets caused the discrepancies in my results.
In this regard, can I use the annotation from only one of the tools? Or should I adjust one of them to avoid mismatches?
Oncotator's variant_classification (missense, nonsense) is totally different from ANNOVAR's gene-based annotation, so is it fine to use both even if there are some mismatches?
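To make the question concrete, here is a minimal sketch of how the two outputs could be compared so that pure naming differences (stopgain vs nonsense_mutation) are separated from real disagreements (nonsynonymous vs IGR). The mapping table, the function name `compare`, and the three result labels are my own assumptions for illustration, not something taken from either tool's documentation, and the mapping is deliberately incomplete.

```python
# Assumed mapping from ANNOVAR gene-based terms to Oncotator-style
# variant classifications (all lowercased). Illustrative and incomplete.
ANNOVAR_TO_ONCOTATOR = {
    "nonsynonymous": "missense_mutation",
    "nonsynonymous snv": "missense_mutation",
    "stopgain": "nonsense_mutation",
    "synonymous": "silent",
    "synonymous snv": "silent",
}


def compare(annovar_term, oncotator_term):
    """Classify a pair of annotations for the same variant.

    Returns one of three hypothetical labels:
      "same_meaning" - the terms differ only in vocabulary
      "discordant"   - the tools disagree on the consequence
      "unmapped"     - the ANNOVAR term is not in the mapping table
    """
    a = annovar_term.strip().lower()
    o = oncotator_term.strip().lower()
    expected = ANNOVAR_TO_ONCOTATOR.get(a)
    if expected is None:
        return "unmapped"
    if expected == o:
        return "same_meaning"
    return "discordant"


# Term pairs taken from the examples in this post. The second gene
# name is a placeholder, since I did not list that variant above.
rows = [
    ("MUC3A", "nonsynonymous", "IGR"),
    ("PLACEHOLDER_GENE", "stopgain", "nonsense_mutation"),
]

for gene, annovar_term, oncotator_term in rows:
    print(gene, annovar_term, oncotator_term, compare(annovar_term, oncotator_term))
```

With something like this, only the "discordant" rows would need a closer look at which transcript each tool picked.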
One more question!
My exome-seq reads were aligned to the hg19 reference. I think all the programs I used are based on hg19, and the only difference between them is their databases. Is it always necessary to check which databases they use to avoid mismatches in later analyses? It seems too complicated.