Dear community,
I am working on VCF files produced with MuTect2 from whole-exome sequencing of prostate cancer patients. I compared matched normal/tumor samples and found mutations. During library preparation, we used an exome capture kit (SureSelect Human All Exon V6). Next, I wanted to perform a functional prediction of these variants using the SnpEff tool (http://snpeff.sourceforge.net/index.html). I annotated my samples against the hg19 reference genome. Below, I am posting an IGV screenshot as an example, showing the different builds for one randomly chosen reported variant.
Given that, I expected the SnpEff report to show a functional impact on a CDK11B exon when annotating with GRCh37.75, because the variant falls within an exon. The same applies to many other variants. However, below are screenshots of the annotations from the two builds:
For hg19
For GRCh37.75
Inspecting this particular variant in detail, I found:
For GRCh37.75
For hg19
So why is SnpEff reporting variants that fall within exons as intron variants?
Thank you in advance.
Hello,
Can you please post a complete line of the resulting annotated VCF?
Whether a variant is intronic or exonic depends on the transcript. Unless told otherwise, snpEff annotates the variant against every transcript it knows.
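To see those per-transcript results, you can split the ANN field of the INFO column: each comma-separated entry is one annotation, and its pipe-separated sub-fields follow the ANN layout that snpEff writes into the VCF header. Below is a minimal sketch; the INFO string, gene ID, transcript IDs (TX1, TX2) and distances are hypothetical and only illustrate the format.

```python
# Sub-field names of one snpEff ANN entry, in order.
ANN_FIELDS = ["Allele", "Annotation", "Annotation_Impact", "Gene_Name",
              "Gene_ID", "Feature_Type", "Feature_ID", "Transcript_BioType",
              "Rank", "HGVS.c", "HGVS.p", "cDNA.pos/cDNA.length",
              "CDS.pos/CDS.length", "AA.pos/AA.length", "Distance",
              "ERRORS/WARNINGS/INFO"]


def parse_ann(info: str) -> list[dict]:
    """Return one dict per ANN entry found in a VCF INFO column."""
    for item in info.split(";"):
        key, _, value = item.partition("=")
        if key != "ANN":
            continue
        entries = []
        for ann in value.split(","):  # one entry per allele/transcript pair
            parts = ann.split("|")
            parts += [""] * (len(ANN_FIELDS) - len(parts))  # pad short entries
            entries.append(dict(zip(ANN_FIELDS, parts)))
        return entries
    return []  # no ANN field present


# Hypothetical INFO column: the same variant annotated against two transcripts.
example_info = (
    "DP=40;ANN="
    "C|intron_variant|MODIFIER|CDK11B|GENE1|transcript|TX1|protein_coding"
    "|14/19|c.1458+46T>C||||||,"
    "C|downstream_gene_variant|MODIFIER|CDK11B|GENE1|transcript|TX2"
    "|protein_coding|||||||120|"
)
annotations = parse_ann(example_info)
for ann in annotations:
    print(ann["Feature_ID"], ann["Annotation"], ann["HGVS.c"])
```

Printing one line per transcript makes it clear that the same position can be labelled differently depending on which transcript is considered.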
fin swimmer
Hello finswimmer,
Yes, here is the line for the variant shown above in IGV, for GRCh37.75:
and for hg19:
Why do you expect this variant to be exonic? If you look at Ensembl, you can see that it is definitely intronic or downstream: http://www.ensembl.org/Homo_sapiens/Variation/Explore?r=1:1637216-1638216;v=rs201088964;vdb=variation;vf=49274494
fin swimmer
Now I am a bit confused. I expected it to be exonic because, in IGV, the variant falls within the exon when annotated with GRCh37.75. I followed your link to Ensembl, and it does indeed show it as intronic or downstream. Am I misinterpreting how GRCh37.75 represents the gene? Pardon me if this sounds like a silly question, but isn't the blue bar in GRCh37.75 the exon? If it were an intron, it would have been drawn like the RefSeq track at the bottom, right?
These issues arise because many genes have several splice isoforms. What is exonic in one isoform may be intronic in another.
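A small sketch of why the label depends on the isoform: the same position is checked against each transcript's exon intervals. The exon coordinates below are hypothetical (1-based, inclusive) and do not correspond to real CDK11B transcripts.

```python
def classify(pos: int, exons: list[tuple[int, int]]) -> str:
    """Label a 1-based position relative to one transcript's exons."""
    exons = sorted(exons)
    if any(start <= pos <= end for start, end in exons):
        return "exonic"
    # Between the first exon start and the last exon end, but in no exon.
    if exons[0][0] < pos < exons[-1][1]:
        return "intronic"
    return "outside"


# Hypothetical isoforms of one gene: isoform_A skips the middle exon.
isoforms = {
    "isoform_A": [(1000, 1100), (1300, 1400)],
    "isoform_B": [(1000, 1100), (1150, 1250), (1300, 1400)],
}
variant_pos = 1200
for name, exons in isoforms.items():
    print(name, classify(variant_pos, exons))
```

Here position 1200 is intronic in isoform_A but exonic in isoform_B, which is exactly the situation where a browser track and an annotation tool can appear to disagree.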
You can either always report the annotation for the 'canonical' isoform (usually taken to be the longest isoform, or the most highly expressed one in your tissue of interest), or you can delve into the finer details of each splice isoform, which will greatly increase your workload.
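As a sketch of the 'longest isoform' rule, the code below picks one transcript per gene by total exonic length and keeps only that transcript's annotation. The isoform coordinates and annotations are hypothetical, and choosing by expression instead would require expression data not shown here. SnpEff also has a `-canon` option that restricts annotation to canonical transcripts; this sketch does not reproduce its exact definition.

```python
def transcript_length(exons: list[tuple[int, int]]) -> int:
    """Total exonic length; intervals are 1-based and inclusive."""
    return sum(end - start + 1 for start, end in exons)


def pick_longest(isoforms: dict[str, list[tuple[int, int]]]) -> str:
    """Return the longest isoform; ties go to the alphabetically first name."""
    return max(sorted(isoforms), key=lambda tx: transcript_length(isoforms[tx]))


def canonical_only(annotations: list[tuple[str, str]], canonical: str):
    """Keep only (transcript, effect) pairs for the chosen transcript."""
    return [a for a in annotations if a[0] == canonical]


# Hypothetical isoforms and per-transcript annotations for one variant.
isoforms = {
    "isoform_A": [(1000, 1100), (1300, 1400)],
    "isoform_B": [(1000, 1100), (1150, 1250), (1300, 1400)],
}
annotations = [("isoform_A", "intron_variant"),
               ("isoform_B", "missense_variant")]
canonical = pick_longest(isoforms)
print(canonical, canonical_only(annotations, canonical))
```

Sorting before `max` makes the tie-break deterministic, since `max` returns the first of several equal maxima.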
@Kevin Blighe Thank you for the explanation. I will keep your advice in mind next time.
@fin swimmer Thank you for your help. Actually, the variant falling in the blue bar of GRCh37.75 was in an intron. I am laughing at myself now for misunderstanding :)) Everything is clear now. Thanks again!
Yes, I just looked in the browsers, and this particular variant does appear to be definitively intronic. Note, however, that snpEff has given it the impact 'MODIFIER' (its usual category for intronic variants) together with an HGVS coding annotation anchored to the nearest exon boundary: c.1458+46T>C, i.e. 46 bp into the intron after coding position 1458. This is not a splice site, but it is still fairly close to one.
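The offset in such an HGVS string can be read programmatically to see how far into the intron a variant lies. This is a minimal sketch that only handles simple substitutions such as c.1458+46T>C or c.200-2A>G (not UTR forms like c.*12 or c.-5). The 2 bp splice-site and 8 bp splice-region cut-offs are assumptions chosen for illustration, not values taken from snpEff.

```python
import re

# Simple coding-DNA substitutions, optionally with an intronic offset.
HGVS_C_SUB = re.compile(r"^c\.(\d+)([+-]\d+)?([ACGT])>([ACGT])$")


def parse_c_sub(hgvs: str) -> tuple[int, int, str, str]:
    """Return (coding position, intronic offset, ref base, alt base)."""
    m = HGVS_C_SUB.match(hgvs)
    if m is None:
        raise ValueError(f"unsupported HGVS string: {hgvs}")
    pos, offset, ref, alt = m.groups()
    return int(pos), int(offset) if offset else 0, ref, alt


def region(hgvs: str, site: int = 2, splice_region: int = 8) -> str:
    """Rough location label; the cut-offs are illustrative assumptions."""
    _, offset, _, _ = parse_c_sub(hgvs)
    distance = abs(offset)
    if distance == 0:
        return "coding"
    if distance <= site:
        return "splice_site"
    if distance <= splice_region:
        return "splice_region"
    return "intronic"


print(parse_c_sub("c.1458+46T>C"), region("c.1458+46T>C"))
```

A positive offset means the variant lies 3' of the named exonic base (into the following intron); a negative offset means it lies 5' of it (in the preceding intron).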