Question: [solved] SnpEff reports intron variants in exons?
3.4 years ago, 乙 (180) wrote:

Dear community,

I am working with VCF files produced by MuTect2 from whole-exome sequencing of prostate cancer patients. I compared matched normal/tumor samples and found mutations. During library preparation, we used an exome capture kit (SureSelect Human All Exon V6). Next, I wanted to perform a functional prediction of these variants using the SnpEff tool. The reference genome I used to annotate my samples is hg19. Below, I am posting an IGV screenshot as an example, showing the different builds for one randomly chosen reported variant.

10AN.vcf = Sample; Homo_sapiens.GRCh37.75.sorted.gtf = GRCh37.75 genes annotation; hg19_ucsc.gtf = hg19 UCSC genes annotation; Agilent SureSelect DNA = capture kit coverage

That being said, what I expect from the SnpEff report is that the CDK11B variant is annotated as having a functional impact on the exon when it is annotated with GRCh37.75, because the variant falls within the exon. The same is true of many other variants. However, below I show what I got when I annotated the sample with the two builds:

  • For hg19, intron_variant constitutes 32% of the variations

  • For GRCh37.75, intron_variant constitutes 30% of the variations

Inspecting this particular variant in detail, I found:

  • For GRCh37.75, intron_variant is reported

  • For hg19, intron_variant is reported

So why is SnpEff reporting variants that fall within exons as intron variants?

Thank you in advance.

snp exome-seq snpeff annotation • 2.2k views


Can you please post a complete line of the resulting annotated VCF?

Whether a variant is intronic or exonic depends on the transcript. Unless told otherwise, snpEff annotates the variant against every transcript it knows.

fin swimmer

written 3.4 years ago by finswimmer

Hello finswimmer,

Yes, here's the line for the variant shown above in IGV, annotated with GRCh37.75:

chr1    1573078 rs201088964 A   G   .   alt_allele_in_normal;panel_of_normals   DB;ECNT=1;HCNT=12;MAX_ED=.;MIN_ED=.;NLOD=7.46;TLOD=8.02;ANN=G|downstream_gene_variant|MODIFIER|MMP23B|ENSG00000189409|transcript|ENST00000356026|protein_coding||c.*3077A>G|||||3048|,G|downstream_gene_variant|MODIFIER|MMP23B|ENSG00000189409|transcript|ENST00000472264|protein_coding||c.*4579A>G|||||4579|WARNING_TRANSCRIPT_INCOMPLETE,G|downstream_gene_variant|MODIFIER|MMP23B|ENSG00000189409|transcript|ENST00000512731|nonsense_mediated_decay||c.*4222A>G|||||3228|,G|downstream_gene_variant|MODIFIER|MMP23B|ENSG00000189409|transcript|ENST00000378675|protein_coding||c.*3156A>G|||||3048|,G|downstream_gene_variant|MODIFIER|MMP23B|ENSG00000189409|transcript|ENST00000479814|protein_coding||c.*3156A>G|||||3051|WARNING_TRANSCRIPT_NO_START_CODON,G|downstream_gene_variant|MODIFIER|MMP23B|ENSG00000189409|transcript|ENST00000435358|protein_coding||c.*3156A>G|||||3134|WARNING_TRANSCRIPT_NO_START_CODON,G|downstream_gene_variant|MODIFIER|MMP23B|ENSG00000189409|transcript|ENST00000503792|protein_coding||c.*3156A>G|||||3134|WARNING_TRANSCRIPT_NO_START_CODON,G|downstream_gene_variant|MODIFIER|MMP23B|ENSG00000189409|transcript|ENST00000489782|retained_intron||n.*3098A>G|||||3098|,G|downstream_gene_variant|MODIFIER|MMP23B|ENSG00000189409|transcript|ENST00000486400|retained_intron||n.*3094A>G|||||3094|,G|downstream_gene_variant|MODIFIER|MMP23B|ENSG00000189409|transcript|ENST00000490017|nonsense_mediated_decay||c.*3077A>G|||||2439|WARNING_TRANSCRIPT_NO_START_CODON,G|intron_variant|MODIFIER|CDK11B|ENSG00000248333|transcript|ENST00000407249|protein_coding|14/20|c.1473+46T>C||||||,G|intron_variant|MODIFIER|CDK11B|ENSG00000248333|transcript|ENST00000513088|protein_coding|8/14|c.969+46T>C||||||WARNING_TRANSCRIPT_NO_START_CODON,G|intron_variant|MODIFIER|CDK11B|ENSG00000248333|transcript|ENST00000341832|protein_coding|13/19|c.1332+46T>C||||||,G|intron_variant|MODIFIER|CDK11B|ENSG00000248333|transcript|ENST00000317673|protein_coding|14/20|c.1467+46T>C||||||,G|intron_variant|MODIFIER|CDK11B|ENSG00000248333|transcript|ENST00000340677|protein_coding|13/19|c.1434+46T>C|||||| GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1  0/1:78,6:0.087:3:3:0.500:2284,161:32:46 0/0:94,11:0.103:6:5:0.455:2729,277:49:45

and for hg19:

chr1    1573078 rs201088964 A   G   .   alt_allele_in_normal;panel_of_normals   DB;ECNT=1;HCNT=12;MAX_ED=.;MIN_ED=.;NLOD=7.46;TLOD=8.02;ANN=G|downstream_gene_variant|MODIFIER|MMP23B|MMP23B|transcript|NM_006983.1|protein_coding||c.*3077A>G|||||3048|,G|downstream_gene_variant|MODIFIER|MMP23A|MMP23A|transcript|NR_002946.1|pseudogene||n.*3051A>G|||||3051|,G|intron_variant|MODIFIER|CDK11B|CDK11B|transcript|NM_001787.2|protein_coding|13/19|c.1497+46T>C||||||,G|intron_variant|MODIFIER|CDK11B|CDK11B|transcript|NM_001291345.1|protein_coding|13/19|c.1428+46T>C||||||,G|intron_variant|MODIFIER|CDK11B|CDK11B|transcript|NM_033486.2|protein_coding|13/19|c.1458+46T>C||||||,G|intron_variant|MODIFIER|CDK11B|CDK11B|transcript|NM_033487.2|protein_coding|11/17|c.696+46T>C||||||,G|intron_variant|MODIFIER|CDK11B|CDK11B|transcript|NM_033489.2|protein_coding|14/20|c.1356+46T>C||||||,G|intron_variant|MODIFIER|CDK11B|CDK11B|transcript|NM_033490.2|protein_coding|12/18|c.813+46T>C||||||    GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1  0/1:78,6:0.087:3:3:0.500:2284,161:32:46 0/0:94,11:0.103:6:5:0.455:2729,277:49:45
modified 3.4 years ago • written 3.4 years ago by 180
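To see what "one annotation per transcript" looks like in practice, the ANN field of the records above can be split into its comma-separated entries and pipe-separated subfields. The following is a minimal Python sketch, not part of the original thread: the field names in `ANN_FIELDS` are my own labels for the 16 subfields, and the `info` string is a shortened copy of the hg19 record keeping only two of its entries.

```python
from collections import Counter

# Labels for the 16 pipe-separated subfields of one ANN entry.
# The names are illustrative; only the order matters here.
ANN_FIELDS = [
    "allele", "annotation", "impact", "gene_name", "gene_id",
    "feature_type", "feature_id", "transcript_biotype", "rank",
    "hgvs_c", "hgvs_p", "cdna_pos", "cds_pos", "aa_pos",
    "distance", "messages",
]


def parse_ann(info):
    """Return one dict per ANN entry found in a VCF INFO string."""
    for item in info.split(";"):
        if item.startswith("ANN="):
            entries = item[len("ANN="):].split(",")
            return [dict(zip(ANN_FIELDS, e.split("|"))) for e in entries]
    return []


# Shortened INFO string: two entries taken from the hg19 record.
info = (
    "DB;ECNT=1;ANN="
    "G|downstream_gene_variant|MODIFIER|MMP23B|MMP23B|transcript|"
    "NM_006983.1|protein_coding||c.*3077A>G|||||3048|,"
    "G|intron_variant|MODIFIER|CDK11B|CDK11B|transcript|"
    "NM_033486.2|protein_coding|13/19|c.1458+46T>C||||||"
)

records = parse_ann(info)
for r in records:
    print(r["gene_name"], r["feature_id"], r["annotation"], r["hgvs_c"])

# Tally by entry: a variant overlapping several transcripts is counted
# once per transcript, not once per variant.
annotation_counts = Counter(r["annotation"] for r in records)
print(annotation_counts)
```

Run on the full records, this lists every CDK11B transcript with `intron_variant`, which is what the replies below go on to discuss.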

Why do you expect that this variant should be exonic? If you look at Ensembl, you see that it is definitely intronic or downstream:;v=rs201088964;vdb=variation;vf=49274494

fin swimmer

written 3.4 years ago by finswimmer

Now I am a bit confused. I expected it to be exonic because the variant falls within the exon as shown in IGV when it is annotated with GRCh37.75. I followed your link to Ensembl, and indeed it shows the variant as intronic or downstream. Am I misinterpreting the representation of the gene in the GRCh37.75 track? Pardon my question if it sounds a bit stupid, but isn't the blue bar on GRCh37.75 the exon? If it were an intron, it would have been drawn as shown at the bottom for RefSeq, right?

written 3.4 years ago by 180

These issues arise because many genes have several splice isoforms. What is exonic in one isoform may be intronic in another.

You can either choose to always output your annotation for the 'canonical' isoform, which is usually taken to be either the longest isoform or the one most expressed in your tissue of interest, or you can delve into the finer details of each splice isoform (and greatly increase your workload that way).

written 3.4 years ago by Kevin Blighe
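As a rough illustration of the "one isoform per gene" approach, the ANN entries can be filtered down to a single chosen transcript per gene after annotation. This is only a sketch: the `CANONICAL` mapping is hypothetical, and the transcript ID in it is picked from the hg19 record purely as an example, not as a statement of which CDK11B isoform is canonical. snpEff also has a `-canon` command-line option that restricts annotation to its own canonical transcripts, which may be simpler if its definition suits you.

```python
# Hypothetical gene -> chosen transcript mapping (illustration only).
CANONICAL = {"CDK11B": "NM_033486.2"}


def keep_canonical(ann_value, canonical):
    """Keep only ANN entries whose transcript is the chosen one for its gene.

    Genes missing from the mapping are dropped entirely in this sketch.
    """
    kept = []
    for entry in ann_value.split(","):
        fields = entry.split("|")
        gene_name, feature_id = fields[3], fields[6]
        if canonical.get(gene_name) == feature_id:
            kept.append(entry)
    return kept


# Three entries copied from the hg19 record in this thread.
ann_value = ",".join([
    "G|intron_variant|MODIFIER|CDK11B|CDK11B|transcript|NM_001787.2|"
    "protein_coding|13/19|c.1497+46T>C||||||",
    "G|intron_variant|MODIFIER|CDK11B|CDK11B|transcript|NM_033486.2|"
    "protein_coding|13/19|c.1458+46T>C||||||",
    "G|downstream_gene_variant|MODIFIER|MMP23B|MMP23B|transcript|"
    "NM_006983.1|protein_coding||c.*3077A>G|||||3048|",
])

selected = keep_canonical(ann_value, CANONICAL)
for entry in selected:
    print(entry)
```

Filtering after the fact keeps the full per-transcript annotation in the VCF, so the other isoforms can still be inspected later if a variant looks interesting.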

@Kevin Blighe Thank you for the explanation. I will take your advice into consideration next time.

@fin swimmer Thank you for your help. Actually, the region under the blue bar of GRCh37.75 where that variant falls was an intron. I am laughing at myself now for misunderstanding :)) Everything is clear now. Thanks again!

written 3.4 years ago by 180

Yes, I just looked on the browsers, and this particular one does appear to be intronic, definitively. However, note that snpEff has labelled it as 'MODIFIER', the impact class it assigns to intronic and other non-coding variants. It has also given it an HGVS coding annotation: c.1458+46T>C (46 bp into the intron after coding position 1458). This is not exactly a splice site, but it is still quite near the exon.

written 3.4 years ago by Kevin Blighe
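The distance from the exon can be read straight from the HGVS string. Below is a small Python sketch that pulls the anchoring coding position and the intronic offset out of simple substitutions such as c.1458+46T>C. The regular expression only covers that simple form, and the 8-base `window` used in `near_exon_boundary` is an arbitrary threshold chosen for illustration, not a value snpEff is known to use.

```python
import re

# Matches simple intronic substitutions such as "c.1458+46T>C" or "c.1459-12A>G".
# Other HGVS forms (UTR positions, indels, ...) are deliberately not handled.
INTRONIC_SUBSTITUTION = re.compile(r"^c\.(\d+)([+-])(\d+)([ACGT])>([ACGT])$")


def intronic_offset(hgvs_c):
    """Return (coding_position, signed_offset), or None if the form is not matched.

    A positive offset counts bases into the intron after the given coding
    position; a negative offset counts bases before the next exon.
    """
    match = INTRONIC_SUBSTITUTION.match(hgvs_c)
    if match is None:
        return None
    position = int(match.group(1))
    offset = int(match.group(3))
    if match.group(2) == "-":
        offset = -offset
    return position, offset


def near_exon_boundary(offset, window=8):
    """True if the intronic offset lies within `window` bases of the exon.

    The default window is an assumption made for this example.
    """
    return abs(offset) <= window


result = intronic_offset("c.1458+46T>C")
print(result)
print(near_exon_boundary(result[1]))
```

For the variant in this thread the offset is 46, so under this illustrative threshold it would not count as adjacent to the exon boundary.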
3.4 years ago, cpad0112 (Hyderabad, India) wrote:

For all transcripts of CDK11B, that position is intronic (checked in IGV v2.4.1 with the hg19 and b37 genomes, b37 being the equivalent of GRCh37; region chr1:1,573,038-1,573,098).
