[solved] SnpEff reports intron variants in exons?
6.5 years ago

Dear community,

I am working on VCF files produced with MuTect2 from whole-exome sequencing of prostate cancer patients. I compared matched normal/tumor samples and found mutations. For library preparation, we used an exome capture kit (SureSelect Human All Exon V6). Next, I wanted to predict the functional effects of these variants using the SnpEff tool (http://snpeff.sourceforge.net/index.html), annotating my samples against the hg19 reference genome. Below is an IGV screenshot of one randomly chosen reported variant, showing the different builds.

<strong>10AN.vcf</strong> = Sample; <strong>Homo_sapiens.GRCh37.75.sorted.gtf</strong> = GRCh37.75 genes annotation; <strong>hg19_ucsc.gtf</strong> = hg19 UCSC genes annotation; <strong>Agilent SureSelect DNA</strong> = capture kit coverage

That being said, when I get a SnpEff report, I expect the CDK11B variant to have a functional impact on the exon when annotated with GRCh37.75, because the variant falls within the exon. The same is true for many other variants. However, here is what I got when I annotated the sample with the two builds:

  • For hg19, intron_variant accounts for 32%.

  • For GRCh37.75, intron_variant accounts for 30%.
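A rough way to reproduce such a per-effect breakdown from the annotated VCF is to count effect types across all ANN entries. This is a minimal sketch, assuming an uncompressed, tab-separated VCF whose INFO column carries a standard SnpEff ANN field; SnpEff's own HTML summary may count differently, so the percentages need not match it exactly.

```python
from collections import Counter
from typing import Dict, Iterable


def effect_type_fractions(vcf_lines: Iterable[str]) -> Dict[str, float]:
    """Share of each effect type among all ANN entries in a VCF.

    Every ANN entry (one per allele/transcript pair) is counted once per
    effect type; combined effects such as 'splice_region_variant&intron_variant'
    contribute to each of their parts.
    """
    counts: Counter = Counter()
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # header lines
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 8:
            continue  # not a data line
        for item in columns[7].split(";"):
            if not item.startswith("ANN="):
                continue
            for entry in item[len("ANN="):].split(","):
                fields = entry.split("|")
                if len(fields) > 1:
                    # Field 2 (Annotation) may hold several types joined with '&'
                    for effect in fields[1].split("&"):
                        counts[effect] += 1
    total = sum(counts.values())
    return {effect: n / total for effect, n in counts.items()} if total else {}
```

For example, `effect_type_fractions(open("10AN.ann.vcf"))` returns a dict mapping each effect type to its share; the file name is a placeholder for your SnpEff output.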

Inspecting this particular variant in detail, I found:

  • For GRCh37.75, it is reported as intron_variant.

  • For hg19, it is reported as intron_variant.

So why does SnpEff report variants that fall within exons as intron variants?

Thank you in advance.


Hello,

can you please post a complete line of the resulting annotated vcf?

Whether a variant is intronic or exonic depends on the transcript. Unless told otherwise, snpEff annotates the variant for every transcript it knows.

fin swimmer
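The point that the answer depends on the transcript can be made concrete with a toy classifier. This is a sketch only: the isoform names and exon coordinates are hypothetical, and real annotation (as SnpEff does it) also handles strand, UTRs, and upstream/downstream distances, which are not modelled here.

```python
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # hypothetical 1-based, inclusive genomic interval


def classify(pos: int, exons: List[Exon]) -> str:
    """Classify a genomic position against one transcript's exons."""
    exons = sorted(exons)
    if pos < exons[0][0] or pos > exons[-1][1]:
        # SnpEff would call this upstream/downstream (strand-dependent); not modelled here
        return "outside transcript"
    if any(start <= pos <= end for start, end in exons):
        return "exon"
    return "intron"  # inside the transcript span but between exons


def classify_all(pos: int, transcripts: Dict[str, List[Exon]]) -> Dict[str, str]:
    """One call per transcript, so the same position can get different answers."""
    return {tid: classify(pos, exons) for tid, exons in transcripts.items()}


# Hypothetical isoforms of one gene; isoform_B has an extra exon at 1250-1300
transcripts = {
    "isoform_A": [(1000, 1100), (1500, 1600)],
    "isoform_B": [(1000, 1100), (1250, 1300), (1500, 1600)],
}
print(classify_all(1275, transcripts))
# {'isoform_A': 'intron', 'isoform_B': 'exon'}
```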


Hello finswimmer,

Yes, here is the line for the variant shown above in IGV, annotated with GRCh37.75:

chr1    1573078 rs201088964 A   G   .   alt_allele_in_normal;panel_of_normals   DB;ECNT=1;HCNT=12;MAX_ED=.;MIN_ED=.;NLOD=7.46;TLOD=8.02;ANN=G|downstream_gene_variant|MODIFIER|MMP23B|ENSG00000189409|transcript|ENST00000356026|protein_coding||c.*3077A>G|||||3048|,G|downstream_gene_variant|MODIFIER|MMP23B|ENSG00000189409|transcript|ENST00000472264|protein_coding||c.*4579A>G|||||4579|WARNING_TRANSCRIPT_INCOMPLETE,G|downstream_gene_variant|MODIFIER|MMP23B|ENSG00000189409|transcript|ENST00000512731|nonsense_mediated_decay||c.*4222A>G|||||3228|,G|downstream_gene_variant|MODIFIER|MMP23B|ENSG00000189409|transcript|ENST00000378675|protein_coding||c.*3156A>G|||||3048|,G|downstream_gene_variant|MODIFIER|MMP23B|ENSG00000189409|transcript|ENST00000479814|protein_coding||c.*3156A>G|||||3051|WARNING_TRANSCRIPT_NO_START_CODON,G|downstream_gene_variant|MODIFIER|MMP23B|ENSG00000189409|transcript|ENST00000435358|protein_coding||c.*3156A>G|||||3134|WARNING_TRANSCRIPT_NO_START_CODON,G|downstream_gene_variant|MODIFIER|MMP23B|ENSG00000189409|transcript|ENST00000503792|protein_coding||c.*3156A>G|||||3134|WARNING_TRANSCRIPT_NO_START_CODON,G|downstream_gene_variant|MODIFIER|MMP23B|ENSG00000189409|transcript|ENST00000489782|retained_intron||n.*3098A>G|||||3098|,G|downstream_gene_variant|MODIFIER|MMP23B|ENSG00000189409|transcript|ENST00000486400|retained_intron||n.*3094A>G|||||3094|,G|downstream_gene_variant|MODIFIER|MMP23B|ENSG00000189409|transcript|ENST00000490017|nonsense_mediated_decay||c.*3077A>G|||||2439|WARNING_TRANSCRIPT_NO_START_CODON,G|intron_variant|MODIFIER|CDK11B|ENSG00000248333|transcript|ENST00000407249|protein_coding|14/20|c.1473+46T>C||||||,G|intron_variant|MODIFIER|CDK11B|ENSG00000248333|transcript|ENST00000513088|protein_coding|8/14|c.969+46T>C||||||WARNING_TRANSCRIPT_NO_START_CODON,G|intron_variant|MODIFIER|CDK11B|ENSG00000248333|transcript|ENST00000341832|protein_coding|13/19|c.1332+46T>C||||||,G|intron_variant|MODIFIER|CDK11B|ENSG00000248333|transcript|ENST00000317673|protein_coding|14/20|c.1467+46T>C||||||,G|intron_variant|MODIFIER|CDK11B|ENSG00000248333|transcript|ENST00000340677|protein_coding|13/19|c.1434+46T>C|||||| GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1  0/1:78,6:0.087:3:3:0.500:2284,161:32:46 0/0:94,11:0.103:6:5:0.455:2729,277:49:45

And for hg19:

chr1    1573078 rs201088964 A   G   .   alt_allele_in_normal;panel_of_normals   DB;ECNT=1;HCNT=12;MAX_ED=.;MIN_ED=.;NLOD=7.46;TLOD=8.02;ANN=G|downstream_gene_variant|MODIFIER|MMP23B|MMP23B|transcript|NM_006983.1|protein_coding||c.*3077A>G|||||3048|,G|downstream_gene_variant|MODIFIER|MMP23A|MMP23A|transcript|NR_002946.1|pseudogene||n.*3051A>G|||||3051|,G|intron_variant|MODIFIER|CDK11B|CDK11B|transcript|NM_001787.2|protein_coding|13/19|c.1497+46T>C||||||,G|intron_variant|MODIFIER|CDK11B|CDK11B|transcript|NM_001291345.1|protein_coding|13/19|c.1428+46T>C||||||,G|intron_variant|MODIFIER|CDK11B|CDK11B|transcript|NM_033486.2|protein_coding|13/19|c.1458+46T>C||||||,G|intron_variant|MODIFIER|CDK11B|CDK11B|transcript|NM_033487.2|protein_coding|11/17|c.696+46T>C||||||,G|intron_variant|MODIFIER|CDK11B|CDK11B|transcript|NM_033489.2|protein_coding|14/20|c.1356+46T>C||||||,G|intron_variant|MODIFIER|CDK11B|CDK11B|transcript|NM_033490.2|protein_coding|12/18|c.813+46T>C||||||    GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1  0/1:78,6:0.087:3:3:0.500:2284,161:32:46 0/0:94,11:0.103:6:5:0.455:2729,277:49:45
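To see at a glance which genes and effect types a line like this carries, the ANN field can be split mechanically: it holds comma-separated entries, one per allele/transcript pair, each with 16 pipe-separated fields in the order of SnpEff's ANN specification. A minimal sketch follows; the short field names are my own labels, and the example string is a trimmed excerpt of the hg19 line above (some INFO keys and transcripts removed).

```python
from collections import defaultdict
from typing import Dict, List, Set

# Labels for the 16 pipe-separated ANN sub-fields, in specification order
ANN_FIELDS = [
    "allele", "effect", "impact", "gene", "gene_id", "feature_type",
    "feature_id", "biotype", "rank", "hgvs_c", "hgvs_p", "cdna_pos",
    "cds_pos", "aa_pos", "distance", "messages",
]


def parse_ann(info: str) -> List[Dict[str, str]]:
    """Turn the ANN value of an INFO column into one dict per annotation entry."""
    for item in info.split(";"):
        if item.startswith("ANN="):
            return [dict(zip(ANN_FIELDS, entry.split("|")))
                    for entry in item[len("ANN="):].split(",")]
    return []


def effects_by_gene(vcf_line: str) -> Dict[str, Set[str]]:
    """Collect the set of effect types reported for each gene on one VCF line."""
    # split() tolerates the space-separated copy pasted in this thread;
    # for real, tab-separated VCF files split("\t") is the safer choice.
    info = vcf_line.split()[7]
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for ann in parse_ann(info):
        grouped[ann["gene"]].add(ann["effect"])
    return dict(grouped)


# Trimmed excerpt of the hg19 line (some INFO keys and transcripts removed)
example_line = (
    "chr1 1573078 rs201088964 A G . alt_allele_in_normal;panel_of_normals "
    "DB;ECNT=1;ANN="
    "G|downstream_gene_variant|MODIFIER|MMP23B|MMP23B|transcript|NM_006983.1|protein_coding||c.*3077A>G|||||3048|,"
    "G|intron_variant|MODIFIER|CDK11B|CDK11B|transcript|NM_033486.2|protein_coding|13/19|c.1458+46T>C||||||"
)
print(effects_by_gene(example_line))
# {'MMP23B': {'downstream_gene_variant'}, 'CDK11B': {'intron_variant'}}
```

Run over the full lines, every CDK11B entry in both builds comes out as intron_variant and every MMP23B entry as downstream_gene_variant.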

Why do you expect this variant to be exonic? If you look at Ensembl, you can see that it is definitely intronic or downstream: http://www.ensembl.org/Homo_sapiens/Variation/Explore?r=1:1637216-1638216;v=rs201088964;vdb=variation;vf=49274494

fin swimmer


Now I am a bit confused. I expected it to be exonic because, in IGV, the variant falls within an exon when annotated with GRCh37.75. I followed your link to Ensembl, and it does indeed show the variant as intronic or downstream. Am I misinterpreting how the gene is drawn for GRCh37.75? Pardon me if this sounds a bit stupid, but isn't the blue bar an exon in GRCh37.75? If it were an intron, it would have been drawn like the RefSeq track at the bottom, right?


These issues arise because many genes have multiple splice isoforms. What is exonic in one isoform may be intronic in another.

You can either always output your annotation for the 'canonical' isoform, usually taken to be either the longest isoform or the one most expressed in your tissue of interest, or you can delve into the finer details of each splice isoform (and greatly increase your workload that way).
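If you take the canonical-isoform route, SnpEff has a `-canon` option that restricts annotation to canonical transcripts (check `java -jar snpEff.jar -help` for your version), though SnpEff's notion of canonical may not match the most-expressed isoform in your tissue. Alternatively, you can post-filter the ANN entries against your own list of preferred transcripts. A minimal sketch of that filter follows; the transcript ID in `preferred` is a placeholder, not a claim about which CDK11B isoform is canonical.

```python
from typing import Iterable


def strip_version(transcript_id: str) -> str:
    """'NM_033486.2' -> 'NM_033486', so versioned and unversioned IDs match."""
    return transcript_id.split(".", 1)[0]


def keep_preferred(ann_value: str, preferred: Iterable[str]) -> str:
    """Keep only the ANN entries whose Feature_ID (7th field) is a preferred transcript."""
    wanted = {strip_version(t) for t in preferred}
    kept = []
    for entry in ann_value.split(","):
        fields = entry.split("|")
        if len(fields) > 6 and strip_version(fields[6]) in wanted:
            kept.append(entry)
    return ",".join(kept)


# Placeholder choice of "canonical" transcript, for illustration only
preferred = {"NM_033486"}
ann = (
    "G|downstream_gene_variant|MODIFIER|MMP23B|MMP23B|transcript|NM_006983.1|protein_coding||c.*3077A>G|||||3048|,"
    "G|intron_variant|MODIFIER|CDK11B|CDK11B|transcript|NM_033486.2|protein_coding|13/19|c.1458+46T>C||||||"
)
print(keep_preferred(ann, preferred))
# G|intron_variant|MODIFIER|CDK11B|CDK11B|transcript|NM_033486.2|protein_coding|13/19|c.1458+46T>C||||||
```

Note the trade-off: genes with no transcript in the list (MMP23B here) lose all their annotations, so the list needs one entry per gene you care about.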


@Kevin Blighe Thank you for the explanation. I will take your opinion into consideration next time.

@fin swimmer Thank you for your help. Actually, the variant that falls in the blue bar of GRCh37.75 is intronic. I am laughing at myself now for misunderstanding :)) Everything is clear now. Thanks again!


Yes, I just looked in the browsers, and this particular one definitely appears to be intronic. Note, however, that snpEff gives it the impact 'MODIFIER', the category it uses for intron variants, together with an HGVS coding annotation relative to the nearby exon: c.1458+46T>C (46 bp into the intron after coding position 1458). This is not a splice site, but it is still quite near one.
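The intronic HGVS notation can be unpacked mechanically: in c.1458+46T>C, 1458 is the last coding base of the exon, +46 means 46 bases into the following intron, and T>C is written on the transcript strand. That is also why the VCF's A>G shows up as T>C for CDK11B, which lies on the minus strand, while the MMP23B entries keep A>G. The sketch below handles only simple substitutions of the form c.N±M X>Y and is not a full HGVS parser. Its thresholds follow my reading of SnpEff's documentation (splice donor/acceptor sites are the first/last 2 intronic bases; splice_region_variant covers 3 to 8 intronic bases, or 1 to 3 exonic bases); treat them as an assumption to check for your SnpEff version.

```python
import re
from typing import Tuple

# Simple substitutions only, e.g. c.1458+46T>C, c.1459-3A>G, c.100A>T
HGVS_C_SUB = re.compile(r"^c\.(\d+)([+-]\d+)?([ACGT])>([ACGT])$")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def parse_c_sub(hgvs: str) -> Tuple[int, int, str, str]:
    """Return (cds_pos, intron_offset, ref, alt); offset is 0 for exonic positions."""
    m = HGVS_C_SUB.match(hgvs)
    if m is None:
        raise ValueError(f"unsupported HGVS.c: {hgvs}")
    pos, offset, ref, alt = m.groups()
    return int(pos), (int(offset) if offset else 0), ref, alt


def rough_splice_class(offset: int) -> str:
    """Bucket a variant by intronic distance (assumed thresholds, see text).

    Exonic bases near a boundary can also be splice region, which this
    sketch cannot tell without exon coordinates.
    """
    d = abs(offset)
    if d == 0:
        return "exonic"
    if d <= 2:
        return "splice site (donor/acceptor)"
    if d <= 8:
        return "splice region"
    return "intronic"


pos, offset, ref, alt = parse_c_sub("c.1458+46T>C")
print(pos, offset, ref, alt)        # 1458 46 T C
print(rough_splice_class(offset))   # intronic
# Minus-strand gene: transcript T>C corresponds to genomic A>G
print(ref.translate(COMPLEMENT), alt.translate(COMPLEMENT))  # A G
```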

6.5 years ago

At that position, the variant is intronic in all CDK11B transcripts (checked in IGV v2.4.1 with both hg19 and b37, the equivalent of GRCh37; region chr1:1,573,038-1,573,098).

