Question: [solved] SnpEff reports intron variants in exons?
badredda50 wrote:

Dear community,

I am working on VCF files produced with MuTect2 from whole-exome sequencing of prostate cancer patients. I compared matched normal/tumor samples and found mutations. During library preparation, we used an exome capture kit (SureSelect Human All Exon V6). Next, I wanted to predict the functional effect of these variants with SnpEff (http://snpeff.sourceforge.net/index.html). I annotated my samples against the hg19 reference genome. Below is an IGV screenshot showing the different builds for one randomly chosen reported variant.

10AN.vcf = sample; Homo_sapiens.GRCh37.75.sorted.gtf = GRCh37.75 gene annotation; hg19_ucsc.gtf = hg19 UCSC gene annotation; Agilent SureSelect DNA = capture kit coverage

That being said, what I expect from the SnpEff report is that the variant in CDK11B is annotated with a functional impact on the exon when GRCh37.75 is used, because the variant falls within the exon. This is the case for many variants. However, the screenshots below show what I got when I annotated the sample with the two builds:

  • For hg19, intron_variant annotations constitute 32% [screenshot: Variations for hg19]

  • For GRCh37.75, intron_variant annotations constitute 30% [screenshot: Variations for GRCh37.75]

Inspecting this particular variant in detail, I found:

  • For GRCh37.75: intron_variant [screenshot: intron_variant reported for GRCh37.75]

  • For hg19: intron_variant [screenshot: intron_variant reported for hg19]

So why is SnpEff reporting variants that fall within exons as intron variants?

Thank you in advance.

snp exome-seq snpeff annotation • 852 views
modified 14 months ago by cpad011210k • written 14 months ago by badredda50

Hello,

can you please post a complete line of the resulting annotated vcf?

Whether a variant is intronic or exonic depends on the transcript. Unless told otherwise, snpEff annotates the variant against every transcript it knows.

fin swimmer

written 14 months ago by finswimmer7.9k

Hello finswimmer,

yes, here's the line of the variant shown above in the IGV for GRCh37.75:

chr1    1573078 rs201088964 A   G   .   alt_allele_in_normal;panel_of_normals   DB;ECNT=1;HCNT=12;MAX_ED=.;MIN_ED=.;NLOD=7.46;TLOD=8.02;ANN=G|downstream_gene_variant|MODIFIER|MMP23B|ENSG00000189409|transcript|ENST00000356026|protein_coding||c.*3077A>G|||||3048|,G|downstream_gene_variant|MODIFIER|MMP23B|ENSG00000189409|transcript|ENST00000472264|protein_coding||c.*4579A>G|||||4579|WARNING_TRANSCRIPT_INCOMPLETE,G|downstream_gene_variant|MODIFIER|MMP23B|ENSG00000189409|transcript|ENST00000512731|nonsense_mediated_decay||c.*4222A>G|||||3228|,G|downstream_gene_variant|MODIFIER|MMP23B|ENSG00000189409|transcript|ENST00000378675|protein_coding||c.*3156A>G|||||3048|,G|downstream_gene_variant|MODIFIER|MMP23B|ENSG00000189409|transcript|ENST00000479814|protein_coding||c.*3156A>G|||||3051|WARNING_TRANSCRIPT_NO_START_CODON,G|downstream_gene_variant|MODIFIER|MMP23B|ENSG00000189409|transcript|ENST00000435358|protein_coding||c.*3156A>G|||||3134|WARNING_TRANSCRIPT_NO_START_CODON,G|downstream_gene_variant|MODIFIER|MMP23B|ENSG00000189409|transcript|ENST00000503792|protein_coding||c.*3156A>G|||||3134|WARNING_TRANSCRIPT_NO_START_CODON,G|downstream_gene_variant|MODIFIER|MMP23B|ENSG00000189409|transcript|ENST00000489782|retained_intron||n.*3098A>G|||||3098|,G|downstream_gene_variant|MODIFIER|MMP23B|ENSG00000189409|transcript|ENST00000486400|retained_intron||n.*3094A>G|||||3094|,G|downstream_gene_variant|MODIFIER|MMP23B|ENSG00000189409|transcript|ENST00000490017|nonsense_mediated_decay||c.*3077A>G|||||2439|WARNING_TRANSCRIPT_NO_START_CODON,G|intron_variant|MODIFIER|CDK11B|ENSG00000248333|transcript|ENST00000407249|protein_coding|14/20|c.1473+46T>C||||||,G|intron_variant|MODIFIER|CDK11B|ENSG00000248333|transcript|ENST00000513088|protein_coding|8/14|c.969+46T>C||||||WARNING_TRANSCRIPT_NO_START_CODON,G|intron_variant|MODIFIER|CDK11B|ENSG00000248333|transcript|ENST00000341832|protein_coding|13/19|c.1332+46T>C||||||,G|intron_variant|MODIFIER|CDK11B|ENSG00000248333|transcript|ENST00000317673|protein_coding|14/20|c.1467+46T>C||||||,G|intron_variant|MODIFIER|CDK11B|ENSG00000248333|transcript|ENST00000340677|protein_coding|13/19|c.1434+46T>C|||||| GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1  0/1:78,6:0.087:3:3:0.500:2284,161:32:46 0/0:94,11:0.103:6:5:0.455:2729,277:49:45

and for hg19:

chr1    1573078 rs201088964 A   G   .   alt_allele_in_normal;panel_of_normals   DB;ECNT=1;HCNT=12;MAX_ED=.;MIN_ED=.;NLOD=7.46;TLOD=8.02;ANN=G|downstream_gene_variant|MODIFIER|MMP23B|MMP23B|transcript|NM_006983.1|protein_coding||c.*3077A>G|||||3048|,G|downstream_gene_variant|MODIFIER|MMP23A|MMP23A|transcript|NR_002946.1|pseudogene||n.*3051A>G|||||3051|,G|intron_variant|MODIFIER|CDK11B|CDK11B|transcript|NM_001787.2|protein_coding|13/19|c.1497+46T>C||||||,G|intron_variant|MODIFIER|CDK11B|CDK11B|transcript|NM_001291345.1|protein_coding|13/19|c.1428+46T>C||||||,G|intron_variant|MODIFIER|CDK11B|CDK11B|transcript|NM_033486.2|protein_coding|13/19|c.1458+46T>C||||||,G|intron_variant|MODIFIER|CDK11B|CDK11B|transcript|NM_033487.2|protein_coding|11/17|c.696+46T>C||||||,G|intron_variant|MODIFIER|CDK11B|CDK11B|transcript|NM_033489.2|protein_coding|14/20|c.1356+46T>C||||||,G|intron_variant|MODIFIER|CDK11B|CDK11B|transcript|NM_033490.2|protein_coding|12/18|c.813+46T>C||||||    GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1  0/1:78,6:0.087:3:3:0.500:2284,161:32:46 0/0:94,11:0.103:6:5:0.455:2729,277:49:45
modified 14 months ago • written 14 months ago by badredda50
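Since SnpEff writes one ANN entry per transcript, it can help to split the field and read the effect transcript by transcript. Below is a minimal Python sketch of that, assuming the 16 pipe-separated subfields visible in the lines above; the field names in `ANN_FIELDS` and the function name `parse_ann` are labels chosen here for illustration, and the example INFO string is a shortened copy of the hg19 line.

```python
# Names for the 16 pipe-separated ANN subfields seen in the VCF lines above.
# These labels are illustrative; they are not read from the VCF header.
ANN_FIELDS = [
    "allele", "effect", "impact", "gene_name", "gene_id",
    "feature_type", "feature_id", "biotype", "rank", "hgvs_c",
    "hgvs_p", "cdna_pos", "cds_pos", "protein_pos", "distance", "messages",
]


def parse_ann(info):
    """Return one dict per ANN entry found in a VCF INFO string."""
    for item in info.split(";"):
        if item.startswith("ANN="):
            entries = item[len("ANN="):].split(",")
            return [dict(zip(ANN_FIELDS, entry.split("|"))) for entry in entries]
    return []  # no ANN key in this INFO string


# Shortened INFO column taken from the hg19 line above (two of its entries).
info = (
    "DB;ECNT=1;ANN="
    "G|downstream_gene_variant|MODIFIER|MMP23B|MMP23B|transcript|NM_006983.1"
    "|protein_coding||c.*3077A>G|||||3048|,"
    "G|intron_variant|MODIFIER|CDK11B|CDK11B|transcript|NM_001787.2"
    "|protein_coding|13/19|c.1497+46T>C||||||"
)

records = parse_ann(info)
for rec in records:
    print(rec["gene_name"], rec["feature_id"], rec["effect"], rec["hgvs_c"])
```

Listing the entries this way makes it easy to see that every CDK11B transcript in both annotations carries `intron_variant`.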

Why do you expect that this variant should be exonic? If you look at Ensembl, you will see that it is definitely intronic or downstream: http://www.ensembl.org/Homo_sapiens/Variation/Explore?r=1:1637216-1638216;v=rs201088964;vdb=variation;vf=49274494

fin swimmer

written 14 months ago by finswimmer7.9k

Now I am a bit confused. I expected it to be exonic because the variant falls within the exon as shown in IGV when it is annotated with GRCh37.75. I followed your link to Ensembl and indeed it shows the variant as intronic or downstream. Am I misinterpreting the representation of the gene from GRCh37.75? Pardon my question if it sounds a bit stupid, but isn't the blue bar the exon on GRCh37.75? If it were an intron, it would have been drawn as shown at the bottom for RefSeq, right?

written 14 months ago by badredda50

These issues occur when you consider that many genes have different splice-isoforms. What is exonic in one isoform may be intronic in another.

You can either choose to always output your annotation for the 'canonical' isoform, which is considered either the longest or most expressed isoform in your tissue of interest, or you can delve into the finer details of each splice isoform (and greatly increase your workload that way).

written 14 months ago by Kevin Blighe33k
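SnpEff has a `-canon` option that restricts annotation to the transcripts it considers canonical. If you would rather pick the isoform yourself, the already annotated ANN value can be filtered afterwards. The Python sketch below assumes the transcript ID sits in the 7th pipe-separated subfield, as in the lines posted above; the function name `keep_transcripts` and the choice of NM_033486.2 as the "canonical" transcript are purely illustrative assumptions.

```python
def keep_transcripts(ann_value, wanted):
    """Keep only the ANN entries whose transcript ID (7th subfield) is in `wanted`."""
    kept = [
        entry for entry in ann_value.split(",")
        if entry.split("|")[6] in wanted
    ]
    return ",".join(kept)


# Two CDK11B entries copied from the hg19 annotation above.
ann = (
    "G|intron_variant|MODIFIER|CDK11B|CDK11B|transcript|NM_001787.2"
    "|protein_coding|13/19|c.1497+46T>C||||||,"
    "G|intron_variant|MODIFIER|CDK11B|CDK11B|transcript|NM_033486.2"
    "|protein_coding|13/19|c.1458+46T>C||||||"
)

# ASSUMPTION: which transcript counts as canonical is chosen arbitrarily here.
chosen = {"NM_033486.2"}
filtered = keep_transcripts(ann, chosen)
print(filtered)
```

Filtering after the fact keeps the full annotation on disk, so you can still go back to the other isoforms later.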

@Kevin Blighe Thank you for the explanation. I will take your opinion into consideration next time.

@fin swimmer Thank you for your help. Actually, the variant under the blue bar of GRCh37.75 was in an intron. I am laughing at myself now for misunderstanding :)) Everything is cleared up. Thanks again!

written 14 months ago by badredda50

Yes, I just looked in the browsers and this particular one does appear to be intronic, definitively. However, note that snpEff has given it the 'MODIFIER' impact, the category it uses for intronic and other non-coding variants, together with an HGVS coding annotation: c.1458+46T>C (46 bp into the intron that follows coding position 1458). This is not exactly a splice site, but it is still quite near the exon.

written 14 months ago by Kevin Blighe33k
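To read the distance to the exon directly from the HGVS string, a small regular expression is enough. The sketch below is a minimal Python example that only handles simple intronic substitutions of the form `c.<position><+ or -><offset><ref>><alt>`; the pattern and the function name `intronic_offset` are assumptions made for this illustration and do not cover the full HGVS nomenclature.

```python
import re

# Matches simple intronic substitutions such as c.1458+46T>C or c.100-3A>G.
INTRONIC = re.compile(r"^c\.(\d+)([+-])(\d+)([ACGT])>([ACGT])$")


def intronic_offset(hgvs_c):
    """Return (coding_position, signed_offset) or None if the pattern does not match."""
    match = INTRONIC.match(hgvs_c)
    if match is None:
        return None
    anchor = int(match.group(1))
    offset = int(match.group(3))
    if match.group(2) == "-":
        offset = -offset  # negative: counted back from the next exon
    return (anchor, offset)


print(intronic_offset("c.1458+46T>C"))  # (1458, 46)
print(intronic_offset("c.*3077A>G"))    # None: a downstream annotation, not intronic
```

A positive offset counts bases into the intron after the given coding position; a negative one counts back from the start of the following exon.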
cpad011210k (India) wrote:

For all transcripts of CDK11B, that position is intronic (checked in IGV v2.4.1 with the hg19 and b37 genomes, b37 being the equivalent of GRCh37; region chr1:1,573,038-1,573,098).

written 14 months ago by cpad011210k
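The same per-transcript check can be done outside IGV by comparing the position with each transcript's exon intervals. The Python sketch below assumes 1-based inclusive coordinates, as used in GTF files; the function name `classify` and all coordinates and transcript names in `toy` are HYPOTHETICAL values made up for illustration, not the real CDK11B exons.

```python
def classify(pos, exons_by_transcript):
    """Label a 1-based position as 'exon', 'intron' or 'outside' for each transcript.

    exons_by_transcript maps a transcript ID to a list of (start, end) exon
    intervals, 1-based and inclusive, as in a GTF file.
    """
    labels = {}
    for transcript, exons in exons_by_transcript.items():
        first = min(start for start, _ in exons)
        last = max(end for _, end in exons)
        if any(start <= pos <= end for start, end in exons):
            labels[transcript] = "exon"
        elif first <= pos <= last:
            labels[transcript] = "intron"  # inside the transcript, between exons
        else:
            labels[transcript] = "outside"
    return labels


# HYPOTHETICAL toy coordinates, not taken from any real annotation.
toy = {
    "tx_A": [(100, 200), (300, 400)],
    "tx_B": [(100, 260), (300, 400)],
    "tx_C": [(500, 600)],
}

labels = classify(250, toy)
print(labels)
```

In the toy data the same position is intronic in one isoform and exonic in another, which is exactly why the answer has to be read transcript by transcript.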