Question: snpEff inappropriate interpretation of variants?
1
gravatar for moushengxu
15 months ago by
moushengxu350
moushengxu350 wrote:

I have RNA-seq datasets, and I applied GATK variant calling procedure for RNA-seq (https://software.broadinstitute.org/gatk/documentation/article.php?id=3891). I have not found anything wrong with it yet. However, when I took the .vcf file output from GATK and used snpEff with GRCh 37.75 to assess the impact of variants, it seems either the genomic coordinates were not handled correctly or the variant impact assessment mechanism was off. For instance, one variant output from GATK was documented like the following:

chr19   7975586 .   C   T   275.78  .   AC=2;AF=1.00;AN=2;DP=9;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=30.64;SOR=4.615 GT:AD:DP:GQ:PL  1/1:0,9:9:26:304,26,0

This denotes the variant as T/T at chr19:7975586, which corresponds to codon GAT (D or Asp) in an exon of MAP2K7. The reference allele is C/C, which forms the codon GAC coding for Asp as well. In other words, this variant does not change the underlying amino acid, it's a synonymous variant, and the impact of this variant should be very low if any?

However, snpEff generated the following results:

chr19   7975586 .   C   T   275.78  .   AC=2;AF=1.00;AN=2;DP=9;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=30.64;SOR=4.615;ANN=T|structural_interaction_variant|HIGH|MAP2K7|ENSG00000076984|interaction|2DYL:A_166-A_207:ENST00000397983|protein_coding|7/12|c.621C>T||||||,T|synonymous_variant|LOW|MAP2K7|ENSG00000076984|transcript|ENST00000545011|protein_coding|6/11|c.699C>T|p.Asp233Asp|764/3497|699/1386|233/461||,T|synonymous_variant|LOW|MAP2K7|ENSG00000076984|transcript|ENST00000397981|protein_coding|6/11|c.573C>T|p.Asp191Asp|675/3430|573/1281|191/426||,T|synonymous_variant|LOW|MAP2K7|ENSG00000076984|transcript|ENST00000397983|protein_coding|7/12|c.621C>T|p.Asp207Asp|681/3396|621/1308|207/435||,T|synonymous_variant|LOW|MAP2K7|ENSG00000076984|transcript|ENST00000397979|protein_coding|6/11|c.573C>T|p.Asp191Asp|627/3361|573/1260|191/419||,T|upstream_gene_variant|MODIFIER|MAP2K7|ENSG00000076984|transcript|ENST00000498118|retained_intron||n.-398C>T|||||398|,T|downstream_gene_variant|MODIFIER|CTD-3193O13.13|ENSG00000268149|transcript|ENST00000595655|antisense||n.*1947G>A|||||1947|,T|downstream_gene_variant|MODIFIER|MAP2K7|ENSG00000076984|transcript|ENST00000475022|retained_intron||n.*3635C>T|||||3635|,T|non_coding_transcript_exon_variant|MODIFIER|MAP2K7|ENSG00000076984|transcript|ENST00000468058|retained_intron|5/10|n.768C>T||||||,T|non_coding_transcript_exon_variant|MODIFIER|MAP2K7|ENSG00000076984|transcript|ENST00000494348|retained_intron|2/3|n.496C>T||||||,T|non_coding_transcript_exon_variant|MODIFIER|MAP2K7|ENSG00000076984|transcript|ENST00000465324|retained_intron|2/2|n.365C>T||||||   GT:AD:DP:GQ:PL  1/1:0,9:9:26:304,26,0

snpEff annotated this variant as structural_interaction_variant with HIGH impact. How and why? As said above, shouldn't this simply be a meaningless variant? Is GRCh 37.75 the right match for hg19?

Another example is like the following:

GATK output --

chr16   3070708 .   TG  T   173.77  .   AC=2;AF=1.00;AN=2;DP=10;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=24.82;SOR=2.584    GT:AD:DP:GQ:PL  1/1:0,7:7:21:211,21,0

snpEff output --

 chr16  3070708 .   TG  T   173.77  .   AC=2;AF=1.00;AN=2;DP=10;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=24.82;SOR=2.584;ANN=T|frameshift_variant|HIGH|TNFRSF12A|ENSG00000006327|transcript|ENST00000575124|protein_coding|1/3|c.318delG|p.Ser107fs|341/1133|318/681|106/226||INFO_REALIGN_3_PRIME,T|upstream_gene_variant|MODIFIER|CLDN6|ENSG00000184697|transcript|ENST00000328796|protein_coding||c.-4687delC|||||2521|,T|upstream_gene_variant|MODIFIER|THOC6|ENSG00000131652|transcript|ENST00000326266|protein_coding||c.-3619delG|||||3323|,T|upstream_gene_variant|MODIFIER| ... (omitted)

The question is: this is a variant (indel) located either in an intronic region or in the 5' UTR of a TNFRSF12A transcript, how can this be interpreted as a frameshift variant?

Thanks for any suggestions!

ADD COMMENTlink modified 15 months ago by rbagnall1.3k • written 15 months ago by moushengxu350
1

the 1st prediction, formatted:

T  structural_interaction_variant      HIGH      MAP2K7          ENSG00000076984  interaction  2DYL:A_166-A_207:ENST00000397983  protein_coding   7/12        c.621C>T
T  synonymous_variant                  LOW       MAP2K7          ENSG00000076984  transcript   ENST00000545011                   protein_coding   6/11        c.699C>T  p.Asp233Asp  764/3497  699/1386  233/461
T  synonymous_variant                  LOW       MAP2K7          ENSG00000076984  transcript   ENST00000397981                   protein_coding   6/11        c.573C>T  p.Asp191Asp  675/3430  573/1281  191/426
T  synonymous_variant                  LOW       MAP2K7          ENSG00000076984  transcript   ENST00000397983                   protein_coding   7/12        c.621C>T  p.Asp207Asp  681/3396  621/1308  207/435
T  synonymous_variant                  LOW       MAP2K7          ENSG00000076984  transcript   ENST00000397979                   protein_coding   6/11        c.573C>T  p.Asp191Asp  627/3361  573/1260  191/419
T  upstream_gene_variant               MODIFIER  MAP2K7          ENSG00000076984  transcript   ENST00000498118                   retained_intron  n.-398C>T   398
T  downstream_gene_variant             MODIFIER  CTD-3193O13.13  ENSG00000268149  transcript   ENST00000595655                   antisense        n.*1947G>A  1947
T  downstream_gene_variant             MODIFIER  MAP2K7          ENSG00000076984  transcript   ENST00000475022                   retained_intron  n.*3635C>T  3635
T  non_coding_transcript_exon_variant  MODIFIER  MAP2K7          ENSG00000076984  transcript   ENST00000468058                   retained_intron  5/10        n.768C>T
T  non_coding_transcript_exon_variant  MODIFIER  MAP2K7          ENSG00000076984  transcript   ENST00000494348                   retained_intron  2/3         n.496C>T
T  non_coding_transcript_exon_variant  MODIFIER  MAP2K7          ENSG00000076984  transcript   ENST00000465324                   retained_intron  2/2         n.365C>T
ADD REPLYlink written 15 months ago by Pierre Lindenbaum114k
1

Really nicely formatted. Amazing. How did you achieve that?

ADD REPLYlink written 15 months ago by moushengxu350
1

you might like vcf2table : http://lindenb.github.io/jvarkit/VcfToTable.html

ADD REPLYlink written 15 months ago by Pierre Lindenbaum114k
 echo " AC=2;AF=(...)T00000465324|retained_intron|2/2|n.365C>T|||||| " | tr ";" "\n" | grep '^ANN=' | cut -c 5- | tr ',' '\n' | tr "|" "\t" | column -t
ADD REPLYlink written 15 months ago by Pierre Lindenbaum114k

what's your version of SNPEFF ?

ADD REPLYlink written 15 months ago by Pierre Lindenbaum114k

SnpEff version SnpEff 4.3p (build 2017-06-06 09:55), by Pablo Cingolani

Looks the most recent?

ADD REPLYlink written 15 months ago by moushengxu350
2
gravatar for rbagnall
15 months ago by
rbagnall1.3k
Australia
rbagnall1.3k wrote:

In the second example, the ensEMBL transcript ENST00000575124 has a protein coding region that extends into the intronic regions of other transcripts of this gene, and so explains the frameshift. You can see the extended protein coding region of transcript ENST00000575124 in the UCSC genome browser .

Not sure about the first example. A structural_interaction_variant suggests this amino acid is important for the protein structure, or for an interaction with another protein and so is flagged by SNPeff as an important amino acid site. However, in your case there is a synonymous change at this amino acid and so I would think your variant probably has no impact.

ADD COMMENTlink written 15 months ago by rbagnall1.3k

Awesome that you've found a transcript that codes for TNFRSF12A!

As for the first example, does it actually mean the "interaction" is highly important, but whether the mutation makes any difference to the "interaction" is another matter?

ADD REPLYlink written 15 months ago by moushengxu350

Yeah, possibly. It may be a default annotation of an important amino acid residue.

ADD REPLYlink written 15 months ago by rbagnall1.3k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 937 users visited in the last hour