Question: more than one annotation at a locus in SnpEff output?
sun.nation (United States) wrote 2.8 years ago:

I built the genome database myself and ran:

$JAVA -jar /home/sshrest1/bin//snpEff/snpEff.jar Corynespora c1VSc2.vcf > c1VSc2_annotated.vcf

I am getting multiple annotations at a locus. I was not able to figure out the issue. Any thoughts?

scaffold_1      3820284 .       G       A       5081.87 PASS    AC=1;AF=0.333;AN=3;BaseQRankSum=6.426;ClippingRankSum=0.000;DP=215;FS=0.531;MLEAC=1;MLEAF=0.333;MQ=60.00;MQRankSum=0.000;QD=29.82;ReadPosRankSum=-1.702;SOR=0.756;ANN=A|missense_variant|MODERATE|"estExt_Genemark1.C_1_t30310"|GENE_"estExt_Genemark1.C_1_t30310"|transcript|TRANSCRIPT_"estExt_Genemark1.C_1_t30310"|protein_coding|4/5|c.1834C>T|p.Pro612Ser|1834/2304|1834/2304|612/767||,A|missense_variant|MODERATE|"gm1.1310_g"|GENE_"gm1.1310_g"|transcript|TRANSCRIPT_"gm1.1310_g"|protein_coding|4/5|c.1834C>T|p.Pro612Ser|1834/2304|1834/2304|612/767||,A|missense_variant|MODERATE|"fgenesh1_kg.1_#_1261_#_Locus3512v2rpkm41.18"|GENE_"fgenesh1_kg.1_#_1261_#_Locus3512v2rpkm41.18"|transcript|TRANSCRIPT_"fgenesh1_kg.1_#_1261_#_Locus3512v2rpkm41.18"|protein_coding|3/3|c.1705C>T|p.Pro569Ser|1705/1953|1705/1953|569/650||,A|missense_variant|MODERATE|"e_gw1.1.1471.1"|GENE_"e_gw1.1.1471.1"|transcript|TRANSCRIPT_"e_gw1.1.1471.1"|protein_coding|4/4|c.1801C>T|p.Pro601Ser|1801/2049|1801/2049|601/682||,A|missense_variant|MODERATE|"e_gw1.1.1890.1"|GENE_"e_gw1.1.1890.1"|transcript|TRANSCRIPT_"e_gw1.1.1890.1"|protein_coding|5/5|c.1663C>T|p.Pro555Ser|1663/1911|1663/1911|555/636||,A|missense_variant|MODERATE|"e_gw1.1.2199.1"|GENE_"e_gw1.1.2199.1"|transcript|TRANSCRIPT_"e_gw1.1.2199.1"|protein_coding|4/4|c.1834C>T|p.Pro612Ser|1834/2082|1834/2082|612/693||,A|missense_variant|MODERATE|"estExt_Genewise1.C_1_t60389"|GENE_"estExt_Genewise1.C_1_t60389"|transcript|TRANSCRIPT_"estExt_Genewise1.C_1_t60389"|protein_coding|4/4|c.1834C>T|p.Pro612Ser|1834/2082|1834/2082|612/693||,A|missense_variant|MODERATE|"estExt_Genewise1.C_1_t60390"|GENE_"estExt_Genewise1.C_1_t60390"|transcript|TRANSCRIPT_"estExt_Genewise1.C_1_t60390"|protein_coding|4/4|c.1834C>T|p.Pro612Ser|1834/2082|1834/2082|612/693||,A|missense_variant|MODERATE|"estExt_Genewise1.C_1_t60391"|GENE_"estExt_Genewise1.C_1_t60391"|transcript|TRANSCRIPT_"estExt_Genewise1.C_1_t60391"|protein_coding|4/4|c.1834C>T|p.Pro612Ser|1834/2082|1834/2082|612/693||,A|missense_variant|MODERATE|"estExt_Genewise1Plus.C_1_t60371"|GENE_"estExt_Genewise1Plus.C_1_t60371"|transcript|TRANSCRIPT_"estExt_Genewise1Plus.C_1_t60371"|protein_coding|4/4|c.1834C>T|p.Pro612Ser|1834/2082|1834/2082|612/693||,A|missense_variant|MODERATE|"estExt_Genewise1Plus.C_1_t60372"|GENE_"estExt_Genewise1Plus.C_1_t60372"|transcript|TRANSCRIPT_"estExt_Genewise1Plus.C_1_t60372"|protein_coding|4/4|c.1834C>T|p.Pro612Ser|1834/2082|1834/2082|612/693||,A|missense_variant|MODERATE|"estExt_Genewise1Plus.C_1_t60373"|GENE_"estExt_Genewise1Plus.C_1_t60373"|transcript|TRANSCRIPT_"estExt_Genewise1Plus.C_1_t60373"|protein_coding|4/4|c.1834C>T|p.Pro612Ser|1834/2082|1834/2082|612/693||,A|missense_variant|MODERATE|"estExt_fgenesh1_pg.C_1_t30330"|GENE_"estExt_fgenesh1_pg.C_1_t30330"|  and so on

Each annotation seems to be for a different transcript. Either you have many transcripts overlapping that locus or you have a problem with the original gene annotation file.
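
One way to check this is to put each ANN entry on its own line and look at the feature ID. The snippet below is only a rough sketch: it uses two entries copied from the record in the question, the helper name split_ann is made up for illustration, and it assumes that no ANN subfield contains a comma or a pipe of its own.

```shell
#!/bin/sh
# Two ANN entries copied from the record in the question; the others are left out for brevity.
ann='A|missense_variant|MODERATE|"estExt_Genemark1.C_1_t30310"|GENE_"estExt_Genemark1.C_1_t30310"|transcript|TRANSCRIPT_"estExt_Genemark1.C_1_t30310"|protein_coding|4/5|c.1834C>T|p.Pro612Ser|1834/2304|1834/2304|612/767||,A|missense_variant|MODERATE|"e_gw1.1.1471.1"|GENE_"e_gw1.1.1471.1"|transcript|TRANSCRIPT_"e_gw1.1.1471.1"|protein_coding|4/4|c.1801C>T|p.Pro601Ser|1801/2049|1801/2049|601/682||'

# split_ann (hypothetical helper): print effect, feature ID and protein change,
# tab-separated, one line per ANN entry.
# Assumes entries are separated by ',' and subfields by '|'.
split_ann() {
    printf '%s\n' "$1" | tr ',' '\n' | awk -F'|' '{ print $2 "\t" $7 "\t" $11 }'
}

split_ann "$ann"
```

If every line shows a different feature ID, each annotation belongs to a different transcript model, which is what the record in the question looks like.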

written 2.8 years ago by abascalfederico

I am getting multiple annotations at a locus. I was not able to figure out the issue

LIFE

written 15 months ago by Pierre Lindenbaum
jilguero888 (United States) wrote 15 months ago:

From the SnpEff manual:

"Usually there is more than one effect reported in each EFF field. There are several reasons for this:
- A variant can affect multiple genes. E.g. a variant can be DOWNSTREAM from one gene and UPSTREAM from another gene.
- In complex organisms, genes usually have multiple transcripts. So SnpEff reports the effect of a variant on each transcript.
- A VCF line can have more than one variant. E.g. If reference genome is 'G', but the sample has either 'A' or 'T' (non-biallelic variant), then this will be reported as one VCF line, having multiple alternative variants (notice that there are two ALTs)"

The first reason is probably the most problematic one. If it applies to your case, you can change the size of the surrounding region with the -ud size_in_bases option. From the SnpEff manual:

"You can change the default upstream and downstream interval size (default is 5K) using the -ud size_in_bases option. This also allows to eliminate any upstream and downstream effect by using "-ud 0"."

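Applied to the command from the question, the -ud option might be used roughly as below. This is a sketch, not a tested recipe: the jar path, genome name and input file are taken from the question, the output file name is made up, and java is assumed to be on the PATH. Note that -ud 0 only removes upstream and downstream effects; annotations for transcripts that overlap the variant itself, like the missense entries in the question, would still be reported.

```shell
#!/bin/sh
# Paths and names come from the command in the question; adjust them to your setup.
SNPEFF_JAR=/home/sshrest1/bin/snpEff/snpEff.jar
GENOME=Corynespora
IN_VCF=c1VSc2.vcf
OUT_VCF=c1VSc2_annotated_ud0.vcf   # hypothetical output name

# -ud 0 sets the upstream/downstream interval to 0 bases (see the manual quote above).
# Options go before the genome name and the input VCF.
if [ -f "$SNPEFF_JAR" ] && [ -f "$IN_VCF" ]; then
    java -jar "$SNPEFF_JAR" -ud 0 "$GENOME" "$IN_VCF" > "$OUT_VCF"
else
    echo "snpEff.jar or input VCF not found; edit the paths above" >&2
fi
```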

So how should one choose the correct annotation? I am having the same issue: there are multiple annotations, and I am not sure which one is correct.

written 15 months ago by vinaysbharadhwaj