Question: Warning errors in snpEff annotation results
I annotated variants in a bacterial genome using snpEff, but when I looked at snpEff_summary.html, there were 944 warnings highlighted in yellow. Variants were called using GATK HaplotypeCaller.

Almost all of the warnings were WARNING_TRANSCRIPT_NO_START_CODON. However, when I checked the reference genome and the CDS, both have start codons.

I have no idea why they were tagged as warnings.

If anyone has any idea, that would help me a lot.

Thank you.

I annotated using this command:

 java -jar snpEff.jar -c snpEff.config -i vcf -o vcf bacteria1 SNPs_counted_using_HaplotypeCaller.vcf 1> res.vcf

One of the results I got:

bacteria1   99501   .   C   A   697.6   PASS    AC=1;AF=0.500;AN=2;BaseQRankSum=2.555;DP=187;ExcessHet=3.0103;FS=0.784;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.000;QD=3.73;ReadPosRankSum=0.989;SOR=0.762;ANN=A|upstream_gene_variant|MODIFIER|D9_0073|GENE_D9_0073|transcript|TRANSCRIPT_D9_0073|protein_coding||c.-4774G>T|||||4774|,A|upstream_gene_variant|MODIFIER|D9_0074|GENE_D9_0074|transcript|TRANSCRIPT_D9_0074|protein_coding||c.-4536G>T|||||4536|,A|upstream_gene_variant|MODIFIER|D9_0078|GENE_D9_0078|transcript|TRANSCRIPT_D9_0078|protein_coding||c.-768G>T|||||768|,A|upstream_gene_variant|MODIFIER|D9_0079|null|transcript|D9_0079|protein_coding||c.-698C>A|||||609|WARNING_TRANSCRIPT_NO_START_CODON,A|upstream_gene_variant|MODIFIER|D9_0080|GENE_D9_0080|transcript|TRANSCRIPT_D9_0080|protein_coding||c.-2444C>A|||||2444|,A|upstream_gene_variant|MODIFIER|D9_0081|GENE_D9_0081|transcript|TRANSCRIPT_D9_0081|protein_coding||c.-3194C>A|||||3194|,A|upstream_gene_variant|MODIFIER|D9_0082|GENE_D9_0082|transcript|TRANSCRIPT_D9_0082|protein_coding||c.-4759C>A|||||4759|,A|downstream_gene_variant|MODIFIER|D9_0075|GENE_D9_0075|transcript|TRANSCRIPT_D9_0075|protein_coding||c.*3987C>A|||||3987|,A|downstream_gene_variant|MODIFIER|D9_0076|GENE_D9_0076|transcript|TRANSCRIPT_D9_0076|protein_coding||c.*3083C>A|||||3083|,A|downstream_gene_variant|MODIFIER|D9_0077|GENE_D9_0077|transcript|TRANSCRIPT_D9_0077|protein_coding||c.*2167C>A|||||2167|,A|intergenic_region|MODIFIER|D9_0078-D9_0079|GENE_D9_0078-null|intergenic_region|GENE_D9_0078-null|||n.99501C>A||||||  GT:AD:DP:GQ:PL  0/1:157,30:187:99:705,0,5786
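To see which features carry the warning, the ANN field can be tallied directly from the annotated VCF. The following is a minimal sketch, not part of snpEff: the function names `ann_warnings` and `count_warnings` are made up for illustration. It assumes a tab-delimited VCF and the standard 16-field ANN layout, where field 7 is the feature ID and field 16 holds the error/warning codes.

```python
from collections import Counter


def ann_warnings(vcf_line):
    """Yield (feature_id, code) pairs from the ANN field of one VCF data line."""
    info = vcf_line.rstrip("\n").split("\t")[7]  # INFO is the 8th VCF column
    for entry in info.split(";"):
        if not entry.startswith("ANN="):
            continue
        for ann in entry[4:].split(","):  # one annotation per comma
            fields = ann.split("|")
            # Field 7 (index 6) is the feature ID; field 16 (index 15) holds
            # error/warning/info codes, joined by '&' when there are several.
            if len(fields) < 16 or not fields[15]:
                continue
            for code in fields[15].split("&"):
                yield fields[6], code


def count_warnings(path):
    """Count each warning code over all data lines of an annotated VCF."""
    counts = Counter()
    with open(path) as handle:
        for line in handle:
            if line.startswith("#"):  # skip header lines
                continue
            for _feature, code in ann_warnings(line):
                counts[code] += 1
    return counts
```

Calling `count_warnings("res.vcf")` on the output of the command above would give a per-code tally. In the record shown, the only flagged entry is D9_0079, which is also the only one whose gene ID is `null` and whose transcript ID lacks the `TRANSCRIPT_` prefix.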

I created my GTF file like this:

seqname     source   feature start end   score strand frame attribute
bacteria1   bacteria1   CDS 101     1507    .   +   0   gene id "D9_0001";
bacteria1   bacteria1   CDS 1569    2666    .   +   0   gene id "D9_0002";
bacteria1   bacteria1   CDS 2663    4378    .   +   0   gene id "D9_0003";
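One way to double-check the start codons against these coordinates is to pull the first three bases of each CDS from the reference sequence. The sketch below assumes 1-based inclusive GTF coordinates and takes the start codon set from NCBI translation table 11; whether snpEff's `Bacterial_and_Plant_Plastid` table accepts exactly the same set is an assumption here. The helper names `first_codon` and `has_start_codon` are hypothetical.

```python
# Start codons listed in NCBI translation table 11 (assumed to match the
# codon table configured in snpEff.config).
START_CODONS = {"ATG", "GTG", "TTG", "CTG", "ATT", "ATC", "ATA"}

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def first_codon(genome, start, end, strand):
    """Return the first codon of a CDS given 1-based inclusive coordinates."""
    if strand == "+":
        codon = genome[start - 1:start + 2]
    else:
        # On the minus strand the CDS begins at `end`; take the last three
        # bases and reverse-complement them.
        codon = genome[end - 3:end].translate(_COMPLEMENT)[::-1]
    return codon.upper()


def has_start_codon(genome, start, end, strand):
    """True if the CDS begins with one of the table 11 start codons."""
    return first_codon(genome, start, end, strand) in START_CODONS
```

For the first row above, `has_start_codon(genome, 101, 1507, "+")` would test bases 101 to 103 of the reference, with `genome` being the chromosome sequence loaded as a plain string.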

I created my own database and added this to snpEff.config:

bacteria1.genome :bacteria1
bacteria1.chromosomes : bacteria1
bacteria1.bacteria1.codonTable : Bacterial_and_Plant_Plastid
modified 12 months ago • written 12 months ago by maricom0

Perhaps also contact the author, Pablo Cingolani, and cross-reference this thread. Be sure to read "Asking for help" first so that you provide the necessary information.

written 12 months ago by SMK1.9k

Hi SMK, Thank you for your advice! I've sent the question to him, too.

written 12 months ago by maricom0

I'm encountering the same issue. Has there been any resolution to this, or response from the author? Any help is appreciated.

written 11 weeks ago by nadietz0
