snpEff database build error
7 weeks ago

Hi everyone, I am trying to build a SnpEff database for Human herpesvirus 5 strain Merlin (https://www.ncbi.nlm.nih.gov/nuccore/AY446894.2) using the script provided with SnpEff (buildDbNcbi.sh), and I get the error shown under "Output / Error message" below. I suspect the GenBank file itself is the cause, perhaps a formatting problem in the .gbk file. Has anyone run into a similar problem? How did you get around it, and what would you suggest?

Note: I later tried building the database manually and got the same error. Updating SnpEff to version 5.1 and trying again did not help either.

To Reproduce

SnpEff version: 5.0

Genome version: AY446894.2

SnpEff full command line: bash ~/path-to-script/buildDbNcbi.sh AY446894.2

Output / Error message:

java.lang.RuntimeException: Error reading file '/path-to-data/data/AY446894.2/genes.gbk'
java.lang.RuntimeException: Transcript 'HHV5wtgr002' is already in Gene 'HHV5wtgr002'

Expected behavior: Building database


It seems the annotation contains two genes (probably identical?) at different positions (6759..8458 and 8250..8393) but with the same name (RL9A) and the same locus_tag (HHV5wtgr002). My guess is that snpEff requires unique names for genes and transcripts.
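To check whether other locus_tags in the file are repeated as well, something like the following might help. It is only a minimal sketch using the Python standard library: it assumes the standard GenBank flat-file layout (feature keys indented by 5 spaces, qualifiers further indented), looks only at `gene` features, and records just the first line of each location. The function name and the `SAMPLE` text are made up for illustration; the third gene in the sample is a placeholder, not taken from the real record.

```python
import re
from collections import defaultdict

# Feature line: exactly 5 leading spaces, the feature key, then the location.
FEATURE_RE = re.compile(r"^ {5}(\S+)\s+(\S+)")
# Qualifier line carrying a locus_tag.
LOCUS_TAG_RE = re.compile(r'^\s+/locus_tag="([^"]+)"')


def find_duplicate_gene_tags(lines):
    """Return {locus_tag: [locations]} for tags used by more than one gene feature."""
    locations = defaultdict(list)
    in_features = False
    current = None  # (feature key, location) of the feature being read
    for line in lines:
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if line.startswith("ORIGIN") or line.startswith("//"):
            in_features = False
            continue
        if not in_features:
            continue
        feature = FEATURE_RE.match(line)
        if feature:
            current = feature.groups()
            continue
        tag = LOCUS_TAG_RE.match(line)
        if tag and current is not None and current[0] == "gene":
            locations[tag.group(1)].append(current[1])
    return {t: locs for t, locs in locations.items() if len(locs) > 1}


# Illustrative excerpt only (hypothetical, modelled on the positions quoted above).
SAMPLE = """\
FEATURES             Location/Qualifiers
     gene            6759..8458
                     /gene="RL9A"
                     /locus_tag="HHV5wtgr002"
     gene            8250..8393
                     /gene="RL9A"
                     /locus_tag="HHV5wtgr002"
     gene            9000..9500
                     /gene="PLACEHOLDER"
                     /locus_tag="PLACEHOLDER_TAG"
ORIGIN
"""

for tag, locs in find_duplicate_gene_tags(SAMPLE.splitlines()).items():
    print(tag, "is used by gene features at", ", ".join(locs))
```

For a real file you would pass the lines of genes.gbk instead of `SAMPLE`, e.g. `find_duplicate_gene_tags(open("genes.gbk"))`.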


Thank you for your input. I believe you guessed correctly. I deleted the redundant entries from the GenBank file. I am not sure that was the right approach, but it worked, and I was not interested in those regions anyway.
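For anyone who wants to script this kind of clean-up rather than edit the file by hand, here is a rough sketch of one way it could be done. It is not necessarily what I did by hand, and it rests on assumptions: a single-record GenBank flat file in the standard layout, and the rule that a feature is "redundant" when an earlier feature has the same key (gene, CDS, ...) and the same locus_tag. That rule throws away annotation and would be wrong for genomes where several features legitimately share a locus_tag, so check the output before building. The function name and sample text are hypothetical.

```python
import re

# Feature line: exactly 5 leading spaces, the feature key, then the location.
FEATURE_RE = re.compile(r"^ {5}(\S+)\s+\S+")
LOCUS_TAG_RE = re.compile(r'^\s+/locus_tag="([^"]+)"')
QUALIFIER_INDENT = " " * 21  # qualifier and continuation lines start at column 22


def drop_repeated_features(lines):
    """Keep the first feature per (feature key, locus_tag) pair; drop later ones."""
    out, block, seen = [], [], set()
    in_features = False

    def flush():
        # Decide whether the buffered feature block is kept or dropped.
        if not block:
            return
        key = FEATURE_RE.match(block[0]).group(1)
        tags = [m.group(1) for m in map(LOCUS_TAG_RE.match, block) if m]
        ident = (key, tags[0]) if tags else None
        if ident is None or ident not in seen:
            out.extend(block)  # features without a locus_tag are always kept
        if ident is not None:
            seen.add(ident)
        block.clear()

    for line in lines:
        if line.startswith("FEATURES"):
            in_features = True
            out.append(line)
        elif in_features and FEATURE_RE.match(line):
            flush()
            block.append(line)
        elif in_features and block and line.startswith(QUALIFIER_INDENT):
            block.append(line)
        else:
            flush()
            if in_features and not line.startswith(" "):
                in_features = False  # e.g. ORIGIN ends the feature table
            out.append(line)
    flush()
    return out


# Illustrative excerpt only (hypothetical, not copied from the real record).
SAMPLE = """\
FEATURES             Location/Qualifiers
     gene            6759..8458
                     /locus_tag="HHV5wtgr002"
     gene            8250..8393
                     /locus_tag="HHV5wtgr002"
ORIGIN
"""

cleaned = drop_repeated_features(SAMPLE.splitlines())
print("\n".join(cleaned))
```

Writing `cleaned` to a new file and keeping the original genes.gbk untouched makes it easy to diff the two and see exactly which entries were removed.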