Hi all,
I am trying to build a snpEff database but I'm running into the following error message:
java.lang.RuntimeException: Transcript 'hypothetical_protein' already exists
at org.snpeff.snpEffect.factory.SnpEffPredictorFactory.add(SnpEffPredictorFactory.java:135)
at org.snpeff.snpEffect.factory.SnpEffPredictorFactoryFeatures.addMrna(SnpEffPredictorFactoryFeatures.java:183)
at org.snpeff.snpEffect.factory.SnpEffPredictorFactoryFeatures.addFeatures(SnpEffPredictorFactoryFeatures.java:134)
at org.snpeff.snpEffect.factory.SnpEffPredictorFactoryFeatures.create(SnpEffPredictorFactoryFeatures.java:330)
at org.snpeff.snpEffect.commandLine.SnpEffCmdBuild.run(SnpEffCmdBuild.java:369)
at org.snpeff.SnpEff.run(SnpEff.java:1183)
at org.snpeff.SnpEff.main(SnpEff.java:162)
java.lang.RuntimeException: Error reading file '/home/lina/snpEff/./data/my_organism/genes.gbk'
java.lang.RuntimeException: Transcript 'hypothetical_protein' already exists
at org.snpeff.snpEffect.factory.SnpEffPredictorFactoryFeatures.create(SnpEffPredictorFactoryFeatures.java:344)
at org.snpeff.snpEffect.commandLine.SnpEffCmdBuild.run(SnpEffCmdBuild.java:369)
at org.snpeff.SnpEff.run(SnpEff.java:1183)
at org.snpeff.SnpEff.main(SnpEff.java:162)
The file I am using for the database is a Genbank file that I downloaded from NCBI. It contains 12760 genes and 2764 of them are annotated with product="hypothetical protein"
Based on the error message I assume having more than one gene labeled hypothetical protein
is a problem for snpEff. However, I assume there must be many organisms where that is the case.
Does anyone have any insight into this?
Thanks!
~Lina
what the acn of this genbank file ?
it's GCA_001007165.2