Question: snpEff building new database ERROR
Anastasia A. (USA) wrote 9 months ago:

I'm currently working on building a new database in snpEff, because the available one is not an up-to-date version. However, I keep receiving the error below:

java -jar snpEff.jar build -gff3 -v Zea_Mays_B73
00:00:00    SnpEff version SnpEff 4.3t (build 2017-11-24 10:18), by Pablo Cingolani
00:00:00    Command: 'build'
00:00:00    Building database for 'Zea_Mays_B73'
00:00:00    Reading configuration file 'snpEff.config'. Genome: 'Zea_Mays_B73'
00:00:00    Reading config file: /opt/home/ngsclass/Amoiroglou/pt1RNAlibs/VcfFiles/snpEff/snpEff.config
java.lang.RuntimeException: Property: 'Zea_Mays_B73.genome' not found
    at org.snpeff.interval.Genome.<init>(Genome.java:106)
    at org.snpeff.snpEffect.Config.readGenomeConfig(Config.java:681)
    at org.snpeff.snpEffect.Config.readConfig(Config.java:649)
    at org.snpeff.snpEffect.Config.init(Config.java:480)
    at org.snpeff.snpEffect.Config.<init>(Config.java:117)
    at org.snpeff.SnpEff.loadConfig(SnpEff.java:451)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdBuild.run(SnpEffCmdBuild.java:364)
    at org.snpeff.SnpEff.run(SnpEff.java:1183)
    at org.snpeff.SnpEff.main(SnpEff.java:162)
00:00:00    Logging
00:00:01    Done.

I have followed the documentation on the snpEff manual page (http://snpeff.sourceforge.net/SnpEff_manual.html#databases) with no luck.

Any help is very much appreciated!

written 9 months ago by Anastasia A. (modified 7 weeks ago by arif.ashraf.opu)

Anastasia A. : Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. SUBMIT ANSWER is reserved for submitting new answers for the original question in the thread.

written 9 months ago by genomax

This looks similar to the problem in this thread:

error in building annotation database by SnpEff

Have you added your genome entry to the snpEff config file?

written 9 months ago by Vitis

Thank you for your reply, and yes I did add the genome to the snpEff config file and still received the same error.

Zea Mays B73 genome,Version 4
Zea_Mays_B73v4.genome: Zea_Mays_B73

I also made a directory called Zea_Mays_B73v4 and downloaded both the genomic FASTA and the GFF3 annotation file (and renamed them according to the instructions in the snpEff manual).

written 9 months ago by Anastasia A. (modified 9 months ago by finswimmer)

Is this the command you used?

java -jar snpEff.jar build -gtf22 -v Zea_Mays_B73

Make sure the genome name on the command line is Zea_Mays_B73v4, not Zea_Mays_B73. It has to match the key of the ".genome" entry in snpEff.config.
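
A quick way to catch this kind of mismatch before running build is to check that the config file has a key of the form NAME.genome for the exact name given on the command line, since the error above complains about the missing property 'Zea_Mays_B73.genome'. The sketch below is only an illustration: has_genome_entry is a made-up helper name, not part of snpEff, and it assumes the key is followed by ":" or "=" as in a Java properties file.

```shell
# Hypothetical helper (not part of snpEff): succeed if the config file
# contains a "<GENOME>.genome" key at the start of a non-comment line.
has_genome_entry() {
    # $1 = path to snpEff.config, $2 = genome name as passed to "build"
    awk -v key="$2.genome" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)          # ignore leading whitespace
        }
        index(line, key) == 1 {               # line starts with the key
            rest = substr(line, length(key) + 1)
            if (rest ~ /^[ \t]*[:=]/) found = 1   # key is followed by ":" or "="
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Example usage (paths and names taken from this thread):
#   has_genome_entry snpEff.config Zea_Mays_B73v4 && echo "entry found"
#   has_genome_entry snpEff.config Zea_Mays_B73   || echo "no such entry"
```

With the entry "Zea_Mays_B73v4.genome: Zea_Mays_B73" in the config, the first call succeeds and the second fails, which mirrors the build error.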

written 9 months ago by Vitis

Yes, you were right, that worked!

Thank you!! (hate when it is just a typo)

written 9 months ago by Anastasia A.

Now that I was able to get the database to build, I'm receiving the following error when trying to download it.

java -jar snpEff.jar download -v Zea_Mays_B73v4
00:00:00    SnpEff version SnpEff 4.3t (build 2017-11-24 10:18), by Pablo Cingolani
00:00:00    Command: 'download'
00:00:00    Reading configuration file 'snpEff.config'. Genome: 'Zea_Mays_B73v4'
00:00:00    Reading config file: /opt/home/ngsclass/Amoiroglou/pt1RNAlibs/VcfFiles/snpEff/snpEff.config
00:00:00    done
00:00:00    Downloading database for 'Zea_Mays_B73v4'
00:00:01    Connecting to http://downloads.sourceforge.net/project/snpeff/databases/v4_3/snpEff_v4_3_Zea_Mays_B73v4.zip
00:00:01    ERROR while connecting to http://downloads.sourceforge.net/project/snpeff/databases/v4_3/snpEff_v4_3_Zea_Mays_B73v4.zip
java.lang.RuntimeException: java.lang.RuntimeException: File not found on the server. Make sure the database name is correct.
    at org.snpeff.util.Download.download(Download.java:178)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdDownload.downloadAndInstall(SnpEffCmdDownload.java:32)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdDownload.runDownloadGenome(SnpEffCmdDownload.java:86)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdDownload.run(SnpEffCmdDownload.java:72)
    at org.snpeff.SnpEff.run(SnpEff.java:1183)
    at org.snpeff.SnpEff.main(SnpEff.java:162)
Caused by: java.lang.RuntimeException: File not found on the server. Make sure the database name is correct.
    at org.snpeff.util.Download.download(Download.java:127)
    ... 5 more
00:00:01    Logging
00:00:02    Done.

Any thoughts on why it is still unable to find my database?

written 9 months ago by Anastasia A. (modified 9 months ago by finswimmer)

Hello and welcome to Biostars, Anastasia A.,

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

Thank you!

written 9 months ago by finswimmer

Thank you! I did that and it seemed like the database was built. However, when I run my samples using my database as the reference, the program is unable to find it locally and looks for it on the snpEff server instead. I get the error message below:

java -Xmx4g -jar snpEff.jar -v -stats 2mutantNewDB.html Zea_Mays_B73v4 2mutant.vcf > 2mutantNewDB.ann.vcf &

SnpEff version SnpEff 4.3t (build 2017-11-24 10:18), by Pablo Cingolani

00:00:00    Command: 'ann'
00:00:00    Reading configuration file 'snpEff.config'. Genome: 'Zea_Mays_B73v4'
00:00:00    Reading config file: /opt/home/ngsclass/Amoiroglou/pt1RNAlibs/VcfFiles/snpEff.config
00:00:00    done
00:00:00    Reading database for genome version 'Zea_Mays_B73v4' from file '/opt/home/ngsclass/Amoiroglou/pt1RNAlibs/VcfFiles/./data/Zea_Mays_B73v4/snpEffectPredictor.bin' (this might take a while)
00:00:00    Database not installed
    Attempting to download and install database 'Zea_Mays_B73v4'
00:00:00    Reading configuration file 'snpEff.config'. Genome: 'Zea_Mays_B73v4'
00:00:00    Reading config file: /opt/home/ngsclass/Amoiroglou/pt1RNAlibs/VcfFiles/snpEff.config
00:00:01    done
00:00:01    Downloading database for 'Zea_Mays_B73v4'
00:00:01    Connecting to http://downloads.sourceforge.net/project/snpeff/databases/v4_3/snpEff_v4_3_Zea_Mays_B73v4.zip
00:00:01    ERROR while connecting to http://downloads.sourceforge.net/project/snpeff/databases/v4_3/snpEff_v4_3_Zea_Mays_B73v4.zip
java.lang.RuntimeException: java.lang.RuntimeException: File not found on the server. Make sure the database name is correct.
    at org.snpeff.util.Download.download(Download.java:178)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdDownload.downloadAndInstall(SnpEffCmdDownload.java:32)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdDownload.runDownloadGenome(SnpEffCmdDownload.java:86)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdDownload.run(SnpEffCmdDownload.java:72)
    at org.snpeff.SnpEff.run(SnpEff.java:1221)
    at org.snpeff.SnpEff.loadDb(SnpEff.java:515)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdEff.run(SnpEffCmdEff.java:1001)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdEff.run(SnpEffCmdEff.java:984)
    at org.snpeff.SnpEff.run(SnpEff.java:1183)
    at org.snpeff.SnpEff.main(SnpEff.java:162)
Caused by: java.lang.RuntimeException: File not found on the server. Make sure the database name is correct.
    at org.snpeff.util.Download.download(Download.java:127)
    ... 9 more
java.lang.RuntimeException: Genome download failed!
written 9 months ago by Anastasia A. (modified 9 months ago)

It looks like snpEff still tries to download the database instead of looking for a local one. Can you confirm you added the entry to the config file like this:

# Zea Mays B73 genome, Version 4
Zea_Mays_B73v4.genome: Zea_Mays_B73v4

Make sure the first line is a comment (it starts with "#"). Then try this:

java -Xmx4g -jar /path/to/snpEff/snpEff.jar -c /path/to/snpEff/snpEff.config -v Zea_Mays_B73v4 input.vcf > output.ann.vcf
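
It may also be worth comparing the paths in the two logs above: the build run read VcfFiles/snpEff/snpEff.config, while the annotation run read VcfFiles/snpEff.config and looked for the database under VcfFiles/./data/Zea_Mays_B73v4/. Assuming data.dir is left at ./data/ relative to the config file's directory (which is what the logged path suggests), a rough check like the one below shows whether the built database is where a given config will look for it. Both helper names are hypothetical.

```shell
# Hypothetical helper: print where a locally built database is expected.
# Assumption: data.dir is "./data/" relative to the directory that holds
# the config file, as the path in the annotation log suggests.
expected_db_path() {
    # $1 = path to snpEff.config, $2 = genome name
    printf '%s/data/%s/snpEffectPredictor.bin\n' "$(dirname "$1")" "$2"
}

# Hypothetical helper: succeed only if that database file exists.
db_is_built() {
    [ -f "$(expected_db_path "$1" "$2")" ]
}

# Example usage:
#   db_is_built /path/to/snpEff/snpEff.config Zea_Mays_B73v4 \
#       || echo "no database at $(expected_db_path /path/to/snpEff/snpEff.config Zea_Mays_B73v4)"
```

If the check fails for the config that the annotation run actually reads, passing the other config with -c, as in the command above, is the thing to try.
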
written 9 months ago by Vitis

I edited snpEff.config as you described and it still didn't work. I also added the data.dir path as it is specified in the config file:

# Zea_Mays_B73v4 genome, Version 4
Zea_Mays_B73v4.genome : Zea_Mays_B73v4
data.dir = ~/ngsclass/Amoiroglou/pt1RNAlibs/VcfFiles/snpEff/data/Zea_Mays_B73v4/

That still didn't stop snpEff from looking for the database on the server instead of in the local directory. I tried both with and without data.dir and had no luck.

written 9 months ago by Anastasia A.

I'll download the maize data to do a real test and report back later.

written 9 months ago by Vitis

Vitis (New York) wrote 9 months ago:

I think you're supposed to build the database from a local fasta reference and gff3 annotation file, instead of downloading, if the version of annotation is not available through snpEff.

cd /path/to/snpEff/data/
mkdir Zea_Mays_B73v4
cd Zea_Mays_B73v4

Move the reference sequences and annotation to the data directory. Make sure they're named "sequences" and "genes".

mv somewhere/sequences.fa.gz ./
mv somewhere/genes.gtf.gz ./

Then run the building step.

cd /path/to/snpEff
java -jar snpEff.jar build -gtf22 -v Zea_Mays_B73v4

The -gtf22 option specifies the format you're using for the annotation file.
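
The flag has to match the annotation file in the genome directory: genes.gtf goes with -gtf22, as in the command above, and a GFF3 file named genes.gff goes with -gff3. As a small sketch, the choice could be made from the file name like this. build_format_flag is a hypothetical helper, and it only looks at the file name, not at the file contents.

```shell
# Hypothetical helper: print the snpEff build flag that matches the
# annotation file found in a genome directory (decided by file name only).
build_format_flag() {
    # $1 = genome directory, e.g. /path/to/snpEff/data/Zea_Mays_B73v4
    if [ -f "$1/genes.gff" ] || [ -f "$1/genes.gff.gz" ]; then
        printf '%s\n' "-gff3"
    elif [ -f "$1/genes.gtf" ] || [ -f "$1/genes.gtf.gz" ]; then
        printf '%s\n' "-gtf22"
    else
        echo "no genes.gff or genes.gtf found in $1" >&2
        return 1
    fi
}

# Example usage, run from the snpEff directory:
#   flag=$(build_format_flag data/Zea_Mays_B73v4) &&
#       java -jar snpEff.jar build "$flag" -v Zea_Mays_B73v4
```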

written 9 months ago by Vitis (modified 9 months ago)

Vitis (New York) wrote 9 months ago:

OK. I ran through the entire process and it seems to work fine for me. Here are the steps:

Download the reference genome and annotation from the Ensembl FTP site. I downloaded the annotation file in GFF3 format.

https://plants.ensembl.org/Zea_mays/Info/Index

I renamed the reference FASTA file and compressed it as "sequences.fa.gz", and renamed and compressed the GFF3 annotation as "genes.gff.gz". It seems you have to use "gff" in the name instead of "gff3" for snpEff to recognize it.

gzip sequences.fa
gzip genes.gff

Create a directory for your database under /path/to/snpEff/data/

cd /path/to/snpEff/data/
mkdir Zea_Mays_B73v4
cd Zea_Mays_B73v4

Move the reference and annotation here:

mv somewhere/sequences.fa.gz ./
mv somewhere/genes.gff.gz ./

Edit the snpEff.config file and add the following lines for your database (I added them under the Ensembl release 86 section):

# Zea Mays B73 genome, Version 4
Zea_Mays_B73v4.genome: Zea_Mays_B73v4

Then run the database building step. Make sure you use the "-gff3" option to match your "genes.gff.gz" file.

java -jar snpEff.jar build -gff3 -v Zea_Mays_B73v4

You should see some warnings about UTRs but there should not be any "ERROR" reported.

Then you can run the effect prediction.

java -Xmx4g -jar /path/to/snpEff/snpEff.jar -c /path/to/snpEff/snpEff.config -v Zea_Mays_B73v4 input.vcf > output.ann.vcf

This should pick up your custom database Zea_Mays_B73v4 correctly.
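
The file staging steps above can be sketched as one shell function. This is only an illustration under a few assumptions: stage_genome is a hypothetical helper, the input FASTA and GFF3 files are uncompressed, files are copied rather than moved, and the config entry is appended at the end of snpEff.config instead of under a particular section.

```shell
# Hypothetical helper that stages the files described in the steps above.
# It does not run snpEff itself.
stage_genome() {
    # $1 = snpEff install dir, $2 = genome name,
    # $3 = uncompressed FASTA file, $4 = uncompressed GFF3 file
    _dest="$1/data/$2"
    mkdir -p "$_dest" || return 1
    # snpEff expects the names "sequences" and "genes"; note ".gff", not ".gff3".
    gzip -c "$3" > "$_dest/sequences.fa.gz" || return 1
    gzip -c "$4" > "$_dest/genes.gff.gz" || return 1
    # Add the config entry only once; the key must match the name used on
    # the command line exactly, including upper and lower case.
    if ! grep -q "^$2\.genome" "$1/snpEff.config" 2>/dev/null; then
        printf '\n# %s\n%s.genome: %s\n' "$2" "$2" "$2" >> "$1/snpEff.config"
    fi
}

# Example usage (input file names are placeholders):
#   stage_genome /path/to/snpEff Zea_Mays_B73v4 reference.fa annotation.gff3
#   cd /path/to/snpEff && java -jar snpEff.jar build -gff3 -v Zea_Mays_B73v4
```

Using one variable for the genome name everywhere avoids the directory, config key, and command-line name drifting apart.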

As a side note, if you're running a relatively small VCF, you may consider using Ensembl's online VEP interface. It works well for a limited number of variants, say, a few hundred.

https://uswest.ensembl.org/info/docs/tools/vep/index.html

written 9 months ago by Vitis

arup (India) wrote 9 months ago:

A prebuilt Zea mays database is already available, though I'm not sure which version it is.

java -jar /Toolbox/snpEff/snpEff.jar databases |grep "Zea"
java -jar /Toolbox/snpEff/snpEff.jar download -v Zea_mays
written 9 months ago by arup

Thank you, it is the older version though and that is why I need to build my own db!

written 9 months ago by Anastasia A.

arif.ashraf.opu (Japan) wrote 7 weeks ago:

Download the latest version of snpEff from the following link and unzip it.

http://sourceforge.net/projects/snpeff/files/snpEff_latest_core.zip

Download reference genome in .fa format and annotation file in .gff3 format from the following link:

https://plants.ensembl.org/Zea_mays/Info/Index

Unzip both files and rename them accordingly:
Reference genome: sequences.fa
Annotation file: genes.gff

Remember to change the annotation file extension from .gff3 to .gff, otherwise snpEff won't be able to recognize it.

Make a new folder named "data" inside your "snpEff" folder. Inside the "data" folder, make two more folders, "genomes" and "Zea_Mays_B73v4". Move "sequences.fa" to "genomes" and "genes.gff" to "Zea_Mays_B73v4".

*Important note: Java heap size matters here. From your "Control Panel" (on Windows), go to "Programs" and find "Java", then check whether you are using 32-bit or 64-bit Java. With 32-bit Java, the maximum heap size is limited to about 1 GB. Remove all old versions of Java from your computer, install 64-bit Java, and increase the maximum heap size; otherwise the following commands will not work for you. For assistance, you can search YouTube for "How to increase Java heap size".
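
One way to check this from a terminal, without going through the Control Panel, is to ask the JVM to start with the heap size you plan to use. A 32-bit JVM will typically refuse a multi-gigabyte -Xmx value and exit with an error. In the sketch below, jvm_accepts_heap is a hypothetical helper name; -Xmx and -version are standard java options.

```shell
# Hypothetical helper: succeed if the given java executable starts with
# the requested maximum heap size (it only prints the version and exits).
jvm_accepts_heap() {
    # $1 = java executable, $2 = heap size such as 4g
    "$1" "-Xmx$2" -version >/dev/null 2>&1
}

# Example usage:
#   jvm_accepts_heap java 4g || echo "this Java cannot use a 4 GB heap"
```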

Now, edit the snpEff.config file. Add the following lines under the Ensembl release section.

# Zea Mays B73 genome, Version 4
Zea_Mays_B73v4.genome: Zea_Mays_B73v4

Now, go to the directory which contains snpEff.jar file, and run the following command.

java -Xmx4G -jar snpEff.jar build -gff3 -v Zea_Mays_B73v4

Now, run the following command to annotate your VCF file with the new database.

java -Xmx4G -jar snpEff.jar Zea_Mays_B73v4 chr1_NoHapMap_EMS.vcf > output1.vcf

written 7 weeks ago by arif.ashraf.opu