Building Snpeff Database
2
2
10.1 years ago
bioinfo ▴ 830

I was trying to create a snpEff database from my reference genome (in GenBank format). I followed http://snpeff.sourceforge.net/supportNewGenome.html#genbank, but I got stuck while editing the configuration file and couldn't add the genome to it.

My commands:

vi snpEffect.config

# Sodalis genome, version NC_007712.1 GI:85057978

NC_007712.1 GI:85057978 : Sodalis

I don't know how to save the above information in the configuration file, and I'm not sure whether what I entered is correct for my genome. I copied it from the VERSION section of my bacterium's GenBank file from NCBI. Any help will be highly appreciated.

vcftools gatk snp • 18k views
6
4.7 years ago
rleach ▴ 130

I just went through figuring this out and I thought I would add my process, including the FASTA component, using Vibrio phage VP882 as my example and utilizing the gff strategy you mentioned in a comment to the other answer. Here is everything I did using an established snpEff installation. It worked when I ran my analysis using it, so this strategy is confirmed in my case:

#How to create a snpEff database using a gff3 and genomic DNA fasta file... (note, the chromosome names must match in the 2 files)
#NOTE: This uses /bin/tcsh...

setenv DBNAME Vibrio_phage_VP882
setenv GFF3 /path/to/your_annotation.gff3    #placeholder: path to your gff3 file
setenv FASTA /path/to/your_genome.fasta      #placeholder: path to your fasta file

#Go into the snpEff directory and create a directory for your files
cd /usr/local/snpEff
mkdir data/$DBNAME

#Copy the files into snpEff's directory structure
cp $GFF3 data/$DBNAME/genes.gff
cp $FASTA data/$DBNAME/sequences.fa

#Edit snpEff.config and insert your specific database information:
echo "$DBNAME.genome : $DBNAME" >> snpEff.config

#Build the database
java -jar snpEff.jar build -gff3 -v $DBNAME


I did not have any errors or warnings, so if you see anything untoward, you'll have to figure those things out.

You can set the 3 variable values at the top of this script and run the rest without changing it (unless your snpEff installation is in a different place).
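
Since the chromosome names must match in the two files, it can help to compare them before building. Below is a minimal sketch in POSIX sh (not tcsh); `check_chrom_names` is a hypothetical helper, not part of snpEff. It assumes the sequence name is the first word of each FASTA header and column 1 of each tab-separated GFF3 feature line.

```shell
#!/bin/sh
# check_chrom_names FASTA GFF3
# Hypothetical helper: compares the set of sequence names in both files.
check_chrom_names() {
    fasta=$1
    gff=$2
    # FASTA: first word of every header line, without the leading ">"
    fa_names=$(grep '^>' "$fasta" | sed 's/^>//' | awk '{print $1}' | sort -u)
    # GFF3: column 1 of every 9-column feature line (comments are skipped)
    gff_names=$(grep -v '^#' "$gff" | awk -F'\t' 'NF >= 9 {print $1}' | sort -u)
    if [ "$fa_names" = "$gff_names" ]; then
        echo "OK: sequence names match"
    else
        echo "MISMATCH"
        echo "FASTA: $fa_names"
        echo "GFF3:  $gff_names"
        return 1
    fi
}
```

For example: `check_chrom_names sequences.fa genes.gff`. The comparison is strict, so a FASTA sequence with no annotated features is also reported as a mismatch.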

https://www.ncbi.nlm.nih.gov/nuccore/NC_009016

Rob

1

Thank you so much! The documentation for SNPeff is rather poor, and this was the first source I (finally) found that worked! Great, thanks!

0

I tried to run the same commands, but when I try to annotate my VCF, snpEff tries to download the database from SourceForge and ends with an error. Should the build option create database-related files in the data directories? The directories and their contents remain unchanged, which is weird.

0

The problem was the following: the memory argument -Xmx4G should be added to the java command (before -jar) when running the build.
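
A minimal sketch of this fix in POSIX sh, assuming the layout used in the answer (snpEff.jar in the current directory, databases under ./data). The names `build_db`, `has_built_db` and the `JAVA` variable are hypothetical and only for illustration. The check relies on `snpEffectPredictor.bin`, the file snpEff normally writes into the database directory after a successful build.

```shell
#!/bin/sh
# JAVA is a hypothetical override for the Java executable; defaults to "java".
JAVA=${JAVA:-java}

# Build a GFF3-based database with a 4 GB Java heap.
# JVM options such as -Xmx4G must come before -jar.
build_db() {
    "$JAVA" -Xmx4G -jar snpEff.jar build -gff3 -v "$1"
}

# Succeed only if a database file exists in data/<name>/.
has_built_db() {
    [ -f "data/$1/snpEffectPredictor.bin" ]
}
```

Usage, from the snpEff directory: `build_db Vibrio_phage_VP882 && has_built_db Vibrio_phage_VP882 && echo "database built"`.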

0

The folder /usr/local/snpEff/data does not exist; you need to create it.
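
One way to handle this, sketched in POSIX sh: `mkdir -p` creates the missing `data` directory together with the database directory and does not fail if they already exist. `SNPEFF_HOME` and `make_db_dir` are hypothetical names used only for this example.

```shell
#!/bin/sh
# SNPEFF_HOME is an assumed variable; the thread uses /usr/local/snpEff.
SNPEFF_HOME=${SNPEFF_HOME:-/usr/local/snpEff}

# make_db_dir NAME
# Hypothetical helper: creates data/NAME, including data/ itself if missing.
make_db_dir() {
    mkdir -p "$SNPEFF_HOME/data/$1"
}
```

For example: `make_db_dir Vibrio_phage_VP882`.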

5
10.1 years ago

Try this:

1. Create directory "Sodalis" in snpEff data directory:

mkdir /path/to/snpEff/data/Sodalis

2. Download and save the GenBank file to the Sodalis directory (note the file name must be genes.gb)

/path/to/snpEff/data/Sodalis/genes.gb

3. Edit snpEff.config and insert your specific database information:

# Sodalis genome, version NC_007712.1 GI:85057978
Sodalis.genome : Sodalis

4. Create database (note the "-genbank" flag):

cd /path/to/snpEff
java -jar snpEff.jar build -genbank -v Sodalis
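
For those unsure how to save the entry from step 3, here is a minimal sketch in POSIX sh that appends it to the config file without opening an editor, and skips the append if an entry for that name is already there. `add_genome_entry` is a hypothetical helper, not part of snpEff; the entry format `NAME.genome : Description` is the one used in this thread.

```shell
#!/bin/sh
# add_genome_entry CONFIG NAME DESCRIPTION
# Hypothetical helper: appends "NAME.genome : DESCRIPTION" to CONFIG once.
add_genome_entry() {
    config=$1
    name=$2
    desc=$3
    if grep -q "^$name\.genome[[:space:]]*:" "$config"; then
        echo "entry for $name already present in $config"
    else
        # Leading blank line keeps the entry separate from earlier content.
        printf '\n# %s\n%s.genome : %s\n' "$desc" "$name" "$desc" >> "$config"
    fi
}
```

For example, from the snpEff directory: `add_genome_entry snpEff.config Sodalis "Sodalis"`.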


I expect this could help...

Fred

3

Very useful starter, thanks! In my case the GenBank data seemed to be faulty, and I had to use GFF3 + FASTA to get the best result:

• create the database folder and register it in snpEff.config as described above
• place the two files, renamed to genes.gff and sequences.fa, into the new database folder
• build with: java -jar $SNPEFF/snpEff.jar build -gff3 -v <name>
0

Could you please tell me what the fault was with the genbank file?