I am using the Streptococcus pneumoniae reference genome (NZ_CP020550) to build a custom snpEff database.
First, I downloaded the reference data (sequences.fa, genes.gff, protein.fa, cds.fa) from NCBI (https://www.ncbi.nlm.nih.gov/data-hub/genome/GCF_002076835.1/) using the command below:
curl -OJX GET "https://api.ncbi.nlm.nih.gov/datasets/v1alpha/genome/accession/GCF_002076835.1/download?include_annotation_type=GENOME_GTF,GENOME_GFF,GENOME_GBFF,RNA_FASTA,CDS_FASTA,PROT_FASTA&filename=GCF_002076835.1.zip" -H "Accept: application/zip"
The data were saved to snpEff/data/Streptococcus_pneumoniae_gcf_002076835/ and the snpEff.config file was edited to add this entry:
#Streptococcus pneumoniae reference genome, gcf 002076835 gff
Streptococcus_pneumoniae_gcf_002076835.genome : Streptococcus_pneumoniae_gcf_002076835
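The config entry above ties a genome name to a directory of the same name under data/. A minimal shell sketch of that registration step follows. The function name register_genome and its arguments are hypothetical, and it assumes snpEff.config sits at the top of the SnpEff install directory, as it does in the log paths in this post.

```shell
# Hypothetical helper: register a custom genome in snpEff.config.
# $1 = SnpEff install directory (assumed to contain snpEff.config)
# $2 = genome name; it must match the directory name under data/
register_genome() {
    snpeff_home=$1
    genome=$2
    config="$snpeff_home/snpEff.config"

    # Directory that will hold sequences.fa, genes.gff, protein.fa, cds.fa
    mkdir -p "$snpeff_home/data/$genome"

    # Append the entry only if it is not already present
    if ! grep -q "^$genome\.genome" "$config" 2>/dev/null; then
        printf '%s.genome : %s\n' "$genome" "$genome" >> "$config"
    fi
}
```

For example, register_genome ./snpEff Streptococcus_pneumoniae_gcf_002076835 would create the data directory and add the entry once, even if called repeatedly.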
Then I built the database:
java -jar snpEff.jar build -gff3 -v Streptococcus_pneumoniae_gcf_002076835
00:00:00 SnpEff version SnpEff 5.1d (build 2022-04-19 15:49), by Pablo Cingolani
00:00:00 Command: 'build'
00:00:00 Building database for 'Streptococcus_pneumoniae_gcf_002076835'
00:00:00 Reading configuration file 'snpEff.config'. Genome: 'Streptococcus_pneumoniae_gcf_002076835'
00:00:00 Reading config file: /home/external/lys/younso/work/sgseq/pipeline/snpEff/snpEff.config
00:00:00 done
00:00:00 Reading GFF3 data file : '/home/external/lys/younso/work/sgseq/pipeline/snpEff/./data/Streptococcus_pneumoniae_gcf_002076835/genes.gff'
00:00:00 Reading file '/home/external/lys/younso/work/sgseq/pipeline/snpEff/./data/Streptococcus_pneumoniae_gcf_002076835/genes.gff'
WARNING_TRANSCRIPT_NOT_FOUND: Exon's parent 'gene-SPNHU17_RS00005' is a Gene instead of a transcript. Created transcript 'TRANSCRIPT_gene-SPNHU17_RS00005' for NZ_CP020549.1 Protein Homology CDS 196 1557 +
dbxref : Genbank:WP_000660615.1,GeneID:66805161
gbkey : CDS
gene : dnaA
go_function : DNA binding|0003677||IEA,DNA replication origin binding|0003688||IEA,ATP binding|0005524||IEA
go_process : DNA replication initiation|0006270||IEA,regulation of DNA replication|0006275||IEA
id : cds-WP_000660615.1
inference : COORDINATES: similar to AA sequence:RefSeq:WP_004255267.1
locus_tag : SPNHU17_RS00005
name : WP_000660615.1
ontology_term : GO:0006270,GO:0006275,GO:0003677,GO:0003688,GO:0005524
parent : gene-SPNHU17_RS00005
product : chromosomal replication initiator protein DnaA
protein_id : WP_000660615.1
source : Protein Homology
transl_table : 11
type : CDS
...
WARNING_GENE_NOT_FOUND: Gene 'null' (NZ_CP020549.1:20021-20825) does not include 'gene-SPNHU17_RS00280' (NZ_CP020549.1:45312-46643). Created new gene 'null.2' (NZ_CP020549.1:45312-46643). File '/home/external/lys/younso/work/sgseq/pipeline/snpEff/./data/Streptococcus_pneumoniae_gcf_002076835/genes.gff' line 129 'NZ_CP020549.1 RefSeq pseudogene 45312 46643 . + . ID=gene-SPNHU17_RS00280;Dbxref=GeneID:66805216;Name=SPNHU17_RS00280;end_range=46643,.;gbkey=Gene;gene_biotype=pseudogene;locus_tag=SPNHU17_RS00280;old_locus_tag=SPNHU17_00055;partial=true;pseudo=true'
...
Then, when I ran snpEff to annotate the VCF file, the following error messages appeared:
java -jar ./snpEff/snpEff.jar Streptococcus_pneumoniae_gcf_002076835 work_out/workpath/filtered_vcf_file.vcf
00:00:00 ERROR while connecting to https://snpeff.blob.core.windows.net/databases/v5_1/snpEff_v5_1_Streptococcus_pneumoniae_gcf_002076835.zip
00:00:00 ERROR while connecting to https://snpeff.blob.core.windows.net/databases/v5_0/snpEff_v5_0_Streptococcus_pneumoniae_gcf_002076835.zip
FATAL ERROR: Failed to download database from [https://snpeff.blob.core.windows.net/databases/v5_1/snpEff_v5_1_Streptococcus_pneumoniae_gcf_002076835.zip, https://snpeff.blob.core.windows.net/databases/v5_0/snpEff_v5_0_Streptococcus_pneumoniae_gcf_002076835.zip]
I don't know why snpEff didn't use the custom database (/snpEff/data/"custom database").
I also tried building the database from a GenBank file and with the NCBI script.
I followed the snpEff documentation on building databases. For GenBank: (https://pcingola.github.io/SnpEff/se_build_db/#step-2-option-2-building-a-database-from-genbank-files) For NCBI: (https://pcingola.github.io/SnpEff/se_faq/#how-to-building-an-ncbi-genome-genbank-file)
I downloaded the GenBank data (sequence.gb) from NCBI (https://www.ncbi.nlm.nih.gov/nuccore/NZ_CP020550.1/), renamed the file with mv, and built the database:
mv sequence.gb genes.gbk
java -jar snpEff.jar build -genbank -v Streptococcus_pneumoniae_gcf_002076835
I also built the database with the NCBI script:
./scripts/buildDbNcbi.sh NZ_CP020550
But the error message was the same.
How can I solve this problem?
Thanks, I used the -noCheckCds and -noCheckProtein options and then it worked.
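A likely reading of this fix: when the build step does not finish, no snpEffectPredictor.bin is written to the genome's data directory, so the later annotation run finds no local database and falls back to downloading one. The sketch below rebuilds with the two flags mentioned above and checks for that file. The function names db_is_built and rebuild_db are hypothetical, and the sketch assumes the default data/ directory inside the SnpEff install directory.

```shell
# Hypothetical helper: succeeds only if the build step wrote a non-empty
# database file for the genome.
# $1 = SnpEff install directory, $2 = genome name
db_is_built() {
    [ -s "$1/data/$2/snpEffectPredictor.bin" ]
}

# Rebuild from the GFF3 file with the CDS and protein checks skipped
# (the flags reported to work in this thread).
# Assumes it is run from the SnpEff install directory. $1 = genome name
rebuild_db() {
    java -jar snpEff.jar build -gff3 -noCheckCds -noCheckProtein -v "$1"
}
```

From the SnpEff directory, rebuild_db Streptococcus_pneumoniae_gcf_002076835 && db_is_built . Streptococcus_pneumoniae_gcf_002076835 should succeed before any VCF is annotated against the custom database.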
You need to figure out how to ensure it can download the databases from the public URLs, or place the files in the local SnpEff cache.
The page at that URL describes how to download the data, so I didn't describe it in detail, sorry.