I'm trying to download the GRCh38.86 database to run snpEff. The problem (I think) is that the download is written to a location without enough space. Is there a way to specify a different directory to download to?
java -jar /home/me/anaconda3/share/snpeff-4.3.1t-1/snpEff.jar download -v GRCh38.86
This is probably something simple that I'm overlooking. I checked the snpEff documentation and tried -o, but I can't find what I'm looking for. The error message from the command above includes the link below. I tried downloading and unzipping that file directly, but there's no .genome file in it (lots of other stuff, though).
https://iweb.dl.sourceforge.net/project/snpeff/databases/v4_3/snpEff_v4_3_GRCh38.86.zip
Am I just missing something in the help info?
snpEff -help
Available commands:
[eff|ann] : Annotate variants / calculate effects (you can use either 'ann' or 'eff', they mean the same). Default: ann (no command or 'ann').
build : Build a SnpEff database.
buildNextProt : Build a SnpEff for NextProt (using NextProt's XML files).
cds : Compare CDS sequences calculated form a SnpEff database to the one in a FASTA file. Used for checking databases correctness.
closest : Annotate the closest genomic region.
count : Count how many intervals (from a BAM, BED or VCF file) overlap with each genomic interval.
databases : Show currently available databases (from local config file).
download : Download a SnpEff database.
dump : Dump to STDOUT a SnpEff database (mostly used for debugging).
genes2bed : Create a bed file from a genes list.
len : Calculate total genomic length for each marker type.
pdb : Build interaction database (based on PDB data).
protein : Compare protein sequences calculated form a SnpEff database to the one in a FASTA file. Used for checking databases correctness.
seq : Show sequence (from command line) translation.
show : Show a text representation of genes or transcripts coordiantes, DNA sequence and protein sequence.
translocReport : Create a translocations report (from VCF file).
Generic options:
-c , -config : Specify config file
-configOption name=value : Override a config file option
-d , -debug : Debug mode (very verbose).
-dataDir <path> : Override data_dir parameter from config file.
-download : Download a SnpEff database, if not available locally. Default: true
-nodownload : Do not download a SnpEff database, if not available locally.
-h , -help : Show this help and exit
-noLog : Do not report usage statistics to server
-t : Use multiple threads (implies '-noStats'). Default 'off'
-q , -quiet : Quiet mode (do not show any messages or errors)
-v , -verbose : Verbose mode
-version : Show version number and exit
Database options:
-canon : Only use canonical transcripts.
-canonList <file> : Only use canonical transcripts, replace some transcripts using the 'gene_id transcript_id' entries in <file>.
-interaction : Annotate using inteactions (requires interaciton database). Default: true
-interval <file> : Use a custom intervals in TXT/BED/BigBed/VCF/GFF file (you may use this option many times)
-maxTSL <TSL_number> : Only use transcripts having Transcript Support Level lower than <TSL_number>.
-motif : Annotate using motifs (requires Motif database). Default: true
-nextProt : Annotate using NextProt (requires NextProt database).
-noGenome : Do not load any genomic database (e.g. annotate using custom files).
-noExpandIUB : Disable IUB code expansion in input variants
-noInteraction : Disable inteaction annotations
-noMotif : Disable motif annotations.
-noNextProt : Disable NextProt annotations.
-onlyReg : Only use regulation tracks.
-onlyProtein : Only use protein coding transcripts. Default: false
-onlyTr <file.txt> : Only use the transcripts in this file. Format: One transcript ID per line.
-reg <name> : Regulation track to use (this option can be used add several times).
-ss , -spliceSiteSize <int> : Set size for splice sites (donor and acceptor) in bases. Default: 2
-spliceRegionExonSize <int> : Set size for splice site region within exons. Default: 3 bases
-spliceRegionIntronMin <int> : Set minimum number of bases for splice site region within intron. Default: 3 bases
-spliceRegionIntronMax <int> : Set maximum number of bases for splice site region within intron. Default: 8 bases
-strict : Only use 'validated' transcripts (i.e. sequence has been checked). Default: false
-ud , -upDownStreamLen <int> : Set upstream downstream interval length (in bases)
Error message when running the first command listed:
00:00:00 SnpEff version SnpEff 4.3t (build 2017-11-24 10:18), by Pablo Cingolani
00:00:00 Command: 'download'
00:00:00 Reading configuration file 'snpEff.config'. Genome: 'GRCh38.86'
00:00:00 Reading config file: /my/cluster/me/myproject/snpEff.config
00:00:00 Reading config file: /home/me/anaconda3/share/snpeff-4.3.1t-1/snpEff.config
00:00:00 done
00:00:00 Downloading database for 'GRCh38.86'
00:00:00 Connecting to http://downloads.sourceforge.net/project/snpeff/databases/v4_3/snpEff_v4_3_GRCh38.86.zip
00:00:00 Following redirect: https://iweb.dl.sourceforge.net/project/snpeff/databases/v4_3/snpEff_v4_3_GRCh38.86.zip
00:00:01 Local file name: '/tmp/snpEff_v4_3_GRCh38.86.zip'
..........................................................................................................................................00:00:04 ERROR while connecting to https://iweb.dl.sourceforge.net/project/snpeff/databases/v4_3/snpEff_v4_3_GRCh38.86.zip
java.lang.RuntimeException: java.io.IOException: No space left on device
at org.snpeff.util.Download.download(Download.java:178)
at org.snpeff.snpEffect.commandLine.SnpEffCmdDownload.downloadAndInstall(SnpEffCmdDownload.java:32)
at org.snpeff.snpEffect.commandLine.SnpEffCmdDownload.runDownloadGenome(SnpEffCmdDownload.java:86)
at org.snpeff.snpEffect.commandLine.SnpEffCmdDownload.run(SnpEffCmdDownload.java:72)
at org.snpeff.SnpEff.run(SnpEff.java:1183)
at org.snpeff.SnpEff.main(SnpEff.java:162)
Caused by: java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:326)
at org.snpeff.util.Download.download(Download.java:159)
... 5 more
00:00:04 Logging
00:00:05 Done.
Thanks for your reply. It does answer what I thought was my problem. I was able to change the output path as suggested, but I still ran into the same error. It turns out the error occurs because I don't have enough space in /tmp (~300 MB), which is where the zip is saved before it is installed. I know this is a separate problem, but is there a workaround for it?
Perfect, thanks Pierre. Ran into the same problem using a container.
snpEff download GRCh37.75 -dataDir /data/snpEff
@abe add the -dataDir flag . . .
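For the /tmp part of the question, here is a possible workaround as a rough sketch. The log above shows the zip being saved to '/tmp/snpEff_v4_3_GRCh38.86.zip', which is the JVM's default temporary directory on Linux. java.io.tmpdir is a standard JVM system property, but that snpEff 4.3t uses it for the download location is an assumption, not something confirmed in this thread. TMP_DIR and DATA_DIR are hypothetical paths; point them at a filesystem with enough free space.

```shell
#!/bin/sh
# Sketch only: redirect both the temporary download and the installed database.
# SNPEFF_JAR is the jar path from the question; TMP_DIR and DATA_DIR are
# hypothetical locations, override them for your system.
SNPEFF_JAR="${SNPEFF_JAR:-/home/me/anaconda3/share/snpeff-4.3.1t-1/snpEff.jar}"
TMP_DIR="${TMP_DIR:-$PWD/snpeff_tmp}"
DATA_DIR="${DATA_DIR:-$PWD/snpeff_data}"

mkdir -p "$TMP_DIR" "$DATA_DIR"

# Check how much space is available where the files will now be written.
df -h "$TMP_DIR" "$DATA_DIR"

run_download() {
    # -Djava.io.tmpdir is a standard JVM property (assumed to control where
    # snpEff saves the zip); -dataDir is the snpEff option from the help above.
    java -Djava.io.tmpdir="$TMP_DIR" -jar "$SNPEFF_JAR" \
        download -v -dataDir "$DATA_DIR" GRCh38.86
}

# Only attempt the download when the jar and a Java runtime are present.
if [ -f "$SNPEFF_JAR" ] && command -v java >/dev/null 2>&1; then
    run_download
else
    echo "snpEff jar or java not found; set SNPEFF_JAR and re-run."
fi
```

If it works, the "Local file name" line in the verbose log should show a path under TMP_DIR instead of /tmp. You will likely need to pass the same -dataDir when annotating so snpEff finds the database.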