Question: Specify output dir when downloading snpEff database?
5 months ago by
abe10 wrote:

I'm trying to download the GRCh38.86 database to run snpEff. The problem (I think) is that the download occurs in a location without enough space. Is there a way to specify a different dir to download to?

java -jar /home/me/anaconda3/share/snpeff-4.3.1t-1/snpEff.jar download -v GRCh38.86

This is probably something simple that I'm overlooking. I checked the snpEff documentation and tried -o, but couldn't find what I'm looking for. The error message that I get from the above command supplies the link below. So I tried directly downloading and unzipping the file at that link, but there's no .genome file in it. (Lots of other stuff, though.)

Am I just missing something in the help info?

snpEff -help

Available commands:
        [eff|ann]                    : Annotate variants / calculate effects (you can use either 'ann' or 'eff', they mean the same). Default: ann (no command or 'ann').
        build                        : Build a SnpEff database.
        buildNextProt                : Build a SnpEff for NextProt (using NextProt's XML files).
        cds                          : Compare CDS sequences calculated form a SnpEff database to the one in a FASTA file. Used for checking databases correctness.
        closest                      : Annotate the closest genomic region.
        count                        : Count how many intervals (from a BAM, BED or VCF file) overlap with each genomic interval.
        databases                    : Show currently available databases (from local config file).
        download                     : Download a SnpEff database.
        dump                         : Dump to STDOUT a SnpEff database (mostly used for debugging).
        genes2bed                    : Create a bed file from a genes list.
        len                          : Calculate total genomic length for each marker type.
        pdb                          : Build interaction database (based on PDB data).
        protein                      : Compare protein sequences calculated form a SnpEff database to the one in a FASTA file. Used for checking databases correctness.
        seq                          : Show sequence (from command line) translation.
        show                         : Show a text representation of genes or transcripts coordiantes, DNA sequence and protein sequence.
        translocReport               : Create a translocations report (from VCF file).

Generic options:
        -c , -config                 : Specify config file
        -configOption name=value     : Override a config file option
        -d , -debug                  : Debug mode (very verbose).
        -dataDir <path>              : Override data_dir parameter from config file.
        -download                    : Download a SnpEff database, if not available locally. Default: true
        -nodownload                  : Do not download a SnpEff database, if not available locally.
        -h , -help                   : Show this help and exit
        -noLog                       : Do not report usage statistics to server
        -t                           : Use multiple threads (implies '-noStats'). Default 'off'
        -q , -quiet                  : Quiet mode (do not show any messages or errors)
        -v , -verbose                : Verbose mode
        -version                     : Show version number and exit

Database options:
        -canon                       : Only use canonical transcripts.
        -canonList <file>            : Only use canonical transcripts, replace some transcripts using the 'gene_id       transcript_id' entries in <file>.
        -interaction                 : Annotate using inteactions (requires interaciton database). Default: true
        -interval <file>             : Use a custom intervals in TXT/BED/BigBed/VCF/GFF file (you may use this option many times)
        -maxTSL <TSL_number>         : Only use transcripts having Transcript Support Level lower than <TSL_number>.
        -motif                       : Annotate using motifs (requires Motif database). Default: true
        -nextProt                    : Annotate using NextProt (requires NextProt database).
        -noGenome                    : Do not load any genomic database (e.g. annotate using custom files).
        -noExpandIUB                 : Disable IUB code expansion in input variants
        -noInteraction               : Disable inteaction annotations
        -noMotif                     : Disable motif annotations.
        -noNextProt                  : Disable NextProt annotations.
        -onlyReg                     : Only use regulation tracks.
        -onlyProtein                 : Only use protein coding transcripts. Default: false
        -onlyTr <file.txt>           : Only use the transcripts in this file. Format: One transcript ID per line.
        -reg <name>                  : Regulation track to use (this option can be used add several times).
        -ss , -spliceSiteSize <int>  : Set size for splice sites (donor and acceptor) in bases. Default: 2
        -spliceRegionExonSize <int>  : Set size for splice site region within exons. Default: 3 bases
        -spliceRegionIntronMin <int> : Set minimum number of bases for splice site region within intron. Default: 3 bases
        -spliceRegionIntronMax <int> : Set maximum number of bases for splice site region within intron. Default: 8 bases
        -strict                      : Only use 'validated' transcripts (i.e. sequence has been checked). Default: false
        -ud , -upDownStreamLen <int> : Set upstream downstream interval length (in bases)

Error message when running the first command listed:

00:00:00        SnpEff version SnpEff 4.3t (build 2017-11-24 10:18), by Pablo Cingolani
00:00:00        Command: 'download'
00:00:00        Reading configuration file 'snpEff.config'. Genome: 'GRCh38.86'
00:00:00        Reading config file: /my/cluster/me/myproject/snpEff.config
00:00:00        Reading config file: /home/me/anaconda3/share/snpeff-4.3.1t-1/snpEff.config
00:00:00        done
00:00:00        Downloading database for 'GRCh38.86'
00:00:00        Connecting to
00:00:00        Following redirect:
00:00:01        Local file name: '/tmp/'
..........................................................................................................................................00:00:04   ERROR while connecting to
java.lang.RuntimeException: No space left on device
        at org.snpeff.snpEffect.commandLine.SnpEffCmdDownload.downloadAndInstall(
        at org.snpeff.snpEffect.commandLine.SnpEffCmdDownload.runDownloadGenome(
        at org.snpeff.SnpEff.main(
Caused by: No space left on device
        at Method)
        ... 5 more
00:00:04        Logging
00:00:05        Done.
5 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum (120k) wrote:

In the file snpEff.config there is a data.dir line that can be changed:

# Databases are stored here
# E.g.: Information for 'hg19' is stored in data.dir/hg19/
# You can use tilde ('~') as first character to refer to your home directory. 
# Also, a non-absolute path will be relative to config's file dir
data.dir = /path/to/what/you/want/data

You can also copy or create your own custom config file and use it with the -c option:

    -c , -config                 : Specify config file
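
To make the two options above concrete, here is a minimal shell sketch rather than a tested recipe. It assumes the install path from the question. The helper `set_data_dir`, the file name `snpEff.custom.config`, and the target directory are placeholder names for illustration only. The `-dataDir` alternative in the trailing comment comes from the help text quoted in the question; how it behaves with `download` is not confirmed in this thread.

```shell
#!/bin/sh
# Placeholder paths (assumptions): adjust to your own installation.
SNPEFF_HOME="${SNPEFF_HOME:-/home/me/anaconda3/share/snpeff-4.3.1t-1}"
DATA_DIR="${DATA_DIR:-/path/to/what/you/want/data}"
CUSTOM_CONFIG="${CUSTOM_CONFIG:-$PWD/snpEff.custom.config}"

# set_data_dir SRC DEST DIR
# Copy config SRC to DEST, rewriting the data.dir line to point at DIR.
# Hypothetical helper; DIR must not contain '|' or '&' (special to sed here).
set_data_dir() {
    sed "s|^data\.dir[[:space:]]*=.*|data.dir = $3|" "$1" > "$2"
}

# Only run when the stock config and jar are actually present.
if [ -f "$SNPEFF_HOME/snpEff.config" ] && [ -f "$SNPEFF_HOME/snpEff.jar" ]; then
    set_data_dir "$SNPEFF_HOME/snpEff.config" "$CUSTOM_CONFIG" "$DATA_DIR"
    java -jar "$SNPEFF_HOME/snpEff.jar" download -v -c "$CUSTOM_CONFIG" GRCh38.86
fi

# Alternative without editing any config (flag taken from the help text):
#   java -jar "$SNPEFF_HOME/snpEff.jar" download -v -dataDir "$DATA_DIR" GRCh38.86
```

Use an absolute path for data.dir here: per the config comments, a non-absolute path is resolved relative to the config file's directory, so moving the config would silently move the data location too.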

Thanks for your reply. It does answer what I thought was my problem. I was able to change the output path as suggested, but still ran into the same error. I've found that the error occurs because I don't have enough space in /tmp (~300 MB). I know this is a separate problem, but is there a workaround for this?

written 5 months ago by abe10
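
On the /tmp follow-up, one thing that may be worth trying: the JVM takes its temporary directory from the standard `java.io.tmpdir` system property, which can be set with `-D` on the command line. Whether snpEff's downloader honours that property is an assumption (the '/tmp/' in the log above suggests it, but nothing in this thread confirms it). A minimal sketch, with `BIG_TMP` as a hypothetical directory on a disk with enough free space:

```shell
#!/bin/sh
# Hypothetical scratch directory on a filesystem with enough free space.
BIG_TMP="${BIG_TMP:-$PWD/snpeff_tmp}"
SNPEFF_JAR="${SNPEFF_JAR:-/home/me/anaconda3/share/snpeff-4.3.1t-1/snpEff.jar}"

mkdir -p "$BIG_TMP"
df -h "$BIG_TMP"    # show free space on that filesystem

# -Djava.io.tmpdir is a standard JVM option; that snpEff uses it for the
# download is an assumption, not something confirmed in this thread.
# JVM options must appear before -jar.
if [ -f "$SNPEFF_JAR" ]; then
    java -Djava.io.tmpdir="$BIG_TMP" -jar "$SNPEFF_JAR" download -v GRCh38.86
fi
```

If the "Local file name" line in the verbose log still points at /tmp after this, the property is not being picked up and a different approach is needed.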
