Question: Specify output dir when downloading snpEff database?
abe wrote (10 months ago):

I'm trying to download the GRCh38.86 database to run snpEff. The problem (I think) is that the download is written to a location without enough space. Is there a way to specify a different directory to download to?

java -jar /home/me/anaconda3/share/snpeff-4.3.1t-1/snpEff.jar download -v GRCh38.86

This is probably something simple that I'm overlooking. I checked the snpEff documentation and tried -o, but can't find what I'm looking for. The output of the command above includes the link below, so I tried downloading and unzipping that file directly, but there is no .genome file in it (lots of other stuff, though).

https://iweb.dl.sourceforge.net/project/snpeff/databases/v4_3/snpEff_v4_3_GRCh38.86.zip

Am I just missing something in the help info?

snpEff -help

Available commands:
        [eff|ann]                    : Annotate variants / calculate effects (you can use either 'ann' or 'eff', they mean the same). Default: ann (no command or 'ann').
        build                        : Build a SnpEff database.
        buildNextProt                : Build a SnpEff for NextProt (using NextProt's XML files).
        cds                          : Compare CDS sequences calculated form a SnpEff database to the one in a FASTA file. Used for checking databases correctness.
        closest                      : Annotate the closest genomic region.
        count                        : Count how many intervals (from a BAM, BED or VCF file) overlap with each genomic interval.
        databases                    : Show currently available databases (from local config file).
        download                     : Download a SnpEff database.
        dump                         : Dump to STDOUT a SnpEff database (mostly used for debugging).
        genes2bed                    : Create a bed file from a genes list.
        len                          : Calculate total genomic length for each marker type.
        pdb                          : Build interaction database (based on PDB data).
        protein                      : Compare protein sequences calculated form a SnpEff database to the one in a FASTA file. Used for checking databases correctness.
        seq                          : Show sequence (from command line) translation.
        show                         : Show a text representation of genes or transcripts coordiantes, DNA sequence and protein sequence.
        translocReport               : Create a translocations report (from VCF file).

Generic options:
        -c , -config                 : Specify config file
        -configOption name=value     : Override a config file option
        -d , -debug                  : Debug mode (very verbose).
        -dataDir <path>              : Override data_dir parameter from config file.
        -download                    : Download a SnpEff database, if not available locally. Default: true
        -nodownload                  : Do not download a SnpEff database, if not available locally.
        -h , -help                   : Show this help and exit
        -noLog                       : Do not report usage statistics to server
        -t                           : Use multiple threads (implies '-noStats'). Default 'off'
        -q , -quiet                  : Quiet mode (do not show any messages or errors)
        -v , -verbose                : Verbose mode
        -version                     : Show version number and exit

Database options:
        -canon                       : Only use canonical transcripts.
        -canonList <file>            : Only use canonical transcripts, replace some transcripts using the 'gene_id       transcript_id' entries in <file>.
        -interaction                 : Annotate using inteactions (requires interaciton database). Default: true
        -interval <file>             : Use a custom intervals in TXT/BED/BigBed/VCF/GFF file (you may use this option many times)
        -maxTSL <TSL_number>         : Only use transcripts having Transcript Support Level lower than <TSL_number>.
        -motif                       : Annotate using motifs (requires Motif database). Default: true
        -nextProt                    : Annotate using NextProt (requires NextProt database).
        -noGenome                    : Do not load any genomic database (e.g. annotate using custom files).
        -noExpandIUB                 : Disable IUB code expansion in input variants
        -noInteraction               : Disable inteaction annotations
        -noMotif                     : Disable motif annotations.
        -noNextProt                  : Disable NextProt annotations.
        -onlyReg                     : Only use regulation tracks.
        -onlyProtein                 : Only use protein coding transcripts. Default: false
        -onlyTr <file.txt>           : Only use the transcripts in this file. Format: One transcript ID per line.
        -reg <name>                  : Regulation track to use (this option can be used add several times).
        -ss , -spliceSiteSize <int>  : Set size for splice sites (donor and acceptor) in bases. Default: 2
        -spliceRegionExonSize <int>  : Set size for splice site region within exons. Default: 3 bases
        -spliceRegionIntronMin <int> : Set minimum number of bases for splice site region within intron. Default: 3 bases
        -spliceRegionIntronMax <int> : Set maximum number of bases for splice site region within intron. Default: 8 bases
        -strict                      : Only use 'validated' transcripts (i.e. sequence has been checked). Default: false
        -ud , -upDownStreamLen <int> : Set upstream downstream interval length (in bases)

Error message when running the first command listed above:

00:00:00        SnpEff version SnpEff 4.3t (build 2017-11-24 10:18), by Pablo Cingolani
00:00:00        Command: 'download'
00:00:00        Reading configuration file 'snpEff.config'. Genome: 'GRCh38.86'
00:00:00        Reading config file: /my/cluster/me/myproject/snpEff.config
00:00:00        Reading config file: /home/me/anaconda3/share/snpeff-4.3.1t-1/snpEff.config
00:00:00        done
00:00:00        Downloading database for 'GRCh38.86'
00:00:00        Connecting to http://downloads.sourceforge.net/project/snpeff/databases/v4_3/snpEff_v4_3_GRCh38.86.zip
00:00:00        Following redirect: https://iweb.dl.sourceforge.net/project/snpeff/databases/v4_3/snpEff_v4_3_GRCh38.86.zip
00:00:01        Local file name: '/tmp/snpEff_v4_3_GRCh38.86.zip'
..........................................................................................................................................00:00:04   ERROR while connecting to https://iweb.dl.sourceforge.net/project/snpeff/databases/v4_3/snpEff_v4_3_GRCh38.86.zip
java.lang.RuntimeException: java.io.IOException: No space left on device
        at org.snpeff.util.Download.download(Download.java:178)
        at org.snpeff.snpEffect.commandLine.SnpEffCmdDownload.downloadAndInstall(SnpEffCmdDownload.java:32)
        at org.snpeff.snpEffect.commandLine.SnpEffCmdDownload.runDownloadGenome(SnpEffCmdDownload.java:86)
        at org.snpeff.snpEffect.commandLine.SnpEffCmdDownload.run(SnpEffCmdDownload.java:72)
        at org.snpeff.SnpEff.run(SnpEff.java:1183)
        at org.snpeff.SnpEff.main(SnpEff.java:162)
Caused by: java.io.IOException: No space left on device
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:326)
        at org.snpeff.util.Download.download(Download.java:159)
        ... 5 more
00:00:04        Logging
00:00:05        Done.

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote (10 months ago):

In the file snpEff.config there is a data.dir line that can be changed:

#---
# Databases are stored here
# E.g.: Information for 'hg19' is stored in data.dir/hg19/
#
# You can use tilde ('~') as first character to refer to your home directory. 
# Also, a non-absolute path will be relative to config's file dir
# 
#---
data.dir = /path/to/what/you/want/data

You can also copy or create your own custom config file and use it with the -c option:

    -c , -config                 : Specify config file
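
A minimal sketch of the custom-config route described above, assuming a POSIX shell with sed. `make_custom_config` is a hypothetical helper (not part of snpEff) and all paths are placeholders. It copies an existing config and rewrites only the data.dir line.

```shell
#!/bin/sh
# Sketch only: make_custom_config is a hypothetical helper, not part of snpEff.
# It copies a snpEff.config and points data.dir at a directory with more space.
make_custom_config() {
    src_config=$1    # existing snpEff.config
    dest_config=$2   # where to write the custom copy
    data_dir=$3      # target directory (must not contain '|' or '&',
                     # which are special in the sed expression below)
    # Rewrite the data.dir line; every other line is copied unchanged.
    sed "s|^data\.dir[[:space:]]*=.*|data.dir = ${data_dir}|" \
        "$src_config" > "$dest_config"
}

# Example usage (placeholder paths):
#   make_custom_config /home/me/anaconda3/share/snpeff-4.3.1t-1/snpEff.config \
#       ./snpEff.custom.config /path/to/what/you/want/data
#   java -jar /home/me/anaconda3/share/snpeff-4.3.1t-1/snpEff.jar \
#       download -v -c ./snpEff.custom.config GRCh38.86
```

Since a non-absolute data.dir is resolved relative to the config file's directory, an absolute path is the safer choice when the copy lives somewhere else. The help output in the question also lists -dataDir <path>, which overrides the same setting from the command line without editing any file.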

abe replied (10 months ago):

Thanks for your reply. It does answer what I thought was my problem. I was able to change the output path as suggested, but still ran into the same error. It turns out the error occurs because I don't have enough space in /tmp (~300 MB), which is where the download is first written. I know this is a separate problem, but is there a workaround for it?
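
One possible workaround, offered as an untested sketch: the log in the question shows the archive being written to /tmp first, and a JVM's temporary directory can normally be changed with the standard java.io.tmpdir system property. Whether snpEff 4.3t takes its download location from that property is an assumption here, not something confirmed in this thread. The helpers `free_kb` and `pick_tmpdir` are hypothetical, and the size threshold and scratch path are placeholders.

```shell
#!/bin/sh
# Sketch only: free_kb and pick_tmpdir are hypothetical helpers, not part of snpEff.

# Print the free space (in KiB) on the filesystem that holds directory $1.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Print the first candidate directory that has at least $1 KiB free.
# Returns non-zero if none of the candidates qualifies.
pick_tmpdir() {
    need_kb=$1
    shift
    for dir in "$@"; do
        if [ -d "$dir" ] && [ "$(free_kb "$dir")" -ge "$need_kb" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Example usage (threshold and scratch path are placeholders):
#   tmp_dir=$(pick_tmpdir 2097152 /tmp /my/cluster/me/scratch) || exit 1
#   java -Djava.io.tmpdir="$tmp_dir" \
#       -jar /home/me/anaconda3/share/snpeff-4.3.1t-1/snpEff.jar download -v GRCh38.86
```

If the "Local file name" line in the verbose log still points at /tmp after this change, the assumption does not hold for this snpEff version and another route is needed.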