Question: output path issue on Busco (python)
Darrill0 wrote:

Hi everyone, I'm currently running BUSCO (run_BUSCO.py) on my genomes to find orthologous genes present in all insects. To do so, I wrote a bash script to run it on the cluster. Here is the script:

    #!/bin/bash
    #SBATCH -t 24:00:00
    #SBATCH -e path/busco_job.log/busco_job.error
    #SBATCH -o path/busco_job.log/busco_job.out
    date;hostname;pwd
    ASSEMBLY=path/genome.fasta
    LINEAGE=path/hymenoptera_odb9
    SAMP=my_species
    NAME=$SAMP'_BUSCO_v3'
    #########################################
    # define PATH to software used by BUSCO #
    #########################################
    #Augustus
    export PATH=/bin:/usr/bin:/usr/remote/bin:path/Augustus3.3/bin:path/Augustus3.3/scripts
    # hmmer
    PATH=$PATH:/path/hmmer-3.2.1/bin
    # blast and python
    PATH=$PATH:/path/ncbi-blast-2.8.1+/bin
    PATH=$PATH:/usr/bin
    # augustus
    export AUGUSTUS_CONFIG_PATH=/path/Augustus3.3/config

    ################
    # Command line #
    ################
    export PATH=/usr/remote/Python-3.6.5/bin:$PATH
    PATH=$PATH:/usr/bin
    out_path = path/run_busco
    export PYTHONPATH=$PYTHONPATH:~/path/site-packages
    python3 /path/busco-masterV3/scripts/run_BUSCO.py -i $ASSEMBLY -o $NAME -l $LINEAGE -m geno -f

The main issue is that BUSCO, by default, writes its output files into the directory from which it is run, but I would like to write them to a different directory. The documentation says the out_path option can be set in two ways: by editing the path directly in the config.ini file, or by providing input parameters on the command line, which override those defined in config.ini (this is the solution I want to use). But it does not work, even when I write out_path = my_desired_path in the run.sh file.

Here is the documentation concerning the path:

In this file (config.ini), you must declare the paths to all dependencies (see below) and you can optionally define the required input parameters (described later in this document). Note: providing input parameters through the command line will override those defined in config.ini. The config.ini.default file is extensively commented and self explanatory.

Here is the head of config.ini:

# BUSCO specific configuration
# It overrides default values in code and dataset cfg, and is overridden by arguments in command line
# Uncomment lines when appropriate
[busco]
# Input file
;in = ./sample_data/target.fa
# Run name, used in output files and folder
;out = SAMPLE
# Where to store the output directory
;out_path = ./sample_data
# Path to the BUSCO dataset
;lineage_path = ./sample_data/example
# Which mode to run (genome / protein / transcriptome)
;mode = genome
# How many threads to use for multithreaded steps
;cpu = 1
# Domain for augustus retraining, eukaryota or prokaryota
# Do not change this unless you know exactly why !!!
;domain = eukaryota
# Force rewrite if files already exist (True/False)
;force = False
# Restart mode (True/False)
;restart = False
# Blast e-value
;evalue = 1e-3

So I was wondering why, even though I write out_path = /path/run_busco in my script, the output files still end up in ./sample_data?
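
One likely reason the line has no effect, independent of BUSCO itself: in bash, `out_path = path/run_busco` (with spaces around `=`) is not an assignment. The shell tries to run a command named `out_path` with two arguments, reports "command not found", and the script carries on. Even without the spaces it would only set a shell variable, which run_BUSCO.py does not read as far as the quoted documentation indicates. A small demonstration using only the shell, with variable names chosen for illustration:

```shell
#!/bin/bash
# With spaces, the shell looks for a command called "out_path"; nothing is assigned.
unset out_path
out_path = path/run_busco 2>/dev/null
status_with_spaces=$?            # 127 is the shell's "command not found" status
value_with_spaces=${out_path:-unset}

# Without spaces it is a real assignment, but only a shell variable:
# a program started afterwards sees it only if it is exported and
# the program chooses to read it.
out_path=path/run_busco
value_without_spaces=$out_path

echo "with spaces:    exit status $status_with_spaces, out_path is $value_with_spaces"
echo "without spaces: out_path is $value_without_spaces"
```

The "command not found" message normally lands in the job's error log (busco_job.error in the script above), which is a good place to check for it.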

Thank you for your help.

busco path python • 631 views
modified 11 months ago by h.mon28k • written 11 months ago by Darrill0

Hello,

I don't know the program, but I guess you have to remove the ; before the out_path parameter in the config file so that whatever you declare there has an effect.

fin swimmer

written 11 months ago by finswimmer13k

Yes, I removed the ; but the issue is still the same.

written 11 months ago by Darill30

It would be odd if the config file is using ; in some way. But in that case, can you specify the directory you want the output to go to in ;out_path = /path_to_dir_you_want ?

written 11 months ago by genomax75k

It would be odd if the config file is using ; in some way.

The php config file php.ini for example uses this to comment out parameters.

written 11 months ago by finswimmer13k

Yes, it works if I modify it directly in the config.ini file, of course, but the output path changes depending on the script I use...

I have around 100 scripts to run, with a unique path for each job, which is why I want to set out_path directly in my script rather than in config.ini, which does not change.

written 11 months ago by Darill30

Have the script generate/modify the config.ini.
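
A minimal sketch of that idea, assuming each job copies a shared template and rewrites only the out_path line. The template here is a three-line stand-in written to a temporary directory so the snippet runs on its own; `job_out_dir` and `job_config` are hypothetical names. The BUSCO v3 user guide describes a `BUSCO_CONFIG_FILE` environment variable for pointing BUSCO at a non-default config file; treat that as an assumption to verify against the installed version.

```shell
#!/bin/bash
# Stand-in for the real config.ini (only the lines needed for the demo).
work_dir=$(mktemp -d)
template="$work_dir/config.ini"
printf '%s\n' '[busco]' '# Where to store the output directory' ';out_path = ./sample_data' > "$template"

# Hypothetical per-job values; in a real job script these would differ per run.
job_out_dir="$work_dir/run_busco"
job_config="$work_dir/config_my_species.ini"
mkdir -p "$job_out_dir"

# Uncomment the out_path line (drop a leading ';' if present) and set this job's directory.
# '|' is used as the sed delimiter because the path contains '/'.
sed "s|^;*out_path = .*|out_path = $job_out_dir|" "$template" > "$job_config"

# Assumption: BUSCO v3 reads this variable to locate its config file.
export BUSCO_CONFIG_FILE="$job_config"

# Show the rewritten line.
grep '^out_path' "$job_config"
```

Because every job writes its own copy, the shared config.ini is never modified and concurrent jobs do not overwrite each other's settings.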

written 11 months ago by cschu1811.9k
h.mon28k wrote:
ERROR   Please do not provide a full path in --out parameter, no slash. Use out_path in the config.ini file to specify the full path.
  

It is a bit annoying that you can't just give a path to --out. I would solve the issue in a simpler manner than editing the config for every run (in fact, this is what I do when I use BUSCO): I just create the desired output directory and cd into it before running BUSCO.

################
# Command line #
################
export PATH=/usr/remote/Python-3.6.5/bin:$PATH
PATH=$PATH:/usr/bin
# -p creates missing parent directories and does not fail if the directory already exists
mkdir -p path/run_busco
cd path/run_busco
export PYTHONPATH=$PYTHONPATH:~/path/site-packages
python3 /path/busco-masterV3/scripts/run_BUSCO.py -i $ASSEMBLY -o $NAME -l $LINEAGE -m geno -f

Of course, if "ASSEMBLY=path/genome.fasta" and "LINEAGE=path/hymenoptera_odb9" are relative paths, they have to be adjusted to work from the new folder; if they are absolute paths, they will work regardless of where BUSCO is run.
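
The relative-path caveat can be handled by resolving the paths before changing directory. A minimal sketch, assuming the `realpath` utility (GNU coreutils) is available on the cluster; `readlink -f` is a common alternative. The file and directory names are stand-ins created in a temporary directory so the snippet runs on its own, and the final `echo` only prints the command rather than running BUSCO:

```shell
#!/bin/bash
# Stand-in layout: a fake assembly and lineage directory under a temp dir.
start_dir=$(mktemp -d)
cd "$start_dir" || exit 1
mkdir -p data/hymenoptera_odb9
touch data/genome.fasta

# Resolve relative paths to absolute ones while still in the starting directory.
ASSEMBLY=$(realpath data/genome.fasta)
LINEAGE=$(realpath data/hymenoptera_odb9)
NAME=my_species_BUSCO_v3

# Create the output directory and move into it; stop if either step fails.
mkdir -p run_busco && cd run_busco || exit 1

# The absolute paths still point at the inputs from the new directory.
echo python3 run_BUSCO.py -i "$ASSEMBLY" -o "$NAME" -l "$LINEAGE" -m geno -f
```

With around 100 jobs, the same pattern lets each job script differ only in the directory it creates and enters.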

written 11 months ago by h.mon28k

OK, I see the idea. It is a good one, thank you very much for your help.

written 11 months ago by Darrill0