Issue about "Augustus did not recognize any genes"
7 months ago
boymin2020

Hi all,

I am a newcomer to BUSCO and have been struggling with this issue for about 2-3 weeks. In short, the Augustus run invoked by BUSCO does not recognize any genes. Below is my shell command:

${busco} --config /public/home/lvzhenming/soft/tools/busco-5.0.0/config/config.ini -f -c ${Ncores} --offline -i /home/lvzhenming/public/home/lvzhenming/project/sjl/reference/GWHACFF00000000.genome.fasta -o sjl_batchRef_busco_v --out_path /public/home/lvzhenming/project/sjl/reference/resultsFromBusco4Ref -m genome -l /public/home/lvzhenming/project/sjl/reference/busco_downloads/vertebrata_odb10

The main software versions used in my script are: BLAST v2.3.0+, BUSCO v4.1.4 and AUGUSTUS v3.3.2. The input GWHACFF00000000.genome.fasta is a fish genome downloaded from a published paper. Besides vertebrata_odb10, I also tried the eukaryota_odb10 and actinopterygii_odb10 lineage datasets, but they failed with the same error.

Part of the busco.log file generated by BUSCO is shown below:

INFO:busco.run_BUSCO    ***** Start a BUSCO v4.1.4 analysis, current time: 03/26/2021 17:13:12 *****
DEBUG:busco.ConfigManager   Getting config file
INFO:busco.ConfigManager    Configuring BUSCO with /public/home/lvzhenming/soft/tools/busco-5.0.0/config/config.ini
INFO:busco.BuscoConfig  Mode is genome
INFO:busco.BuscoConfig  Input file is /home/lvzhenming/public/home/lvzhenming/project/sjl/reference/GWHACFF00000000.genome.fasta
DEBUG:busco.BuscoConfig State of BUSCO config before run:
DEBUG:busco.BuscoConfig {'_allow_no_value': False,
INFO:busco.BuscoDownloadManager Using local lineages directory /public/home/lvzhenming/project/sjl/reference/busco_downloads/vertebrata_odb10
DEBUG:busco.BuscoAnalysis   Check all required tools are accessible...
DEBUG:busco.BuscoAnalysis   Checking dataset for HMM profiles
INFO:busco.BuscoAnalysis    Running BUSCO using lineage dataset vertebrata_odb10 (eukaryota, 2021-02-19)
DEBUG:busco.BuscoTools  Tool: makeblastdb
DEBUG:busco.BuscoTools  Version: 2.3.0+
INFO:busco.Toolset  Running 1 job(s) on makeblastdb, starting at 03/26/2021 17:13:15
INFO:busco.BuscoTools   Creating BLAST database with input file
DEBUG:busco.Toolset cmd call: /public/home/lvzhenming/soft/tools/ncbi-blast-2.3.0+/bin/makeblastdb -in /home/lvzhenming/public/home/lvzhenming/project/sjl/reference/GWHACFF00000000.genome.fasta -dbtype nucl -out /public/home/lvzhenming/project/sjl/reference/resultsFromBusco4Ref/sjl_batchRef_busco_v/blast_db/GWHACFF00000000.genome.fasta
INFO:busco.Toolset  [makeblastdb]   1 of 1 task(s) completed
INFO:busco.BuscoTools   Running a BLAST search for BUSCOs against created database
DEBUG:busco.BuscoTools  Tool: tblastn
DEBUG:busco.BuscoTools  Version: 2.3.0+
INFO:busco.Toolset  Running 1 job(s) on tblastn, starting at 03/26/2021 17:13:27
DEBUG:busco.Toolset cmd call: /public/home/lvzhenming/soft/tools/ncbi-blast-2.3.0+/bin/tblastn -evalue 0.001 -num_threads 12 -query /public/home/lvzhenming/project/sjl/reference/busco_downloads/vertebrata_odb10/ancestral -db /public/home/lvzhenming/project/sjl/reference/resultsFromBusco4Ref/sjl_batchRef_busco_v/blast_db/GWHACFF00000000.genome.fasta -out /public/home/lvzhenming/project/sjl/reference/resultsFromBusco4Ref/sjl_batchRef_busco_v/run_vertebrata_odb10/blast_output/tblastn.tsv -outfmt 7
INFO:busco.Toolset  [tblastn]   1 of 1 task(s) completed
INFO:busco.GenomeAnalysis   Running Augustus gene predictor on BLAST search results.
INFO:busco.BuscoTools   Running Augustus prediction using human as species:
DEBUG:busco.BuscoTools  Tool: augustus
DEBUG:busco.BuscoTools  Version: 3.3.2
INFO:busco.Toolset  Running 3971 job(s) on augustus, starting at 03/26/2021 19:43:15

INFO:busco.Toolset  [augustus]  3971 of 3971 task(s) completed
INFO:busco.BuscoTools   Extracting predicted proteins...
ERROR:busco.run_BUSCO Augustus did not recognize any genes matching the dataset vertebrata_odb10 in the input file. If this is unexpected, check your input file and your installation of Augustus

Thanks a lot for any advice!

There is something odd here: you seem to have a version conflict. The log reports BUSCO v4.1.4, but your config path contains "busco-5.0.0". Is there a reason you need to run the older version, e.g. to reproduce published scores? If not, try installing the latest version (5.0.0) via conda and run it with MetaEUK as the gene predictor.

If you use Augustus, note that it does not seem to work well on macOS/BSD or with a large number of parallel processes. Try running with -c 1 (or at most -c 10).

Also, the lineage dataset could be corrupted. Try downloading it again and testing with different lineage datasets, such as eukaryota or metazoa. If those work, the original lineage dataset is probably broken.
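One way to tell whether a lineage download is incomplete is to check that the dataset directory contains the files BUSCO reads, such as the ancestral file passed to tblastn and the HMM profiles it checks for in the log above. The shell sketch below assumes a typical odb10 layout (dataset.cfg, ancestral, scores_cutoff, lengths_cutoff and an hmms/ directory) and assumes dataset.cfg records the expected count as number_of_BUSCOs=<N>; both are assumptions about the dataset format, and check_lineage is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check an unpacked BUSCO odb10 lineage directory.
# ASSUMPTION: the required entries below reflect a typical odb10 layout.
check_lineage() {
  local dir="$1" status=0 entry n_hmm expected
  for entry in dataset.cfg ancestral scores_cutoff lengths_cutoff hmms; do
    if [ ! -e "$dir/$entry" ]; then
      printf 'missing: %s\n' "$dir/$entry"
      status=1
    fi
  done
  # Count the HMM profiles; wc -l may pad with spaces, so strip whitespace.
  n_hmm=$(find "$dir/hmms" -maxdepth 1 -name '*.hmm' 2>/dev/null | wc -l)
  n_hmm=${n_hmm//[[:space:]]/}
  if [ "$n_hmm" -eq 0 ]; then
    printf 'no HMM profiles in %s/hmms\n' "$dir"
    status=1
  fi
  # ASSUMPTION: dataset.cfg stores the expected count as number_of_BUSCOs=<N>.
  expected=$(sed -n 's/^number_of_BUSCOs=//p' "$dir/dataset.cfg" 2>/dev/null) || true
  if [ -n "$expected" ] && [ "$n_hmm" != "$expected" ]; then
    printf 'found %s HMM profiles, dataset.cfg expects %s\n' "$n_hmm" "$expected"
    status=1
  fi
  if [ "$status" -eq 0 ]; then
    printf 'ok: %s (%s HMM profiles)\n' "$dir" "$n_hmm"
  fi
  return "$status"
}

# Example (path taken from the command in the question):
# check_lineage /public/home/lvzhenming/project/sjl/reference/busco_downloads/vertebrata_odb10
```

If the check fails, fetch a fresh copy of the dataset. BUSCO 5 documents a --download option for this (e.g. busco --download vertebrata_odb10), but it is worth confirming with busco --help on your own install.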


Thank you very much for your advice.

Regarding the BUSCO version problem: no matter how I install it (via conda or from source), the log file from busco-5.0.0 always shows "Start a BUSCO v4.1.4 analysis". I tried both v4.1.4 and v5.0.0 and got the same error.

Regarding the OS: I ran the script on a Linux server with one core (-c 1) and got the same error.

Regarding the different lineage datasets: I already tried eukaryota, metazoa and vertebrata and got the same error.

Regarding MetaEUK: I think it is the last option left to debug. I am trying it now...

7 months ago

I think you have a PATH problem. I can confirm that the output of a v5 run starts with the correct version information:

***** Start a BUSCO v5.0.0 analysis, current time: 03/28/2021 14:38:36 *****

That means that if your run shows v4.1.4, it is the v4.1.4 wrapper scripts that are actually running, possibly with a configuration file meant for v5.
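A minimal way to confirm a PATH problem like this is to list every busco executable the shell can see, in lookup order, and to read the version back out of the banner line at the top of busco.log. The sketch below does both; list_on_path and log_busco_version are hypothetical helper names, and the final loop assumes the installed busco accepts --version (BUSCO 5 documents -v/--version).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every executable named "$1" found on PATH,
# in the order the shell searches them. The first line is what a bare command runs.
list_on_path() {
  local name="$1" dir
  local IFS=':'
  for dir in $PATH; do
    dir=${dir:-.}   # an empty PATH entry means the current directory
    if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
    fi
  done
}

# Hypothetical helper: extract the version from the banner line BUSCO writes
# at the start of busco.log, e.g. "***** Start a BUSCO v4.1.4 analysis, ...".
log_busco_version() {
  sed -n 's/.*Start a BUSCO v\([0-9][0-9.]*\) analysis.*/\1/p' "$1" | head -n 1
}

# Show each candidate and the first line it prints for --version
# (ASSUMPTION: the installed busco supports --version).
while IFS= read -r exe; do
  printf '%s -> %s\n' "$exe" "$("$exe" --version 2>&1 | head -n 1)"
done < <(list_on_path busco)
```

If more than one path is printed, the first one is what a bare busco command runs. Running log_busco_version on the busco.log of a finished run should then report the version you expect.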

Also, in BUSCO 5, MetaEUK is the default gene predictor unless --augustus is specified, so if you were really running version 5 with default parameters, you should not see any Augustus errors. I would try to clean up the installation first. In the meantime, I am running BUSCO on the assembly to see what comes out of it.

So, I ran BUSCO v5.0.0 with the eukaryota and actinopterygii datasets, and the genome seems highly complete. I got:

  • actinopterygii_odb10: C:98.5%[S:97.5%,D:1.0%],F:0.3%,M:1.2%,n:3640
  • eukaryota_odb10: C:99.2%[S:96.5%,D:2.7%],F:0.4%,M:0.4%,n:255

Command-line parameters:

  nohup nice -2 busco -m genome  -i GWHACFF00000000.genome.fasta -o GWHACFF00000000_act -f  -l actinopterygii_odb10 --long -c 50 &
  nohup nice -2 busco -m genome  -i GWHACFF00000000.genome.fasta -o GWHACFF00000000_euk -f  -l eukaryota_odb10 -c 50 &
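The one-line scores above use BUSCO's standard notation: C is complete, split into S (single-copy) and D (duplicated); F is fragmented; M is missing; and n is the number of BUSCO groups in the lineage. So C = S + D and C + F + M = 100%, up to rounding (for actinopterygii: 97.5 + 1.0 = 98.5 and 98.5 + 0.3 + 1.2 = 100). The shell sketch below extracts the fields with sed and checks both identities with awk; parse_busco_summary is a hypothetical helper, and the 0.15 tolerance is an assumed allowance for each field being rounded to one decimal.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a BUSCO one-line summary such as
#   C:98.5%[S:97.5%,D:1.0%],F:0.3%,M:1.2%,n:3640
# into its fields and check C = S + D and C + F + M = 100
# (ASSUMPTION: a 0.15 tolerance covers per-field rounding).
parse_busco_summary() {
  local fields
  fields=$(printf '%s\n' "$1" | sed -n \
    's/^C:\([0-9.]*\)%\[S:\([0-9.]*\)%,D:\([0-9.]*\)%],F:\([0-9.]*\)%,M:\([0-9.]*\)%,n:\([0-9]*\).*/\1 \2 \3 \4 \5 \6/p')
  [ -n "$fields" ] || { echo "unrecognised summary: $1" >&2; return 1; }
  # awk does the floating-point arithmetic that plain bash cannot.
  printf '%s\n' "$fields" | awk '{
    c = $1; s = $2; d = $3; f = $4; m = $5; n = $6
    ok = (c - (s + d) < 0.15 && (s + d) - c < 0.15 && c + f + m - 100 < 0.15 && 100 - (c + f + m) < 0.15)
    printf "C=%s S=%s D=%s F=%s M=%s n=%s sums_ok=%s\n", c, s, d, f, m, n, (ok ? "yes" : "no")
  }'
}

parse_busco_summary 'C:98.5%[S:97.5%,D:1.0%],F:0.3%,M:1.2%,n:3640'
# C=98.5 S=97.5 D=1.0 F=0.3 M=1.2 n=3640 sums_ok=yes
```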

I can give you the complete output if you need it.

Yes, you are right. I had a PATH problem, caused by running ./bin/busco directly from the source folder. I have now run the script successfully and the issue is solved.

Thanks a lot; that was a very good lesson.


