Question

Does anyone know how to download genBlastG?

13 months ago

jaredbernard

I'm trying to find and use a popular gene-finding tool called genBlastG 1.39 (She et al. 2011), but the source website seems to be deprecated and returns 404 errors. It appears not to have been updated in 7 years. I have tried downloading the source code on Linux using:

sudo wget http://genome.sfu.ca/genblast/latest/genblast_v139.zip && sudo unzip genblast_v139.zip && sudo rm genblast_v139.zip

but I still get the 404 error. However, it has been cited by recent genomics papers (e.g. Navarro-Escalante et al. 2021, Tan et al. 2021), so clearly someone is getting it to work. Does anyone have any ideas on how to use this program?

Or if anyone knows of newer alternatives, I'm open to that. I see Keilwagen et al. 2018 have created GeMoMa, so I may look into that. The newest OrthoFinder (Emms & Kelly 2019) seems to offer some of the same functionality for categorizing genes into orthogroups, but I would like to compare the OrthoFinder results with genBlastG, since some papers use genBlastG instead of or in addition to OrthoFinder.

Thanks for any feedback!

Tags: gene-finding, genBlastG

Written 13 months ago by jaredbernard; updated 7 months ago by dukecomeback

Thank you both! I'm trying it now. It seems the default installation with bioconda is version 1.38, whereas requesting 1.39 gives me an "UnsatisfiableError" (an incompatibility with libstdcxx-ng). But perhaps I can work around it...

Posted by jaredbernard, 13 months ago

As a follow-up, GenoMax and Mensur Dlakic: I was able to install both genblast packages with bioconda (thank you!), but I can't figure out how to execute them. I keep getting the dreaded "command not found" error for both, despite adding the miniconda3 path and editing .bashrc, etc. This seems to be the problem someone experienced on this thread. Someone suggested activating the conda environment, but genblasta and genblastg aren't environments. (I tried it anyway, of course.) Any ideas on why this may be happening? I keep thinking I've set up conda wrong, but I was able to install and use GeMoMa via bioconda. Thanks again for any advice!

Posted by jaredbernard, 12 months ago

If you had simply done

conda install genblastg

then you need to activate the base environment:

conda activate

At this point you should be able to find the executable.

Ideally, you should have created a new environment:

conda create -n genblast -c bioconda genblastg
conda activate genblast

Posted by GenoMax, 12 months ago

Thanks for the reply! I did activate my conda environment and then installed genblastg. I called the environment something else, but I don't see why that should matter. I'll see if this works...

Posted by jaredbernard, 12 months ago

Unless the named environment (where you installed the program) is active, you will not be able to run the installed program. Running conda activate with no environment name only activates the base environment.

Posted by GenoMax, 12 months ago
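A minimal sketch for checking this, assuming bash: the hypothetical helper report_tools below prints which conda environment is active (conda sets CONDA_DEFAULT_ENV when an environment is activated) and where each requested command resolves on PATH. The command names you pass in are up to you; genblasta and genblastg are simply the names used in this thread and have not been verified against the bioconda packages.

```shell
#!/usr/bin/env bash
# report_tools (hypothetical helper): show the active conda environment and
# where each requested command resolves on PATH; returns 1 if any is missing.
report_tools() {
  local tool missing=0
  echo "active conda env: ${CONDA_DEFAULT_ENV:-<none>}"
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf '%s -> %s\n' "$tool" "$(command -v "$tool")"
    else
      printf '%s -> not found\n' "$tool"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: run conda activate genblast, then report_tools genblasta genblastg. If a tool still shows "not found" while the intended environment is active, the package was probably installed into a different environment.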

Thanks! Yes, I had created a conda environment called opencv, activated opencv, and installed the genblast programs there, but they would not run. I also tried another environment. It did not work until I used the genblast environment name that you and Mensur Dlakic recommended. I don't understand why, but it only worked in that environment.

Posted by jaredbernard, 12 months ago

To create an environment:

conda create -n genblast -c bioconda genblasta genblastg

Activate:

conda activate genblast

Then type genblastA or genblastG, as needed:

genBlastA release v1.0.1

SYNOPSIS:
Given a list of query protein or DNA sequences and a target database that
consists of DNA sequences, this program runs wu-blast tblastn on the list
of sequences provided, then for each query, it groups the resulted HSPs
into sensible groups so that each group of HSPs corresponds to a potential
target gene that is homologous to the query. The output is ranked according
to their homology to the query.

Command line options:
        -P      Search program used to produce blast-format sequence alignments,
                can be either "blast" or "wublast", default is "blast",
                optional
        -q      List of query sequences to blast, must be in fasta format,
                required
        -t      The target database of genomic sequences in fasta format,
                required
        -p      Whether query sequences are protein sequences (T/F)
                [default: T], optional
        -pg     Specify which blast/wublast program to run. If not specified,
                the default behaviour is to run tblastn (for blast/wublast protein
                sequence) / blastn (for blast nucleotide sequence) or tblastx
                (for wublast nucleotide sequence).
        -e      parameter for blast: The e-value, [default: 1e-2],
                optional
        -g      parameter for blast: Perform gapped alignment (T/F)
                [default: T], optional
        -f      parameter for blast: Perform filtering (T/F) [default: F],
                optional
        -a      parameter for genBlast: weight of penalty for skipping HSPs,
                between 0 and 1 [default: 0.5], optional
        -d      parameter for genBlast: maximum allowed distance between HSPs
                within the same gene, a non-negative integer [default: 100000],
                optional
        -r      parameter for genBlast: number of ranks in the output,
                a positive integer, optional
        -c      parameter for genBlast: minimum percentage of query gene
                coverage in the output, between 0 and 1 (e.g. for 50%
                gene coverage, use "0.5"), optional
        -s      parameter for genBlast: minimum score of the HSP group in
                the output, a real number, optional
        -o      output filename, optional. If not specified, the output
                will be the same as the query filename with ".gblast"
                extension.

Example:
genblasta -P blast -pg tblastn -q myquery -t mytarget -p T -e 1e-2 -g T -f F -a 0.5 -d 100000 -r 10 -c 0.5 -s 0 -o myoutput

(Rong She (rshe@cs.sfu.ca)      May 2010)

Posted by Mensur Dlakic, 12 months ago

I'm very grateful to both you and GenoMax for your helpful advice! The programs appear to be functional now. The only issue I now have is that I get an error that says

sh: 1: ./formatdb: not found
XDF file error

So it seems that it requires a BLAST database, even though the documentation doesn't mention this input. If I specify wublast, the error says ./xdformat: not found instead. I wonder whether I could use a .blastxml search database, presumably built for the target species rather than from the query sequences of the reference species.

Posted by jaredbernard, 12 months ago
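One way to read that error: the leading ./ means the shell looked for an executable named formatdb in the current working directory rather than on PATH. formatdb is the database formatter from the legacy NCBI BLAST suite (xdformat plays the same role in WU-BLAST), so the program seems to expect those binaries next to where it runs, which matches the missing blastall/formatdb files mentioned in Answer 3. A minimal sketch, assuming the legacy tools are already installed somewhere on PATH (for example via the bioconda blast-legacy package, which has not been tested with genBlast here): symlink them into the working directory with the hypothetical helper link_tools.

```shell
#!/usr/bin/env bash
# link_tools (hypothetical helper): symlink each named executable found on
# PATH into DEST, so a program that calls ./formatdb or ./blastall can find it.
link_tools() {
  local dest=$1
  shift
  local tool path
  for tool in "$@"; do
    path=$(command -v "$tool") || {
      echo "not on PATH: $tool" >&2
      return 1
    }
    ln -sf "$path" "$dest/$tool"
  done
}
```

Usage: from the directory where you run genblasta, call link_tools . formatdb blastall. Symlinks rather than copies keep the conda-managed binaries as the single source.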

Answer 1 · 2023-04-06

13 months ago

GenoMax

It appears to be available on Bioconda: https://bioconda.github.io/recipes/genblastg/README.html, so install it using conda.

Posted by GenoMax, 13 months ago

Answer 2 · 2023-04-06

13 months ago

Mensur Dlakic

You may want to try one of the following two methods:

Posted by Mensur Dlakic, 13 months ago

Answer 3 · 2023-06-13

In case anyone ends up needing genBlastG, I found a solution. Seven years ago, Michael Paulini posted genblastg_patch for WormBase on GitHub. Whenever I tried to use this version or the versions available on conda, I got errors saying that it could not find the blastall or formatdb files, or that genblastG was not a valid command. Last year, Guisen Chen posted a Python version called genblastG_extension on GitHub. It also failed, because it was missing the alignscore.txt file, although it does include the other missing files. So I copied the files from genblastg_patch (which include alignscore.txt) into the genblastG_extension directory and executed it without Python, using the same syntax shown by Mensur Dlakic above. And it works!
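The file-merging step described above can be sketched as follows, assuming local clones named genblastg_patch and genblastG_extension (directory names taken from the repository names; adjust to your own paths). The helper merge_missing is hypothetical, and it only copies files that the extension directory lacks, on the assumption that the extension's own copies should take precedence; the post does not say whether anything was overwritten.

```shell
#!/usr/bin/env bash
# merge_missing (hypothetical helper): copy regular files from SRC into DEST,
# skipping any file name DEST already has; -p keeps permissions so copied
# executables stay executable.
merge_missing() {
  local src=$1 dest=$2 f name
  for f in "$src"/*; do
    [ -f "$f" ] || continue
    name=$(basename "$f")
    if [ ! -e "$dest/$name" ]; then
      cp -p "$f" "$dest/$name"
      echo "copied: $name"
    fi
  done
}
```

Usage: merge_missing genblastg_patch genblastG_extension, then confirm that genblastG_extension/alignscore.txt exists before running the program with the options shown in Mensur Dlakic's reply.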

Obviously this wouldn't be necessary if the original package had been maintained; this is likely a consequence of transient workers such as grad students or postdocs creating something and never maintaining it afterward. The same may be true of a newer package called TGFAM-Finder, which is also meant to be a homology-based gene finder, since every time I tried to install it, I got the error "resource temporarily unavailable." It would be ideal to use GeMoMa for homology-based gene finding, but there too I got errors saying "there are gene annotations on chromosomes/contigs with missing reference sequences ..." and "Did not finish as intended." In any case, GeMoMa is mainly designed for whole-genome annotation and has no tutorials on gene family analysis. The growing need to compare gene families between already-annotated and newly annotated genomes will probably lead to new packages in the coming years, so that hacking a deprecated one won't be necessary.