Question

How to run SPANDX3.2

3.6 years ago

Morello Salesman • 0

Is there anyone who can help me with running Spandx 3.2? I downloaded the file from https://sourceforge.net/projects/spandx/, and every time I try to run the program it shows this:

Organism = haploid
Output directory and directory containing sequence files = /home/user/spandx/SPANDx_v3.2 /SPANDx
SNP matrix will be created? = no
Genomes will be annotated? = no
Strain(s) to be processed = all
Sequence technology used = Illumina
Pairing of reads = PE
Variant genome specified for SnpEff = no
Window size for BedTools = 1000
Indels will be merged and corrected = no
-------------------------------------
SPANDx.sh: line 237: cd: too many arguments

Couldn't locate reference file. Please make sure that reference file is in the analysis directory, you have specified the reference name correctly, and that the .fasta extension is not included.

I know I'm doing something wrong. Please help me out.

PS: I don't have enough RAM to run SPANDX4.0

Written 3.6 years ago by Morello Salesman • updated 3.6 years ago by dsarovich

every time I try to run the program

How do you run the program?

3.6 years ago by Pierre Lindenbaum

I followed the SPANDX_Manual_V3.2.pdf and my command line looks like this.

perl SPANDx.sh -r k96243 -o haploid -m yes -i yes -a yes -p PE B83_R1.fastq.gz B83_R2.fastq.gz

Written 3.6 years ago by Morello Salesman • updated 3.6 years ago by zx8754

Why perl?

3.6 years ago by Pierre Lindenbaum

If I directly run SPANDx.sh, it says "SPANDx.sh: command not found"

3.6 years ago by Morello Salesman

A script with the suffix '.sh' is most probably a shell script, not a Perl script.

How about

./SPANDx.sh

?

3.6 years ago by Pierre Lindenbaum
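
The "command not found" message and the `./` suggestion above come down to how the shell looks up commands: a bare name is searched for only in the directories listed in `PATH`, which normally does not include the current directory, while `./name` points at the file directly. A minimal sketch of that behaviour, using a hypothetical stand-in script (`spandx_demo_script.sh`) rather than SPANDx itself:

```shell
#!/usr/bin/env bash
# spandx_demo_script.sh is a hypothetical stand-in for SPANDx.sh.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '#!/bin/sh\necho "script ran"\n' > spandx_demo_script.sh
chmod +x spandx_demo_script.sh   # a script must be executable to be run directly

# Bare name: the shell searches only the directories in PATH, not the current one.
if spandx_demo_script.sh >/dev/null 2>&1; then
    bare="ran"
else
    bare="not found"
fi

# Explicit relative path: runs the file in the current directory.
explicit=$(./spandx_demo_script.sh)

echo "spandx_demo_script.sh   -> $bare"
echo "./spandx_demo_script.sh -> $explicit"
```

Calling the script by its full path works for the same reason as `./`: the shell is told exactly where the file is.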

./SPANDx.sh -r k96243 -m yes -a yes -o haploid BP23.fastq.gz BP3504.fastq.gz BP6260.fastq.gz BP6887.fastq.gz BPZUSM.fastq.gz

Reference = k96243
SNP matrix will be generated? yes
VCF files will be annotated with SnpEff? yes
Organism = haploid

The following parameters will be used

Organism = haploid
Output directory and directory containing sequence files = /home/user/spandx/SPANDx_v3.2 /SPANDx
SNP matrix will be created? = yes
Genomes will be annotated? = yes
Strain(s) to be processed = all
Sequence technology used = Illumina
Pairing of reads = PE
Variant genome specified for SnpEff = no
Window size for BedTools = 1000
Indels will be merged and corrected = no

./SPANDx.sh: line 237: cd: too many arguments

Couldn't locate reference file. Please make sure that reference file is in the analysis directory, you have specified the reference name correctly, and that the .fasta extension is not included

It's still showing the same thing as before.

3.6 years ago by Morello Salesman
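
A note on the `cd: too many arguments` line in the output above: bash prints this when `cd` receives more than one argument, which typically happens when an unquoted variable holds a path containing a space. The directory reported in the output, `/home/user/spandx/SPANDx_v3.2 /SPANDx`, does contain a space, so that is one plausible cause. This is an assumption, since line 237 of SPANDx.sh is not shown in this thread. A minimal sketch of the effect, using a temporary directory with the same kind of name:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for an analysis directory whose path contains a space.
base=$(mktemp -d)
dir="$base/SPANDx_v3.2 /SPANDx"
mkdir -p "$dir"

# Unquoted: the shell splits the value at the space, so cd does not receive
# the real path as a single argument and fails.
if (cd $dir) 2>/dev/null; then unquoted="ok"; else unquoted="failed"; fi

# Quoted: the whole path is passed as one argument and cd succeeds.
if (cd "$dir") 2>/dev/null; then quoted="ok"; else quoted="failed"; fi

echo "unquoted cd: $unquoted"
echo "quoted cd:   $quoted"

rm -rf "$base"   # remove the temporary directory
```

If this is the cause, running the analysis from a directory whose full path has no spaces would avoid it without editing the script.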

Spandx automatically detects fastq files. This should be your input:

./SPANDx.sh -r k96243 -m yes -a yes

-o is not needed (it's for specifying the organism name).

Have a look at the GitHub page for usage, because you seem slightly confused: https://github.com/dsarov/SPANDx

If you specify the -a flag, you need to provide the name of the SnpEff database with -v.

The authors are super helpful; post an issue on GitHub and you'll get a reply fairly quickly.

3.6 years ago by Mark

score 1 · Answer 1 · 2020-10-08

Hi Morello,

Have you had any luck getting it to run? We generally make a new analysis directory for each run so you'll need to type the full path to the location of the SPANDx.sh script (e.g. /home/user/bin/SPANDx/SPANDx.sh). The exact location will differ depending on where you've installed it.

Depending on your system, you might have to change the "scheduler" variable in the config file. Do you know what scheduler is on your system? The older version of SPANDx should work with the most common ones. You also might be able to change the memory requirements for SPANDx 4 in the config file (nextflow.config). I've included fairly generous allocations by default but your data may not need it.

Lastly, you'll need to rename your sequence files to strain_1_sequence.fastq.gz and strain_2_sequence.fastq.gz. SPANDx will scan the current directory for files using that format and include them by default in the analysis.

Feel free to e-mail me though if you are still having problems and I can help out.

Cheers,

Derek
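
The workflow described in the answer above (a fresh analysis directory, read files renamed to the `strain_1_sequence.fastq.gz` pattern and its `_2_` mate, and the script called by its full path) can be sketched roughly as follows. The `_R1`/`_R2` input names are taken from the command posted earlier in the thread; the temporary directory and the empty stand-in files are assumptions made so the sketch is harmless to run, and the final command is only printed, not executed.

```shell
#!/usr/bin/env bash
# Sketch only: paths and file names are examples, not SPANDx defaults.
spandx="/home/user/bin/SPANDx/SPANDx.sh"   # example location from the answer; adjust to your install

# Fresh analysis directory (a temporary one here, purely for illustration).
analysis_dir=$(mktemp -d)
cd "$analysis_dir" || exit 1

# Empty stand-ins for real paired-end reads named like those earlier in the thread.
touch B83_R1.fastq.gz B83_R2.fastq.gz

# Rename <strain>_R1/_R2.fastq.gz to <strain>_1/_2_sequence.fastq.gz.
for f in *_R[12].fastq.gz; do
    strain=${f%_R[12].fastq.gz}   # strip the suffix, leaving the strain name
    mate=${f#"${strain}_R"}       # leaves "1.fastq.gz" or "2.fastq.gz"
    mate=${mate%.fastq.gz}        # leaves "1" or "2"
    mv -- "$f" "${strain}_${mate}_sequence.fastq.gz"
done

ls

# The reference file must also be in this directory (see the error message in
# the question). The command is printed here rather than run.
echo "would run: $spandx -r k96243 -m yes"
```

Keeping the analysis directory on a path without spaces also sidesteps any quoting problems inside the script.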