Question

Tutorial: Troubleshooting issues with the SURPI pipeline for pathogen identification from complex metagenomic NGS data

11.0 years ago

GouthamAtla 12k

SURPI is a pipeline for identifying pathogens in clinical metagenomics samples. It has been tested only on Ubuntu and assumes many things about your installation. I recently installed it on CentOS. These are a few key points you need to take care of before running the pipeline.

The GitHub page of SURPI is https://github.com/chiulab/surpi and the pipeline is published in Genome Research.

The create_snap_to_nt.sh program sets -Ofactor to 1000 on line 29, which may not work on your machine. You need to figure out the correct value and change it accordingly. Read the SNAP aligner documentation for details.
The abyss installation requires mmap. Make sure you have installed it before compiling abyss. http://hackage.haskell.org/package/mmap-0.5.9/mmap-0.5.9.tar.gz
Make sure formatdb is in your path. It can be downloaded from ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release/LATEST/
The taxonomy_lookup.pl program calls sort --parallel=$cores at line 84. If the sort utility on your machine does not support the --parallel option, you need to remove --parallel=$cores.
The abyss_minimus.sh program tries to use mpirun to run in parallel. If mpirun is not configured properly, you need to remove the option 'np=$cores' in line 86, so that it does not run in parallel.
The ribo_snap_bac_euk.sh program is hardcoded to pass 10,75 as arguments to crop_reads.csh in line 43, which you may need to change.
The coveragePlot.py program uses mlab.load() at line 47, which is deprecated in recent versions of matplotlib. Hence, you may need to change it to np.loadtxt().
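
Some of the points above can be checked before the first run. The following is only a minimal sketch and not part of SURPI: the function names are made up for illustration, and it only tests whether formatdb and mpirun are on your PATH and whether your sort accepts --parallel. It does not tell you whether mpirun is configured correctly.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; none of these functions ship with SURPI.

# Return 0 if the named program is found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Return 0 if the local sort accepts the --parallel option.
# The exit status of the pipeline is the exit status of sort.
sort_supports_parallel() {
    printf 'b\na\n' | sort --parallel=2 >/dev/null 2>&1
}

# Print one status line per tool, then one line for sort --parallel.
preflight() {
    local tool
    for tool in "$@"; do
        if have_tool "$tool"; then
            echo "ok:      $tool"
        else
            echo "missing: $tool"
        fi
    done
    if sort_supports_parallel; then
        echo "ok:      sort --parallel"
    else
        echo "no:      sort --parallel (remove the option in taxonomy_lookup.pl)"
    fi
}

preflight formatdb mpirun
```

Add further program names to the preflight call if your installation depends on them.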

I will update this post as and when I find more issues with the pipeline.

I forgot to mention that the configuration file that gets created keeps the wrong path for the reference sequences. For example, in your case the path to the db is dbname=/reference/taxonomy/gi_taxid_nucl.db, but I am sure it is not /reference/; it should be either reference/ or <full path>/reference/...

Change all the paths carefully in the configuration file.
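
To spot wrong paths quickly, something like the sketch below may help. It is an assumption on my side that the configuration file holds one setting per line in the form name=/some/path (optionally with double quotes around the value); check_config_paths is a made-up helper, not a SURPI script. It prints every absolute path that does not exist on disk and ignores relative paths and comment lines.

```shell
#!/usr/bin/env bash
# Sketch: list absolute paths in a key=value config file that do not exist.
# Assumes lines such as  name=/some/path  or  name="/some/path".
# Trailing comments on a setting line are not handled.

check_config_paths() {
    local config="$1" line value missing=0
    while IFS= read -r line || [ -n "$line" ]; do
        # Skip empty lines, comments, and lines without an assignment.
        case "$line" in
            \#*|'') continue ;;
            *=*) ;;
            *) continue ;;
        esac
        value="${line#*=}"      # text after the first '='
        value="${value%\"}"     # drop an optional closing double quote
        value="${value#\"}"     # drop an optional opening double quote
        case "$value" in
            /*)
                if [ ! -e "$value" ]; then
                    echo "missing: $value"
                    missing=1
                fi
                ;;
        esac
    done < "$config"
    # Exit status 1 if at least one path was missing, 0 otherwise.
    return "$missing"
}

# Usage (the file name is only an example):
#   check_config_paths my_run.config
```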

metagenomics clinical-genomics • 7.4k views

Updated 3.8 years ago by Ram 45k • written 11.0 years ago by GouthamAtla 12k

Ram · Answer 1 · 2014-12-19

Wow, that is a long list. Apart from that, I installed SURPI and tried to start the pipeline. But as soon as I start it, a file is created that contains the following error message:

nohup: ignoring input
DBI connect('dbname=/reference/taxonomy/gi_taxid_nucl.db','',...) failed: unable to open database file at /home/jsusat/surpi-1.0.18/taxonomy_lookup_embedded.pl line 106.
/home/jsusat/surpi-1.0.18/SURPI.sh: line 507: [: =: unary operator expected

Do I have to set up a reference before I start? Or does the taxonomy lookup script fetch a db itself? I just followed the steps from the website:

http://chiulab.ucsf.edu/surpi/documentation/How_to_run_SURPI.pdf

With kind regards,
Julian