Question

How to use cd-hit-para.pl in SGE?

21 months ago

FelipeMSD • 0

Dear all,

I have a FASTA file with more than 200 million protein sequences that I would like to cluster into a non-redundant catalogue (100% identity) using cd-hit. Because the file is so large, I thought cd-hit-para.pl might be a good way to speed this up. At my institution we use SGE, and I tried to submit the job to a queue with the qsub script below, but without success (error message: no host at /bin/cd-hit-para.pl line 97). I followed the user guide (http://www.bioinformatics.org/cd-hit/cd-hit-user-guide.pdf), but I don't think I understood how to use it properly. Does anyone have an example of how to run cd-hit-para.pl on SGE, or can you tell me whether there is a better way to use cd-hit on a file this large?

Script:

#!/bin/bash
#$ -N cdhit
#$ -o /output/logs/$JOB_NAME_$JOB_ID.out
#$ -e /output/error/$JOB_NAME_$JOB_ID.err
#$ -l virtual_free=20G,h_vmem=20G,h_rt=6:00:00
#$ -q long-sl7
#$ -pe smp 8

cd-hit-para.pl -i file.faa -o file_100.faa -c 1.0 -M 20000 -T $NSLOTS --T "SGE"-Q 20
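
One thing that may be worth checking in the command above: there is no space between `"SGE"` and `-Q`, so the shell should join them into a single word before cd-hit-para.pl ever sees them. A minimal sketch to inspect how the arguments are split (`show_args` is a hypothetical helper written for this illustration, not part of cd-hit):

```shell
#!/bin/bash
# Hypothetical helper: print each argument it receives on its own line,
# wrapped in brackets so the word boundaries are visible.
show_args() {
    printf '[%s]\n' "$@"
}

# Same tail as the cd-hit-para.pl call in the job script.
show_args --T "SGE"-Q 20
# prints:
# [--T]
# [SGE-Q]
# [20]
```

If the queue options are spelled `--T` and `--Q` (double dash), as I understand the user guide to list them, the intended tail was presumably `--T SGE --Q 20`. That is an assumption on my part, and I have not confirmed that it resolves the "no host" error.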

Command line:

$ qsub cdhit

SGE cd-hit-para.pl • 445 views
