Cutadapt error: too many parameters.

Hi biostars community!

I am having issues looping cutadapt over gzipped samples. This is the script I am using:

#!/bin/bash
#SBATCH --account GRINFISH
#SBATCH -c 8
#SBATCH --mem 96g
#SBATCH --output logfile.out
#SBATCH --error logfile.err

# This script performs trimming for PE sequences with cutadapt and then runs fastqc in the result.

# Set the number of parallel processes/threads to match the allocated CPUs
PARALLEL_PROCESSES=$SLURM_CPUS_PER_TASK

# Setting parameters

LISTFOR=lists/forward-list.txt
LISTREV=lists/reverse-list.txt
TRIMDIR=trim/1stBatch/
ADAPTERS=refs/NexteraPE_NT.fa

# Performing cutadapt

parallel --jobs $PARALLEL_PROCESSES "cutadapt -a file:"${ADAPTERS}" -A file:"${ADAPTERS}" -o "${TRIMDIR}"{.}_trimmed_1.fq.gz -p "${TRIMDIR}"{.}_trimmed_2.fq.gz {1} {2}" ::: ${LISTFOR} ::: ${LISTREV}

I suspect the issue may be in the way the double or single quotes are used in the cutadapt call, though I am not sure. The paths are correct and both cutadapt and parallel are installed.
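For what it's worth, echoing the template (with the variable values copied from the script above) shows exactly what reaches parallel after the outer shell has processed the nested double quotes. The variables do expand correctly; but note that ::: hands parallel the list file paths themselves as the job arguments, not the sample names inside them:

```shell
# Reproduce, via echo, the command line the outer shell hands to parallel.
# Values copied from the script above.
ADAPTERS=refs/NexteraPE_NT.fa
TRIMDIR=trim/1stBatch/
LISTFOR=lists/forward-list.txt
LISTREV=lists/reverse-list.txt

# Same quoting as the original call, with echo in place of parallel:
expanded=$(echo "cutadapt -a file:"${ADAPTERS}" -A file:"${ADAPTERS}" -o "${TRIMDIR}"{.}_trimmed_1.fq.gz -p "${TRIMDIR}"{.}_trimmed_2.fq.gz {1} {2}" ::: ${LISTFOR} ::: ${LISTREV})
echo "$expanded"
```

With ::: the single value "lists/forward-list.txt" becomes {1} for the one and only job, so cutadapt never sees the sample files listed inside it.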

Thank you!

cutadapt parallel bash

Somewhat unrelated, but if you are using SLURM on a cluster, why add the complication of parallel? Simply submit multiple jobs with your samples directly to SLURM.


Hi,

Thank you for taking the time to read it and answer!

I am not entirely sure I understand. I am using GNU parallel as a substitute for a for loop to iterate over the list of samples. Perhaps you mean something like this?

#!/bin/bash
#SBATCH --account MyAcount
#SBATCH -c 8
#SBATCH --mem 96g
#SBATCH --output logfile.out
#SBATCH --error logfile.err

cutadapt -a file:refs/NexteraPE_NT.fa -A file:refs/NexteraPE_NT.fa -o trim/1stBatch/{.}_trimmed_1.fq.gz -p trim/1stBatch/{.}_trimmed_2.fq.gz  folder/folder/*/*_1.fq.gz folder/folder/*/*_2.fq.gz

using GNU parallel as a substitute for a for loop

Not being a parallel user, I missed that application. But isn't this inefficient? All the processing happens within a single SLURM job constrained by 8 cores. If you were to submit a separate SLURM job for each sample, those would likely run faster (within the resources allocated to your account). But this may simply be a matter of how one is used to doing things.

As for your original question, using single quotes on the outside may do the trick. It looks like the shell's expansion inside the double quotes is confusing cutadapt. Two more things to watch: with single quotes the variables are expanded by the shell that parallel spawns, so export them first; and ::: passes the list file names themselves as arguments, so use :::: to read the sample names from the files, plus --link to pair the two lists line by line instead of taking their cross product:

export ADAPTERS TRIMDIR
parallel --link --jobs $PARALLEL_PROCESSES 'cutadapt -a file:"${ADAPTERS}" -A file:"${ADAPTERS}" -o "${TRIMDIR}"{1/.}_trimmed_1.fq.gz -p "${TRIMDIR}"{1/.}_trimmed_2.fq.gz {1} {2}' :::: ${LISTFOR} :::: ${LISTREV}

({1/.} is the basename of the forward read without its last extension; note that for .fq.gz it only strips the .gz part.)
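If parallel's quoting keeps getting in the way, the two lists can also be paired in plain bash. This is only a sketch: the sample names and paths below are made up, and echo turns the cutadapt call into a dry run:

```shell
# Create small stand-in lists so the sketch is self-contained;
# the file names here are hypothetical.
mkdir -p lists
printf 'reads/sampleA_1.fq.gz\nreads/sampleB_1.fq.gz\n' > lists/forward-list.txt
printf 'reads/sampleA_2.fq.gz\nreads/sampleB_2.fq.gz\n' > lists/reverse-list.txt

TRIMDIR=trim/1stBatch/

# paste joins the lists line by line (tab-separated), so each loop
# iteration reads one forward/reverse pair.
cmds=$(paste lists/forward-list.txt lists/reverse-list.txt |
  while IFS=$'\t' read -r fwd rev; do
    base=$(basename "${fwd%_1.fq.gz}")   # reads/sampleA_1.fq.gz -> sampleA
    echo cutadapt -o "${TRIMDIR}${base}_trimmed_1.fq.gz" \
         -p "${TRIMDIR}${base}_trimmed_2.fq.gz" "$fwd" "$rev"
  done)
printf '%s\n' "$cmds"
```

Dropping the echo (and adding the -a/-A adapter options) would run the real commands, one pair at a time.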

Thank you!

I don't think I have tried that alternative specifically.

Concerning whether it is inefficient or not, to be honest I do not really know. My ultimate objective is to loop over a list containing the file names. The other option would be to use wildcards, but I have tried that and it does not work.

Let's see if it goes this time.

Again, thank you for your time


Just for illustrative purposes, you could submit multiple SLURM jobs as follows. Remove the word echo before sbatch if all the command lines look correct. (Ref: cutadapt loop and paired-end reads)

for i in *_R1.fastq.gz
do
  SAMPLE=$(echo ${i} | sed "s/_R1\.fastq\.gz//")
  echo ${SAMPLE}_R1.fastq.gz ${SAMPLE}_R2.fastq.gz
  echo sbatch -p Partition --account GRINFISH --mem=NNg -c 8 -o log.out -e log.err --wrap="cutadapt -a file:refs/NexteraPE_NT.fa -A file:refs/NexteraPE_NT.fa -o ${SAMPLE}_trimmed_1.fq.gz -p ${SAMPLE}_trimmed_2.fq.gz ${SAMPLE}_R1.fastq.gz ${SAMPLE}_R2.fastq.gz"
done
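Minor aside: the sed call above can also be written as bash parameter expansion, which avoids spawning extra processes inside the loop (sketch with a hypothetical file name):

```shell
i=sampleX_R1.fastq.gz        # hypothetical file name
SAMPLE=${i%_R1.fastq.gz}     # strip the trailing _R1.fastq.gz, like the sed call
echo "$SAMPLE"               # prints: sampleX
```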
