Question

Help with writing a loop for metaspades assembler (beginner)


2.2 years ago

116439372 • 0

I have a large number of samples that I need to assemble using metaspades. They are named as follows:

CUH001T1_unclassified_unpaired_R1 and CUH001T1_unclassified_unpaired_R2
CUH002T2_unclassified_unpaired_R1 and CUH002T2_unclassified_unpaired_R2
..
..

The command I'm using for a single sample is:

spades.py --meta --pe1-1 CUH002T2_unclassified_paired_R1.fastq --pe1-2 CUH002T2_unclassified_paired_R2.fastq -t 20 -m 400 -o ../metaspades/CUH002T2

Does anyone know how to write a loop for this? I'm sure it's relatively easy, but I'm very new to bioinformatics and can't figure it out.

metaspades • 1.8k views

written 2.2 years ago by 116439372 • 0 • updated 14 months ago by Ram 43k


$ find . -type f -name "*_unclassified_paired_R1*" -exec basename {} \; | while read line; do echo spades.py --meta --pe1-1  /home/user/$line --pe1-2 /home/user/${line/_R1/_R2} -t 20 -m 400 -o ../metaspades/${line%%_*} ;done

Replace /home/user with the appropriate directory path. Remove echo once you are happy with the dry run.
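The one-liner relies on two bash parameter expansions to turn each R1 file name into its R2 mate and into a sample name. A minimal sketch of just that step, assuming a file name that follows the naming scheme in the question (the name itself is illustrative):

```shell
#!/usr/bin/env bash
# Illustrative file name following the CUH..._unclassified_paired_R1 scheme.
line="CUH002T2_unclassified_paired_R1.fastq"

r2="${line/_R1/_R2}"     # replace the first "_R1" with "_R2"
sample="${line%%_*}"     # remove everything from the first "_" onward

echo "R2 file: $r2"      # R2 file: CUH002T2_unclassified_paired_R2.fastq
echo "sample:  $sample"  # sample:  CUH002T2
```

Because ${line%%_*} keeps only the text before the first underscore, this naming works as long as the sample IDs themselves contain no underscores.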

Reply • 2.2 years ago by cpad0112 21k


This worked for me, thank you!

Reply • 2.2 years ago by 116439372 • 0


MYPATH="/path/to/fastq/data"; for FLE in $(ls ${MYPATH}/*.fastq | sed 's/_R[12].*$//' | sort -u); do echo spades.py --meta --pe1-1 ${FLE}_R1.fastq --pe1-2 ${FLE}_R2.fastq -t 20 -m 400 -o ../metaspades/$(basename ${FLE}); done

Try this. You'll need to point MYPATH= to wherever your data are stored (e.g., /home/myname/metagenomics). Then run the entire command from the terminal. It should print spades.py <blah> to the console, one line per sample. If everything looks fine, remove the echo and run it again.

I'm not sure looping through your datasets is the most efficient approach, though. Are you working on a local workstation of some sort or on a cluster?
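A variant of the same dry-run idea, sketched under a few assumptions: the reads end in _R1.fastq and _R2.fastq, the loop globs only the R1 files instead of using ls | sed | sort, and a RUN variable replaces deleting echo by hand. The names spades_loop and RUN are hypothetical, not part of SPAdes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (or run) one spades.py command per R1/R2 pair in a directory.
# RUN unset  -> ${RUN-echo} becomes "echo", so commands are only printed (dry run).
# RUN= (set but empty) -> the empty word disappears and spades.py is actually executed.
spades_loop() {
    local mypath="$1" r1 prefix
    for r1 in "$mypath"/*_R1.fastq; do
        [ -e "$r1" ] || continue          # glob matched nothing: skip the literal pattern
        prefix="${r1%_R1.fastq}"          # e.g. /data/CUH002T2_unclassified_paired
        ${RUN-echo} spades.py --meta \
            --pe1-1 "${prefix}_R1.fastq" --pe1-2 "${prefix}_R2.fastq" \
            -t 20 -m 400 -o "../metaspades/$(basename "$prefix")"
    done
}
```

Usage: spades_loop /path/to/fastq/data prints the commands; once they look right, RUN= spades_loop /path/to/fastq/data runs them. Because bash applies an assignment placed before a function call only for the duration of that call, RUN stays unset afterwards and the next call is a dry run again.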

Reply • 2.2 years ago by Dunois ★ 2.5k

score 0 · Answer 1 · 2022-02-13

Use basename to get the file name without its path, then cut on the underscore to keep only the sample name.

dir=/path/to/fastq   # replace with the directory containing your FASTQ files
for sample in "$dir"/*_unclassified_paired_R1.fastq ; do
    sample_name=$(basename "$sample" | cut -d'_' -f1)
    spades.py --meta --pe1-1 "$dir/${sample_name}_unclassified_paired_R1.fastq" \
        --pe1-2 "$dir/${sample_name}_unclassified_paired_R2.fastq" \
        -t 20 -m 400 -o "../metaspades/${sample_name}"
done
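The sample-name step on its own, as a small sketch: basename strips the directory, and cut -d'_' -f1 keeps the first underscore-separated field. The path below is a made-up example following the naming scheme in the question.

```shell
#!/usr/bin/env bash
# Made-up example path; only the file-name part matters.
path="/data/reads/CUH001T1_unclassified_paired_R1.fastq"

file=$(basename "$path")                      # CUH001T1_unclassified_paired_R1.fastq
sample_name=$(echo "$file" | cut -d'_' -f1)   # first field before "_": CUH001T1

echo "$sample_name"                           # prints: CUH001T1
```

This assumes the sample IDs contain no underscores; an ID such as CUH_001 would be cut down to CUH.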

score 0 · Answer 2 · 2022-02-14

I use something like this for a SPAdes loop. Save it as a script and start it with: bash yourScript.sh

Note that I start from trimmed FASTQs, as you should too.

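#!/bin/bash

# Start SPAdes assemblies, one per *R1.trm.fastq file in the current directory
for fastq in *R1.trm.fastq; do
    echo "Input file 1: $fastq"
    # derive the R2 file name from the R1 file name
    fastq2="${fastq/R1.trm.fastq/R2.trm.fastq}"
    echo "Input file 2: $fastq2"

    # Run SPAdes in metagenomic mode
    spades.py -o "$fastq.spades" -t 18 -m 250 --meta -1 "$fastq" -2 "$fastq2"
done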
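Before launching a long batch of assemblies, it can help to confirm that every R1 file in the script above really has an R2 mate. A minimal sketch, assuming the *R1.trm.fastq / *R2.trm.fastq naming used there; check_pairs is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: for every *R1.trm.fastq in the current directory,
# swap the R1.trm.fastq suffix for R2.trm.fastq and confirm that the mate exists.
# Prints "R1<TAB>R2" for complete pairs, reports missing mates on stderr,
# and returns non-zero if any mate is missing.
check_pairs() {
    local fastq fastq2 missing=0
    for fastq in *R1.trm.fastq; do
        [ -e "$fastq" ] || continue                # glob matched nothing
        fastq2="${fastq/R1.trm.fastq/R2.trm.fastq}"
        if [ -e "$fastq2" ]; then
            printf '%s\t%s\n' "$fastq" "$fastq2"
        else
            echo "missing mate for $fastq: $fastq2" >&2
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]                           # exit status: 0 only if all pairs are complete
}
```

Run check_pairs in the FASTQ directory first; if it returns a non-zero status, fix the missing files before starting the assembly loop.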