Question: looping over a list of trimmed fasta files to run the SPAdes assembler
14 months ago, m.al_amiri wrote:

Hi everyone, I have a list of trimmed fasta files and need to run the SPAdes assembler on them using a list. My fasta file names look like s1_2, s1_3, s1_4, ... Each sequence has been trimmed with Trimmomatic in a directory with the same name. My question is: how can I define a list and then run SPAdes for all of them in a loop?

modified 14 months ago • written 14 months ago by m.al_amiri

Not sure what the input for SPAdes needs to be, but with find . -type f -name "s1_*" you will be able to get the list of all your files

modified 14 months ago • written 14 months ago by lieven.sterck

Thanks. Can you write it for my 3 samples s1_2, s1_3, s1_4?

written 14 months ago by m.al_amiri

I mean listing the files and then looping SPAdes over them.

written 14 months ago by m.al_amiri

OK, you lost me :/

You only have three input files? I assume you mean fastq rather than fasta, correct? Do you want to run SPAdes on all samples together or once per sample/file?

written 14 months ago by lieven.sterck

Sorry, I have all files as paired and unpaired fastq.gz, and I have 95 samples, so I need to make a list and run SPAdes for all of them.

written 14 months ago by m.al_amiri

Right, and the find command I provided in the first comment is not giving what you want or expect? Simply run it in your top folder and it will report all files matching the pattern in the -name option (a shell glob, not a regex).

modified 14 months ago • written 14 months ago by lieven.sterck
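
To make the find suggestion concrete, here is a minimal sketch of listing files by base name. The helper name list_matching_files is hypothetical and only wraps the find call quoted in the comments; the pattern is quoted so that the shell hands it to find unexpanded.

```shell
# Hypothetical helper: list every regular file below a top folder whose
# base name matches a shell glob (find's -name takes a glob, not a regex).
list_matching_files() {
    top=$1
    pattern=$2
    # Quote the pattern so the shell passes it to find unexpanded.
    find "$top" -type f -name "$pattern" | sort
}

# Example call from the top folder:
#   list_matching_files . 's1_*'
```

Because -name is tested against the base name only, the same pattern picks up matching files at any depth below the top folder.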

Thank you for still helping me. This did not work:

for file in $(find . -type f -name "s1_*");
 do
 spades.py -1 *_R1_paired.fastq.gz -2 *_R2_paired.fastq.gz -s *_R1_unpaired.fastq.gz -s *_R2_unpaired.fastq.gz -m 30 -o assembly -careful 
done
modified 6 months ago by RamRS • written 14 months ago by m.al_amiri

Question about the loop - will the output folder ( -o assembly) be overwritten each time it gets a new sample?

written 6 months ago by ARich

Probably. You'll need to check the SPAdes manual or its source code to be sure.

written 6 months ago by RamRS
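
Whether SPAdes reuses or overwrites an existing output folder is not settled in this thread. One way to sidestep the question is to give every sample its own -o target. A minimal sketch, where the assemblies/ prefix and the helper name assembly_dir_for are assumptions made for illustration:

```shell
# Hypothetical helper: map a sample directory to its own output folder,
# e.g. genome/s1_2 -> assemblies/s1_2 (the "assemblies" name is an assumption).
assembly_dir_for() {
    sample_dir=${1%/}                    # drop a trailing slash, if any
    printf 'assemblies/%s\n' "$(basename "$sample_dir")"
}

# Usage inside a per-sample loop, with the other options left as they are:
#   spades.py ... -o "$(assembly_dir_for "$dir")"
```

With a distinct folder per sample, no run can touch the results of another, whatever SPAdes does with a folder that already exists.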

There are countless posts with bash loop questions, and countless solutions with either bash for loops or GNU parallel. Please read some of them and try to implement a solution, then ask a more detailed question if you get stuck.

How to run Spades For Nextseq data

Bash loop for files in several directories

For loop script

How to run a set or batch of genome assemblies at once in one go?

written 14 months ago by h.mon

I have a directory named genome, and in this directory I have 90 directories, one per sample. I ran Trimmomatic on all of them and now need to loop over them, but it does not work. I used this command but nothing happened.

for FILE in (find . -type f -name "s1_*");
do
spades.py -1 *_R1_paired.fastq.gz -2 *_R2_paired.fastq.gz -s *_R1_unpaired.fastq.gz -s *_R2_unpaired.fastq.gz -m 30 -o assembly –careful
done
modified 14 months ago by RamRS • written 14 months ago by m.al_amiri

That's because the syntax is wrong in several ways. Go and take a look at how find works, and how to use commands in for loops (hint: $(find ...)).

Second hint, your loop declares the variable FILE but you then never use it, so it's not really any wonder the loop doesn't work.

Don't blindly copy and paste commands, attempt to understand what they do. This is important, because one time you may copy a command without thinking and erase your data, or maybe worse.

written 14 months ago by Joe
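
The two hints can be sketched as follows. This is only one possible reading, under several assumptions: the sample directories are named s1_* and sit directly below the top folder, no path contains whitespace, and each assembly goes into an assembly folder inside its sample directory. The names assemble_all and SPADES are hypothetical, and -mindepth/-maxdepth are GNU and BSD find extensions rather than POSIX.

```shell
# Sketch: run one command per sample directory, actually using the loop variable.
# SPADES is a hypothetical override so the loop can be dry-run with "echo".
SPADES=${SPADES:-spades.py}

assemble_all() {
    top=$1
    # $(find ...) is split on whitespace, so this assumes names without spaces.
    for dir in $(find "$top" -mindepth 1 -maxdepth 1 -type d -name 's1_*' | sort); do
        # Every path is built from $dir, so each pass sees a different sample.
        $SPADES -1 "$dir"/*_R1_paired.fastq.gz -2 "$dir"/*_R2_paired.fastq.gz \
                -s "$dir"/*_R1_unpaired.fastq.gz -s "$dir"/*_R2_unpaired.fastq.gz \
                -m 30 -o "$dir/assembly" --careful
    done
}

# Dry run that only prints the commands:
#   SPADES=echo; assemble_all .
```

Printing the commands first is a cheap way to confirm that the globs expand to the intended files before any assembly starts.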

Two things:

  1. Use the code formatting to present your posts better
  2. The hyphen you're using as part of the -careful seems to be a non-ASCII character, probably from a copy-paste out of a PDF, Word, or a website. Type your commands in the terminal and avoid copy-paste unless you have considerable practice spotting non-ASCII characters by eye.
written 14 months ago by RamRS
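
Rather than relying on eyeballing, a pasted command can be checked mechanically. A minimal sketch, assuming the command was saved to a file; the helper name find_non_ascii is hypothetical, and the -a option is assumed to be available, as it is in GNU and BSD grep:

```shell
# Hypothetical helper: print the lines of a script or command file that contain
# bytes outside printable ASCII, such as an en dash pasted in place of "-".
find_non_ascii() {
    # LC_ALL=C makes grep work byte by byte; -a treats the input as text;
    # -n prefixes each reported line with its line number.
    LC_ALL=C grep -a -n '[^[:print:][:space:]]' "$@"
}

# Example, where my_commands.sh is a placeholder file name:
#   find_non_ascii my_commands.sh
```

Lines that use only plain ASCII produce no output, so an empty result means the file is clean.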
14 months ago, m.al_amiri wrote:

Hi, I finally did it. First I made a list in the main directory, where all the trimmed sequence directories are. I used the following command:

ls $search_path > list

Then I ran this command:

cat list | while read line;
do
cd $line
spades.py -1 *_R1_001_paired.fastq.gz -2 *_R2_001_paired.fastq.gz -s *_R1_001_unpaired.fastq.gz -s *_R2_001_unpaired.fastq.gz -m 30 -o assembly --careful
cd ../
done
modified 14 months ago • written 14 months ago by m.al_amiri

Good job figuring it out! You can now work on making this better. For example, the file list doesn't need to exist, you can just:

ls ${search_path} | while read line;
do
# run the commands
done
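
A possible variation on the loop above lets the shell expand the sample directories itself instead of reading ls output. This is a sketch under assumptions: every subdirectory of the search path is a sample directory holding the four *_001_* files, and the names assemble_each_dir and SPADES are hypothetical.

```shell
# Sketch: visit each sample directory below a search path without a list file
# and without parsing ls. SPADES is a hypothetical override for dry runs.
SPADES=${SPADES:-spades.py}

assemble_each_dir() {
    search_path=$1
    for dir in "$search_path"/*/; do
        [ -d "$dir" ] || continue          # the glob matched nothing
        (
            # The subshell keeps the cd local, so no "cd ../" is needed and a
            # failed cd skips the sample instead of running in the wrong place.
            cd "$dir" || exit 1
            $SPADES -1 *_R1_001_paired.fastq.gz -2 *_R2_001_paired.fastq.gz \
                    -s *_R1_001_unpaired.fastq.gz -s *_R2_001_unpaired.fastq.gz \
                    -m 30 -o assembly --careful
        )
    done
}

# Dry run that only prints the commands:
#   SPADES=echo; assemble_each_dir "$search_path"
```

Quoting "$search_path" and "$dir" also keeps the loop working when a directory name contains spaces.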

In your case, you should have specified your problem in the following fashion:


I have 95 sample directories, each of which has files named in the format

s1_1_R1_001_paired.fastq.gz, s1_2_R1_001_paired.fastq.gz, s1_3_R1_001_paired.fastq.gz, ...,
s2_1_R1_001_paired.fastq.gz, s2_2_R1_001_paired.fastq.gz, s2_3_R1_001_paired.fastq.gz, ...,
s2_1_R1_001_unpaired.fastq.gz, s2_2_R1_001_unpaired.fastq.gz, s2_3_R1_001_unpaired.fastq.gz, ...,
s2_1_R2_001_unpaired.fastq.gz, s2_2_R2_001_unpaired.fastq.gz, s2_3_R2_001_unpaired.fastq.gz, ...,

How can I run spades.py per directory passing in all fastq files in the appropriate parameters like so:

spades.py -1 *_R1_001_paired.fastq.gz -2 *_R2_001_paired.fastq.gz -s *_R1_001_unpaired.fastq.gz -s *_R2_001_unpaired.fastq.gz -m 30 -o assembly --careful

In the time you'd take to explain your problem in this fashion, you'd automatically figure out the solution - that's the advantage of a well written post :-)

modified 14 months ago • written 14 months ago by RamRS