Question

Bash: For loop with two statements, first statements as input to second with .fasta files

2

6.4 years ago

rah ▴ 30

I'm working on several .fasta files whose names contain the region coordinates in the form chr_start_end.fasta. I want to iterate through them, compute the size of each region from its file name, and then use that size as input to another command for each file, all in the same for loop.

To compute the size from a file name I use, for example:

echo "chr10_126777139_126791124.fasta" | awk -F'[_.]' '{print $3-$2}'

which yields 126791124 - 126777139 = 13985.
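The same subtraction can also be done in pure bash with parameter expansion and arithmetic. This is a minimal sketch that assumes every name ends in `_start_end.fasta`. The function name `genome_size` is hypothetical and not part of any tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print end - start for a file named chr_start_end.fasta
genome_size() {
  local name=${1##*/}        # drop any leading directory
  name=${name%.fasta}        # chr10_126777139_126791124
  local end=${name##*_}      # text after the last "_": 126791124
  local start=${name%_*}     # chr10_126777139
  start=${start##*_}         # text after the (new) last "_": 126777139
  echo $(( end - start ))
}

genome_size chr10_126777139_126791124.fasta
```

This version takes the last two `_`-separated fields instead of fields 2 and 3 as the awk one-liner does. As a result, it still works when the chromosome name itself contains underscores, as unplaced scaffold names often do.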

Then I want to pass 13985 as input to a genome assembly tool called canu, as in this example:

canu -assemble -p asstest -d . genomeSize=13985 -nanopore-raw chr10_126777139_126791124.fasta

This is what I've tried so far, but I can't get it to work properly:

for f in *.fasta; do Gen_size=$(echo "$f" | awk -F'[_.]' '{print $3-$2}') canu -assemble -p asstest -d . genomeSize=$Gen_size -nanopore-raw $f; done

I want to do this for several .fasta files at once. Does anyone have suggestions on how to pass the output of one statement as input to the next within the same for loop? Thanks!

fasta assembly bash sequence

updated 2.3 years ago by Ram 45k • written 6.4 years ago by rah ▴ 30

0

It looks fine. What is the error you get?

Also, the line starting with "canu" should be on a new line.

6.4 years ago by Fabio Marroni ★ 3.0k

2

You need a semicolon after your echo/awk command and before you call canu:

for f in *.fasta; do Gen_size=$(echo "$f" | awk -F'[_.]' '{print $3-$2}') canu -assemble -p asstest -d . genomeSize=$Gen_size -nanopore-raw $f; done
                                                                         ^ here
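The following is a minimal demonstration of why the semicolon matters, with `echo` standing in for canu so that it runs anywhere. In `VAR=value command args`, the assignment only goes into the command's environment. Also, `$VAR` in the arguments is expanded before the assignment happens, so canu receives an empty `genomeSize=`.

```shell
#!/usr/bin/env bash
f="chr10_126777139_126791124.fasta"
unset Gen_size

# One statement: Gen_size=... is only a prefix assignment for echo's
# environment, and $Gen_size in the arguments was already expanded (empty).
broken=$(Gen_size=$(echo "$f" | awk -F'[_.]' '{print $3-$2}') echo "genomeSize=$Gen_size")

# Two statements separated by ';': the assignment runs first,
# so the second command sees the value.
fixed=$(Gen_size=$(echo "$f" | awk -F'[_.]' '{print $3-$2}'); echo "genomeSize=$Gen_size")

printf 'broken: %s\nfixed:  %s\n' "$broken" "$fixed"
```

The same fix applies inside the real loop. End the `Gen_size=$(...)` assignment with `;` (or a newline) before `canu`, so `genomeSize="$Gen_size"` expands to the computed size.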

6.4 years ago by Joe 22k

0

Thank you, nicely noticed.

6.4 years ago by rah ▴ 30

score 2 · Answer 1 · 2019-02-20

2

6.4 years ago

ATpoint 88k

Hope I got your question right: wrap it into a function and parallelize it with GNU parallel, using $JOBS as the number of parallel jobs. I'm not familiar with Canu, so you might have to tune it a bit, because I do not know which parameter defines the output name/directory/whatever.

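function CANU {
  # genome size = end - start, parsed from a name like chr10_126777139_126791124.fasta
  local FILE=$1
  local Gen_size
  Gen_size=$(echo "$FILE" | awk -F'[_.]' '{print $3-$2}')
  local NAME=${FILE%.fasta}

  # one prefix (-p) and output directory (-d) per file, so parallel jobs
  # do not write into the same assembly
  canu -assemble -p "$NAME" -d "$NAME" genomeSize="$Gen_size" -nanopore-raw "$FILE"

}; export -f CANU
# let parallel take the file names from the glob instead of parsing ls output
parallel -j "$JOBS" CANU ::: *.fasta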
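The `export -f CANU` line matters because GNU parallel runs each job in a new shell, and a bash function is only visible there if it has been exported. The sketch below shows this with a plain `bash -c` child shell instead of parallel. It uses a hypothetical `preview_canu` function that only prints the canu command it would run, so it can be tried without canu installed.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for CANU: print the command instead of running it.
preview_canu() {
  local size
  size=$(echo "$1" | awk -F'[_.]' '{print $3-$2}')
  echo "canu -assemble -p ${1%.fasta} -d ${1%.fasta} genomeSize=$size -nanopore-raw $1"
}
export -f preview_canu

# A child shell (like the ones GNU parallel starts per job) can call the
# function only because it was exported; "_" fills $0, the file becomes $1.
out=$(bash -c 'preview_canu "$1"' _ chr10_126777139_126791124.fasta)
echo "$out"
```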

6.4 years ago by ATpoint 88k

0

Thanks for your response, I'll give it a try when I do my next round of analysis.

6.4 years ago by rah ▴ 30