Question: Bash: For loop with two statements, first statement's output as input to the second, with .fasta files
rah20 wrote (3 months ago):

I'm working on several .fasta files whose names have the form chr_start_end.fasta. I want to iterate through them and compute the size of each one from its name. Then I want to use that size as input to another command for the same file, i.e., within the same for loop.

To extract the coordinates from the file name and compute the size, I use, for example:

echo "chr10_126777139_126791124.fasta" | awk -F'[_.]' '{print $3-$2}'

which yields 126791124 - 126777139 = 13985.
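If you'd rather not start an awk process for every file, the same arithmetic can be done with bash parameter expansion alone. A minimal sketch, assuming names of the form <chr>_<start>_<end>.fasta; `fasta_size` is a made-up helper name. Taking the last two underscore-separated fields (rather than fields 2 and 3) also copes with chromosome names that themselves contain underscores:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print end - start for a file named <chr>_<start>_<end>.fasta,
# using only bash parameter expansion instead of echo | awk.
fasta_size() {
  local base start end rest
  base=$(basename "$1" .fasta)     # drop any leading directory and the .fasta suffix
  end=${base##*_}                  # text after the last underscore
  rest=${base%_*}                  # remove the trailing "_<end>"
  start=${rest##*_}                # text after the remaining last underscore
  echo $(( 10#$end - 10#$start ))  # force base 10 so leading zeros are not read as octal
}

fasta_size chr10_126777139_126791124.fasta   # prints 13985
```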

Then I want to pass 13985 as input to a genome assembly tool called canu, like this:

canu -assemble -p asstest -d . genomeSize=13985 -nanopore-raw chr10_126777139_126791124.fasta

I've tried this so far, but I can't get it to work properly:

for f in *.fasta; do Gen_size=$(echo "$f" | awk -F'[_.]' '{print $3-$2}') canu -assemble -p asstest -d . genomeSize=$Gen_size -nanopore-raw $f; done

I want to do this for several .fasta files at once. Does anyone have suggestions on how to pass the output of one statement as input to the next within the same for loop? Thanks!


It looks fine. What error do you get?

Also, the line starting with "canu" should be on a new line.

Reply by Fabio Marroni, 3 months ago

You need a semicolon after your echo/awk command and before you call canu:

for f in *.fasta; do Gen_size=$(echo "$f" | awk -F'[_.]' '{print $3-$2}') canu -assemble -p asstest -d . genomeSize=$Gen_size -nanopore-raw $f; done
                                                                         ^ here
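To see why the semicolon matters: without it, `Gen_size=$(...) canu ...` is parsed as a single command with a temporary environment assignment, and `$Gen_size` in canu's arguments is expanded before that assignment takes effect, so canu receives an empty `genomeSize=`. The sketch below demonstrates this with `echo`, then shows the loop with the assignment on its own line as a dry run that prints the canu commands instead of running them. Giving each file its own `-p`/`-d` is an assumption added here, so that successive runs do not share the `asstest` prefix in `.`; remove the `echo` in front of `canu` to actually run it.

```shell
#!/usr/bin/env bash
shopt -s nullglob   # with no .fasta files, skip the loop instead of looping over the literal "*.fasta"

# Why the original line fails: in "VAR=value cmd args", the shell expands args
# before the assignment is applied, and the assignment only reaches cmd's environment.
unset size
size=42 echo "prefix form:    genomeSize=$size"    # prints "prefix form:    genomeSize="
size=42; echo "separate form:  genomeSize=$size"   # prints "separate form:  genomeSize=42"

# Corrected loop as a dry run: "echo" prints each canu command instead of running it.
for f in *.fasta; do
  Gen_size=$(echo "$f" | awk -F'[_.]' '{print $3-$2}')
  name=${f%.fasta}   # assumption: one prefix/output directory per input file
  echo canu -assemble -p "$name" -d "$name" genomeSize="$Gen_size" -nanopore-raw "$f"
done
```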
Reply by jrj.healey, 3 months ago

Thank you, well spotted.

Reply by rah20, 3 months ago
ATpoint (Germany) wrote (3 months ago):

I hope I understood your question correctly: wrap the commands in a function and parallelize it with GNU parallel, using $JOBS as the number of parallel jobs. I'm not familiar with Canu, so you might have to tune it a bit, because I don't know which parameter defines the output name, directory, etc.

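function CANU {

  # Input file, e.g. chr10_126777139_126791124.fasta
  local FILE=$1
  # Genome size = end - start, parsed from the file name
  local Gen_size
  Gen_size=$(echo "$FILE" | awk -F'[_.]' '{print $3-$2}')
  # Per-file prefix (-p) and output directory (-d) so parallel runs do not write to the same place
  local NAME=${FILE%.fasta}

  canu -assemble -p "$NAME" -d "$NAME" genomeSize="$Gen_size" -nanopore-raw "$FILE"

}; export -f CANU
# JOBS = number of simultaneous canu runs; set it before running this line
parallel -j "$JOBS" CANU ::: *.fasta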
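One thing to keep in mind when choosing $JOBS: each canu run is itself multithreaded, so several concurrent runs can oversubscribe the CPUs. A minimal sketch that splits the cores evenly between the jobs, assuming `nproc` (GNU coreutils) is available, that 4 is merely a placeholder default for JOBS, and that canu's `maxThreads=` option is the right cap (check the Canu parameter reference for your version). `canu_cmd` is a hypothetical helper that only prints the command so it can be checked first:

```shell
#!/usr/bin/env bash
# Sketch: share the machine's cores between JOBS concurrent canu runs.
# Assumptions: JOBS defaults to 4 (placeholder), nproc is available,
# and canu's maxThreads= option caps the threads of each run.
JOBS=${JOBS:-4}
THREADS=$(( $(nproc) / JOBS ))
if (( THREADS < 1 )); then
  THREADS=1   # never request zero threads
fi
export THREADS

# Hypothetical helper: print (rather than run) the canu command for one file.
canu_cmd() {
  local file=$1
  local name=${file%.fasta}
  local size
  size=$(echo "$file" | awk -F'[_.]' '{print $3-$2}')
  echo canu -assemble -p "$name" -d "$name" genomeSize="$size" maxThreads="$THREADS" -nanopore-raw "$file"
}
export -f canu_cmd

canu_cmd chr10_126777139_126791124.fasta
```

Once the printed commands look right, drop the `echo` inside the function and launch it with `parallel -j "$JOBS" canu_cmd ::: *.fasta`.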

Thanks for your response, I'll give it a try during my next round of analysis.

Reply by rah20, 3 months ago