Only copy multifasta files from one directory to another
13 months ago

Hey guys,

I have a directory with a lot of FASTA files, some single and some multi-FASTA. Is there a way to copy only the multi-FASTA files (i.e., those files that have more than one >) to another directory? Thanks!

sequence assembly • 266 views
13 months ago
Joe 19k

Here's a low-tech solution that seems to do the job (I haven't tested it extensively):

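    #!/bin/bash
    # Usage:
    #   $ bash script.sh destination_folder
    # Note: mv moves the files; swap in cp to copy them instead.
    for file in ./*.fasta ; do
        # Count the lines that contain a ">"
        entries=$(grep -c ">" "$file")
        if [ "$entries" -gt 1 ]; then
            mv "$file" "$1"
        fi
    done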


If this is an especially critical step in a pipeline or something, you may want to consider a more robust approach (e.g. use an actual parser to check for multiple entries, use find to pick up the files, etc.)
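
As a rough sketch of the `find` suggestion (not a full parser), the function below lets `find` pick up the files and anchors the pattern to `^>` so that only lines starting with `>` are counted. The name `copy_multifasta` and its two arguments are made up for illustration, and it uses `cp` because the question asks for a copy rather than a move. It assumes a `find` that supports `-maxdepth` and `-print0`, which GNU and BSD `find` both do.

```shell
#!/bin/bash
# copy_multifasta SRC_DIR DEST_DIR   (hypothetical helper, for illustration only)
# Copies every *.fasta file directly inside SRC_DIR that has more than one
# header line (a line starting with ">") into DEST_DIR.
copy_multifasta() {
    local src=$1 dest=$2 file
    mkdir -p "$dest"
    # -print0 / read -d '' keeps file names with spaces or newlines intact
    find "$src" -maxdepth 1 -type f -name '*.fasta' -print0 |
        while IFS= read -r -d '' file; do
            if [ "$(grep -c '^>' "$file")" -gt 1 ]; then
                cp "$file" "$dest"/
            fi
        done
}

# Example call (paths are placeholders):
#   copy_multifasta /path/to/dir destination_folder
```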

If you have very big FASTA files this will be somewhat slow, as it reads the whole file to count the number of >. It could stop as soon as it has found the second one, but that takes a little more work.
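
One way to get the early exit, as a sketch: `grep`'s `-m NUM` option (available in GNU and BSD grep) stops reading after NUM matching lines, so combined with `-c` the count never goes above 2. The helper name `is_multifasta` is made up for this example.

```shell
#!/bin/bash
# is_multifasta FILE   (hypothetical helper, for illustration only)
# Succeeds (exit status 0) if FILE has at least two header lines.
is_multifasta() {
    local count
    # -m 2 makes grep stop after the second matching line;
    # '^>' only matches lines that begin with ">".
    count=$(grep -c -m 2 '^>' "$1")
    [ "$count" -ge 2 ]
}

# Example use (paths are placeholders):
#   for file in ./*.fasta ; do
#       is_multifasta "$file" && mv -v "$file" destination_folder
#   done
```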

## EDIT: slightly more flexible way to ingest from the commandline:

    #!/bin/bash
    # Usage:
    #   $ bash script.sh destination_folder /path/to/dir/*.fasta
    # (replace fasta with whatever extension is relevant)
    # "${@:2}" is every argument after the destination folder
    for file in "${@:2}" ; do
        entries=$(grep -c ">" "$file")
        if [ "$entries" -gt 1 ]; then
            mv -v "$file" "$1"
        fi
    done


To add to my comments about robustness, it would probably be worth using getopt here to make the usage clearer, since passing wildcards to bash scripts is a bit clunky.
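
A minimal sketch of what that could look like. Note that it uses the bash builtin `getopts` rather than the external `getopt` program, and the `-d` flag and the function name `move_multifasta` are invented for this example, not something the script above already has.

```shell
#!/bin/bash
# Hypothetical interface:
#   script.sh -d destination_folder file1.fasta [file2.fasta ...]
move_multifasta() {
    local dest="" opt file
    local OPTIND=1   # reset so getopts starts from the first argument
    while getopts "d:" opt; do
        case "$opt" in
            d) dest=$OPTARG ;;
            *) echo "Usage: script.sh -d destination_folder files..." >&2
               return 1 ;;
        esac
    done
    shift $((OPTIND - 1))   # what is left in "$@" are the input files

    if [ ! -d "$dest" ]; then
        echo "Error: give an existing destination folder with -d" >&2
        return 1
    fi

    for file in "$@"; do
        if [ "$(grep -c '>' "$file")" -gt 1 ]; then
            mv -v "$file" "$dest"
        fi
    done
}

# To use this as script.sh, add a final line:
#   move_multifasta "$@"
```

The shell still expands the wildcard before the script sees it, but the destination no longer depends on being the first argument.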