Concatenating 1000s of pairs of fasta files
5.6 years ago

Hi,

I have one folder with 25000 fasta files named Rnor.01.fa, Rnor.02.fa.... And another folder with 25000 fasta files named Rrat.01.fa, Rrat.02.fa....

I want to concatenate Rnor.01.fa and Rrat.01.fa into a file Rnor_Rrat.01.fa and likewise Rnor.02.fa with Rrat.02.fa to ultimately have 25000 files each containing fasta sequences from each species.

I am new to programming and can't figure out how to use cat to do this.

Any help would be appreciated. Thank you.

Unix scripts fasta files • 1.5k views
5.6 years ago

Assuming that naming convention holds for all of your files, and that both sets of files sit in the current directory:

mkdir tmp
ls -1 | grep '\.fa$' | cut -f 2 -d . | sort -u | while read i; do cat $(ls -1 *.$i.fa) > $(ls -1 *.$i.fa | cut -f 1 -d . | perl -pe 's/\n/_/ unless eof').$i.fa; mv $(ls -1 *.$i.fa | grep -v _) tmp; done

Untested, and typed on my phone, so check it with some echo statements before running it for real. It should preserve your original files in the tmp folder.
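
If the two sets of files stay in their separate folders, as described in the question, a plain loop may be easier to check than the one-liner. This is only a minimal sketch: the function name concat_pairs and the three folder arguments are hypothetical, and it assumes every file follows the Rnor.NN.fa / Rrat.NN.fa pattern. It writes the combined files to a third folder and leaves the originals where they are.

```shell
#!/bin/sh
# Sketch: pair files from two folders by their shared numeric suffix.
# concat_pairs and its folder arguments are illustrative names, not from the thread.
concat_pairs() {
    nor_dir=$1   # folder holding Rnor.NN.fa
    rat_dir=$2   # folder holding Rrat.NN.fa
    out_dir=$3   # folder that receives Rnor_Rrat.NN.fa
    mkdir -p "$out_dir"
    for nor in "$nor_dir"/Rnor.*.fa; do
        [ -e "$nor" ] || continue      # the glob matched nothing
        id=${nor##*/Rnor.}             # drop the path and the "Rnor." prefix
        id=${id%.fa}                   # drop the ".fa" suffix, leaving e.g. 01
        rat="$rat_dir/Rrat.$id.fa"
        if [ -f "$rat" ]; then
            cat "$nor" "$rat" > "$out_dir/Rnor_Rrat.$id.fa"
        else
            echo "no partner for $nor, skipping" >&2
        fi
    done
}
```

It could then be called as concat_pairs Rnor_dir Rrat_dir combined, with those names swapped for the real folders. Note that cat joins files byte for byte, so if an Rnor file does not end with a newline, its last sequence line will run into the first Rrat header.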


Thank you so much. I got it to work.


Glad to hear it. If you find an answer helpful, remember to mark it as accepted so the next people can find the thread.
