Concatenate multifastas
2.9 years ago
Colaptes ▴ 90

Hello, I have a task that seems like it should be simple, but I haven't found a solution yet. I have several thousand FASTA files, each containing an alignment of the same 30 samples, with the sample name as the header of each entry. I would like to concatenate the sequences across all files, sample by sample, so that I end up with a single FASTA file containing the 30 samples. For example:

Starting data:

Gene1.fasta

>Sample1
CCCCCCCCC
>Sample2
AAAAAAAAA

Gene2.fasta

>Sample1
TTTTTTTTTTTTTTT
>Sample2
GGGGGGGGGGGGGGG

Desired output:
AllGenes.fasta

>Sample1
CCCCCCCCCTTTTTTTTTTTTTTT
>Sample2
AAAAAAAAAGGGGGGGGGGGGGGG

So far the only solution I have come up with is this:

for sample in Sample1 Sample2 ; do
    echo ">$sample" > "$sample".temp.fasta
    for gene in Gene1 Gene2 ; do
        seqkit grep -p "$sample" "$gene".fasta | grep -v ">" >> "$sample".temp.fasta
    done
done
cat *.temp.fasta > AllGenes.fasta

but that seems terribly inefficient for thousands of genes. Is there a better way?

2.9 years ago
GenoMax 141k

See answers here: Combining two fasta sequences into one

I recommend you use seqkit concat.
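
With seqkit installed, this should come down to something like seqkit concat Gene*.fasta > AllGenes.fasta, which joins sequences that share an ID across the input files. If seqkit is not available, a rough equivalent can be sketched with plain awk. The sketch below is not what seqkit does internally. It recreates the two example files from the question in a scratch directory, and it assumes every file uses identical header lines for the same sample. The variable name workdir is made up for the demo.

```shell
# Work in a scratch directory so the demo files do not touch real data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the two example alignments from the question.
printf '>Sample1\nCCCCCCCCC\n>Sample2\nAAAAAAAAA\n' > Gene1.fasta
printf '>Sample1\nTTTTTTTTTTTTTTT\n>Sample2\nGGGGGGGGGGGGGGG\n' > Gene2.fasta

# For every header line, remember it as the current ID (recording the order
# in which IDs are first seen). For every other line, append it to the
# running sequence of the current ID. At the end, print one record per ID.
awk '
  /^>/ { id = $0; if (!(id in seq)) { order[++n] = id; seq[id] = "" }; next }
       { seq[id] = seq[id] $0 }
  END  { for (i = 1; i <= n; i++) print order[i] "\n" seq[order[i]] }
' Gene*.fasta > AllGenes.fasta

cat AllGenes.fasta
```

Genes are joined in the order the shell expands Gene*.fasta, which is lexical (Gene10 sorts before Gene2), so pass the files explicitly if a particular gene order matters. Each sample is written as a single unwrapped sequence line.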


Thank you, that is perfect!
