I have a FASTA file containing N sequences (outgroups). I also have N FASTA files, each containing the orthologous gene sequences of several samples.
The idea is to put each outgroup at the top of its corresponding gene file.
Example:
The outgroup file, OUTGROUPS.fas, looks like this:
> GENE 1
TTAACTCCTGCTACTTTG
> GENE 2
TCTGTCGACGGCAACTGTGAAACTTATC
> GENE 3
GCACCCTGAGCCGAACTGAATTC
> GENE 4
GGTTAACAGAACTTGTTCTTCACATGCAGAGTCTTGA
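One way to start is to load OUTGROUPS.fas into a dictionary keyed by header, so each gene's outgroup can be looked up by name. This is a minimal pure-Python sketch (no Biopython assumed); `read_fasta` is a hypothetical helper name, and the header key keeps the text after `>` with surrounding spaces removed, so `> GENE 1` becomes `GENE 1`.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {header: sequence}; the header excludes the '>'."""
    records: Dict[str, str] = {}
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()  # "> GENE 1" -> "GENE 1"
            chunks = []
        else:
            chunks.append(line)  # sequences split over several lines are joined
    if header is not None:
        records[header] = "".join(chunks)
    return records


example = """> GENE 1
TTAACTCCTGCTACTTTG
> GENE 2
TCTGTCGACGGCAACTGTGAAACTTATC
"""
outgroups = read_fasta(example.splitlines())
print(outgroups["GENE 1"])  # TTAACTCCTGCTACTTTG
```

For the real file, passing an open file handle (`with open("OUTGROUPS.fas") as fh: read_fasta(fh)`) works the same way, since iterating a file yields its lines.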
Here is the FASTA file for one of my genes, GENE1.fas:
>Ind_1
TTAAATCCTGCTTCTTTG
>Ind_2
TTAAATCCTGCTTCTTTG
>Ind_3
TTAAATCCTGCTTCTTTG
I want to get the following:
> GENE 1
TTAACTCCTGCTACTTTG
>Ind_1
TTAAATCCTGCTTCTTTG
>Ind_2
TTAAATCCTGCTTCTTTG
>Ind_3
TTAAATCCTGCTTCTTTG
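For a single gene, the operation is just "outgroup record first, then the gene file's records unchanged". The sketch below shows that step on the GENE1 example; `prepend_outgroup` is a hypothetical name, and it assumes the header is passed exactly as it should be written, including the leading `>`.

```python
def prepend_outgroup(header_line: str, sequence: str, gene_fasta: str) -> str:
    """Place one outgroup record in front of the records of a gene FASTA text.

    header_line is written verbatim, so it must include the '>' (e.g. '> GENE 1').
    """
    outgroup_record = f"{header_line}\n{sequence}\n"
    # The gene file's own records are kept exactly as they are, after the outgroup.
    return outgroup_record + gene_fasta.lstrip("\n")


gene1 = """>Ind_1
TTAAATCCTGCTTCTTTG
>Ind_2
TTAAATCCTGCTTCTTTG
>Ind_3
TTAAATCCTGCTTCTTTG
"""
print(prepend_outgroup("> GENE 1", "TTAACTCCTGCTACTTTG", gene1), end="")
```

Working on the text rather than parsing every sample record means the samples' headers and line wrapping are preserved as they were.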
That is what I want for GENE1, and I need the same for the rest of the genes. As you can see, the entry names in the outgroup FASTA file (GENE1, GENE2, ...) are the same as the names of their orthologous gene files (GENE1.fas, GENE2.fas, ...).
I'd appreciate any help with this.
Best, Hossein
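A batch version could loop over every `*.fas` file in a directory and write a new file with the matching outgroup on top. This is a sketch under several assumptions: all gene files sit in one directory, an outgroup header matches a file name once whitespace is removed (so `> GENE 1` pairs with `GENE1.fas`), and results go to a separate output directory so the originals are not overwritten. The function names and the output directory are hypothetical.

```python
import re
from pathlib import Path
from typing import Dict, List, Tuple


def normalize(name: str) -> str:
    """Assumed matching rule: drop all whitespace, so 'GENE 1' -> 'GENE1'."""
    return re.sub(r"\s+", "", name)


def read_outgroups(path: Path) -> Dict[str, Tuple[str, str]]:
    """Map normalized gene name -> (original header line, sequence)."""
    records: Dict[str, Tuple[str, str]] = {}
    header = None
    chunks: List[str] = []
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[normalize(header[1:])] = (header, "".join(chunks))
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        records[normalize(header[1:])] = (header, "".join(chunks))
    return records


def add_outgroups(outgroup_file: Path, gene_dir: Path, out_dir: Path) -> List[str]:
    """Write out_dir/<name>.fas as outgroup record + original gene records.

    Returns the names of .fas files for which no outgroup header matched.
    """
    outgroups = read_outgroups(outgroup_file)
    out_dir.mkdir(parents=True, exist_ok=True)
    missing: List[str] = []
    for gene_file in sorted(gene_dir.glob("*.fas")):
        key = normalize(gene_file.stem)  # "GENE1.fas" -> "GENE1"
        if key not in outgroups:
            missing.append(gene_file.name)  # includes OUTGROUPS.fas if it is here
            continue
        header, seq = outgroups[key]
        body = gene_file.read_text().lstrip("\n")
        (out_dir / gene_file.name).write_text(f"{header}\n{seq}\n{body}")
    return missing
```

Something like `add_outgroups(Path("OUTGROUPS.fas"), Path("."), Path("with_outgroup"))` would then create `with_outgroup/GENE1.fas`, `with_outgroup/GENE2.fas`, and so on, and return any files that had no matching outgroup. The output directory should differ from the input directory, otherwise the gene files are overwritten in place.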
Lemme make sure I understand this correctly:
Each gene FASTA file will always have a name (GENE1.fas) which corresponds to its associated header in the OUTGROUPS.fas file (>GENE 1), correct?
Do you want all of the results in a single output file?
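Since the headers contain a space (`> GENE 1`) while the file names do not (`GENE1.fas`), it may be worth checking the pairing before generating anything. This sketch assumes the rule is "ignore whitespace and the leading `>`"; `check_pairing` is a hypothetical helper that lists gene files with no outgroup and outgroup headers with no gene file.

```python
import re
from pathlib import Path
from typing import Iterable, List, Tuple


def normalize(name: str) -> str:
    """Assumed rule: strip the '>' and all whitespace, so '> GENE 1' -> 'GENE1'."""
    return re.sub(r"\s+", "", name.lstrip(">"))


def check_pairing(headers: Iterable[str], gene_files: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (gene files with no outgroup, outgroup headers with no gene file)."""
    header_keys = {normalize(h): h for h in headers}
    file_keys = {normalize(Path(f).stem): f for f in gene_files}
    no_outgroup = sorted(f for k, f in file_keys.items() if k not in header_keys)
    no_gene_file = sorted(h for k, h in header_keys.items() if k not in file_keys)
    return no_outgroup, no_gene_file


print(check_pairing(["> GENE 1", "> GENE 2"], ["GENE1.fas", "GENE3.fas"]))
# (['GENE3.fas'], ['> GENE 2'])
```

If both lists come back empty for the real data, the one-to-one correspondence described in the question holds under this matching rule.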