Make pairwise fasta file for two species
6.4 years ago
burnsro ▴ 20

I have a FASTA file of DNA sequences for two different species across a chromosome, and a text file of orthologs:

SpeciesA SpeciesB

gene1 geneX (etc)


I want to write each ortholog pair to a separate file so I can run analyses on orthologous pairs, but I'm not sure how to do this.

I thought about using the Unix "paste" command to merge the two files row by row and then splitting the result every two DNA sequences (i.e. one pair), but the order of the genes in the FASTA file doesn't match the order in which the orthologs are listed.
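One way around the ordering problem is to look sequences up by identifier instead of by position, so file order stops mattering. The Python sketch below is illustrative, not an established tool: it assumes the gene ID is the first word of each FASTA header, that the ortholog file is two whitespace-separated columns with a "SpeciesA SpeciesB" header row, and all function names are hypothetical. Sequences are written on a single line rather than wrapped.

```python
import sys
from pathlib import Path


def read_fasta(lines):
    """Parse FASTA lines into a dict mapping record ID -> sequence.

    Assumes the ID is the first whitespace-separated word after '>'.
    """
    records = {}
    current_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)
            current_id, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


def read_orthologs(lines, skip_header=True):
    """Read two-column ortholog lines into (speciesA_id, speciesB_id) tuples."""
    pairs = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or malformed line
        if skip_header:
            skip_header = False  # drop the "SpeciesA SpeciesB" header row
            continue
        pairs.append((fields[0], fields[1]))
    return pairs


def format_pair(records, gene_a, gene_b):
    """Return one ortholog pair as a two-record FASTA string (unwrapped)."""
    return f">{gene_a}\n{records[gene_a]}\n>{gene_b}\n{records[gene_b]}\n"


def write_pair_files(records, pairs, out_dir):
    """Write each pair whose sequences are both present to out_dir/A_B.fasta."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for gene_a, gene_b in pairs:
        if gene_a not in records or gene_b not in records:
            print(f"skipping {gene_a}/{gene_b}: sequence not found", file=sys.stderr)
            continue
        (out / f"{gene_a}_{gene_b}.fasta").write_text(format_pair(records, gene_a, gene_b))


def main(fasta_paths, ortholog_path, out_dir):
    # Works whether both species share one FASTA file or each has its own.
    records = {}
    for path in fasta_paths:
        with open(path) as fh:
            records.update(read_fasta(fh))
    with open(ortholog_path) as fh:
        pairs = read_orthologs(fh)
    write_pair_files(records, pairs, out_dir)
```

For example, main(["speciesA.fasta", "speciesB.fasta"], "orthologs.txt", "pairs") (hypothetical file names) would write one file per pair, such as pairs/gene1_geneX.fasta, each ready to pass to an aligner. Holding every sequence in a dict is fine for a single chromosome but may need rethinking for very large inputs.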

Is there a standard way to do this? I imagine it must be done a lot when preparing input for programs like Clustal.

6.4 years ago
tomc ▴ 80

I cannot say whether there is a standard way to do this.

It is unclear to me whether the sequences for the two species are in the same FASTA file or each in their own file.

Unfortunately, a single list of arbitrary pairs (orthologs) cannot be sorted two ways at once, so you will need to search for at least one member of each pair.
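To make the search concrete, here is a brute-force Python sketch that behaves like running fgrep once per identifier: it walks the ortholog list in its own order and scans the records for both members of each pair. It assumes the FASTA has already been parsed into (id, sequence) tuples in file order; the function names are hypothetical.

```python
def scan_for(records, gene_id):
    """Linear search over (id, sequence) records, much like grepping the file.

    Returns the sequence for gene_id, or None if it is not present.
    """
    for rec_id, seq in records:
        if rec_id == gene_id:
            return seq
    return None


def pairs_by_scanning(records, orthologs):
    """Brute force: for every ortholog pair, scan the records for both members.

    records: (id, sequence) tuples in whatever order the FASTA file has.
    orthologs: (speciesA_id, speciesB_id) tuples in ortholog-file order.
    Worst-case cost is proportional to len(orthologs) * len(records).
    """
    found = []
    for gene_a, gene_b in orthologs:
        seq_a = scan_for(records, gene_a)
        seq_b = scan_for(records, gene_b)
        if seq_a is not None and seq_b is not None:
            found.append(((gene_a, seq_a), (gene_b, seq_b)))
    return found
```

Pairs with a missing sequence are silently dropped here; in practice you would probably want to report them.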

I don't know the details of your data, so I can't say whether a simple brute-force fgrep in a loop is sufficient, or whether it is more appropriate to split the sequences by species and sort them by identifier. If the ortholog list is sorted by species-A identifier as well, the first remaining record is always the next species-A sequence you need, and the search for the matching species-B identifier is sped up because those records are sorted too.
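The sort-first alternative can be sketched in Python as follows, assuming the sequences have already been split by species into (id, sequence) tuples and the orthologs are (speciesA_id, speciesB_id) tuples; all names are hypothetical. Species A and the ortholog list are sorted by species-A ID and walked in lockstep, while species-B IDs are sorted so each partner is found by binary search (bisect_left from the standard library) instead of a full scan. Everything is held in memory for brevity; with very large files the same idea could be applied to files sorted on disk.

```python
from bisect import bisect_left


def pair_sorted(records_a, records_b, orthologs):
    """Pair orthologs using sorted inputs instead of repeated full scans.

    records_a, records_b: (id, sequence) tuples, one list per species.
    orthologs: (speciesA_id, speciesB_id) tuples.
    """
    records_a = sorted(records_a, key=lambda rec: rec[0])
    records_b = sorted(records_b, key=lambda rec: rec[0])
    ids_b = [rec_id for rec_id, _ in records_b]

    pairs = []
    i = 0
    for id_a, id_b in sorted(orthologs):
        # Advance through species A in lockstep with the sorted ortholog list.
        while i < len(records_a) and records_a[i][0] < id_a:
            i += 1
        if i == len(records_a) or records_a[i][0] != id_a:
            continue  # no species-A sequence for this ortholog
        # Binary search for the partner among the sorted species-B IDs.
        j = bisect_left(ids_b, id_b)
        if j == len(ids_b) or ids_b[j] != id_b:
            continue  # no species-B sequence for this ortholog
        pairs.append((records_a[i], records_b[j]))
    return pairs
```

The trade-off is one up-front sort per list in exchange for cheap lookups afterwards, instead of rescanning the records for every pair. Because the species-A pointer is not advanced after a match, a species-A gene listed with several orthologs is still paired with each of them.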

