Question

Make pairwise fasta file for two species

8.9 years ago

burnsro ▴ 20

I have a fasta file of DNA sequences for two different species across a chromosome, e.g.

>SpeciesA_gene1
ACTGC.....

and

>SpeciesB_geneX
TCTGC...

and a text file of orthologs

SpeciesA SpeciesB
gene1 geneX (etc)

I want to write each ortholog pair to a separate file so I can run analyses on orthologous pairs, but I'm not sure how this can be achieved.

I thought about using the "paste" function in unix to merge the two files row by row, and then split the result file every two DNA sequences (i.e. a pair), but the order of the genes in the fasta file doesn't correspond to the order the orthologs are listed in.

Is there a standard way to do this? I imagine it must be done a lot for programs like Clustal?

fasta clustal • 2.0k views

updated 15 months ago by Ram 43k • written 8.9 years ago by burnsro ▴ 20

Ram · Answer 1 · 2015-05-31

I cannot say whether there is a standard way to do this.

It is unclear to me whether the sequences for the two species are in the same fasta file or each in their own file.

Unfortunately, one list of arbitrary pairs (orthologs) cannot be sorted in two ways simultaneously, so you will need to search for at least one member of each pair.

I do not know the details of your data, so I can't say whether a simple brute-force fgrep in a loop is sufficient, or whether it is more appropriate to split and order the sequences by species and identifier, so that the first record is always the next species-A sequence and the search for the species-B identifier is sped up by those records being sorted as well.
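As a rough illustration, here is a minimal Python sketch that swaps the fgrep loop and the sorting for a dictionary lookup: every sequence is loaded into memory keyed by its identifier, so neither file needs to be ordered. It rests on assumptions taken only from the example in the question: fasta headers look like `SpeciesA_gene1`, the ortholog file is whitespace-separated with the species names in its first row, and the data fits in memory. The function names and the output file naming are hypothetical, not part of any existing tool.

```python
from pathlib import Path


def read_fasta(path):
    """Return {record_id: sequence} for one fasta file."""
    parts = {}
    current = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # Use the first whitespace-delimited token after '>' as the ID.
                tokens = line[1:].split()
                current = tokens[0] if tokens else None
                if current is not None:
                    parts[current] = []
            elif line and current is not None:
                parts[current].append(line)
    return {rec_id: "".join(chunks) for rec_id, chunks in parts.items()}


def write_ortholog_pairs(fasta_paths, table_path, out_dir):
    """Write one two-record fasta file per ortholog pair.

    Assumes IDs have the form '<species>_<gene>' (as in the question) and
    that the first row of the table names the two species.
    Returns (written_paths, missing_id_pairs).
    """
    # Works whether the species share one fasta file or have one each.
    sequences = {}
    for path in fasta_paths:
        sequences.update(read_fasta(path))

    with open(table_path) as handle:
        rows = [line.split() for line in handle if line.strip()]
    species_a, species_b = rows[0][:2]

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    written, missing = [], []
    for row in rows[1:]:
        if len(row) < 2:
            continue  # skip malformed rows
        id_a = f"{species_a}_{row[0]}"
        id_b = f"{species_b}_{row[1]}"
        if id_a not in sequences or id_b not in sequences:
            # Record pairs that cannot be completed instead of failing.
            missing.append((id_a, id_b))
            continue
        out_path = out_dir / f"{id_a}__{id_b}.fasta"
        with open(out_path, "w") as out:
            out.write(f">{id_a}\n{sequences[id_a]}\n")
            out.write(f">{id_b}\n{sequences[id_b]}\n")
        written.append(out_path)
    return written, missing
```

With hypothetical file names, a call such as `write_ortholog_pairs(["seqs.fasta"], "orthologs.txt", "pairs")` would leave one file per pair in `pairs/`, each ready to pass to an aligner, and would return the list of pairs whose identifiers were not found. If the sequences are too large to hold in memory, the sorted or indexed approach described above is likely the better fit.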