Question: Make pairwise fasta file for two species
4.3 years ago by
burnsro20
Austria
burnsro20 wrote:

I have a fasta file of DNA sequences for two different species across a chromosome, e.g.

>SpeciesA_gene1     

ACTGC.....

and 

>SpeciesB_geneX

TCTGC...

and a text file of orthologs

SpeciesA SpeciesB

gene1 geneX (etc)

 

I want to write each ortholog pair to a separate file, so that I can run analyses on orthologous pairs, but I'm not sure how this can be achieved.

I thought about using the Unix "paste" command to merge the two files row by row, and then splitting the resulting file every two DNA sequences (i.e. one pair), but the order of the genes in the fasta file doesn't correspond to the order in which the orthologs are listed.

Is there a standard way to do this? I imagine it must be done a lot for programs like Clustal?

clustal fasta • 1.2k views
4.3 years ago by
tomc80
United States
tomc80 wrote:

I cannot say whether there is a standard way to do this.

It is unclear to me whether the sequences for the two species are in the same fasta file or each in their own file.

Unfortunately, a single list of arbitrary pairs (orthologs) cannot be sorted on both columns at once, so you will need to search for at least one member of each pair.

I do not know the details of your data, so I can't say whether a simple brute-force fgrep in a loop is sufficient, or whether it is more appropriate to split and order the sequences by species and identifier, so that the first record is always the next species-A sequence and the search for the species-B identifier is sped up by those records being sorted as well.
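
One way to avoid both the repeated fgrep and the sorting is to load every sequence into a dictionary keyed by identifier and look each pair up directly. The following is only a minimal sketch under several assumptions that are not confirmed by the question: all sequences sit in one fasta file, headers look like ">SpeciesA_gene1", the ortholog table has a header row naming the two species followed by whitespace-separated gene names, and everything fits in memory. The function names and the output file naming are made up for illustration.

```python
import os


def read_fasta(path):
    """Return {identifier: sequence}; the identifier is the first word after '>'."""
    chunks = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between header and sequence
            if line.startswith(">"):
                name = line[1:].split()[0]
                chunks[name] = []
            elif name is not None:
                chunks[name].append(line)
    return {key: "".join(parts) for key, parts in chunks.items()}


def read_orthologs(path):
    """Return a list of (id_a, id_b) built from the table.

    Assumes the first non-blank row names the two species and that fasta
    identifiers are '<species>_<gene>' (an assumption about the data).
    """
    with open(path) as handle:
        rows = [line.split() for line in handle if line.strip()]
    species_a, species_b = rows[0][0], rows[0][1]
    return [(f"{species_a}_{row[0]}", f"{species_b}_{row[1]}") for row in rows[1:]]


def write_pair_files(seqs, pairs, out_dir):
    """Write one two-record fasta file per pair; return pairs that were not found."""
    os.makedirs(out_dir, exist_ok=True)
    missing = []
    for id_a, id_b in pairs:
        if id_a not in seqs or id_b not in seqs:
            missing.append((id_a, id_b))
            continue
        out_path = os.path.join(out_dir, f"{id_a}__{id_b}.fasta")
        with open(out_path, "w") as out:
            out.write(f">{id_a}\n{seqs[id_a]}\n>{id_b}\n{seqs[id_b]}\n")
    return missing


# Example call (file names are hypothetical):
# seqs = read_fasta("both_species.fasta")
# pairs = read_orthologs("orthologs.txt")
# not_found = write_pair_files(seqs, pairs, "pairs")
```

If the two species are in separate fasta files, the two dictionaries from read_fasta can be merged before calling write_pair_files. Pairs returned as missing are worth checking by hand, since they usually point to identifiers that do not match between the table and the fasta headers.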
