Question: Get orthologous sequences between two FASTA files, each containing a set of sequences
Darill wrote (18 months ago):

Hi everyone! Here is what I need to do.

I have two files, each containing a set of gene sequences from a different species. I need to find out which of these sequences are orthologous so that I can compare each orthologous pair by computing dN and dS.

Here is a hypothetical example of my files:

File 1:

>seqB  (real name is seq 1)
AAAACCCCGGGGTTTTT
>seqE  (real name is seq 2)
ACCGGTTGACGGATGGAG
>seqC  (real name is seq 3)
AGGATTAGGATTAGGAAT

File 2:

>seqC  (real name is seq 1)
AGGACTAGGATTAGGAAA
>seqE (real name is seq 2)
ACGGGTTGACGGACGGAG
>seqB  (real name is seq 3)
AAAACCGCGGGGTTTAT

Of course, in the real files the sequences do not have matching names across the two species.

What I would like is to find out which of them are orthologous, for example as a file giving:

Orthologous genes between sp1 : sp2 
seq1 : seq3
seq2 : seq2
seq3 : seq1

Thank you very much for your help.
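
The pairing asked for above is essentially what a reciprocal-best-hit (RBH) search produces: a sequence from species 1 and a sequence from species 2 are reported as a pair only if each is the other's closest match. RBH is a common quick heuristic for one-to-one orthologs; it is not a method proposed in this thread. Below is a minimal Python sketch on the toy sequences from the question, assuming one sequence per gene in each file. It uses `difflib.SequenceMatcher` only as a crude stand-in for a real alignment score (for real data you would use BLAST scores or one of the dedicated tools suggested in the answers), and all helper names are hypothetical.

```python
from difflib import SequenceMatcher


def read_fasta(text):
    """Parse FASTA text into {name: sequence}; the name is the first word after '>'."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def similarity(a, b):
    """Crude similarity in [0, 1]; a stand-in for a real alignment score such as BLAST."""
    return SequenceMatcher(None, a, b).ratio()


def best_hits(query, target):
    """For each query sequence, return the name of the most similar target sequence."""
    return {q: max(target, key=lambda t: similarity(seq, target[t]))
            for q, seq in query.items()}


def reciprocal_best_hits(sp1, sp2):
    """Keep (a, b) only if b is a's best hit in sp2 AND a is b's best hit in sp1."""
    forward = best_hits(sp1, sp2)
    backward = best_hits(sp2, sp1)
    return [(a, b) for a, b in forward.items() if backward.get(b) == a]


if __name__ == "__main__":
    # Toy data from the question, using the "real" names seq1..seq3.
    file1 = ">seq1\nAAAACCCCGGGGTTTTT\n>seq2\nACCGGTTGACGGATGGAG\n>seq3\nAGGATTAGGATTAGGAAT\n"
    file2 = ">seq1\nAGGACTAGGATTAGGAAA\n>seq2\nACGGGTTGACGGACGGAG\n>seq3\nAAAACCGCGGGGTTTAT\n"
    print("Orthologous genes between sp1 : sp2")
    for a, b in reciprocal_best_hits(read_fasta(file1), read_fasta(file2)):
        print(f"{a} : {b}")
```

Plain RBH only ever reports one-to-one pairs, so it misses cases such as recent duplications in one species, which is one reason the dedicated orthology tools go beyond it.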

Sishuo Wang (The University of British Columbia) wrote (18 months ago):

For protein-coding genes, you can try InParanoid, OrthoMCL, GET_HOMOLOGUES, OrthoFinder, and many other tools. Since you mention that you are going to calculate dN and dS, I think you can first translate the sequences into amino acids and then run gene family clustering with the above tool(s).
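
Translating the sequences first, as suggested here, can be sketched in plain Python with the standard genetic code (NCBI translation table 1). This assumes every sequence is a coding sequence that starts in frame at its first base; a trailing incomplete codon is dropped and codons containing ambiguous bases become `X`. The function names are hypothetical; Biopython's `Seq.translate()` does the same job if you already use Biopython.

```python
from itertools import product

# Standard genetic code (NCBI table 1). Codons are enumerated in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ... GGG), matching the amino-acid string below.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(nt_seq):
    """Translate a coding sequence in frame 1; stop codons become '*', unknown codons 'X'."""
    nt_seq = nt_seq.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(nt_seq[i:i + 3], "X")
                   for i in range(0, len(nt_seq) - 2, 3))


def translate_records(records):
    """Translate a {name: nucleotide sequence} dict into {name: protein sequence}."""
    return {name: translate(seq) for name, seq in records.items()}


def to_fasta(records, width=60):
    """Format {name: sequence} as FASTA text, wrapping sequence lines at `width`."""
    lines = []
    for name, seq in records.items():
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

The protein FASTA produced this way would be the input to the clustering tool. Keep the nucleotide files as well, since dN and dS are computed on codons, not on amino acids.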

Buffo wrote (18 months ago):

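You need to use CD-HIT, specifically cd-hit-2d, which compares two sequence sets and reports the sequences in the second set that are similar to sequences in the first set above an identity threshold. For nucleotide sequences, the equivalent program is cd-hit-est-2d.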
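
A minimal Python sketch of how cd-hit-2d could be driven and its cluster file turned into pairs like the ones asked for in the question. Assumptions: cd-hit-2d is installed and on PATH, the inputs are protein FASTA files (use cd-hit-est-2d for nucleotides), and the `.clstr` file has the usual layout, where each `>Cluster` block lists a representative from the first file marked with `*` followed by its matches from the second file. The function names are hypothetical, and the flags should be checked against `cd-hit-2d -h` for your version. Similarity clustering only gives candidate pairs: a cluster holding several sequences from the second file points to paralogs rather than a clean one-to-one ortholog.

```python
import re
import subprocess


def run_cd_hit_2d(db1, db2, out_prefix, identity=0.9):
    """Run cd-hit-2d on two FASTA files and return the path of the .clstr file.

    -i / -i2: first and second input, -o: output prefix,
    -c: identity threshold, -d 0: keep the sequence name (up to the first space)
    in the .clstr file instead of a truncated description.
    """
    cmd = ["cd-hit-2d", "-i", db1, "-i2", db2, "-o", out_prefix,
           "-c", str(identity), "-d", "0"]
    subprocess.run(cmd, check=True)
    return out_prefix + ".clstr"


# In .clstr member lines the sequence name sits between '>' and '...'.
NAME_RE = re.compile(r">(.+?)\.\.\.")


def parse_clstr_pairs(clstr_text):
    """Turn .clstr text into (representative, member) pairs.

    Assumes the representative line ends with '*' and every other
    line in the same cluster is a hit against that representative.
    """
    pairs = []
    for block in clstr_text.split(">Cluster")[1:]:
        rep, members = None, []
        for line in block.splitlines()[1:]:  # first line is the cluster number
            match = NAME_RE.search(line)
            if not match:
                continue
            if line.rstrip().endswith("*"):
                rep = match.group(1)
            else:
                members.append(match.group(1))
        if rep is not None:
            pairs.extend((rep, m) for m in members)
    return pairs
```

Usage would be along the lines of `parse_clstr_pairs(open(run_cd_hit_2d("sp1_prot.fasta", "sp2_prot.fasta", "sp1_vs_sp2")).read())`, where the file names are hypothetical.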
