Question: Find orthologous sequences between two FASTA files, each containing a set of sequences
Darill wrote (12 months ago):

Hi everyone! Let me explain what I need to do.

I have 2 files, each containing a set of gene sequences from one of 2 different species. I need to find out which of these sequences are orthologous, so that I can compare each orthologous pair (dN and dS).

Here is a hypothetical example of my files:

File 1 :

>seqB  (real name is seq 1)
AAAACCCCGGGGTTTTT
>seqE  (real name is seq 2)
ACCGGTTGACGGATGGAG
>seqC  (real name is seq 3)
AGGATTAGGATTAGGAAT

File 2:

>seqC  (real name is seq 1)
AGGACTAGGATTAGGAAA
>seqE (real name is seq 2)
ACGGGTTGACGGACGGAG
>seqB  (real name is seq 3)
AAAACCGCGGGGTTTAT

Of course, in the real files none of the sequences share a name between the two species.

What I would like is to know which of them are orthologous, for example as a file like this:

Orthologous genes between sp1 : sp2 
seq1 : seq3
seq2 : seq2
seq3 : seq1

Thank you very much for your help.

Tags: orthologous, clustering, gene

Sishuo Wang (The University of British Columbia) wrote (12 months ago):

For protein-coding genes, you can try InParanoid, OrthoMCL, GET_HOMOLOGUES, OrthoFinder, and many other tools. Since you mention that you are going to calculate dN and dS, I think you can first translate the sequences into amino acids and then run gene family clustering with one of these tools.
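
As a rough illustration of the translation step, here is a minimal Python sketch. It assumes the sequences are in-frame coding sequences that use the standard genetic code; the function name `translate` and the choice to drop trailing bases are just assumptions made for this example. In practice a library such as Biopython would probably be more convenient.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides):
    """Translate an in-frame coding sequence into amino acids.

    Stop codons become '*', codons with unknown bases become 'X',
    and an incomplete trailing codon is dropped.
    """
    seq = nucleotides.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))
```

The protein sequences written out this way could then be given to one of the clustering tools above, while the original nucleotide sequences are kept for the dN/dS step.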

Buffo wrote (12 months ago):

You need to use CD-HIT, specifically cd-hit-2d.
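
To make the idea of pairing sequences across two files by similarity concrete, here is a toy Python sketch using the example sequences from the question. It is not how cd-hit-2d or any of the orthology tools work internally: it only scores similarity with `difflib.SequenceMatcher` from the standard library and keeps reciprocal best matches. The names `similarity`, `best_match`, and `reciprocal_best_matches` are made up for this illustration, and real data would need proper alignments.

```python
from difflib import SequenceMatcher

# Toy sequences from the question, keyed by their "real" names in each file.
species1 = {
    "seq1": "AAAACCCCGGGGTTTTT",
    "seq2": "ACCGGTTGACGGATGGAG",
    "seq3": "AGGATTAGGATTAGGAAT",
}
species2 = {
    "seq1": "AGGACTAGGATTAGGAAA",
    "seq2": "ACGGGTTGACGGACGGAG",
    "seq3": "AAAACCGCGGGGTTTAT",
}


def similarity(a, b):
    """Crude similarity score between 0 and 1 (not a real alignment)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def best_match(query, targets):
    """Return the name of the target sequence most similar to the query."""
    return max(targets, key=lambda name: similarity(query, targets[name]))


def reciprocal_best_matches(sp1, sp2):
    """Keep pairs where each sequence is the other's best match."""
    pairs = []
    for name1, seq1 in sp1.items():
        name2 = best_match(seq1, sp2)
        if best_match(sp2[name2], sp1) == name1:
            pairs.append((name1, name2))
    return pairs


pairs = reciprocal_best_matches(species1, species2)

if __name__ == "__main__":
    print("Orthologous genes between sp1 : sp2")
    for name1, name2 in pairs:
        print(f"{name1} : {name2}")
```

On the toy data this should give the same pairing as the desired output in the question, but that only shows the principle. Similarity alone does not separate orthologs from paralogs on real genomes.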
