Question

OrthoFinder: How to obtain single-copy homologous sequences from the result files

5.8 years ago

Zhenpeng Yu ▴ 20

orthofinder -fg RESULTS_DIR -M msa -oa

The command described above does not yield a single copy of the homologous gene.

I also have a related question: we use protein sequences for the alignment and clustering analysis, so if I eventually want the corresponding nucleotide sequences, what should I do?

Thanks in advance!

single copy homologous sequences • 5.7k views

updated 3.1 years ago by sunnykevin97 ▴ 980 • written 5.8 years ago by Zhenpeng Yu ▴ 20

Since you already ran OrthoFinder on the protein sequences of 8 marine species and found 8000 single-copy genes (proteins), you can use the orthogroup IDs listed in SingleCopyOrthogroups.txt to pull the gene IDs of each species out of Orthogroups.txt in the results directory, like this:

grep -Fwf SingleCopyOrthogroups.txt Orthogroups.txt > OrthologsIDS.txt

Then you can retrieve the nucleotide sequences for the IDs in OrthologsIDS.txt from each species' nucleotide files.
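
If you would rather do the same selection in a script, here is a minimal Python sketch of what the grep does. It assumes Orthogroups.txt has one orthogroup per line in the form "OG0000001: geneA geneB ..." and that SingleCopyOrthogroups.txt lists one orthogroup ID per line; check your own files first. The function names and the toy IDs are made up for illustration.

```python
def read_single_copy_ids(lines):
    """Return the set of orthogroup IDs listed one per line."""
    return {line.strip() for line in lines if line.strip()}


def select_orthogroups(orthogroup_lines, wanted_ids):
    """Map each wanted orthogroup ID to its list of gene IDs.

    Assumes the layout 'OG0000001: geneA geneB ...' (an assumption about
    Orthogroups.txt; adjust if your file looks different).
    """
    selected = {}
    for line in orthogroup_lines:
        og_id, sep, genes = line.partition(":")
        if sep and og_id.strip() in wanted_ids:
            selected[og_id.strip()] = genes.split()
    return selected


# Toy data standing in for the two OrthoFinder files (IDs are invented).
single_copy = read_single_copy_ids(["OG0000002\n", "OG0000003\n"])
orthogroups = [
    "OG0000001: spA_g1 spA_g2 spB_g1\n",
    "OG0000002: spA_g3 spB_g2\n",
    "OG0000003: spA_g4 spB_g5\n",
]
result = select_orthogroups(orthogroups, single_copy)
for og_id, genes in result.items():
    print(og_id, *genes, sep="\t")
```

With real data you would pass open file handles instead of the lists, e.g. open("SingleCopyOrthogroups.txt") and open("Orthogroups.txt").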

5.8 years ago by Mehmet ▴ 820

You're amazing. The grep command is so cool that the problem is solved instantly. I sincerely thank you.

5.8 years ago by Zhenpeng Yu ▴ 20

The grep command is not working when I try it on my data. Help is needed, @Mehmet.

3.1 years ago by sunnykevin97 ▴ 980

We could do with a little more info here. E.g., what species are you working with? What data do you have at hand? What is the ultimate goal?

What do you mean by

does not yield a single copy of the homologous gene

thx

5.8 years ago by lieven.sterck 15k

I'm interested in eight marine mammals. I have found more than 8000 single-copy homologous genes with OrthoFinder and want to extract the results with a program or script, but I'm not very good at programming myself. Is there existing software or a script for this?

Thanks!

5.8 years ago by Zhenpeng Yu ▴ 20

OK, so do you want all the single-copy genes that are present in each of the 8 mammals, or simply all single-copy genes (which may be present in only a few species)?

Without going into detail: all this info can be parsed from the CSV output files you should get. Do you get those CSV files?
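
As a rough illustration of that parsing, here is a Python sketch that keeps only the orthogroups with exactly one gene in every species. It assumes the orthogroups table is tab-delimited (despite the .csv name), has one column per species after the orthogroup ID, and separates multiple genes within a cell by commas; verify this against your own file. The function name and the toy table are invented.

```python
import csv
import io


def single_copy_in_all_species(handle):
    """Yield (orthogroup_id, {species: gene_id}) for orthogroups that have
    exactly one gene in every species column."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    species = header[1:]  # first column holds the orthogroup ID
    for row in reader:
        if not row:
            continue
        og_id, cells = row[0], row[1:]
        cells += [""] * (len(species) - len(cells))  # pad short rows
        genes_per_species = [
            [g.strip() for g in cell.split(",") if g.strip()] for cell in cells
        ]
        if all(len(genes) == 1 for genes in genes_per_species):
            yield og_id, {
                sp: genes[0] for sp, genes in zip(species, genes_per_species)
            }


# Toy table standing in for the orthogroups file (species and IDs invented).
toy_table = io.StringIO(
    "\tspeciesA\tspeciesB\n"
    "OG0000001\tA_g1, A_g2\tB_g1\n"
    "OG0000002\tA_g3\tB_g2\n"
    "OG0000003\tA_g4\t\n"
)
single_copy_table = dict(single_copy_in_all_species(toy_table))
print(single_copy_table)
```

Dropping the all-species condition, or relaxing it to "at most one gene per species", would give the looser definition of single-copy genes mentioned above.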

5.8 years ago by lieven.sterck 15k

Can you also explain why you use that specific orthofinder cmdline? It's not wrong but it's not the 'default' one either

5.8 years ago by lieven.sterck 15k

The command line I described above is from the OrthoFinder documentation.

This issue has also been discussed here: https://github.com/davidemms/OrthoFinder/issues/154

5.8 years ago by Zhenpeng Yu ▴ 20

Hello, how did you sort out the sequence IDs contained in each homology group after getting the OrthologsIDS.txt file? Thank you very much.

4.1 years ago by yinbinqiu • 0

Can you describe your question in more detail? What do you want to do? The OrthologsIDS.txt file lists the ortholog families, each with one gene from each species.

4.1 years ago by Mehmet ▴ 820

I have obtained the single-copy orthogroups and the corresponding amino acid sequence files for them. How can I obtain the corresponding nucleotide sequences based on the IDs in these amino acid sequence files (such as XPxxxx for species A)? Thanks in advance.
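
One possible approach, sketched in Python under some assumptions: each species has a CDS FASTA file whose headers either start with the protein ID or carry it in a "[protein_id=...]" tag (the tag layout is my assumption about NCBI-style CDS files, so check your own headers). The function names and the toy records below are invented for illustration.

```python
import re

# Assumed header tag; adjust the pattern if your CDS headers differ.
PROTEIN_ID_TAG = re.compile(r"\[protein_id=([^\]]+)\]")


def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def protein_id_of(header):
    """Use the [protein_id=...] tag if present, else the first header word."""
    match = PROTEIN_ID_TAG.search(header)
    if match:
        return match.group(1)
    fields = header.split()
    return fields[0] if fields else ""


def cds_for_proteins(cds_lines, protein_ids):
    """Return {protein_id: nucleotide_sequence} for the requested IDs."""
    wanted = set(protein_ids)
    found = {}
    for header, seq in read_fasta(cds_lines):
        pid = protein_id_of(header)
        if pid in wanted:
            found[pid] = seq
    return found


# Toy CDS records (IDs and sequences are invented).
toy_cds = [
    ">lcl|chr1_cds_XP_000001.1_1 [gene=abc] [protein_id=XP_000001.1]\n",
    "ATGGCC\n",
    "TAA\n",
    ">XP_000002.1 some description\n",
    "ATGTTTTGA\n",
    ">XP_000003.1\n",
    "ATGTGA\n",
]
requested = ["XP_000001.1", "XP_000002.1", "XP_999999.1"]
found_cds = cds_for_proteins(toy_cds, requested)
missing_ids = sorted(set(requested) - set(found_cds))
print(found_cds)
print("not found:", missing_ids)
```

Reporting the IDs that were not found is worth keeping, since protein and CDS files from different annotation releases do not always share identifiers; in that case a separate ID mapping table would be needed.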