OrthoFinder: How to obtain single-copy homologous sequences from result files
3.2 years ago
Zhenpeng Yu ▴ 10
orthofinder -fg RESULTS_DIR -M msa -oa


The command described above does not yield a single copy of the homologous gene.

I also have a related question: the alignment and clustering analysis use protein sequences, so what should I do if I want to get the corresponding nucleotide sequences at the end?

single copy homologous sequences • 2.7k views

Since you have already run OrthoFinder on the protein sequences of 8 marine species and found 8000 single-copy genes (proteins), with orthogroups assigned in the results directory, you can pick out the gene IDs of each species using the SingleCopyOrthogroups.txt file, like this:

grep -Fwf SingleCopyOrthogroups.txt Orthogroups.txt > OrthologsIDS.txt


Then you can retrieve the nucleotide sequences for the IDs in OrthologsIDS.txt from each species' nucleotide files.
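For that last step, here is a minimal sketch with awk. It assumes that each line of OrthologsIDS.txt looks like "OG0000012: geneA geneB ..." and that the nucleotide FASTA uses exactly the same IDs as the protein FASTA (first word after ">"), which is often not true for downloaded annotations. The function name `extract_by_ids` is just an illustrative name, not part of OrthoFinder.

```shell
# Hypothetical helper: print every FASTA record whose ID is listed in the
# orthogroup file.
#   $1 = OrthologsIDS.txt (lines like "OG0000012: geneA geneB ...")
#   $2 = nucleotide FASTA file (IDs assumed identical to the protein IDs)
extract_by_ids() {
    awk '
        # First file: remember every gene ID (fields 2..NF; field 1 is "OGxxxx:").
        FILENAME == ARGV[1] { for (i = 2; i <= NF; i++) wanted[$i] = 1; next }
        # FASTA header: the ID is the first word without the leading ">".
        /^>/ { id = substr($1, 2); keep = (id in wanted) }
        # Print header and sequence lines of wanted records only.
        keep
    ' "$1" "$2"
}
```

You would call it once per species, e.g. `extract_by_ids OrthologsIDS.txt speciesA_cds.fna > speciesA_single_copy.fna`, where both FASTA file names are placeholders.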


You're amazing. The grep command is so cool that the problem was solved instantly. I sincerely thank you.


The grep command is not working when I try it on my data. Help is needed, @Mehmet.


We could do with a little more info here. E.g., what species are you working with? What data do you have at hand? What is the ultimate goal?

What do you mean by "does not yield a single copy of the homologous gene"?

thx


I'm interested in eight marine mammals. I have now found more than 8000 single-copy homologous genes with OrthoFinder, and I want to extract the results with a program or script, but I'm not very good at programming myself. So I am asking whether there is suitable software or a script for this.

Thanks!


OK, so do you want all the single-copy genes that are present in each of the 8 mammals, or simply all single-copy genes (which may be present in only a few species)?

Without going into detail: all this info can be parsed from the CSV output files you should get. Do you get those CSV files?
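As a rough sketch of that parsing, assuming you have the gene-count table (I believe it is called Orthogroups.GeneCount.tsv in recent OrthoFinder releases and .csv in older ones, but check your own results directory), and assuming it is tab-separated with a header row, the orthogroup ID in column 1, one column per species, and a final "Total" column. The function name is made up for illustration.

```shell
# Hypothetical helper: list orthogroups with exactly one gene in every species.
#   $1 = tab-separated gene-count table (assumed layout: header row,
#        orthogroup ID in column 1, one column per species, last column "Total")
single_copy_all_species() {
    awk -F '\t' '
        NR == 1 { next }                 # skip the header row
        {
            ok = 1
            # Species counts are columns 2..NF-1; the last column is the total.
            for (i = 2; i < NF; i++) if ($i != 1) ok = 0
            if (ok) print $1
        }
    ' "$1"
}
```

For example, `single_copy_all_species Orthogroups.GeneCount.tsv > single_copy_ids.txt`. For single-copy genes present in only some species you would relax the test to accept counts of 0 or 1.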


Can you also explain why you use that specific OrthoFinder command line? It's not wrong, but it's not the 'default' one either.


The command line I described above is from the OrthoFinder documentation.

This issue has also been discussed here: https://github.com/davidemms/OrthoFinder/issues/154


Hello, I would like to know how you sort out the sequence IDs contained in each homology group after you get the OrthologsIDS.txt file. Thank you very much.


Can you describe your question in more detail? What do you want to do? The OrthologsIDS.txt file lists the ortholog families, each with one gene from each species.
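If the goal is one ID list per family, something like this might do it. It is only a sketch: it assumes each line of OrthologsIDS.txt has the form "OG0000012: geneA geneB ...", and the function name and the ".ids" file suffix are my own choices.

```shell
# Hypothetical helper: write one file of gene IDs per orthogroup.
#   $1 = OrthologsIDS.txt (lines like "OG0000012: geneA geneB ...")
#   $2 = output directory (created if missing)
split_orthogroups() {
    mkdir -p "$2"
    awk -v dir="$2" '
        {
            og = $1
            sub(/:$/, "", og)                 # drop the trailing colon
            out = dir "/" og ".ids"
            for (i = 2; i <= NF; i++) print $i > out
            close(out)                        # avoid running out of open files
        }
    ' "$1"
}
```

Running `split_orthogroups OrthologsIDS.txt og_ids` would then give files such as og_ids/OG0000012.ids with one gene ID per line.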


I have obtained the single-copy orthogroups and the corresponding amino acid sequence files for those orthogroups. How can I obtain the corresponding nucleotide sequences based on the IDs in these amino acid sequence files (such as XPxxxx for species A)? Thanks in advance.


Do you need something like this?

extract sequences based on ids file
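In case the IDs are RefSeq protein accessions, here is a small sketch of the same idea. It assumes you have a CDS FASTA whose headers carry a "[protein_id=...]" tag (as in NCBI "cds_from_genomic" files, as far as I know) and a plain text file with one protein ID per line. The function name is made up; adapt the tag if your headers look different.

```shell
# Hypothetical helper: pull CDS records by the protein ID in their header.
#   $1 = text file with one protein ID per line
#   $2 = CDS FASTA whose headers contain "[protein_id=...]" (assumed format)
extract_cds_by_protein_id() {
    awk '
        # First file: remember the wanted protein IDs, skipping blank lines.
        FILENAME == ARGV[1] { if ($1 != "") wanted[$1] = 1; next }
        /^>/ {
            keep = 0
            s = index($0, "[protein_id=")
            if (s) {
                rest = substr($0, s + 12)     # text right after "[protein_id="
                e = index(rest, "]")
                pid = substr(rest, 1, e - 1)  # the ID up to the closing bracket
                keep = (pid in wanted)
            }
        }
        keep                                  # print lines of wanted records
    ' "$1" "$2"
}
```

Usage would be along the lines of `extract_cds_by_protein_id speciesA_ids.txt speciesA_cds.fna > speciesA_single_copy_cds.fna`, with placeholder file names.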


Thank you very much. It solved my problem.