Question: OrthoFinder: How to obtain single-copy homologous sequences from result files
23 months ago, Zhenpeng Yu (0), China/Nanjing/Nanjing Normal University, wrote:
orthofinder -fg RESULTS_DIR -M msa -oa

The command above does not give me the single-copy homologous genes.

I also have a second question: we use protein sequences for the alignment and clustering analysis, so if I want the corresponding nucleotide sequences in the end, what should I do?

Thanks in advance!

modified 10 weeks ago by yinbinqiu (0) • written 23 months ago by Zhenpeng Yu (0)

We could do with a little more info here. E.g., what species are you working with? What data do you have at hand? What is the ultimate goal?

What do you mean by "does not yield a single copy of the homologous gene"?

Thanks.

modified 23 months ago • written 23 months ago by lieven.sterck (7.8k)

I'm interested in eight marine mammals. I have found more than 8000 single-copy homologous genes with OrthoFinder, and I want to extract them from the results with a program or script, but I'm not very good at programming myself. So I am asking whether there is existing software or a script for this.

Thanks!

written 23 months ago by Zhenpeng Yu (0)

OK, so do you want all the single-copy genes that are present in each of the 8 mammals, or simply all single-copy genes (which may be present in only a few species)?

Without going into detail: all this info can be parsed from the CSV output files you should get. Do you get those CSV files?

modified 23 months ago • written 23 months ago by lieven.sterck (7.8k)
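
For readers who would rather script this than inspect the table by hand, here is a minimal Python sketch of the parsing described above. It assumes the orthogroup table (Orthogroups.csv in the OrthoFinder release used in this thread) is tab-separated, with a header row of species names and, per species, a single cell holding that species' genes separated by ", ". Check your own file before relying on that. The function name single_copy_orthogroups and the require_all_species switch are made up for illustration and are not part of OrthoFinder.

```python
import csv


def single_copy_orthogroups(lines, require_all_species=True):
    """Return {orthogroup: {species: gene}} for single-copy orthogroups.

    `lines` is any iterable of text lines (an open file works) from a
    tab-separated orthogroup table: first row = species names, first
    column = orthogroup ID, each cell = genes separated by ", ".
    This layout is an assumption; verify it against your own output.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    species = header[1:]  # the first column holds the orthogroup ID
    result = {}
    for row in reader:
        if not row:
            continue
        # Pad short rows (trailing empty cells) and trim over-long ones.
        padding = [""] * (len(species) - len(row) + 1)
        cells = (row[1:] + padding)[:len(species)]
        genes = [[g.strip() for g in cell.split(",") if g.strip()]
                 for cell in cells]
        counts = [len(g) for g in genes]
        # Skip empty rows and any orthogroup with a multi-copy species.
        if sum(counts) == 0 or max(counts) > 1:
            continue
        # Optionally insist on exactly one gene in every species.
        if require_all_species and min(counts) == 0:
            continue
        result[row[0]] = {sp: g[0] for sp, g in zip(species, genes) if g}
    return result
```

With require_all_species=True the function keeps only orthogroups with exactly one gene in every species (the first reading of the question above). With require_all_species=False it also keeps orthogroups that are single-copy but missing from some species. A typical call would be `with open("Orthogroups.csv", newline="") as fh: groups = single_copy_orthogroups(fh)`.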

Since you have already run OrthoFinder on the protein sequences of your 8 marine species and found 8000 single-copy genes (proteins), you can use the SingleCopyOrthogroups.txt file in the results directory to pull the gene IDs of each species out of Orthogroups.txt, like this:

grep -Fwf SingleCopyOrthogroups.txt Orthogroups.txt > OrthologsIDS.txt

Then you can retrieve the nucleotide sequences for the IDs in OrthologsIDS.txt from each species' nucleotide file.

written 23 months ago by Mehmet (510)

You're amazing. The grep command is so cool that the problem is solved instantly. I sincerely thank you.

written 23 months ago by Zhenpeng Yu (0)

Can you also explain why you use that specific OrthoFinder cmdline? It's not wrong, but it's not the 'default' one either.

written 23 months ago by lieven.sterck (7.8k)

The command line I described above is from the OrthoFinder documentation.

This issue has also been discussed here: https://github.com/davidemms/OrthoFinder/issues/154

written 23 months ago by Zhenpeng Yu (0)

Hello, how do you sort out the sequence IDs contained in each homology group once you have the OrthologsIDS.txt file? Thank you very much.

written 10 weeks ago by yinbinqiu (0)

Can you describe your question in more detail? What do you want to do? The OrthologsIDS.txt file lists the ortholog families, and each family contains one gene from each species.

written 10 weeks ago by Mehmet (510)
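
As a rough illustration of how the IDs in each family could be pulled apart, here is a small Python sketch. It assumes OrthologsIDS.txt keeps the line layout of Orthogroups.txt, that is, an orthogroup name, a colon, then the gene IDs separated by spaces. The function names parse_orthogroup_lines and flatten_ids are hypothetical helpers, not OrthoFinder tools.

```python
def parse_orthogroup_lines(lines):
    """Map each orthogroup name to the list of gene IDs on its line.

    Expects lines such as "OG0001234: geneA geneB geneC" (assumed layout).
    """
    groups = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split only at the first colon: name on the left, IDs on the right.
        name, _, members = line.partition(":")
        groups[name.strip()] = members.split()
    return groups


def flatten_ids(groups):
    """Return every gene ID across all orthogroups, in file order."""
    return [gene for members in groups.values() for gene in members]
```

For example, `with open("OrthologsIDS.txt") as fh: groups = parse_orthogroup_lines(fh)` gives a dictionary in which groups["OG0001234"] is the list of IDs in that family, and flatten_ids(groups) gives one flat ID list that can be used to look up sequences.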

I have obtained the single-copy orthogroups and the corresponding amino acid sequence files for those orthogroups. How can I obtain the corresponding nucleotide sequences based on the IDs in these amino acid sequence files (such as XPxxxx of species A)? Thanks in advance.

written 9 weeks ago by yinbinqiu (0)

You need something like this?

extract sequences based on ids file

written 9 weeks ago by Mehmet (510)
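
The content of the linked post is not reproduced in this thread, so the following is only a sketch of one way to extract sequences by ID in plain Python; it is not the linked method. It assumes each species has a nucleotide (CDS) FASTA file, and that each record can be matched to a protein ID either through a [protein_id=...] tag in the header (the style used in NCBI CDS FASTA downloads) or, failing that, through the first word of the header. If your nucleotide IDs differ from the protein IDs in some other way (for example XM_ versus XP_ accessions without such a tag), you will need a separate ID mapping first. All function names here are hypothetical.

```python
import re

# Matches a "[protein_id=XP_012345.1]" tag in a FASTA header (assumed style).
PROTEIN_ID_TAG = re.compile(r"\[protein_id=([^\]]+)\]")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def record_key(header):
    """Return the protein_id tag if present, else the first header word."""
    match = PROTEIN_ID_TAG.search(header)
    if match:
        return match.group(1)
    words = header.split()
    return words[0] if words else ""


def extract_by_ids(fasta_lines, wanted_ids):
    """Return {id: sequence} for records whose key is in wanted_ids."""
    wanted = set(wanted_ids)
    found = {}
    for header, sequence in read_fasta(fasta_lines):
        key = record_key(header)
        if key in wanted:
            found[key] = sequence
    return found
```

Any ID that is absent from the returned dictionary had no matching record, which is worth checking before building alignments from the result.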

Thank you very much. It solved my problem.

written 9 weeks ago by yinbinqiu (0)