Question: Orthogroups.csv file for orthofinder
20 months ago by
mxlsherry1992 wrote:

Dear all,

I am trying to interpret the OrthoFinder output file Orthogroups.csv. I have three input protein FASTA files, and the resulting Orthogroups.csv looks like the example below. The first two species (Clarias, Pan) have no reference genome, so their IDs look like "Trinity_DN_...". Since the IDs of these two species share the same format, how can I tell whether a given ID comes from Clarias or from Pan?

[Image: excerpt of the Orthogroups.csv output]


If the IDs used in each set are not unique, you will likely run into trouble (I'm actually surprised that BLAST did not complain about this). Before running OrthoFinder, it's a good idea to prefix the IDs in each set with a 'code' that indicates the species they come from.
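
The prefixing step suggested above could be sketched in Python roughly as follows. This is only a minimal illustration: the function name `prefix_fasta_ids`, the underscore separator, and the example record are assumptions made for this sketch, not something OrthoFinder provides or requires.

```python
def prefix_fasta_ids(lines, species_code, sep="_"):
    """Yield FASTA lines with species_code prepended to every header ID.

    Sequence lines are passed through unchanged.
    """
    for line in lines:
        if line.startswith(">"):
            # Only the start of the header changes; any description is kept.
            yield f">{species_code}{sep}{line[1:]}"
        else:
            yield line


# Hypothetical Trinity-style record, used only for illustration.
example = [
    ">TRINITY_DN1000_c0_g1_i1 len=300\n",
    "MKTAYIAKQR\n",
]
print("".join(prefix_fasta_ids(example, "Clarias")), end="")
```

To apply it to a real file, the same generator can be fed an open file handle and its output written to a new FASTA file, one call per species with a different code each time.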

Reply written 20 months ago by lieven.sterck

I think OrthoFinder does this conversion for you before running BLAST. For example, in the WorkingDirectory I got:

$ head SpeciesIDs.txt SequenceIDs.txt
==> SpeciesIDs.txt <==
0: Athaliana.fasta
1: Bdistachyon.fasta
2: Hvulgare.fasta
3: Osativa.fasta
4: Pglaucum.fasta
5: Sbicolor.fasta
6: Sitalica.fasta
7: Zmays.fasta

==> SequenceIDs.txt <==
0_0: AT1G50920.1 | Symbols:  | Nucleolar GTP-binding protein | chr1:18870555-18872570 FORWARD LENGTH=671
0_1: AT1G36960.1 | Symbols:  | unknown protein; BEST Arabidopsis thaliana protein match is: unknown protein (TAIR:AT1G48095.1); Has 54 Blast hits to 54 proteins in 2 species: Archae - 0; Bacteria - 0; Metazoa - 0; Fungi - 0; Plants - 54; Viruses - 0; Other Eukaryotes - 0 (source: NCBI BLink). | chr1:14014796-14015508 FORWARD LENGTH=181
0_2: AT1G44020.1 | Symbols:  | Cysteine/Histidine-rich C1 domain family protein | chr1:16716692-16718656 REVERSE LENGTH=577
0_3: AT1G15970.1 | Symbols:  | DNA glycosylase superfamily protein | chr1:5486544-5488494 REVERSE LENGTH=352
0_4: AT1G73440.1 | Symbols:  | calmodulin-related | chr1:27611418-27612182 FORWARD LENGTH=254
0_5: AT1G75120.1 | Symbols: RRA1 | Nucleotide-diphospho-sugar transferase family protein | chr1:28197022-28198656 REVERSE LENGTH=402
0_6: AT1G17600.1 | Symbols:  | Disease resistance protein (TIR-NBS-LRR class) family | chr1:6053026-6056572 REVERSE LENGTH=1049
0_7: AT1G51380.1 | Symbols:  | DEA(D/H)-box RNA helicase family protein | chr1:19047960-19049967 FORWARD LENGTH=392
0_8: AT1G77370.1 | Symbols:  | Glutaredoxin family protein | chr1:29073916-29074642 FORWARD LENGTH=130
0_9: AT1G44090.1 | Symbols: ATGA20OX5, GA20OX5 | gibberellin 20-oxidase 5 | chr1:16760677-16762486 REVERSE LENGTH=385
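
The two listings above suggest that an internal ID such as 0_3 combines the species index from SpeciesIDs.txt with a per-species sequence counter. Under that reading, a small Python sketch can translate an internal ID back to its input file and original accession. The function names `parse_id_file` and `resolve` are hypothetical, and taking the first whitespace-delimited token of the header as the accession is an assumption.

```python
def parse_id_file(lines):
    """Parse 'key: value' lines as shown for SpeciesIDs.txt / SequenceIDs.txt."""
    mapping = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        # Split on the first ': ' only; headers may contain further colons.
        key, _, value = line.partition(": ")
        mapping[key] = value
    return mapping


def resolve(internal_id, species_ids, sequence_ids):
    """Return (species file name, original accession) for an ID such as '0_3'."""
    species_index = internal_id.split("_", 1)[0]
    # Assumption: the accession is the first token of the original header.
    accession = sequence_ids[internal_id].split()[0]
    return species_ids[species_index], accession


# Lines copied from the listings above.
species = parse_id_file(["0: Athaliana.fasta\n", "1: Bdistachyon.fasta\n"])
sequences = parse_id_file([
    "0_4: AT1G73440.1 | Symbols:  | calmodulin-related | "
    "chr1:27611418-27612182 FORWARD LENGTH=254\n",
])
print(resolve("0_4", species, sequences))
```

With real data, both functions would be fed open handles to the two files in the WorkingDirectory.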

$ grep '^>' Species0.fa | head
Reply written 20 months ago by AK

20 months ago by
AK wrote:

Hi mxlsherry1992,

In newer versions of OrthoFinder (2.3.1 here, for example), several output files are tab-delimited, and their file endings were changed to .tsv as appropriate.

In the output file Orthogroups.tsv, the members of each family that come from different input sequence files are separated by tabs:

    Athaliana   Hvulgare    Osativa Pglaucum    Sbicolor    Sitalica    Zmays
OG0010401   AT1G09410.1, AT1G56690.1    HORVU4Hr1G052340.1  LOC_Os03g20190.1    Pgl_GLEAN_10026176      Seita.9G424600.1.p  Zm00001d028935_P001

With the newer version, the members from your first two species (Clarias, Pan) will be tab-separated and appear in the second and third columns of "Orthogroups.tsv", so you can identify them by selecting the relevant column, regardless of how the IDs are named.
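
Selecting genes by column can be sketched in Python with the standard `csv` module. This is a rough example rather than an official parser: the function name `read_orthogroups`, the column names, and the Trinity-style IDs in the demo table are all made up for illustration.

```python
import csv
import io


def read_orthogroups(handle):
    """Return {orthogroup: {species: [gene, ...]}} from a tab-delimited table."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    species = header[1:]  # the first column holds the orthogroup ID
    table = {}
    for row in reader:
        if not row:
            continue
        cells = row[1:]
        # Pad rows whose trailing empty columns were dropped.
        cells += [""] * (len(species) - len(cells))
        table[row[0]] = {
            sp: [g.strip() for g in cell.split(",") if g.strip()]
            for sp, cell in zip(species, cells)
        }
    return table


# Hypothetical table: two species whose IDs share the same format.
demo = (
    "Orthogroup\tClarias\tPan\n"
    "OG0000001\tTRINITY_DN1_c0_g1_i1, TRINITY_DN7_c0_g1_i1\tTRINITY_DN1_c0_g1_i1\n"
)
groups = read_orthogroups(io.StringIO(demo))
print(groups["OG0000001"]["Clarias"])
print(groups["OG0000001"]["Pan"])
```

Because the species is read from the column header, the same ID string can appear under both species without ambiguity.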

19 months ago by
david_emms wrote:


Just following on from what SMK said, the Orthogroups.csv file was also a tab-delimited file. Genes from different species are separated by a tab, and genes within the same species are separated by a comma. If you open it in a spreadsheet program (e.g. Excel, LibreOffice Calc) and choose 'tab' as the delimiter, it will display correctly.

All the best,
David
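
The tab-between-species, comma-within-species layout described above can also be used to check which species column each ID appears in. The following Python sketch assumes that layout; the function name `species_of_each_gene` and the short IDs in the demo are hypothetical.

```python
from collections import defaultdict


def species_of_each_gene(lines):
    """Map each gene ID to the set of species columns it appears in."""
    rows = (line.rstrip("\r\n").split("\t") for line in lines if line.strip())
    header = next(rows)
    owners = defaultdict(set)
    for row in rows:
        # zip stops at the shorter sequence, so short rows are safe.
        for species, cell in zip(header[1:], row[1:]):
            for gene in cell.split(","):  # comma-separated within a species
                gene = gene.strip()
                if gene:
                    owners[gene].add(species)
    return owners


# Hypothetical rows; the header's first cell is left blank here.
demo = [
    "\tClarias\tPan\n",
    "OG1\tT1, T2\tT1\n",
    "OG2\t\tT3\n",
]
owners = species_of_each_gene(demo)
# IDs found under more than one species cannot be told apart by name alone.
ambiguous = sorted(g for g, sp in owners.items() if len(sp) > 1)
print(ambiguous)
```

Any ID reported as ambiguous has to be identified by its column, which is the situation prefixing the IDs per species would avoid.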


Thank you! I got it!!!!

Reply written 19 months ago by mxlsherry1992