How to use CAFE from Orthofinder Results
2
1
Entering edit mode
8 weeks ago

I asked this question a few days ago but did not get any answer. I tried finding it on my but could not really get it. I have orthofinder results 'Gene_count' which gives gene count in each orthogroup in each genome. Its format is like;

OG0000032 10 11 1 13 0 35

I also have SpeciesTree_rooted that is generated by orthofinder. I need to know if someone here has used orthofinder results and some modifications (if they did) to get successful gene families expansion/contraction results from CAFE or any other software please? I am really looking for an answer!

expansion orthofinder cafe gene • 788 views
4
Entering edit mode
7 weeks ago

I have been looking into something similar (maybe, not sure myself) and I didn't know about cafe. It looks good, thanks for bringing it up. The following R code should adapt orthofinder 2.x output to cafe5.

Lots of disclaimers: This code works for me in that it makes cafe complete without errors. Whether it makes something sensible I'm not sure as I'm still getting to grips with the whole idea of expansion/contraction of gene families.

First, make orthofinder's species tree ultrametric. It should be already binary and rooted as required by cafe.

library(ape)
library(data.table)

stopifnot(is.binary(tre))
stopifnot(is.rooted(tre))

if(is.ultrametric(tre)) {
utre <- tre
} else{
utre <- chronos(tre)
}
write.tree(utre, 'cafe/SpeciesTree_rooted_ultra.txt')


Then prepare the table of counts. I use the N0 output since the original orthogroups are deprecated in favour of the phylogenetic hierarchical orthogroups:

hog <- fread('Phylogenetic_Hierarchical_Orthogroups/N0.tsv')
hog[, OG := NULL]
hog[, Gene Tree Parent Clade := NULL]
hog <- melt(hog, id.vars='HOG', variable.name='species', value.name='pid')
hog <- hog[pid != '']
hog$n <- sapply(hog$pid, function(x) length(strsplit(x, ', ')[[1]]))

# Exclude HOGs with lots of genes in a one or more species.
keep <- hog[, list(n_max=max(n)), HOG][n_max < 100]$HOG hog <- hog[HOG %in% keep] # Exclude HOGs present in only 1 species keep <- hog[, .N, HOG][N > 1]$HOG
hog <- hog[HOG %in% keep]

counts <- dcast(hog, HOG ~ species, value.var='n', fill=0)
counts[, Desc := 'n/a']
setcolorder(counts, 'Desc')
fwrite(counts, 'hog_gene_counts.tsv', sep='\t')


Run cafe5 with your favorite settings:

nohup cafe5 -i cafe/hog_gene_counts.tsv -t cafe/SpeciesTree_rooted_ultra.txt -o cafe/results --cores 30 &
nohup cafe5 -i cafe/hog_gene_counts.tsv -t cafe/SpeciesTree_rooted_ultra.txt -o cafe/results_k3 --cores 30 -k 3 &
nohup cafe5 -i cafe/hog_gene_counts.tsv -t cafe/SpeciesTree_rooted_ultra.txt -o cafe/results_p_k3 --cores 30 -p -k 3 &

0
Entering edit mode

Hi Dariober! It worked!! Thank you so much!! It would be really helpful for others as well!

0
Entering edit mode
7 weeks ago

Edit: One can also gene Gene_count file by making some amendments like adding column of Desc with values n/a before Orthogroups columns, and remove Total column, then use dos2unix. It would work.

0
Entering edit mode

Note that the orthogroups directory is deprecated.

Once I'm here, orthofinder has also a script for making the species tree ultrametetric, make_ultrametric.py. I don't know how it compares to the method I suggest above though.