Question: Prepare gene families for CAFE
qwzhang0601 (United States) wrote, 3.8 years ago:


We have sequenced a new rodent species and predicted the gene models using MAKER2. Now I am trying to find gene family expansions or losses compared with other rodents. As one of the input files for CAFE, I need to identify gene families and get the number of genes per family in each species. I am new to this field, so I wonder what the best solutions are: which pipeline, tool, or database would let me do this most easily?


gene family cafe • 4.2k views
modified 11 months ago by mayankkaashyap30 • written 3.8 years ago by qwzhang0601

Maybe this thread from a few years back can help.

written 3.8 years ago by h.mon

Hey Santiago and Ben, I have run into some trouble with CAFE: 1. I used the output of OrthoFinder, which provides a species tree and gene families (orthogroups). I am not sure whether this species tree can be used as the input tree for CAFE. 2. After loading the OrthoFinder species tree into r8s to generate an ultrametric tree, the parameters nsites and fixage confuse me. How can I obtain this information? Are there any tools or software that can help? Thanks very much.

written 2.5 years ago by liangdong03090

Hi, I am using CAFE to isolate the longest isoform for each gene (the first step in the pipeline). Can anyone tell me how long this step should take? Mine has been running for days with no output! The installation and file formats are all correct! Thanks!

written 11 months ago by mayankkaashyap30

No, this should be a fairly quick process. The script is quite simple, so I suggest you take a look at it and see if you can isolate the problem. Otherwise, we may need your input files to see if we can reproduce the issue.
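
For reference, the logic of a longest-isoform step can be sketched in a few lines of Python, which may help when checking where a run gets stuck. This is not the tutorial's script, only a minimal illustration. It assumes (hypothetically) that each FASTA header starts with an ID of the form `geneA.2`, where the part after the last dot is the isoform number; `gene_id_from_header` is an invented helper that you would adapt to your own annotation.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gene_id_from_header(header):
    # ASSUMPTION: the first token looks like "geneA.2"; drop the ".2" suffix.
    return header.split()[0].rsplit(".", 1)[0]


def longest_isoforms(records):
    """Keep, for each gene, the (header, sequence) pair with the longest sequence."""
    best = {}
    for header, seq in records:
        gene = gene_id_from_header(header)
        if gene not in best or len(seq) > len(best[gene][1]):
            best[gene] = (header, seq)
    return best


# Tiny made-up example: geneA has two isoforms, geneB has one.
example = """>geneA.1
MKT
>geneA.2
MKTLLV
>geneB.1
MA
""".splitlines()

result = longest_isoforms(read_fasta(example))
for gene, (header, seq) in result.items():
    print(gene, header, len(seq))
```

A single pass like this is linear in the size of the input, so on a typical proteome it should finish in seconds to minutes rather than days.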

written 11 months ago by Ben Fulton
Santiago Montero-Mendieta wrote, 3.7 years ago:

I was one of the participants at the 2017 Workshop on Phylogenomics in Český Krumlov. Here you can find the CAFE TUTORIAL:

UPDATE: the files and Python scripts needed to follow the CAFE tutorial are available at the CAFE v.4 website (Tutorial files).

modified 3.7 years ago • written 3.7 years ago by Santiago Montero-Mendieta
Ben Fulton (Bloomington, IN) wrote, 3.7 years ago:

A pipeline that is often used with CAFE might look something like the following:

  1. Isolate the longest isoform for each gene
  2. Create a single file containing all of these isoforms
  3. Run all-by-all BLAST on the file, optionally filtering low complexity regions with the -seg parameter
  4. Find clusters of similar sequences. You can use mcl for this.
  5. Parse the mcl output to tabulate the number of gene copies found in each species for each gene family.
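
As a rough illustration of step 5, here is a minimal Python sketch. It relies on two assumptions that are not stated above: that the mcl output has one cluster per line with whitespace-separated sequence labels, and that every label was prefixed with its species name and a `|` before running BLAST (for example `mouse|g1`). The function names and the `Desc` / `Family ID` header layout are illustrative; check them against the input format your CAFE version expects.

```python
from collections import Counter


def count_families(mcl_lines, sep="|"):
    """Return one Counter of species -> gene copies per mcl cluster line."""
    families = []
    for line in mcl_lines:
        members = line.split()
        if members:
            # ASSUMPTION: labels look like "species|gene_id".
            families.append(Counter(m.split(sep, 1)[0] for m in members))
    return families


def to_table(families, species):
    """Build a tab-delimited count table with one row per gene family."""
    rows = ["\t".join(["Desc", "Family ID"] + species)]
    for i, counts in enumerate(families, start=1):
        # A Counter returns 0 for species that are absent from the family.
        cells = [str(counts[s]) for s in species]
        rows.append("\t".join(["(null)", str(i)] + cells))
    return "\n".join(rows)


# Made-up clusters for three hypothetical species.
mcl_lines = [
    "mouse|g1\tmouse|g2\trat|g7",
    "rat|g3\tsquirrel|g9",
]
species = ["mouse", "rat", "squirrel"]
table = to_table(count_families(mcl_lines), species)
print(table)
```

Prefixing the species name onto every sequence ID before step 2 is what makes this tabulation trivial, so it is worth doing when the isoform files are combined.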

You may want to filter out gene families with large variances in copy number across species for better accuracy. This is all based on a tutorial written for CAFE and presented at the Workshop on Phylogenomics in Český Krumlov this year. I believe you can find it online.
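
One way to do that kind of filtering is sketched below. It is only an illustration: it uses the population variance of the per-species copy numbers as the criterion, and the threshold `max_variance` is a hypothetical value that you would have to choose for your own data. The tutorial may use a different criterion.

```python
from statistics import pvariance


def split_by_variance(families, max_variance):
    """Split {family_id: [counts per species]} into (kept, dropped) dicts."""
    kept, dropped = {}, {}
    for fam_id, counts in families.items():
        # Population variance of copy number across species for this family.
        target = kept if pvariance(counts) <= max_variance else dropped
        target[fam_id] = counts
    return kept, dropped


# Made-up counts for three species; family "3" has one very large count.
families = {
    "1": [2, 1, 0],
    "2": [1, 1, 1],
    "3": [120, 3, 2],
}
kept, dropped = split_by_variance(families, max_variance=100.0)
print("kept:", sorted(kept))
print("dropped:", sorted(dropped))
```

Keeping the dropped families in a separate table, rather than discarding them, lets you analyse them on their own later if needed.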

written 3.7 years ago by Ben Fulton