Question

Prepare gene families for CAFE

1

7.2 years ago

qwzhang0601 ▴ 80

Hello:

We have sequenced a new rodent species and predicted the gene models using MAKER2. I am now trying to find gene family expansions and losses relative to other rodents. As one of the input files for CAFE, I need to identify gene families and get the number of genes per family in each species. I am new to this field, so I wonder which approach is best: what pipeline, tool, or database would let me do this relatively easily?

Thanks

gene family CAFE • 7.7k views

ADD COMMENT • link updated 4.4 years ago by mayankkaashyap ▴ 30 • written 7.2 years ago by qwzhang0601 ▴ 80

0

Maybe this thread from a few years back can help.

ADD REPLY • link 7.2 years ago by h.mon 35k

0

Hey Santiago and Ben, I have run into some trouble with CAFE: 1. I am using the output of OrthoFinder, which provides a species tree and gene families (orthogroups). I am not sure whether this species tree can be used as the input tree for CAFE. 2. When I load the OrthoFinder species tree into r8s to generate an ultrametric tree, the nsites and fixage parameters confuse me. How can I obtain this information? Are there any tools or software that can help? Thanks very much.

ADD REPLY • link 5.9 years ago by liangdong0309 • 0

0

Hi, I am using CAFE to isolate the longest isoform for each gene (the first step in the pipeline). Can anyone tell me how long this step takes? Mine has been running for days with no output! Installation and file formats are all correct! Thanks!

ADD REPLY • link 4.4 years ago by mayankkaashyap ▴ 30

0

No, this should be a fairly quick process. The script (cafetutorial_longest_iso.py) is quite simple, so I suggest you take a look at it and see if you can isolate the problem. Otherwise, we may need your input files to see if we can reproduce the issue.
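
For orientation, here is a minimal sketch of what longest-isoform selection amounts to. It is not the code of cafetutorial_longest_iso.py. It assumes FASTA headers of the hypothetical form `geneID|transcriptID`, so the `gene_of` helper will need adapting to your own headers. A single pass like this over a proteome should finish quickly, which may help when checking where a run is stalling.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def longest_isoforms(records, gene_of=lambda h: h.split("|")[0]):
    """Keep the longest sequence per gene.

    gene_of maps a header to its gene ID; the default assumes the
    hypothetical 'geneID|transcriptID' header layout.
    """
    best = {}
    for header, seq in records:
        gene = gene_of(header)
        if gene not in best or len(seq) > len(best[gene][1]):
            best[gene] = (header, seq)
    return best


# Toy input: geneA has two isoforms, geneB has one.
example = """>geneA|t1
MKT
>geneA|t2
MKTLLV
>geneB|t1
MA
""".splitlines()

result = longest_isoforms(read_fasta(example))
for gene, (header, seq) in result.items():
    print(gene, header, len(seq))
```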

ADD REPLY • link 4.4 years ago by Ben Fulton ▴ 150

score 5 · Answer 1 · 2017-03-24

I was one of the participants at the 2017 Workshop on Phylogenomics in Český Krumlov. Here you can find the CAFE TUTORIAL: http://evomicsorg.wpengine.netdna-cdn.com/wp-content/uploads/2016/06/cafe_tutorial-1.pdf

UPDATE: the files and python scripts needed to follow the CAFE tutorial are available at the CAFE v.4 website: https://hahnlab.github.io/CAFE (Tutorial files: https://iu.app.box.com/v/cafetutorial-files )

score 3 · Answer 2 · 2017-03-17

A pipeline that is often used with CAFE might look something like the following:

1. Isolate the longest isoform for each gene.
2. Create a single file containing all of these isoforms.
3. Run all-by-all BLAST on the file, optionally filtering low-complexity regions with the -seg parameter.
4. Find clusters of similar sequences. You can use mcl (http://micans.org/mcl/) for this.
5. Parse the mcl output to tabulate the number of gene copies found in each species for each gene family.
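
The last step above can be sketched roughly as follows. This is only an illustration, not the tutorial's own script. It assumes the mcl output has one cluster per line with whitespace-separated sequence IDs, and that each ID carries a hypothetical `species|gene` prefix (adapt `species_of` to your naming scheme). The `Desc` and `Family ID` leading columns follow the tab-delimited layout I understand CAFE to expect, so please check that against the CAFE manual.

```python
from collections import Counter


def tabulate_mcl(cluster_lines, species_of=lambda seq_id: seq_id.split("|")[0]):
    """Turn mcl clusters into a table of gene counts per species.

    cluster_lines: one cluster per line, members separated by whitespace.
    species_of: maps a sequence ID to its species (assumed 'species|gene').
    Returns a list of rows; the first row is the header.
    """
    families = []
    species = set()
    for line in cluster_lines:
        members = line.split()
        if not members:
            continue
        counts = Counter(species_of(m) for m in members)
        species.update(counts)
        families.append(counts)

    species = sorted(species)
    rows = [["Desc", "Family ID"] + species]
    for family_id, counts in enumerate(families, start=1):
        # Counter returns 0 for species absent from this family.
        rows.append(["(null)", str(family_id)] + [str(counts[sp]) for sp in species])
    return rows


# Toy clusters with made-up IDs.
clusters = [
    "mouse|g1\tmouse|g2\trat|g7",
    "rat|g9\tnewrodent|g3",
]
rows = tabulate_mcl(clusters)
for row in rows:
    print("\t".join(row))
```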

You may want to filter out gene families whose copy numbers vary widely across species, for better accuracy. This is all based on a tutorial written for CAFE and presented at the Workshop on Phylogenomics in Český Krumlov this year. I believe you can find it online.
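
One way to picture the variance filter mentioned above is the sketch below. The cutoff value is an arbitrary placeholder, not a recommendation from the tutorial, and the family names and counts are made up. It uses the population variance of the per-species counts; other criteria (for example, a cap on the maximum copy number in any one species) would be equally reasonable.

```python
from statistics import pvariance


def filter_by_variance(family_counts, max_variance):
    """Split families into (kept, dropped) by variance of per-species counts.

    family_counts: dict mapping family ID -> list of gene counts, one per species.
    max_variance: hypothetical cutoff; families above it are set aside.
    """
    kept, dropped = {}, {}
    for family_id, counts in family_counts.items():
        target = kept if pvariance(counts) <= max_variance else dropped
        target[family_id] = counts
    return kept, dropped


# Toy data: fam2 has one species with a very large copy number.
family_counts = {
    "fam1": [2, 1, 3],
    "fam2": [1, 150, 2],
}
kept, dropped = filter_by_variance(family_counts, max_variance=100)
print("kept:", sorted(kept))
print("dropped:", sorted(dropped))
```

Families that are set aside do not have to be discarded; they can be kept in a separate file and analysed on their own later.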