Prepare gene families for CAFE
4.9 years ago
qwzhang0601 ▴ 70

Hello:

We have sequenced a new rodent species and predicted the gene models using maker2. Now I am trying to find gene family expansions or losses compared with other rodents. As one of the input files for CAFE, I need to identify gene families and get the number of genes in each species. I am new to this field, so I wonder which solutions are best: what pipeline, tool, or database would let me do this relatively easily?

Thanks

gene family CAFE • 5.6k views

Maybe this thread from a few years back can help.


Hey Santiago and Ben, I have run into some trouble with CAFE:

  1. I used OrthoFinder, and part of its output is a species tree and the gene families (orthogroups). I am not sure whether this species tree can be used as the input tree for CAFE.
  2. When I load the OrthoFinder species tree into r8s to generate an ultrametric tree, the parameters nsites and fixage confuse me. How can I obtain this information? Are there any tools or software that can help?

Thanks very much


Hi, I am using CAFE to isolate the longest isoform for each gene (the first step in the pipeline). Can anyone tell me how long this step should take? Mine has been running for days with no output! The installation and file formats are all correct! Thanks!


No, this should be a fairly quick process. The script (cafetutorial_longest_iso.py) is quite simple, so I suggest you take a look at it and see if you can isolate the problem. Otherwise, we may need your input files to see if we can reproduce the issue.
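For comparison while debugging, here is a minimal sketch of what "keep the longest isoform per gene" amounts to. It is not the tutorial script. It assumes FASTA headers of the hypothetical form `>transcript_id gene=GENE_ID`, so `gene_from_header` would need adapting to your own annotation. A single pass like this over a proteome should finish quickly, which is why a run lasting days points to a problem with the input or the environment.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gene_from_header(header):
    """Hypothetical header layout: 'tx1 gene=G1'. Adapt to your files."""
    for field in header.split():
        if field.startswith("gene="):
            return field[len("gene="):]
    # Fall back to the first token if no gene= field is present.
    return header.split()[0]


def longest_isoforms(lines, gene_of=gene_from_header):
    """Return {gene: (header, sequence)} keeping the longest sequence per gene."""
    best = {}
    for header, seq in read_fasta(lines):
        gene = gene_of(header)
        if gene not in best or len(seq) > len(best[gene][1]):
            best[gene] = (header, seq)
    return best


# Toy input: gene G1 has two isoforms, gene G2 has one.
fasta = [">tx1 gene=G1", "MKT", ">tx2 gene=G1", "MKTAYI", "AK", ">tx3 gene=G2", "MA"]
best = longest_isoforms(fasta)
```

With a real file you would pass an open file handle instead of the toy list, for example `longest_isoforms(open("proteins.fa"))`, where the file name is a placeholder.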

4.9 years ago
biomonte ▴ 160

I was one of the participants at the 2017 Workshop on Phylogenomics in Český Krumlov. Here you can find the CAFE TUTORIAL: http://evomicsorg.wpengine.netdna-cdn.com/wp-content/uploads/2016/06/cafe_tutorial-1.pdf

UPDATE: the files and python scripts needed to follow the CAFE tutorial are available at the CAFE v.4 website: https://hahnlab.github.io/CAFE (Tutorial files: https://iu.app.box.com/v/cafetutorial-files )

4.9 years ago
Ben Fulton ▴ 110

A pipeline that is often used with CAFE might look something like the following:

  1. Isolate the longest isoform for each gene
  2. Create a single file containing all of these isoforms
  3. Run all-by-all BLAST on the file, optionally filtering low complexity regions with the -seg parameter
  4. Find clusters of similar sequences. You can use mcl (http://micans.org/mcl/) for this
  5. Parse the mcl output to tabulate the number of gene copies found in each species for each gene family.
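Step 5 can be sketched roughly as follows. This is an illustration under assumptions, not the tutorial's own script. It assumes the mcl output has one cluster per line with whitespace-separated sequence IDs, and that each ID carries a species prefix such as `mouse|gene1` (the `|` separator and the species names are made up for the example). The output layout, with `Desc` and `Family ID` columns followed by one column of counts per species, follows my understanding of the CAFE input format, so check it against the CAFE documentation for your version.

```python
from collections import Counter


def count_families(mcl_lines, sep="|"):
    """Count genes per species in each cluster (one cluster per line)."""
    rows = []
    species = set()
    for line in mcl_lines:
        genes = line.split()
        if not genes:
            continue  # skip blank lines
        # The species label is assumed to be everything before the separator.
        counts = Counter(gene.split(sep, 1)[0] for gene in genes)
        species.update(counts)
        rows.append(counts)
    return sorted(species), rows


def format_table(species, rows):
    """Build a tab-delimited table: Desc, Family ID, then one column per species."""
    lines = ["\t".join(["Desc", "Family ID"] + species)]
    for family_id, counts in enumerate(rows, start=1):
        cells = [str(counts.get(name, 0)) for name in species]
        lines.append("\t".join(["(null)", str(family_id)] + cells))
    return "\n".join(lines)


# Toy mcl output with two clusters.
example = ["mouse|g1\tmouse|g2\trat|g7", "rat|g3\tsquirrel|g9"]
species, rows = count_families(example)
table = format_table(species, rows)
```

Species that are absent from a cluster get a count of 0, so every family row has the same number of columns.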

You may want to filter out gene families with large variances in copy number for better accuracy. This is all based on a tutorial written for CAFE and presented at the Workshop on Phylogenomics in Český Krumlov this year. You can find it online I believe.
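One way to do the variance filtering is sketched below. The cutoff of 100 is an arbitrary placeholder chosen for the example, not a recommended value, and the tutorial may use a different criterion, so treat this only as an illustration of the idea. It uses the population variance of the per-species counts in each family.

```python
from statistics import pvariance


def split_by_variance(families, max_variance):
    """Split {family_id: [counts per species]} into kept and dropped families."""
    kept, dropped = {}, {}
    for family_id, counts in families.items():
        # Families whose copy numbers vary too much across species are set aside.
        target = kept if pvariance(counts) <= max_variance else dropped
        target[family_id] = counts
    return kept, dropped


# Toy data: fam2 has one species with 150 copies.
families = {"fam1": [2, 3, 2, 3], "fam2": [1, 1, 150, 2]}
kept, dropped = split_by_variance(families, max_variance=100.0)
```

Keeping the dropped families in a separate dictionary rather than discarding them makes it possible to analyze them separately later.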
