How to get the IDs of expanded and contracted gene families from CAFE results
5.5 years ago
xzpgocxx ▴ 20

Dear all,

I used the Python script cafetutorial_report_analysis.py to summarize the CAFE output and obtained the pub, node, fams and anc files, following the tutorial (CAFE: Computational Analysis of gene Family Evolution).

1) In the pub file, how should I interpret an entry such as 45(13) in the second column? From the node file I can see that 45 is the number of expanded families, but what is 13? The most expanded group?

Species   Expanded fams  Genes gained  genes/expansion  Contracted fams  Genes lost  genes/contraction  No change  Avg. Expansion
speciesA  45(13)         368           8.18             143(8)           287         2.01               159        3889

2) In the node file, I can see how many families underwent expansion, contraction and rapid evolution. But how do I get the IDs of the relevant families?

3) In the fams file, how should I interpret the results?

# The labeled CAFE tree:

speciesA 3940[+30*],83[-6*]

Any help is much appreciated. Thanks.

CAFE genome gene • 4.0k views
5.5 years ago
Ben Fulton ▴ 150

This answer is copied from the identical question asked in the CAFE Google Group (https://groups.google.com/forum/#!forum/hahnlabcafe).

1) The number in parentheses is the number of rapidly changing families. So, in the Expanded fams column you see 45(13). That means that for species A, CAFE found that 45 gene families had expanded, with 13 of those 45 being rapid expansions.
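As a small illustration of reading such a cell, here is a sketch in Python. The helper name `parse_pub_cell` is made up for this example and is not part of CAFE or its scripts; it only assumes the `total(rapid)` layout described above.

```python
import re

def parse_pub_cell(cell):
    """Split a pub-file cell such as '45(13)' into (total_families, rapid_families)."""
    match = re.fullmatch(r"\s*(\d+)\((\d+)\)\s*", cell)
    if match is None:
        raise ValueError(f"unexpected cell format: {cell!r}")
    return int(match.group(1)), int(match.group(2))

# Values taken from the speciesA row in the question.
expanded, rapid_expanded = parse_pub_cell("45(13)")
contracted, rapid_contracted = parse_pub_cell("143(8)")
print(expanded, rapid_expanded)      # 45 13
print(contracted, rapid_contracted)  # 143 8
```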

2) The columns in the node file are Node, Expansions, Contractions, Rapids. Are you asking about the first column, Node? If so, these are the node labels of your input phylogeny, with internal nodes labeled by CAFE. To get the CAFE-labeled phylogeny, look in the report file for the line that starts with "# IDs of nodes:"; alternatively, the report analysis script should print this phylogeny out after it has run.

To get the IDs of the groups that are expanding and contracting for a particular node, you'll have to look in the fams file. In the first column will be the node, and in the second column are the IDs, comma separated. You'll have to parse out the expansions and contractions, but after you do that there should be 45 expansions.
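The parsing step could look roughly like the Python sketch below. It assumes each fams line has a node label, then whitespace, then comma-separated entries of the form `ID[+N]` or `ID[-N]`, optionally with a trailing `*` inside the brackets; the exact separator in your file may differ, and `parse_fams_line` is a hypothetical helper, not something shipped with CAFE.

```python
import re

# One entry looks like 3940[+30*] or 83[-6]; the asterisk is optional.
ENTRY = re.compile(r"^(?P<fam>[^\[\]]+)\[(?P<sign>[+-])(?P<n>\d+)(?P<rapid>\*?)\]$")

def parse_fams_line(line):
    """Return (node, expanded, contracted) for one line of the fams file.

    expanded and contracted are lists of (family_id, genes_changed, is_rapid).
    """
    node, entries = line.strip().split(None, 1)
    node = node.rstrip(":")  # tolerate a label written as 'speciesA:'
    expanded, contracted = [], []
    for entry in entries.split(","):
        m = ENTRY.match(entry.strip())
        if m is None:
            continue  # skip anything that does not look like ID[+N] or ID[-N]
        record = (m.group("fam"), int(m.group("n")), m.group("rapid") == "*")
        if m.group("sign") == "+":
            expanded.append(record)
        else:
            contracted.append(record)
    return node, expanded, contracted

node, expanded, contracted = parse_fams_line("speciesA 3940[+30*],83[-6*]")
print(node, expanded, contracted)
```

Comment lines such as the one starting with `#` would need to be skipped before calling the helper; counting `len(expanded)` for a node then gives a number you can compare with the pub file.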

You can get the same information from one of the summary report files, xxxxx_anc.txt. This file lists every gene family with its gene count in each species as well as in each internal node. The species and internal nodes are also assigned numbers in ascending order (following the topology of the input tree), so you can arrange the columns in ascending order of this number. Then, for whichever species you want the list of expanded or contracted families, subtract the species' gene count for the family from the gene count at its immediate ancestral (parent) node:

  • if the result is positive, the family is contracted
  • if the result is negative, the family is expanded
  • if the result is 0, the family is unchanged
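The subtraction rule can be sketched in Python as follows. The family IDs and counts here are invented purely for illustration, and the code does not read the anc file itself; it only shows the parent-minus-species comparison once you have the two counts for each family.

```python
def classify_family(parent_count, species_count):
    """Classify a family by parent-node count minus species count."""
    diff = parent_count - species_count
    if diff > 0:
        return "contracted"  # the species has fewer genes than its parent node
    if diff < 0:
        return "expanded"    # the species has more genes than its parent node
    return "no change"

# Hypothetical counts: family ID -> (count at parent node, count in speciesA)
counts = {"famA": (10, 40), "famB": (12, 6), "famC": (5, 5)}

result = {fam: classify_family(parent, species)
          for fam, (parent, species) in counts.items()}
print(result)
# {'famA': 'expanded', 'famB': 'contracted', 'famC': 'no change'}
```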

(However, if you use the -r 0 flag when running report_analysis, it will print out the IDs of all changing families in the fams file, not just the rapidly changing ones.)

3) This means that in species A, family 3940 has gained 30 genes and the asterisk means this is a rapid change. Also, family 83 has lost 6 genes, and again the asterisk means it is a rapid change.
