Workflow for cross-species analysis of a large gene family
4.0 years ago
lessismore ★ 1.3k

Dear all,

I'm dealing with a large set of protein sequences from the same transcription factor family, collected from distantly related organism families. I'd like to know which good practices you commonly follow for this kind of pipeline, as I've found that papers are very vague when they describe it in their methods.

My analysis so far:
- An HMM search to identify putative family members in my target species.
- Filtering each gene to keep only its longest variant.
- A search against the Conserved Domain Database (CDD; by the way, have you used it, and with concise or full output?), which I use to keep only sequences with complete domains and a significant hit for specific domain types.
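
For illustration, the "longest variant per gene" step above could be sketched roughly as follows. This is only a minimal sketch in plain Python, not the exact script used here. It assumes a hypothetical header convention of `>geneID.variantNumber` (for example `>geneA.2`); the function names and the `gene_of` parameter are made up for this example and would need adapting to the naming scheme of each genome annotation.

```python
from typing import Callable, Dict, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (header, sequence) pairs.

    Only the first whitespace-delimited token of each header is kept.
    """
    records: List[Record] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].split()[0]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def gene_id_from_header(header: str) -> str:
    """ASSUMPTION: headers look like 'geneID.variantNumber'."""
    return header.rsplit(".", 1)[0]


def longest_variant_per_gene(
    records: List[Record],
    gene_of: Callable[[str], str] = gene_id_from_header,
) -> List[Record]:
    """Keep one record per gene: the longest sequence (first one wins ties)."""
    best: Dict[str, Record] = {}
    for header, seq in records:
        gene = gene_of(header)
        if gene not in best or len(seq) > len(best[gene][1]):
            best[gene] = (header, seq)
    return list(best.values())


example = """>geneA.1
MKT
>geneA.2
MKTLLV
>geneB.1
MSS
"""
kept = longest_variant_per_gene(parse_fasta(example))
for header, seq in kept:
    print(f">{header}\n{seq}")
```

Passing a different `gene_of` function is the intended way to handle annotations where the gene-to-transcript mapping is not encoded in the header, for example a lookup built from a GFF file.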

Now I have a few questions:

  1. Alignment
    Which algorithm and software do you recommend for the alignment?
  2. Post-alignment processing
    After the alignment, do you trim it to only the TF binding domain to make tree construction easier?
  3. Phylogeny
    Which program do you recommend for tree construction with >500 sequences?
    Which algorithms do you recommend for tree construction?
    How many bootstrap replicates do you run?
    Do you suggest collapsing nodes with low bootstrap support, e.g. below 70?
  4. Tree annotation and publication-ready figures
    Which program do you use for annotating the tree?
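
To make the collapsing question in item 3 concrete, here is a minimal sketch of what I mean: internal nodes whose support falls below a cutoff are removed and their children are attached to the parent, producing a polytomy. The `Node` class, the function names, and the cutoff of 70 are assumptions for this toy example only. It ignores branch lengths, and a real analysis would use a dedicated phylogenetics library rather than this hand-rolled structure.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Toy tree node (hypothetical, for illustration only)."""
    name: str = ""
    support: Optional[float] = None  # bootstrap support of the branch above this node
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def collapse_low_support(node: Node, threshold: float = 70.0) -> Node:
    """Collapse internal nodes with support < threshold into polytomies (in place)."""
    new_children: List[Node] = []
    for child in node.children:
        collapse_low_support(child, threshold)  # handle the subtree first
        weak = (
            not child.is_leaf()
            and child.support is not None
            and child.support < threshold
        )
        if weak:
            # Drop the weak node and attach its children to the current node.
            new_children.extend(child.children)
        else:
            new_children.append(child)
    node.children = new_children
    return node


def to_newick(node: Node) -> str:
    """Serialize topology and support values (no branch lengths, no trailing ';')."""
    if node.is_leaf():
        return node.name
    inner = ",".join(to_newick(c) for c in node.children)
    label = "" if node.support is None else f"{node.support:g}"
    return f"({inner}){label}"


# ((A,B)95,(C,D)40): the (C,D) clade is poorly supported.
tree = Node(children=[
    Node(support=95, children=[Node("A"), Node("B")]),
    Node(support=40, children=[Node("C"), Node("D")]),
])
collapse_low_support(tree, threshold=70)
print(to_newick(tree) + ";")
```

In this toy case the well-supported (A,B) clade is kept, while C and D end up attached directly to the root.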

If you can answer even one or a few of these questions, that would already help a lot.
Thanks in advance.

phylogeny gene families • 638 views
4.0 years ago

To build the trees, you could use Heng Li's TreeBeST, which is used by Ensembl Compara. You can also look at Ensembl Compara to see what their pipeline looks like.
