Is It Good To Build A Phylogeny From The Data Below?
12.0 years ago
Sagar Nikam

I want both structure and sequence data, so I gave the following query to the PDB:

  • Molecule Type=protein
  • Experimental Method=X-RAY
  • Enzyme Classification is 3: Hydrolases
  • Number of Chains Search: Min Number of Chains=1, Max Number of Chains=1
  • Homologue Removal: 30% Identity Cutoff

It returns 926 results, and the proteins have very different lengths, from about 100 to 900 residues. Is it a good idea to build a phylogeny from this data? Will it produce a standard tree, and what problems should I expect? I want a neighbor-joining tree; will the sequences cluster correctly?


Please clarify your question.

12.0 years ago
Jan Kosinski

No, this is not a good dataset for phylogeny. The two most obvious reasons:

  • You have selected all hydrolases, and many of them can be completely dissimilar at the sequence level, which precludes any reliable sequence-based phylogeny.
  • Even if all hydrolases were similar at the sequence level, by selecting ONLY hydrolases with solved structures, your phylogeny would not describe the evolution of all hydrolases, just the non-representative subset that happens to have solved structures. Such a biased selection of sequences can lead to wrong conclusions. You should add sequences from a sequence database instead (e.g. run BLAST with every hydrolase in the PDB against the NCBI nr database and merge the collected sequences into a single dataset).
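A minimal sketch of the merging step, assuming the BLAST hits for each PDB query have already been saved as FASTA text (the BLAST run itself is not shown). Treating the first whitespace-delimited header token as the accession and dropping only exact-duplicate sequences are assumptions; a real pipeline might also reduce redundancy at some identity threshold.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def merge_hit_sets(fasta_texts):
    """Merge several FASTA hit sets into one dataset, keeping the first
    record seen for each accession and skipping identical sequences."""
    seen_ids, seen_seqs, merged = set(), set(), []
    for text in fasta_texts:
        for header, seq in parse_fasta(text):
            # Assumption: accession = first whitespace-delimited header token.
            acc = header.split()[0]
            seq = seq.upper()
            if acc in seen_ids or seq in seen_seqs:
                continue
            seen_ids.add(acc)
            seen_seqs.add(seq)
            merged.append((acc, seq))
    return merged


def to_fasta(records, width=60):
    """Format (accession, sequence) pairs as FASTA text with wrapped lines."""
    out = []
    for acc, seq in records:
        out.append(f">{acc}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

Deduplicating on both accession and sequence means a hit returned by several PDB queries is kept once, in the order it was first seen.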

If you just want to CLUSTER hydrolases with known structures by sequence, use the tools linked at http://www.biostars.org/post/show/7220/automatic-clustering-of-biological-sequences/ They will separate your dataset into groups with detectable sequence similarity.
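The linked tools are the proper way to do this. As a rough illustration of what "groups with detectable sequence similarity" means, here is single-linkage clustering (connected components via union-find) over a table of all-against-all BLAST E-values. This is a swapped-in technique, not necessarily what those tools implement; the input format (a dict keyed by ID pairs) and the E-value cutoff of 1e-5 are assumptions.

```python
def cluster_by_similarity(ids, hits, max_evalue=1e-5):
    """Single-linkage clustering: two IDs end up in the same group if a
    chain of hits with E-value <= max_evalue connects them.

    ids  -- all sequence IDs, including ones without any hits
    hits -- {(id_a, id_b): evalue}; both IDs must appear in `ids`
    """
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), evalue in hits.items():
        if evalue <= max_evalue:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra  # merge the two groups

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    # Largest groups first; ties broken by first member ID.
    return sorted(groups.values(), key=lambda g: (-len(g), g[0]))
```

Single linkage chains weak similarities together, so one spurious hit can fuse two unrelated groups; the dedicated tools are more careful about this.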

If you want to CLUSTER by structural similarity, compare all hydrolase structures all-against-all, build an RMSD matrix, and cluster it with neighbor joining. Note that hydrolases occur in all structural classes and in many folds, so RMSDs between clusters from different folds will be meaningless.
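For the last step, a plain-Python sketch of neighbor joining on a precomputed distance matrix. It assumes the pairwise RMSDs already come from whatever structural aligner you use (superposition is not shown) and returns an unrooted tree in Newick format. RMSDs are not additive tree distances, so some branch lengths may come out negative; this sketch leaves them as-is.

```python
def neighbor_joining(labels, dist):
    """Build an unrooted neighbor-joining tree from a symmetric distance
    matrix (e.g. pairwise RMSDs) and return it as a Newick string."""
    if len(labels) < 3:
        raise ValueError("neighbor joining needs at least 3 taxa")
    # Distance table keyed by node name; internal nodes are named by their
    # Newick subtree string, so every key stays unique.
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(labels)}
         for i, a in enumerate(labels)}
    nodes = list(labels)
    while len(nodes) > 3:
        n = len(nodes)
        # Row sums r_i = sum_k d(i, k) over the currently active nodes.
        r = {a: sum(d[a][k] for k in nodes) for a in nodes}
        # Pick the pair minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = nodes[x], nodes[y]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the new internal node u to a and b.
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        u = f"({a}:{la:.3f},{b}:{lb:.3f})"
        # Distances from u to every remaining node.
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (a, b):
                d[u][k] = d[k][u] = 0.5 * (d[a][k] + d[b][k] - d[a][b])
        nodes = [k for k in nodes if k not in (a, b)] + [u]
    # Three nodes left: join them at one central node.
    a, b, c = nodes
    la = 0.5 * (d[a][b] + d[a][c] - d[b][c])
    lb = 0.5 * (d[a][b] + d[b][c] - d[a][c])
    lc = 0.5 * (d[a][c] + d[b][c] - d[a][b])
    return f"({a}:{la:.3f},{b}:{lb:.3f},{c}:{lc:.3f});"


if __name__ == "__main__":
    # Toy matrix with hypothetical labels, not real hydrolase RMSDs.
    labels = ["a", "b", "c", "d", "e"]
    dist = [[0, 5, 9, 9, 8],
            [5, 0, 10, 10, 9],
            [9, 10, 0, 8, 7],
            [9, 10, 8, 0, 3],
            [8, 9, 7, 3, 0]]
    print(neighbor_joining(labels, dist))
    # → (d:2.000,e:1.000,(c:4.000,(a:2.000,b:3.000):3.000):2.000);
```

The O(n³) pair search is fine for a few hundred structures; for larger sets a library implementation would be faster. Given the fold caveat above, it may make more sense to run this separately within each fold than across the whole set.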

Also, remove Max Number of Chains=1 from your filter: you want all hydrolases, including those in complexes, right?

