Question: Detect trees (newick) with specific topology
ibasan wrote:

Dear community, I have fewer than 3000 trees in Newick format, each with four species, like this example:

((Spec4:0.529207,(Spec3:0.0803395,Spec2:0.0124315)),Spec1:0,Spec1:0);

I am only interested in detecting the trees in which two particular species cluster together, like Spec3 and Spec2 in the example. Is it possible to do that with a simple script, or does anybody know of software for this? (I have already tried phybin and ete3 compare.) I would be grateful if someone could help.

modified 23 months ago by Juke-34 • written 23 months ago by ibasan

I'm not aware of a tool that subsets trees based on topology. Yes, a script or regex could help.

Do you have images of the trees? If you do, it might be interesting to try a deep learning / computer vision-based approach here.

written 23 months ago by Khader Shameer

Dear Khader Shameer, at the moment I don't have images of the trees (but I could get them). Thanks for your reply.

written 23 months ago by ibasan

jrj.healey (United Kingdom) wrote:

I think this might work, but it's a somewhat brute-force way to do it. I would convert your trees to cladograms by removing the branch lengths with a regex that matches the colon and the number after it (in whatever your favourite regex language is). Then you can simply grep, or string search in some other manner, for (Spec3,Spec2) and you'll find all trees which contain that grouping pretty easily.

e.g. remove each colon together with the branch length that follows it (probably not the most elegant regex, but it leaves digits in the species names alone):

Given your tree:

((Spec4:0.529207,(Spec3:0.0803395,Spec2:0.0124315)),Spec1:0,Spec1:0);

One could do:

sed -E 's/:[0-9.eE+-]+//g' test.tree

Yielding:

 ((Spec4,(Spec3,Spec2)),Spec1,Spec1);
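To apply the same clean-up to a whole directory of trees, a loop along these lines could work. This is only a sketch: the function name `strip_all`, the `.tree` extension and the separate output directory are assumptions, and the pattern assumes each branch length is a plain or scientific-notation number directly after a colon.

```shell
# strip_all SRC_DIR OUT_DIR (hypothetical helper): write a branch-length-free
# copy of every SRC_DIR/*.tree into OUT_DIR, keeping the file names.
strip_all() {
  src=$1
  out=$2
  mkdir -p "$out"
  for f in "$src"/*.tree; do
    [ -e "$f" ] || continue   # skip if the glob matched nothing
    # remove ":<number>" only, so digits in taxon names (e.g. Spec10) survive
    sed -E 's/:[0-9.eE+-]+//g' "$f" > "$out/$(basename "$f")"
  done
}
```

Writing to a separate directory keeps the original trees, with their branch lengths, untouched.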

Then you can string search the resulting trees:

grep -rlE "Spec(2|3),Spec(2|3)" .

This lists every file in which Spec3 and Spec2 are adjacent leaves (in either order).
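The pattern above matches on adjacency alone, so it would also hit `Spec2,Spec2` or a longer name such as `Spec30`. A stricter variant could require the two names to be enclosed together as `(A,B)` or `(B,A)`. The sketch below assumes the branch lengths have already been stripped and that the names contain no regex metacharacters; `list_pair_trees` is a made-up name.

```shell
# list_pair_trees A B FILE... (hypothetical helper): print the names of the
# files that contain the two leaves enclosed as a pair, (A,B) or (B,A).
list_pair_trees() {
  a=$1
  b=$2
  shift 2
  # \( and \) are literal parentheses; the inner group is the alternation
  grep -lE "\(($a,$b|$b,$a)\)" "$@"
}
```

Called as `list_pair_trees Spec2 Spec3 *.tree`, this deliberately skips trees where the two names merely sit next to each other at the top level, e.g. `(...),Spec2,Spec3);`, so whether it is the right test depends on how the trees are rooted.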

If you want to keep the branch lengths in your trees because you're interested in more than topology, you could concoct a regex for use with grep that allows for them:

grep -E "Spec(2|3):(0?|[0-9]+\.[0-9]+),Spec(2|3):(0?|[0-9]+\.[0-9]+)" treefile.tree

But having to conjure that regex for every possible combination of topologies looks awful to me, so I'd be inclined to try it without the branch lengths.
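If the branch lengths do have to stay, one way to avoid hand-writing the pattern for each pair might be to generate it. This is a rough sketch, not a tested tool: `pair_regex` is a hypothetical helper, and it assumes each branch length, when present, is a number directly after the colon.

```shell
# pair_regex A B (hypothetical helper): print an ERE that matches
# "A:len,B:len" in either order, where ":len" is optional.
pair_regex() {
  len='(:[0-9.eE+-]+)?'
  printf '%s\n' "($1$len,$2$len|$2$len,$1$len)"
}
# usage: grep -E "$(pair_regex Spec2 Spec3)" treefile.tree
```

Like the hand-written version, it tests adjacency only, and a name that ends in the first name (say `XSpec2`) would still match.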

I don't know how many topologies you're interested in finding in all your trees - this approach may not be feasible if it's a prohibitively large number.
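With only four species there are just six possible pairs, so checking all of them in one pass is cheap. The sketch below assumes the stripped trees have been collected into a single file, one tree per line; the function name `count_pairs` and that file layout are assumptions.

```shell
# count_pairs FILE NAME... (hypothetical helper): for every unordered pair of
# the given names, print how many lines (trees) of FILE contain the pair
# enclosed as (A,B) or (B,A).
count_pairs() {
  file=$1
  shift
  while [ "$#" -gt 1 ]; do
    a=$1
    shift
    # pair the current name with every name that comes after it
    for b in "$@"; do
      n=$(grep -cE "\(($a,$b|$b,$a)\)" "$file" || true)
      printf '%s,%s\t%s\n' "$a" "$b" "$n"
    done
  done
}
```

For example, `count_pairs all.tree Spec1 Spec2 Spec3 Spec4` would print one tab-separated line per pair, which gives a quick overview of which groupings occur at all.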


Slightly more complex, if you'd like to see the match, and the file name, this is an option:

2 example sed-treated trees:

((Spec4,(Spec5,Spec6)),Spec2,Spec3);
((Spec4,(Spec3,Spec2)),Spec1,Spec1);

Passing a 'dummy filename' in the form of /dev/null tricks grep into printing the filename (as it thinks it's working on multiple files) along with the matching line:

for file in *.tree ; do grep -E "Spec(2|3),Spec(2|3)" "$file" /dev/null ; done

Would yield:

sed2.tree:((Spec4,(Spec5,Spec6)),Spec2,Spec3);
sed.tree:((Spec4,(Spec3,Spec2)),Spec1,Spec1);

With the appropriate string matches highlighted (if your terminal is configured for it).
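An alternative to the /dev/null trick, assuming a GNU or BSD grep, is the `-H` option, which prints the filename even for a single input file. A small sketch (the function name `show_matches` is made up):

```shell
# show_matches A B FILE... (hypothetical helper): print "filename:line" for
# every line where A and B are adjacent in either order.
show_matches() {
  a=$1
  b=$2
  shift 2
  # -H forces the filename prefix even when only one file is given
  grep -HE "($a,$b|$b,$a)" "$@"
}
```

`show_matches Spec2 Spec3 *.tree` then replaces the explicit for loop.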

modified 23 months ago • written 23 months ago by jrj.healey

Dear jrj.healey, since I'm not interested in keeping the branch lengths, your idea is exactly what I need. Thanks a lot!

written 23 months ago by ibasan

Jean-Karim Heriche (EMBL Heidelberg, Germany) wrote:

Check if the species of interest are direct children of their common ancestor or simply count the number of children species of their common ancestor. This should be possible with most software with tree traversal capabilities. For example, the R phylobase package has the ancestor() and children() functions.
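The "count the children of the common ancestor" idea can be approximated without a tree library. The sketch below is not phylobase; it is a shell/awk stand-in that assumes branch-length-free Newick, one tree per line, with unique leaf names, and it reads each tree as rooted exactly where it is written. `mrca_leaf_count` is a hypothetical name.

```shell
# mrca_leaf_count A B (hypothetical helper): for each tree on stdin, print the
# number of leaves in the smallest parenthesised group that contains both A
# and B (0 if no group contains both). A result of 2 means A and B are sisters.
mrca_leaf_count() {
  awk -v a="$1" -v b="$2" '
  {
    depth = 0; result = 0
    n = length($0)
    for (i = 1; i <= n; i++) {
      c = substr($0, i, 1)
      if (c == "(") {
        depth++; start[depth] = i          # remember where this group opens
      } else if (c == ")" && depth > 0) {
        group = substr($0, start[depth] + 1, i - start[depth] - 1)
        depth--
        gsub(/[()]/, "", group)            # flatten nested groups to a leaf list
        k = split(group, leaf, ",")
        hasa = 0; hasb = 0
        for (j = 1; j <= k; j++) {
          if (leaf[j] == a) hasa = 1
          if (leaf[j] == b) hasb = 1
        }
        # inner groups close first, so the first hit is the common ancestor
        if (hasa && hasb) { result = k; break }
      }
    }
    print result
  }'
}
```

Because names are compared as whole tokens, `Spec2` is not confused with `Spec20`, and the leaves do not need to be adjacent in the string.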

written 23 months ago by Jean-Karim Heriche

Dear Jean-Karim Heriche, I will have a look at the R phylobase package. Thanks!

written 23 months ago by ibasan

Juke-34 (Sweden) wrote:

I know one tool to do so, it's really powerful but it's in Prolog: bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-10-298

Maybe the paper cites other tools.

written 23 months ago by Juke-34

Thanks for the link, Juke-34. I will have a look at it!

written 23 months ago by ibasan