Question: Detect trees (newick) with specific topology
ibasan wrote (2.4 years ago):

Dear community, I have trees (<3000) in Newick format with four species, like this example:

((Spec4:0.529207,(Spec3:0.0803395,Spec2:0.0124315)),Spec1:0,Spec1:0);

I am only interested in detecting the trees in which two species cluster together, like Spec3 and Spec2 in the example. Is it possible to do that with a simple script, or does anybody know of software for this? (I have already tried phybin and ete3 compare.) I would be grateful if someone could help.


I am not aware of a tool that subsets trees based on topology, but yes, a script or regex could help.

I am wondering whether you have images of the trees. If you do, it might be interesting to try a deep learning / computer-vision approach here.

Comment written 2.4 years ago by Khader Shameer

Dear Khader Shameer, at the moment I don't have images of the trees (but I could get them). Thanks for your reply.

Comment written 2.4 years ago by ibasan
jrj.healey wrote (2.4 years ago):

I think this might work, though it's a rather 'brute force' way to do it. I would convert your trees to cladograms by removing the branch lengths with a regex that matches the colon and the number after it (in whatever regex flavour you prefer). Then you can simply grep, or string-search in some other way, for (Spec3,Spec2), and you'll find all trees that contain that grouping pretty easily.

E.g., remove each colon together with the branch length that follows it (probably not the most elegant regex):

Given your tree:

((Spec4:0.529207,(Spec3:0.0803395,Spec2:0.0124315)),Spec1:0,Spec1:0);

One could do:

sed -E 's/:[0-9.eE+-]+//g' test.tree

Anchoring the pattern on the colon leaves digits inside species names (e.g. Spec10) untouched.

Yielding:

((Spec4,(Spec3,Spec2)),Spec1,Spec1);

Then you can string-search the resulting trees:

grep -E -r -l "Spec(2|3),Spec(2|3)" .

This lists every file in which Spec3 and Spec2 are adjacent leaves (in either order).
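
For anyone who prefers a script to a shell pipeline, the same strip-then-search idea can be sketched in Python. This is only an illustration under some assumptions: the function names strip_branch_lengths and has_cherry are made up for this example, the tree is assumed to contain no whitespace or quoted labels, and the check is deliberately stricter than the grep above because it requires the surrounding parentheses, so it only reports the pair when it is written as its own two-leaf clade.

```python
import re

# A colon followed by a branch length (integer, decimal or scientific notation).
BRANCH_LENGTH = re.compile(r":[0-9.eE+-]+")


def strip_branch_lengths(newick: str) -> str:
    """Return the tree as a cladogram, i.e. without ':length' annotations."""
    return BRANCH_LENGTH.sub("", newick)


def has_cherry(newick: str, a: str, b: str) -> bool:
    """True if a and b are written as a two-leaf clade '(a,b)' or '(b,a)'."""
    topology = strip_branch_lengths(newick)
    return f"({a},{b})" in topology or f"({b},{a})" in topology


tree = "((Spec4:0.529207,(Spec3:0.0803395,Spec2:0.0124315)),Spec1:0,Spec1:0);"
print(strip_branch_lengths(tree))
print(has_cherry(tree, "Spec3", "Spec2"))
```

Because the parentheses are part of the search string, a name such as Spec2 cannot accidentally match Spec20. The trade-off is that two species that are only adjacent at a multifurcating root are not reported.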

If you want to keep the branch lengths in your trees because you're not only interested in topology, you could concoct a regex for use with grep:

grep -E "Spec(2|3):(0?|[0-9]+\.[0-9]+),Spec(2|3):(0?|[0-9]+\.[0-9]+)" treefile.tree

But having to conjure that regex for every possible combination of topologies looks awful to me, so I'd be inclined to try it without the branch lengths.

I don't know how many topologies you're interested in finding in all your trees - this approach may not be feasible if it's a prohibitively large number.
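
If hand-writing one regex per pair becomes tedious, the pattern can be generated instead. The following is a rough sketch rather than a tested tool: pair_pattern and clustered_pairs are hypothetical helper names, the species list is assumed to be known in advance, and labels are assumed to be unquoted with no whitespace in the tree. The generated pattern tolerates trees with or without branch lengths.

```python
import re
from itertools import combinations

# Optional ':<number>' suffix (integer, decimal or scientific notation).
LENGTH = r"(?::[0-9.eE+-]+)?"


def pair_pattern(a: str, b: str) -> re.Pattern:
    """Regex for the two-leaf clade (a,b) or (b,a), branch lengths optional."""
    a, b = re.escape(a), re.escape(b)
    return re.compile(
        rf"\((?:{a}{LENGTH},{b}{LENGTH}|{b}{LENGTH},{a}{LENGTH})\)"
    )


def clustered_pairs(newick: str, species: list) -> list:
    """Return every pair of species that forms a two-leaf clade in the tree."""
    return [
        (a, b)
        for a, b in combinations(species, 2)
        if pair_pattern(a, b).search(newick)
    ]


tree = "((Spec4:0.529207,(Spec3:0.0803395,Spec2:0.0124315)),Spec1:0,Spec1:0);"
print(clustered_pairs(tree, ["Spec1", "Spec2", "Spec3", "Spec4"]))
```

With four species there are only six pairs to test, so looping over all of them for a few thousand trees should be cheap.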


Slightly more complex: if you'd like to see both the match and the file name, this is an option.

Two example sed-treated trees:

((Spec4,(Spec5,Spec6)),Spec2,Spec3);
((Spec4,(Spec3,Spec2)),Spec1,Spec1);

Passing a 'dummy filename' in the form of /dev/null tricks grep into printing the filename (as it thinks it's working on multiple files) as well as the matching line, which it prints by default:

for file in *.tree ; do grep -E "Spec(2|3),Spec(2|3)" "$file" /dev/null ; done

Would yield:

sed2.tree:((Spec4,(Spec5,Spec6)),Spec2,Spec3);
sed.tree:((Spec4,(Spec3,Spec2)),Spec1,Spec1);

With the appropriate string matches highlighted (if your terminal is configured for it).


Dear jrj.healey, since I'm not interested in keeping the branch lengths, your idea is exactly what I need. Thanks a lot!

Comment written 2.4 years ago by ibasan
Jean-Karim Heriche wrote (2.4 years ago):

Check if the species of interest are direct children of their common ancestor or simply count the number of children species of their common ancestor. This should be possible with most software with tree traversal capabilities. For example, the R phylobase package has the ancestor() and children() functions.
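
The "count the leaves under the common ancestor" idea can be sketched without any phylogenetics library. The code below is an illustration in plain Python, not the phylobase API: parse_newick, leaves and are_sisters are names invented for this example, and the small parser assumes unquoted, non-empty leaf labels. It discards branch lengths and internal labels, then looks for an internal node whose leaf set is exactly the two species of interest.

```python
import re


def parse_newick(newick: str):
    """Parse a Newick string into nested lists; leaves are species-name strings."""
    tokens = re.findall(r"[(),;]|[^(),;]+", newick.strip())
    specials = {"(", ")", ",", ";"}
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1                          # consume "("
            children = [parse_node()]
            while tokens[pos] == ",":
                pos += 1                      # consume ","
                children.append(parse_node())
            pos += 1                          # consume ")"
            if pos < len(tokens) and tokens[pos] not in specials:
                pos += 1                      # skip internal label / branch length
            return children
        label = tokens[pos]
        pos += 1
        return label.split(":")[0].strip()    # drop the leaf's branch length

    return parse_node()


def leaves(node) -> list:
    """All leaf names below a node, in left-to-right order."""
    if isinstance(node, str):
        return [node]
    return [name for child in node for name in leaves(child)]


def are_sisters(node, a: str, b: str) -> bool:
    """True if some internal node has exactly the leaves a and b beneath it."""
    if isinstance(node, str):
        return False
    if sorted(leaves(node)) == sorted([a, b]):
        return True
    return any(are_sisters(child, a, b) for child in node)


tree = parse_newick(
    "((Spec4:0.529207,(Spec3:0.0803395,Spec2:0.0124315)),Spec1:0,Spec1:0);"
)
print(are_sisters(tree, "Spec3", "Spec2"))
```

This reads the tree as rooted exactly as written. Two species that hang directly off a three-way root alongside one other clade are not reported, even though they are neighbours if the tree is interpreted as unrooted.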


Dear Jean-Karim Heriche, I will have a look at the R phylobase package. Thanks!

Comment written 2.4 years ago by ibasan
Juke-34 wrote (2.4 years ago):

I know one tool that does this. It's really powerful, but it's written in Prolog: bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-10-298

Maybe the paper cites other tools.


Thanks for the link, Juke-34. I will have a look at it!

Comment written 2.4 years ago by ibasan