Question: Replace tip names in a newick file
tlorin (Switzerland) wrote, 4.6 years ago:

Hello everyone :)

I have a collection of newick-formatted files containing gene IDs:

((gene1:1,gene2:1)100:1,gene3:1)100;
((gene4:1,gene5:1)100:1,gene6:1)100;

I also have a list giving the correspondence between gene IDs and species names:

speciesA=(gene1,gene4)
speciesB=(gene2,gene5)
speciesC=(gene3,gene6)
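That list can be turned into a gene ID -> species lookup table, which is what any renaming step needs. Below is a minimal Python sketch, assuming one species=(gene,gene,...) entry per line and unquoted names. parse_mapping is a hypothetical helper name. It also rejects a gene listed under two species, which would otherwise silently keep whichever assignment came last.

```python
import re

# Hypothetical helper: turn lines such as "speciesA=(gene1,gene4)" into a
# gene ID -> species name dictionary (the inverse of how the list is written).
# Assumes one species per line and unquoted, comma-separated gene IDs.
LINE_RE = re.compile(r'^\s*([^=\s]+)\s*=\s*\(([^)]*)\)\s*$')


def parse_mapping(lines):
    gene2species = {}
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"unrecognised mapping line: {line!r}")
        species, genes = match.groups()
        for gene in genes.split(','):
            gene = gene.strip()
            if not gene:
                continue  # tolerate empty entries such as "speciesD=()"
            if gene2species.get(gene, species) != species:
                raise ValueError(f"{gene} is listed under two species")
            gene2species[gene] = species
    return gene2species


mapping = parse_mapping([
    "speciesA=(gene1,gene4)",
    "speciesB=(gene2,gene5)",
    "speciesC=(gene3,gene6)",
])
print(mapping["gene5"])  # speciesB
```

To read from a file, pass the open handle, since iterating over it yields lines: with open('genes.txt') as fh: mapping = parse_mapping(fh)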

I would like to get the following output (to use with the Duptree software later):

((speciesA:1,speciesB:1)100:1,speciesC:1)100;
((speciesA:1,speciesB:1)100:1,speciesC:1)100;
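Before handing converted trees to Duptree, it may be worth confirming that every tip label really is a species name, because a gene ID missing from the mapping would otherwise pass through unchanged. A minimal sketch, assuming unquoted labels with no whitespace right after "(" or ","; unmapped_tips and the gene7 example are hypothetical:

```python
import re

# Assumption: unquoted labels. A tip label is the text right after "(" or ","
# up to the next ":", ",", ")" or ";"; internal node labels (e.g. the 100
# support values) follow ")" and are therefore not matched.
TIP_RE = re.compile(r'(?<=[(,])[^(),:;\s]+')


def unmapped_tips(newick, species_names):
    """Return the tip labels that are not one of the expected species names."""
    return [tip for tip in TIP_RE.findall(newick) if tip not in species_names]


species = {"speciesA", "speciesB", "speciesC"}
print(unmapped_tips("((speciesA:1,speciesB:1)100:1,speciesC:1)100;", species))  # []
print(unmapped_tips("((speciesA:1,gene7:1)100:1,speciesC:1)100;", species))  # ['gene7']
```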

Any idea how I could proceed? A bash solution would be awesome :)

Thank you! :)

 

a.zielezinski wrote, 4.6 years ago:

In Python, the solution may look like this:

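import re
import sys

# Build a gene ID -> species name lookup from lines like "speciesA=(gene1,gene4)"
gene2species = {}
with open(sys.argv[1]) as fh:
    for line in fh:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        species, genes = line.split('=', 1)
        for gene in genes.strip('()').split(','):
            gene2species[gene.strip()] = species.strip()

with open(sys.argv[2]) as fh:
    tree = fh.read()

# Replace whole labels only, so that "gene1" cannot clobber the start of
# "gene10"; branch lengths and support values are not in the lookup and
# are left unchanged.
tree = re.sub(r'[^(),:;\s]+',
              lambda m: gene2species.get(m.group(0), m.group(0)),
              tree)

with open(sys.argv[2] + '.out', 'w') as oh:
    oh.write(tree)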
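One pitfall with plain substring replacement (str.replace): when one ID is a prefix of another, for example gene1 and gene10, replacing gene1 also rewrites the start of gene10. The sketch below uses made-up IDs to show this, then a whole-label alternative with re.sub that renames only tip labels (the text right after "(" or ","), so internal node labels such as the 100 support values are never touched. It assumes unquoted labels with no whitespace after the delimiters; rename_tips is a hypothetical name.

```python
import re

# Made-up IDs chosen so that one is a prefix of the other.
gene2species = {"gene1": "speciesA", "gene10": "speciesB"}
tree = "((gene1:1,gene10:1)100:1,gene3:1)100;"

# Plain substring replacement: renaming gene1 also rewrites the start of gene10.
naive = tree
for gene, species in gene2species.items():
    naive = naive.replace(gene, species)
print(naive)  # ((speciesA:1,speciesA0:1)100:1,gene3:1)100;


def rename_tips(newick, mapping):
    """Rename only tip labels (text right after "(" or ","), and only on an
    exact match; assumes unquoted labels with no whitespace after delimiters."""
    return re.sub(r'(?<=[(,])[^(),:;\s]+',
                  lambda m: mapping.get(m.group(0), m.group(0)),
                  newick)


print(rename_tips(tree, gene2species))  # ((speciesA:1,speciesB:1)100:1,gene3:1)100;
```

Labels that are not in the mapping (gene3 here) are kept as they are rather than raising an error.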

Assuming you have saved this code as gene2species.py, the newick trees are in tree.newick, and the gene-to-species list is in genes.txt, run the script as follows:

python gene2species.py genes.txt tree.newick

The translated newick tree will be saved in tree.newick.out:

((speciesA:1,speciesB:1)100:1,speciesC:1)100;
((speciesA:1,speciesB:1)100:1,speciesC:1)100;
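Since the question mentions a whole collection of newick files, here is a sketch of a batch variant that applies one mapping file to every tree file matching a glob pattern and writes each result next to its input with a .out suffix. The script name rename_all.py, the trees/*.newick pattern and the helper names are assumptions for illustration, and labels are assumed to be unquoted.

```python
import glob
import re
import sys


def load_mapping(path):
    """Read lines like "speciesA=(gene1,gene4)" into a gene -> species dict."""
    gene2species = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            species, genes = line.split('=', 1)
            for gene in genes.strip('()').split(','):
                gene2species[gene.strip()] = species.strip()
    return gene2species


def rename(newick, mapping):
    """Replace whole labels that exactly match a gene ID; others are kept."""
    return re.sub(r'[^(),:;\s]+',
                  lambda m: mapping.get(m.group(0), m.group(0)),
                  newick)


def main(mapping_path, pattern):
    mapping = load_mapping(mapping_path)
    # glob.glob expands the pattern itself, e.g. "trees/*.newick" (hypothetical).
    for path in sorted(glob.glob(pattern)):
        with open(path) as fh:
            tree = fh.read()
        with open(path + '.out', 'w') as oh:
            oh.write(rename(tree, mapping))


if __name__ == '__main__':
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: python rename_all.py genes.txt 'trees/*.newick'",
              file=sys.stderr)
```

Quote the pattern on the command line (python rename_all.py genes.txt 'trees/*.newick') so that glob.glob, rather than the shell, expands it.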

tlorin replied, 4.6 years ago:

Thanks so much! :)