Last Common Ancestor From NCBI Taxonomy Using Java
11.0 years ago
Sudeep ★ 1.7k

Hi, are there any Java libraries that can calculate the last common ancestor from the NCBI taxonomy files? I had a look at BioJava, but it was not very helpful. I know that LCA methods exist in BioPerl, but I could not find any implementation in Java. Thanks, Sudeep.

ncbi taxonomy java • 4.4k views

From the flat file?

11.0 years ago

The following Java code reads the flat file nodes.dmp and prints the lowest common ancestor of two taxon IDs.

import java.io.*;
import java.util.*;
import java.util.regex.*;

public class Biostar10350
    {
    /** child tax_id -> parent tax_id, pre-sized for the whole nodes.dmp file */
    private Map<Integer,Integer> id2parent=new HashMap<Integer,Integer>(791000);

    /** get all ancestors of a given taxon id, ordered from the root down to the taxon itself */
    private List<Integer> lineage(int id)
        {
        if(!id2parent.containsKey(id)) throw new IllegalArgumentException("unknown taxon id: "+id);
        LinkedList<Integer> L=new LinkedList<Integer>();
        for(;;)
            {
            L.addFirst(id);
            Integer parent=id2parent.get(id);
            // the root (tax_id 1) is its own parent in nodes.dmp
            if(parent==null || parent.equals(id)) break;
            id=parent;
            }
        return L;
        }

    private int run(String filename,int id1,int id2) throws Exception
        {
        // nodes.dmp fields are separated by "\t|\t"; only tax_id and parent tax_id are needed
        Pattern pipe=Pattern.compile("[\\|]");
        BufferedReader in=new BufferedReader(new FileReader(filename));
        try
            {
            String line;
            while((line=in.readLine())!=null)
                {
                String tokens[]=pipe.split(line,3);
                Integer tax_id=Integer.parseInt(tokens[0].trim());
                Integer parent_id=Integer.parseInt(tokens[1].trim());
                id2parent.put(tax_id,parent_id);
                }
            }
        finally
            {
            in.close();
            }
        List<Integer> L1= lineage(id1);
        List<Integer> L2= lineage(id2);
        // walk both root-first lineages in parallel; the last position where they agree is the LCA
        int index=-1;
        while(index+1 < L1.size()  &&
            index+1 < L2.size() &&
            L1.get(index+1).equals(L2.get(index+1)))
            {
            index++;
            }
        // guard against L1.get(-1) when the two lineages share no node at all
        if(index<0) throw new IllegalStateException("no common ancestor for "+id1+" and "+id2);
        return L1.get(index);
        }

    public static void main(String args[]) throws Exception
        {
        // args: <nodes.dmp> <taxon id 1> <taxon id 2>
        System.out.println("Common ancestor is taxon-id:"+
            new Biostar10350().run(
                args[0],
                Integer.parseInt(args[1]),
                Integer.parseInt(args[2])) );
        }
    }

Usage:

$ javac Biostar10350.java
$ java Biostar10350 nodes.dmp 9606 10090
Common ancestor is taxon-id:314146
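A possible alternative to comparing two root-first lists, sketched here under the assumption that the same child-to-parent map has already been loaded (this is not part of the answer above): put every ancestor of the first taxon into a HashSet, then climb from the second taxon until you reach a node already in that set. The class name `LcaBySet` and the toy tree with made-up IDs are hypothetical, for illustration only.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class LcaBySet {
    /** Returns the lowest common ancestor of a and b, or -1 if they share none. */
    static int lca(Map<Integer, Integer> parentOf, int a, int b) {
        // Collect a and all of its ancestors; stop at the root (its own parent) or a missing parent.
        Set<Integer> ancestorsOfA = new HashSet<>();
        Integer cur = a;
        while (cur != null && ancestorsOfA.add(cur)) {
            cur = parentOf.get(cur);
        }
        // Climb from b; the first node already on a's path is the LCA.
        Set<Integer> seen = new HashSet<>();
        cur = b;
        while (cur != null && seen.add(cur)) {
            if (ancestorsOfA.contains(cur)) return cur;
            cur = parentOf.get(cur);
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical toy tree: 1 is the root and, as in nodes.dmp, its own parent.
        Map<Integer, Integer> parentOf = new HashMap<>();
        parentOf.put(1, 1);
        parentOf.put(2, 1);
        parentOf.put(3, 2);
        parentOf.put(4, 2);
        parentOf.put(5, 1);
        System.out.println(lca(parentOf, 3, 4)); // 2
        System.out.println(lca(parentOf, 3, 5)); // 1
        System.out.println(lca(parentOf, 3, 3)); // 3
    }
}
```

This avoids building and indexing two LinkedLists. The cost is one extra hash set per query, which matters little next to loading nodes.dmp.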

@Pierre Lindenbaum, thanks a lot for your time and effort; that really helped.

10.4 years ago
Pablo Pareja ★ 1.6k

Hi,

I just published a blog post on how to do this with Bio4j, extended to the general case of a set with an arbitrary number of nodes:

http://blog.bio4j.com/2012/02/finding-the-lowest-common-ancestor-of-a-set-of-ncbi-taxonomy-nodes-with-bio4j/

Hope it's useful ;)

Pablo
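For readers without Bio4j, the set case can be sketched in plain Java by generalizing the prefix comparison from the first answer. This does not use Bio4j's API and is not the code from the blog post. It assumes a child-to-parent map loaded from nodes.dmp as in that answer; the class name `LcaOfSet` and the toy IDs are hypothetical. The LCA is the last node of the longest root-first prefix shared by every lineage.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LcaOfSet {
    /** Root-first path from the tree root down to id (the root is its own parent). */
    static List<Integer> lineage(Map<Integer, Integer> parentOf, int id) {
        if (!parentOf.containsKey(id)) throw new IllegalArgumentException("unknown taxon " + id);
        List<Integer> path = new ArrayList<>();
        Integer cur = id;
        while (true) {
            path.add(cur);
            Integer parent = parentOf.get(cur);
            if (parent == null || parent.equals(cur)) break;
            cur = parent;
        }
        Collections.reverse(path); // now ordered root first
        return path;
    }

    /** LCA of all ids: shrink the first lineage to the prefix it shares with every other one. */
    static int lca(Map<Integer, Integer> parentOf, List<Integer> ids) {
        if (ids.isEmpty()) throw new IllegalArgumentException("need at least one taxon");
        List<Integer> common = lineage(parentOf, ids.get(0));
        for (int id : ids.subList(1, ids.size())) {
            List<Integer> other = lineage(parentOf, id);
            int n = 0;
            while (n < common.size() && n < other.size() && common.get(n).equals(other.get(n))) n++;
            common = common.subList(0, n);
        }
        if (common.isEmpty()) throw new IllegalStateException("no common ancestor");
        return common.get(common.size() - 1);
    }

    public static void main(String[] args) {
        // Hypothetical toy tree: 1 is the root; 2 and 5 are its children; 3 and 4 are children of 2.
        Map<Integer, Integer> parentOf = new HashMap<>();
        parentOf.put(1, 1);
        parentOf.put(2, 1);
        parentOf.put(3, 2);
        parentOf.put(4, 2);
        parentOf.put(5, 1);
        System.out.println(lca(parentOf, Arrays.asList(3, 4)));    // 2
        System.out.println(lca(parentOf, Arrays.asList(3, 4, 5))); // 1
    }
}
```

Since the shared prefix can only shrink, you could stop early once it reaches length 1 (the root).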


Hi, thank you, and +1 for the blog post.


@Sudeep thanks, glad you like it!
