Hi,
I have a list of GIs from BLASTX output and I would like to classify the sequences by bacterial genus. How can I find the genus name corresponding to each GI (bacteria only)?
If you want to check whether a given GI is bacterial, you can use 'blastdbcmd' (the replacement for 'fastacmd') with the '-outfmt' specifier '%K', which prints the taxonomic super kingdom (Bacteria / Archaea / Eukaryota / Viruses). I am assuming that you are using one of the NCBI BLAST databases.
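A minimal Python sketch of that step follows. It assumes 'blastdbcmd' is on your PATH, your GIs are in a text file with one GI per line, and the NCBI 'taxdb' files are installed alongside the database (the taxonomy fields cannot be filled in without them). The database name 'nr', the file name and the helper names are placeholders. It requests '%g %T %K' (GI, taxID, super kingdom) and keeps only the lines whose super kingdom is Bacteria.

```python
import subprocess
from collections import namedtuple

# One parsed line of blastdbcmd output: GI, taxID, super kingdom.
Record = namedtuple("Record", ["gi", "taxid", "kingdom"])


def run_blastdbcmd(gi_file, db="nr"):
    """Run blastdbcmd on a file of GIs (one per line) and return its stdout.

    Assumes blastdbcmd is on PATH and taxdb is installed next to the database.
    """
    result = subprocess.run(
        ["blastdbcmd", "-db", db, "-entry_batch", gi_file,
         "-outfmt", "%g %T %K"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_records(text):
    """Parse '%g %T %K' lines into Record tuples, skipping malformed lines."""
    records = []
    for line in text.splitlines():
        parts = line.split(maxsplit=2)
        if len(parts) != 3:
            continue
        gi, taxid, kingdom = parts
        records.append(Record(gi, taxid, kingdom.strip()))
    return records


def bacterial_only(records):
    """Keep only records whose super kingdom is Bacteria."""
    return [r for r in records if r.kingdom == "Bacteria"]
```

For example, `bacterial_only(parse_records(run_blastdbcmd("gis.txt")))` returns the bacterial GIs together with their taxIDs, which can then go into the genus lookup.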
If you want to get the genus name for a given taxID, you can use NCBI E-utilities, as recommended in the thread Istvan linked to. Pass your list of taxIDs as a single comma-separated string. The following example from the NCBI website returns the taxonomy information (including the genus name) for taxIDs 9913 and 30521 in XML format:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=taxonomy&id=9913,30521
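The same request can be sketched in Python with only the standard library. It uses https, since NCBI now serves E-utilities over HTTPS, and the helper names are hypothetical. The parser simply reads the 'Item Name="Genus"' field of each 'DocSum'; that field may be empty for some taxa (for example a taxID above genus level), so treat an empty string as "look it up another way".

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ESUMMARY = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def esummary_url(taxids):
    """Build an esummary URL for a list of taxIDs joined with commas."""
    query = urllib.parse.urlencode(
        {"db": "taxonomy", "id": ",".join(str(t) for t in taxids)},
        safe=",",  # keep the commas readable instead of encoding them as %2C
    )
    return ESUMMARY + "?" + query


def parse_genus(xml_text):
    """Map each taxID in an esummary XML reply to its 'Genus' item ('' if absent)."""
    root = ET.fromstring(xml_text)
    genus_by_taxid = {}
    for docsum in root.iter("DocSum"):
        taxid = docsum.findtext("Id")
        genus = ""
        for item in docsum.iter("Item"):
            if item.get("Name") == "Genus":
                genus = item.text or ""
        genus_by_taxid[taxid] = genus
    return genus_by_taxid


def fetch_genus(taxids):
    """Query NCBI over the network and return {taxID: genus}."""
    with urllib.request.urlopen(esummary_url(taxids)) as resp:
        return parse_genus(resp.read())
```

For long lists, split the taxIDs into batches and pause between requests, since NCBI limits how often E-utilities may be called.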
Here is a possible answer in another thread: Automatically Getting The NCBI Taxonomy ID From The GenBank Identifier
I managed to get the taxon ID at species level with fastacmd, but how can I link it to its parent (genus level) so that I can find out how many bacterial sequences there are and which genera they belong to?
Thanks
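For the species-to-genus step, one alternative to E-utilities (a different technique from the one suggested above) is to download NCBI's taxonomy dump ('taxdump'), which contains 'nodes.dmp' (taxID, parent taxID, rank, ...) and 'names.dmp' (taxID, name, unique name, name class), and walk the parent links until you reach a node of rank 'genus'. The sketch below assumes the standard tab-pipe-tab ('\t|\t') layout of those files; the function names are hypothetical, taxIDs are handled as strings, and merged or deleted taxIDs are not handled.

```python
from collections import Counter


def load_nodes(lines):
    """Parse nodes.dmp lines into {taxid: (parent_taxid, rank)}."""
    nodes = {}
    for line in lines:
        fields = line.split("\t|\t")
        if len(fields) < 3:
            continue
        nodes[fields[0]] = (fields[1], fields[2])
    return nodes


def load_scientific_names(lines):
    """Parse names.dmp lines into {taxid: scientific name}."""
    names = {}
    for line in lines:
        fields = line.rstrip("\t|\n").split("\t|\t")
        if len(fields) >= 4 and fields[3] == "scientific name":
            names[fields[0]] = fields[1]
    return names


def genus_of(taxid, nodes):
    """Walk up parent links until a 'genus' node is found; None if there is none."""
    seen = set()
    while taxid in nodes and taxid not in seen:
        seen.add(taxid)
        parent, rank = nodes[taxid]
        if rank == "genus":
            return taxid
        if parent == taxid:  # the root node is its own parent
            return None
        taxid = parent
    return None


def count_by_genus(taxids, nodes, names):
    """Count sequences per genus name, given one taxID per sequence."""
    counts = Counter()
    for t in taxids:
        g = genus_of(t, nodes)
        counts[names.get(g, "unknown") if g else "no genus"] += 1
    return counts
```

With the real files, `nodes = load_nodes(open("nodes.dmp"))` and `names = load_scientific_names(open("names.dmp"))`, followed by `count_by_genus(your_taxids, nodes, names)`, give the number of sequences per genus. nodes.dmp is large, so loading it takes a while and a noticeable amount of memory.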