Is there a different names.dmp file in the taxonomy name/id browser?
14 months ago
DNAngel ▴ 210

Hi all,

I have many lists of taxids and I want to get the species names, along with the family if possible. The names.dmp file only shows the scientific name, and sometimes there are multiple names for the same taxid, but I noticed that the taxonomy browser (https://www.ncbi.nlm.nih.gov/Taxonomy/TaxIdentifier/tax_identifier.cgi) always gives just one name. So is it using a different names.dmp file, or does it just give the first name hit for that taxid?
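For context on the multiple names: in the NCBI taxdump, each names.dmp row carries a name class (scientific name, synonym, common name, and so on), and as far as I know only one row per taxid has the class "scientific name". Keeping only that class should therefore give one name per taxid. Below is a minimal sketch, assuming the standard taxdump layout with fields separated by a tab, a pipe and a tab; the function name `scientific_names` and the sample rows are illustrative, not copied from a current dump.

```python
def scientific_names(lines):
    """Return {taxid: scientific name} from names.dmp-style rows."""
    names = {}
    for line in lines:
        # Columns: tax_id | name_txt | unique name | name class
        fields = line.rstrip("\t|\n").split("\t|\t")
        if len(fields) < 4:
            continue  # skip malformed rows
        tax_id, name_txt, _unique, name_class = fields[:4]
        if name_class == "scientific name":
            names[int(tax_id)] = name_txt
    return names


# Illustrative rows in names.dmp layout (assumed, for demonstration only).
SAMPLE = [
    "7955\t|\tBrachydanio rerio\t|\t\t|\tsynonym\t|\n",
    "7955\t|\tDanio rerio\t|\t\t|\tscientific name\t|\n",
    "7955\t|\tzebrafish\t|\t\t|\tgenbank common name\t|\n",
]

print(scientific_names(SAMPLE))  # {7955: 'Danio rerio'}
```

On a real dump you would pass the open file object, e.g. `scientific_names(open("names.dmp"))`.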

Furthermore, is there any file similar to names.dmp that gives family names for a taxid?
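On the family question: the same taxdump also ships nodes.dmp, whose first three fields are the taxid, its parent taxid and its rank. One way to get a family is to walk the parent links until a node with rank "family" turns up. This is only a sketch under that assumption; `load_nodes` and `ancestor_at_rank` are hypothetical helper names, the sample rows are truncated to three fields, and the rank "species" for 156304 is my assumption (the other taxids and ranks are the ones quoted in this thread).

```python
def load_nodes(lines):
    """Parse nodes.dmp-style rows into parent and rank lookups."""
    parent, rank = {}, {}
    for line in lines:
        # Only the first three fields are needed: tax_id, parent, rank.
        tax_id, parent_id, node_rank = line.rstrip("\t|\n").split("\t|\t")[:3]
        parent[int(tax_id)] = int(parent_id)
        rank[int(tax_id)] = node_rank
    return parent, rank


def ancestor_at_rank(taxid, parent, rank, wanted="family"):
    """Walk up the tree and return the first taxid with the wanted rank."""
    seen = set()  # guards against the root pointing at itself
    while taxid in parent and taxid not in seen:
        if rank[taxid] == wanted:
            return taxid
        seen.add(taxid)
        taxid = parent[taxid]
    return None  # no ancestor with that rank


# Truncated rows (three fields only) for the Ceratina calcarata lineage.
SAMPLE_NODES = [
    "156304\t|\t236025\t|\tspecies\t|\n",
    "236025\t|\t78173\t|\tsubgenus\t|\n",
    "78173\t|\t78171\t|\tgenus\t|\n",
    "78171\t|\t78170\t|\ttribe\t|\n",
    "78170\t|\t7458\t|\tsubfamily\t|\n",
    "7458\t|\t34735\t|\tfamily\t|\n",
    "34735\t|\t7434\t|\tsuperfamily\t|\n",
]

parent, rank = load_nodes(SAMPLE_NODES)
print(ancestor_at_rank(156304, parent, rank))  # 7458
```

The returned taxid can then be looked up in names.dmp to get the family name.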

I should add that I have used the Python package ete3 to get lineages and scientific names from taxids, but the lineage output is so messy (the ranks are not in neat columns for me to extract, and there are no clear headers) that I can't spend hours going through my hundreds of files to figure out which entry is the family name. For example, here is the lineage output I obtained from a small sample of my taxids:

taxid   0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34      
156304  root    Eukaryota   Eumetazoa   Arthropoda  Hexapoda    Hymenoptera Apocrita    Aculeata    Apidae  Pterygota   Opisthokonta    Metazoa Bilateria   Protostomia Neoptera    Endopterygota   Apoidea Insecta Xylocopinae Ceratinini  Ceratina    Dicondylia  Panarthropoda   cellular organisms  Ceratina calcarata  Pancrustacea    Mandibulata Zadontomerus    Ecdysozoa                       
65598   root    Eukaryota   Eumetazoa   Arthropoda  Hexapoda    Hymenoptera Apocrita    Aculeata    Apidae  Pterygota   Bombus  Opisthokonta    Metazoa Bilateria   Protostomia Neoptera    Endopterygota   Apoidea Insecta Bombus pascuorum    Apinae  Bombini Dicondylia  Panarthropoda   cellular organisms  Thoracobombus   Pancrustacea    Mandibulata Ecdysozoa                       
938226  root    Eukaryota   Eumetazoa   Arthropoda  Hexapoda    Lepidoptera Noctuidae   Pterygota   Opisthokonta    Metazoa Bilateria   Protostomia Neoptera    Endopterygota   Ditrysia    Noctuoidea  Glossata    Neolepidoptera  Heteroneura Insecta Dicondylia  Amphiesmenoptera    Panarthropoda   Acronictinae    Obtectomera cellular organisms  Pancrustacea    Mandibulata Craniophora Craniophora ligustri    Ecdysozoa               
156304  root    Eukaryota   Eumetazoa   Arthropoda  Hexapoda    Hymenoptera Apocrita    Aculeata    Apidae  Pterygota   Opisthokonta    Metazoa Bilateria   Protostomia Neoptera    Endopterygota   Apoidea Insecta Xylocopinae Ceratinini  Ceratina    Dicondylia  Panarthropoda   cellular organisms  Ceratina calcarata  Pancrustacea    Mandibulata Zadontomerus    Ecdysozoa                       
112596  root    Viruses Myoviridae  Caudovirales    Wolbachia phage WO  unclassified Myoviridae Duplodnaviria   Heunggongvirae  Uroviricota Caudoviricetes                                                                                                  
65598   root    Eukaryota   Eumetazoa   Arthropoda  Hexapoda    Hymenoptera Apocrita    Aculeata    Apidae  Pterygota   Bombus  Opisthokonta    Metazoa Bilateria   Protostomia Neoptera    Endopterygota   Apoidea Insecta Bombus pascuorum    Apinae  Bombini Dicondylia  Panarthropoda   cellular organisms  Thoracobombus   Pancrustacea    Mandibulata Ecdysozoa                       
156304  root    Eukaryota   Eumetazoa   Arthropoda  Hexapoda    Hymenoptera Apocrita    Aculeata    Apidae  Pterygota   Opisthokonta    Metazoa Bilateria   Protostomia Neoptera    Endopterygota   Apoidea Insecta Xylocopinae Ceratinini  Ceratina    Dicondylia  Panarthropoda   cellular organisms  Ceratina calcarata  Pancrustacea    Mandibulata Zadontomerus    Ecdysozoa                       
85660   root    Eukaryota   Eumetazoa   Arthropoda  Hexapoda    Hymenoptera Apocrita    Aculeata    Apidae  Pterygota   Bombus  Opisthokonta    Metazoa Bilateria   Protostomia Neoptera    Endopterygota   Apoidea Insecta Apinae  Bombini Dicondylia  Bombus hortorum Panarthropoda   cellular organisms  Megabombus  Pancrustacea    Mandibulata Ecdysozoa   
170557  root    Eukaryota   Eumetazoa   Arthropoda  Hexapoda    Phasmatodea Pterygota   Opisthokonta    Metazoa Bilateria   Protostomia Neoptera    Polyneoptera    Insecta Timema  Dicondylia  Panarthropoda   cellular organisms  Timema poppensis    Pancrustacea    Mandibulata Timematoidea    Timematidae Timematodea Ecdysozoa
589865  root    Bacteria    Proteobacteria  Deltaproteobacteria delta/epsilon subdivisions  cellular organisms  Desulfobacterales   Desulfobulbaceae    Desulfurivibrio Desulfurivibrio alkaliphilus    Desulfurivibrio alkaliphilus AHT 2  
7955    root    Eukaryota   Eumetazoa   Chordata    Vertebrata  Gnathostomata   Actinopterygii  Cypriniformes   Danio   Danio rerio Cyprinoidei Teleostei   Ostariophysi    Opisthokonta    Metazoa Bilateria   Deuterostomia   Neopterygii Craniata    Teleostomi  Euteleostomi    cellular organisms  Actinopteri Clupeocephala   Otophysi    Cypriniphysae   Otomorpha   Osteoglossocephalai Danionidae  Danioninae

As seen above, it is not always aligned (especially for viruses, bacteria, some plants, nematodes, etc.), so it is hard to figure out what the families are. This is just a tiny sample of my files; I have millions of these lineages to go through.
Perhaps someone knows how to extract just the family names from taxids using Python ete3, which would be great. I've gone through all the commands and honestly I don't see a way to do that.
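One possible ete3 route, offered as a sketch rather than a tested recipe: `NCBITaxa` has `get_lineage`, `get_rank` and `get_taxid_translator`, so the lineage can be filtered by rank instead of by column position. The helper name `name_at_rank` is hypothetical, and the function takes the `NCBITaxa` object as an argument so the snippet itself does not need the local taxonomy database to load.

```python
def name_at_rank(ncbi, taxid, wanted="family"):
    """Return the name of the ancestor of `taxid` at rank `wanted`, or None.

    `ncbi` is expected to be an ete3 NCBITaxa instance.
    """
    lineage = ncbi.get_lineage(taxid)   # list of taxids, root first
    ranks = ncbi.get_rank(lineage)      # {taxid: rank}
    hits = [t for t in lineage if ranks.get(t) == wanted]
    if not hits:
        return None                     # e.g. no family assigned in the lineage
    return ncbi.get_taxid_translator(hits)[hits[0]]


# Usage (needs ete3 and its local taxonomy database):
# from ete3 import NCBITaxa
# ncbi = NCBITaxa()
# for taxid in (156304, 65598, 7955):
#     print(taxid, name_at_rank(ncbi, taxid), sep="\t")
```

Because the lookup goes by rank, it should not matter that viruses and bacteria have shorter lineages than insects; taxids with no family in their lineage simply come back as `None`.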

14 months ago
GenoMax 120k

Using EntrezDirect (will improve later):

$ efetch -db taxonomy -id 156304 -format xml | xtract -pattern Taxon -block "*/Taxon" -if Rank -equals "family" -element ScientificName
Apidae

I tried this out but I don't get any output, just a blank line. I tried other taxids and got the same thing: an empty line. Why might that be?


There may be no family information (or other bits) for some of the taxIDs, so there is not much you can do about that.


When viewing the data before xtract, the info is all there: I can see the ranks, and the family does say Apidae. I think the xtract part may just need a tweak. Here is the relevant snippet:

<Taxon>
    <TaxId>7458</TaxId>
    <ScientificName>Apidae</ScientificName>
    <Rank>family</Rank>
</Taxon>

I got it to work with my basic dirty skills since I don't know awk well enough:

efetch -db taxonomy -id 76074 -format xml | grep -B 1 "<Rank>family</Rank>" | head -n 1 | cut -d ">" -f2 | cut -d "<" -f1
Exocoetidae
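The grep pipeline above depends on `<ScientificName>` sitting on the line directly before `<Rank>`. A somewhat less layout-dependent alternative is to parse the XML properly. This is a sketch using Python's standard `xml.etree.ElementTree`; the function name `family_from_efetch_xml` is hypothetical, and the sample document is a heavily trimmed stand-in for real efetch output (assumed structure: `TaxaSet` > `Taxon` > `LineageEx` > `Taxon`, with "species" as the assumed rank of the queried taxid).

```python
import xml.etree.ElementTree as ET


def family_from_efetch_xml(xml_text, wanted="family"):
    """Return the ScientificName of the first Taxon whose Rank matches."""
    root = ET.fromstring(xml_text)
    # iter() visits the queried Taxon and every Taxon nested in LineageEx.
    for taxon in root.iter("Taxon"):
        if taxon.findtext("Rank") == wanted:
            return taxon.findtext("ScientificName")
    return None


# Trimmed stand-in for `efetch -db taxonomy -id 156304 -format xml`.
SAMPLE_XML = """
<TaxaSet>
  <Taxon>
    <TaxId>156304</TaxId>
    <ScientificName>Ceratina calcarata</ScientificName>
    <Rank>species</Rank>
    <LineageEx>
      <Taxon>
        <TaxId>7399</TaxId>
        <ScientificName>Hymenoptera</ScientificName>
        <Rank>order</Rank>
      </Taxon>
      <Taxon>
        <TaxId>7458</TaxId>
        <ScientificName>Apidae</ScientificName>
        <Rank>family</Rank>
      </Taxon>
    </LineageEx>
  </Taxon>
</TaxaSet>
"""

print(family_from_efetch_xml(SAMPLE_XML))  # Apidae
```

To use it on live data, the efetch output could be piped into a script that calls the function on `sys.stdin.read()`.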

TaxIDs are present at various levels. My original command only works with root taxIDs such as the ones you have in the example above.

$ efetch -db taxonomy -id 156304 -format xml | xtract -pattern TaxaSet -group LineageEx -block Taxon -subset Taxon -tab '\n' -element TaxId,Rank,ScientificName
131567  no rank cellular organisms
2759    superkingdom    Eukaryota
33154   clade   Opisthokonta
33208   kingdom Metazoa
6072    clade   Eumetazoa
33213   clade   Bilateria
33317   clade   Protostomia
1206794 clade   Ecdysozoa
88770   clade   Panarthropoda
6656    phylum  Arthropoda
197563  clade   Mandibulata
197562  clade   Pancrustacea
6960    subphylum   Hexapoda
50557   class   Insecta
85512   clade   Dicondylia
7496    subclass    Pterygota
33340   infraclass  Neoptera
33392   cohort  Endopterygota
7399    order   Hymenoptera
7400    suborder    Apocrita
7434    infraorder  Aculeata
34735   superfamily Apoidea
7458    family  Apidae
78170   subfamily   Xylocopinae
78171   tribe   Ceratinini
78173   genus   Ceratina
236025  subgenus    Zadontomerus
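Since the goal was neat columns, the three-column listing above (taxid, rank, name) can be collapsed into one row per query with a fixed set of ranks. This is a sketch that assumes the columns are tab-separated; `lineage_row` and the `RANKS` selection are illustrative choices, and ranks missing from a lineage are filled with "NA".

```python
RANKS = ("superkingdom", "phylum", "class", "order", "family", "genus")


def lineage_row(lines, ranks=RANKS):
    """Turn 'taxid<TAB>rank<TAB>name' lines into one fixed-width row."""
    by_rank = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        _taxid, rank, name = parts
        # Ranks such as "clade" repeat; keep the first name seen per rank.
        by_rank.setdefault(rank, name)
    return [by_rank.get(r, "NA") for r in ranks]


# A few lines taken from the listing above (tab-separated by assumption).
SAMPLE_LINES = [
    "2759\tsuperkingdom\tEukaryota\n",
    "33154\tclade\tOpisthokonta\n",
    "6656\tphylum\tArthropoda\n",
    "50557\tclass\tInsecta\n",
    "7399\torder\tHymenoptera\n",
    "7458\tfamily\tApidae\n",
    "78173\tgenus\tCeratina\n",
]

print("\t".join(lineage_row(SAMPLE_LINES)))
```

Every taxid then yields the same number of columns in the same order, so the family is always in the fifth column regardless of how deep the lineage is.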