Question: Wrong Taxonomic Information On NCBI
8.1 years ago by
John40 wrote:

What should I do if I have found that the taxonomic information for a published complete genome is wrong? I recently ran a 16S rRNA-based phylogenetic analysis and found that the reported genome/species falls into a different genus. This surprised me, because the misleading information is present in benchmark biological resources such as NCBI, HAMAP, UniProt, JGI, KEGG, ExPASy, MIST etc. Furthermore, my research involves this organism and closely related species, so after identifying the actual phylogeny of this organism my whole research is in jeopardy because wrong information is attached to a complete published genome.

I request your views and suggestions for the necessary action on this issue.

EDIT (more information originally given as answer)

Actually, the genome of the organism is already published and available at NCBI. I'm investigating complete genomes of a particular genus using a bioinformatics approach. While performing comparative genome analysis, specifically to identify the core genome, I found the data for this particular genome to be highly deviant from the other species in the same genus. That caught my attention, so I tested this organism with both BLAST and phylogenetic methods using the 16S rRNA gene. In both results this particular organism did not show homology to the species/genus where it is presently classified; rather, it showed homology to species of a different genus. I have reported this to the author of the genome but have not received any reply, so I am raising my concern here. And yes, it is true that I was working with the wrong organism because of wrong information at NCBI.

It would help to know the organism concerned. Also, you should "request" our suggestions, not "expect" them :-)

It will be hard for people to help you if you don't provide any instances related to your problem.

I'm struggling to see how an organism present in multiple public databases can be a secret.

To be clear, what I'm asking is: why can't you give us the name of the organism as used at the NCBI? That is public, nothing to do with your research and might help answer the question.

It is not possible to provide the details due to research confidentiality. However, I have performed the phylogenetic analysis with this particular organism and related species and observed a clear distinction. I'm asking what I should do now that my whole research has gone into chaos and my time has been wasted because of the wrong taxonomic information.

I'm requesting action. The evidence is with me.

If you could at least tell us what the organism is in NCBI taxonomy, we could check for known problems with that instance.

To reiterate, I cannot disclose the organism name as it may lead to a new report, but I have found that this organism has been wrongly classified and is kept with complete genome information at NCBI and other biological databases.

That's not the help I came here for; instead you started your own research. My request was to know what steps I should take when publicly available information is wrong and has led to a waste of time and resources.

Sorry, I cannot, as it is a new finding to be reported. I'm asking how to report this blunder from all of us.

8.1 years ago by
Hull, UK
Dave Lunt2.0k wrote:

You need to do a couple of things first.

Firstly, rigorously confirm that you are correct using phylogenetic approaches. I know that you are interested in whole genomes, but ignore that for now and build a high-quality tree of every SSU rDNA sequence closely related to the one from this genome, whether or not the whole genome is sequenced. Start at SILVA and download a structural alignment at the taxonomic level containing both of the genera involved in this controversy. Build a maximum likelihood tree (using e.g. RAxML). Work from there to see how robustly your conclusions are supported by the new phylogeny you generate.
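As a rough sketch of the tree-building step, the Python snippet below assembles a RAxML command line for a rapid-bootstrap analysis followed by a maximum likelihood search. The alignment file name, run name, binary name (raxmlHPC), GTRGAMMA model, seed and number of replicates are all placeholder assumptions rather than recommendations; adjust them to the alignment you export from SILVA and to your local installation.

```python
import shlex


def raxml_command(alignment, run_name, bootstraps=100, seed=12345,
                  binary="raxmlHPC"):
    """Build the argument list for a RAxML rapid-bootstrap + ML search.

    All defaults here are illustrative assumptions, not tuned values.
    """
    return [
        binary,
        "-f", "a",              # rapid bootstrap and best-scoring ML tree in one run
        "-m", "GTRGAMMA",       # nucleotide substitution model (assumed choice)
        "-p", str(seed),        # random seed for the parsimony starting trees
        "-x", str(seed),        # random seed for rapid bootstrapping
        "-N", str(bootstraps),  # number of bootstrap replicates
        "-s", alignment,        # input alignment (hypothetical file name)
        "-n", run_name,         # suffix used for the output files
    ]


cmd = raxml_command("ssu_alignment.fasta", "ssu_tree")
# Print the command for inspection; nothing is executed here.
print(shlex.join(cmd))
```

Once RAxML is installed, the list can be passed to subprocess.run(cmd, check=True). The bootstrap values on the resulting tree give a first indication of how well supported the placement of your genome's sequence is.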

Secondly, you need to write it up as a paper and submit it to a journal that is familiar with molecular phylogenetics. Before you do this, read this paper on misuse of the word homology in bioinformatics, which is nicely described in this blog post among many others.

It is not unlikely that you are wrong; molecular phylogenetics often brings up doubt about the correct nomenclature of well-known strains. Drosophila melanogaster is a famous example and perhaps ought to be renamed Sophophora melanogaster (see also here and here). Another important issue is what is the type species for the genus you are interested in?

In summary, you won't be able to get any database, or anyone else, to alter the taxonomy without first publishing a rigorous analysis in a peer-reviewed journal. If you are not comfortable with phylogenetic analysis and systematics then you will probably need to find experts in these areas for your specific taxa to collaborate with. This will involve a substantial piece of work by you and patience while you publish and a consensus is reached. Important work always does though.

Dave, can we collaborate on the publication?

Unfortunately I'm not your man. You need someone who is an expert on the systematics of the taxonomic group you are dealing with, and that's almost certainly not me.

8.1 years ago by
Sydney, Australia
Neilfws48k wrote:

I'm not sure that I fully understand the problem. If I'm right, you seem to be saying that you have used 16S sequence to classify an organism and your classification disagrees with that found in NCBI Taxonomy? A few points:

  • Different taxonomic methods can group organisms differently.
  • NCBI taxonomy is not an authoritative source of taxonomic information. You may find that genus/species names change over time, or that different names are used. Check this carefully and consult other taxonomic databases.
  • If you genuinely believe that an online resource contains an error, the correct course of action is to contact the maintainer of the resource, describing the problem.

If you have been working with the "wrong" organism due to incorrect classification, that is indeed a problem. You should check your experimental results carefully, e.g. for contamination. But perhaps what you have discovered is that the organism should be reclassified - which is a finding?

8.1 years ago by
iw9oel_ad6.0k wrote:

Firstly, taxonomy (identification and classification) and phylogenetics (making evolutionary hypotheses) are not quite the same. Often phylogenetic evidence is used to support or guide taxonomy; however, taxonomy is a judgement call often based on wider evidence than molecular phylogeny. Having said that, compelling phylogenetic evidence may cause taxonomic boundaries to be re-drawn. It is quite common for bacteria to shift "genus" as new evidence arises, e.g. some Rhodococcus spp. used to be called Nocardia spp. And when large portions of a genome may arrive by lateral transfer, the whole notion of "species" becomes hazy too.

On your specific problem: published results, including genome sequences and annotation, are hypotheses. If you have evidence that invalidates one, you've moved things forwards. I would start by contacting the author of the paper to discuss your findings. AFAIK the originator of the GenBank record is the one who has to update the annotation.

If you can't/won't give any more details then you should go through the normal channels available when there is a scientific disagreement - something that happens all the time.

Finally, I would say that it is entirely the researcher's responsibility to check the material they are working with. Don't believe everything you see in annotations; annotations are hypotheses and hypotheses may be incorrect. You can't blame the NCBI for that.

Well, I would always assess my input data. In most cases I would not expect to "re-do its annotation ... and everything else". But in rare cases yes, I've had to re-annotate entire genomes because they were covered initially by terrible automatically-generated gene predictions.

That means whenever I choose any genome to work on I should re-do its annotation, taxonomic validation and everything else associated with that genome before starting my own experiment. My concern is why such misleading information is still harboured at heavily used biological resources. I'm a bioinformatician so I could identify the flaw, but I pity all those who do diversity studies in the wet lab and may misidentify a species because of existing flaws hosted at NCBI.

I wish it could be due to horizontal gene transfer; then I would be the luckiest person, able to challenge the present-day 16S rRNA-based tree of life classification by Carl Woese, as my present organism differs from the rest of its species in the 16S rRNA phylogeny.

