When investigating a 16S rRNA dataset, I often identify several dozen to several hundred species or families that are found at higher or lower abundance. I then start searching the literature to see what they might be doing, where they have been observed before, etc.
To me this approach seems:
- Really selective, since I only sample a few papers for each species.
- Limiting, as there is no way to do this fully for tens of species.
- Incredibly time consuming.
What I'm really looking for is a system that I can feed a list of taxa and that will tell me "Those ones are all anaerobic" or "Those 5 have shown denitrifying ability". I don't know whether this could be done with literature mining, where to start with it, or whether there is a database somewhere that curates data like this.
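To make the idea concrete, here is a minimal sketch of the kind of lookup I have in mind. Everything in it is an assumption for illustration: the `TRAITS` table is a tiny hand-typed stand-in for whatever curated source might exist, and `group_by_trait` is a hypothetical helper, not part of any existing tool.

```python
from collections import defaultdict

# Hypothetical, hand-typed trait table for illustration only.
# A real table would come from a curated database or literature mining.
TRAITS = {
    "Paracoccus denitrificans": {"denitrifying"},
    "Pseudomonas stutzeri": {"denitrifying"},
    "Clostridium botulinum": {"anaerobic"},
    "Methanobrevibacter smithii": {"anaerobic"},
}


def group_by_trait(taxa, traits=TRAITS):
    """Group the given taxa by annotated trait.

    Returns a pair: a dict mapping each trait to the sorted list of taxa
    carrying it, and a sorted list of taxa with no annotation at all.
    """
    groups = defaultdict(list)
    unknown = []
    for taxon in taxa:
        if taxon not in traits:
            unknown.append(taxon)
            continue
        for trait in traits[taxon]:
            groups[trait].append(taxon)
    return {t: sorted(members) for t, members in groups.items()}, sorted(unknown)


if __name__ == "__main__":
    my_taxa = ["Clostridium botulinum", "Paracoccus denitrificans", "Some unknown taxon"]
    groups, unknown = group_by_trait(my_taxa)
    for trait, members in sorted(groups.items()):
        print(f"{trait}: {', '.join(members)}")
    print(f"no annotation: {', '.join(unknown)}")
```

The lookup itself is trivial; the hard part is where the trait table comes from, which is really what I am asking about.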
Any suggestions are appreciated.
That's great. I had ignored IMG, since people often want more than "just the organisms with sequenced genomes", and I had not looked at it in a while. But there are so many microbial genomes now, and it's great to see the associated physiological data.
I was just going to tick this 'answered', but I'm not allowed to do so on my own post... Do you know if there is a way to do it?
I thought this used to be possible. See what Istvan says (contact him if no response here).