Nitrogen fixing enzyme annotation with HMMER and DIAMOND?
6 weeks ago

We want to annotate nitrogen-fixing genes in metagenomes, much as Hotpep and dbCAN do for carbohydrate-active enzymes.

First and foremost, is there a premade tool capable of doing this? I have searched but have not been able to find anything.

If not, is it a viable option to generate a profile HMM database for nitrogen-fixing genes scraped from UniProtKB and then use HMMER to annotate them within our metagenomes?

We would like to combine this with a DIAMOND database built from the same UniProt protein sequences, and use DIAMOND for the same purpose of annotating nitrogen-fixing genes. We would then cross-reference the two tools and keep only the hits that appear in both.
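The cross-referencing step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: HMMER is run as `hmmsearch --tblout` against the predicted metagenome proteins, DIAMOND is run with those same proteins as queries and the default 12-column tabular output, and the function names and the E-value cutoff are hypothetical choices, not a recommended setting.

```python
def hmmer_targets(tblout_text, max_evalue=1e-5):
    """Collect target sequence IDs from hmmsearch --tblout text.

    Assumes the standard tblout layout, where column 1 is the target
    (metagenome protein) name and column 5 is the full-sequence E-value.
    """
    hits = set()
    for line in tblout_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        if float(fields[4]) <= max_evalue:
            hits.add(fields[0])
    return hits


def diamond_queries(tsv_text, max_evalue=1e-5):
    """Collect query sequence IDs from DIAMOND tabular (BLAST outfmt 6) text.

    Assumes the default 12 columns, where column 1 is the query
    (metagenome protein) ID and column 11 is the E-value.
    """
    hits = set()
    for line in tsv_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if float(fields[10]) <= max_evalue:
            hits.add(fields[0])
    return hits


def consensus_hits(tblout_text, diamond_text, max_evalue=1e-5):
    """Return IDs reported by both tools (set intersection)."""
    return hmmer_targets(tblout_text, max_evalue) & diamond_queries(
        diamond_text, max_evalue
    )
```

Taking the intersection is conservative: it lowers false positives but also drops divergent homologs that only the profile HMM search detects, so it may be worth inspecting the HMMER-only hits as well.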

Is this methodology valid, and is UniProt a good place to scrape the protein sequences for such a database?

6 weeks ago

Nitrogen fixation is performed by enzymes of the nitrogenase class (EC 1.18.6.1, 1.18.6.2, 1.19.6.1; see the KEGG Pathway map) in bacteria and archaea.

Because this function is performed by a single complex, you simply have to look for genes that code for the different subunits of the complex. You can find protein sequences with this enzymatic function in KEGG or in UniProtKB and use them as BLAST templates. You can also search AmiGO for proteins annotated with the molecular function GO term "nitrogenase activity" (GO:0016163) and export the sequences.

You do not need to build your own HMMs either; suitable models have already been compiled by databases like Pfam. You can get an overview of the different models here: https://www.ebi.ac.uk/interpro/search/text/nitrogenase/?page=1#table You can then run a Pfam search or other tools with the individual models.

If you have enough resources, you could even run your metagenome through InterProScan for annotation and look for the respective InterPro IDs, e.g. IPR014278 (Nitrogenase iron-iron, delta subunit).
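Filtering an InterProScan result for InterPro IDs of interest might look something like the sketch below. It assumes the tab-separated (TSV) output format with InterPro lookup enabled, where column 1 is the protein ID and column 12 is the InterPro accession. The function name is hypothetical, and the set of IDs is only an example built from the single accession mentioned above.

```python
def filter_interproscan(tsv_text, wanted_iprs):
    """Map protein IDs to the wanted InterPro accessions they matched.

    Assumes InterProScan TSV output with InterPro lookup enabled:
    column 1 = protein ID, column 12 = InterPro accession
    ("-" or missing when a signature has no InterPro entry).
    """
    hits = {}
    for line in tsv_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # row carries no InterPro accession column
        protein_id, ipr = fields[0], fields[11]
        if ipr in wanted_iprs:
            hits.setdefault(protein_id, set()).add(ipr)
    return hits


# Example set of accessions; extend it with the other nitrogenase
# entries you select from the InterPro search results.
NITROGENASE_IPRS = {"IPR014278"}
```

For real output you would read the file contents first, for example `filter_interproscan(open("results.tsv").read(), NITROGENASE_IPRS)`, where `results.tsv` is a placeholder file name.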


Hi Michael,

Thank you for the detailed reply. Is AmigiGO a typo perhaps, as searching for it does not yield a GO annotation style tool?

Of the approaches you mentioned, which do you feel would best capture the entire landscape of nitrogen-fixing genes in a metagenome?

1. AmigiGO
2. Pfam or other tool search (e.g. HMMER)
3. InterProScan

Lastly, the EBI page you linked returns 594 results. Am I correct to interpret this as 594 different nitrogenase enzyme models, whereas each of the ~443k results in UniProtKB is an entry from a specific bacterium?


Dear Robert, I meant AmiGO, I have corrected the typo.

To get a complete picture of the nitrogen-fixing genes, I would rank the methods as follows (top = most comprehensive):

1. Run a full InterProScan of the metagenome in DNA mode, provided the contigs are long enough to give informative results, then filter for the IPR IDs of interest. Also enable GO annotation and look for all relevant terms.
2. Use only some of the member tools, such as Pfam and TIGRFAMs, with selected models.
3. Run BLASTX/TBLASTN, or possibly DIAMOND, of the metagenome against template sequences selected from UniProtKB. AmiGO and KEGG can help you find those templates.

You might even combine 1 and 3. Unfortunately, running a full InterProScan will require a cluster or multiple CPUs and will most likely take the longest. Which method to choose depends on what else you want to do with the annotation.

If you only intend to identify your gene family of interest, it might be best to take a DIY approach and build a custom InterProScan pipeline: identify the databases in the InterProScan installation and replace all tool-specific databases with custom databases that contain only the models of interest. That should speed up the search, but I haven't tried it, and it would require more in-depth knowledge of the different database formats.

Finally, I retrieved 194 results for my search "nitrogenase". Those are models of different proteins or domains, drawn from different databases. A UniProtKB entry, by contrast, contains a single amino acid sequence (or its isoforms), normally from a single species.


Thank you, I will try your suggestions. Thankfully I have access to an HPC, so I can run the full InterProScan without issue. If InterProScan is so sensitive, why are tools such as Hotpep so popular for identifying carbohydrate-active enzymes?

6 weeks ago

Another option is to submit your data to the EMBL-EBI service MGnify (see the 2020 publication). They would run a full assembly and annotation pipeline, including InterProScan.
