Question: Does MetaCyc or BioCyc have an equivalent of KEGG orthologs that could be blasted against locally?
O.rka wrote (23 months ago):

I've been using HUMAnN2 to go from human-removed microbiome shotgun metagenomic reads to HUMAnN2 attribute vectors with coverage and abundance values. The resulting attributes have identifiers that have the following structure: HISDEG-PWY: L-histidine degradation I|g__Streptococcus.s__Streptococcus_sanguinis where Streptococcus sanguinis is the taxonomic unit and HISDEG-PWY is the metabolic unit.
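For anyone who wants to work with these identifiers programmatically, here is a minimal Python sketch that splits one into its metabolic and taxonomic parts. It assumes the `ID: name|g__Genus.s__Species` layout shown above; the `HumannFeature` type and `parse_feature` function are names made up for this illustration and are not part of HUMAnN2.

```python
from typing import NamedTuple, Optional


class HumannFeature(NamedTuple):
    """Hypothetical container for one stratified HUMAnN2 feature identifier."""
    pathway_id: str
    pathway_name: Optional[str]
    genus: Optional[str]
    species: Optional[str]


def parse_feature(identifier: str) -> HumannFeature:
    # Everything before "|" is the metabolic unit, everything after is the taxon.
    function, _, taxon = identifier.partition("|")

    # The metabolic unit looks like "<pathway ID>: <pathway name>".
    pathway_id, sep, name = function.partition(": ")
    pathway_name = name.strip() if sep and name.strip() else None

    # The taxon looks like "g__<genus>.s__<species>"; it may be missing.
    genus_part, _, species_part = taxon.partition(".s__")
    genus = genus_part[3:] if genus_part.startswith("g__") else None
    species = species_part or None

    return HumannFeature(pathway_id.strip(), pathway_name, genus, species)


feature = parse_feature(
    "HISDEG-PWY: L-histidine degradation I"
    "|g__Streptococcus.s__Streptococcus_sanguinis"
)
print(feature.pathway_id, feature.species)
```

Identifiers without a taxonomic suffix simply come back with `genus` and `species` set to `None`.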

My question is actually two separate questions:

(1) Does MetaCyc provide a downloadable database of orthologs that one can BLAST against locally, like KEGG orthologs? If so, which file would I download for this type of functionality?

From the url: https://metacyc.org/download.shtml

We provide the BioCyc databases (such as EcoCyc and MetaCyc) as collections of data files in several alternative formats including the following.

BioPAX format

Pathway Tools attribute-value format

Pathway Tools tabular format

SBML format

Gene Ontology annotations (EcoCyc only)

(2) How does HUMAnN2 go from read -> MetaCyc pathway & read -> species?

I know they use MetaPhlAn2 in the backend, which is how they get the species, but how do they know which MetaCyc pathway to assign a protein to?

biouser (Spain) wrote (22 months ago):

For the 1st question, according to their webpage (https://bitbucket.org/biobakery/humann2/wiki/Home):

* UniRef database provides gene family definitions
* MetaCyc provides pathway definitions by gene family

It seems they map to UniProt, take the "gene family" name from there, and then obtain the MetaCyc annotation. Say you have already mapped to UniProt and got the protein ID "P01189" (https://www.uniprot.org/uniprot/P01189). If you take the gene name from that entry ("Name: POMC"), you can easily look up its description in HumanCyc (rather than MetaCyc, because this example is a human protein): https://biocyc.org/gene?orgid=HUMAN&id=ENSG00000115138-MONOMER
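The lookup chain I'm describing (read -> gene family -> pathway) can be sketched in Python as below. This is only an illustration of the idea, not HUMAnN2's actual algorithm: the `GENE_FAMILY_TO_PATHWAYS` table, the `UniRef50_EXAMPLE*` identifiers, `EXAMPLE-PWY`, and the `pathways_for_hits` function are all made up for the example.

```python
from collections import defaultdict

# Hypothetical lookup table. The gene family IDs and "EXAMPLE-PWY" are
# placeholders; a real table would have to be built from the pathway
# definitions that HUMAnN2 ships.
GENE_FAMILY_TO_PATHWAYS = {
    "UniRef50_EXAMPLE1": ["HISDEG-PWY"],
    "UniRef50_EXAMPLE2": ["HISDEG-PWY", "EXAMPLE-PWY"],
}


def pathways_for_hits(hits):
    """Group read IDs by the pathways their gene-family hits belong to.

    hits: iterable of (read_id, gene_family_id) pairs.
    Returns a dict mapping pathway ID -> set of supporting read IDs.
    """
    support = defaultdict(set)
    for read_id, family in hits:
        # Gene families without a pathway annotation are skipped.
        for pathway in GENE_FAMILY_TO_PATHWAYS.get(family, []):
            support[pathway].add(read_id)
    return dict(support)


example_hits = [
    ("read1", "UniRef50_EXAMPLE1"),
    ("read2", "UniRef50_EXAMPLE2"),
    ("read3", "UniRef50_UNANNOTATED"),
]
print(pathways_for_hits(example_hits))
```

A gene family can belong to more than one pathway, which is why the table maps each family to a list.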

I believe it works like that, so the way to go would be to map/BLAST against UniProt first. That's my opinion, at least.

You can download the UniRef50 database, for instance, by following their tutorial: https://bitbucket.org/biobakery/humann2/wiki/Home#markdown-header-download-a-translated-search-database

You then get the file "uniref50_annotated.1.1.dmnd", which you can convert to FASTA with: diamond getseq -d uniref50_annotated.1.1.dmnd

Alternatively, download the FASTA file directly from UniProt: https://www.uniprot.org/downloads
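Once you have searched your sequences against that database with BLAST or DIAMOND, you will probably want one best hit per query before looking up any annotation. Here is a rough Python sketch for that step. It assumes the standard 12-column tabular output (BLAST "outfmt 6": qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); the `best_hits` function and the sample rows are made up for illustration.

```python
import csv
import io


def best_hits(tsv_text):
    """Return {query_id: (subject_id, bitscore)} keeping the top bitscore.

    Assumes 12-column BLAST/DIAMOND tabular output, where column 1 is the
    query, column 2 the subject, and column 12 the bitscore.
    """
    best = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 12:
            continue  # skip blank or malformed lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


# Made-up rows, only to show the expected layout.
sample = (
    "read1\tUniRef50_EXAMPLE1\t98.0\t50\t1\t0\t1\t150\t10\t59\t1e-20\t95.5\n"
    "read1\tUniRef50_EXAMPLE2\t90.0\t50\t5\t0\t1\t150\t10\t59\t1e-15\t80.1\n"
)
print(best_hits(sample))
```

For a real results file, read it with `open(path).read()` and pass the text in, or adapt the function to iterate over the file handle directly.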

Anyway, to get a more accurate answer you could also post your question on the HUMAnN forum: https://groups.google.com/forum/#!forum/humann-users

Greetings
