Question: How to associate raw reads with MetaCyc/Taxonomy from HUMAnN2?
O.rka wrote, 13 months ago:

Extending a discussion: https://groups.google.com/forum/#!msg/humann-users/0rbswpcxL1M/4mZNbNd8DAAJ

According to the conversation above, this is what appears to be going on (please correct where necessary):

  1. Reads would be mapped to genes in the following files in the [name]_temp/ directory:

  1a. [name]_diamond_aligned.tsv

    NS500647:186:HV3F5BGX2:1:11101:17415:10831:N:0:CTCAGA gi|400294433|ref|NZ_ALJK01000240.1|:c8692-7928|1655|g__Actinomyces.s__Actinomyces_naeslundii|UniRef90_J2ZLR9|UniRef50_R5IQW2|765 99.18032786885246 122.0 0
    NS500647:186:HV3F5BGX2:1:11101:20301:11561:N:0:CTCAGA gi|288801553|ref|NZ_GG740010.1|:c46051-44267|28132|g__Prevotella.s__Prevotella_melaninogenica|UniRef90_D9RTD5|UniRef50_R5FHH6|1785 93.27731092436974 119.0 0
    NS500647:186:HV3F5BGX2:1:11101:21205:85401:N:0:CTCAGA gi|512460964|ref|NZ_KE150253.1|:c257004-255034|45242|g__Capnocytophaga.s__Capnocytophaga_granulosa|UniRef90_J5Y4E1|UniRef50_F8EB68|1971 95.86206896551724 145.0

  1b. [name]_bowtie2_aligned.tsv

    NS500647:186:HV3F5BGX2:1:11101:17082:23151:N:0:CTCAGA|146 UniRef90_D1BPC2|753 75.0 40 10 0 6 125 212 251 1.5e-11 70.9
    NS500647:186:HV3F5BGX2:1:11101:17082:23151:N:0:CTCAGA|146 UniRef90_K0Y074|828 73.2 41 11 0 3 125 236 276 2.5e-11 70.1
    NS500647:186:HV3F5BGX2:1:11101:17082:23151:N:0:CTCAGA|146 UniRef90_E4L934|762 75.0 40 10 0 3 122 214 253 2.5e-11 70.1

  2. These genes/proteins/protein clusters (from [name]_bowtie2_aligned.tsv, without taxonomy information, and from [name]_diamond_aligned.tsv, with taxonomy information) would be associated with reactions via the following file:

  2a. humann2/data/pathways/metacyc_reactions_level4ec_only.uniref.bz2

    GLUCOSE-1-PHOSPHATE-PHOSPHODISMUTASE-RXN 2.7.1.41 UniRef50_G6EMD2 UniRef50_Q48UI5 UniRef50_T0UKK6 UniRef90_F4BRY1 UniRef90_G4L3Q0 UniRef90_G6EMD2 UniRef90_K0J4V1 UniRef90_R9TSN7 UniRef90_T0T682 UniRef90_T0UKK6

  3. The reactions would then be associated with pathway identifiers via this file: humann2/data/pathways/metacyc_pathways

    PWY-2681 RXN-4303 RXN-4304 RXN-4310 RXN-4305 RXN-4306 RXN-4312 RXN-4308 RXN-4314 RXN-4307 RXN-4313 RXN-4317
    PWY1G-126 1.8.1.15-RXN RXN1G-6
    METHGLYUT-PWY 1.1.1.283-RXN LACTALDDEHYDROG-RXN L-LACTDEHYDROGFMN-RXN RXN0-4281 RXN-8632 GLYOXIII-RXN GLYOXI-RXN GLYOXII-RXN DLACTDEHYDROGFAD-RXN
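To make the three steps above concrete, here is a minimal Python sketch of the chain read -> gene -> reaction -> pathway as I understand it. It is not HUMAnN2's own code. It assumes the files are whitespace-delimited as in the excerpts, that the second column of the reactions file is always an EC number, and that a trailing `|146`-style suffix on a read ID can be dropped. All function names are hypothetical.

```python
from collections import defaultdict


def parse_target(target):
    """Pull the UniRef90 ID and taxon (if any) out of a '|'-delimited reference ID."""
    uniref90, taxon = None, None
    for token in target.split("|"):
        if token.startswith("UniRef90_"):
            uniref90 = token
        elif token.startswith("g__"):
            taxon = token
    return uniref90, taxon


def reads_to_genes(alignment_lines):
    """Step 1: map each read ID to the (UniRef90, taxon) pairs it aligned to."""
    hits = defaultdict(set)
    for line in alignment_lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        read_id = fields[0].split("|")[0]  # assumption: drop a "|146"-style suffix
        hits[read_id].add(parse_target(fields[1]))
    return hits


def genes_to_reactions(reaction_lines):
    """Step 2: invert 'reaction EC gene gene ...' records into gene -> reactions."""
    gene_to_rxns = defaultdict(set)
    for line in reaction_lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        reaction, genes = fields[0], fields[2:]  # assumption: fields[1] is the EC number
        for gene in genes:
            gene_to_rxns[gene].add(reaction)
    return gene_to_rxns


def reactions_to_pathways(pathway_lines):
    """Step 3: invert 'pathway reaction reaction ...' records into reaction -> pathways."""
    rxn_to_pwys = defaultdict(set)
    for line in pathway_lines:
        fields = line.split()
        for reaction in fields[1:]:
            rxn_to_pwys[reaction].add(fields[0])
    return rxn_to_pwys


def reads_to_pathways(hits, gene_to_rxns, rxn_to_pwys):
    """Chain the three lookups: read -> set of (pathway, taxon) pairs."""
    result = defaultdict(set)
    for read_id, genes in hits.items():
        for uniref90, taxon in genes:
            for reaction in gene_to_rxns.get(uniref90, ()):
                for pathway in rxn_to_pwys.get(reaction, ()):
                    result[read_id].add((pathway, taxon))
    return result
```

Each function takes any iterable of lines, so the compressed reactions file could be passed as `bz2.open(path, "rt")` from the standard library and the other files as ordinary open file handles. Hits whose UniRef90 ID has no reaction simply drop out of the final result.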

Is the above pipeline correct or am I missing details?

My specific questions about edge cases:

(i) PWY-5030: L-histidine degradation III|g__Streptococcus.s__Streptococcus_sanguinis

Would this HUMAnN2 entry from the abundance profile contain all of the genes from all of the reactions in [name]_diamond_aligned.tsv, since there is taxonomy associated with the identifier?

(ii) PWY-2942: L-lysine biosynthesis III

Would this one come from all of the genes in [name]_bowtie2_aligned.tsv, since there is no taxonomy information in the identifier, and does it lack taxonomy information because it was identified at the level of an orthologous group?

(iii) UNINTEGRATED|g__Streptococcus.s__Streptococcus_sanguinis

I'm not sure where the read -> organism mapping file is located.

(iv) Many of the UniRef(5/9)0_XYZ identifiers are not in the humann2/data/pathways/metacyc_reactions_level4ec_only.uniref.bz2 file. How should these be handled? Is there another file where I should be looking for this information?
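For reference, the edge cases above can at least be separated mechanically. The sketch below does not answer where the missing information lives. It only splits an abundance-profile identifier into pathway ID, description, and taxon suffix (cases i to iii), and lists the observed UniRef IDs that have no entry in a gene -> reactions lookup (case iv). The function names and the `gene_to_rxns` mapping are hypothetical, and the identifier layout is assumed from the three examples given.

```python
def split_feature(name):
    """Split 'PWY-5030: description|g__Genus.s__Species' into its three parts.

    Returns (pathway_id, description, taxon); description and taxon are None
    when the identifier does not carry them.
    """
    feature, _, taxon = name.partition("|")
    pathway_id, sep, description = feature.partition(": ")
    return pathway_id, (description if sep else None), (taxon or None)


def unmapped_genes(observed, gene_to_rxns):
    """Return the observed UniRef IDs that have no entry in gene_to_rxns, sorted."""
    return sorted(gene for gene in set(observed) if gene not in gene_to_rxns)
```

Counting the IDs returned by `unmapped_genes` against the total number of observed IDs gives a quick sense of how large the gap in (iv) actually is for a given sample.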
