I downloaded a trimmed 16S file for a single patient from the HMP DACC. The body site was stool and the study was 16S_PP1. When I pick OTUs from the file with QIIME using:
pick_closed_reference_otus.py -i ./SRS075963.fsa -o $PWD/closed_reference_otu
I get a BIOM file with 12467 sample names, which is not what I expected. I was expecting to see a single sample name, since I'm looking at one patient.
When I look at the sequence file I got from HMP, I get the following for the sample names of the sequences:
>GKLCT6U01ADS6O_cs_nbp_rc
>GKLCT6U01ERG3G_cs_nbp_rc
>GKLCT6U01DHALW_cs_nbp_rc
>GKLCT6U01DV7XB_cs_nbp_rc
The first 9 characters, "GKLCT6U01", and the last 10 characters, "_cs_nbp_rc", are conserved across all of the sample names. The only difference between them is the five characters before the first underscore, for example "ADS6O" and "ERG3G". This leads me to believe that those five characters are sequence IDs and that an underscore is required after "GKLCT6U01", so the renamed labels would be:
>GKLCT6U01_ADS6O_cs_nbp_rc
>GKLCT6U01_ERG3G_cs_nbp_rc
>GKLCT6U01_DHALW_cs_nbp_rc
>GKLCT6U01_DV7XB_cs_nbp_rc
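In case it helps to see the edit concretely, here is a minimal Python sketch of the renaming I have in mind. It rests on two assumptions that I have not confirmed: that the shared prefix is always exactly 9 characters, and that the sample ID is taken to be whatever precedes the first underscore in each label. The names relabel_header, relabel_fasta and sample_ids are my own placeholders, not part of QIIME.

```python
PREFIX_LEN = 9  # assumption: the shared prefix (e.g. "GKLCT6U01") is always 9 characters


def relabel_header(line, prefix_len=PREFIX_LEN):
    """Insert an underscore after the first prefix_len characters of a FASTA header.

    Expects a line without its trailing newline. Sequence lines, headers that are
    too short, and headers that already have the underscore are returned unchanged.
    """
    if not line.startswith(">"):
        return line
    label = line[1:]
    if len(label) <= prefix_len or label[prefix_len] == "_":
        return line
    return ">" + label[:prefix_len] + "_" + label[prefix_len:]


def relabel_fasta(in_path, out_path):
    """Write a copy of the FASTA file at in_path with relabeled headers."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(relabel_header(line.rstrip("\r\n")) + "\n")


def sample_ids(headers):
    """Distinct sample IDs, ASSUMING the ID is the text before the first underscore."""
    return {h.lstrip(">").split("_")[0] for h in headers if h.startswith(">")}
```

Calling relabel_fasta("SRS075963.fsa", "SRS075963.relabeled.fsa") would write the relabeled copy (the output file name is only an example) and leave the original download untouched. Under the first-underscore assumption, sample_ids gives one ID per sequence for the original headers and a single ID for the relabeled ones.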
I want to know if this is a correct assumption and if it is appropriate for me to edit the file.
Thanks for helping