I was wondering if somebody could please help me solve this problem. I have several different fasta files, all containing 16S sequences (gut microbes), and I would like to determine how OTUs compare across samples. For example, is OTU01 in sample X the same as OTU01 in samples Y, W and Z? Ideally, I would like to combine the per-sample OTU tables into a single matrix, with OTU and sample as variables and OTU count as the response. The problem is that there are multiple fasta files, and they contain sequences from different individuals, different runs and different sequencing platforms (454 and Illumina).
I could probably do this manually: for each sample, generate a file containing only the representative sequences and the number of sequences per OTU (using Mothur), then match sequences that are similar across samples and give them the same name (e.g. "OTU01"). However, I believe there must be an easier and quicker way of accomplishing this task. If you have done anything similar, please let me know what tools or strategy you used. I appreciate your help!
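
To make the manual strategy concrete, here is a minimal Python sketch of what I have in mind. It is only an illustration under strong assumptions: the function name `merge_otu_tables` and the input layout (a dict mapping each sample to its representative sequences and counts) are hypothetical, and representative sequences are matched by exact identity, whereas real data would need similarity-based matching (for example at a 97% threshold), especially across 454 and Illumina reads of different lengths.

```python
def merge_otu_tables(samples):
    """Combine per-sample OTU counts into one OTU-by-sample matrix.

    samples: dict mapping sample name -> {representative sequence: count}.
    ASSUMPTION: two OTUs are "the same" only if their representative
    sequences are identical (case-insensitive). This stands in for the
    similarity matching that real data would require.
    Returns (sample_names, matrix), where matrix maps a shared OTU name
    to a list of counts in the order of sample_names.
    """
    sample_names = sorted(samples)
    shared_names = {}  # representative sequence -> shared OTU name
    counts = {}        # shared OTU name -> {sample name: count}

    for sample in sample_names:
        for seq, n in samples[sample].items():
            key = seq.upper()
            if key not in shared_names:
                # First time this sequence is seen: assign the next name.
                shared_names[key] = f"OTU{len(shared_names) + 1:02d}"
            name = shared_names[key]
            per_sample = counts.setdefault(name, {})
            per_sample[sample] = per_sample.get(sample, 0) + n

    # Fill in zeros for OTUs that are absent from a sample.
    matrix = {
        name: [counts[name].get(s, 0) for s in sample_names]
        for name in shared_names.values()
    }
    return sample_names, matrix


def to_long_format(sample_names, matrix):
    """Flatten the matrix into (OTU, sample, count) rows."""
    return [
        (otu, sample, count)
        for otu, row in matrix.items()
        for sample, count in zip(sample_names, row)
    ]
```

Calling `merge_otu_tables` on the per-sample tables and passing the result to `to_long_format` would give one row per OTU and sample, with the count as the response, which is the layout I am after.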