I have a reference 16S rRNA dataset (the American Gut Project, AGP, to be precise) with alpha diversity computed by the following pipeline: muscle -> fasttree -> scikit-bio Faith's PD. As a reminder, AGP samples were sequenced over the V4 region with a read length of 150 nt.

Now I want to compare a sample from our lab against this reference dataset, and above all I want to compare alpha diversity. The problem is that the lab sample, although it also covers the V4 region, has full-length reads, i.e., 200 nt rather than 150 nt. I have several questions about comparing samples under these conditions.

  1. Should I trim the reads of the lab sample to 150 nt?
  2. Can I run the same alpha diversity pipeline on this sample independently (i.e., muscle -> fasttree -> scikit-bio Faith's PD), or should I merge the sample into the AGP dataset and then run these steps on the merged dataset?
  3. Alternatively, is it possible to place the reads from the lab sample onto the existing tree of AGP sequences (ASVs)?
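
To make question 1 concrete, here is a minimal sketch of the trimming I have in mind. It assumes the lab reads start at the same position in V4 as the AGP reads, so that cutting from the 3' end leaves the same 150 nt window; I have not verified that this holds. The function names and the toy reads are hypothetical and only for illustration.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def trim_reads(records, length=150):
    """Truncate every sequence to at most `length` nt, keeping the 5' end."""
    return [(header, seq[:length]) for header, seq in records]


# Hypothetical toy input: one 200 nt read and one 100 nt read.
example = ">read1\n" + "ACGT" * 50 + "\n>read2\n" + "TTGCA" * 20 + "\n"

trimmed = trim_reads(parse_fasta(example))
for header, seq in trimmed:
    print(header, len(seq))
```

Reads shorter than 150 nt are left unchanged here; whether such reads should instead be discarded is part of what I am unsure about.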
