I have a reference 16S rRNA dataset (AGP, to be precise) with alpha diversity computed using the following pipeline: muscle -> fasttree -> scikit-bio Faith's PD. As a reminder, AGP samples were sequenced over the V4 region, with a read length of 150 nt.
Now I want to compare a sample from our lab against this reference dataset, most importantly in terms of alpha diversity. The problem is that the lab sample, although it also covers the V4 region, has reads at their full length of 200 nt rather than 150 nt. I have several questions about comparing samples under these conditions.
- Should I trim the lab sample's reads to 150 nt?
- Can I run the same alpha diversity pipeline on this sample independently (i.e., muscle -> fasttree -> scikit-bio Faith's PD), or should I merge the sample into the AGP dataset and then run these steps on the merged dataset?
- Or is it possible to place the reads from the lab sample onto the existing tree of sequences (ASVs)?
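
To make the first option concrete, here is a minimal sketch of the trimming I have in mind. It assumes the lab reads are in FASTA format and start at the same position as the AGP reads (same forward primer, already removed), so that cutting from the 3' end leaves the same 150 nt stretch of V4. The function names `parse_fasta` and `trim_reads` are hypothetical helpers written for this illustration, not part of any library, and dropping reads shorter than the target length is my own choice.

```python
from typing import Iterable, Iterator, Tuple

TARGET_LENGTH = 150  # AGP read length mentioned above


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def trim_reads(
    records: Iterable[Tuple[str, str]], length: int = TARGET_LENGTH
) -> Iterator[Tuple[str, str]]:
    """Keep the first `length` bases of each read; drop reads that are shorter.

    Assumption: all reads begin at the same position in V4, so a prefix of
    the lab read covers the same region as a 150 nt reference read.
    """
    for header, seq in records:
        if len(seq) >= length:
            yield header, seq[:length]


if __name__ == "__main__":
    example = [">read1", "ACGT" * 50, ">read2", "ACGT" * 30]
    for header, seq in trim_reads(parse_fasta(example)):
        print(f">{header}\n{seq}")
```

The trimmed reads would then go through the same denoising and tree-building steps as the reference data. Whether that alone is enough to make the Faith's PD values comparable is exactly what I am unsure about.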