Hello,
I am analyzing codominant microsatellite (SSR) genotyping data for a population genetics study of a wildlife species. The data come from multiple batches, genotyped at the same loci but in different labs or on different platforms.
Although the same loci are used, I am concerned that batch effects due to differences in scoring, allele binning, or lab-specific protocols may bias my downstream analyses (e.g., STRUCTURE, PCA, FST).
Is there a recommended approach to detect and correct batch effects in SSR datasets when allele scoring may vary slightly across batches?
I am specifically working with codominant data (alleles scored by length) and would like to avoid artificial clustering or population structure due to technical artifacts.
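To make the concern concrete, here is a minimal sketch in Python of the kind of first-pass check I have in mind. It is not an established method. It assumes the genotypes can be flattened into (batch, locus, allele size) records; the function name `batch_private_alleles`, the locus names, and the toy data are all hypothetical. For each locus it lists allele sizes that occur in only one batch. A locus where nearly every allele is private to a batch and offset by a constant (e.g., 1 bp) would suggest a binning or scoring shift rather than real biology.

```python
from collections import defaultdict

def batch_private_alleles(records):
    """Return {locus: {batch: [allele sizes seen only in that batch]}}.

    `records` is an iterable of (batch, locus, allele_size) tuples.
    Loci with no batch-private alleles are omitted from the result.
    """
    # Collect the set of allele sizes observed per locus and batch.
    seen = defaultdict(lambda: defaultdict(set))
    for batch, locus, size in records:
        seen[locus][batch].add(size)

    report = {}
    for locus, by_batch in seen.items():
        private = {}
        for batch, alleles in by_batch.items():
            # Union of alleles observed at this locus in every other batch.
            others = set().union(
                *(a for b, a in by_batch.items() if b != batch)
            )
            only_here = sorted(alleles - others)
            if only_here:
                private[batch] = only_here
        if private:
            report[locus] = private
    return report

# Hypothetical toy data: SSR01 is scored 1 bp higher in batch B,
# while SSR02 is scored consistently in both batches.
toy = [
    ("A", "SSR01", 150), ("A", "SSR01", 152), ("A", "SSR01", 154),
    ("B", "SSR01", 151), ("B", "SSR01", 153), ("B", "SSR01", 155),
    ("A", "SSR02", 200), ("A", "SSR02", 204),
    ("B", "SSR02", 200), ("B", "SSR02", 204),
]
report = batch_private_alleles(toy)
print(report)
```

In this toy case only SSR01 is flagged, with every allele private to its batch. I realize private alleles can also reflect genuine population differences when batches and sampling sites are confounded, which is part of why I am asking for a more principled approach.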
Any R packages, workflows, or guidelines on standardizing such datasets would be greatly appreciated.
Thank you in advance for your time and suggestions!
— Shervin