Question

Batch effect in GWAS

3.9 years ago

Researcher ▴ 20

I am working on bacterial GWAS and I have two batches. When I ran GWAS on the first batch, I found 5 effective principal components (based on a scree plot of the eigenvalues of an MDS) to control for population stratification. When I ran GWAS on the second batch, I found 3 effective principal components (again based on a scree plot of an MDS). When I merged the batches, I again found 3 principal components (based on a scree plot of an MDS). However, I expected to see around 8 components. What does that mean? Does it mean that some of my principal components got lost? How many components should I include in the merged GWAS analysis?
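
Reading the knee off a scree plot by eye is subjective, which makes it hard to compare batch 1, batch 2 and the merged set. A minimal sketch of one way to automate it, assuming a Kneedle-style rule (rescale the sorted eigenvalue curve to [0, 1] and take the point farthest below the chord joining its ends); this rule is an illustrative choice, not what the poster used, and the function name is hypothetical.

```python
import numpy as np


def n_components_before_knee(eigenvalues):
    """Count the components that lie before the 'knee' (elbow) of a scree plot.

    Assumed heuristic (Kneedle-style, not taken from the thread): rescale the
    sorted eigenvalue curve to [0, 1] on both axes and take the point farthest
    below the straight line joining the first and last points.
    """
    y = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    if y.size < 3 or y.max() == y.min():
        raise ValueError("need at least 3 eigenvalues that are not all equal")
    x_norm = np.arange(y.size) / (y.size - 1)
    y_norm = (y - y.min()) / (y.max() - y.min())
    # For a decreasing curve the chord runs from (0, 1) to (1, 0), i.e. x + y = 1;
    # 1 - x - y is proportional to the distance below that chord.
    gap = 1.0 - x_norm - y_norm
    knee = int(np.argmax(gap))  # 0-based index of the elbow point
    # The components strictly before the elbow are the ones retained,
    # so the 0-based elbow index equals the number of components kept.
    return knee
```

Applying the same rule to all three eigenvalue curves at least makes the counts comparable; it does not imply that the per-batch counts should add up in the merged data.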

GWAS Genomics Batch-effect sequencing

Please show all commands that you have used and, obviously, mention how you are conducting the PCA.

3.9 years ago by Kevin Blighe 87k

I applied MDS to a phylogeny-based distance matrix of all samples.
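
For reference, a minimal sketch of classical (Torgerson) MDS, also known as PCoA, on a precomputed distance matrix such as one derived from a phylogeny. The thread does not say which tool was used, so this is an assumed stand-in, not the poster's pipeline; it returns every eigenvalue, so all dimensions are available for a scree plot.

```python
import numpy as np


def classical_mds(D):
    """Classical (Torgerson) MDS, also called PCoA, of an n x n distance matrix.

    Returns all eigenvalues (largest first) and the sample coordinates on the
    axes with positive eigenvalues.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Double-centre the squared distances: B = -1/2 * J D^2 J with J = I - 11'/n
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Treat tiny eigenvalues as zero; non-Euclidean distances can also give negatives
    keep = eigvals > 1e-9 * np.abs(eigvals).max()
    coords = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return eigvals, coords
```

The eigenvalues from this function are what a scree plot would show; running it separately on batch 1, batch 2 and the merged matrix gives three different spectra, because the double-centring depends on which samples are present.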

3.9 years ago by Researcher ▴ 20

Thanks. Well, I cannot see the exact commands that you're running, e.g., how you are merging your datasets and how many dimensions you are including to control for population stratification, so I am left to hypothesise in this regard. It is also important to know the percent explained variation for each dimension, and whether the dimensions actually segregate your samples in a bi-plot in the way that you think.
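
As a numeric complement to the bi-plot, one can check how strongly each MDS axis of the merged data tracks batch membership. A minimal sketch, assuming per-sample batch labels are available and using eta squared (between-batch sum of squares over total sum of squares) as the chosen metric; the function name is hypothetical.

```python
import numpy as np


def batch_r2_per_axis(coords, batch):
    """Share of each axis's variance explained by batch labels (eta squared).

    coords: n x k MDS coordinates of the merged samples.
    batch:  length-n array of batch labels (e.g. 1 or 2) in the same row order.
    """
    coords = np.asarray(coords, dtype=float)
    batch = np.asarray(batch)
    grand_mean = coords.mean(axis=0)
    total_ss = ((coords - grand_mean) ** 2).sum(axis=0)
    between_ss = np.zeros(coords.shape[1])
    for label in np.unique(batch):
        in_batch = batch == label
        group_mean = coords[in_batch].mean(axis=0)
        between_ss += in_batch.sum() * (group_mean - grand_mean) ** 2
    return between_ss / total_ss
```

A value near 1 on a leading axis would suggest that the axis mainly separates the two batches rather than population structure within them.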

Does it mean that some of my principal components got lost? How many components should I add for the merged GWAS analysis?

No. If you merge 2 datasets, the primary sources of variation will change; therefore, so will an MDS analysis performed on the merged dataset. You should be able to configure the program to output more dimensions, or all of them.

3.9 years ago by Kevin Blighe 87k

Dear @Kevin Blighe, as I mentioned before, for the first dataset I added 5 dimensions to control for population structure, and for the second one I added 3. The number of dimensions was decided based on the knee of the scree plot of the MDS components. For the merged dataset, based on the knee of the scree plot of the new MDS, again 3 dimensions should be added to control for population structure. Another point: I have all the dimensions, but I decided how many to add to the model based on the knee of my scree plots. As for merging, I merge the datasets after quality control by intersecting the variants that appear in both, and then apply a linear/logistic model.
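
The merge-then-test workflow described above can be sketched as follows, assuming genotypes are held as a dict from variant ID to a per-sample array for each batch, and a quantitative phenotype so that ordinary least squares applies. The function names and data layout are hypothetical; for a binary phenotype the OLS step would be replaced by a logistic model (e.g. statsmodels `Logit`), with the first k MDS axes entering as covariates in the same way.

```python
import numpy as np


def intersect_variants(batch1, batch2):
    """Keep variants typed in both batches and stack their genotypes.

    batch1, batch2: dicts mapping a variant ID to a 1-D genotype array
    (one entry per sample in that batch). Samples of batch1 come first.
    """
    shared = sorted(set(batch1) & set(batch2))
    merged = {v: np.concatenate([batch1[v], batch2[v]]) for v in shared}
    return shared, merged


def variant_t_stat(genotype, phenotype, mds_coords, k):
    """OLS of phenotype on [intercept, genotype, first k MDS axes].

    Returns the t statistic of the genotype coefficient. mds_coords must be
    computed on the merged samples, in the same row order as genotype.
    """
    genotype = np.asarray(genotype, dtype=float)
    phenotype = np.asarray(phenotype, dtype=float)
    mds_coords = np.asarray(mds_coords, dtype=float)
    n = phenotype.size
    X = np.column_stack([np.ones(n), genotype, mds_coords[:, :k]])
    beta, *_ = np.linalg.lstsq(X, phenotype, rcond=None)
    resid = phenotype - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])  # residual variance
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov_beta[1, 1])
```

Re-running the test with different values of k is a cheap way to see how sensitive the association results are to the number of dimensions chosen.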

3.9 years ago by Researcher ▴ 20

I see, thanks for explaining! It is still important to look at the actual cumulative percent explained variation along each successive dimension. Using the 'knee' / 'elbow' method in isolation may not be a good criterion.
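
A minimal sketch of the percent and cumulative percent explained variation per MDS dimension, plus the smallest number of dimensions reaching a cumulative threshold. Dropping negative eigenvalues and the 80% default are assumptions made for illustration, not values from the thread.

```python
import numpy as np


def explained_variation(eigenvalues):
    """Percent and cumulative percent of variation for each MDS dimension.

    Assumption: negative eigenvalues (possible with non-Euclidean distances)
    are dropped before computing percentages; other conventions exist.
    """
    ev = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    ev = ev[ev > 0]
    pct = 100.0 * ev / ev.sum()
    return pct, np.cumsum(pct)


def dims_for_cumulative(eigenvalues, threshold=80.0):
    """Smallest number of leading dimensions whose cumulative percent >= threshold.

    The 80% default is an arbitrary illustration, not a recommendation.
    """
    _, cum = explained_variation(eigenvalues)
    idx = int(np.searchsorted(cum, threshold))  # first index with cum >= threshold
    return min(idx + 1, cum.size)
```

Tabulating these values for batch 1, batch 2 and the merged data, next to the knee-based counts, shows how much variation the chosen dimensions actually capture in each case.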

3.9 years ago by Kevin Blighe 87k