Question: Probabilistic PCA on very sparse SNP matrix
Asked 9 months ago by dominicdhall

I have a very sparse SNP matrix (~90% of genotypes missing per sample and ~90% of samples missing per SNP) on which I would like to perform some form of probabilistic PCA. I have been using the VariantAnnotation package to get my snpMatrix object and originally tried to follow the method shown here (https://www.bioconductor.org/packages/release/bioc/vignettes/snpStats/inst/doc/pca-vignette.pdf) with the snpStats package. However, I don't believe this package was intended to work with extremely sparse SNP matrices, and it struggles to correct for missing values within the SNP matrix.

I have also tried the ppca function from the pcaMethods package, but have not had much success in finding any clusters of cells. Does anyone have experience working with very sparse matrices for PCA?


What's your goal, i.e., what insights do you hope to get from the probabilistic PCA?

written 9 months ago by Friederike

Can you first filter out the sites that always have missing values?

written 7 months ago by btsui
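
The filtering suggested above could look roughly like the sketch below. It is written in plain Python rather than with the R packages mentioned in the question, and it assumes the genotypes are held as a list of rows (samples) by columns (SNPs), coded 0/1/2 with None for a missing call. The names call_rate and filter_by_call_rate and the threshold parameters are hypothetical, chosen only for illustration. With the default thresholds it drops only SNPs and samples that have no calls at all; stricter cutoffs are a judgment call for the data at hand.

```python
from typing import List, Optional, Tuple

# Assumed coding: 0/1/2 = alternate allele count, None = missing call.
Genotype = Optional[int]
Matrix = List[List[Genotype]]


def call_rate(values: List[Genotype]) -> float:
    """Fraction of non-missing entries in one row or column."""
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)


def filter_by_call_rate(
    matrix: Matrix,
    min_snp_rate: float = 0.0,
    min_sample_rate: float = 0.0,
) -> Tuple[Matrix, List[int], List[int]]:
    """Drop SNPs (columns), then samples (rows), whose call rate is not
    strictly above the given threshold.

    Returns the filtered matrix plus the indices of the samples and SNPs
    that were kept, so results can be mapped back to the original data.
    """
    n_snps = len(matrix[0]) if matrix else 0
    # Keep SNPs observed in enough samples.
    kept_snps = [
        j for j in range(n_snps)
        if call_rate([row[j] for row in matrix]) > min_snp_rate
    ]
    reduced = [[row[j] for j in kept_snps] for row in matrix]
    # Keep samples that still have enough calls among the remaining SNPs.
    kept_samples = [
        i for i, row in enumerate(reduced)
        if call_rate(row) > min_sample_rate
    ]
    filtered = [reduced[i] for i in kept_samples]
    return filtered, kept_samples, kept_snps


# Toy data: 3 samples x 4 SNPs, mostly missing.
genotypes: Matrix = [
    [0, None, None, 2],
    [None, None, 1, None],
    [None, None, None, None],
]
filtered, kept_samples, kept_snps = filter_by_call_rate(genotypes)
print(kept_snps, kept_samples)
```

SNPs are filtered before samples here so that a sample is judged only on sites that are informative for at least one other sample; reversing the order, or iterating until nothing changes, would be equally reasonable choices.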