Question: Probabilistic PCA on very sparse SNP matrix
9 months ago by
dominicdhall40 wrote:

I have a very sparse SNP matrix (~90% missing genotypes per sample and ~90% missing samples per SNP) on which I would like to perform some form of probabilistic PCA. I have been using the VariantAnnotation package to get my snpMatrix object, and originally tried to mimic a method shown here ( ) with the snpStats package. However, I don't believe snpStats was intended to work with extremely sparse SNP matrices, and it struggles to correct for missing values within the SNP matrix.

I have tried the ppca function from the pcaMethods package but have not had much success in finding any clusters of cells. Does anyone have experience working with very sparse matrices for PCA?

ADD COMMENTlink modified 7 months ago by Biostar ♦♦ 20 • written 9 months ago by dominicdhall40

What's your goal, i.e. what insights do you hope to get from the probabilistic PCA?

ADD REPLYlink written 9 months ago by Friederike3.3k

Can you first filter out the sites that always have missing values?

ADD REPLYlink written 7 months ago by btsui290