Question: Probabilistic PCA on very sparse SNP matrix
2.4 years ago by
dominicdhall40 wrote:

I have a very sparse SNP matrix (~90% missing genotypes per sample and ~90% missing samples per SNP) on which I would like to perform some form of probabilistic PCA. I have been using the VariantAnnotation package to build my snpMatrix object and originally tried to mimic a method shown here ( ) with the snpStats package. However, I don't believe snpStats was intended for extremely sparse SNP matrices, and it struggles to correct for the missing values.

I have also tried the ppca function from the pcaMethods package, but have had little success in finding any clusters of cells. Does anyone have experience running PCA on very sparse matrices?

modified 2.2 years ago by Biostar ♦♦ 20 • written 2.4 years ago by dominicdhall40

What's your goal, i.e. what insights do you hope to get from the probabilistic PCA?

written 2.4 years ago by Friederike6.3k

Can you first filter out the sites that always have missing values?

written 2.2 years ago by btsui290
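
For illustration, the filtering suggested above could look roughly like the sketch below. It is written in plain Python with no dependencies rather than with the R packages mentioned in the question, and everything in it is an assumption made for the example: the toy matrix, the 0/1/2 genotype coding with None for a missing call, and the helper names call_rate and filter_sparse. With the default thresholds it drops only SNPs and samples that have no calls at all; raising the thresholds makes the filter stricter.

```python
# Toy layout (hypothetical): rows are samples, columns are SNPs.
# Genotypes are coded 0/1/2; None marks a missing call.

def call_rate(values):
    """Return the fraction of non-missing entries in a list of calls."""
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)


def filter_sparse(genotypes, min_snp_call_rate=0.0, min_sample_call_rate=0.0):
    """Keep SNPs, then samples, whose call rate is strictly above the threshold.

    Sample call rates are computed on the SNPs that survive the first step.
    Returns the filtered matrix plus the kept sample and SNP indices.
    """
    n_snps = len(genotypes[0]) if genotypes else 0

    # Step 1: drop SNP columns that are (almost) never called.
    keep_snps = [
        j for j in range(n_snps)
        if call_rate([row[j] for row in genotypes]) > min_snp_call_rate
    ]
    reduced = [[row[j] for j in keep_snps] for row in genotypes]

    # Step 2: drop samples with (almost) no calls on the remaining SNPs.
    keep_samples = [
        i for i, row in enumerate(reduced)
        if call_rate(row) > min_sample_call_rate
    ]
    filtered = [reduced[i] for i in keep_samples]
    return filtered, keep_samples, keep_snps


toy = [
    [0, None, None, 2],
    [None, None, None, None],  # sample with no calls at all
    [1, None, 2, None],
]
filtered, kept_samples, kept_snps = filter_sparse(toy)
print(kept_snps, kept_samples, filtered)
```

The kept indices are returned so that sample and SNP labels can be subset in the same way before running any PCA on the reduced matrix.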