I have never done a PCA before and the concept is new to me. I'm hoping to get help from more experienced bioinformaticians here.
I have a few samples with whole-exome sequencing, all derived from the same source. From what I understand, PCA is a good way to see the similarities between samples, so I'd like to run it on mine.
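To make the goal concrete, here is a minimal sketch of PCA on a samples-by-variants matrix, using NumPy's SVD. The sample names and allele-frequency values are invented purely for illustration; the only real assumption is the layout, with one row per sample and one column per variant.

```python
import numpy as np

# Hypothetical toy data: rows are samples, columns are variants,
# values are alternate-allele frequencies in [0, 1].
samples = ["S1", "S2", "S3", "S4"]
af = np.array([
    [0.50, 0.45, 0.00, 0.00, 0.10],
    [0.48, 0.50, 0.00, 0.05, 0.12],
    [0.00, 0.00, 0.40, 0.52, 0.10],
    [0.02, 0.00, 0.45, 0.50, 0.08],
])

# Center each variant (column) so PCA captures variation between samples.
centered = af - af.mean(axis=0)

# SVD of the centered matrix: the rows of vt are the principal axes.
u, s, vt = np.linalg.svd(centered, full_matrices=False)

scores = u * s                     # sample coordinates on each component
explained = s**2 / np.sum(s**2)    # fraction of variance per component

for name, row in zip(samples, scores):
    print(name, row[:2])           # first two components, e.g. for a scatter plot
```

Plotting the first two columns of `scores` against each other gives the usual PCA scatter plot, with one point per sample.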
My question is how to organize the data. Currently I have the VCF files for the samples, and I'm working in Python and R. Any thoughts on how the data should be structured?
It's a very general question, but I don't know where to begin, so any help will be appreciated.
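One possible way to organize the data is a table with one row per sample and one column per variant, filled with allele fractions. The sketch below builds that table from VCF text using only the standard library. It rests on several assumptions that may not hold for a given pipeline: the FORMAT column contains an AD (allelic depths) entry, only the first ALT allele is considered, and a variant missing from a sample is filled with 0, which conflates "not mutated" with "not covered". The function names `read_vcf_af` and `to_matrix` and the two tiny in-memory VCFs are hypothetical.

```python
import io

def read_vcf_af(handle, table=None):
    """Add allele fractions from one VCF to table.

    table maps (chrom, pos, ref, alt) -> {sample_name: allele_fraction}.
    """
    table = {} if table is None else table
    samples = []
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue                      # skip blank and meta lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]          # sample names follow FORMAT
            continue
        if len(fields) < 10:
            continue                      # no genotype columns
        chrom, pos, _id, ref, alt = fields[:5]
        keys = fields[8].split(":")
        if "AD" not in keys:
            continue                      # assumption: AD is required
        ad_index = keys.index("AD")
        variant = (chrom, int(pos), ref, alt.split(",")[0])
        for name, call in zip(samples, fields[9:]):
            parts = call.split(":")
            if ad_index >= len(parts):
                continue
            depths = parts[ad_index].split(",")
            if len(depths) < 2 or "." in depths[:2]:
                continue                  # missing depth values
            ref_n, alt_n = int(depths[0]), int(depths[1])
            if ref_n + alt_n > 0:
                table.setdefault(variant, {})[name] = alt_n / (ref_n + alt_n)
    return table

def to_matrix(table, fill=0.0):
    """Return (samples, variants, matrix) with one row per sample."""
    samples = sorted({s for calls in table.values() for s in calls})
    variants = sorted(table)
    matrix = [[table[v].get(s, fill) for v in variants] for s in samples]
    return samples, variants, matrix

# Hypothetical single-sample VCFs, held in memory for the example.
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t"
vcf_s1 = io.StringIO(
    "##fileformat=VCFv4.2\n" + header + "S1\n"
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:AD\t0/1:10,10\n"
    "chr2\t200\t.\tC\tT\t50\tPASS\t.\tGT:AD\t0/1:15,5\n"
)
vcf_s2 = io.StringIO(
    "##fileformat=VCFv4.2\n" + header + "S2\n"
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:AD\t1/1:0,20\n"
)

table = read_vcf_af(vcf_s1)
table = read_vcf_af(vcf_s2, table)
samples, variants, matrix = to_matrix(table)
print(samples)
print(variants)
print(matrix)
```

With real files, `open(path)` would replace the `io.StringIO` objects, and the resulting matrix could be written out as CSV so it can be loaded in either Python or R.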
Editing my question to focus it:
I'd like to create something like the figure below, but showing the mutations in my genes rather than gene expression. I have a matrix of samples, each with thousands of mutations and their allele frequencies. I'd like to cluster the samples based on which mutations are present, using the allele frequency as the color gradient.
Does anyone know which R package I should use to do that?
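For reference, the computation behind such a clustered heat map can be sketched in Python with SciPy's hierarchical clustering. This is only an illustration of the idea, not a recommendation of a specific package: the sample names, variant labels, and allele frequencies are invented, and average linkage with Euclidean distance is an arbitrary choice.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list, fcluster

# Hypothetical toy data: rows are samples, columns are variants,
# values are allele frequencies used as the heat map gradient.
samples = ["S1", "S2", "S3", "S4"]
variants = ["v1", "v2", "v3", "v4", "v5"]
af = np.array([
    [0.50, 0.45, 0.00, 0.00, 0.10],
    [0.48, 0.50, 0.00, 0.05, 0.12],
    [0.00, 0.00, 0.40, 0.52, 0.10],
    [0.02, 0.00, 0.45, 0.50, 0.08],
])

# Cluster samples (rows) and variants (columns) separately.
row_link = linkage(af, method="average", metric="euclidean")
col_link = linkage(af.T, method="average", metric="euclidean")

# Leaf order of each dendrogram gives the row/column order of the heat map.
row_order = leaves_list(row_link)
col_order = leaves_list(col_link)
ordered = af[np.ix_(row_order, col_order)]

# Cut the sample dendrogram into two groups.
groups = fcluster(row_link, t=2, criterion="maxclust")

print([samples[i] for i in row_order])
print([variants[j] for j in col_order])
print(dict(zip(samples, groups.tolist())))
```

Drawing `ordered` as an image with a color scale, with the reordered sample and variant names as axis labels, gives the heat map itself; `seaborn.clustermap` wraps the clustering and the drawing in one call.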