Question: PCA analysis for beginners
2.0 years ago, Bella_p wrote:

Hello,

I have never done a PCA before and the concept is new to me. I wonder if I can get help from the more experienced bioinformaticians here.

I have a few samples sequenced by whole-exome sequencing, all from the same origin. From what I understand, PCA is a good way to see the similarities between samples, so I'd like to run it on my samples.

My question is how to organize the data. Currently I have the samples in VCF format. I'm using Python and R. Any thoughts on how to organize the data?

It's a very general question, but I don't know how to begin, so any help will be appreciated.

Many thanks!
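
For the data-organization question above, one common layout is a matrix with one row per sample and one column per variant. The sketch below is only an illustration under stated assumptions: it assumes a single multi-sample VCF (for example, per-sample files already merged) in which every record has a GT field, and it codes each genotype as the number of alternate alleles. The names `vcf_to_matrix` and `toy_vcf` are hypothetical, and the toy records are invented for the example.

```python
import io

def vcf_to_matrix(handle):
    """Return (samples, variant_ids, matrix) with one row per sample.

    Each cell is the count of non-reference alleles in the genotype
    (0, 1 or 2 for a diploid call), or None if the genotype is missing.
    Assumes a multi-sample VCF whose FORMAT column contains GT.
    """
    samples, variant_ids, columns = [], [], []
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names start at the 10th column
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        gt_index = fields[8].split(":").index("GT")
        calls = []
        for sample_field in fields[9:]:
            gt = sample_field.split(":")[gt_index]
            alleles = gt.replace("|", "/").split("/")
            if "." in alleles:
                calls.append(None)  # missing genotype
            else:
                calls.append(sum(a != "0" for a in alleles))
        variant_ids.append(f"{chrom}:{pos}:{ref}>{alt}")
        columns.append(calls)
    # Transpose so that rows are samples and columns are variants.
    matrix = [list(row) for row in zip(*columns)]
    return samples, variant_ids, matrix

# Invented toy input: three samples, two variants.
toy_vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n"
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t1/1\t0/0\n"
    "chr2\t200\t.\tC\tT\t50\tPASS\t.\tGT\t0/0\t0|1\t./.\n"
)
samples, variant_ids, matrix = vcf_to_matrix(io.StringIO(toy_vcf))
```

Missing genotypes are kept as None here; they would need to be filtered out or imputed before any PCA or clustering step. For real data, a dedicated VCF parser is likely a safer choice than hand-splitting lines.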

Editing my question to focus it:

I'd like to create a figure like the one below, but for the mutations in my genes rather than gene expression. I have a matrix of samples, each with thousands of mutations and their allele frequencies. I'd like to cluster the samples based on the mutations present, using allele frequency as the color gradient.

Does anyone know which R package I should use to do that?

[Image: example figure (not shown)]

sequencing pca ngs R vcf • 1.4k views
modified 2.0 years ago by zx8754 • written 2.0 years ago by Bella_p

@Kevin has a nice tutorial available here: PCA plot from read count matrix from RNA-Seq

While that is geared towards RNAseq you may find it generally useful.

modified 2.0 years ago • written 2.0 years ago by genomax

Thanks! I'll take a look at it

written 2.0 years ago by Bella_p

The figure you show is a hierarchical clustering (heatmap), not a PCA. You can use the heatmap.2 function from the R package gplots to draw heatmaps.

written 2.0 years ago by Benn
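
Since the poster also uses Python, here is a rough sketch of the clustering step behind such a heatmap, using SciPy rather than heatmap.2. It only computes the row order that a dendrogram would give; the plotting itself is left out. Complete linkage on Euclidean distances is an assumption made for the example, `cluster_order` is a hypothetical helper, and the allele-frequency values in `af` are invented.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list

def cluster_order(matrix, method="complete", metric="euclidean"):
    """Return the row indices of `matrix` in dendrogram leaf order."""
    X = np.asarray(matrix, dtype=float)
    Z = linkage(X, method=method, metric=metric)  # agglomerative clustering
    return leaves_list(Z).tolist()

# Invented toy data: 4 samples x 5 variants, values are allele frequencies.
# Rows 0 and 2 are similar to each other, as are rows 1 and 3.
af = [
    [0.50, 0.00, 0.45, 0.10, 0.00],
    [0.00, 0.60, 0.05, 0.00, 0.55],
    [0.48, 0.02, 0.50, 0.12, 0.00],
    [0.03, 0.55, 0.00, 0.02, 0.50],
]
order = cluster_order(af)
reordered = [af[i] for i in order]  # rows arranged as they would appear in the heatmap
```

Applying the same function to the transposed matrix gives a column (variant) order, which is how a heatmap with dendrograms on both axes is usually arranged.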

You are right. I meant that I wanted another figure, similar to the one I uploaded. As for the PCA, I couldn't find a way to do it for anything other than gene expression data. Do you know if there is any size limit for heatmap.2?

modified 2.0 years ago • written 2.0 years ago by Bella_p
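
PCA itself is not tied to gene expression: it works on any numeric matrix with samples as rows and features as columns, so an allele-frequency matrix can be used directly. The following is a minimal sketch with NumPy, not a recommended pipeline. The function name `pca` and the toy matrix `af` are hypothetical, and the values are invented for illustration.

```python
import numpy as np

def pca(matrix, n_components=2):
    """PCA of a samples-by-features matrix via singular value decomposition.

    Returns (scores, explained), where scores has one row per sample and
    explained holds the fraction of variance captured by each component.
    """
    X = np.asarray(matrix, dtype=float)
    X = X - X.mean(axis=0)  # center each feature (column)
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # sample coordinates
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:n_components]

# Invented toy data: 4 samples x 5 variants, values are allele frequencies.
af = [
    [0.50, 0.00, 0.45, 0.10, 0.00],
    [0.48, 0.02, 0.50, 0.12, 0.00],
    [0.00, 0.60, 0.05, 0.00, 0.55],
    [0.03, 0.55, 0.00, 0.02, 0.50],
]
scores, explained = pca(af)
```

Plotting the first column of `scores` against the second gives the usual PCA scatter plot, with one point per sample. Whether to scale the columns as well as center them is a separate choice that this sketch does not make.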

I think the limit will be your hardware: large data sets require a lot of computing power, so you need good hardware with plenty of RAM and CPU cores.

modified 2.0 years ago • written 2.0 years ago by Benn

Bella_p: If your needs have changed, it may be worth creating a new post or looking through other posts on Biostars first. The title of your post no longer matches what you are asking for.

@Kevin also happens to have a tutorial for HeatMaps: How to plot a heatmap with two different distance matrices for X and Y

modified 2.0 years ago • written 2.0 years ago by genomax