Question

How to make a PCA plot in R from PLINK .eigenval and .eigenvec files

1

6.4 years ago

amitgourav.ghosh12 ▴ 70

I have plink.eigenval and plink.eigenvec files after running the --pca operation in PLINK. Could you please suggest how to proceed in R to make a PCA plot? Thank you!

For reference: plink.eigenval-

R SNP • 9.3k views

ADD COMMENT • link updated 6.4 years ago by Devon Ryan 104k • written 6.4 years ago by amitgourav.ghosh12 ▴ 70

2

I plot PC1 and PC2 from the .eigenvec file (the 3rd and 4th columns) as a scatter plot in R. You can save these columns to a CSV (comma-separated) file and plot them.
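As a minimal sketch of that step, the file can also be read straight into R without going through CSV. This assumes the default PLINK 1.9 layout (no header line; family ID, sample ID, then one column per PC). The toy file and its values are made up so the example runs on its own; demo_file and n_pcs are illustrative names.

```r
# Toy stand-in for plink.eigenvec (values are made up for illustration).
# Assumed layout: no header; FID, IID, then one column per PC.
demo_file <- tempfile(fileext = ".eigenvec")
writeLines(c(
  "FAM1 S1 0.0147 0.0364 -0.0160",
  "FAM2 S2 -0.0210 0.0051 0.0087",
  "FAM3 S3 0.0063 -0.0415 0.0072"
), demo_file)

# For real data, replace demo_file with "plink.eigenvec".
eigenvec <- read.table(demo_file, header = FALSE, stringsAsFactors = FALSE)
n_pcs <- ncol(eigenvec) - 2
colnames(eigenvec) <- c("FID", "IID", paste0("PC", seq_len(n_pcs)))

# PC1 and PC2 are columns 3 and 4.
plot(eigenvec$PC1, eigenvec$PC2, pch = 16, xlab = "PC1", ylab = "PC2")
```

If the file was written with a header line, header = TRUE would be needed instead and the renaming step can be dropped.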

ADD REPLY • link 6.4 years ago by BAGeno ▴ 190

2

Yes, use the eigenvec file
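The .eigenval file is not needed for the scatter plot itself, but one common use is labelling the axes with each PC's share of the variance. The sketch below assumes the file holds one eigenvalue per line; the numbers are made up. Since PLINK only writes the top PCs, the percentages are relative to the retained PCs, not to the total variance.

```r
# Made-up eigenvalues standing in for plink.eigenval (one value per line).
eigenval <- c(4.2, 2.1, 1.4, 1.3, 1.0)
# For real data: eigenval <- scan("plink.eigenval")

# Share of each PC among the retained PCs only, not of the total variance.
pct <- round(100 * eigenval / sum(eigenval), 1)

# Axis labels that can be passed to plot() via xlab and ylab.
xlab <- paste0("PC1 (", pct[1], "%)")
ylab <- paste0("PC2 (", pct[2], "%)")
```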

ADD REPLY • link 6.4 years ago by Kevin Blighe 88k

0

Thank you Sir! I did use that file; now I am trying to add colours to the graph according to groups (ethnicities).

ADD REPLY • link 6.4 years ago by amitgourav.ghosh12 ▴ 70

0

Okay, have you been able to do that (colours)?

ADD REPLY • link 5.9 years ago by Kevin Blighe 88k

0

Thank you very much for your kind advice! It was successful. Now I am trying to add colours to the plot according to the groups (ethnicities), and I am finding it a bit daunting.

Thank you very much again!

ADD REPLY • link 6.4 years ago by amitgourav.ghosh12 ▴ 70

2

The most basic way to do that is to create a colour vector whose order matches the order of your samples in the input data, and then to use this vector via the col parameter to the plot() function.

samples <- c("GroupA","GroupB","GroupA","GroupA")
colour <- c("royalblue","firebrick1","royalblue","royalblue")
plot(..., col=colour)

There are other, more automated ways to do this, though. I have found that it is good practice to organise your metadata (including colouring) before starting a particular study. These can be regarded as initialisation and global variables.
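One such automated way, sketched here under the assumption that each sample has a group label, is a named lookup vector. The colour vector is then derived from the labels instead of being typed out by hand. palette_by_group is an illustrative name.

```r
# Hypothetical group labels, one per sample, in the same order as the plotted rows.
samples <- c("GroupA", "GroupB", "GroupA", "GroupA")

# One colour per group; the names are the group labels.
palette_by_group <- c(GroupA = "royalblue", GroupB = "firebrick1")

# Indexing by name expands the lookup to one colour per sample.
colour <- unname(palette_by_group[samples])
```

The resulting colour vector can be passed to the col parameter of plot() exactly like the hand-written one.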

ADD REPLY • link 6.4 years ago by Kevin Blighe 88k

0

Wow! Your advice might give me a good head start on how to proceed, or just to toy around with the data and learn things in a playful manner. Please accept my gratitude for your kind guidance.

My file looks like the line below (it actually has 1977 individuals after filtering). I can plot the eigenvectors, but I want to colour the dots according to their ethnic groups and add the ethnicity (the second column) to the plot. I am wondering how I can accomplish that.

GA000217 Abkhasian 0.0147066 0.0363746 -0.0159528 0.0088663

Thank you!

ADD REPLY • link 6.4 years ago by amitgourav.ghosh12 ▴ 70

2

Okay, yes, 1977 is a lot. I presume that your first column is a unique sample ID, whereas the second column is some sort of super population. Abkhasia is around Syria, right?

If you want to colour by the super group, then there should only be a few unique values overall. If the unique values were Abkhasians, Kurds, Turks, and Arabs, then you would do the following to create the colour vector automatically (assuming that your sample IDs and groups are in columns 1 and 2 of an object called eigenvec):

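# colorRampPalette() is in grDevices, which R loads by default, so no extra package is needed
population <- factor(eigenvec[,2], levels=c("Abkhasians", "Kurds", "Turks", "Arabs"))
# one colour per factor level, in the same order as the levels above
col.population <- colorRampPalette(c("royalblue", "red3", "limegreen", "gold"))(nlevels(population))[population]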

This should colour each group as per the order of the listed populations and colours.

Try it out.
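To try it end to end, here is a minimal sketch with made-up rows in the layout described above (sample ID, population, then PCs). The data values, the column names V1 to V4, and the names groups and group.colours are assumptions for illustration. The colours are indexed directly by the factor, which gives one colour per level.

```r
# Made-up rows: sample ID, population, PC1, PC2.
eigenvec <- data.frame(
  V1 = c("S1", "S2", "S3", "S4", "S5"),
  V2 = c("Abkhasians", "Kurds", "Turks", "Arabs", "Kurds"),
  V3 = c(0.015, -0.021, 0.006, 0.030, -0.018),
  V4 = c(0.036, 0.005, -0.042, 0.011, 0.009),
  stringsAsFactors = FALSE
)

groups <- c("Abkhasians", "Kurds", "Turks", "Arabs")
group.colours <- c("royalblue", "red3", "limegreen", "gold")

# Indexing by a factor uses its integer codes, so level i gets colour i.
population <- factor(eigenvec[, 2], levels = groups)
col.population <- group.colours[population]

plot(eigenvec[, 3], eigenvec[, 4], col = col.population, pch = 16,
     xlab = "PC1", ylab = "PC2")
legend("topright", legend = levels(population), col = group.colours,
       pch = 16, bty = "n")
```

Any label in column 2 that is not listed in groups becomes NA and is not drawn, so it is worth checking anyNA(population) on real data.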

Note that you can also produce different shapes by supplying a vector of PCH values to the pch parameter of plot(). If the super-populations have 30, 40, 50, and 60 samples respectively, and are grouped together in that order, then we could do this with:

shape <- c(rep(16, 30), rep(15, 40), rep(17, 50), rep(18, 60))
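If the samples are not sorted by group, the counts-based rep() approach no longer lines up. A sketch that derives the shapes from the group factor instead, using made-up labels and the illustrative name pch.by.group:

```r
# Made-up group labels in arbitrary sample order.
population <- factor(
  c("Abkhasians", "Kurds", "Kurds", "Turks", "Arabs"),
  levels = c("Abkhasians", "Kurds", "Turks", "Arabs")
)

# One plotting symbol per level: filled circle, square, triangle, diamond.
pch.by.group <- c(16, 15, 17, 18)

# Indexing by the factor gives one pch value per sample, whatever the order.
shape <- pch.by.group[population]
```

The shape vector is then passed as plot(..., pch=shape).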

ADD REPLY • link 6.4 years ago by Kevin Blighe 88k

1

Great! I feel this is exactly what I wanted. Thank you Sir!

Regarding Abkhazia, yes, you are almost correct. The region is a bit further north, in Georgia, on the eastern coast of the Black Sea. Last but not least, it is a disputed territory of Georgia which enjoyed some autonomy while it was part of the USSR. It is a pivotal player nowadays in the Russia-Georgia relationship.

ADD REPLY • link 6.4 years ago by amitgourav.ghosh12 ▴ 70

1

Sounds interesting - I will visit some day.

ADD REPLY • link 6.4 years ago by Kevin Blighe 88k

0

I just finished counting the unique population groups in the samples. There were 254 different ethnicities among the 1977 samples.

ADD REPLY • link 6.4 years ago by amitgourav.ghosh12 ▴ 70

1

Visually, that will be difficult for our crappy human eyes to distinguish, no? The colour spectrum is mathematically infinite but our ability to distinguish 2 very similar colours is not.

Can you arrange them into even larger groups?
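One possible way to do that grouping is a lookup from each ethnicity to a broader group. In this sketch the ethnicity labels are examples and "RegionA" and "RegionB" are placeholders, not a suggested classification; region_of is an illustrative name.

```r
# Example per-sample ethnicity labels (column 2 of the eigenvec object).
ethnicity <- c("Abkhasian", "Georgian", "Kurd", "Turk", "Abkhasian")

# Hypothetical lookup from ethnicity to a broader group (placeholder names).
region_of <- c(Abkhasian = "RegionA", Georgian = "RegionA",
               Kurd = "RegionB", Turk = "RegionB")

# Map every sample to its broader group; unmapped labels would become NA.
region <- factor(unname(region_of[ethnicity]))
stopifnot(!anyNA(region))

# Number of samples per broader group.
table(region)
```

The region factor can then be used for colouring in place of the 254 original labels.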

ADD REPLY • link 6.4 years ago by Kevin Blighe 88k

1

Yes, I plan to do so. I tried plotting it in R using ggplot, but the colour legend took up the whole area instead of the plot.

I managed to plot it in Excel with the colours, but I presume it is mostly for visualisation only. The name of the add-in is XLMiner Data Visualisation. If I point the cursor on any dot, it shows its coordinates and ethnicity.

Thank you!

ADD REPLY • link 6.4 years ago by amitgourav.ghosh12 ▴ 70