6.6 years ago
amitgourav.ghosh12
I have plink.eigenval and plink.eigenvec files after running the --pca operation in plink. Can you please suggest how to proceed in R to make a PCA plot? Thank you!
For reference, plink.eigenval:
77.2046
50.9024
26.3038
14.4428
13.6282
9.79033
7.17447
6.59537
5.68401
5.56566
4.89969
4.29881
3.75531
3.60292
3.25788
3.12537
2.91697
2.79415
2.61761
2.5065
I plot PC1 and PC2 from the .eigenvec file (the 3rd and 4th columns) as a scatter plot in R. You can save these columns as a CSV (comma-separated) file and plot them.
Yes, use the eigenvec file
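A minimal sketch of that in base R, assuming the default PLINK 1.9 layout of plink.eigenvec (whitespace-delimited, no header line, family ID and individual ID in columns 1 and 2, then one column per PC). The toy rows and the column names FID, IID, and PC1 onwards are made up here so that the snippet runs on its own; with real data you would pass the file name to read.table() instead.

```r
# Toy stand-in for plink.eigenvec (IDs and values are invented for illustration).
# With a real file, use: eigenvec <- read.table("plink.eigenvec", header = FALSE)
toy <- "
FAM1 IND1  0.0147  0.0364 -0.0160
FAM2 IND2 -0.0210  0.0112  0.0053
FAM3 IND3  0.0085 -0.0291  0.0127
"
eigenvec <- read.table(text = toy, header = FALSE, stringsAsFactors = FALSE)

# Name the columns: two ID columns, then PC1, PC2, ...
colnames(eigenvec) <- c("FID", "IID", paste0("PC", seq_len(ncol(eigenvec) - 2)))

# PC1 and PC2 are the 3rd and 4th columns
plot(eigenvec$PC1, eigenvec$PC2, xlab = "PC1", ylab = "PC2", pch = 19)
```

If the file was written with a header line (for example by a different PLINK version or option), header = TRUE would be needed instead.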
Thank you Sir! I did use that file; now I am trying to add colours to the graph according to groups (ethnicities).
Okay, have you been able to do that (colours)?
Thank you very much for your kind advice! It was successful. Now I am trying to add colours to the plot according to the groups (ethnicities) and am finding it a bit daunting.
Thank you very much again!
The most basic way to do that is to create a colour vector whose order matches the order of your samples in the input data, and then to pass this vector to the col parameter of the plot() function.
There are other, more automated ways to do this, though. I have found that it is good practice to organise your metadata (including colouring) before starting a particular study. These can be regarded as initialisation and global variables.
Wow! Your advice gives me a good head start on how to proceed, or just on toying around with the data to learn things in a playful manner. Please accept my gratitude for your kind guidance.
My file looks like the line below (it actually has 1977 individuals after filtering). I can plot the eigenvalues, but I want to colour the dots according to their ethnic groups and add the ethnicity (the second column) to the plot. I am wondering how I can accomplish that.
GA000217 Abkhasian 0.0147066 0.0363746 -0.0159528 0.0088663
Thank you!
Okay, yes, 1977 is a lot. I presume that your first column is a unique sample ID, whereas the second column is some sort of super population. Abkhasia is around Syria, right?
If you want to colour by the super group, then there should only be a few unique values overall. If the unique values were Abkhasians, Kurds, Turks, and Arabs, then you would create the colour vector automatically from a list of the populations and a matching list of colours (assuming that your groups are in columns 1 and 2 of an object called eigenvec). This should colour each group as per the order of the listed populations and colours.
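A minimal sketch of how such a colour vector might be built in base R. The toy data frame, its column names (id, group, PC1, PC2), the singular group labels, and the particular colours are all assumptions made for illustration; only the object name eigenvec and the idea of matching populations to colours by order come from the description above.

```r
# Toy stand-in: sample ID and group in columns 1 and 2, then PC1 and PC2.
# IDs and values are invented for illustration.
eigenvec <- data.frame(
  id    = c("S1", "S2", "S3", "S4", "S5"),
  group = c("Abkhasian", "Kurd", "Turk", "Arab", "Kurd"),
  PC1   = c(0.015, -0.020, 0.008, -0.011, -0.018),
  PC2   = c(0.036, 0.011, -0.029, 0.004, 0.015),
  stringsAsFactors = FALSE
)

# One colour per group; the two vectors must be in corresponding order
populations <- c("Abkhasian", "Kurd", "Turk", "Arab")
colours     <- c("red", "blue", "forestgreen", "orange")

# match() gives, for each sample, the position of its group in 'populations',
# which is then used to pick the colour at the same position
col_vec <- colours[match(eigenvec[, 2], populations)]

plot(eigenvec$PC1, eigenvec$PC2, col = col_vec, pch = 19,
     xlab = "PC1", ylab = "PC2")
legend("topright", legend = populations, col = colours, pch = 19)
```

Any sample whose group is not listed in populations would get NA as its colour and would not be drawn, so it is worth checking anyNA(col_vec) on real data.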
Try it out.
Note that you can also produce different shapes by supplying a vector of PCH values to the pch parameter of plot(). If the super-populations have 30, 40, 50, and 60 samples respectively, and are grouped together in that order, then we could do this with a PCH vector built to match those group sizes.
Great! I feel this is exactly what I wanted. Thank you Sir!
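For the pch suggestion above, a sketch of one way to build the shape vector, assuming the samples really are sorted by group with 30, 40, 50, and 60 samples in that order. The symbol codes 15 to 18 and the random placeholder coordinates are arbitrary choices for illustration.

```r
# Group sizes from the example: 30, 40, 50 and 60 samples, in that order
sizes <- c(30, 40, 50, 60)

# One plotting symbol per group, repeated to match each group's size
pch_vec <- rep(c(15, 16, 17, 18), times = sizes)

# Random placeholder coordinates, only so that the plot call runs
set.seed(1)
pc1 <- rnorm(sum(sizes))
pc2 <- rnorm(sum(sizes))

plot(pc1, pc2, pch = pch_vec, xlab = "PC1", ylab = "PC2")
```

This only works when the rows are grouped in that exact order; otherwise a lookup by group name, as with colours, is the safer route.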
Regarding Abkhazia, yes, you are almost correct. The region is a bit further north, in Georgia, on the eastern coast of the Black Sea. Last but not least, it is a disputed territory of Georgia that enjoyed some autonomy while it was part of the USSR. It is a pivotal player nowadays in the Russia-Georgia relationship.
Sounds interesting - I will visit some day.
I just finished counting the unique population groups in the samples. There were 254 different ethnicities among the 1977 samples.
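Such a count can be made in R with unique() and table(). A small sketch on a made-up vector of group labels; on the real data the labels would come from the second column of the eigenvec object.

```r
# Toy vector of group labels, invented for illustration
groups <- c("Abkhasian", "Kurd", "Turk", "Kurd", "Abkhasian", "Kurd")

# Number of distinct groups
n_groups <- length(unique(groups))

# Samples per group, largest first
group_sizes <- sort(table(groups), decreasing = TRUE)

print(n_groups)
print(group_sizes)
```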
Visually, that will be difficult for our crappy human eyes to distinguish, no? The colour spectrum is mathematically infinite but our ability to distinguish 2 very similar colours is not.
Can you arrange them into even larger groups?
Yes, I plan to do so. I tried plotting it in R using ggplot, but the colour legend took up the whole area instead of the plot.
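One way to arrange ethnicities into larger groups is a named lookup vector, so that the legend only has a handful of entries. The sketch below uses base R; the region labels and the assignment of each ethnicity to a region are purely illustrative assumptions, not a recommended classification, and the coordinates are random placeholders.

```r
# Toy ethnicity labels, invented for illustration
group <- c("Abkhasian", "Georgian", "Kurd", "Turk", "Georgian")

# Hypothetical mapping from ethnicity to a broader region
lookup <- c(Abkhasian = "Caucasus", Georgian = "Caucasus",
            Kurd = "Near East", Turk = "Near East")

# Translate each sample's ethnicity into its broader region
region <- unname(lookup[group])

# One colour per region, assigned via the factor's integer codes
region_f <- factor(region)
region_cols <- rainbow(nlevels(region_f))
col_vec <- region_cols[as.integer(region_f)]

# Random placeholder coordinates, only so that the plot call runs
set.seed(1)
pc1 <- rnorm(length(group))
pc2 <- rnorm(length(group))

plot(pc1, pc2, col = col_vec, pch = 19, xlab = "PC1", ylab = "PC2")
legend("topright", legend = levels(region_f), col = region_cols, pch = 19)
```

Ethnicities missing from the lookup would come out as NA in region, so checking anyNA(region) before plotting helps catch unmapped labels.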
I managed to plot it in Excel with the colours, but I presume it is mostly for visualisation only. The name of the add-in is XLMiner Data Visualisation. If I point the cursor on any dot, it shows its coordinates and ethnicity.
Thank you!