PCA plot color differentiation
4.2 years ago
tothepoint ▴ 800

I followed the post on running PCA from a VCF file and then ran the following:

library("SNPRelate")
vcf.fn<-"~/xxx/tmp.vcf"
snpgdsVCF2GDS(vcf.fn, "ccm.gds",  method="biallelic.only")
genofile <- openfn.gds("ccm.gds") 
ccm_pca<-snpgdsPCA(genofile)
plot(ccm_pca$eigenvect[,1],ccm_pca$eigenvect[,2] ,col=as.numeric(substr(ccm_pca$sample, 1,3) == 'CCM')+3, pch=2)

I got a plot but I am not able to draw any conclusion from it. There is no legend and no colour differentiation between the breeds. My understanding of R is very limited and I could not find a fix. Could I get some help or tips to fix this problem? I would be grateful to you all.

PCA gwas color plink • 3.1k views

You are already attempting to set colours via this parameter:

col=as.numeric(substr(ccm_pca$sample, 1,3) == 'CCM')+3

Did you write this code (above) on your own?

What is the output of:

ccm_pca$sample
substr(ccm_pca$sample, 1,3)

Dear Kevin, I found that code in the previously mentioned post.

The output for

ccm_pca$sample

 [1] "01"    "03"    "04"    "05"    "06"
 [6] "07"    "10:01" "10:03" "10:04" "10:05"
[11] "10:06" "10:20" "20"    "2:01"  "2:03"
[16] "2:04"  "2:05"  "2:06"  "2:07"  "2:20"
[21] "3:01"  "3:03"  "3:04"  "3:05"  "3:06"
[26] "3:07"  "3:20"  "4:01"  "4:03"  "4:04"
[31] "4:05"  "4:06"  "4:07"  "4:20"  "5:01"
[36] "5:03"  "5:04"  "5:05"  "5:06"  "5:07"

substr(ccm_pca$sample, 1,3)

 [1] "01"  "03"  "04"  "05"  "06"  "07"  "10:"
 [8] "10:" "10:" "10:" "10:" "10:" "20"  "2:0"
[15] "2:0" "2:0" "2:0" "2:0" "2:0" "2:2" "3:0"
[22] "3:0" "3:0" "3:0" "3:0" "3:0" "3:2" "4:0"
[29] "4:0" "4:0" "4:0" "4:0" "4:0" "4:2" "5:0"
[36] "5:0" "5:0" "5:0" "5:0" "5:0"

Respectively
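
For context, here is a minimal sketch of why the original col= expression produces a single colour. It assumes the sample IDs are as printed above (only a handful are reproduced here for illustration): none of them starts with 'CCM', so the comparison is FALSE for every sample and every point gets colour index 3.

```r
# A few of the sample IDs printed above (subset, for illustration only)
ids <- c("01", "03", "10:01", "2:03", "5:07")

# The first three characters never equal 'CCM' ...
is_ccm <- substr(ids, 1, 3) == "CCM"

# ... so as.numeric(FALSE) + 3 is 3 for every sample: one colour for all points
col_index <- as.numeric(is_ccm) + 3
print(unique(col_index))
```

In other words, the expression was presumably written for IDs that carry a 'CCM' prefix, and it cannot separate groups in IDs of this form.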


Thanks for sharing!

Although this works perfectly, I have one issue regarding grouping samples manually. What if we want to colour this group forest green?

'03', '10:03', '2:03'

Similarly, this group in red:

'05', '10:05', '3:05'

Any suggestions?


Is the Reds or Greens palette sufficient?

Alternatively, you can literally use any colours that you want, like this:

myPalette <- c('white','pink','red2','red4')
colorRampPalette(myPalette)(length(unique(x)))[factor(x, levels = c('01', '03', '05', '08'))]
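
One possible way to colour such groups is sketched below. It assumes that samples belong together when the part of the ID after the colon matches, so that '03', '10:03' and '2:03' form one group. The names ids, grp and grpCols are illustrative only, and the grey fallback for groups that are not listed is an arbitrary choice.

```r
# Example IDs in the same form as those posted above
ids <- c("01", "03", "05", "10:01", "10:03", "10:05", "2:03", "3:05")

# Drop everything up to and including the colon; IDs without a colon are unchanged
grp <- sub("^.*:", "", ids)

# Named lookup: group label -> colour (groups not listed come back as NA)
grpCols <- c("03" = "forestgreen", "05" = "red")
colours <- unname(grpCols[grp])

# Give every unlisted group a neutral fallback colour
colours[is.na(colours)] <- "grey60"

print(data.frame(ids, grp, colours))
```

With the PCA result from the question, ids would presumably be ccm_pca$sample.id, and colours could then be passed to the col argument of plot().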
4.2 years ago

I trust that you know what these numbers relate to?

set specific colours for each factor level

You can set colours manually to any vector like this:

x <- c('01','03','05','01','08','08','01','03','05')
unique(x)
# [1] "01" "03" "05" "08"
colours <- c('royalblue', 'forestgreen', 'red4', 'pink')[factor(x, levels = c('01', '03', '05', '08'))]

Here, the order of the colours ('royalblue', 'forestgreen', 'red4', 'pink') must match the order of the factor levels that you are setting ('01', '03', '05', '08').
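
If keeping two orders in sync feels error-prone, a named colour vector is one alternative. This is only a sketch: the names pal, coloursByName and coloursByFactor are illustrative, and the palette is deliberately listed in a different order to show that the order no longer matters.

```r
x <- c('01','03','05','01','08','08','01','03','05')

# Map each level to its colour by name, so the order of the palette is irrelevant
pal <- c('08' = 'pink', '01' = 'royalblue', '05' = 'red4', '03' = 'forestgreen')
coloursByName <- unname(pal[x])

# Same result as indexing by a factor whose level order matches the colour order
coloursByFactor <- c('royalblue', 'forestgreen', 'red4', 'pink')[factor(x, levels = c('01', '03', '05', '08'))]
stopifnot(identical(coloursByName, coloursByFactor))
```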

apply a colour gradient / pre-built palette to your factor levels

You can also automatically assign colours via RColorBrewer:

require('RColorBrewer')
pick.col <- brewer.pal(10, 'Spectral')
colII <- colorRampPalette(pick.col)(length(unique(x)))[factor(x, levels = c('01', '03', '05', '08'))]

Here, I use the 'Spectral' palette. For other palettes, take a look here (about half-way down the page): http://www.sthda.com/english/wiki/colors-in-r


create your own palette

For a custom palette, do something like this:

myPalette <- c('white','pink','red2','red4')
colorRampPalette(myPalette)(length(unique(x)))[factor(x, levels = c('01', '03', '05', '08'))]

----------------------

You can then plot a custom legend via the legend() function, which comes with base R.
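
As a rough sketch of how that could look with the manual colours from above: the coordinates pc1 and pc2 below are random placeholders standing in for ccm_pca$eigenvect[,1] and ccm_pca$eigenvect[,2], and the names lev and pal are illustrative.

```r
x <- c('01','03','05','01','08','08','01','03','05')
lev <- c('01', '03', '05', '08')
pal <- c('royalblue', 'forestgreen', 'red4', 'pink')
colours <- pal[factor(x, levels = lev)]

# Placeholder coordinates (random numbers, NOT real PCA output)
set.seed(1)
pc1 <- rnorm(length(x))
pc2 <- rnorm(length(x))

# Scatter plot with one colour per point
plot(pc1, pc2, col = colours, pch = 17, xlab = 'PC1', ylab = 'PC2')

# One legend entry per factor level, in the same order as the palette
legend('topright', legend = lev, col = pal, pch = 17, title = 'Group')
```

Passing the same lev and pal vectors to both the colour assignment and legend() keeps the legend consistent with the points.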

Kevin
