Question: Usage of selectLab in PCAtools, R
6 months ago by
Germany / Munich / Dr. von Hauner Children's Hospital
Sebastian Hesse (190) wrote:

Hi guys, I just cannot figure out how to use the selectLab option in the biplot function of PCAtools. I tried creating a logical vector (with F for every sample I don't want a label for) as well as a vector with all the sample_ids that should have a label. But nothing works and I can't wrap my head around how to make it work.

Thanks a lot for any comments! Sebastian

pcatools R • 242 views

Can you explain a bit more what the problem is? From what I know, selectLab must be a subset of pca()$yvars, which most commonly are the column names of the matrix that was used for pca().

If you use the example data of ?biplot then a possible choice could be biplot(p, selectLab = c("sample28")).

With Kevin Blighe you have the expert (author of the tool) here at biostars.
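As a quick sanity check of the subset requirement described above, one could compare the requested labels against the label vector before plotting. This is only a sketch in base R: check_select_lab is a hypothetical helper (not part of PCAtools), and the sample IDs and genotypes are made-up values.

```r
# Hypothetical helper (not a PCAtools function): report which requested
# labels do not occur in the label vector that would be passed to lab.
check_select_lab <- function(lab, selectLab) {
  lab <- as.character(lab)
  absent <- setdiff(selectLab, lab)
  if (length(absent) > 0) {
    warning("Not found among labels: ", paste(absent, collapse = ", "))
  }
  invisible(absent)
}

# Invented example values
sample_ids <- c("s1020", "s1031", "s1042")
genotypes  <- c("ELANE", "HAX1", "ELANE")

# Sample IDs checked against sample IDs: nothing is absent
absent_ok <- check_select_lab(sample_ids, c("s1020", "s1031"))

# Sample IDs checked against genotype labels: both are absent
absent_bad <- suppressWarnings(check_select_lab(genotypes, c("s1020", "s1031")))

print(absent_ok)
print(absent_bad)
```

If the second vector is non-empty, the values given to selectLab probably do not match what is being used as labels.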

modified 6 months ago • written 6 months ago by ATpoint (41k)

Hey, indeed, it should just be a character vector of samples that you want to label. If you do not define the lab variable, then the default labels are:

lab = rownames(pcaobj$metadata)

So, if you use selectLab, be careful about what you are passing to lab, too.
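To illustrate why lab and selectLab have to agree, here is a rough base R sketch of the matching logic. It is an assumption about the behavior, not the PCAtools source: mask_labels is a hypothetical function and the metadata table is invented.

```r
# Toy metadata: row names are sample IDs, genotype is a grouping column
# (all values invented for illustration)
metadata <- data.frame(
  genotype = c("ELANE", "HAX1", "ELANE"),
  row.names = c("s1020", "s1031", "s1042"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the label filtering:
# keep a label only if it appears in selectLab, blank it otherwise
mask_labels <- function(lab, selectLab) {
  lab <- as.character(lab)
  ifelse(lab %in% selectLab, lab, "")
}

# Default labels (row names): the sample IDs in selectLab match
default_lab <- mask_labels(rownames(metadata), c("s1020", "s1031"))

# Custom labels (genotype): the same sample IDs match nothing,
# so every label ends up blank
custom_lab <- mask_labels(metadata$genotype, c("s1020", "s1031"))

print(default_lab)
print(custom_lab)
```

Under this assumption, passing sample IDs to selectLab only makes sense when lab is left at its default or also holds sample IDs.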

written 6 months ago by Kevin Blighe (67k)

Thanks for your answers, unfortunately it doesn't work.

Maybe I am wrong to assume that I can use the option to label only some of the data points in the PCA? If I hand it a character vector with just a few of the row names from my metadata, it doesn't label those points but instead disables all labels.

Is it correct that selectLab allows you to show labels only for a few, selected data points instead of labels for every point or am I just totally wrong with this?

Example of what I'm trying to do:

biplot(data_PCA, 
       colby = "date_processed", 
       legendPosition = "bottom", 
       lab = data_PCA$metadata$genotype, 
       selectLab = c("s1020", "s1031"),
       pointSize = 6,
       title = "PCA title",
       caption = 'There is a clear separation by date_processed')

modified 6 months ago • written 6 months ago by Sebastian Hesse (190)

It definitely works, but is designed specifically for sample IDs, which should be unique.

Using the data from the vignette:

p1 <- biplot(p)
p2 <- biplot(p, selectLab = c('GSM65776','GSM65779','GSM65781'))
p3 <- biplot(p, lab = rownames(p$metadata), selectLab = c('GSM65776','GSM65779','GSM65781'))
cowplot::plot_grid(p1, p2, p3, ncol = 3)


For 'grouped' variable names, as is perhaps your genotype data, the way to go would be via colby or shape.

modified 6 months ago • written 6 months ago by Kevin Blighe (67k)

Ah, ok. This explains it! It seems that selectLab is expecting the exact labels to include, not the corresponding row names / samples.

If I give selectLab a character vector with the metadata values to include, it works. It still needs a small workaround, wrapping the metadata column in as.character(), as otherwise it prints factor levels.

biplot(data_PCA, 
       colby = "date_processed", 
       legendPosition = "bottom", 
       lab = as.character(data_PCA$metadata$genotype), 
       selectLab = c("ELANE", "HAX1"),
       pointSize = 6,
       title = "PCA title",
       caption = 'There is a clear separation by date_processed')

This works fine now. Thanks a lot for your help! (will put this into an answer to the question below)

modified 6 months ago • written 6 months ago by Sebastian Hesse (190)

What is the content of data_PCA$metadata$genotype?

written 6 months ago by ATpoint (41k)

data_PCA$metadata$genotype contains a character vector that I use as a label. If I use it without the selectLab option, everything runs fine, but labels are shown for every data point, which is a bit overwhelming. That's why I'm trying to limit the labels to just a few selected points.

written 6 months ago by Sebastian Hesse (190)
6 months ago by
Germany / Munich / Dr. von Hauner Children's Hospital
Sebastian Hesse (190) wrote:

ATpoint and Kevin Blighe brought the solution!

selectLab expects a character vector with the exact labels we are handing to lab, not the corresponding row names. If your metadata column is a factor, lab needs as.character(), otherwise it prints the factor levels only.

biplot(data_PCA, 
       colby = "date_processed", 
       legendPosition = "bottom", 
       lab = as.character(data_PCA$metadata$genotype), 
       selectLab = c("Genotype1", "Genotype2"), #select here from $metadata$genotype!
       pointSize = 6,
       title = "PCA title",
       caption = 'There is a clear separation by date_processed')
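
The factor issue mentioned above can be reproduced in base R without PCAtools. The snippet below is only an illustration of general R behavior with invented genotype values; whether biplot hits exactly this code path internally is an assumption.

```r
# Invented genotype column stored as a factor
genotype <- factor(c("ELANE", "HAX1", "ELANE", "WT"))
keep <- c("ELANE", "HAX1")

# A factor is stored as integer codes that point into its levels
codes <- as.integer(genotype)

# Passing the factor through ifelse() drops the level labels,
# so the integer codes come back as strings
from_factor <- ifelse(genotype %in% keep, genotype, "")

# Converting to character first keeps the readable labels
from_character <- ifelse(genotype %in% keep, as.character(genotype), "")

print(codes)
print(from_factor)
print(from_character)
```

This is why wrapping the metadata column in as.character() before handing it to lab is a reasonable precaution.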

Thanks a bunch guys, without you I couldn't have solved this!


Okay, great that it now works. Yes, the 'battle' between characters and factors is ongoing, and causes quite a few problems for developers. There are minor 'usability' issues with PCAtools that I am hoping to improve over time, but it functions fine and is well tested.

modified 6 months ago • written 6 months ago by Kevin Blighe (67k)