Question

PCAplot to investigate PC2 loadings using DESeq data

0

Entering edit mode

3.1 years ago

n.tear ▴ 80

Hi

Im am attempting to understand which genes are contributing the most to PC2. As you can see from the PCA plot from the DESeq2 plotPCA() function below the triangle samples appear to be seperated on PC2. (They are all the same disease)

enter image description here

My main questions is what do I need to start working with PCATools pca() function?

Is using the rlog data from DESeq2 approriate as below?

dds.sm <- DESeqDataSet(gse, design = ~ batch + diagnosis)

dds.sm <- estimateSizeFactors(dds.sm)

rld.sm <- rlog(dds.sm, blind = FALSE)

rld.sm.output <- assay(rld.sm)

pca.project <- pca(rld.sm.output)


plotloadings(pca.project,
    rangeRetain = 0.01,
    labSize = 3.0,
    shapeSizeRange = c(3, 3),
    title = 'Loadings plot',
    subtitle = 'PC1, PC2',
    caption = 'Top 1% variables',
    shape = 24,
    col = c('limegreen', 'black', 'red3'),
    drawConnectors = TRUE)

And how can I add the gene symbols?

enter image description here

Thanks for any help!

Nathan

PCATools • 1.8k views

ADD COMMENT • link 3.1 years ago by n.tear ▴ 80

score 1 · Answer 1 · 2021-03-29

1

Entering edit mode

3.1 years ago

Kevin Blighe 87k

Hi,

Thanks for using PCAtools - I am the developer. You will have to convert these Ensembl gene IDs to HGNC symbols before running pca(). So, basically, change the rownames of rld.sm.output.

You can do this conversion by following this: Answer: Translating gene names to entrez id's

So, in your case, you would use:

require(org.Hs.eg.db)
mapping <- mapIds(
  org.Hs.eg.db,
  keys = rownames(rld.sm.output),
  column = 'SYMBOL',
  keytype = 'ENSEMBL')

rownames(rld.sm.output) <- make.unique(mapping)

Kevin

ADD COMMENT • link 3.1 years ago by Kevin Blighe 87k

0

Entering edit mode

Thank you, Kevin.

when I run:

rownames(rld.sm.output) <- make.unique(mapping)

I get

Error in.rowNamesDF<-(x, value = value) : missing values in 'row.names' are not allowed

perhaps this is a case of miss mapping? Is there an easy way to remove these rows? if that would be the appropriate thing to do?

Best wishes,

Nathan

ADD REPLY • link 3.1 years ago by n.tear ▴ 80

0

Entering edit mode

hmmm, if you print mapping to terminal, what does in actually show? how about table(is.na(mapping))

ADD REPLY • link 3.1 years ago by Kevin Blighe 87k

0

Entering edit mode

table(is.na(mapping))

FALSE  TRUE 
34859 25374

seems there is na values

ADD REPLY • link 3.1 years ago by n.tear ▴ 80

0

Entering edit mode

...'not good'! Maybe do:

make.unique(ifelse(is.na(mapping), rownames(rld.sm.output), mapping))

, and assign this to the rownames.

This means that, if NA, the original ENSG ID will be used; otherwise, the gene symbol will be used.

ADD REPLY • link 3.1 years ago by Kevin Blighe 87k

1

Entering edit mode

Thank you Kevin, this worked.

ADD REPLY • link 3.1 years ago by n.tear ▴ 80

0

Entering edit mode

which(is.na(rownames(rld.sm.output)))

returns

integer(0)

ADD REPLY • link 3.1 years ago by n.tear ▴ 80