PCAplot to investigate PC2 loadings using DESeq data
1
0
Entering edit mode
3.1 years ago
n.tear ▴ 80

Hi

Im am attempting to understand which genes are contributing the most to PC2. As you can see from the PCA plot from the DESeq2 plotPCA() function below the triangle samples appear to be seperated on PC2. (They are all the same disease)

enter image description here

My main questions is what do I need to start working with PCATools pca() function?

Is using the rlog data from DESeq2 approriate as below?

dds.sm <- DESeqDataSet(gse, design = ~ batch + diagnosis)

dds.sm <- estimateSizeFactors(dds.sm)

rld.sm <- rlog(dds.sm, blind = FALSE)

rld.sm.output <- assay(rld.sm)

pca.project <- pca(rld.sm.output)


plotloadings(pca.project,
    rangeRetain = 0.01,
    labSize = 3.0,
    shapeSizeRange = c(3, 3),
    title = 'Loadings plot',
    subtitle = 'PC1, PC2',
    caption = 'Top 1% variables',
    shape = 24,
    col = c('limegreen', 'black', 'red3'),
    drawConnectors = TRUE)

And how can I add the gene symbols?

enter image description here

Thanks for any help!

Nathan

PCATools • 1.8k views
ADD COMMENT
1
Entering edit mode
3.1 years ago

Hi,

Thanks for using PCAtools - I am the developer. You will have to convert these Ensembl gene IDs to HGNC symbols before running pca(). So, basically, change the rownames of rld.sm.output.

You can do this conversion by following this: Answer: Translating gene names to entrez id's

So, in your case, you would use:

require(org.Hs.eg.db)
mapping <- mapIds(
  org.Hs.eg.db,
  keys = rownames(rld.sm.output),
  column = 'SYMBOL',
  keytype = 'ENSEMBL')

rownames(rld.sm.output) <- make.unique(mapping)

Kevin

ADD COMMENT
0
Entering edit mode

Thank you, Kevin.

when I run:

rownames(rld.sm.output) <- make.unique(mapping)

I get

Error in.rowNamesDF<-(x, value = value) : missing values in 'row.names' are not allowed

perhaps this is a case of miss mapping? Is there an easy way to remove these rows? if that would be the appropriate thing to do?

Best wishes,

Nathan

ADD REPLY
0
Entering edit mode

hmmm, if you print mapping to terminal, what does in actually show? how about table(is.na(mapping))

ADD REPLY
0
Entering edit mode
table(is.na(mapping))

FALSE  TRUE 
34859 25374

seems there is na values

ADD REPLY
0
Entering edit mode

...'not good'! Maybe do:

make.unique(ifelse(is.na(mapping), rownames(rld.sm.output), mapping))

, and assign this to the rownames.

This means that, if NA, the original ENSG ID will be used; otherwise, the gene symbol will be used.

ADD REPLY
1
Entering edit mode

Thank you Kevin, this worked.

ADD REPLY
0
Entering edit mode
which(is.na(rownames(rld.sm.output)))

returns

integer(0)
ADD REPLY

Login before adding your answer.

Traffic: 1801 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6