Question: RNAseq and PAM50 prediction
graeme.thorn (London, United Kingdom) wrote, 4 months ago:

I've a set of RNAseq data from breast cancer tissue samples (counts and post-cqn-normalised log(RPKM) values) and wish to use the PAM50 classifier to classify them.

I've seen the questions "genefu for PAM50 prediction" and "RNAseq data and PAM50 method", but neither is particularly helpful about what I need to input into the R/genefu predictor (using intrinsic.cluster.predict) to get consistent PAM50 classification. I only have 138 samples, so I'm not going to be able to train the classifier before running it on the remaining samples.

Is there anywhere with a workflow from RNAseq counts to PAM50 types or can someone provide details as to how to go about this?

rna-seq R genefu • 134 views
Kevin Blighe wrote, 4 months ago:

Hey Graeme,

I do not believe log(RPKMs) are ideal for this. If that is all that you have, then no problem, though.

I am convinced that a handful, or even more, of the PAM50 genes add little information about the risk of metastasis in ER-positive, HER2-negative breast tumours. I also do not believe there is an established workflow for you to follow here, but it would help to have a working knowledge of regression and classification models. I gave a previous answer here: "How to exclude some of breast cancer subtypes just by looking at gene expression?"

I would be interested in different approaches:

  • RandomForest®
  • Penalised regression (my previous answer)
  • Stepwise regression and/or simply including all genes in the same regression model

To use any of these models to full effect, you would have to build it on cases where the outcome is known (metastasis occurred or did not occur) and then use it to predict the outcome for unknown cases.
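As a rough illustration of that build-then-predict pattern, here is a minimal sketch of a lasso-penalised logistic regression in plain Python. Everything in it is an assumption made for illustration: the function names (`fit_lasso_logistic`, `predict_proba`), the penalty and step size, and the toy expression values are invented, and in practice an established package (glmnet in R, for example) would likely be the more sensible choice.

```python
import math


def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold (the lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_lasso_logistic(X, y, penalty=0.05, step=0.1, n_iter=500):
    """Fit L1-penalised logistic regression by proximal gradient descent.

    X: list of samples, each a list of gene expression values.
    y: list of 0/1 outcomes (e.g. 1 = metastasis occurred).
    Returns (intercept, weights). The intercept is not penalised.
    """
    n, p = len(X), len(X[0])
    weights = [0.0] * p
    intercept = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * p
        grad_b = 0.0
        for row, label in zip(X, y):
            linear = intercept + sum(w * x for w, x in zip(weights, row))
            err = sigmoid(linear) - label
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * row[j] / n
        intercept -= step * grad_b
        # Gradient step followed by soft-thresholding pushes weak genes to zero
        weights = [
            soft_threshold(w - step * g, step * penalty)
            for w, g in zip(weights, grad_w)
        ]
    return intercept, weights


def predict_proba(model, row):
    """Predicted probability of outcome 1 for one new sample."""
    intercept, weights = model
    return sigmoid(intercept + sum(w * x for w, x in zip(weights, row)))


# Toy, made-up training data: two hypothetical genes, six known cases
known_expr = [[-2.0, 0.3], [-1.5, -0.2], [-1.0, 0.1],
              [1.0, -0.3], [1.5, 0.2], [2.0, -0.1]]
known_outcome = [0, 0, 0, 1, 1, 1]

model = fit_lasso_logistic(known_expr, known_outcome)
print("intercept and weights:", model)
print("P(outcome) for an unseen case:", predict_proba(model, [1.8, 0.0]))
```

The penalty controls how aggressively uninformative genes are dropped, and in a real analysis it would normally be chosen by cross-validation rather than fixed by hand.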



Thanks Kevin, but this is work in collaboration with a commercial company who will be running PAM50 on the non-deduplicated data (the sequencing included UMIs, which we are taking into account, and they aren't), so I was looking for the most robust way of running PAM50 on the data so we can do a direct comparison.

Reply written 4 months ago by graeme.thorn
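For anyone comparing PAM50 calls between two pipelines, it may help to see the shape of the classification step itself. PAM50 is a nearest-centroid classifier: each sample is assigned to the subtype whose centroid has the highest Spearman rank correlation with the sample's expression over the classifier genes. The sketch below illustrates only that idea in plain Python. The centroid values, the four-gene panel, the sample values, and the function names are all made up for illustration; they are not the published PAM50 centroids and this is not genefu's implementation. The gene-wise median centring is one common preprocessing choice, assumed here rather than taken from the thread.

```python
from statistics import median


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5


def spearman(a, b):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def median_centre(expr):
    """Subtract each gene's cohort median from every sample.

    expr: dict mapping sample name -> dict mapping gene -> log expression.
    """
    genes = list(next(iter(expr.values())))
    gene_median = {g: median(s[g] for s in expr.values()) for g in genes}
    return {name: {g: s[g] - gene_median[g] for g in genes}
            for name, s in expr.items()}


def classify(expr, centroids):
    """Assign each sample to the centroid with the highest Spearman correlation."""
    calls = {}
    for name, sample in median_centre(expr).items():
        genes = sorted(sample)
        vec = [sample[g] for g in genes]
        scores = {subtype: spearman(vec, [c[g] for g in genes])
                  for subtype, c in centroids.items()}
        calls[name] = max(scores, key=scores.get)
    return calls


# Hypothetical centroids over four genes (NOT the real PAM50 values)
toy_centroids = {
    "LumA":  {"ESR1": 1.0,  "ERBB2": 0.0,  "KRT5": -0.5, "MKI67": -1.0},
    "Her2":  {"ESR1": -0.5, "ERBB2": 1.5,  "KRT5": -0.2, "MKI67": 0.5},
    "Basal": {"ESR1": -1.5, "ERBB2": -0.5, "KRT5": 1.0,  "MKI67": 0.8},
}

# Made-up log-scale expression for three samples
cohort = {
    "s1": {"ESR1": 9.0, "ERBB2": 6.0,  "KRT5": 4.0, "MKI67": 3.0},
    "s2": {"ESR1": 4.0, "ERBB2": 10.0, "KRT5": 4.5, "MKI67": 5.0},
    "s3": {"ESR1": 3.0, "ERBB2": 5.0,  "KRT5": 8.0, "MKI67": 6.0},
}

print(classify(cohort, toy_centroids))
```

Because the centring step is relative to the cohort, the calls depend on which samples are centred together. For a direct comparison between two pipelines it is probably safest to apply the same centring, over the same set of samples, to both the deduplicated and non-deduplicated data.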