4
13
8.2 years ago
user ▴ 870

There's a widely used set of 50 genes, PAM50, that is used to classify breast cancer subtypes. It was introduced in this paper:

Parker et al., Supervised Risk Predictor of Breast Cancer Based on Intrinsic Subtypes: http://jco.ascopubs.org/content/27/8/1160.abstract

Where can the actual listing of the 50 genes obtained through their analysis be found? I have not seen it as a supplementary table in any of the papers. The only place I saw the genes named is in the first figure of this paper: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3487945/figure/F1/ but I was hoping to find a more parseable, downloadable format instead of retyping gene symbols from the figure.

Edit: I just typed it out from the image, which is primitive and error-prone (and using gene symbols to identify genes is imprecise), but here it is in case others find it helpful:

UBE2T
BIRC5
NUF2
CDC6
CCNB1
TYMS
MYBL2
CEP55
MELK
NDC80
RRM2
UBE2C
CENPF
PTTG1
EXO1
ORC6L
ANLN
CCNE1
CDC20
MKI67
KIF2C
ACTR3B
MYC
EGFR
KRT5
PHGDH
CDH3
MIA
KRT17
FOXC1
SFRP1
KRT14
ESR1
SLC39A6
BAG1
MAPT
PGR
CXXC5
MLPH
BCL2
MDM2
NAT1
FOXA1
BLVRA
MMP11
GPR160
FGFR4
GRB7
TMEM45B
ERBB2
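
Since the list above was retyped by hand, a quick sanity check is worth having. The following is a minimal Python sketch, not anything from the original paper: it stores the 50 symbols exactly as listed, confirms there are 50 unique entries, and reports which ones are absent from a dataset's gene symbols. The `ALIASES` table and the `missing_from` helper are hypothetical names introduced here for illustration. The single alias shown (ORC6L, whose current HGNC symbol is ORC6 as far as I know) is an example of why symbol matching can silently drop genes; check any alias against HGNC before relying on it.

```python
# The 50 PAM50 gene symbols, copied from the list in this post.
PAM50_SYMBOLS = """
UBE2T BIRC5 NUF2 CDC6 CCNB1 TYMS MYBL2 CEP55 MELK NDC80
RRM2 UBE2C CENPF PTTG1 EXO1 ORC6L ANLN CCNE1 CDC20 MKI67
KIF2C ACTR3B MYC EGFR KRT5 PHGDH CDH3 MIA KRT17 FOXC1
SFRP1 KRT14 ESR1 SLC39A6 BAG1 MAPT PGR CXXC5 MLPH BCL2
MDM2 NAT1 FOXA1 BLVRA MMP11 GPR160 FGFR4 GRB7 TMEM45B ERBB2
""".split()

# Hypothetical alias table (an assumption, not part of the source list):
# maps a symbol as written above to a newer symbol a dataset might use.
ALIASES = {"ORC6L": "ORC6"}


def missing_from(dataset_symbols, aliases=ALIASES):
    """Return PAM50 symbols found neither directly nor via an alias."""
    present = {s.upper() for s in dataset_symbols}
    missing = []
    for symbol in PAM50_SYMBOLS:
        if symbol not in present and aliases.get(symbol) not in present:
            missing.append(symbol)
    return missing


# Basic integrity checks on the hand-typed list.
assert len(PAM50_SYMBOLS) == 50
assert len(set(PAM50_SYMBOLS)) == 50
```

For example, `missing_from(my_matrix_gene_names)` (where `my_matrix_gene_names` is whatever iterable of symbols your expression matrix uses) gives the genes you would lose before classification.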

cancer bioinformatics annotation classification • 24k views
1

Have you tried writing to the corresponding author of Parker, et al?

0

No, because I figured that PAM50 is so widely cited and used that I must be missing something obvious, and that it's out there in a parseable format. Otherwise, how are other people using it? The paper has over a thousand citations!

0

I don't see the relevance. PAM50 is used primarily for subtyping tumors, not for predicting outcomes. I don't think most gene expression signatures can subtype tumors correctly at all.

0

The relevance is that you need to be cautious and critical of claims that use gene signatures in classification (outcomes, subtypes, whatever).

0

Fair enough, but I don't see any evidence that this randomness result holds for subtypes. In breast cancer, the subtypes have biological meaning, and it's unclear why random gene signatures would recapitulate that.

0

If you are happy with their methods and results, then that is what matters. Biological meaning is indeed important, and it is something missing from a large number of published classifiers.

0

I agree with this; however, the point is that using survival as a means of validating stratification is based on the assumption that each molecular subtype of cancer has significantly different survival time distributions. It is not valid in the case where cancer subtypes have similar distributions of survival times.

17
7.6 years ago
joelsparker1 ▴ 180

My apologies that these were difficult to find, but the information is out there in a usable form. The centroids, gene lists, and R code to produce the classification are all available along with the clinical information for the training set on this page: https://genome.unc.edu/pubsup/breastGEO/

Specifically, the R code and supporting data files are here: https://genome.unc.edu/pubsup/breastGEO/PAM50.zip

And the centroids alone are here: https://genome.unc.edu/pubsup/breastGEO/pam50_centroids.txt
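
To give a feel for what the centroids are used for, here is a minimal Python sketch of nearest-centroid assignment by Spearman rank correlation, which is how I understand the published approach to work. It is not the R code from PAM50.zip and should not be used in its place. The subtype labels (`toy_subtype_1`, `toy_subtype_2`), the centroid values, and the sample values are all made up for illustration, and the function names are hypothetical. The layout of the real centroid file is not assumed here.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def classify(sample, centroids):
    """Pick the centroid with the highest Spearman correlation.

    sample: dict of gene -> expression value
    centroids: dict of label -> dict of gene -> centroid value
    Only genes shared by the sample and a centroid are compared.
    """
    scores = {}
    for label, centroid in centroids.items():
        genes = sorted(set(sample) & set(centroid))
        scores[label] = spearman(
            [sample[g] for g in genes], [centroid[g] for g in genes]
        )
    best = max(scores, key=scores.get)
    return best, scores


# Toy data only: these are NOT the real PAM50 centroid values.
toy_centroids = {
    "toy_subtype_1": {"ESR1": 1.0, "PGR": 0.5, "KRT5": -1.0, "ERBB2": -0.2, "MKI67": -0.5},
    "toy_subtype_2": {"ESR1": -1.0, "PGR": -0.8, "KRT5": 1.2, "ERBB2": 0.1, "MKI67": 0.9},
}
toy_sample = {"ESR1": 2.1, "PGR": 1.0, "KRT5": -0.7, "ERBB2": 0.0, "MKI67": -0.3}

best_label, all_scores = classify(toy_sample, toy_centroids)
print(best_label, all_scores)
```

Because the correlation is rank-based and the centroids describe relative expression, how the input matrix is normalized and centered before this step affects the calls.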

In addition, this document provides additional information regarding classification of the PAM50 plus Claudin-low calls: https://genome.unc.edu/pubsup/breastGEO/Guide%20to%20Intrinsic%20Subtyping%209-6-10.pdf

Anyone running PAM50 (or any classifier based on relative measurements such as expression) should understand the concepts in this paper: http://www.breast-cancer-research.com/content/pdf/s13058-015-0520-4.pdf

0

It appears that the link to "classification of the PAM50 plus Claudin-low" is broken. Can anyone post the pdf link?

6
8.2 years ago
arno.guille ▴ 400

You can download the PAM50, Sorlie500, and Hu306 gene sets from the supplementary data of this paper: Breast cancer molecular profiling with single sample predictors: a retrospective analysis. http://www.ncbi.nlm.nih.gov/pubmed/20181526

They are also available through the genefu package from Bioconductor: http://www.bioconductor.org/packages/2.12/bioc/manuals/genefu/man/genefu.pdf

Hope this helps.

0

I suppose R code in a PDF behind a paywall is a little better than a PNG :) The Bioconductor link is good, though.

4
7.7 years ago
sachbioinfo ▴ 30

Hello, List of PAM50 genes: Gene symbol: ACTR3B, ANLN, BAG1, BCL2, BIRC5, BLVRA, CCNB1, CCNE1, CDC20, CDC6, CDH3, CENPF, CEP55, CXXC5, EGFR, ERBB2, ESR1, EXO1, FGFR4, FOXA1, FOXC1, GPR160, GRB7, KIF2C, KRT14, KRT17, KRT5, MAPT, MDM2, MELK, MIA, MKI67, MLPH, MMP11, MYBL2, MYC, NAT1, NDC80, NUF2, ORC6L, PGR, PHGDH, PTTG1, RRM2, SFRP1, SLC39A6, TMEM45B, TYMS, UBE2C, UBE2T

Their corresponding accession numbers:

accession no: AB209174, NM_018685, NM_004323, NM_000633, NM_001012271, BX647539, NM_031966, BC035498, BG256659, NM_001254, NM_001793, NM_016343, NM_018131, BC006428, NM_005228, NM_001005862, NM_001122742, NM_130398, AB209631, NM_004496, NM_001453, AJ249248, NM_005310, NM_006845, BC042437, AK095281, M21389, NM_001123066, M92424, NM_014791, BG765502, NM_002417, NM_024101, NM_005940, BX647151, NM_002467, BC013732, NM_006101, NM_145697, NM_014321, NM_000926, AK093306, BE904476, AK123010, BC036503, NM_012319, AK098106, BQ056428, BC032677, BF690859
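
If you want these as a lookup table, the two comma-separated lists above can be paired by position. This is a minimal Python sketch that assumes the n-th accession belongs to the n-th symbol, which is how I read "corresponding" here; nothing in the post confirms the ordering independently, so spot-check a few entries. `parse_list` and `pair_up` are hypothetical helper names.

```python
SYMBOL_TEXT = """
ACTR3B, ANLN, BAG1, BCL2, BIRC5, BLVRA, CCNB1, CCNE1, CDC20, CDC6,
CDH3, CENPF, CEP55, CXXC5, EGFR, ERBB2, ESR1, EXO1, FGFR4, FOXA1,
FOXC1, GPR160, GRB7, KIF2C, KRT14, KRT17, KRT5, MAPT, MDM2, MELK,
MIA, MKI67, MLPH, MMP11, MYBL2, MYC, NAT1, NDC80, NUF2, ORC6L,
PGR, PHGDH, PTTG1, RRM2, SFRP1, SLC39A6, TMEM45B, TYMS, UBE2C, UBE2T
"""

ACCESSION_TEXT = """
AB209174, NM_018685, NM_004323, NM_000633, NM_001012271, BX647539,
NM_031966, BC035498, BG256659, NM_001254, NM_001793, NM_016343,
NM_018131, BC006428, NM_005228, NM_001005862, NM_001122742, NM_130398,
AB209631, NM_004496, NM_001453, AJ249248, NM_005310, NM_006845,
BC042437, AK095281, M21389, NM_001123066, M92424, NM_014791,
BG765502, NM_002417, NM_024101, NM_005940, BX647151, NM_002467,
BC013732, NM_006101, NM_145697, NM_014321, NM_000926, AK093306,
BE904476, AK123010, BC036503, NM_012319, AK098106, BQ056428,
BC032677, BF690859
"""


def parse_list(text):
    """Split a comma-separated block into clean, non-empty items."""
    return [item.strip() for item in text.split(",") if item.strip()]


def pair_up(symbols, accessions):
    """Pair symbols and accessions by position, refusing bad input."""
    if len(symbols) != len(accessions):
        raise ValueError("lists differ in length; cannot pair by position")
    if len(set(symbols)) != len(symbols):
        raise ValueError("duplicate gene symbols in list")
    return dict(zip(symbols, accessions))


SYMBOLS = parse_list(SYMBOL_TEXT)
ACCESSIONS = parse_list(ACCESSION_TEXT)
SYMBOL_TO_ACCESSION = pair_up(SYMBOLS, ACCESSIONS)
```

`SYMBOL_TO_ACCESSION["ESR1"]` then returns the accession sitting in the same position as ESR1 in the lists above.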

2
8.2 years ago
Neilfws 49k

Much "Googling" leads me to the same conclusion as you: this list of genes is not readily available as a list in plain text format. The best I could find was Figure A2 in this reference and this bookmark in the Cancer Genome Browser. There might be a way to download the list using the latter resource.

This is rather typical for cancer research, where results are often commercialized and/or patented, so it's in the interests of researchers to hide and obfuscate the raw data. They might also have chosen a name for the classifier that doesn't sound like a scoring matrix for sequence alignment!