Where To Download PAM50 Gene Set?
4
15
10.8 years ago
user ▴ 940

There's a widely used gene set PAM50 of 50 genes used to classify breast cancer subtypes, introduced in this paper:

Parker et al. Supervised Risk Predictor of Breast Cancer Based on Intrinsic Subtypes http://jco.ascopubs.org/content/27/8/1160.abstract

Where can the actual listing of the 50 genes obtained through their analysis be found? I have not seen it as a supplementary table in any of the papers. The only place I saw the genes named is in the first figure of this paper: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3487945/figure/F1/ but I was hoping to find a more parseable, downloadable format instead of retyping gene symbols from the figure.

edit: I just typed it out from the image, which is primitive and error-prone (and using gene symbols to identify genes is imprecise), but here it is in case others find it helpful:

UBE2T
BIRC5
NUF2
CDC6
CCNB1
TYMS
MYBL2
CEP55
MELK
NDC80
RRM2
UBE2C
CENPF
PTTG1
EXO1
ORC6L
ANLN
CCNE1
CDC20
MKI67
KIF2C
ACTR3B
MYC
EGFR
KRT5
PHGDH
CDH3
MIA
KRT17
FOXC1
SFRP1
KRT14
ESR1
SLC39A6
BAG1
MAPT
PGR
CXXC5
MLPH
BCL2
MDM2
NAT1
FOXA1
BLVRA
MMP11
GPR160
FGFR4
GRB7
TMEM45B
ERBB2
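
Since a hand-typed list is easy to get wrong, a quick sanity check is worth running. The following is a minimal sketch in Python (the variable names are just for illustration) that confirms the list above has 50 entries and no repeated symbols. It only checks counts and duplicates; it cannot tell whether a symbol was misread from the figure.

```python
from collections import Counter

# The 50 symbols exactly as typed out above.
typed = """
UBE2T BIRC5 NUF2 CDC6 CCNB1 TYMS MYBL2 CEP55 MELK NDC80
RRM2 UBE2C CENPF PTTG1 EXO1 ORC6L ANLN CCNE1 CDC20 MKI67
KIF2C ACTR3B MYC EGFR KRT5 PHGDH CDH3 MIA KRT17 FOXC1
SFRP1 KRT14 ESR1 SLC39A6 BAG1 MAPT PGR CXXC5 MLPH BCL2
MDM2 NAT1 FOXA1 BLVRA MMP11 GPR160 FGFR4 GRB7 TMEM45B ERBB2
""".split()

counts = Counter(typed)
# Any symbol that appears more than once is a likely typing slip.
duplicates = sorted(gene for gene, n in counts.items() if n > 1)

print("entries:", len(typed))
print("unique symbols:", len(counts))
print("duplicates:", duplicates)
```

Note that some symbols in the figure are older names (for example, ORC6L is now ORC6 in HGNC), so matching against a current annotation may need an alias lookup.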
cancer annotation classification • 30k views
1

Have you tried writing to the corresponding author of Parker, et al?

0

No, because I figured that PAM50 is so widely cited and used that I must be missing something obvious and it's out there in a parseable format. Otherwise, how are other people using it? The paper has over a thousand citations!

0

I don't see the relevance. PAM50 is used for subtyping tumors, primarily, not for predicting outcomes. I don't think most gene expression signatures can subtype tumors correctly at all.

0

The relevance is that you need to be cautious and critical of claims that use gene signatures in classification (outcomes, subtypes, whatever).

0

Fair enough, but I don't see any evidence that this randomness result holds for subtypes. In breast cancer, the subtypes have biological meaning, and it's unclear why random gene signatures would recapitulate that.

0

If you are happy with their methods and results, then that is what matters. Biological meaning is indeed important and something missing from a large number of published classifiers.

0

I agree with this; however, the point is that using survival as a means of validating stratification is based on the assumption that each molecular subtype of cancer has significantly different survival time distributions. It is not valid in the case where cancer subtypes have similar distributions of survival times.

20
10.2 years ago
joelsparker1 ▴ 200

My apologies that these were difficult to find, but the information is out there in a usable form. The centroids, gene lists, and R code to produce the classification are all available along with the clinical information for the training set on this page: https://genome.unc.edu/pubsup/breastGEO/

Specifically, the R code and supporting data files are here: https://genome.unc.edu/pubsup/breastGEO/PAM50.zip

And the centroids alone are here: https://genome.unc.edu/pubsup/breastGEO/pam50_centroids.txt

In addition, this document provides additional information regarding classification of the PAM50 plus Claudin-low calls: https://genome.unc.edu/pubsup/breastGEO/Guide%20to%20Intrinsic%20Subtyping%209-6-10.pdf

Anyone running PAM50 (or any classifier based on relative measurements such as expression) should understand the concepts in this paper: http://www.breast-cancer-research.com/content/pdf/s13058-015-0520-4.pdf
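
For readers who obtain the centroids file mentioned above and just want the gene list out of it, here is a rough Python sketch. It assumes the file is tab-delimited with one gene per row and one subtype per column, which may not match the actual layout, so check the file first. The function name `read_centroids` is hypothetical, and the toy table uses made-up numbers purely to show the shape of the result.

```python
import csv
import io

def read_centroids(handle):
    """Parse a tab-delimited centroid table into {gene: {subtype: value}}.

    Assumed layout: a header row of subtype names, then one row per gene
    with the gene symbol in the first column.
    """
    rows = [row for row in csv.reader(handle, delimiter="\t") if row]
    header, body = rows[0], rows[1:]
    # The header may or may not carry a label for the gene column,
    # so take as many trailing header fields as there are value columns.
    n_values = len(body[0]) - 1
    subtypes = header[-n_values:]
    return {row[0]: dict(zip(subtypes, map(float, row[1:]))) for row in body}

# Toy input with invented values, NOT real PAM50 centroids.
toy = "\tBasal\tLumA\nESR1\t-1.0\t0.5\nKRT5\t0.8\t-0.3\n"
centroids = read_centroids(io.StringIO(toy))
genes = sorted(centroids)
print(genes)
```

With a real file, replace the `io.StringIO(toy)` handle with `open(path, newline="")`.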

1

All the links given ask you to send an email to get the data.

0

It appears that the link to "classification of the PAM50 plus Claudin-low" is broken. Can anyone post the pdf link?

9
10.2 years ago
sachbioinfo ▴ 80

Hello, here is the list of PAM50 genes.

Gene symbols: ACTR3B, ANLN, BAG1, BCL2, BIRC5, BLVRA, CCNB1, CCNE1, CDC20, CDC6, CDH3, CENPF, CEP55, CXXC5, EGFR, ERBB2, ESR1, EXO1, FGFR4, FOXA1, FOXC1, GPR160, GRB7, KIF2C, KRT14, KRT17, KRT5, MAPT, MDM2, MELK, MIA, MKI67, MLPH, MMP11, MYBL2, MYC, NAT1, NDC80, NUF2, ORC6L, PGR, PHGDH, PTTG1, RRM2, SFRP1, SLC39A6, TMEM45B, TYMS, UBE2C, UBE2T

Their corresponding accession numbers:

accession no: AB209174, NM_018685, NM_004323, NM_000633, NM_001012271, BX647539, NM_031966, BC035498, BG256659, NM_001254, NM_001793, NM_016343, NM_018131, BC006428, NM_005228, NM_001005862, NM_001122742, NM_130398, AB209631, NM_004496, NM_001453, AJ249248, NM_005310, NM_006845, BC042437, AK095281, M21389, NM_001123066, M92424, NM_014791, BG765502, NM_002417, NM_024101, NM_005940, BX647151, NM_002467, BC013732, NM_006101, NM_145697, NM_014321, NM_000926, AK093306, BE904476, AK123010, BC036503, NM_012319, AK098106, BQ056428, BC032677, BF690859
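
To get these two lists into a parseable form, one option is to pair them up by position. The Python sketch below does that, under the assumption that the accession numbers are given in the same order as the gene symbols (the post implies this but does not state it outright). The variable names are illustrative only.

```python
symbols = (
    "ACTR3B, ANLN, BAG1, BCL2, BIRC5, BLVRA, CCNB1, CCNE1, CDC20, CDC6, "
    "CDH3, CENPF, CEP55, CXXC5, EGFR, ERBB2, ESR1, EXO1, FGFR4, FOXA1, "
    "FOXC1, GPR160, GRB7, KIF2C, KRT14, KRT17, KRT5, MAPT, MDM2, MELK, "
    "MIA, MKI67, MLPH, MMP11, MYBL2, MYC, NAT1, NDC80, NUF2, ORC6L, "
    "PGR, PHGDH, PTTG1, RRM2, SFRP1, SLC39A6, TMEM45B, TYMS, UBE2C, UBE2T"
).split(", ")

accessions = (
    "AB209174, NM_018685, NM_004323, NM_000633, NM_001012271, BX647539, "
    "NM_031966, BC035498, BG256659, NM_001254, NM_001793, NM_016343, "
    "NM_018131, BC006428, NM_005228, NM_001005862, NM_001122742, NM_130398, "
    "AB209631, NM_004496, NM_001453, AJ249248, NM_005310, NM_006845, "
    "BC042437, AK095281, M21389, NM_001123066, M92424, NM_014791, "
    "BG765502, NM_002417, NM_024101, NM_005940, BX647151, NM_002467, "
    "BC013732, NM_006101, NM_145697, NM_014321, NM_000926, AK093306, "
    "BE904476, AK123010, BC036503, NM_012319, AK098106, BQ056428, "
    "BC032677, BF690859"
).split(", ")

# Positional pairing only makes sense if both lists have the same length.
assert len(symbols) == len(accessions), "lists differ in length"

# ASSUMPTION: the i-th accession belongs to the i-th symbol.
symbol_to_accession = dict(zip(symbols, accessions))

# Print as tab-separated text, ready to paste into a file.
for symbol, accession in symbol_to_accession.items():
    print(f"{symbol}\t{accession}")
```

It is worth spot-checking a few pairs against a sequence database before relying on the mapping.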

6
10.7 years ago
arno.guille ▴ 410

You can download the PAM50, Sorlie500, and Hu306 gene sets from the supplementary data of this paper: Breast cancer molecular profiling with single sample predictors: a retrospective analysis. http://www.ncbi.nlm.nih.gov/pubmed/20181526

Or use the genefu package from Bioconductor: http://www.bioconductor.org/packages/2.12/bioc/manuals/genefu/man/genefu.pdf

Hope this helps.

0

I suppose R code in a PDF behind a paywall is a little better than a PNG :) The Bioconductor link is good, though.

3
10.8 years ago
Neilfws 49k

Much "Googling" leads me to the same conclusion as you: this list of genes is not readily available as a list in plain text format. The best I could find was Figure A2 in this reference and this bookmark in the Cancer Genome Browser. There might be a way to download the list using the latter resource.

This is rather typical for cancer research, where results are often commercialized and/or patented, so it's in the interests of researchers to hide and obfuscate the raw data. They might also have chosen a name for the classifier that doesn't sound like a scoring matrix for sequence alignment!
