Question: Where To Download Pam50 Gene Set?
13
4.3 years ago by
user770
United States
user770 wrote:

There's a widely used gene set, PAM50, made up of 50 genes used to classify breast cancer subtypes. It was introduced in this paper:

Parker et al. Supervised Risk Predictor of Breast Cancer Based on Intrinsic Subtypes. http://jco.ascopubs.org/content/27/8/1160.abstract

Where can the actual listing of the 50 genes obtained through their analysis be found? I have not seen it as a supplementary table in any of the papers. The only place I saw the genes named is in the first figure of this paper: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3487945/figure/F1/ but I was hoping to find a more parseable, downloadable format instead of retyping gene symbols from the figure.

Edit: I just typed it out from the image, which is primitive and error-prone (and using gene symbols to identify genes is imprecise), but here it is in case others find it helpful:

UBE2T
BIRC5
NUF2
CDC6
CCNB1
TYMS
MYBL2
CEP55
MELK
NDC80
RRM2
UBE2C
CENPF
PTTG1
EXO1
ORC6L
ANLN
CCNE1
CDC20
MKI67
KIF2C
ACTR3B
MYC
EGFR
KRT5
PHGDH
CDH3
MIA
KRT17
FOXC1
SFRP1
KRT14
ESR1
SLC39A6
BAG1
MAPT
PGR
CXXC5
MLPH
BCL2
MDM2
NAT1
FOXA1
BLVRA
MMP11
GPR160
FGFR4
GRB7
TMEM45B
ERBB2
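
Because a hand-typed list is easy to get wrong, one quick sanity check is to confirm that it has exactly 50 entries and no duplicates. Below is a minimal Python sketch using the symbols exactly as typed above. The names `PAM50_TYPED` and `find_duplicates` are illustrative only, and this check cannot catch a misspelled or outdated symbol.

```python
# Gene symbols exactly as typed in the post above (transcribed from the figure).
PAM50_TYPED = """
UBE2T BIRC5 NUF2 CDC6 CCNB1 TYMS MYBL2 CEP55 MELK NDC80
RRM2 UBE2C CENPF PTTG1 EXO1 ORC6L ANLN CCNE1 CDC20 MKI67
KIF2C ACTR3B MYC EGFR KRT5 PHGDH CDH3 MIA KRT17 FOXC1
SFRP1 KRT14 ESR1 SLC39A6 BAG1 MAPT PGR CXXC5 MLPH BCL2
MDM2 NAT1 FOXA1 BLVRA MMP11 GPR160 FGFR4 GRB7 TMEM45B ERBB2
""".split()


def find_duplicates(symbols):
    """Return the sorted list of symbols that occur more than once."""
    seen, dupes = set(), set()
    for symbol in symbols:
        if symbol in seen:
            dupes.add(symbol)
        seen.add(symbol)
    return sorted(dupes)


# Report the total count, the number of distinct symbols, and any repeats.
print(len(PAM50_TYPED), "symbols,", len(set(PAM50_TYPED)), "unique")
print("duplicates:", find_duplicates(PAM50_TYPED))
```

A stricter check would compare the symbols against a current gene nomenclature table, since some symbols may have been renamed since the paper was published.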
modified 3.8 years ago by joelsparker1170 • written 4.3 years ago by user770
1

Have you tried writing to the corresponding author of Parker, et al?

written 4.3 years ago by Alex Paciorkowski3.3k

No, because I figured that PAM50 is so widely cited and used that I must be missing something obvious and that it's out there in a parseable format. Otherwise, how are other people using it? The paper has over a thousand citations!

written 4.3 years ago by user770
1

It's worth noting that: Most Random Gene Expression Signatures Are Significantly Associated with Breast Cancer Outcome.

modified 4.3 years ago • written 4.3 years ago by Neilfws47k

I don't see the relevance. PAM50 is used primarily for subtyping tumors, not for predicting outcomes. I don't think most gene expression signatures can subtype tumors correctly at all.

written 4.3 years ago by user770

The relevance is that you need to be cautious and critical of claims that use gene signatures in classification (outcomes, subtypes, whatever).

modified 4.3 years ago • written 4.3 years ago by Neilfws47k

Fair enough, but I don't see any evidence that this randomness result holds for subtypes. In breast cancer, the subtypes have biological meaning, and it's unclear why random gene signatures would recapitulate that.

written 4.3 years ago by user770

If you are happy with their methods and results, then that is what matters. Biological meaning is indeed important and is missing from a large number of published classifiers.

written 4.3 years ago by Neilfws47k

I agree with this; however, the point is that using survival as a means of validating stratification is based on the assumption that each molecular subtype of cancer has significantly different survival time distributions. It is not valid in the case where cancer subtypes have similar distributions of survival times.

written 3.1 years ago by michael.sharpnack0
16
3.8 years ago by
joelsparker1170
joelsparker1170 wrote:

My apologies that these were difficult to find, but the information is out there in a usable form. The centroids, gene lists, and R code to produce the classification are all available along with the clinical information for the training set on this page: https://genome.unc.edu/pubsup/breastGEO/

Specifically, the R code and supporting data files are here: https://genome.unc.edu/pubsup/breastGEO/PAM50.zip

And the centroids alone are here: https://genome.unc.edu/pubsup/breastGEO/pam50_centroids.txt
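
For anyone who wants the gene list from the centroid file programmatically, here is a minimal Python sketch. It assumes the file is tab-delimited, has one header row, and lists gene symbols in the first column. None of that is confirmed in this thread, and the URL may have moved since this was posted. The function names are illustrative.

```python
import csv
import io
import urllib.request

# URL as given in this answer; it may no longer resolve.
CENTROIDS_URL = "https://genome.unc.edu/pubsup/breastGEO/pam50_centroids.txt"


def parse_centroid_genes(text):
    """Return first-column values from tab-delimited text, skipping one header row.

    Assumption: the first row is a header and the first column holds gene symbols.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    next(reader, None)  # skip the assumed header row
    return [row[0].strip() for row in reader if row and row[0].strip()]


def fetch_centroid_genes(url=CENTROIDS_URL):
    """Download the centroid file and return its gene symbols (needs network access)."""
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8")
    return parse_centroid_genes(text)
```

If the layout assumptions hold, `fetch_centroid_genes()` should return the gene symbols in file order. It is worth checking that the result has 50 entries before using it.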

In addition, this document gives further details on classification with PAM50 plus Claudin-low calls: https://genome.unc.edu/pubsup/breastGEO/Guide%20to%20Intrinsic%20Subtyping%209-6-10.pdf

Anyone running PAM50 (or any classifier based on relative measurements such as expression) should understand the concepts in this paper: http://www.breast-cancer-research.com/content/pdf/s13058-015-0520-4.pdf

modified 22 months ago • written 3.8 years ago by joelsparker1170
6
4.3 years ago by
arno.guille390
France
arno.guille390 wrote:

You can download the PAM50, Sorlie500, and Hu306 gene sets from the supplementary data of this paper: Breast cancer molecular profiling with single sample predictors: a retrospective analysis. http://www.ncbi.nlm.nih.gov/pubmed/20181526

Alternatively, they are available in the genefu package from Bioconductor: http://www.bioconductor.org/packages/2.12/bioc/manuals/genefu/man/genefu.pdf

Hope this helps.

modified 4.3 years ago • written 4.3 years ago by arno.guille390

I suppose R code in a PDF behind a paywall is a little better than a PNG :) The Bioconductor link is good, though.

written 4.3 years ago by Neilfws47k
4
3.8 years ago by
sachbioinfo30
India
sachbioinfo30 wrote:

Hello, here is the list of PAM50 genes.

Gene symbols: ACTR3B, ANLN, BAG1, BCL2, BIRC5, BLVRA, CCNB1, CCNE1, CDC20, CDC6, CDH3, CENPF, CEP55, CXXC5, EGFR, ERBB2, ESR1, EXO1, FGFR4, FOXA1, FOXC1, GPR160, GRB7, KIF2C, KRT14, KRT17, KRT5, MAPT, MDM2, MELK, MIA, MKI67, MLPH, MMP11, MYBL2, MYC, NAT1, NDC80, NUF2, ORC6L, PGR, PHGDH, PTTG1, RRM2, SFRP1, SLC39A6, TMEM45B, TYMS, UBE2C, UBE2T

Their corresponding accession numbers:

accession no: AB209174, NM_018685, NM_004323, NM_000633, NM_001012271, BX647539, NM_031966, BC035498, BG256659, NM_001254, NM_001793, NM_016343, NM_018131, BC006428, NM_005228, NM_001005862, NM_001122742, NM_130398, AB209631, NM_004496, NM_001453, AJ249248, NM_005310, NM_006845, BC042437, AK095281, M21389, NM_001123066, M92424, NM_014791, BG765502, NM_002417, NM_024101, NM_005940, BX647151, NM_002467, BC013732, NM_006101, NM_145697, NM_014321, NM_000926, AK093306, BE904476, AK123010, BC036503, NM_012319, AK098106, BQ056428, BC032677, BF690859
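
If the two lists above are meant to be read in parallel, they can be paired by position. That is an assumption: the answer says "corresponding" but does not state that the order matches. Below is a hedged Python sketch with an illustrative function name, `pair_symbols_with_accessions`. Any pairing produced this way should be verified against a sequence database before it is relied on.

```python
# Both lists copied from the answer above, in the order given there.
SYMBOLS_CSV = (
    "ACTR3B, ANLN, BAG1, BCL2, BIRC5, BLVRA, CCNB1, CCNE1, CDC20, CDC6, "
    "CDH3, CENPF, CEP55, CXXC5, EGFR, ERBB2, ESR1, EXO1, FGFR4, FOXA1, "
    "FOXC1, GPR160, GRB7, KIF2C, KRT14, KRT17, KRT5, MAPT, MDM2, MELK, "
    "MIA, MKI67, MLPH, MMP11, MYBL2, MYC, NAT1, NDC80, NUF2, ORC6L, "
    "PGR, PHGDH, PTTG1, RRM2, SFRP1, SLC39A6, TMEM45B, TYMS, UBE2C, UBE2T"
)

ACCESSIONS_CSV = (
    "AB209174, NM_018685, NM_004323, NM_000633, NM_001012271, BX647539, "
    "NM_031966, BC035498, BG256659, NM_001254, NM_001793, NM_016343, "
    "NM_018131, BC006428, NM_005228, NM_001005862, NM_001122742, NM_130398, "
    "AB209631, NM_004496, NM_001453, AJ249248, NM_005310, NM_006845, "
    "BC042437, AK095281, M21389, NM_001123066, M92424, NM_014791, "
    "BG765502, NM_002417, NM_024101, NM_005940, BX647151, NM_002467, "
    "BC013732, NM_006101, NM_145697, NM_014321, NM_000926, AK093306, "
    "BE904476, AK123010, BC036503, NM_012319, AK098106, BQ056428, "
    "BC032677, BF690859"
)


def pair_symbols_with_accessions(symbols_csv, accessions_csv):
    """Pair the two comma-separated lists by position (assumes matching order)."""
    symbols = [s.strip() for s in symbols_csv.split(",")]
    accessions = [a.strip() for a in accessions_csv.split(",")]
    if len(symbols) != len(accessions):
        raise ValueError("lists differ in length; cannot pair by position")
    return dict(zip(symbols, accessions))


symbol_to_accession = pair_symbols_with_accessions(SYMBOLS_CSV, ACCESSIONS_CSV)
print(len(symbol_to_accession), "symbol/accession pairs")
```

The length check only guards against a dropped entry. It says nothing about whether each accession really belongs to the symbol it lands next to.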

modified 3.8 years ago • written 3.8 years ago by sachbioinfo30
2
4.3 years ago by
Neilfws47k
Sydney, Australia
Neilfws47k wrote:

Much "Googling" leads me to the same conclusion as you: this list of genes is not readily available as a list in plain text format. The best I could find was Figure A2 in this reference and this bookmark in the Cancer Genome Browser. There might be a way to download the list using the latter resource.

This is rather typical for cancer research, where results are often commercialized and/or patented, so it's in the interests of researchers to hide and obfuscate the raw data. They might also have chosen a name for the classifier that doesn't sound like a scoring matrix for sequence alignment!

written 4.3 years ago by Neilfws47k