Question: Is the PAM50 subtype available for TCGA BRCA data?
nashtf (United States) wrote 4.9 years ago:

I have some UNC Illumina RNAseqV2 data covering about 100 genes for 800 patients, identified by UNC ID. I'd like to find the subtype of each tumor (normal, luminal A, luminal B, basal, HER2) to train a classifier. Preferably the subtypes would be keyed by UNC ID, but if TCGA barcodes are provided I believe it's possible to match them up. I can't find this on the TCGA website; maybe I'm just looking in the wrong places.

rna-seq breast cancer tcga • 6.3k views
hAjmal wrote 4.5 years ago:

You can use TCGAbiolinks to retrieve the list:

library(TCGAbiolinks)

cancer <- "BRCA"
PlatformCancer <- "IlluminaHiSeq_RNASeqV2"

# query the level 3 RNASeqV2 data and list the sample barcodes per platform
datQuery <- TCGAquery(tumor = cancer, platform = PlatformCancer, level = "3")
lsSample <- TCGAquery_samplesfilter(query = datQuery)

# get subtype information
dataSubt <- TCGAquery_subtype(tumor = cancer)
# patient barcodes (first column) of the tumors called Luminal A
lumA <- dataSubt[which(dataSubt$PAM50.mRNA == "Luminal A"), 1]
allSamples <- lsSample$IlluminaHiSeq_RNASeqV2 # 1218 total samples
# keep the sample barcodes that contain one of the Luminal A patient barcodes
lumASamples <- allSamples[grep(x = allSamples, pattern = paste(lumA, collapse = "|"))] # 263 Luminal A samples found
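
If you want a subtype label for every sample rather than just the Luminal A subset, the patient-to-sample matching can also be done without a long regular expression. Here is a minimal base R sketch. It assumes the usual TCGA barcode layout, where the first 12 characters identify the patient and characters 14-15 give the sample type ("01" for primary tumor, "11" for solid tissue normal). The barcodes and the `subtypes` table below are made up for illustration; in practice `subtypes` would come from whatever subtype table you retrieved.

```r
# Hypothetical sample-level barcodes (made up for illustration)
sample_barcodes <- c(
  "TCGA-AA-0001-01A-11R-A000-07",
  "TCGA-AA-0002-01A-11R-A000-07",
  "TCGA-AA-0003-11A-11R-A000-07"
)

# Hypothetical patient-level subtype table (stand-in for the real one)
subtypes <- data.frame(
  patient    = c("TCGA-AA-0001", "TCGA-AA-0003"),
  PAM50.mRNA = c("Luminal A", "Basal-like"),
  stringsAsFactors = FALSE
)

# The patient barcode is the first 12 characters of the sample barcode
patient_id <- substr(sample_barcodes, 1, 12)

# Characters 14-15 encode the sample type; "01" is primary tumor
is_tumor <- substr(sample_barcodes, 14, 15) == "01"

# Look up each sample's patient; patients without a call get NA
sample_subtype <- subtypes$PAM50.mRNA[match(patient_id, subtypes$patient)]
names(sample_subtype) <- sample_barcodes

print(data.frame(patient_id, is_tumor, subtype = sample_subtype))
```

Using `match()` keeps one row per sample and makes missing calls explicit as `NA`, which is easier to audit than a `grep()` over all patient IDs pasted together.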
David Quigley (San Francisco) wrote 4.9 years ago:

You'll find a list derived from microarrays (Nature 2012 release) at


It appears that there is no canonical PAM50 call set for the RNAseq version, so everyone makes their own calls (using the genefu package or some other means) and gets somewhat different results for the edge-case tumors.

kanwarjag (United States) wrote 4.9 years ago:

Does that mean PAM50 on RNA-seq data is not a stable test?


@kanwarjag: I assume your answer is meant as a comment on my answer above; if so, please leave it as a comment next time rather than posting an answer to the question. What I mean by "somewhat different results for edge case tumors" is that, in practice, when you use genefu to assign PAM50 classes, the assignments are contingent on the centroid values used for the individual subtypes. For most tumors the assignment will be robust to small variations in these values, but in my experience about 10% of tumors are sensitive to small variations in these parameters, and it's hard to get agreement on these tumors. This is not in any way related to RNAseq or the choice of technology; it's a function of how the test is performed.
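
To make the centroid dependence concrete, here is a toy nearest-centroid sketch in base R. It is not genefu and not the real PAM50 centroids: the five genes are only a stand-in for the full panel, every number is made up, and Spearman correlation is an assumption about the similarity measure chosen for this illustration. The point is only that a small shift in one centroid value can change the call for a borderline tumor while leaving a clear-cut tumor untouched.

```r
genes <- c("ESR1", "PGR", "MKI67", "ERBB2", "KRT5")

# Made-up subtype centroids (genes x subtypes), NOT the published PAM50 values
centroids <- cbind(
  LumA  = c( 2.0,  1.5,  -1.0, -0.5,  -1.5),
  LumB  = c( 1.5,  0.25,  0.40, 0.30, -1.5),
  Basal = c(-2.0, -1.5,   1.5, -0.5,   2.0)
)
rownames(centroids) <- genes

# Two made-up tumors: one clear-cut, one sitting between LumA and LumB
samples <- cbind(
  clear_basal  = c(-1.8, -1.2, 1.0, -0.4,  2.5),
  edge_luminal = c( 1.9,  0.9, 0.3, -0.2, -1.1)
)
rownames(samples) <- genes

# Assign each sample to the centroid with the highest Spearman correlation
call_subtype <- function(expr, cent) {
  rho <- cor(expr, cent, method = "spearman")  # samples x subtypes
  best <- apply(rho, 1, which.max)
  setNames(colnames(cent)[best], colnames(expr))
}

# Shift a single LumB centroid value by 0.2
perturbed <- centroids
perturbed["PGR", "LumB"] <- 0.45

baseline <- call_subtype(samples, centroids)
shifted  <- call_subtype(samples, perturbed)
print(data.frame(baseline, shifted))
```

In this toy setup the clear-cut tumor is called the same way with both centroid sets, while the borderline tumor changes class after the shift, which is the kind of disagreement I mean.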

Reply written 4.9 years ago by David Quigley