Is the PAM50 subtype available for TCGA BRCA data?
6.9 years ago
nashtf ▴ 20

I have some UNC Illumina RNAseqV2 data with about 100 genes and 800 patients identified by UNC ID. I'd like to find the subtype of each tumor (normal, luminal A, luminal B, basal, HER2) to use as labels for a classifier, preferably keyed by UNC ID, but if the TCGA barcode is provided I believe it's possible to match them up. I can't find it on the TCGA website. Maybe I'm just looking in the wrong places.

breast-cancer tcga rna-seq • 8.6k views
6.4 years ago
hAjmal ▴ 50

You can use the TCGAbiolinks package to retrieve the list of samples for a given subtype:

# install and load TCGAbiolinks (older Bioconductor installer, as in the original post)
source("http://www.bioconductor.org/biocLite.R")
biocLite("TCGAbiolinks")
library(TCGAbiolinks)

cancer <- "BRCA"
PlatformCancer <- "IlluminaHiSeq_RNASeqV2"
dataType <- "rsem.genes.results"

# query level 3 RNASeqV2 data and list the sample barcodes
datQuery <- TCGAquery(tumor = cancer, platform = PlatformCancer, level = "3")
lsSample <- TCGAquery_samplesfilter(query = datQuery)

# get subtype information
dataSubt <- TCGAquery_subtype(tumor = cancer)
lumA <- dataSubt[which(dataSubt$PAM50.mRNA == "Luminal A"), 1]
allSamples <- lsSample$IlluminaHiSeq_RNASeqV2 # 1218 total samples
lumASamples <- allSamples[grep(x = allSamples, pattern = paste(lumA, collapse = "|"))] # 263 luminal A samples found
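The `grep` call above builds one long regular expression out of every luminal A patient barcode. A minimal alternative sketch, assuming the subtype table holds 12-character patient barcodes and the sample list holds full aliquot barcodes, is to truncate each sample barcode and match with `%in%`. The helper name `match_patients` and the barcodes below are invented for illustration; they are not real TCGA samples.

```r
# Hypothetical helper: keep the sample barcodes whose patient portion
# (first 12 characters, "TCGA-XX-XXXX") appears in patient_ids.
match_patients <- function(sample_barcodes, patient_ids) {
  patient_part <- substr(sample_barcodes, 1, 12)
  sample_barcodes[patient_part %in% patient_ids]
}

# Toy data (invented barcodes)
all_samples <- c(
  "TCGA-AA-0001-01A-11R-0000-07",
  "TCGA-AA-0002-01A-11R-0000-07",
  "TCGA-AA-0002-11A-11R-0000-07",
  "TCGA-BB-0003-01A-11R-0000-07"
)
lum_a_patients <- c("TCGA-AA-0002", "TCGA-BB-0003")

lum_a_samples <- match_patients(all_samples, lum_a_patients)
print(lum_a_samples)
```

One patient can contribute several samples (for example a primary tumor with sample-type code `01` and a matched normal with code `11`), so it may be worth filtering on the sample-type code before using the subtype as a classifier label.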

6.9 years ago

You'll find a list derived from microarrays (Nature 2012 release) at

https://tcga-data.nci.nih.gov/docs/publications/brca_2012/

specifically

http://tcga-data.nci.nih.gov/docs/publications/brca_2012/BRCA.547.PAM50.SigClust.Subtypes.txt

It appears that there is no canonical PAM50 call set for the RNAseq version, leaving everyone to make their own calls (using the genefu package or some other means) and getting somewhat different results for the edge case tumors.


Does that mean that PAM50 on RNA-seq data is not a stable test?


@kanwarjag: I assume your answer is meant as a comment to my answer above; if so, please leave it as a comment next time rather than posting an answer to the question. What I mean by "somewhat different results for edge case tumors" is that, in practice, when you use genefu to assign PAM50 classes, the assignments are contingent on the centroid values used for the individual subtypes. For most tumors the assignment will be robust to small variations of these values, but in my experience there are about 10% of tumors that are sensitive to small variations in these parameters, and it's hard to get agreement on these tumors. This is not in any way related to RNAseq or the choice of technology; it's a function of how the test is performed.
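To make the centroid sensitivity concrete, here is a toy sketch in base R of nearest-centroid assignment by Spearman correlation, followed by a small perturbation experiment. This is not genefu's implementation and the centroid values are invented (two subtypes, ten genes); the function name `nearest_centroid`, the noise level, and the seed are all assumptions made for illustration.

```r
# Assign each sample to the centroid with the highest Spearman correlation
# and report the margin between the best and second-best correlation.
# expr: genes x samples; centroids: genes x subtypes, same gene order.
nearest_centroid <- function(expr, centroids) {
  cors <- cor(expr, centroids, method = "spearman")  # samples x subtypes
  best <- apply(cors, 1, which.max)
  margin <- apply(cors, 1, function(r) {
    s <- sort(r, decreasing = TRUE)
    unname(s[1] - s[2])
  })
  data.frame(call = colnames(centroids)[best], margin = margin,
             row.names = colnames(expr), stringsAsFactors = FALSE)
}

# Invented centroids: two subtypes with opposite expression patterns
centroids <- cbind(LumA = 1:10, Basal = 10:1)

# One sample that follows the LumA pattern exactly, and one "edge case"
# whose profile correlates almost equally with both centroids
expr <- cbind(clear = 1:10,
              edge  = c(7, 6, 5, 3, 4, 9, 2, 10, 1, 8))

baseline <- nearest_centroid(expr, centroids)
print(baseline)

# Perturb the centroids slightly and count how often each call changes
set.seed(1)  # arbitrary seed, for reproducibility only
n_rep <- 200
flips <- replicate(n_rep, {
  noisy <- centroids + rnorm(length(centroids), sd = 0.5)
  nearest_centroid(expr, noisy)$call != baseline$call
})
flip_rate <- setNames(rowMeans(flips), colnames(expr))
print(flip_rate)
```

The margin column gives a cheap way to flag tumors whose assignment is likely to depend on the exact centroid values, regardless of whether the expression data came from arrays or RNA-seq.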