Question: Is the PAM50 subtype available for TCGA BRCA data?
nashtf (United States) wrote, 3.8 years ago:

I have some UNC Illumina RNAseqV2 data covering about 100 genes and 800 patients, identified by UNC ID. I'd like to find the subtype of each tumor (normal, luminal A, luminal B, basal, HER2) for a classifier, preferably keyed by UNC ID, but if only the TCGA barcode is provided I believe it's possible to match the two up. I can't find this on the TCGA website; maybe I'm just looking in the wrong places.

rna-seq breast cancer tcga • 5.1k views
David Quigley (San Francisco) wrote, 3.8 years ago:

You'll find a list derived from microarrays (the Nature 2012 release) at

https://tcga-data.nci.nih.gov/docs/publications/brca_2012/

specifically

http://tcga-data.nci.nih.gov/docs/publications/brca_2012/BRCA.547.PAM50.SigClust.Subtypes.txt

It appears that there is no canonical PAM50 call set for the RNAseq version, so everyone has to make their own calls (using the genefu package or some other means) and ends up with somewhat different results for the edge case tumors.

hAjmal wrote, 3.4 years ago:

You can use TCGAbiolinks to retrieve the list:

# Install TCGAbiolinks from Bioconductor (only needed once)
source("http://www.bioconductor.org/biocLite.R")
biocLite("TCGAbiolinks")
library(TCGAbiolinks)

cancer <- "BRCA"
PlatformCancer <- "IlluminaHiSeq_RNASeqV2"

# List the level 3 RNASeqV2 samples available for BRCA
datQuery <- TCGAquery(tumor = cancer, platform = PlatformCancer, level = "3")
lsSample <- TCGAquery_samplesfilter(query = datQuery)
allSamples <- lsSample$IlluminaHiSeq_RNASeqV2  # 1218 total samples

# Get subtype information; the first column is used as the patient barcode
dataSubt <- TCGAquery_subtype(tumor = cancer)
lumA <- dataSubt[which(dataSubt$PAM50.mRNA == "Luminal A"), 1]

# Keep the samples whose barcode contains a Luminal A patient barcode
lumASamples <- allSamples[grep(pattern = paste(lumA, collapse = "|"), x = allSamples)]  # 263 Luminal A samples found
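The pattern built above joins every Luminal A barcode into one long regular expression. A possible alternative, sketched here in base R with made-up barcodes and a hypothetical two-column subtype table (the column names `patient` and `PAM50` are assumptions, not the actual TCGAbiolinks output), is to cut each sample barcode down to its 12-character patient prefix and look it up directly. The sketch also uses characters 14-15 of the barcode, the sample type code, to keep primary tumors only.

```r
# Toy stand-ins (made-up barcodes, not real TCGA records).
# `subtypes` mimics a patient-level subtype table; column names are assumptions.
subtypes <- data.frame(
  patient = c("TCGA-A1-0001", "TCGA-A1-0002", "TCGA-A1-0003"),
  PAM50   = c("Luminal A", "Basal-like", "Luminal A"),
  stringsAsFactors = FALSE
)

# Sample-level barcodes as they would appear in the expression data.
allSamples <- c(
  "TCGA-A1-0001-01A-11R-A001-07",
  "TCGA-A1-0002-01A-11R-A001-07",
  "TCGA-A1-0003-01A-11R-A001-07",
  "TCGA-A1-0003-11A-11R-A001-07",  # normal-tissue sample from the same patient
  "TCGA-A1-0004-01A-11R-A001-07"   # patient without a subtype call
)

# The first 12 characters of a TCGA barcode identify the patient.
patientOfSample <- substr(allSamples, 1, 12)

# Characters 14-15 hold the sample type: "01" = primary solid tumor,
# "11" = solid tissue normal.
isPrimaryTumor <- substr(allSamples, 14, 15) == "01"

# Look up each sample's subtype; samples with no call get NA.
sampleSubtype <- subtypes$PAM50[match(patientOfSample, subtypes$patient)]
names(sampleSubtype) <- allSamples

lumASamples <- allSamples[isPrimaryTumor &
                          !is.na(sampleSubtype) &
                          sampleSubtype == "Luminal A"]
print(lumASamples)
print(table(sampleSubtype, useNA = "ifany"))
```

Exact matching on the patient prefix avoids accidental partial matches and makes samples without a subtype call visible as NA instead of silently dropping them.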
kanwarjag (United States) wrote, 3.8 years ago:

Does that mean that PAM50 is not a stable test on RNA-seq data?

 


David Quigley replied, 3.8 years ago:

@kanwarjag: I assume your answer is meant as a comment on my answer above; if so, please post it as a comment next time rather than as an answer to the question. What I mean by "somewhat different results for edge case tumors" is that, in practice, when you use genefu to assign PAM50 classes, the assignments depend on the centroid values used for the individual subtypes. For most tumors the assignment is robust to small variations in these values, but in my experience about 10% of tumors are sensitive to small variations in these parameters, and it's hard to get agreement on them. This is not in any way related to RNAseq or the choice of technology; it's a function of how the test is performed.
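To illustrate the point about centroid sensitivity, here is a minimal sketch of nearest-centroid assignment in base R on simulated data. Everything in it is invented for illustration: the centroids are random numbers (not the PAM50 centroids), the gene count and noise levels are arbitrary, and `assignSubtype` and `makeSample` are hypothetical helpers, not genefu functions. The sketch scores each sample against each centroid by Spearman correlation, which is the scoring the published PAM50 method uses, and then repeats the calls with slightly perturbed centroids.

```r
set.seed(1)

nGenes <- 50
subtypeNames <- c("LumA", "LumB", "Her2", "Basal", "Normal")

# Invented centroids, one column per subtype (NOT the real PAM50 centroids).
centroids <- matrix(rnorm(nGenes * length(subtypeNames)), nrow = nGenes,
                    dimnames = list(paste0("g", seq_len(nGenes)), subtypeNames))

# Assign each sample (column of `expr`) to the centroid with the highest
# Spearman correlation.
assignSubtype <- function(expr, centroids) {
  rho <- cor(expr, centroids, method = "spearman")  # samples x subtypes
  colnames(centroids)[max.col(rho, ties.method = "first")]
}

# Simulate one sample as a weighted mix of centroids plus Gaussian noise.
makeSample <- function(weights, noiseSd) {
  as.vector(centroids %*% weights) + rnorm(nGenes, sd = noiseSd)
}

# 80 clear cases, each built around a single centroid.
clear <- sapply(1:80, function(i) {
  w <- numeric(length(subtypeNames))
  w[(i - 1) %% length(subtypeNames) + 1] <- 1
  makeSample(w, 0.3)
})

# 20 edge cases lying halfway between the LumA and LumB centroids.
edge <- sapply(1:20, function(i) makeSample(c(0.5, 0.5, 0, 0, 0), 0.3))

expr <- cbind(clear, edge)
rownames(expr) <- rownames(centroids)

callsA <- assignSubtype(expr, centroids)

# Slightly perturbed centroids, standing in for a different reference set.
centroidsB <- centroids + rnorm(length(centroids), sd = 0.1)
callsB <- assignSubtype(expr, centroidsB)

agree <- callsA == callsB
cat("Agreement, clear cases:", mean(agree[1:80]), "\n")
cat("Agreement, edge cases: ", mean(agree[81:100]), "\n")
```

In this toy setup the clear cases sit far from every decision boundary, so both centroid sets should call them the same way, while the in-between samples may switch labels. The exact fraction depends on the simulated noise and says nothing about the proportion in real TCGA tumors.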