Question: Differential expression analysis error - TCGAbiolinks
gravatar for Misty
10 months ago by
Misty0 wrote:

Hi Everyone,

I am trying to analyze TCGA-BRCA (harmonized) data using TCGAbiolinks and I am stuck at the DEA stage where I get the following error:

Error in dataFilt.brca[, dataFiltNT] : subscript out of bounds

I can't seem to pinpoint where and what is giving me this error. I have attached my code which I put together trying to teach myself using tutorials. What I want to do is: compare expression of my gene of interest (SOX10) between different PAM50 subtypes. The thing is, when I do DE on normal versus tumor, it shows that SOX10 is downregulated. However, when I try to visualize SOX10 expression on UCSC (even cBioportal) I see that it is upregulated in basal tumors compared to normal (or normal-like, I'm so confused?!!). I specifically want to show that even though SOX10 is downregulated in pooled subtypes when compared to normal, it is upregulated if only basal subtype is considered. Also, I tried doing multiple constrasts using the contrast.matrix function but it gave me a whole lot of errors so I gave up.

Is there a simpler way to do this? I really need to show stats on a nice boxplot of Log2FC versus PAM50 subtype! I also want to further stratify each Pam50 into Her2+ and Her2-

Please help, and thank you!!



Cancer <- "TCGA-BRCA"
query_BRCA <- GDCquery(project = Cancer,
                       data.category = "Transcriptome Profiling",
                       data.type = "Gene Expression Quantification", 
                       workflow.type = "HTSeq - Counts")

samples_BRCA <- getResults(query_BRCA,cols=c("cases"))

queryDown_brca <- GDCquery(project = Cancer, 
                            data.category = "Transcriptome Profiling",
                            data.type = "Gene Expression Quantification", 
                            workflow.type = "HTSeq - Counts", 
                            barcode = samples_BRCA)

GDCdownload(query = queryDown_brca, method = "api", files.per.chunk = 50)

dataPrep1.brca <- GDCprepare(query = queryDown_brca, 
                             save = TRUE, 
                             save.filename = "TCGA_BRCA_HTSeq_Counts.rda")

dataPrep.brca <- TCGAanalyze_Preprocessing(object = dataPrep1.brca, 
                                           cor.cut = 0.6,
                                           datatype = "HTSeq - Counts")

dataNorm.brca <- TCGAanalyze_Normalization(tabDF = dataPrep.brca,
                                           geneInfo = geneInfoHT,
                                           method = "gcContent")

dataNorm.brca <- TCGAanalyze_Normalization(tabDF = dataNorm.brca,
                                           geneInfo = geneInfoHT,
                                           method = "geneLength")

dataFilt.brca <- TCGAanalyze_Filtering(tabDF = dataNorm.brca,
                                       method = "quantile", 
                                       qnt.cut =  0.25)


### Prepping for DEA ###

sampleTP <- TCGAquery_SampleTypes(barcode = colnames(dataFilt.brca), typesample = "TP")
sampleNT <- TCGAquery_SampleTypes(barcode = colnames(dataFilt.brca), typesample = "NT")
dataFiltTP <- dataFilt.brca[,sampleTP]
dataFiltNT <- dataFilt.brca[,sampleNT]

dataSubt_BRCA <-TCGAquery_subtype(tumor = "BRCA")
samplePam50_BRCA.LumA <- dataSubt_BRCA[dataSubt_BRCA$BRCA_Subtype_PAM50 %in% "LumA",]


dataFiltTP.LumA <- dataFiltTP[,substr(colnames(dataFiltTP),1,12) %in% samplePam50_BRCA.LumA$patient]


DEG_LumA_NT <- TCGAanalyze_DEA(mat1 = dataFilt.brca[,dataFiltNT],
                            mat2 = dataFilt.brca[,dataFiltTP.LumA],
                            Cond1type = "Normal",
                            Cond2type = "Tumor",
                            fdr.cut = 0.05 ,
                            logFC.cut = 1,
                            method = "glmLRT", ClinicalDF = data.frame())

Error in dataFilt.brca[, dataFiltNT] : subscript out of bounds

rna-seq brca tcga tcgabiolinks • 360 views
ADD COMMENTlink modified 10 months ago by Biostar ♦♦ 20 • written 10 months ago by Misty0

There are issues, generally, with the functionality of TCGAanalyze_DEA(). It was first elaborated here: Matched paired DEA analysis of BRCA samples using TCGAbiolinks, works for paired = FALSE but not paired = TRUE

I contacted the main developer, who then forwarded it to another developer responsible for that part of the package, and nobody ever got back to me. I then raised a ticket (issue) on GitHub about it but none of the developers responded.

If I may suggest that you obtain the data directly from the GDC and process it from there.

ADD REPLYlink written 10 months ago by Kevin Blighe56k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1350 users visited in the last hour