Question: Differential expression analysis error - TCGAbiolinks
gravatar for Misty
16 months ago by
Misty0 wrote:

Hi Everyone,

I am trying to analyze TCGA-BRCA (harmonized) data using TCGAbiolinks and I am stuck at the DEA stage where I get the following error:

Error in dataFilt.brca[, dataFiltNT] : subscript out of bounds

I can't seem to pinpoint where and what is giving me this error. I have attached my code which I put together trying to teach myself using tutorials. What I want to do is: compare expression of my gene of interest (SOX10) between different PAM50 subtypes. The thing is, when I do DE on normal versus tumor, it shows that SOX10 is downregulated. However, when I try to visualize SOX10 expression on UCSC (even cBioportal) I see that it is upregulated in basal tumors compared to normal (or normal-like, I'm so confused?!!). I specifically want to show that even though SOX10 is downregulated in pooled subtypes when compared to normal, it is upregulated if only basal subtype is considered. Also, I tried doing multiple constrasts using the contrast.matrix function but it gave me a whole lot of errors so I gave up.

Is there a simpler way to do this? I really need to show stats on a nice boxplot of Log2FC versus PAM50 subtype! I also want to further stratify each Pam50 into Her2+ and Her2-

Please help, and thank you!!



Cancer <- "TCGA-BRCA"
query_BRCA <- GDCquery(project = Cancer,
                       data.category = "Transcriptome Profiling",
                       data.type = "Gene Expression Quantification", 
                       workflow.type = "HTSeq - Counts")

samples_BRCA <- getResults(query_BRCA,cols=c("cases"))

queryDown_brca <- GDCquery(project = Cancer, 
                            data.category = "Transcriptome Profiling",
                            data.type = "Gene Expression Quantification", 
                            workflow.type = "HTSeq - Counts", 
                            barcode = samples_BRCA)

GDCdownload(query = queryDown_brca, method = "api", files.per.chunk = 50)

dataPrep1.brca <- GDCprepare(query = queryDown_brca, 
                             save = TRUE, 
                             save.filename = "TCGA_BRCA_HTSeq_Counts.rda")

dataPrep.brca <- TCGAanalyze_Preprocessing(object = dataPrep1.brca, 
                                           cor.cut = 0.6,
                                           datatype = "HTSeq - Counts")

dataNorm.brca <- TCGAanalyze_Normalization(tabDF = dataPrep.brca,
                                           geneInfo = geneInfoHT,
                                           method = "gcContent")

dataNorm.brca <- TCGAanalyze_Normalization(tabDF = dataNorm.brca,
                                           geneInfo = geneInfoHT,
                                           method = "geneLength")

dataFilt.brca <- TCGAanalyze_Filtering(tabDF = dataNorm.brca,
                                       method = "quantile", 
                                       qnt.cut =  0.25)


### Prepping for DEA ###

sampleTP <- TCGAquery_SampleTypes(barcode = colnames(dataFilt.brca), typesample = "TP")
sampleNT <- TCGAquery_SampleTypes(barcode = colnames(dataFilt.brca), typesample = "NT")
dataFiltTP <- dataFilt.brca[,sampleTP]
dataFiltNT <- dataFilt.brca[,sampleNT]

dataSubt_BRCA <-TCGAquery_subtype(tumor = "BRCA")
samplePam50_BRCA.LumA <- dataSubt_BRCA[dataSubt_BRCA$BRCA_Subtype_PAM50 %in% "LumA",]


dataFiltTP.LumA <- dataFiltTP[,substr(colnames(dataFiltTP),1,12) %in% samplePam50_BRCA.LumA$patient]


DEG_LumA_NT <- TCGAanalyze_DEA(mat1 = dataFilt.brca[,dataFiltNT],
                            mat2 = dataFilt.brca[,dataFiltTP.LumA],
                            Cond1type = "Normal",
                            Cond2type = "Tumor",
                            fdr.cut = 0.05 ,
                            logFC.cut = 1,
                            method = "glmLRT", ClinicalDF = data.frame())

Error in dataFilt.brca[, dataFiltNT] : subscript out of bounds

rna-seq brca tcga tcgabiolinks • 656 views
ADD COMMENTlink modified 15 months ago by Biostar ♦♦ 20 • written 16 months ago by Misty0

There are issues, generally, with the functionality of TCGAanalyze_DEA(). It was first elaborated here: Matched paired DEA analysis of BRCA samples using TCGAbiolinks, works for paired = FALSE but not paired = TRUE

I contacted the main developer, who then forwarded it to another developer responsible for that part of the package, and nobody ever got back to me. I then raised a ticket (issue) on GitHub about it but none of the developers responded.

If I may suggest that you obtain the data directly from the GDC and process it from there.

ADD REPLYlink written 15 months ago by Kevin Blighe65k

Hi Kevin, is there any update on this issue? i am also facing the same issue [Error in dataFilt[, samples] : subscript out of bounds)]. Thanks in advance.

ADD REPLYlink written 7 weeks ago by immunogirl220

Nope, there was never a response:

As I'm now a developer myself, I know that it can genuinely be difficult to find time to follow up to these issues. However, in the case of the 'paired' analysis issue, TCGAbiolinks is actually providing misleading results.

If you can, I would get the data from Xena Browser and then re-process it. If you're just interested in a single gene, you could also use cBioPortal (GUI)

ADD REPLYlink written 6 weeks ago by Kevin Blighe65k

Did you try the code in my other linked thread?

ADD REPLYlink written 6 weeks ago by Kevin Blighe65k

No, I did not try, yet. But, I'll try to give it a go. I was working on manually curated datasets (patient IDs divided based on clinical features). I tried codes on several different sets, sometime it works, sometimes it doesn't.

ADD REPLYlink written 6 weeks ago by immunogirl220
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1117 users visited in the last hour