Question: Differential expression analysis error - TCGAbiolinks
0
gravatar for Misty
8 weeks ago by
Misty0
Canada
Misty0 wrote:

Hi Everyone,

I am trying to analyze TCGA-BRCA (harmonized) data using TCGAbiolinks and I am stuck at the DEA stage where I get the following error:

Error in dataFilt.brca[, dataFiltNT] : subscript out of bounds

I can't seem to pinpoint where and what is giving me this error. I have attached my code which I put together trying to teach myself using tutorials. What I want to do is: compare expression of my gene of interest (SOX10) between different PAM50 subtypes. The thing is, when I do DE on normal versus tumor, it shows that SOX10 is downregulated. However, when I try to visualize SOX10 expression on UCSC (even cBioportal) I see that it is upregulated in basal tumors compared to normal (or normal-like, I'm so confused?!!). I specifically want to show that even though SOX10 is downregulated in pooled subtypes when compared to normal, it is upregulated if only basal subtype is considered. Also, I tried doing multiple constrasts using the contrast.matrix function but it gave me a whole lot of errors so I gave up.

Is there a simpler way to do this? I really need to show stats on a nice boxplot of Log2FC versus PAM50 subtype! I also want to further stratify each Pam50 into Her2+ and Her2-

Please help, and thank you!!

TCGA BRCA - SOX10

library(TCGAbiolinks)
library(ggplot2)
library(dplyr)
library(DESeq2)

Cancer <- "TCGA-BRCA"
query_BRCA <- GDCquery(project = Cancer,
                       data.category = "Transcriptome Profiling",
                       data.type = "Gene Expression Quantification", 
                       workflow.type = "HTSeq - Counts")

samples_BRCA <- getResults(query_BRCA,cols=c("cases"))

queryDown_brca <- GDCquery(project = Cancer, 
                            data.category = "Transcriptome Profiling",
                            data.type = "Gene Expression Quantification", 
                            workflow.type = "HTSeq - Counts", 
                            barcode = samples_BRCA)

GDCdownload(query = queryDown_brca, method = "api", files.per.chunk = 50)

dataPrep1.brca <- GDCprepare(query = queryDown_brca, 
                             save = TRUE, 
                             save.filename = "TCGA_BRCA_HTSeq_Counts.rda")

dataPrep.brca <- TCGAanalyze_Preprocessing(object = dataPrep1.brca, 
                                           cor.cut = 0.6,
                                           datatype = "HTSeq - Counts")

dataNorm.brca <- TCGAanalyze_Normalization(tabDF = dataPrep.brca,
                                           geneInfo = geneInfoHT,
                                           method = "gcContent")

dataNorm.brca <- TCGAanalyze_Normalization(tabDF = dataNorm.brca,
                                           geneInfo = geneInfoHT,
                                           method = "geneLength")

dataFilt.brca <- TCGAanalyze_Filtering(tabDF = dataNorm.brca,
                                       method = "quantile", 
                                       qnt.cut =  0.25)

dim(dataFilt.brca)

### Prepping for DEA ###

sampleTP <- TCGAquery_SampleTypes(barcode = colnames(dataFilt.brca), typesample = "TP")
sampleNT <- TCGAquery_SampleTypes(barcode = colnames(dataFilt.brca), typesample = "NT")
dataFiltTP <- dataFilt.brca[,sampleTP]
dataFiltNT <- dataFilt.brca[,sampleNT]

dataSubt_BRCA <-TCGAquery_subtype(tumor = "BRCA")
dataSubt_BRCA$BRCA_Subtype_PAM50
samplePam50_BRCA.LumA <- dataSubt_BRCA[dataSubt_BRCA$BRCA_Subtype_PAM50 %in% "LumA",]

samplePam50_BRCA.LumA$BRCA_Subtype_PAM50

dataFiltTP.LumA <- dataFiltTP[,substr(colnames(dataFiltTP),1,12) %in% samplePam50_BRCA.LumA$patient]

dim(dataFiltNT)
dim(dataFiltTP.LumA)

DEG_LumA_NT <- TCGAanalyze_DEA(mat1 = dataFilt.brca[,dataFiltNT],
                            mat2 = dataFilt.brca[,dataFiltTP.LumA],
                            pipeline="limma",
                            Cond1type = "Normal",
                            Cond2type = "Tumor",
                            fdr.cut = 0.05 ,
                            logFC.cut = 1,
                            voom=TRUE,
                            method = "glmLRT", ClinicalDF = data.frame())

Error in dataFilt.brca[, dataFiltNT] : subscript out of bounds

rna-seq brca tcga tcgabiolinks • 134 views
ADD COMMENTlink modified 6 weeks ago by Biostar ♦♦ 20 • written 8 weeks ago by Misty0

There are issues, generally, with the functionality of TCGAanalyze_DEA(). It was first elaborated here: Matched paired DEA analysis of BRCA samples using TCGAbiolinks, works for paired = FALSE but not paired = TRUE

I contacted the main developer, who then forwarded it to another developer responsible for that part of the package, and nobody ever got back to me. I then raised a ticket (issue) on GitHub about it but none of the developers responded.

If I may suggest that you obtain the data directly from the GDC and process it from there.

ADD REPLYlink written 6 weeks ago by Kevin Blighe45k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1261 users visited in the last hour