Comparison of TCGA and SRA data
0
1
Entering edit mode
7.5 years ago

Hello all, I am currently working on RNA seq analysis of TCGA and SRA cancer data.I have processed the SRA data similar to TCGA by following their pipeline.The problem appears in differential expression analysis.I have used edgeR, voom and DESeq to carry out the DEG analysis using both expected and normalized count.I am not confident about the results since the pooled normal samples that come from both SRA and TCGA cluster differently which leads me to believe that there might be a bias.I have 2 questions to ask regarding this: 1.Is the broad firehose data of TCGA genes.normalized.results samples already log 2 normalized after upper quartile normalization or should we normalize them before finding the differentially expressed genes? 2.Is there any other method to identify the DEGs which takes into account the database differences?

Regards, Swati krishna

RNA-Seq TCGA SRA Diffrential Expression • 2.5k views
ADD COMMENT
0
Entering edit mode

Did you try doing batch normalization (using Combat)? TCGA and SRA might be sequenced using different library kits (poly A vs rib zero or vice versa )which also have to be corrected.

ADD REPLY
0
Entering edit mode

Hello Ron, Sorry for the late reply.yes, i tried Combat,but its giving negative values in the result matrix.How can i give it as an input to edgeR?

ADD REPLY
0
Entering edit mode

I used the below code to use Combat to remove batch effects from normal and cancer expected counts seperately. I replaced the negative values with 0 and continued with edgeR. Is this a right way to proceed?

sample=c("N1","N2","N3","N4","N5","N6", "N7","N8","N9","N10", "N11","N12","N13","N14","N15","N16", "N17","N18","N19","N20", "N21","N22","N23","N24","N25","N26", "N27","N28","N29","N30")

db=c(rep("TCGA",12),rep("SRA",18))

type=c(4,1,4,1,5,4,4,1,5,1,1,4,2,3,3,3,2,3,3,2,2,2,2,2,2,3,2,2,3,3)

batch=data.frame(sample,db,type)

my_DGEList <- DGEList(counts=normal_ec)

normal_norm <- calcNormFactors(my_DGEList, method="upperquartile")

my_data = cpm(normal_norm, log=TRUE, prior.count=2)

combat <- ComBat(dat=my_data, batch=my_batch,par.prior = FALSE, prior.plots = TRUE) # batch file included 5 batches categorized according to the cluster dendrogram plotted for the normal data.

ADD REPLY

Login before adding your answer.

Traffic: 2141 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6