Question: Comparison of TCGA and SRA data
1
gravatar for swatikrishna113
2.9 years ago by
swatikrishna11310 wrote:

Hello all, I am currently working on RNA seq analysis of TCGA and SRA cancer data.I have processed the SRA data similar to TCGA by following their pipeline.The problem appears in differential expression analysis.I have used edgeR, voom and DESeq to carry out the DEG analysis using both expected and normalized count.I am not confident about the results since the pooled normal samples that come from both SRA and TCGA cluster differently which leads me to believe that there might be a bias.I have 2 questions to ask regarding this: 1.Is the broad firehose data of TCGA genes.normalized.results samples already log 2 normalized after upper quartile normalization or should we normalize them before finding the differentially expressed genes? 2.Is there any other method to identify the DEGs which takes into account the database differences?

Regards, Swati krishna

ADD COMMENTlink modified 2.7 years ago by Biostar ♦♦ 20 • written 2.9 years ago by swatikrishna11310

Did you try doing batch normalization (using Combat)? TCGA and SRA might be sequenced using different library kits (poly A vs rib zero or vice versa )which also have to be corrected.

ADD REPLYlink written 2.9 years ago by Ron970

Hello Ron, Sorry for the late reply.yes, i tried Combat,but its giving negative values in the result matrix.How can i give it as an input to edgeR?

ADD REPLYlink written 2.8 years ago by swatikrishna11310

Check out these links: edgeR: Correct pipeline for DE analysis with multiple conditions and batches

http://www.bioconductor.org/help/workflows/rnaseqGene/#batch

Remove Batch Effect From RNAseq with SVAseq and Combat

ADD REPLYlink modified 2.8 years ago • written 2.8 years ago by Ron970

I used the below code to use Combat to remove batch effects from normal and cancer expected counts seperately. I replaced the negative values with 0 and continued with edgeR. Is this a right way to proceed?

sample=c("N1","N2","N3","N4","N5","N6", "N7","N8","N9","N10", "N11","N12","N13","N14","N15","N16", "N17","N18","N19","N20", "N21","N22","N23","N24","N25","N26", "N27","N28","N29","N30")

db=c(rep("TCGA",12),rep("SRA",18))

type=c(4,1,4,1,5,4,4,1,5,1,1,4,2,3,3,3,2,3,3,2,2,2,2,2,2,3,2,2,3,3)

batch=data.frame(sample,db,type)

my_DGEList <- DGEList(counts=normal_ec)

normal_norm <- calcNormFactors(my_DGEList, method="upperquartile")

my_data = cpm(normal_norm, log=TRUE, prior.count=2)

combat <- ComBat(dat=my_data, batch=my_batch,par.prior = FALSE, prior.plots = TRUE) # batch file included 5 batches categorized according to the cluster dendrogram plotted for the normal data.

ADD REPLYlink modified 2.8 years ago • written 2.8 years ago by swatikrishna11310
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2295 users visited in the last hour