Question: TCGA data analysis: DEG analysis
4 weeks ago by
aswanikrishnap1994 wrote:

I am currently working with a TCGA dataset from the UCSC Xena browser. I have completed the differential gene expression analysis using DESeq2 from GenePattern for one cancer dataset. I have some doubts regarding the results and want to know how to proceed with the analysis across different cancer samples. I am very new to TCGA and currently doing the analysis based on

My questions about the analysis are as follows:

1. The HTSeq count file downloaded from Xena has transcript IDs, but I want gene IDs for my analysis. How should I do this?
2. To generate a heatmap of the DEGs from different cancer datasets, should I use the log2 expression values from DESeq2?


There is no need to SHOUT. I have removed the excessive uppercase letters from your title.

written 4 weeks ago by WouterDeCoster
4 weeks ago by
ATpoint wrote:

Regarding 1): Check exactly how this file was created. If it is indeed at the transcript level, aggregate it to the gene level with tximport, whose output integrates seamlessly into DESeq2. The respective manuals include the code.
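
tximport is normally pointed at per-sample quantification files, so the exact call depends on how the data were produced. If all you have is a count matrix with transcript IDs as row names, the underlying aggregation idea can be sketched in base R by summing transcript counts per gene. This is a swapped-in illustration, not the tximport workflow itself; the IDs, counts, and the `tx2gene` table below are made up, and a real mapping would have to come from the annotation used to generate the counts.

```r
# Toy transcript-level count matrix (rows = transcripts, columns = samples).
# All IDs and values are invented for illustration.
tx.counts <- matrix(
  c(10, 20,
     5, 15,
     0,  3,
     7,  1),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("tx1", "tx2", "tx3", "tx4"), c("sampleA", "sampleB"))
)

# Hypothetical transcript-to-gene table; in practice derive it from the
# annotation (GTF) that was used to produce the counts.
tx2gene <- data.frame(
  tx   = c("tx1", "tx2", "tx3", "tx4"),
  gene = c("geneA", "geneA", "geneB", "geneB"),
  stringsAsFactors = FALSE
)

# Look up the gene for every row, then sum the counts of all
# transcripts that belong to the same gene.
gene.of.row <- tx2gene$gene[match(rownames(tx.counts), tx2gene$tx)]
gene.counts <- rowsum(tx.counts, group = gene.of.row)

gene.counts
```

The resulting gene-by-sample matrix of integer counts is the shape DESeq2 expects as its count input. Transcripts missing from the mapping would get an `NA` gene here, so check for those before summing real data.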

2) I would use Z-transformed log2 expression values for clustering. These can be the log2-transformed values from DESeq2 itself, or you can run vst or rlog on the raw gene-level data again; the output of the latter two is already on the log2 scale. Given a matrix or data frame of these values (here called fc.matrix), t(scale(t(fc.matrix))) returns the Z-scores per gene. This focuses the clustering on the relative differences between the samples for each gene and keeps genes with extreme fold changes from dominating the result, because each Z-score is a relative measure of how far a sample diverges from that gene's mean across all samples. See e.g. the Wikipedia article on Z-scores (standardization).
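
To make the `t(scale(t(...)))` idiom concrete, here is a minimal sketch on a toy matrix. The gene names, sample names, and values are invented; in a real analysis the input would be the log2-scale matrix described above.

```r
# Toy log2 expression matrix (rows = genes, columns = samples); values are made up.
log2.expr <- matrix(
  c( 1,  2,  3,
    10, 10, 13),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"), c("s1", "s2", "s3"))
)

# scale() standardizes columns, so transpose to put genes in columns,
# scale, and transpose back. Each gene (row) then has mean 0 and sd 1.
z <- t(scale(t(log2.expr)))

# Sanity checks: per-gene mean and standard deviation after scaling.
round(rowMeans(z), 10)
apply(z, 1, sd)
```

Although geneB is expressed at a much higher level than geneA, both rows end up on the same scale, which is what lets the clustering compare patterns rather than magnitudes. Genes with identical values in all samples have a standard deviation of zero and come out as `NaN`, so filter those out before plotting.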


Thank you so much for the reply. Then I should use the log2 expression values for comparison across the cancer datasets.

written 26 days ago by aswanikrishnap1994