Question

How to handle data_RNA_Seq_v2_expression_median from TCGA

6.7 years ago

zhangyunjie1992 ▴ 30

Hi, I recently downloaded data_RNA_Seq_v2_expression_median.txt from cBioPortal. The values are not integer read counts, and I don't know how to process this kind of normalized data with DESeq2.

Should I download the lower-level (raw count) data from the portal and run DESeq2 from the beginning, or is there another package that can find differentially expressed genes from this file?

Hugo_Symbol  TCGA-BJ-A0YZ-01  TCGA-BJ-A0Z0-01  TCGA-BJ-A0Z2-01
UBE2Q2P2     1.8867           2.6927           10.0867
HMGB1P1      139.6335         181.2141         203.7297
LOC155060    45.3978          131.8725         248.4856
RNU12-2P     0.4165           0.3948           0.9502
SSX9         0                0                0
CXORF67      0                1.1845           2.3756

I hope someone with experience handling pre-processed TCGA data can answer this question.

Many thanks!

Michael

rna-seq TCGA

Updated 5.9 years ago by Kevin Blighe • written 6.7 years ago by zhangyunjie1992

score 1 · Answer 1 · 2018-06-13

Edit: 14th May, 2020:

It is better to obtain the HTSeq raw counts from the UCSC Xena Browser and process those in DESeq2, following this guidance: A: Normalisation of RNAseq data from UCSC Xena Browser

Original answer:

------------------------------

For DESeq2, you should obtain the raw counts. The data in this file from cBioPortal are already normalised, which is why the values are not integers.
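A quick way to confirm whether a matrix could be raw counts is to check that every value is a whole number. Below is a minimal sketch in Python using a few rows from the question. The function name `is_raw_count_matrix` is made up for illustration, and the sketch assumes whitespace-separated columns with the gene symbol first and no missing values such as NA.

```python
def is_raw_count_matrix(text):
    """Return True only if every expression value is a non-negative whole number."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    for line in lines[1:]:  # skip the header row of sample IDs
        fields = line.split()
        values = [float(v) for v in fields[1:]]  # first field is the gene symbol
        if any(v < 0 or v != int(v) for v in values):
            return False
    return True


# Rows copied from the question (assumed layout: gene symbol, then one column per sample).
sample = """Hugo_Symbol TCGA-BJ-A0YZ-01 TCGA-BJ-A0Z0-01 TCGA-BJ-A0Z2-01
UBE2Q2P2 1.8867 2.6927 10.0867
HMGB1P1 139.6335 181.2141 203.7297
SSX9 0 0 0
"""

print(is_raw_count_matrix(sample))  # False: fractional values, so not raw counts
```

Passing this check does not prove a matrix holds raw counts (rounded normalised values would also pass), but failing it is a clear sign the data should not go into DESeq2 as is.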

However, if you obtain the Z-scores from cBioPortal instead, then, according to cBioPortal, you can infer that a gene is more highly expressed in tumour if its Z-score is >= 2.
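As a rough illustration of applying that threshold, here is a small Python sketch. The function name `genes_high_in_tumour`, the nested-dictionary layout, and the gene and sample names are all hypothetical; the Z-score values are invented for the example and are not real TCGA data.

```python
def genes_high_in_tumour(z_scores, threshold=2.0):
    """Map each gene to the samples whose Z-score meets or exceeds the threshold."""
    hits = {}
    for gene, per_sample in z_scores.items():
        samples = [s for s, z in per_sample.items() if z >= threshold]
        if samples:
            hits[gene] = samples
    return hits


# Hypothetical Z-scores for illustration only; not real TCGA values.
example = {
    "GENE_A": {"SAMPLE_1": 2.4, "SAMPLE_2": 0.3},
    "GENE_B": {"SAMPLE_1": -1.1, "SAMPLE_2": 1.9},
}

print(genes_high_in_tumour(example))  # {'GENE_A': ['SAMPLE_1']}
```

This is only a per-sample filter; it is not a substitute for a proper differential expression test on raw counts.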

If you want the raw counts, you can obtain those from the GDC Data Portal.

Kevin