Question

Advice needed on survival analysis using TCGA

17 months ago

CaroH ▴ 10

Hello everyone,

I would like to analyze the following TCGA dataset: TCGA-LUAD. I would like to see if several genes of interest are correlated with a better chance of survival in patients with LUAD.

I managed to retrieve the data I need using TCGAbiolinks. However, after running GDCdownload and GDCprepare, I am a bit lost on how to proceed.

I looked at this interesting post: "Survival analysis of TCGA patients integrating gene expression (RNASeq) data". Unlike its author, though, I had to retrieve the dataset using STAR - Counts, and I only have one condition: samples from the Primary Tumor.

Do I need to normalize my data? If so, how should I normalize it?

Thanks for your advice!

Transcriptomics Gene-expression TCGA • 731 views

updated 17 months ago by Hamid Ghaedi 3.2k • written 17 months ago by CaroH ▴ 10

score 0 · Answer 1 · 2022-12-09

Do I need to normalize my data?

Yes, you do.

If so, how should I normalize it?

There are different approaches you can follow, e.g. the voom function from the limma package to normalize the data. Check out this repo; it may help you with your analysis.

With regard to the type of data, you can start with the raw counts coming from STAR, then proceed with normalization as shown in the above-mentioned GitHub repository.
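To make the normalization step more concrete, here is a minimal Python sketch of the log2 counts-per-million transform that voom applies to raw counts, log2((count + 0.5) / (library size + 1) * 1e6). This is only an illustration: the function name `log2_cpm` and the toy data are made up, and the sketch leaves out the TMM scaling factors and the precision weights that a real limma/edgeR workflow would add.

```python
import math


def log2_cpm(counts):
    """Convert raw counts to log2 counts-per-million.

    counts: dict mapping sample ID -> dict mapping gene -> raw count.
    Returns a dict of the same shape holding log2-CPM values.
    Hypothetical helper; it uses the same offsets as voom (0.5 and 1)
    but no normalization factors and no precision weights.
    """
    result = {}
    for sample, gene_counts in counts.items():
        # Library size = total number of counts in this sample.
        lib_size = sum(gene_counts.values())
        result[sample] = {
            gene: math.log2((count + 0.5) / (lib_size + 1.0) * 1e6)
            for gene, count in gene_counts.items()
        }
    return result


# Toy data (made up): sample_B was sequenced ten times deeper than sample_A.
toy_counts = {
    "sample_A": {"GENE1": 10, "GENE2": 990},
    "sample_B": {"GENE1": 100, "GENE2": 9900},
}
normalized = log2_cpm(toy_counts)
```

In the toy data the raw counts for GENE1 differ tenfold between the two samples only because of sequencing depth; after the transform the two values are nearly identical, which is the point of normalizing before you compare expression across patients.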

In terms of conditions, my understanding is that you want to do a survival analysis between groups of samples showing "high" and "low" levels of expression for a gene of interest, so having only Primary Tumor samples is not a problem. The high/low label can be a column in your data frame and would be the grouping variable in your analysis.
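As a rough illustration of that grouping idea, the Python sketch below splits patients at the median expression of one gene and computes a Kaplan-Meier survival estimate per group. Everything in it is an assumption made for the example: the patient IDs, expression values, follow-up times, the median cutoff, and the helper names `split_by_median` and `kaplan_meier`. In practice you would use an established survival package and also test the difference between the groups (e.g. with a log-rank test).

```python
from statistics import median


def split_by_median(expression):
    """Label each patient 'high' or 'low' relative to the median expression.

    expression: dict mapping patient ID -> normalized expression value.
    Values exactly equal to the median are labeled 'low'.
    """
    cutoff = median(expression.values())
    return {p: ("high" if v > cutoff else "low") for p, v in expression.items()}


def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function.

    times: follow-up time for each patient.
    events: 1 if death was observed at that time, 0 if censored.
    Returns a list of (time, survival probability) at each event time.
    """
    survival = 1.0
    curve = []
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        # Patients still under observation just before time t.
        at_risk = sum(1 for tm in times if tm >= t)
        deaths = sum(1 for tm, e in zip(times, events) if tm == t and e == 1)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Toy data (made up): normalized expression of one gene of interest,
# plus (follow-up in months, event flag) for each patient.
expression = {"P1": 2.1, "P2": 8.4, "P3": 3.0, "P4": 9.7, "P5": 1.5, "P6": 7.9}
follow_up = {
    "P1": (30, 0), "P2": (5, 1), "P3": (24, 1),
    "P4": (8, 1), "P5": (36, 0), "P6": (12, 0),
}

groups = split_by_median(expression)  # this is the grouping column
curves = {}
for group in ("high", "low"):
    members = [p for p, g in groups.items() if g == group]
    times = [follow_up[p][0] for p in members]
    events = [follow_up[p][1] for p in members]
    curves[group] = kaplan_meier(times, events)
```

The median is just one possible cutoff; quartiles or a data-driven threshold are also used, and the choice should be made before looking at the survival curves.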