4.8 years ago
lovely.molbio
Hi, I am trying to analyze TCGA breast cancer patient data. Based on the expression of a gene (high vs. low) and survival, I have divided all breast cancer patients into two groups and sorted the patient/sample IDs accordingly. Now I want to compare genome-wide expression between the two groups (high-expression vs. low-expression samples) and find the differentially expressed genes (DEGs). Do I need to download the data from TCGA to perform this analysis? If yes, how can I do this? Thanks a lot in advance.
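For readers who have not done the grouping step yet, here is a minimal sketch of one common way to label samples "high" or "low" for a single gene: a median split. The question does not state which cutoff was actually used, so the median, the function name `split_by_expression`, and the sample IDs below are all hypothetical.

```python
from statistics import median


def split_by_expression(expr: dict[str, float]) -> dict[str, str]:
    """Label each sample 'high' or 'low' relative to the median expression.

    `expr` maps sample ID -> expression of the gene of interest.
    A median cutoff is an assumption; samples exactly at the median
    are assigned to 'low' here (an arbitrary choice).
    """
    cutoff = median(expr.values())
    return {sample: ("high" if value > cutoff else "low")
            for sample, value in expr.items()}


# Hypothetical sample IDs and values, for illustration only.
example = {"TCGA-AA-0001-01": 5.2, "TCGA-AA-0002-01": 1.1,
           "TCGA-AA-0003-01": 7.8, "TCGA-AA-0004-01": 0.4}
groups = split_by_expression(example)
print(groups)
```

Survival-based cutoffs (as used in the question) would replace the median with whatever threshold those tools chose; the labelling logic stays the same.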
...but you indicate that you have already obtained the data, or am I incorrect? What exactly do you currently have, and what more do you need?
Thank you for your response. No, I have not downloaded any data from TCGA yet. I used Broad Firehose and some other tools to classify the samples. Now I would like to perform a detailed differential gene expression analysis between the two groups. Normalized data are available on the Broad Firehose portal, but as far as I understand, these data are not suitable as input for R-based DEG analysis. So, how should I proceed from here?
I have not used the data from Broad Firehose; however, UCSC's Xena Browser has TCGA expression data available as HTSeq counts and FPKM values: https://xenabrowser.net/datapages/ (look for the 'GDC TCGA' datasets)
You will want the HTSeq counts; however, you will have to convert them back to raw integer counts, as I show here: A: Normalisation of RNAseq data from UCSC Xena Browser
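As a rough illustration of that conversion: assuming the Xena "HTSeq - Counts" values are stored as log2(count + 1) (check the dataset's unit on its Xena page before relying on this), the raw counts are recovered with 2^x - 1 followed by rounding. The function name `xena_log2_to_counts` is hypothetical, and reading the downloaded TSV into a genes x samples array is left out.

```python
import numpy as np


def xena_log2_to_counts(log2_values: np.ndarray) -> np.ndarray:
    """Undo an assumed log2(count + 1) transform and return integer counts.

    Values are rounded to the nearest integer because the stored values
    are floating point, while DESeq2/edgeR expect whole-number counts.
    """
    counts = np.power(2.0, log2_values) - 1.0
    # Guard against tiny negative values caused by floating-point round-off.
    counts = np.clip(counts, 0.0, None)
    return np.rint(counts).astype(np.int64)


# log2(0+1)=0, log2(1+1)=1, log2(7+1)=3, log2(1023+1)=10
example = np.array([[0.0, 1.0], [3.0, 10.0]])
print(xena_log2_to_counts(example).tolist())
# → [[0, 1], [7, 1023]]
```

Rounding rather than truncating matters here: a stored value can decode to something like 99.9999999 and should become 100, not 99.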
After that, the counts can be used as input to DESeq2, edgeR, or the limma/voom pipeline.
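Before handing the counts to any of these tools, the sample groups have to line up with the matrix columns: DESeq2 expects the rows of its sample table (colData) in the same order as the count columns, and edgeR's group factor likewise follows column order. A minimal sketch, with the hypothetical helper `align_groups` and made-up IDs; it also assumes the IDs from your grouping and the Xena column names use the same barcode format (Xena GDC columns may carry a vial letter such as `-01A`, so trimming may be needed first).

```python
import numpy as np


def align_groups(counts: np.ndarray, samples: list[str], groups: dict[str, str]):
    """Keep only grouped samples and return counts plus a matching condition list.

    `counts` is genes x samples, with columns in the order given by `samples`.
    `groups` maps sample ID -> 'high' / 'low'; samples missing from it are dropped.
    The returned condition list follows the order of the returned columns.
    """
    keep = [i for i, s in enumerate(samples) if s in groups]
    kept_samples = [samples[i] for i in keep]
    conditions = [groups[s] for s in kept_samples]
    return counts[:, keep], kept_samples, conditions


# Made-up 2-gene x 3-sample matrix; S2 has no group and is dropped.
counts = np.array([[10, 0, 5], [3, 8, 2]])
samples = ["S1", "S2", "S3"]
groups = {"S3": "high", "S1": "low"}
sub, kept, cond = align_groups(counts, samples, groups)
print(sub.tolist(), kept, cond)
# → [[10, 5], [3, 2]] ['S1', 'S3'] ['low', 'high']
```

The subset matrix and the `kept`/`cond` lists can then be written out and read into R as the count matrix and sample table.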
Awesome. I'll do that. Thanks a lot.