TCGA expression dataset handling
3.2 years ago
imaparna27 ▴ 20

Hi, I have downloaded open-access TCGA FPKM samples for expression analysis, along with the relevant manifest and metadata files, from the TCGA cart. I had to download the data in separate batches because of the different subtypes and data types (RNA-seq, miRNA-seq).

How do I build a matrix for DEG analysis, given that each sample is an individual txt file and I need to put, say, 100 samples and their sample conditions together in a CSV file?

I am using DESeq2 in R for the DEG analysis. How can I solve this? Also, is there an alternative to the TCGA FPKM values for expression analysis of TCGA data?
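One possible way to assemble such a matrix is sketched below in Python, using only the standard library. It assumes each per-sample file is a two-column, tab-separated table (gene ID, value) without a header, and that HTSeq-style summary rows start with `__`; other GDC file layouts would need a different parser. The gene IDs, sample names and conditions are made-up placeholders, and the helper names `read_sample`, `build_matrix` and `to_csv` are hypothetical.

```python
import csv
import io


def read_sample(handle):
    """Parse one two-column, tab-separated file into {gene_id: value}."""
    values = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and HTSeq-style summary rows such as "__no_feature".
        if len(row) < 2 or row[0].startswith("__"):
            continue
        values[row[0]] = row[1]
    return values


def build_matrix(samples):
    """Combine {sample_name: {gene_id: value}} into a header row plus one row per gene."""
    names = list(samples)
    first = samples[names[0]]
    # Keep genes present in every sample, in the order of the first file.
    genes = [g for g in first if all(g in samples[n] for n in names)]
    rows = [["gene_id"] + names]
    rows += [[g] + [samples[n][g] for n in names] for g in genes]
    return rows


def to_csv(rows):
    """Serialize rows to CSV text (write this string to a .csv file if needed)."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Made-up stand-ins for two downloaded files; real files would be opened with open(path).
demo_files = {
    "sample_A": "ENSG0001\t10\nENSG0002\t0\n__no_feature\t99\n",
    "sample_B": "ENSG0001\t7\nENSG0002\t3\n",
}
samples = {name: read_sample(io.StringIO(text)) for name, text in demo_files.items()}
matrix = build_matrix(samples)

# Sample sheet: one row per sample, in matrix column order; conditions are placeholders.
conditions = {"sample_A": "tumor", "sample_B": "normal"}
sample_sheet = [["sample", "condition"]] + [[n, conditions[n]] for n in matrix[0][1:]]

print(to_csv(matrix))
print(to_csv(sample_sheet))
```

The two CSV outputs could then be read into R with `read.csv` and, provided the values are raw integer counts, passed to DESeq2's `DESeqDataSetFromMatrix` as the count matrix and the column data.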

expression analysis TCGA samples FPKM data • 1.5k views
3.2 years ago
dsull ★ 5.8k

Generally, I find it easier to obtain TCGA data from https://xenabrowser.net/datapages/ -- you get gene expression data for all samples in a single file, which you can then read into R and subset to your samples of interest.

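From that site, you can also get counts that you can use for DEG analysis. You need counts, not FPKMs: DESeq2 models raw, un-normalized read counts, whereas FPKM values are already normalized for gene length and sequencing depth, so they can't be used for a statistically sound DEG analysis.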

I prefer using the files generated by "UCSC Toil RNA-seq Recompute" on that site -- that pipeline is more up-to-date than GDC's.
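As a rough illustration of "read that file and select your samples of interest", here is a minimal Python sketch using only the standard library. It assumes the downloaded matrix is tab-separated, with gene identifiers in the first column and one column per sample barcode. The function name `select_samples`, the gene names and all barcodes except TCGA-CJ-4875-01 are made up for the demo.

```python
import csv
import io


def select_samples(handle, wanted):
    """Keep the gene-identifier column plus the sample columns named in `wanted`."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    # Column 0 holds gene identifiers; keep it along with the requested samples.
    keep = [0] + [i for i, name in enumerate(header) if i > 0 and name in wanted]
    rows = [[header[i] for i in keep]]
    for row in reader:
        if row:  # ignore empty lines
            rows.append([row[i] for i in keep])
    return rows


# Tiny made-up matrix standing in for a downloaded file.
demo = io.StringIO(
    "sample\tTCGA-CJ-4875-01\tTCGA-XX-0000-11\tTCGA-YY-0000-01\n"
    "GENE1\t5\t6\t7\n"
    "GENE2\t0\t2\t9\n"
)
subset = select_samples(demo, {"TCGA-CJ-4875-01", "TCGA-XX-0000-11"})
for row in subset:
    print("\t".join(row))
```

It is worth checking the unit listed on the dataset page before using the values: if a matrix is log-transformed (for example log2(count+1)), it would need converting back to counts before DESeq2.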


Thanks for your response. I went through the data and matrix you suggested, but I couldn't find sample information indicating whether a sample is control or tumor. The matrix seems to contain only counts, with TCGA barcodes/identifiers as the column names.


The TCGA barcodes look something like: TCGA-CJ-4875-01

To figure out what this means (e.g. if it's tumor or normal), please refer to: https://docs.gdc.cancer.gov/Encyclopedia/pages/TCGA_Barcode/

(Hint: The example I provided above is tumor, not normal)
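To make the hint concrete: in the TCGA barcode scheme, the first two digits of the fourth field are the sample type code, where 01-09 denote tumor, 10-19 normal and 20-29 control samples. A minimal Python sketch of that lookup follows; the function name `sample_type` is hypothetical, and the second barcode in the demo is made up.

```python
def sample_type(barcode):
    """Classify a TCGA barcode as tumor, normal or control from its sample type code."""
    parts = barcode.split("-")
    if len(parts) < 4 or parts[0] != "TCGA":
        raise ValueError(f"not a sample-level TCGA barcode: {barcode!r}")
    # The fourth field starts with a two-digit sample type code, e.g. "01" or "11A".
    code = int(parts[3][:2])
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    if 20 <= code <= 29:
        return "control"
    return "unknown"


print(sample_type("TCGA-CJ-4875-01"))   # barcode from the post above
print(sample_type("TCGA-XX-0000-11A"))  # made-up barcode with a normal-range code
```

Applying this to every column name of the matrix would give the condition column needed for the DESeq2 sample table.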


Hi, from your comment I didn't understand how to identify our samples of interest. I'm new to genomic data analysis and R.

I downloaded datasets for two tumours from GDC and combined all of the samples into one file covering 60483 gene_ids. My aim is to find the genes common to both tumours. I ran DESeq2, which gave me a results file listing differentially expressed genes. I'm not sure whether I'm doing this correctly. Please guide me. Thanks.
