Need some help on how to use DESeq2 for TCGA data
2.4 years ago
Leo • 0

Hello, I am sorry for this newbie question, but I spent all morning looking and could not find a clear answer anywhere.

I want to normalise RNA-seq data from TCGA using DESeq2. I use the TCGA-Assembler R package to download the RNA-seq data with the platform set to "gene_RNAseq", which gives an Excel file with raw_count and scaled_estimate for each patient sample and gene. Should I use the "ProcessRNASeqData" function from the TCGA-Assembler package, or just go with the Excel file produced by the "DownloadRNASeqData" function?

Then before I can start using DESeq2, I have to create a count matrix. How do I do this? Does anyone perhaps have any code ready that I can use for this type of data?

I really would appreciate any help because I have no idea how to continue. Thanks!

Edit: I read somewhere that I should download "illuminahiseq_rnaseqv2-RSEM_genes_normalized (MD5)" from Firehose. How can I convert this data so that I can use it with DESeq2?
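For reference, the count-matrix step asked about above can be sketched roughly as follows. DESeq2 is designed for un-normalized counts, so this sketch uses the raw_count values rather than scaled_estimate or an already normalized file. The toy table, the "<sample>.raw_count" column naming and the sample barcodes are assumptions made for illustration, not the actual TCGA-Assembler layout, so the pattern needs adapting to the real header. The DESeq2 call only runs if the package is installed.

```r
# Toy stand-in for the downloaded table. The column layout
# ("<sample>.raw_count" / "<sample>.scaled_estimate") is an ASSUMPTION;
# inspect the header of your own file and adapt the pattern below.
rsem <- data.frame(
  gene_id = c("A1BG|1", "A2M|2", "NAT1|9", "TP53|7157"),
  "TCGA-AA-0001-01A.raw_count"       = c(120.00, 3051.42, 17.00, 880.13),
  "TCGA-AA-0001-01A.scaled_estimate" = c(1.2e-06, 3.1e-05, 1.7e-07, 8.9e-06),
  "TCGA-AA-0001-11A.raw_count"       = c(98.00, 4100.90, 25.00, 610.55),
  "TCGA-AA-0001-11A.scaled_estimate" = c(1.0e-06, 4.2e-05, 2.5e-07, 6.2e-06),
  "TCGA-AA-0002-01A.raw_count"       = c(143.37, 2870.00, 9.00, 1002.00),
  "TCGA-AA-0002-01A.scaled_estimate" = c(1.4e-06, 2.9e-05, 9.0e-08, 1.0e-05),
  "TCGA-AA-0002-11A.raw_count"       = c(101.00, 3990.10, 31.00, 587.00),
  "TCGA-AA-0002-11A.scaled_estimate" = c(1.0e-06, 4.0e-05, 3.1e-07, 5.9e-06),
  check.names = FALSE  # keep the hyphens in the (hypothetical) barcodes
)

# Keep only the raw_count columns: genes in rows, samples in columns
raw_cols <- grep("\\.raw_count$", names(rsem), value = TRUE)
counts <- as.matrix(rsem[, raw_cols])
rownames(counts) <- rsem$gene_id
colnames(counts) <- sub("\\.raw_count$", "", raw_cols)

# RSEM expected counts can be fractional; DESeq2 expects integers
counts <- round(counts)
storage.mode(counts) <- "integer"

# Characters 14-15 of a TCGA barcode hold the sample type
# ("01" = primary tumor, "11" = solid tissue normal)
sample_type <- substr(colnames(counts), 14, 15)
coldata <- data.frame(
  condition = factor(ifelse(sample_type == "11", "normal", "tumor"),
                     levels = c("normal", "tumor")),
  row.names = colnames(counts)
)

# Build the DESeq2 object only when the package is available
if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(
    countData = counts,
    colData   = coldata,
    design    = ~ condition
  )
}
```

The row names of coldata must match the column names of the count matrix, in the same order, which is why both are derived from the same vector here.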

DESeq2 TCGA R TCGA-Assembler • 3.6k views

In this repository, I provide R code that downloads bladder cancer data from TCGA (TCGA-BLCA) using the TCGAbiolinks package and then analyzes it with DESeq2. If you want to give it a try, just replace "TCGA-BLCA" with the TCGA abbreviation of your cancer of interest.
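As a rough idea of what swapping the project abbreviation looks like, here is a minimal sketch of a TCGAbiolinks download. It is not the code from the repository. The workflow name "STAR - Counts" and the "unstranded" assay reflect the current GDC harmonized data and are assumptions that may not match older releases; the run_download flag is a hypothetical guard so that nothing is downloaded by accident.

```r
# Project to fetch; swap for your cancer of interest, e.g. "TCGA-PRAD"
project <- "TCGA-BLCA"

# Hypothetical safety switch: set to TRUE to actually contact GDC
# (the download is large and needs network access)
run_download <- FALSE

if (run_download && requireNamespace("TCGAbiolinks", quietly = TRUE)) {
  # Describe the files wanted: gene-level RNA-seq counts
  query <- TCGAbiolinks::GDCquery(
    project       = project,
    data.category = "Transcriptome Profiling",
    data.type     = "Gene Expression Quantification",
    workflow.type = "STAR - Counts"  # ASSUMPTION: current GDC workflow name
  )

  # Download the files, then assemble them into a SummarizedExperiment
  TCGAbiolinks::GDCdownload(query)
  se <- TCGAbiolinks::GDCprepare(query)

  # ASSUMPTION: raw counts live in the "unstranded" assay
  counts <- SummarizedExperiment::assay(se, "unstranded")
}
```

Keeping the project in a single variable means the rest of a script does not need to change when moving between cancer types.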


Hamid Ghaedi, the repository you provided has been really helpful. However, I have a problem with this line: write.csv(res_df, file= paste0(resultsNames(dds)[2], ".csv"). I first corrected it by adding the missing closing bracket, but after running the code, this is the file that was saved:

[image: saved_csv]

I ran this before the write.csv call:

df <- ensid_symbol(row.names(res_output))

result_df <- as.data.frame(res_output)
result_df$ensembl_gene_id <- row.names(result_df)
result_df <- merge(df, result_df, by = "ensembl_gene_id")
resOrdered <- result_df[with(result_df, order(abs(log2FoldChange), padj, decreasing = TRUE)), ]
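One possible cause of an empty table at this step, which is a guess and not something confirmed in this thread, is that merge() finds no shared IDs, for example when the result row names carry Ensembl version suffixes and the annotation table does not. The sketch below uses toy data and a hypothetical annotation table standing in for the output of ensid_symbol(); the version numbers are illustrative only.

```r
# Toy DESeq2-like results with versioned Ensembl IDs as row names
# (version suffixes are illustrative)
res_output <- data.frame(
  log2FoldChange = c(2.1, -0.4, -3.0),
  padj           = c(0.001, 0.60, 0.0004),
  row.names      = c("ENSG00000141510.17",
                     "ENSG00000012048.23",
                     "ENSG00000139618.16")
)

# Hypothetical annotation table standing in for ensid_symbol() output
df <- data.frame(
  ensembl_gene_id = c("ENSG00000141510", "ENSG00000012048", "ENSG00000139618"),
  symbol          = c("TP53", "BRCA1", "BRCA2")
)

result_df <- as.data.frame(res_output)
result_df$ensembl_gene_id <- row.names(result_df)

# With versioned IDs on one side only, the inner join keeps no rows
n_before <- nrow(merge(df, result_df, by = "ensembl_gene_id"))

# Strip the ".<version>" suffix so both key columns use the same format
result_df$ensembl_gene_id <- sub("\\.[0-9]+$", "", result_df$ensembl_gene_id)
merged <- merge(df, result_df, by = "ensembl_gene_id")

# Largest absolute fold change first, ties broken by smallest padj
resOrdered <- merged[order(-abs(merged$log2FoldChange), merged$padj), ]

message("rows before fixing IDs: ", n_before, "; after: ", nrow(merged))
```

Checking nrow() after each step (results, merge, ordering) shows where the rows disappear. Note also that order(..., decreasing = TRUE) sorts every key in decreasing order, including padj, so the sketch negates the fold change instead.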

I have tried everything I can think of, but I cannot find a solution. Is there something I am doing wrong?

Kindly assist.


Jakpa, thanks for mentioning the closing bracket :). Can you change the values you passed to alpha = (let's set this to 0.1) and/or lfcThreshold = (let's set this to 0.5) in the following chunk of your code? Then run the code and inspect the result.

res <- results(dds, alpha = 0.05,  altHypothesis = "greaterAbs", lfcThreshold = 1.5) # alpha controls FDR rate
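To see why these two arguments matter: with altHypothesis = "greaterAbs", results() tests whether the absolute log2 fold change exceeds lfcThreshold, so a high threshold can leave few or no genes below the alpha cutoff. The following is a minimal sketch on simulated data from DESeq2's own makeExampleDESeqDataSet(), not on TCGA data; the dataset size and betaSD value are arbitrary choices, and the block is skipped if DESeq2 is not installed.

```r
# Counts of significant genes under each setting (stay NA without DESeq2)
n_strict  <- NA_integer_
n_relaxed <- NA_integer_

if (requireNamespace("DESeq2", quietly = TRUE)) {
  set.seed(1)

  # Simulated two-group dataset; sizes and betaSD are arbitrary
  dds <- DESeq2::makeExampleDESeqDataSet(n = 500, m = 8, betaSD = 1)
  dds <- DESeq2::DESeq(dds, quiet = TRUE)

  # Settings from the chunk above: |LFC| must exceed 1.5
  res_strict <- DESeq2::results(dds, alpha = 0.05,
                                altHypothesis = "greaterAbs",
                                lfcThreshold = 1.5)

  # Relaxed settings suggested in the reply
  res_relaxed <- DESeq2::results(dds, alpha = 0.1,
                                 altHypothesis = "greaterAbs",
                                 lfcThreshold = 0.5)

  # Count genes passing each cutoff, ignoring NA adjusted p-values
  n_strict  <- sum(res_strict$padj < 0.05, na.rm = TRUE)
  n_relaxed <- sum(res_relaxed$padj < 0.1, na.rm = TRUE)

  message("significant (strict): ", n_strict,
          "; significant (relaxed): ", n_relaxed)
}
```

Comparing the two counts on your own dds object is a quick way to tell whether the thresholds, rather than a later step, are what empties the table.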

Hamid Ghaedi, thank you for your response. But it is still the same output, i.e. empty rows.

I couldn't figure out the reason :)

Any suggestions?


Then open a new question and provide all the code you are using, and you will get feedback.


Leo, do not delete posts that have received feedback.

2.4 years ago
Barry Digby ★ 1.3k

Here is an .Rmd file for downloading miRNA and mRNA expression data from TCGA-PRAD with TCGAbiolinks, followed by downstream DESeq2 analysis, which should do what you need. Take the parts you need and substitute your cancer of interest at the TCGAbiolinks step.

https://github.com/BarryDigby/TCGA_Biolinks/blob/master/TCGA_Biolinks.Rmd

A word of caution: you will be missing ~10% of the ~60,000 genes (the same happens with Firehose).
