Question: TCGA: which files to download for analyzing differentially expressed miRNAs
7 days ago, Aly S wrote:

I have just started working with TCGA data, and I noticed that the RNA-seq (HTSeq counts) files also contain Ensembl gene IDs for miRNAs. Does that mean the expression values of miRNA genes are also present in the RNA-seq files?

So then why does TCGA have a separate miRNA quantification dataset (files ending with .mirbase.mirna.quantification)?

I am confused because I plan to find both the differentially expressed genes and the differentially expressed miRNAs, and I don't know which dataset to use with DESeq2.

Please help! :(


You need to download them separately.

The mirbase.mirna.quantification files are what you want for miRNA DE analysis. You will also want to subset the HTSeq counts: if they contain roughly 50,000 rows (harmonized data), filter them down to only the coding genes (about 20,000).
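
To make the miRNA side concrete, here is a minimal sketch of stacking per-sample quantification files into one raw count matrix, which is the kind of input DESeq2 expects. It is written in plain Python for illustration only. It assumes each file is tab-separated with `miRNA_ID` and `read_count` columns; check the header of your own downloads before relying on that. The sample names and numbers are made up, and `read_mirna_counts` and `build_count_matrix` are hypothetical helper names.

```python
import csv
import io


def read_mirna_counts(handle):
    """Return {miRNA_ID: read_count} from one tab-separated quantification file.

    Assumes columns named 'miRNA_ID' and 'read_count' (verify against your files).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["miRNA_ID"]: int(row["read_count"]) for row in reader}


def build_count_matrix(samples):
    """samples: {sample_name: open text handle}.

    Returns (sample_names, {miRNA_ID: [count per sample, in sample_names order]}).
    miRNAs missing from a sample are filled with 0.
    """
    per_sample = {name: read_mirna_counts(h) for name, h in samples.items()}
    sample_names = sorted(per_sample)
    mirna_ids = sorted(set().union(*(c.keys() for c in per_sample.values())))
    matrix = {
        m: [per_sample[s].get(m, 0) for s in sample_names] for m in mirna_ids
    }
    return sample_names, matrix


# Toy stand-ins for two downloaded files (values are invented).
header = "miRNA_ID\tread_count\treads_per_million_miRNA_mapped\tcross-mapped\n"
demo = {
    "sample_A": io.StringIO(
        header + "hsa-let-7a-1\t100\t5000.0\tN\nhsa-mir-21\t250\t12500.0\tN\n"
    ),
    "sample_B": io.StringIO(header + "hsa-mir-21\t300\t15000.0\tN\n"),
}

sample_names, matrix = build_count_matrix(demo)
for mirna, counts in matrix.items():
    print(mirna, counts)
```

Use the raw `read_count` values, not the per-million column, since DESeq2 does its own normalization on unnormalized counts.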

written 7 days ago by Barry Digby

Thank you so much! Any idea how I can filter out only the coding genes?

written 6 days ago by Aly S

I have code here (https://github.com/BarryDigby/TCGA_Biolinks/blob/master/TCGA_Biolinks.Rmd) that does everything you want: downloading the data, preparing the metadata, filtering for coding genes, and differential expression analysis. It's a good basic template to start with.

It was conducted on TCGA PRAD. Install packages as required, change PRAD to your tissue type of interest and you are good to go.
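
Independently of the linked notebook (this is not taken from it), the coding-gene filter itself can be sketched roughly as follows. The idea is to look up each Ensembl gene ID in an annotation table and keep only rows whose biotype is `protein_coding`. The sketch assumes you already have such an ID-to-biotype mapping (for example, parsed from a GTF annotation), and that the count IDs may carry a version suffix such as `.15`. The helper names, IDs, and counts below are illustrative placeholders.

```python
def strip_version(ensembl_id):
    """Drop the version suffix: 'ENSG00000141510.15' -> 'ENSG00000141510'."""
    return ensembl_id.split(".", 1)[0]


def keep_protein_coding(counts, biotypes):
    """Keep only rows whose gene is annotated as protein_coding.

    counts:   {gene_id (possibly versioned): [count per sample]}
    biotypes: {unversioned gene_id: biotype string}  (assumed to come from
              your annotation; building it is outside this sketch)
    IDs absent from the annotation, including htseq-count summary rows such
    as '__no_feature', are dropped.
    """
    return {
        gene: values
        for gene, values in counts.items()
        if biotypes.get(strip_version(gene)) == "protein_coding"
    }


# Toy data, not real counts.
counts = {
    "ENSG00000141510.15": [120, 98],
    "ENSG00000207864.1": [0, 3],
    "__no_feature": [5000, 4200],
}
biotypes = {
    "ENSG00000141510": "protein_coding",
    "ENSG00000207864": "miRNA",
}

coding = keep_protein_coding(counts, biotypes)
print(coding)
```

The same logic carries over to R by subsetting the count matrix rows on a biotype column from your annotation.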

modified 6 days ago • written 6 days ago by Barry Digby