TCGA miRNA data - is this an appropriate approach?
7.4 years ago

I have never worked with microRNA data before and want to know if there is anything wrong with the methodology I've come up with for analyzing it:

  1. Download Level_3 TCGA miRNA data for all samples in the cancer type of interest (this data consists of a two-column file for each sample: one column lists miRNA names, the second lists normalized expression values).
  2. Use clinical data to define sub-populations, linking samples to patients by Patient Barcode.
  3. Within each sub-population of interest, compute the mean and median of each miRNA across samples, then compute the standard deviation and variance around both the mean and the median.
  4. Use the descriptive statistics from step 3 to determine which miRNAs vary the least within the sub-population of interest, then compare their mean/median against other sub-populations of interest. The rationale for choosing the least-variable miRNAs is that if they differ from other sub-populations, the difference is more likely to be significant. Whether to compare means or medians is decided per miRNA, based on how common and how extreme its outliers are within a sub-population.
  5. If any miRNAs turn out to be differentially expressed between sub-populations, calculate error bars for them.
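A minimal sketch of steps 1 and 3 to 5, using only the Python standard library. It assumes each sample file has already been read as lines of `name<TAB>value` (as described in step 1) and that samples have already been grouped into sub-populations (step 2). All function names are hypothetical. Two choices here are my own assumptions, not part of the method above: miRNAs are ranked by coefficient of variation (SD / mean) rather than raw SD, because raw SD tends to grow with expression level; and the error bar is the standard error of the mean.

```python
import math
import statistics


def parse_two_column(lines):
    """Parse 'miRNA<TAB>value' lines into a dict, skipping header/non-numeric rows."""
    expr = {}
    for line in lines:
        parts = line.strip().split("\t")
        if len(parts) < 2:
            continue
        try:
            expr[parts[0]] = float(parts[1])
        except ValueError:
            continue  # header row or malformed value
    return expr


def summarize(samples):
    """Per-miRNA mean, median, SD, variance, SEM and CV within one sub-population.

    `samples` is a list of dicts (one per sample); needs at least two samples,
    since statistics.stdev is undefined for a single value.
    """
    shared = set.intersection(*(set(s) for s in samples))
    stats = {}
    for name in sorted(shared):
        values = [s[name] for s in samples]
        mean = statistics.mean(values)
        sd = statistics.stdev(values)
        stats[name] = {
            "mean": mean,
            "median": statistics.median(values),
            "sd": sd,
            "var": sd ** 2,
            "sem": sd / math.sqrt(len(values)),  # error bar for the mean (step 5)
            "cv": sd / mean if mean else math.inf,
        }
    return stats


def least_variable(stats, k):
    """Return the k miRNAs with the lowest coefficient of variation."""
    return sorted(stats, key=lambda name: stats[name]["cv"])[:k]


def compare(stats_a, stats_b, names, center="mean"):
    """Pair up the chosen center ('mean' or 'median') for miRNAs present in both groups."""
    return [
        (name, stats_a[name][center], stats_b[name][center])
        for name in names
        if name in stats_a and name in stats_b
    ]
```

For example, `summarize` on each sub-population, `least_variable` on the group of interest, then `compare(stats_a, stats_b, stable_names, center="median")` for miRNAs whose outliers make the median the better choice. This only reproduces the descriptive part of the method; it does not test whether any difference is statistically significant.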

If there is anything glaringly wrong or problematic with this approach, please let me know. I'd also be happy to hear about anything more subtle.

TCGA RNA-Seq miRNA

I am not familiar with miRNA analysis, but for RNA-seq analysis the count data usually follow a negative binomial model, in which the variance increases as the mean increases. You might want to take that into account when you perform your analysis.
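To make the comment concrete: under the mean/dispersion parameterization of the negative binomial used by tools such as edgeR and DESeq2, a feature with mean `mu` and dispersion `phi` has variance `mu + phi * mu**2`. The sketch below just evaluates that formula; the dispersion value 0.1 is an arbitrary illustrative assumption, not an estimate from TCGA data.

```python
import math


def nb_variance(mu, phi):
    """Negative binomial variance for mean mu and dispersion phi (phi = 0 gives Poisson)."""
    return mu + phi * mu ** 2


def nb_cv(mu, phi):
    """Coefficient of variation, sqrt(1/mu + phi): it shrinks toward sqrt(phi) as mu grows."""
    return math.sqrt(nb_variance(mu, phi)) / mu


# Illustrative dispersion only; real values must be estimated from the data.
for mu in (10, 100, 1000):
    print(mu, nb_variance(mu, 0.1), round(nb_cv(mu, 0.1), 3))
```

The practical consequence is that raw SD or variance grows with expression level, so picking "least-changing" miRNAs by raw SD will tend to favor lowly expressed ones. A scale-free measure (CV) or a log transform before computing spread avoids comparing miRNAs on unequal footing.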

