Question: Identifying 5p and 3p in miRNA isoform expression data from TCGA for feature selection
jiaqiwu10 wrote (2.1 years ago):

I've downloaded miRNA expression data from TCGA for CHOL (cholangiocarcinoma), and the isoform quantification files look like this:

miRNA_ID      isoform_coords                  read_count  reads_per_million_miRNA_mapped  cross-mapped  miRNA_region
hsa-let-7a-1  hg38:chr9:94175939-94175962:+   1           0.706072                        N             precursor
hsa-let-7a-1  hg38:chr9:94175942-94175962:+   1           0.706072                        N             precursor
hsa-let-7a-1  hg38:chr9:94175961-94175984:+   2           1.412144                        N             mature,MIMAT0000062
hsa-let-7a-1  hg38:chr9:94175962-94175981:+   45          31.773244                       N             mature,MIMAT0000062
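A practical first step for both questions is to collapse each file to one count per mature miRNA. Below is a minimal Python sketch, assuming the files are tab-separated with the header shown above: it splits `isoform_coords` into its parts, keeps only rows whose `miRNA_region` starts with `mature,`, and sums `read_count` per MIMAT accession. The embedded data is just the four rows above, and the function names are hypothetical.

```python
import csv
import io
from collections import defaultdict

# The four example rows from the post, as a tab-separated string.
SAMPLE_TSV = (
    "miRNA_ID\tisoform_coords\tread_count\treads_per_million_miRNA_mapped\tcross-mapped\tmiRNA_region\n"
    "hsa-let-7a-1\thg38:chr9:94175939-94175962:+\t1\t0.706072\tN\tprecursor\n"
    "hsa-let-7a-1\thg38:chr9:94175942-94175962:+\t1\t0.706072\tN\tprecursor\n"
    "hsa-let-7a-1\thg38:chr9:94175961-94175984:+\t2\t1.412144\tN\tmature,MIMAT0000062\n"
    "hsa-let-7a-1\thg38:chr9:94175962-94175981:+\t45\t31.773244\tN\tmature,MIMAT0000062\n"
)


def parse_coords(coords):
    """Split 'hg38:chr9:94175939-94175962:+' into (genome, chrom, start, end, strand)."""
    genome, chrom, span, strand = coords.split(":")
    start, end = (int(x) for x in span.split("-"))
    return genome, chrom, start, end, strand


def mature_counts(handle):
    """Sum read_count over all isoform rows assigned to the same mature accession."""
    totals = defaultdict(int)
    for row in csv.DictReader(handle, delimiter="\t"):
        region = row["miRNA_region"].strip()
        if not region.startswith("mature,"):
            continue  # skip rows not assigned to a mature miRNA (e.g. 'precursor')
        accession = region.split(",", 1)[1]
        totals[accession] += int(row["read_count"])
    return dict(totals)


print(mature_counts(io.StringIO(SAMPLE_TSV)))  # {'MIMAT0000062': 47}
print(parse_coords("hg38:chr9:94175962-94175981:+"))
```

With the four example rows, the two precursor rows are dropped and the two mature isoforms (2 + 45 reads) are summed into one count for MIMAT0000062. Whether to also drop rows flagged `Y` in `cross-mapped` is a judgment call; this sketch keeps them.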

However, in other projects and papers I always see selected features labeled as hsa-let-7a-1-3p, hsa-let-7a-5p, etc. Where does the 3p/5p suffix come from? Does it correspond to the +/- strand?
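On the 3p/5p question: the suffix is not the genomic strand. A miRNA precursor folds into a hairpin, and the suffix says whether the mature sequence comes from the 5' arm or the 3' arm of that hairpin; a precursor on the '-' strand still has a 5p and a 3p arm. The `miRNA_region` column already gives the miRBase mature accession, so one option is to look the names up in miRBase's `mature.fa`. The sketch below assumes header lines of the form `>hsa-let-7a-5p MIMAT0000062 Homo sapiens let-7a-5p`. It also shows how the arm could be derived from coordinates, assuming you obtain the precursor (hairpin) coordinates separately, for example from miRBase's genome annotation, since the isoform file does not list them. The coordinates in the example are toy values, and the function names are hypothetical.

```python
def accession_to_name(fasta_lines):
    """Build {MIMAT accession: mature name} from miRBase mature.fa header lines.

    Assumes headers like '>hsa-let-7a-5p MIMAT0000062 Homo sapiens let-7a-5p'.
    """
    mapping = {}
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            name, accession = fields[0], fields[1]
            mapping[accession] = name
    return mapping


def hairpin_arm(mature_start, mature_end, pre_start, pre_end, strand):
    """Return '5p' or '3p' for a mature region lying inside a precursor hairpin.

    Genomic coordinates, start <= end. On '+' the hairpin's 5' end is the
    lowest coordinate; on '-' it is the highest. Ties are reported as '3p'.
    """
    if strand == "+":
        dist_from_5prime = mature_start - pre_start
        dist_from_3prime = pre_end - mature_end
    elif strand == "-":
        dist_from_5prime = pre_end - mature_end
        dist_from_3prime = mature_start - pre_start
    else:
        raise ValueError(f"unknown strand {strand!r}")
    return "5p" if dist_from_5prime < dist_from_3prime else "3p"


headers = [">hsa-let-7a-5p MIMAT0000062 Homo sapiens let-7a-5p", "UGAGGUAGUAGGUUGUAUAGUU"]
print(accession_to_name(headers)["MIMAT0000062"])  # hsa-let-7a-5p

# Toy coordinates: the same genomic span is the 5p arm on '+' but the 3p arm on '-'.
print(hairpin_arm(1005, 1026, 1000, 1080, "+"))  # 5p
print(hairpin_arm(1005, 1026, 1000, 1080, "-"))  # 3p
```

To use a real file, pass an open handle, e.g. `accession_to_name(open("mature.fa"))`. Note that the mature name has no `-1`: several precursors (hsa-let-7a-1, -2 and -3) produce the same mature sequence and share one accession, which is why counting per accession rather than per `miRNA_ID` is usually the natural unit.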

Additionally, how do I pool this data across samples so that I can run differential expression analysis between CHOL samples and other cancer types (e.g., BRCA)? My end goal is to apply feature selection methods and then use the selected features to predict cancer type, but I am unsure how to process this data.
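For pooling, one common approach is to build a samples x miRNAs matrix in which every sample has the same feature columns, filling accessions absent from a sample with 0, and then normalize within each sample. The stdlib-only Python sketch below works under stated assumptions: the input is a dictionary of per-sample `{accession: read_count}` totals, the sample IDs and counts are made up, the library size is the sum over the retained mature accessions (not the total miRNA-mapped reads TCGA uses for `reads_per_million_miRNA_mapped`), and the transform is log2(CPM + 1). Function names are hypothetical.

```python
import math


def build_matrix(sample_counts):
    """Pool {sample_id: {accession: count}} into a samples x features table.

    Accessions missing from a sample are filled with 0, so every sample
    has the same feature columns in the same (sorted) order.
    """
    features = sorted({acc for counts in sample_counts.values() for acc in counts})
    samples = sorted(sample_counts)
    matrix = [[sample_counts[s].get(acc, 0) for acc in features] for s in samples]
    return samples, features, matrix


def log_cpm(row):
    """log2(counts per million + 1) for one sample's row of counts."""
    total = sum(row)
    if total == 0:
        return [0.0] * len(row)
    return [math.log2(c / total * 1e6 + 1) for c in row]


# Hypothetical per-sample totals; IDs and counts are illustrative only.
example = {
    "CHOL_sample_1": {"MIMAT0000062": 47},
    "BRCA_sample_1": {"MIMAT0000062": 10, "MIMAT0000063": 30},
}
samples, features, counts = build_matrix(example)
normalized = [log_cpm(row) for row in counts]
print(features)
for sample, values in zip(samples, normalized):
    print(sample, [round(v, 2) for v in values])
```

The log-CPM matrix, together with a label recording each sample's project (CHOL or BRCA), is a reasonable input for feature selection and a classifier. For differential expression with count-based tools such as edgeR or DESeq2, pass the raw count matrix instead, since they apply their own normalization.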

Thanks in advance.


Did you find a way to do this? I want to figure out the 3p/5p forms from the isoform quantification files too, but don't know how or where to begin!

Reply written 6 weeks ago by ginny0

