Question: Identifying 5p and 3p in miRNA isoform expression data from TCGA for feature selection
I've downloaded some miRNA expression data from TCGA (for CHOL) and the isoform quantification files look like this:

miRNA_ID    isoform_coords  read_count  reads_per_million_miRNA_mapped  cross-mapped    miRNA_region 
hsa-let-7a-1    hg38:chr9:94175939-94175962:+   1   0.706072    N   precursor 
hsa-let-7a-1    hg38:chr9:94175942-94175962:+   1   0.706072    N   precursor 
hsa-let-7a-1    hg38:chr9:94175961-94175984:+   2   1.412144    N   mature,MIMAT0000062 
hsa-let-7a-1    hg38:chr9:94175962-94175981:+   45  31.773244   N   mature,MIMAT0000062

However, in other projects and papers, I always see selected features labeled as hsa-let-7a-1-3p or hsa-let-7a-5p, etc. Where does the 3p/5p suffix come from? Does it correspond to the +/- strand?

Additionally, how do I pool this data across samples so I can run differential expression analysis between CHOL samples and other cancer types (e.g., BRCA)? My end goal is to apply feature selection methods and then use the selected features to predict cancer types, but I am unsure how to process this data.

Thanks in advance.

Did you find a way to do this? I want to figure out the 3p/5p forms from the isoform quantification files too, but don't know how or where to begin!
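
Not a definitive answer, but here is a minimal sketch of one way to start. As far as I know, 5p/3p refers to which arm of the hairpin precursor the mature miRNA comes from, not to the +/- genomic strand. The MIMAT accession in the miRNA_region column identifies the mature miRNA, and its 5p/3p name can be looked up in miRBase. The sketch below sums read_count per MIMAT accession within one file, then pools several samples into a single table with zeros for missing features. The function names, the sample IDs, and the one-entry ACCESSION_TO_NAME lookup are assumptions made for illustration; the real lookup would have to be built from a miRBase download. It also assumes the six columns shown in the question, with no spaces inside a field.

```python
from collections import defaultdict

# Rows copied from the question; real files have many more lines.
SAMPLE = """miRNA_ID isoform_coords read_count reads_per_million_miRNA_mapped cross-mapped miRNA_region
hsa-let-7a-1 hg38:chr9:94175939-94175962:+ 1 0.706072 N precursor
hsa-let-7a-1 hg38:chr9:94175942-94175962:+ 1 0.706072 N precursor
hsa-let-7a-1 hg38:chr9:94175961-94175984:+ 2 1.412144 N mature,MIMAT0000062
hsa-let-7a-1 hg38:chr9:94175962-94175981:+ 45 31.773244 N mature,MIMAT0000062
"""

# Hypothetical lookup: fill this from miRBase and verify every entry there.
ACCESSION_TO_NAME = {"MIMAT0000062": "hsa-let-7a-5p"}


def mature_counts(text):
    """Sum raw read_count per mature-miRNA accession in one isoform file."""
    counts = defaultdict(int)
    for line in text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) != 6:
            continue  # skip malformed lines
        region = fields[5]
        if region.startswith("mature,"):
            accession = region.split(",", 1)[1]
            counts[accession] += int(fields[2])
    return dict(counts)


def pool_samples(per_sample):
    """Combine {sample_id: {accession: count}} into {accession: {sample_id: count}}."""
    features = sorted({acc for c in per_sample.values() for acc in c})
    return {
        acc: {sample: c.get(acc, 0) for sample, c in per_sample.items()}
        for acc in features
    }


if __name__ == "__main__":
    # "CHOL_sample_1" is a made-up sample ID for this example.
    table = pool_samples({"CHOL_sample_1": mature_counts(SAMPLE)})
    for acc, row in table.items():
        print(ACCESSION_TO_NAME.get(acc, acc), row)
```

Because the counts are keyed by accession, isoforms from different precursors that give rise to the same mature miRNA are added together. The sketch keeps raw read counts rather than reads per million, since count-based differential expression tools generally expect unnormalized counts.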

