I've downloaded some miRNA expression data from TCGA (for CHOL) and the isoform quantification files look like this:
    miRNA_ID      isoform_coords                 read_count  reads_per_million_miRNA_mapped  cross-mapped  miRNA_region
    hsa-let-7a-1  hg38:chr9:94175939-94175962:+  1           0.706072                        N             precursor
    hsa-let-7a-1  hg38:chr9:94175942-94175962:+  1           0.706072                        N             precursor
    hsa-let-7a-1  hg38:chr9:94175961-94175984:+  2           1.412144                        N             mature,MIMAT0000062
    hsa-let-7a-1  hg38:chr9:94175962-94175981:+  45          31.773244                       N             mature,MIMAT0000062
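A minimal sketch of collapsing one isoform file to per-mature-miRNA counts, assuming the tab-separated layout shown above (a header row plus six columns). Summing read_count per MIMAT accession, skipping non-"mature," rows, and dropping rows whose cross-mapped flag is "Y" are illustrative choices, not a prescribed GDC procedure; the inline EXAMPLE string simply repeats the rows above.

```python
import csv
import io
from collections import defaultdict

# Inline copy of the rows shown above, tab-separated (real files are read from disk).
EXAMPLE = (
    "miRNA_ID\tisoform_coords\tread_count\treads_per_million_miRNA_mapped\tcross-mapped\tmiRNA_region\n"
    "hsa-let-7a-1\thg38:chr9:94175939-94175962:+\t1\t0.706072\tN\tprecursor\n"
    "hsa-let-7a-1\thg38:chr9:94175942-94175962:+\t1\t0.706072\tN\tprecursor\n"
    "hsa-let-7a-1\thg38:chr9:94175961-94175984:+\t2\t1.412144\tN\tmature,MIMAT0000062\n"
    "hsa-let-7a-1\thg38:chr9:94175962-94175981:+\t45\t31.773244\tN\tmature,MIMAT0000062\n"
)


def mature_counts(handle, drop_cross_mapped=True):
    """Sum read_count per mature miRNA accession (MIMAT...) for one sample.

    Isoforms of the same mature miRNA (different start/end coordinates) are
    pooled; rows whose miRNA_region does not start with 'mature,' are skipped.
    """
    counts = defaultdict(int)
    for row in csv.DictReader(handle, delimiter="\t"):
        region = row["miRNA_region"]
        if not region.startswith("mature,"):
            continue
        # Assumption: cross-mapped reads are flagged 'Y' (only 'N' appears above).
        if drop_cross_mapped and row["cross-mapped"] == "Y":
            continue
        accession = region.split(",", 1)[1]
        counts[accession] += int(row["read_count"])
    return dict(counts)


if __name__ == "__main__":
    print(mature_counts(io.StringIO(EXAMPLE)))  # prints {'MIMAT0000062': 47}
```

Keying on the accession rather than on miRNA_ID also merges reads from different precursor loci that produce the same mature sequence, which is usually what you want when comparing mature miRNA levels.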
However, in other projects and papers, I always see selected features labeled with names like hsa-let-7a-5p. Where does the 3p/5p suffix come from? Does it correspond to the +/- strand?
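For context on the naming, in miRBase nomenclature the -5p/-3p suffix indicates which arm of the precursor hairpin the mature sequence comes from (the 5' arm or the 3' arm of the stem-loop), not the genomic +/- strand. One way to get from the MIMAT accession in miRNA_region to such a name is to look it up in miRBase's mature.fa. The sketch below assumes headers of the form ">hsa-let-7a-5p MIMAT0000062 Homo sapiens let-7a-5p"; MATURE_FA is a one-entry illustrative excerpt, so check the format against the miRBase release your data was annotated with.

```python
import io

# Illustrative one-entry excerpt of miRBase mature.fa (header format assumed:
# ">name accession genus species short-name", sequence on the next line).
MATURE_FA = (
    ">hsa-let-7a-5p MIMAT0000062 Homo sapiens let-7a-5p\n"
    "UGAGGUAGUAGGUUGUAUAGUU\n"
)


def accession_to_name(handle, species_prefix="hsa-"):
    """Map MIMAT accession -> mature miRNA name using mature.fa-style headers."""
    mapping = {}
    for line in handle:
        if not line.startswith(">"):
            continue  # sequence lines are not needed for the mapping
        fields = line[1:].split()
        name, accession = fields[0], fields[1]
        if name.startswith(species_prefix):
            mapping[accession] = name
    return mapping


def arm(name):
    """Return '5p' or '3p' from a mature name's suffix, or None if absent."""
    for suffix in ("5p", "3p"):
        if name.endswith("-" + suffix):
            return suffix
    return None


if __name__ == "__main__":
    names = accession_to_name(io.StringIO(MATURE_FA))
    print(names["MIMAT0000062"], arm(names["MIMAT0000062"]))  # prints hsa-let-7a-5p 5p
```

Names without an arm suffix return None; matching on the accession rather than on the name avoids depending on naming conventions that have changed across miRBase releases.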
Additionally, how do I pool this data across samples so I can run differential expression analysis between CHOL samples and other cancer types (e.g., BRCA)? My end goal is to apply feature selection methods and then use the selected features to predict cancer type, but I am unsure how to process this data.
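A minimal pooling sketch, assuming each sample has already been collapsed to a {MIMAT accession: read_count} dict (for instance by summing its mature rows per accession) and that you know each sample's project label. It builds a samples-by-miRNA count matrix over the union of accessions (a miRNA missing from a sample counts as 0), then computes log2 counts-per-million so samples with different depths are comparable for feature selection. The sample IDs, counts, and the pseudocount of 1 are hypothetical placeholders. This CPM uses the sum of the retained mature counts as its denominator, which is not the same as the file's reads_per_million_miRNA_mapped column, and count-based DE tools such as DESeq2 or edgeR expect the raw count matrix rather than log-CPM.

```python
import math

# Hypothetical per-sample counts and labels (placeholders, not real TCGA data).
SAMPLES = {
    "CHOL_sample_1": ({"MIMAT0000062": 47, "MIMAT0000076": 10}, "CHOL"),
    "BRCA_sample_1": ({"MIMAT0000062": 30}, "BRCA"),
}


def build_matrix(samples):
    """Return (sample_ids, labels, features, matrix) with a dense count matrix.

    Features are the sorted union of accessions across all samples; a miRNA
    absent from a sample is treated as 0 reads.
    """
    sample_ids = sorted(samples)
    features = sorted({acc for counts, _ in samples.values() for acc in counts})
    matrix = [[samples[s][0].get(acc, 0) for acc in features] for s in sample_ids]
    labels = [samples[s][1] for s in sample_ids]
    return sample_ids, labels, features, matrix


def log_cpm(matrix, pseudocount=1.0):
    """log2(counts-per-million + pseudocount), normalized per sample (row)."""
    out = []
    for row in matrix:
        total = sum(row)
        scale = 1e6 / total if total else 0.0
        out.append([math.log2(c * scale + pseudocount) for c in row])
    return out


if __name__ == "__main__":
    ids, labels, feats, counts = build_matrix(SAMPLES)
    print(ids, labels)
    print(feats)
    print(counts)
    print(log_cpm(counts))
```

Keeping the raw count matrix and the label list side by side lets the same pooled data feed both a count-based DE test and a classifier trained on the normalized values.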
Thanks in advance.