Hello! I have some questions about the TCGA data with experimental.strategy = "miRNA-Seq". There are two data types: "miRNA Expression Quantification" and "Isoform Expression Quantification". According to the GDC Data User's Guide, the former contains the summed expression of all reads aligned to known miRNAs in the miRBase reference. If a read aligns to several different miRNAs, or to different regions of the same miRNA, it is flagged as cross-mapped and every miRNA annotation is preserved. The latter contains the observed isoforms.
I have downloaded both data types for the TCGA-PRAD project (experimental.strategy = "miRNA-Seq"), and I want to predict the targets of the differentially expressed miRNAs. From some posts on Biostars, I think I need mature miRNA counts, and in the Isoform Expression Quantification data the mature miRNAs are identified by MIMAT accessions in the miRNA_region column. So I plan to aggregate the Isoform Expression Quantification data into mature miRNA counts, but I have some questions.

Since the GDC guide says that a read aligning to different miRNAs, or to different regions of the same miRNA, is flagged as cross-mapped, I am thinking of the following steps, all applied to the Isoform Expression Quantification data. First, for rows that share the same miRNA_ID and have a cross-mapped value of Y, I would keep only the maximum read_count. Second, for rows that share the same miRNA_region, I would sum the read_count values across the different miRNA_IDs. Is this right?
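To make the proposed two-step aggregation concrete, here is a minimal sketch of it in Python (standard library only). It is only an illustration of the procedure described above, not a validated or GDC-endorsed method. Assumptions: the isoform file is tab-separated with a header containing the columns miRNA_ID, read_count, cross-mapped and miRNA_region; mature rows are annotated as "mature,MIMAT..." in miRNA_region; and "keep the max" is applied per (miRNA_ID, MIMAT) pair, while non-cross-mapped rows for that pair are summed. The function names are hypothetical.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List


def read_isoform_rows(path: str) -> List[dict]:
    """Read an isoform quantification file (assumed tab-separated, with a header row)."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def aggregate_mature_counts(rows: Iterable[dict]) -> Dict[str, int]:
    """Collapse isoform rows into per-MIMAT counts using the two-step rule from the post."""
    unique_total = defaultdict(int)  # (miRNA_ID, MIMAT) -> summed counts of non-cross-mapped rows
    cross_max: Dict[tuple, int] = {}  # (miRNA_ID, MIMAT) -> largest count among cross-mapped rows

    for row in rows:
        region = row["miRNA_region"]
        # Assumed annotation format for mature reads, e.g. "mature,MIMAT0000416";
        # precursor, stemloop and unannotated rows are skipped.
        if not region.startswith("mature,"):
            continue
        mimat = region.split(",", 1)[1]
        key = (row["miRNA_ID"], mimat)
        count = int(row["read_count"])
        if row["cross-mapped"] == "Y":
            # Step 1: for cross-mapped rows of the same miRNA_ID, keep only the maximum.
            cross_max[key] = max(cross_max.get(key, 0), count)
        else:
            unique_total[key] += count

    # Step 2: rows sharing the same mature region (MIMAT) are summed across miRNA_IDs.
    totals: Dict[str, int] = defaultdict(int)
    for key in set(unique_total) | set(cross_max):
        totals[key[1]] += unique_total.get(key, 0) + cross_max.get(key, 0)
    return dict(totals)
```

Usage would look like `aggregate_mature_counts(read_isoform_rows("isoforms.quantification.txt"))`, where the file name is a hypothetical placeholder; repeating this per sample gives one count vector per sample for the differential expression step.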
Please correct me if you think I'm wrong!
Thanks in advance!