Hello all,
I am interested in examining precursor miRNA data from TCGA (GDC Data Portal).
Although I do not have access to the controlled raw sequences (FASTA or BAM), I noticed that the RNA-seq data in "HTSeq - Counts" format contain read counts and RPM (reads per million) values for rows with miRNA identifiers (both Ensembl gene IDs and external gene names).
Specifically, I am using R and the TCGAbiolinks package to access the GDC data.
After examining the RNA-seq data (which are in "HTSeq - Counts" format and stored in a RangedSummarizedExperiment object), I noticed that the rows with miRNA IDs are mapped to miRNA genomic coordinates (an IRanges object) whose widths vary from 41 bp to approximately 8 x 10^6 bp.
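For reference, this is roughly how I look at the widths. It is only a minimal sketch on a toy RangedSummarizedExperiment, not my actual TCGA object: the gene IDs, coordinates, and counts are made up, and both the column names (ensembl_gene_id, external_gene_name) and the "MIR" symbol prefix are assumptions about how the prepared object is annotated.

```r
library(SummarizedExperiment)  # also attaches GenomicRanges and IRanges

# Toy stand-in for the object returned by TCGAbiolinks::GDCprepare().
# All IDs, names, and coordinates below are invented for illustration.
ranges <- GRanges(
  seqnames = c("chr1", "chr1", "chr2"),
  ranges   = IRanges(start = c(1000, 5000, 20000), width = c(85, 41, 3000)),
  strand   = c("+", "-", "+"),
  ensembl_gene_id    = c("ENSG_A", "ENSG_B", "ENSG_C"),   # hypothetical IDs
  external_gene_name = c("MIR21", "MIR155", "TP53")
)
names(ranges) <- ranges$ensembl_gene_id

# Count matrix: one row per gene, one column per sample
counts <- matrix(c(10L, 0L, 250L, 12L, 3L, 310L), nrow = 3,
                 dimnames = list(NULL, c("sample1", "sample2")))

se <- SummarizedExperiment(assays = list(counts = counts), rowRanges = ranges)

# Keep rows whose gene symbol starts with "MIR" (assumed naming convention)
is_mir <- grepl("^MIR", rowData(se)$external_gene_name)
mir_se <- se[is_mir, ]

# Width in bp of the genomic interval attached to each miRNA row
mir_widths <- width(rowRanges(mir_se))
summary(mir_widths)
```

On the real object, se would instead come from the usual GDCquery / GDCdownload / GDCprepare steps in TCGAbiolinks.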
So my questions are:
1) Can I use the rows with miRNA identifiers from the RNA-seq data in "HTSeq - Counts" format, retrieved from the GDC Data Portal, as expression values for precursor miRNAs?
2) From these related posts about miRNA-seq and miRNA expression in TCGA, I gathered that the miRNA-seq data in the GDC Data Portal correspond to mature miRNAs and isomiRs and therefore cannot be used to evaluate the expression of precursor miRNAs.
Is this conclusion correct?
Thanks in advance!
Kostas