Hi all, I'm new to RNA-seq analysis and confused about gene expression profiling. How do I get an overview of the expression profile, such as the total number of protein-coding genes, non-coding genes and pseudogenes? I tried to follow the workflow from the article "Toward a Reference Gene Catalog of Human Primary Monocytes" (https://doi.org/10.1089/omi.2016.0124):
- Cuffnorm (The FPKM >0.1 threshold was used to determine expressed transcripts)
The article also reports that, by applying an FPKM >0.1 threshold, the authors identified a total of 20,371 genes and 82,996 transcripts expressed in their monocyte datasets.
The part that confuses me is how to apply the FPKM >0.1 threshold, and which file I should apply it to (the cuffnorm output file gene.fpkm_table, or the transcript.gtf file)? And how did they work out the number of protein-coding genes, non-coding genes and pseudogenes among these 20,371 genes?
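To make the question concrete, here is a minimal Python sketch of what I imagine "applying the threshold" could mean. This is only my guess, not the article's method. I am assuming that the FPKM table is tab-delimited with an ID column followed by one column per sample, that a gene counts as expressed if its FPKM is above 0.1 in at least one sample, and that the biotype of each gene comes from a separate lookup (for example the gene_biotype or gene_type attribute of the annotation GTF). The function names, the toy table and the toy biotype dictionary are all made up for illustration.

```python
import csv
import io
from collections import Counter


def expressed_ids(table_text, threshold=0.1):
    """Return IDs whose FPKM is strictly above `threshold` in at least one sample.

    Assumes a tab-delimited table: first column is the gene/transcript ID,
    remaining columns are per-sample FPKM values, first row is a header.
    """
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    next(reader)  # skip the header row
    keep = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        values = [float(v) for v in row[1:]]
        if any(v > threshold for v in values):
            keep.append(row[0])
    return keep


def count_biotypes(ids, biotype_of):
    """Count how many of the given IDs fall into each biotype."""
    return Counter(biotype_of.get(gene_id, "unknown") for gene_id in ids)


# Toy data (hypothetical), standing in for a real FPKM table.
TOY_TABLE = (
    "tracking_id\tsample_0\tsample_1\n"
    "GENE_A\t5.2\t3.1\n"
    "GENE_B\t0.05\t0.0\n"
    "GENE_C\t0.0\t0.4\n"
    "GENE_D\t0.1\t0.1\n"
    "GENE_E\t2.0\t0.0\n"
)

# Toy biotype lookup (hypothetical), standing in for the annotation.
TOY_BIOTYPES = {
    "GENE_A": "protein_coding",
    "GENE_B": "protein_coding",
    "GENE_C": "lincRNA",
    "GENE_D": "pseudogene",
    "GENE_E": "pseudogene",
}

expressed = expressed_ids(TOY_TABLE, threshold=0.1)
biotype_counts = count_biotypes(expressed, TOY_BIOTYPES)
print(len(expressed), "expressed genes")
print(dict(biotype_counts))
```

If this is roughly right, I still don't know whether the threshold should be applied per sample, to the mean across samples, or in some other way, and whether the same filter is applied separately to the transcript-level table to get the transcript count.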
Many articles report the total number of genes and transcripts in their datasets, but I am really confused about how they obtain these numbers.
I would really appreciate some help understanding this. Thank you.