Dear all, we have RNA-Seq data for 7 timepoints from a species that has a reference genome. I found that some papers report the number of expressed genes at each timepoint, but they use different methods:
1. Some use: "an ad hoc cutoff for detectable expression was set at >=2 reads per transcript. Using this cutoff, 11187 gene transcripts could be detected in the RNA-Seq data set."
2. Others use: "we counted the number of clean reads aligned to litchi gene sequences and performed normalization using the RPKM method. After lowly expressed genes (< 5 RPKM) were filtered, we identified 17572 genes in all samples"
It seems that one method applies a cutoff directly to the raw read counts, while the other works on normalized data. So my first question is: which one should I use? (My species is channel catfish.) My second question is: both methods remove the lowly expressed genes, right? But for counting the expressed genes at each timepoint, I don't think we need to filter out the lowly expressed genes, since these genes are also expressed.
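To make the comparison concrete, here is a rough sketch of how I understand the two approaches, written in Python. The counts, gene lengths, and library sizes are made up for illustration (they are not from my data or from the papers quoted above), and the function names are hypothetical. RPKM is computed as reads * 1e9 / (gene length in bp * total mapped reads).

```python
# Toy inputs: every value below is invented for illustration only.
counts = {                      # raw read counts at three hypothetical timepoints
    "geneA": [0, 1, 250],
    "geneB": [2, 40, 0],
    "geneC": [500, 3, 1],
}
lengths_bp = {"geneA": 1000, "geneB": 2000, "geneC": 500}
library_sizes = [1_000_000, 1_000_000, 1_000_000]  # total mapped reads per timepoint


def detected_by_count(counts, min_reads=2):
    """Genes per timepoint with at least `min_reads` raw reads (approach 1)."""
    n_timepoints = len(next(iter(counts.values())))
    return [
        sum(1 for c in counts.values() if c[t] >= min_reads)
        for t in range(n_timepoints)
    ]


def rpkm(counts, lengths_bp, library_sizes):
    """Reads per kilobase of gene per million mapped reads, per gene and timepoint."""
    return {
        gene: [
            c[t] * 1e9 / (lengths_bp[gene] * library_sizes[t])
            for t in range(len(c))
        ]
        for gene, c in counts.items()
    }


def detected_by_rpkm(counts, lengths_bp, library_sizes, min_rpkm=5.0):
    """Genes per timepoint kept after dropping those below `min_rpkm` (approach 2)."""
    values = rpkm(counts, lengths_bp, library_sizes)
    n_timepoints = len(library_sizes)
    return [
        sum(1 for v in values.values() if v[t] >= min_rpkm)
        for t in range(n_timepoints)
    ]


print(detected_by_count(counts))                           # [2, 2, 1]
print(detected_by_rpkm(counts, lengths_bp, library_sizes)) # [1, 2, 1]
```

In this toy case the two cutoffs disagree at the first timepoint: geneB passes the raw-count cutoff (2 reads) but not the RPKM cutoff, because its RPKM is only 1 once gene length and library size are taken into account.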
I would appreciate any help with this problem.