I recently calculated differentially expressed genes from RNA-Seq data with DESeq. One question is bothering me.
When we checked the number of reads assigned to each annotated gene, we found that some genes have a huge number of reads compared to the total reads, for example:
sample                         s-664561     s-665905     s-655605     ZC1          ZC2          ZC3
total_reads                    1190411      2702168      4201061      7254155      1546197      8178163
TCONS_00022986+TCONS_00022987  831654       2391414      3859575      541428       268746       758906
%                              69.86276168  88.49982681  91.87143438  7.463694944  17.38109698  9.279663416
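For reference, the percentages in the table can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the counts are copied from the table, while the variable and function names are made up for illustration. It also prints how many reads are left for all other genes in each sample.

```python
# Counts copied from the table above; names are illustrative only.
samples = ["s-664561", "s-665905", "s-655605", "ZC1", "ZC2", "ZC3"]
total_reads = [1190411, 2702168, 4201061, 7254155, 1546197, 8178163]
gene_reads = [831654, 2391414, 3859575, 541428, 268746, 758906]


def percent_of_total(gene, total):
    """Return the share of each sample's reads taken by one gene, in percent."""
    return [100.0 * g / t for g, t in zip(gene, total)]


percents = percent_of_total(gene_reads, total_reads)

# Reads left over for every other gene once the dominant gene is set aside.
remaining = [t - g for g, t in zip(gene_reads, total_reads)]

for name, pct, rest in zip(samples, percents, remaining):
    print(f"{name}: {pct:.2f}% in TCONS_00022986+TCONS_00022987, {rest} reads elsewhere")
```

In the S samples only a small part of each library is left for the rest of the genes, which is worth keeping in mind when judging the power of the comparison.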
One gene, TCONS_00022986+TCONS_00022987, takes up about 70-90% of the total reads in the S group. I suspect this will affect the differential expression analysis in DESeq. How should I handle this in the analysis? Or should I simply remove this gene?
Thanks a lot, Cam
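To make the concern concrete, here is a minimal sketch comparing two ways of normalizing for sequencing depth: scaling by total reads, and the median-of-ratios idea that DESeq's size factors are based on. It is not DESeq's own code. The toy counts and all names are hypothetical, chosen so that four genes are identical between two samples while a fifth gene dominates sample "B".

```python
import math
from statistics import mean, median

# Hypothetical toy counts (not from the real data): same gene order in both samples.
# The last gene dominates sample "B"; the other four are unchanged.
toy_counts = {
    "A": [100, 200, 300, 400, 1000],
    "B": [100, 200, 300, 400, 50000],
}


def total_count_factors(counts):
    """Scale each sample by its total reads relative to the mean total."""
    totals = {s: sum(c) for s, c in counts.items()}
    mean_total = mean(totals.values())
    return {s: t / mean_total for s, t in totals.items()}


def median_of_ratios_factors(counts):
    """Per sample, take the median ratio of each gene to its geometric mean across samples."""
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    # Only genes with non-zero counts in every sample have a usable geometric mean.
    usable = [g for g in range(n_genes) if all(counts[s][g] > 0 for s in samples)]
    log_geo_mean = {
        g: sum(math.log(counts[s][g]) for s in samples) / len(samples) for g in usable
    }
    factors = {}
    for s in samples:
        log_ratios = [math.log(counts[s][g]) - log_geo_mean[g] for g in usable]
        factors[s] = math.exp(median(log_ratios))
    return factors


by_total = total_count_factors(toy_counts)
by_median = median_of_ratios_factors(toy_counts)
print("total-count factors:     ", by_total)
print("median-of-ratios factors:", by_median)
```

In this toy case, total-count scaling treats sample "B" as much more deeply sequenced and would make the four unchanged genes look down-regulated, while the median-based factors stay close to 1 for both samples. Whether the real data behave this way depends on how the remaining genes are distributed, so the sketch does not settle whether the gene should be removed.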
Could you elaborate on your experimental design and analysis pipeline? How were the library prep and sequencing performed, and which steps did you take to generate these counts?
The library is rRNA-depleted total RNA and is strand-specific. Counts were generated with HTSeq-count.