Can somebody explain how differential promoter use is inferred from RNA-seq data? I am asking specifically about the way it is implemented in cuffdiff (I followed the Tuxedo pipeline). The documentation says:
This tab delimited file lists, for each gene, the amount of overloading detected among its primary transcripts, i.e. how much differential promoter use exists between samples. Only genes producing two or more distinct primary transcripts (i.e. multi-promoter genes) are listed here.
What is considered a primary transcript? Is it equivalent to pre-mRNA? My library is poly(A)-enriched, so it should not be rich in pre-mRNAs, yet I still get a result (test status OK) for some of the genes. I understand that poly(A) enrichment is not perfect, so some reads could have come from pre-mRNAs, but wouldn't their counts then be low and dominated by stochastic noise, making comparisons between conditions unreliable? Finally, why would differential levels of pre-mRNA indicate differential promoter usage, and why are only genes producing two or more distinct primary transcripts included in the analysis while the rest are not?
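To make the question concrete, here is a minimal sketch of how the genes with test status OK could be listed from that tab-delimited file. The column names (`gene`, `status`, `p_value` and so on) are an assumption about the cuffdiff output header and are not quoted from the documentation above; the helper name `genes_with_ok_status` is hypothetical, and the two sample rows are made up purely for illustration.

```python
import csv
import io


def genes_with_ok_status(handle):
    """Return the rows of a tab-delimited file whose 'status' column is OK.

    Assumes the file has a header row containing a 'status' column
    (an assumption about the cuffdiff output, not a documented guarantee).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if row.get("status") == "OK"]


# Made-up rows for illustration only; not real results.
sample = (
    "test_id\tgene\tsample_1\tsample_2\tstatus\tp_value\n"
    "TSS1\tgeneA\tq1\tq2\tOK\t0.01\n"
    "TSS2\tgeneB\tq1\tq2\tNOTEST\t1\n"
)

ok_rows = genes_with_ok_status(io.StringIO(sample))
print([row["gene"] for row in ok_rows])
```

With a real run, the `io.StringIO(sample)` handle would be replaced by an opened output file, e.g. `open("promoters.diff", newline="")`, where the file name is again an assumption.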