I am trying to investigate the correlation between a gene and its promoter. Both the gene and its promoter have several alternatively spliced transcripts, and each one is listed as a separate entry in my read count table (with its own Ensembl gene ID). I generated the read count matrix using eXpress and then loaded it manually with DESeq2's DESeqDataSetFromMatrix function. After that I called the rlog function on the DESeq object and tried to plot the assay values.
I'm a bit confused about how to look at the correlation between the gene and its promoter, because there are many transcripts for the gene and many transcripts for the promoter (the promoter is annotated as a gene too). What I'm thinking is: can I just add up the transcripts of each gene, so that I get the total across all of its splice variants? I'm not sure, but I remember reading a post about DESeq2 saying that values from two different genes cannot simply be added together and that some normalization is needed first. Is that normalization already included in the rlog function?
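To make the idea concrete, here is a minimal Python sketch of what I mean by "adding" the transcripts: summing the raw, untransformed counts of all transcripts that belong to the same gene, sample by sample. The transcript IDs, gene names, counts, and the `sum_counts_by_gene` helper are all made up for illustration. I am also only assuming that the summing should happen on raw counts before the matrix goes into DESeq2, which is part of what I am asking.

```python
# Hypothetical raw counts: transcript ID -> read counts in three samples.
transcript_counts = {
    "tx_A1": [10, 0, 5],
    "tx_A2": [3, 2, 0],
    "tx_P1": [0, 0, 1],
    "tx_P2": [4, 1, 0],
}

# Hypothetical mapping from each transcript to the gene it belongs to.
tx_to_gene = {
    "tx_A1": "gene_A",
    "tx_A2": "gene_A",
    "tx_P1": "promoter_A",
    "tx_P2": "promoter_A",
}


def sum_counts_by_gene(transcript_counts, tx_to_gene):
    """Sum raw transcript counts per gene, separately for each sample."""
    gene_counts = {}
    for tx, counts in transcript_counts.items():
        gene = tx_to_gene[tx]
        # Start each gene at zero in every sample.
        totals = gene_counts.setdefault(gene, [0] * len(counts))
        for i, count in enumerate(counts):
            totals[i] += count
    return gene_counts


gene_counts = sum_counts_by_gene(transcript_counts, tx_to_gene)
for gene, counts in gene_counts.items():
    print(gene, counts)
```

The resulting gene-level table (one row per gene, one column per sample) is what I would then pass to DESeqDataSetFromMatrix instead of the transcript-level table.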
By the way, I noticed something strange after calling the rlog function: several entries that have 0 read counts end up with non-zero values after the log transformation. Is this normal? Thank you.