We now have experimentally validated transcripts and, as a result, new and improved gene models for some of our genes. I want to assign counts from our previous RNA-seq runs to these models, so I will have to re-count. So far so good. However, I would like to keep the counts for the 'official' predicted models until the Ensembl models are updated. Do I have to remove the old predicted gene models that overlap the new ones? I have the following concerns:
- If I do not keep the old models, I will not get counts for them, so I cannot display those counts or check whether the new models change much. That suggests I should keep all of them.
- If I do not remove the old overlapping models, reads hitting the overlapping regions might have to be counted twice, once for each model.
- If reads are counted twice, am I distorting the library size/composition normalization (calcNormFactors; some genes would have double the number of mapped reads) and thereby defeating the normalization?
- I don't think this is comparable to alternative transcripts, because in the update case only one of the two transcripts is likely to exist.
I am using easyRNAseq for counting and edgeR for CPM calculation for now.
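
To make the double-counting concern concrete, here is a toy sketch in plain Python. It is not what easyRNAseq or edgeR actually do: the gene names and counts are made up, and CPM is computed from the simple column sum without the TMM factors that calcNormFactors would add. It only illustrates how keeping both an old and a new overlapping model inflates the library size and shifts the CPM of unrelated genes.

```python
def cpm(counts):
    """Counts per million, using the plain column sum as library size (no TMM)."""
    lib_size = sum(counts.values())
    return {gene: 1e6 * n / lib_size for gene, n in counts.items()}

# Hypothetical counts for one sample with only the old predicted models.
old_only = {"geneA_old": 100, "geneB": 300, "geneC": 600}

# Same reads, but geneA now has an old and a new model that fully overlap,
# so its 100 reads are assigned to both features (assumed worst case).
old_and_new = {"geneA_old": 100, "geneA_new": 100, "geneB": 300, "geneC": 600}

cpm_old = cpm(old_only)
cpm_both = cpm(old_and_new)

print("library size, old models only:", sum(old_only.values()))
print("library size, old + new models:", sum(old_and_new.values()))
for gene in ("geneB", "geneC"):
    # Genes without a new model still change, because the denominator grew.
    print(gene, round(cpm_old[gene], 1), "->", round(cpm_both[gene], 1))
```

In this toy case the library size grows by the number of double-counted reads, so every other gene's CPM drops slightly. With only a few updated models the shift would presumably be small, but that is exactly what I would like confirmed.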