When processing raw FASTA data from the ENCODE dataset with Cufflinks, I am getting inconsistent RPKM values between biological replicates, even though the number of aligned reads in the respective BAM files is similar. The inconsistent ENCODE datasets give me very low global gene expression, which is obviously not correct. For example, in some datasets only 400 of a group of 1400 genes are "expressed", whereas the correct value should be around 1000, based on my analysis of the ENCODE biological replicate, the values provided by the ENCODE pipeline, and data from ProteinAtlas.
Does anybody have experience with low RPKM values produced by Cufflinks? In the genes.tracking file, the genes that are not expressed have 0s for everything and an "OK" status.
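For anyone who wants to reproduce the count, here is a minimal sketch of how the number of "expressed" genes could be tallied from the tracking file. It assumes the usual Cufflinks genes.fpkm_tracking layout (tab-separated, with a header row containing gene_id, FPKM and FPKM_status columns). The function name, the threshold of 0 and the toy rows are made up for illustration and are not real ENCODE data.

```python
import csv
import io
from collections import Counter


def summarize_tracking(handle, gene_ids=None, threshold=0.0):
    """Count genes with FPKM above `threshold` and tally FPKM_status values.

    `handle` is an open tab-separated file with a header row (assumed to be
    in the Cufflinks genes.fpkm_tracking layout). `gene_ids` optionally
    restricts the count to a gene set of interest.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    expressed = 0
    total = 0
    status_counts = Counter()
    for row in reader:
        if gene_ids is not None and row["gene_id"] not in gene_ids:
            continue
        total += 1
        status_counts[row["FPKM_status"]] += 1
        if float(row["FPKM"]) > threshold:
            expressed += 1
    return expressed, total, status_counts


# Toy input with placeholder gene names (hypothetical, only the columns used here).
toy = io.StringIO(
    "tracking_id\tgene_id\tFPKM\tFPKM_status\n"
    "GENE_A\tGENE_A\t12.5\tOK\n"
    "GENE_B\tGENE_B\t0\tOK\n"
    "GENE_C\tGENE_C\t3.1\tOK\n"
)
expressed, total, statuses = summarize_tracking(toy)
print(f"{expressed} of {total} genes expressed; status counts: {dict(statuses)}")
```

Running the same function on each replicate (passing an open file handle and, optionally, the set of gene IDs of interest) would show whether the zero-FPKM genes differ between replicates while still being reported with an "OK" status.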
Thank you!
Tony