I'm using Galaxy to analyze some mouse RNA-seq samples, and I'm running into a strange issue that I think originates in Cufflinks. This is a screenshot of my Cufflinks settings. I'm getting inconsistent FPKM values across samples for a number of Snords and MIRs. For example, these genes are very high in one sample and apparently absent in the other three.
If I inspect the BAM file of accepted hits generated by TopHat2, I can see that the samples that gave an FPKM of 0 actually have high read coverage at these genes, similar to the sample that gave a value of 2700. So it looks like TopHat2 is doing a good job of placing the reads, but Cufflinks is assigning FPKM values incorrectly.
Also, I've noticed that the conf_lo and conf_hi values are a bit goofy. For example, for Snord34, the FPKM, FPKM_conf_lo, and FPKM_conf_hi values for two of the four samples are:
So the issue is that when I look for differentially regulated genes, these genes pop up and I have to inspect each one manually in IGV to check whether the read counts really differ. I should say that each sample seems to have its own signature of odd Snords and other genes. Put another way, there are about 10 genes, usually small nucleolar RNAs, that give FPKMs of >2000 in one sample and 0 in the other three, and a different set of about 10 genes that does the same in another sample.
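To avoid hunting for these genes by eye, the pattern described above (very high FPKM in exactly one sample, 0 in all the others) could be flagged automatically before opening IGV. The following is only a minimal sketch: the function name `flag_single_sample_outliers`, the sample labels, and the FPKM values in the example are made up for illustration, and it assumes the per-gene FPKMs have already been loaded into one dictionary per sample. It does not explain why Cufflinks reports 0; it only produces a shortlist of genes to check against the BAM files.

```python
def flag_single_sample_outliers(fpkm_by_sample, high=2000.0, low=0.0):
    """Return {gene: sample} for genes whose FPKM is above `high` in
    exactly one sample and at or below `low` in every other sample.

    fpkm_by_sample maps a sample label to a {gene_name: FPKM} dict.
    The default thresholds mirror the pattern described in the post.
    """
    # Collect every gene name seen in any sample.
    genes = set()
    for values in fpkm_by_sample.values():
        genes.update(values)

    flagged = {}
    for gene in sorted(genes):
        # Assumption: a gene missing from a sample is treated as FPKM 0.
        per_sample = {s: v.get(gene, 0.0) for s, v in fpkm_by_sample.items()}
        high_samples = [s for s, x in per_sample.items() if x > high]
        rest_low = all(
            x <= low for s, x in per_sample.items() if s not in high_samples
        )
        if len(high_samples) == 1 and rest_low:
            flagged[gene] = high_samples[0]
    return flagged


# Illustrative values only, not real data from the samples in question.
example = {
    "sample1": {"Snord34": 2700.0, "Actb": 150.0},
    "sample2": {"Snord34": 0.0, "Actb": 140.0},
    "sample3": {"Snord34": 0.0, "Actb": 160.0},
    "sample4": {"Snord34": 0.0, "Actb": 155.0},
}

suspects = flag_single_sample_outliers(example)
for gene, sample in suspects.items():
    print(f"{gene}: high only in {sample}")
```

Genes on the resulting list are the ones worth checking in IGV; anything not on it follows a different pattern and would need its own check.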