So I'm very new to this whole deal, and very new to computer science stuff in general. I am trying to do RNA-seq computation and seem to be running into an unusual problem (I think). I am running Slurm jobs in the terminal and my end results are weird. The job is a whole pipeline using Bowtie2, then TopHat, then Cufflinks, Cuffquant, and then featureCounts and Cuffnorm. The idea is to take the raw counts from featureCounts and use them in edgeR. I run Cuffnorm at the end to get FPKM values, just to get an idea before starting edgeR. I noticed that featureCounts is outputting counts for about 25,000 genes/features, yet Cuffnorm is outputting 57,000 genes/features. The whole pipeline uses the same .gff3 and .fa files from Ensembl (mouse). Does anyone know why this is happening?
Predicting novel transcripts is the whole point of using Cufflinks.
Yes, but many people run it even when they are not interested in novel transcripts. For example, the original poster expects the output to match the original GFF file (known genes only).
If that is the case, I strongly recommend using Salmon or Kallisto. Kallisto can be downloaded from here and the manual for running Kallisto can be found here. Salmon can be downloaded from here and a manual for running Salmon can be found here. I actually wrote an entire section about the considerations for using different quantification tools recently.
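To give a rough idea of what switching to these tools looks like, here is a minimal sketch of typical Kallisto and Salmon invocations. Note that both quantify against a transcriptome (cDNA) FASTA, not the genome .fa used for alignment; the file names and output directories below are placeholders, and flags can vary between versions, so check the manuals linked above.

```shell
# Build an index from a transcriptome FASTA (placeholder file name),
# then quantify paired-end reads -- Kallisto version:
kallisto index -i mouse_tx.idx Mus_musculus.cdna.all.fa
kallisto quant -i mouse_tx.idx -o kallisto_out reads_1.fastq.gz reads_2.fastq.gz

# Equivalent Salmon version (-l A lets Salmon infer the library type):
salmon index -t Mus_musculus.cdna.all.fa -i salmon_idx
salmon quant -i salmon_idx -l A -1 reads_1.fastq.gz -2 reads_2.fastq.gz -o salmon_out
```

Both tools output transcript-level estimates, which you can aggregate to gene level (e.g. with tximport) before feeding counts into edgeR.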