I'm using data from E. coli, a well-studied organism for which a good annotation file (NC_000913.3) can be downloaded from NCBI.
I ran cufflinks with a very basic command line (see below) and got genes.fpkm_tracking.txt. It contains FPKM values for 2576 features (lines in the cufflinks output file), while the original annotation file contains ~4500 genes. For another dataset (from a different experiment), cufflinks gave me 3502 features.
Part of the reason seems to be that cufflinks grouped some genes together (apparently assuming they belong to a single transcript?). I think another explanation is that genes with 0 counts were not reported.
Do you think this is a reasonable result?
Command line: cufflinks -o <out_dir> -g <annotation.gff> <aligned.bam>
Example lines of cufflinks output:
tracking_id class_code nearest_ref_id gene_id gene_short_name tss_id locus length coverage FPKM FPKM_conf_lo FPKM_conf_hi FPKM_status
CUFF.8 - - CUFF.8 fixC,fixX - NC_000913.3:44179-45750 - - 10.5497 1.36756 19.7577 OK
gene44 - - gene44 yaaU - NC_000913.3:45806-47138 - - 1.98785 0.960989 3.00309 OK
CUFF.9 - - CUFF.9 kefC,kefF - NC_000913.3:47245-49631 - - 49.5173 34.9478 49.0769 OK
CUFF.10 - - CUFF.10 folA - NC_000913.3:49822-50302 - - 0 0 1.00003 OK
CUFF.11 - - CUFF.11 - - NC_000913.3:50567-57253 - - 116.716 112.037 122.431 OK
gene48 - - gene48 apaH - NC_000913.3:50379-51222 - - 0 0 0.379608 OK
gene49 - - gene49 apaG - NC_000913.3:51228-51606 - - 0 0 0.846585 OK
CUFF.12 - - CUFF.12 pdxA,rsmA,surA - NC_000913.3:51608-54702 - - 1039.98 872.984 933.547 OK
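One way to check the grouping explanation would be to count how many annotated gene names are packed into the comma-separated gene_short_name column. The following is only a minimal sketch: it assumes the tracking file is tab-delimited with the header shown above, and the function name summarize_tracking is made up for illustration. The sample rows are three of the lines quoted above.

```python
import csv
import io


def summarize_tracking(handle):
    """Count rows, merged rows, and distinct gene names in a tracking table.

    Assumes a tab-delimited table with a 'gene_short_name' column in which
    merged loci list several gene names separated by commas and unnamed
    loci contain '-'.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    rows = 0
    merged_rows = 0
    names = set()
    for rec in reader:
        rows += 1
        short = rec["gene_short_name"]
        if short == "-":
            # Locus without any annotated gene name attached.
            continue
        parts = [name for name in short.split(",") if name]
        if len(parts) > 1:
            merged_rows += 1
        names.update(parts)
    return {"rows": rows, "merged_rows": merged_rows, "named_genes": len(names)}


# Three rows copied from the output above, rebuilt as tab-delimited text.
HEADER = [
    "tracking_id", "class_code", "nearest_ref_id", "gene_id",
    "gene_short_name", "tss_id", "locus", "length", "coverage",
    "FPKM", "FPKM_conf_lo", "FPKM_conf_hi", "FPKM_status",
]
SAMPLE_ROWS = [
    ["CUFF.8", "-", "-", "CUFF.8", "fixC,fixX", "-",
     "NC_000913.3:44179-45750", "-", "-", "10.5497", "1.36756", "19.7577", "OK"],
    ["gene44", "-", "-", "gene44", "yaaU", "-",
     "NC_000913.3:45806-47138", "-", "-", "1.98785", "0.960989", "3.00309", "OK"],
    ["CUFF.11", "-", "-", "CUFF.11", "-", "-",
     "NC_000913.3:50567-57253", "-", "-", "116.716", "112.037", "122.431", "OK"],
]
sample = "\n".join("\t".join(row) for row in [HEADER] + SAMPLE_ROWS)

summary = summarize_tracking(io.StringIO(sample))
print(summary)  # {'rows': 3, 'merged_rows': 1, 'named_genes': 3}
```

On the real file, the same function could be called on an open file handle (for example open("genes.fpkm_tracking.txt", newline="")). If named_genes comes out close to ~4500 while rows is 2576, merging alone would account for most of the gap; if it stays well below, some genes are missing from the output for another reason.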