I'm interested in finding novel genes and isoforms using Cufflinks but I've run into a problem.
When I run cufflinks with the --GTF /<my annotation file> option, my genes.fpkm file contains about 25,279 entries with gene IDs, FPKM, etc. Everything looks fine.
When I run with --GTF-guide, my genes.fpkm file contains 24,984 entries, and 8,275 of them have a CUFF.* gene_id, indicating a novel gene. I expected more entries than in the first case, with a few novel genes complementing the first dataset.
This seems like an awful lot of novel genes. Furthermore, many of the gene IDs in the --GTF output are missing from the --GTF-guide output, as if in the second case Cufflinks decided to call them novel or to remove them altogether. Is this sort of thing usual, or am I justified in being confused by this output?
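One way to quantify the discrepancy is to compare the gene IDs of the two runs directly. The sketch below assumes each genes output file is tab-delimited with a header row that includes a gene_id column (as in Cufflinks' genes.fpkm_tracking). It counts the CUFF.* IDs in the --GTF-guide run and lists the annotated IDs that disappeared. The function names compare_runs and read_gene_ids are hypothetical helpers, not part of Cufflinks.

```python
import csv
import io


def read_gene_ids(tracking_text, column="gene_id"):
    """Return the set of gene IDs in tab-delimited tracking output.

    Assumption: the first row is a header containing a `gene_id` column;
    pass a different `column` if your file is laid out differently.
    """
    reader = csv.DictReader(io.StringIO(tracking_text), delimiter="\t")
    return {row[column] for row in reader}


def compare_runs(gtf_text, guide_text):
    """Compare gene IDs from a --GTF run against a --GTF-guide run."""
    gtf_ids = read_gene_ids(gtf_text)
    guide_ids = read_gene_ids(guide_text)
    # Novel genes assembled by Cufflinks carry CUFF.* identifiers.
    novel = {g for g in guide_ids if g.startswith("CUFF.")}
    return {
        "gtf_total": len(gtf_ids),
        "guide_total": len(guide_ids),
        "guide_novel": len(novel),
        # Annotated IDs present in the --GTF run but absent from --GTF-guide.
        "missing_from_guide": sorted(gtf_ids - guide_ids),
    }


if __name__ == "__main__":
    # Tiny made-up example; real files would be read with open(path).read().
    gtf = "tracking_id\tgene_id\tFPKM\nENSG1\tENSG1\t5.0\nENSG2\tENSG2\t1.2\n"
    guide = "tracking_id\tgene_id\tFPKM\nENSG1\tENSG1\t4.8\nCUFF.1\tCUFF.1\t0.9\n"
    print(compare_runs(gtf, guide))
```

If the missing_from_guide list is long, spot-checking a few of those loci against the CUFF.* entries at the same coordinates can show whether the annotated genes were renamed as novel or dropped entirely.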
Thanks for any help!
-Jeremy
Did you specify the -M/--mask-file option in these runs? Can you share your input command?