I recently posted a question on this matter, but this one is quite a different issue. I am using cufflinks to quantify my RNA-seq data. To get the gene names, I enabled the -g argument with the proper gtf reference file. In short, the command is the following:
cufflinks -b hg19.fa -g hg19.refGene.gtf -u [sample.bam]
All the output files are generated (genes.fpkm_tracking, isoforms.fpkm_tracking, skipped.gtf, transcripts.gtf). However, when I examined the genes.fpkm_tracking file, most of the targets were not annotated (they are presented as CUFF.[ID number]). I thought that it had something to do with the reference gtf file. In fact, it seems that the correct annotation is missed by one base. As an example, these are the locations of a few genes in the genes.fpkm_tracking file that cufflinks is not able to map:
And these are the locations of the real genes, according to the gtf file:
ATM Chr11: 108,093,559-108,239,826
Note: Although the gtf file contains various transcripts for the same gene, I am just using one example per gene (showing those starting from the very first base).
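For anyone who wants to reproduce the comparison, here is a minimal Python sketch of how the one-base difference could be checked. The function names are my own and not part of cufflinks, and the tracking locus in the usage lines is a made-up placeholder rather than actual output; only the ATM reference location is taken from above. It assumes both locations are written as chromosome:start-end.

```python
import re

# Matches loci such as "chr11:108093558-108239826" (commas in numbers allowed).
LOCUS_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<start>[\d,]+)-(?P<end>[\d,]+)$")


def parse_locus(locus):
    """Split a 'chrom:start-end' string into (chrom, start, end)."""
    cleaned = "".join(locus.split())  # drop any whitespace inside the string
    match = LOCUS_RE.match(cleaned)
    if match is None:
        raise ValueError(f"unrecognized locus: {locus!r}")
    start = int(match.group("start").replace(",", ""))
    end = int(match.group("end").replace(",", ""))
    return match.group("chrom").lower(), start, end


def start_offset(tracking_locus, reference_locus):
    """Return reference start minus tracking start, or None if chromosomes differ."""
    t_chrom, t_start, _ = parse_locus(tracking_locus)
    r_chrom, r_start, _ = parse_locus(reference_locus)
    if t_chrom != r_chrom:
        return None
    return r_start - t_start


# Placeholder tracking locus (hypothetical) against the ATM location above.
print(start_offset("chr11:108093558-108239826", "Chr11: 108,093,559-108,239,826"))  # 1
```

An offset of 1 for every unannotated gene would suggest a systematic shift in the start coordinate rather than genuinely different transcripts.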
This one-base difference is observed in a few thousand transcripts. So my questions are:
1. What seems to be the problem here?
2. Is there any argument in cufflinks one can use to fix this?
3. Is there any other way to solve it?
As always, any insight is greatly appreciated. (: