Question: Using Gencode annotation with cufflinks
5.3 years ago by drollix (United States) wrote:


I am experimenting with the latest Gencode human annotation releases (v20 and v21) in my RNA-seq pipeline. After filtering the annotation file to keep only known transcripts at level 1 or level 2, I find that cufflinks reports multiple FPKM values for some ENSG genes in the genes.fpkm_tracking file. Manual inspection suggests that the multiple FPKM values correspond to distinct transcripts of these genes, but ideally these values should be rolled up into a single gene-level value (as happens for the majority of the other multi-transcript genes).

Has anyone else noticed this, and is there a secret sauce to working with the Gencode annotation GTF files with cufflinks?

rna-seq cufflinks • 1.9k views
modified 5.3 years ago by EagleEye • written 5.3 years ago by drollix

5.3 years ago by EagleEye wrote:

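If you experience problems with v20, try this on your GTF and check your results. I had a problem where cufflinks complained about a duplicate gene ID; in your case cufflinks did not flag a duplicate but instead reported two different expression values. The command below edits the file in place and inserts the missing semicolon before the PAR tag: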


sed -i 's/" tag "PAR";/"; tag "PAR";/g' gencode.v20.annotation.gtf
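
For anyone who would rather check how many lines are affected before editing the file in place, here is a minimal Python sketch of the same substitution. The function name `fix_par_tags` and the sample GTF lines are made up for illustration (the IDs are placeholders, not real Gencode records); only the search and replacement strings come from the sed command above.

```python
import re

# Same pattern as the sed command: a quoted value followed directly by
# ` tag "PAR";` with no semicolon in between.
MISSING_SEMICOLON = re.compile(r'" tag "PAR";')


def fix_par_tags(lines):
    """Return (fixed_lines, n_changed), mirroring the sed substitution."""
    fixed = []
    n_changed = 0
    for line in lines:
        new_line, n_subs = MISSING_SEMICOLON.subn('"; tag "PAR";', line)
        fixed.append(new_line)
        if n_subs:
            n_changed += 1
    return fixed, n_changed


# Hypothetical attribute columns, for illustration only.
sample = [
    'chrY\tSRC\tgene\t1\t2\t.\t+\t.\tgene_id "G1"; havana_gene "H1" tag "PAR";\n',
    'chr1\tSRC\tgene\t1\t2\t.\t+\t.\tgene_id "G2"; tag "basic";\n',
]
fixed, n_changed = fix_par_tags(sample)
print(n_changed, "line(s) would be changed")
```

Lines that already have the semicolon are left untouched, so running it a second time on the fixed output changes nothing.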

modified 5.3 years ago • written 5.3 years ago by EagleEye

Thanks, I'll give it a try, though the genes with duplicated FPKMs were not on chromosome Y. I also haven't found any pattern (yet) in the transcript structures of the multi-FPKM genes: some had disjoint transcripts, while others had completely overlapping ones.
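
To look for a pattern, one option is to list every gene ID that appears on more than one row of the tracking file together with its locus. This is only a rough sketch: it assumes the file is tab-separated with a header row containing `gene_id`, `locus` and `FPKM` columns, and the helper name `genes_with_multiple_rows` and the example rows are invented for illustration.

```python
import csv
import io
from collections import defaultdict


def genes_with_multiple_rows(handle):
    """Map each gene_id seen on more than one row to its (locus, FPKM) pairs."""
    rows_by_gene = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        rows_by_gene[row["gene_id"]].append((row["locus"], float(row["FPKM"])))
    return {gene: rows for gene, rows in rows_by_gene.items() if len(rows) > 1}


# Made-up rows standing in for a real tracking file (assumed column names).
example = io.StringIO(
    "tracking_id\tgene_id\tlocus\tFPKM\n"
    "GENE_A\tGENE_A\tchr1:100-900\t12.5\n"
    "GENE_A\tGENE_A\tchr1:5000-7000\t3.0\n"
    "GENE_B\tGENE_B\tchr2:10-500\t8.0\n"
)
duplicated = genes_with_multiple_rows(example)
for gene_id, rows in duplicated.items():
    print(gene_id, rows)
```

On a real run, the `io.StringIO` object would be replaced by an open file handle. Comparing the loci of the repeated rows shows whether the extra FPKM values come from separate loci or from the same one.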

written 5.3 years ago by drollix