Cuffmerge drops gene_ids and transcript features?
9.2 years ago
m.e.chaffee

Hello All,

I have run cufflinks on 6 different samples. I want to use cuffmerge to combine the .gtf files for cuffdiff. However, I have two problems.

Example from ind.gtf:

GL349621.1 Cufflinks transcript 163188 166581 1000 - . gene_id "ACYPI52640"; transcript_id "ACYPI52640-RA"; FPKM "16.6471765920"; frac "0.641012"; conf_lo "15.699389"; conf_hi "17.594964"; cov "133.457105"; full_read_support "yes";
GL349621.1 Cufflinks exon 163188 164381 1000 - . gene_id "ACYPI52640"; transcript_id "ACYPI52640-RA"; exon_number "1"; FPKM "16.6471765920"; frac "0.641012"; conf_lo "15.699389"; conf_hi "17.594964"; cov "133.457105";

merged.gtf:

GL349621.1 Cufflinks exon 159336 159819 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1"; oId "CUFF.3.1"; tss_id "TSS1";
GL349621.1 Cufflinks exon 159902 160013 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "2"; oId "CUFF.3.1"; tss_id "TSS1";

  1. In the third (feature) column of the ind.gtf files there are both exon and transcript features. After I run cuffmerge, the transcript lines are gone and only the exon features are left. Why?
  2. The ind.gtf files contain my gene_ids, but when the files are merged, cuffmerge seems to remove the gene_ids and replace them with XLOC numbers. Is there any way to prevent this from happening?
RNA-Seq Cuffmerge

Were you able to fix this? I have the same problem.

9.2 years ago
Renesh

I don't think you can prevent it. A transcript consists of one or more exons, and all exons of the same transcript share the same transcript_id (TCONS) and gene_id (XLOC) in merged.gtf, so the exon lines that remain in merged.gtf still represent the transcripts indirectly. The gene_ids that appear in the ind.gtf files are supplied by the user when running the cufflinks pipeline, and they are not important here.

If you compare the coordinates of a transcript in an ind.gtf file with the exons in merged.gtf, the exons should together cover the whole transcript.
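A minimal sketch of that comparison, assuming tab-separated GTF input (the forum display above collapsed the tabs into spaces). It groups exon lines by transcript_id and takes the smallest start and largest end, which gives the same kind of extent as a cufflinks "transcript" line. The function name transcript_spans is hypothetical, not part of Cufflinks.

```python
import re

# Matches key "value" pairs in the GTF attribute (9th) column.
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def transcript_spans(gtf_lines):
    """Collapse exon lines into one span per transcript_id.

    Returns {transcript_id: (seqname, strand, start, end, gene_id)}, where
    start/end are the smallest exon start and the largest exon end.
    """
    spans = {}
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip lines without 9 tab-separated columns (headers, blanks) and non-exon features.
        if len(fields) < 9 or fields[2] != "exon":
            continue
        start, end = int(fields[3]), int(fields[4])
        attrs = dict(ATTR_RE.findall(fields[8]))
        tid = attrs["transcript_id"]
        if tid in spans:
            seq, strand, lo, hi, gid = spans[tid]
            spans[tid] = (seq, strand, min(lo, start), max(hi, end), gid)
        else:
            spans[tid] = (fields[0], fields[6], start, end, attrs.get("gene_id"))
    return spans


if __name__ == "__main__":
    # The two merged.gtf exon lines from the question, with tabs restored.
    merged = [
        'GL349621.1\tCufflinks\texon\t159336\t159819\t.\t+\t.\t'
        'gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1"; oId "CUFF.3.1"; tss_id "TSS1";',
        'GL349621.1\tCufflinks\texon\t159902\t160013\t.\t+\t.\t'
        'gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "2"; oId "CUFF.3.1"; tss_id "TSS1";',
    ]
    print(transcript_spans(merged)["TCONS_00000001"])
    # ('GL349621.1', '+', 159336, 160013, 'XLOC_000001')
```

The span includes the intron between the two exons, so it can be compared directly with the start/end of a transcript line in ind.gtf.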

Except that the gene_ids are important for my data; I added them during the cufflinks runs. If I supply a reference .gff file when running cuffmerge, the gene_ids are kept, but not when I run it without one.
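If running cuffmerge without a reference is a requirement, one possible workaround (an assumption on my part, not something cuffmerge does itself) is to map each XLOC locus back to the original gene_ids by coordinate overlap: take the span of each XLOC gene from its exons in merged.gtf and collect the gene_id of every ind.gtf "transcript" line on the same sequence and strand that overlaps it. The function names below are hypothetical, and the naive double loop is only meant as a sketch; for a whole genome you would want an interval index.

```python
import re
from collections import defaultdict

# Matches key "value" pairs in the GTF attribute (9th) column.
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def _records(gtf_lines, feature):
    """Yield (seqname, strand, start, end, attrs) for tab-separated GTF lines of one feature type."""
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != feature:
            continue
        yield fields[0], fields[6], int(fields[3]), int(fields[4]), dict(ATTR_RE.findall(fields[8]))


def xloc_to_gene_ids(merged_lines, sample_lines):
    """Map each XLOC gene_id in merged.gtf to the sorted gene_ids of overlapping
    per-sample transcripts (same sequence and strand); [] if nothing overlaps."""
    # 1. Span of each XLOC locus = min start / max end over all of its exons.
    loci = {}
    for seq, strand, start, end, attrs in _records(merged_lines, "exon"):
        xloc = attrs["gene_id"]
        if xloc in loci:
            s, st, lo, hi = loci[xloc]
            loci[xloc] = (s, st, min(lo, start), max(hi, end))
        else:
            loci[xloc] = (seq, strand, start, end)
    # 2. Per-sample transcript lines carry the user-supplied gene_id.
    originals = list(_records(sample_lines, "transcript"))
    mapping = defaultdict(set)
    for xloc, (seq, strand, lo, hi) in loci.items():
        for o_seq, o_strand, o_start, o_end, attrs in originals:
            # GTF coordinates are 1-based and inclusive, so this is a plain overlap test.
            if o_seq == seq and o_strand == strand and o_start <= hi and lo <= o_end:
                mapping[xloc].add(attrs["gene_id"])
    return {xloc: sorted(mapping[xloc]) for xloc in loci}


if __name__ == "__main__":
    merged = [
        'GL349621.1\tCufflinks\texon\t159336\t159819\t.\t+\t.\tgene_id "XLOC_000001"; transcript_id "TCONS_00000001";',
        'GL349621.1\tCufflinks\texon\t159902\t160013\t.\t+\t.\tgene_id "XLOC_000001"; transcript_id "TCONS_00000001";',
        # Hypothetical merged locus, invented only to show a match.
        'GL349621.1\tCufflinks\texon\t163188\t164381\t.\t-\t.\tgene_id "XLOC_hypothetical"; transcript_id "TCONS_hypothetical";',
    ]
    sample = [
        'GL349621.1\tCufflinks\ttranscript\t163188\t166581\t1000\t-\t.\tgene_id "ACYPI52640"; transcript_id "ACYPI52640-RA";',
    ]
    print(xloc_to_gene_ids(merged, sample))
    # {'XLOC_000001': [], 'XLOC_hypothetical': ['ACYPI52640']}
```

Because the test requires matching strands, unstranded transcripts (strand ".") will never match here; relaxing that check is a design choice that depends on your library type.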

Can you post the command you used for cufflinks?

Sure! I basically have six identical cufflinks runs, each with the following command:

cufflinks -p 4 -o Cuffout44 -F 0.01 -u -g OGS2.1uc.gff3 Acyr_2.0/tophat44/accepted_hits.bam

Did you run cuffdiff? Its output includes expression files (gene, isoform, CDS), and in those files you can find both the XLOC IDs and the gene IDs in adjacent columns.
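A minimal sketch for pulling that XLOC-to-gene correspondence out of a cuffdiff table, assuming a tab-separated file with a header row that contains `gene_id` and `gene` columns (as in gene_exp.diff; check your own header before relying on it). Columns are looked up by name rather than position, and rows whose `gene` value is empty or "-" (assumed here to mean "no name recorded") are skipped. The function name xloc_to_gene is hypothetical.

```python
import csv
import io


def xloc_to_gene(handle):
    """Build {XLOC gene_id: gene} from a tab-separated cuffdiff table read via `handle`."""
    reader = csv.DictReader(handle, delimiter="\t")
    mapping = {}
    for row in reader:
        name = row.get("gene")
        # Skip loci without a recorded name (empty, missing or "-").
        if name and name != "-":
            mapping[row["gene_id"]] = name
    return mapping


if __name__ == "__main__":
    # Hypothetical table, truncated to the first three columns.
    demo = io.StringIO(
        "test_id\tgene_id\tgene\n"
        "XLOC_000001\tXLOC_000001\t-\n"
        "XLOC_000002\tXLOC_000002\tACYPI52640\n"
    )
    print(xloc_to_gene(demo))  # {'XLOC_000002': 'ACYPI52640'}
```

On a real file you would call it as `with open("gene_exp.diff", newline="") as fh: xloc_to_gene(fh)`.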
