Question: Cuffmerge drops gene_ids and transcript features?
m.e.chaffee (United States) wrote, 4.6 years ago:

Hello All, 

I have run cufflinks on 6 different samples. I want to use cuffmerge to combine the .gtf files for cuffdiff. However, I have two problems. 

ex. ind.gtf: 
GL349621.1 Cufflinks transcript 163188 166581 1000 - . gene_id "ACYPI52640"; transcript_id "ACYPI52640-RA"; FPKM "16.6471765920"; frac "0.641012"; conf_lo "15.699389"; conf_hi "17.594964"; cov "133.457105"; full_read_support "yes";
GL349621.1 Cufflinks exon 163188 164381 1000 - . gene_id "ACYPI52640"; transcript_id "ACYPI52640-RA"; exon_number "1"; FPKM "16.6471765920"; frac "0.641012"; conf_lo "15.699389"; conf_hi "17.594964"; cov "133.457105";


merged.gtf: 
GL349621.1 Cufflinks exon 159336 159819 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1"; oId "CUFF.3.1"; tss_id "TSS1";
GL349621.1 Cufflinks exon 159902 160013 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "2"; oId "CUFF.3.1"; tss_id "TSS1";


1) The third column of the ind.gtf files contains both exon and transcript features. After I run cuffmerge, the transcript lines are gone and only exon features are left. Why?

2) The ind.gtf files contain my gene_ids, but in the merged file cuffmerge seems to remove the gene_ids and replace them with XLOC numbers. Is there any way to prevent this from happening?

cuffmerge rna-seq • 2.5k views
modified 4.6 years ago by Renesh • written 4.6 years ago by m.e.chaffee

Were you able to fix this? I have the same problem.

written 3.0 years ago by rborgesm10

Renesh (United States) wrote, 4.6 years ago:

I don't think you can prevent it. A transcript consists of multiple exons, and all exons of the same transcript share the same XLOC gene_id (and the same TCONS transcript_id) in merged.gtf. The exon lines that remain in merged.gtf therefore represent the transcripts indirectly. The gene_ids that appear in the ind.gtf files are supplied by the user when running the cufflinks pipeline, so I don't think they are important here.

If you compare the coordinates of a transcript in the ind.gtf file with the exons of the corresponding transcript in merged.gtf, the exons should cover the whole transcript.

written 4.6 years ago by Renesh
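
The coordinate check described in the answer above can be sketched in Python. This is only an illustration, not part of cuffmerge: the function name `transcript_spans` is made up for this example, and it assumes the first eight GTF columns contain no spaces, so the line can be split on any whitespace (the lines pasted in this thread are space-separated, while real GTF files are tab-separated). It rebuilds each transcript's span as the smallest exon start and largest exon end seen for that transcript_id.

```python
import re

# Matches GTF attribute pairs of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def transcript_spans(gtf_lines):
    """Return {transcript_id: (seqname, strand, start, end)} built from exon lines only."""
    spans = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.split(None, 8)  # 8 fixed columns + attribute string
        if len(cols) < 9 or cols[2] != "exon":
            continue  # ignore transcript lines and malformed lines
        attrs = dict(ATTR_RE.findall(cols[8]))
        tid = attrs.get("transcript_id")
        if tid is None:
            continue
        start, end = int(cols[3]), int(cols[4])
        if tid in spans:
            seqname, strand, s, e = spans[tid]
            spans[tid] = (seqname, strand, min(s, start), max(e, end))
        else:
            spans[tid] = (cols[0], cols[6], start, end)
    return spans


# The two merged.gtf exon lines quoted in the question.
merged_exons = [
    'GL349621.1 Cufflinks exon 159336 159819 . + . gene_id "XLOC_000001"; '
    'transcript_id "TCONS_00000001"; exon_number "1"; oId "CUFF.3.1"; tss_id "TSS1";',
    'GL349621.1 Cufflinks exon 159902 160013 . + . gene_id "XLOC_000001"; '
    'transcript_id "TCONS_00000001"; exon_number "2"; oId "CUFF.3.1"; tss_id "TSS1";',
]
print(transcript_spans(merged_exons))
# {'TCONS_00000001': ('GL349621.1', '+', 159336, 160013)}
```

Running the same function on an ind.gtf file and comparing the resulting spans with its transcript lines should show whether the exon lines alone carry the same coordinate information.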

Except that the gene_ids are important for my data. I added the gene_ids when running cufflinks. If I supply a reference .gff file when running cuffmerge, the gene_ids are kept, but not when I run it without one.

written 4.6 years ago by m.e.chaffee
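
If the original gene_ids have to be recovered after merging, one possible workaround is to join the two files on the `oId` attribute that appears in the merged.gtf lines quoted in the question. The sketch below rests on an assumption that should be checked against real output: that `oId` holds the transcript_id the transcript had in the input assembly. The function name `xloc_to_original_genes` and the second merged line in the example are hypothetical and only serve as illustration.

```python
import re
from collections import defaultdict

# Matches GTF attribute pairs of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def gtf_attributes(line):
    """Return the attribute column of one GTF line as a dict ({} if malformed)."""
    cols = line.split(None, 8)  # 8 fixed columns + attribute string
    if len(cols) < 9:
        return {}
    return dict(ATTR_RE.findall(cols[8]))


def xloc_to_original_genes(merged_lines, individual_lines):
    """Map each merged gene_id (XLOC_...) to the original gene_ids it came from.

    Assumption: the oId attribute in merged.gtf equals the transcript_id
    used in the individual (pre-merge) GTF files.
    """
    # original transcript_id -> original gene_id
    original_gene = {}
    for line in individual_lines:
        attrs = gtf_attributes(line)
        if "transcript_id" in attrs and "gene_id" in attrs:
            original_gene[attrs["transcript_id"]] = attrs["gene_id"]

    mapping = defaultdict(set)
    for line in merged_lines:
        attrs = gtf_attributes(line)
        oid = attrs.get("oId")
        if oid in original_gene and "gene_id" in attrs:
            mapping[attrs["gene_id"]].add(original_gene[oid])
    return {xloc: sorted(genes) for xloc, genes in mapping.items()}


# Individual line taken from the question; the merged line is a made-up example.
individual = [
    'GL349621.1 Cufflinks exon 163188 164381 1000 - . gene_id "ACYPI52640"; '
    'transcript_id "ACYPI52640-RA"; exon_number "1";',
]
merged = [
    'GL349621.1 Cufflinks exon 163188 164381 . - . gene_id "XLOC_000002"; '
    'transcript_id "TCONS_00000002"; exon_number "1"; oId "ACYPI52640-RA"; tss_id "TSS2";',
]
print(xloc_to_original_genes(merged, individual))
# {'XLOC_000002': ['ACYPI52640']}
```

An XLOC entry can map to more than one original gene_id if cuffmerge grouped transcripts from several annotated genes into one locus, which is why the values are lists.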

Can you post the command you used for cufflinks?

written 4.6 years ago by Renesh

Sure! I have basically 6 identical runs of cufflinks with the following command:  

cufflinks -p 4  -o Cuffout44 -F 0.01 -u -g OGS2.1uc.gff3 Acyr_2.0/tophat44/accepted_hits.bam

written 4.6 years ago by m.e.chaffee

Did you run cuffdiff? Its output should include expression files (gene, isoform, cds), and in those files you can find both the XLOC ids and the gene_ids in adjacent columns.

written 4.6 years ago by Renesh