Question: Cuffcompare combined.gtf output contains only exons
2.8 years ago, jnoble333 wrote:

Hello,

I have GTF files from a PASA run for 96 samples. My intention was to run cuffcompare on all 96 to produce a master GTF file for all samples. Instead, when I run it, my results contain only transfrags annotated as "exon". My organism is P. trichocarpa. I am calling cuffcompare via:

cuffcompare -G -i glist.txt

I have also tried using the reference GTF for my organism, but it does not affect the output.

Also, the combined GTF is missing many of the sites that are found in 2 or more individuals. For instance, all of my input GTF files contain:

Chr01 assembler transcript 1660 2502 . - . gene_id "PASA_cluster_1"; transcript_id "align_id:296215|asmbl_1";
Chr01 assembler exon 1660 2502 . - . gene_id "PASA_cluster_1"; transcript_id "align_id:296215|asmbl_1";

Chr01 assembler transcript 2906 6646 . - . gene_id "PASA_cluster_2"; transcript_id "align_id:296216|asmbl_2";
Chr01 assembler exon 2906 3475 . - . gene_id "PASA_cluster_2"; transcript_id "align_id:296216|asmbl_2";
Chr01 assembler exon 3506 3928 . - . gene_id "PASA_cluster_2"; transcript_id "align_id:296216|asmbl_2";
Chr01 assembler exon 6501 6646 . - . gene_id "PASA_cluster_2"; transcript_id "align_id:296216|asmbl_2";

Yet these sites are not present in the combined GTF. I have also tried running the input GTF files individually against the reference GTF; the combined GTF still contains only exon records, and these sites are still missing (they are present in the reference genome).

My combined.gtf looks like:

Chr01 assembler exon 8371 9365 . + . gene_id "XLOC_000001"; transcript_id "brep1_00000001"; exon_number "1"; oId "align_id:291588|asmbl_7"; tss_id "TSS1";
Chr01 assembler exon 9150 9425 . + . gene_id "XLOC_000001"; transcript_id "brep1_00036183"; exon_number "1"; oId "align_id:308505|asmbl_7"; tss_id "TSS2";

Any ideas as to why I'm only getting exons in the combined output? Thanks.
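
One way to check whether a given input transcript really is absent from the combined file is to look at the oId attribute, which in the sample above carries the original transcript_id. The following is only a minimal sketch: it assumes a tab-separated GTF with the attribute layout shown in the post, and the helper names (parse_attributes, original_ids) are made up for illustration, not part of the Cufflinks suite.

```python
import re

# Matches attribute pairs of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def parse_attributes(field):
    """Return the key/value pairs of a GTF attribute column as a dict."""
    return dict(ATTR_RE.findall(field))


def original_ids(gtf_lines):
    """Collect every oId value found in the records of a combined GTF."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GTF record
        attrs = parse_attributes(cols[8])
        if "oId" in attrs:
            ids.add(attrs["oId"])
    return ids


# Example record taken from the combined.gtf excerpt in the post.
combined = [
    "\t".join([
        "Chr01", "assembler", "exon", "8371", "9365", ".", "+", ".",
        'gene_id "XLOC_000001"; transcript_id "brep1_00000001"; '
        'exon_number "1"; oId "align_id:291588|asmbl_7"; tss_id "TSS1";',
    ])
]
found = original_ids(combined)
print("align_id:296215|asmbl_1" in found)
```

For a real file, the list would be replaced by an open file handle, e.g. original_ids(open("combined.gtf")), where the file name depends on the output prefix that was used.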


What else were you expecting the GTF to contain? The Cufflinks suite doesn't output gene and transcript lines, but they are generally redundant, as they are simply the union of all the exon entries linked to the same transcript or gene. If you want to regenerate these, the tool gtf2gtf from the CGAT package should be able to do the job. De novo transcript assembly tools assemble transcript structures and do not really have any knowledge of ORFs etc., so you won't see CDS or UTR entries either.

written 2.8 years ago by i.sudbery
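
To illustrate the point that a transcript line is just the union of its exon entries, here is a rough Python sketch that rebuilds transcript records from an exon-only GTF. It is not how gtf2gtf works internally; it assumes tab-separated input, that transcript_id values are unique across chromosomes, and the function name add_transcript_lines is hypothetical.

```python
import re
from collections import OrderedDict

TRANSCRIPT_ID_RE = re.compile(r'transcript_id "([^"]*)";')
GENE_ID_RE = re.compile(r'gene_id "([^"]*)";')


def add_transcript_lines(gtf_lines):
    """Return GTF lines with a 'transcript' record before each transcript's exons."""
    groups = OrderedDict()  # transcript_id -> list of exon column lists
    for line in gtf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue  # only exon records are used
        match = TRANSCRIPT_ID_RE.search(cols[8])
        if match is None:
            continue
        groups.setdefault(match.group(1), []).append(cols)

    out = []
    for tid, exons in groups.items():
        exons.sort(key=lambda c: int(c[3]))
        first = exons[0]
        # The transcript span is the union of its exons.
        start = min(int(c[3]) for c in exons)
        end = max(int(c[4]) for c in exons)
        gene = GENE_ID_RE.search(first[8])
        if gene:
            attrs = f'gene_id "{gene.group(1)}"; transcript_id "{tid}";'
        else:
            attrs = f'transcript_id "{tid}";'
        out.append("\t".join(
            [first[0], first[1], "transcript", str(start), str(end),
             ".", first[6], ".", attrs]))
        out.extend("\t".join(c) for c in exons)
    return out


# Exon coordinates taken from the PASA_cluster_2 example in the question.
ATTRS = 'gene_id "PASA_cluster_2"; transcript_id "align_id:296216|asmbl_2";'
exons = [
    "\t".join(["Chr01", "assembler", "exon", s, e, ".", "-", ".", ATTRS])
    for s, e in [("2906", "3475"), ("3506", "3928"), ("6501", "6646")]
]
for record in add_transcript_lines(exons):
    print(record)
```

Gene lines could be rebuilt the same way by grouping on gene_id instead of transcript_id.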

I was expecting the merge and compare output GTF files to follow the same format as a Cufflinks output GTF, similar to the input GTF files. I do see your point about the redundancy though. My transcriptome assembly was genome guided. I'll give gtf2gtf a try to get the transcript lines added.

Thanks!

written 2.8 years ago by jnoble333

I didn't think that Cufflinks output included transcript and gene lines.

written 2.8 years ago by i.sudbery

It follows the format transcript, exon, exon, ..., transcript (repeat). It definitely isn't the same as a reference annotation. It does not include a gene field.

written 2.8 years ago by jnoble333
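
A quick way to settle which feature types a given GTF actually contains is to tally the third column. This is a small sketch under the assumption that the file is tab-separated; feature_counts is an illustrative name, not an existing tool.

```python
from collections import Counter


def feature_counts(gtf_lines):
    """Count how often each feature type (GTF column 3) occurs."""
    counts = Counter()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        cols = line.split("\t")
        if len(cols) >= 3:
            counts[cols[2]] += 1
    return counts


# Toy input following the transcript, exon, exon pattern described above.
sample = [
    "Chr01\tassembler\ttranscript\t2906\t6646\t.\t-\t.\tattrs",
    "Chr01\tassembler\texon\t2906\t3475\t.\t-\t.\tattrs",
    "Chr01\tassembler\texon\t3506\t3928\t.\t-\t.\tattrs",
]
print(feature_counts(sample))
```

Running it on an input GTF and on the combined GTF would show directly whether transcript records are present in one and absent in the other.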

This also occurs when using cuffmerge on all of the gtf files.

written 2.8 years ago by jnoble333