Cuffcompare combined.gtf output contains only exons
7.7 years ago
jnoble333 ▴ 20

Hello,

I have GTF files from a PASA run for 96 samples. My intention was to run cuffcompare on all 96 to produce a master GTF file for all samples. Instead, when I run it, my results contain only transfrags annotated as "exon". My organism is P. trichocarpa. I am calling cuffcompare via:

cuffcompare -G -i glist.txt

I have also tried using the reference gtf for my organism but it does not affect the output.

Also, the combined GTF is missing many of the sites that are found in 2 or more individuals. For instance, all of my input GTFs contain:

Chr01 assembler transcript 1660 2502 . - . gene_id "PASA_cluster_1"; transcript_id "align_id:296215|asmbl_1";
Chr01 assembler exon 1660 2502 . - . gene_id "PASA_cluster_1"; transcript_id "align_id:296215|asmbl_1";

Chr01 assembler transcript 2906 6646 . - . gene_id "PASA_cluster_2"; transcript_id "align_id:296216|asmbl_2";
Chr01 assembler exon 2906 3475 . - . gene_id "PASA_cluster_2"; transcript_id "align_id:296216|asmbl_2";
Chr01 assembler exon 3506 3928 . - . gene_id "PASA_cluster_2"; transcript_id "align_id:296216|asmbl_2";
Chr01 assembler exon 6501 6646 . - . gene_id "PASA_cluster_2"; transcript_id "align_id:296216|asmbl_2";

Yet these sites are not present in the combined GTF. I have also tried running the input GTFs individually against the reference GTF; the combined GTF still contains only exon lines and these sites are still missing (they are present in the reference genome).

My combined.gtf looks like:

Chr01 assembler exon 8371 9365 . + . gene_id "XLOC_000001"; transcript_id "brep1_00000001"; exon_number "1"; oId "align_id:291588|asmbl_7"; tss_id "TSS1";
Chr01 assembler exon 9150 9425 . + . gene_id "XLOC_000001"; transcript_id "brep1_00036183"; exon_number "1"; oId "align_id:308505|asmbl_7"; tss_id "TSS2";

Any ideas as to why I'm only getting exons in the combined output? Thanks.

RNA-Seq • 2.3k views

What else were you expecting the GTF to contain? The cufflinks suite doesn't output gene and transcript lines, but they are generally redundant, as they are simply the union of all the exon entries linked to the same transcript or gene. If you want to regenerate these, the tool gtf2gtf from the CGAT package should be able to do the job. De novo transcript assembly tools assemble transcript structures and do not really have any knowledge of ORFs etc., so you won't see CDS or UTR entries either.
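To illustrate the "union of exon entries" point, here is a rough Python sketch that rebuilds a transcript line for each transcript_id from an exon-only GTF. It is not what gtf2gtf does internally; the function name add_transcript_lines is made up for this example, and it assumes tab-separated GTF with nine columns and a transcript_id that is unique across chromosomes.

```python
import re
from collections import OrderedDict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')
GENE_ID = re.compile(r'gene_id "([^"]+)"')


def add_transcript_lines(gtf_lines):
    """Return GTF lines with a 'transcript' record placed before each
    transcript's exon records (hypothetical helper, illustration only)."""
    groups = OrderedDict()  # transcript_id -> list of exon field lists
    for line in gtf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Only exon records are used; anything else is skipped.
        if len(fields) != 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        groups.setdefault(match.group(1), []).append(fields)

    out = []
    for tid, exons in groups.items():
        first = exons[0]
        # The transcript span is the smallest start and largest end of its exons.
        start = min(int(e[3]) for e in exons)
        end = max(int(e[4]) for e in exons)
        gene = GENE_ID.search(first[8])
        attrs = ""
        if gene is not None:
            attrs = 'gene_id "%s"; ' % gene.group(1)
        attrs += 'transcript_id "%s";' % tid
        out.append("\t".join([first[0], first[1], "transcript",
                              str(start), str(end), ".", first[6], ".", attrs]))
        out.extend("\t".join(e) for e in exons)
    return out
```

To use it, something like `add_transcript_lines(open("combined.gtf"))` and writing the returned lines to a new file should work, with combined.gtf standing in for whatever your cuffcompare output is called. Extra attributes such as oId and tss_id stay on the exon lines only.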


I was expecting the merge and compare output GTFs to follow the same format as a cufflinks output GTF, similar to the input GTF files. I do see your point about the redundancy though. My transcriptome assembly was genome guided. I'll give gtf2gtf a try to get the transcript lines added.

Thanks!


I didn't think that cufflinks output did include transcript and gene lines.


It follows the format of transcript, exon, exon.... transcript (repeat). It definitely isn't the same as a reference annotation. It does not include a gene field.
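A quick way to confirm which record types a given GTF actually contains is to tally column 3. This is only a minimal sketch: count_feature_types is a name invented here, and it assumes tab-separated GTF.

```python
from collections import Counter


def count_feature_types(gtf_lines):
    """Tally the feature column (column 3) of a tab-separated GTF."""
    counts = Counter()
    for line in gtf_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts
```

Running it on one input file and on the combined file (for example `count_feature_types(open("sample1.gtf"))`, where sample1.gtf is a placeholder name) would show directly whether transcript records are present in one and absent in the other.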


This also occurs when using cuffmerge on all of the gtf files.
