Generating Gene List From Cufflinks Transcripts
12.3 years ago
Abhi ★ 1.6k

I want to convert Cufflinks output from the transcript level to the gene level, keeping the exon info, so that I can do gene expression analysis.

Is there an existing method to collapse Cufflinks transcripts into genes? I thought I would check before writing code.

Thanks! -Abhi

cufflinks gene • 7.1k views

The genes.fpkm_tracking file should have gene-level information.


@RM: I probably should have mentioned this, but the information in the genes.fpkm_tracking file is not sufficient. It contains the gene start and end coordinates, but not the exon info or the strand of the gene.


@Abhi: can you give an example of the input and the desired output so that the question is clearer?


You should also indicate how you ran Cufflinks. In particular, did you supply a GTF file of known transcripts using the -G or -g option?


I did supply my GTF file of known transcripts, but I am not sure that makes any difference to the question. What I am looking for is a way to collapse the Cufflinks transcripts into genes while retaining the exon and strand info. Thanks!

12.3 years ago

If you used the -G option and a reference GTF file with Cufflinks, then you should have an expression value for each transcript in this GTF file. The original GTF file should contain transcript-to-gene relationships allowing you to merge multiple transcripts to a single gene. The GTF file should also contain strand info. You may also be able to use the GTF file that is generated during the Cufflinks run and stored in the output directory. Since Cufflinks does merge from the transcript level to the gene level for you, perhaps you can combine the genes.fpkm_tracking file with the GTF to get a single file with expression, strand, and exon level info.
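The join described above could be sketched roughly as follows. This is only an illustration, not something Cufflinks ships: the function names `read_gene_structure` and `annotate_gene_fpkm` are made up here, the files are assumed to be tab-separated as Cufflinks writes them, and every GTF record is assumed to carry a `gene_id` attribute matching the `gene_id` column of genes.fpkm_tracking.

```python
import csv
import re
from collections import defaultdict

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def read_gene_structure(gtf_lines):
    """Collect strand and the set of exon (start, end) pairs per gene_id.

    gtf_lines is any iterable of tab-separated GTF lines, e.g. an open
    transcripts.gtf file (hypothetical helper, not part of Cufflinks).
    """
    genes = defaultdict(lambda: {"chrom": None, "strand": None, "exons": set()})
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip malformed records
        chrom, _source, feature, start, end, _score, strand, _frame, attrs = fields[:9]
        attr = dict(ATTR_RE.findall(attrs))
        if "gene_id" not in attr:
            continue
        gene = genes[attr["gene_id"]]
        gene["chrom"] = chrom
        gene["strand"] = strand
        if feature == "exon":
            # A set drops exons shared verbatim by several transcripts.
            gene["exons"].add((int(start), int(end)))
    return genes


def annotate_gene_fpkm(tracking_lines, genes):
    """Attach strand and exon coordinates to each genes.fpkm_tracking row."""
    rows = []
    for rec in csv.DictReader(tracking_lines, delimiter="\t"):
        info = genes.get(rec["gene_id"])
        rows.append({
            "gene_id": rec["gene_id"],
            "locus": rec["locus"],
            "FPKM": float(rec["FPKM"]),
            "strand": info["strand"] if info else None,
            "exons": sorted(info["exons"]) if info else [],
        })
    return rows
```

With real files this would be called as `annotate_gene_fpkm(open("genes.fpkm_tracking", newline=""), read_gene_structure(open("transcripts.gtf")))`. The result has one row per gene, with the exons listed as distinct coordinate pairs; whether partially overlapping exons should be kept separate or merged is a separate decision.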

As for exon-level info, it is not clear what you want. Each gene may have multiple transcripts and each transcript has multiple exons. The exons from each transcript may be unique to that transcript, redundant with an exon in one or more additional transcripts, or partially overlapping with an exon in one or more additional transcripts. If you merge to the gene level, what do you mean by maintaining exon-level info? Perhaps you can further describe exactly what you want your output file to look like. Do you want one row per exon? Or one row per gene? If so, how would exon information be represented in this file?
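To make the ambiguity concrete, one possible gene-level representation (an assumption for discussion, not necessarily what is wanted) is the union of all exons across a gene's transcripts, with overlapping exons merged into single intervals. A minimal sketch, where `merge_exons` is a hypothetical helper and coordinates are 1-based inclusive as in GTF:

```python
def merge_exons(exons):
    """Merge overlapping (start, end) exon intervals into their union.

    Intervals that merely abut (e.g. 1-5 and 6-10) are left separate;
    only intervals sharing at least one base are merged.
    """
    merged = []
    for start, end in sorted(exons):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it if needed.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Exons of the two ENSG00000236601 transcripts from transcripts.gtf.
exons = [
    (453633, 454166), (460408, 460480),  # ENST00000450983
    (453827, 454166), (460380, 460465),  # ENST00000412666
]
print(merge_exons(exons))
# [(453633, 454166), (460380, 460480)]
```

Note that this union discards which transcript each exon came from, so per-transcript FPKM values can no longer be assigned to individual merged exons.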

For purposes of discussion here is some sample Cufflinks output:


genes.fpkm_tracking

tracking_id class_code  nearest_ref_id  gene_id gene_short_name tss_id  locus   length  coverage    FPKM    FPKM_conf_lo    FPKM_conf_hi    FPKM_status
ENSG00000236601 -   -   ENSG00000236601 ENSG00000236601 -   1:453632-460480 -   -   0   0   0   OK
ENSG00000224813 -   -   ENSG00000224813 ENSG00000224813 -   1:329783-453948 -   -   0.00976477  0   0.0663024   OK

isoforms.fpkm_tracking

tracking_id class_code  nearest_ref_id  gene_id gene_short_name tss_id  locus   length  coverage    FPKM    FPKM_conf_lo    FPKM_conf_hi    FPKM_status
ENST00000450983 -       -       ENSG00000236601 ENSG00000236601 -       1:453632-460480 607     0       0       0       0       OK
ENST00000412666 -       -       ENSG00000236601 ENSG00000236601 -       1:453826-460465 426     0       0       0       0       OK
ENST00000431812 -       -       ENSG00000224813 ENSG00000224813 -       1:329783-334271 336     0.190769        0.00976477      0       0.0292943       OK
ENST00000445840 -       -       ENSG00000224813 ENSG00000224813 -       1:334125-334305 180     1.02218e-07     5.23219e-09     0       0.0413242       OK
ENST00000455207 -       -       ENSG00000224813 ENSG00000224813 -       1:334128-446155 413     3.98904e-15     2.04184e-16     0       0.0167376       OK
ENST00000455464 -       -       ENSG00000224813 ENSG00000224813 -       1:334139-342806 573     1.4205e-13      7.27101e-15     0       0.028762        OK
ENST00000440163 -       -       ENSG00000224813 ENSG00000224813 -       1:439364-453722 462     0       0       0       0       OK
ENST00000453935 -       -       ENSG00000224813 ENSG00000224813 -       1:450886-453942 498     0       0       0       0       OK
ENST00000431321 -       -       ENSG00000224813 ENSG00000224813 -       1:453216-453948 406     0       0       0       0       OK

transcripts.gtf

1   Cufflinks   transcript  453633  460480  1   -   .   gene_id "ENSG00000236601"; transcript_id "ENST00000450983"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1   Cufflinks   exon    453633  454166  1   -   .   gene_id "ENSG00000236601"; transcript_id "ENST00000450983"; exon_number "1"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1   Cufflinks   exon    460408  460480  1   -   .   gene_id "ENSG00000236601"; transcript_id "ENST00000450983"; exon_number "2"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1   Cufflinks   transcript  453827  460465  1   -   .   gene_id "ENSG00000236601"; transcript_id "ENST00000412666"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1   Cufflinks   exon    453827  454166  1   -   .   gene_id "ENSG00000236601"; transcript_id "ENST00000412666"; exon_number "1"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1   Cufflinks   exon    460380  460465  1   -   .   gene_id "ENSG00000236601"; transcript_id "ENST00000412666"; exon_number "2"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1   Cufflinks   transcript  329784  334271  1000    +   .   gene_id "ENSG00000224813"; transcript_id "ENST00000431812"; FPKM "0.0097647655"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0.029294"; cov "0.190769";
1   Cufflinks   exon    329784  329976  1000    +   .   gene_id "ENSG00000224813"; transcript_id "ENST00000431812"; exon_number "1"; FPKM "0.0097647655"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0.029294"; cov "0.190769";
1   Cufflinks   exon    334129  334271  1000    +   .   gene_id "ENSG00000224813"; transcript_id "ENST00000431812"; exon_number "2"; FPKM "0.0097647655"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0.029294"; cov "0.190769";
1   Cufflinks   transcript  334126  334305  1   +   .   gene_id "ENSG00000224813"; transcript_id "ENST00000445840"; FPKM "0.0000000052"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.041324"; cov "0.000000";
1   Cufflinks   exon    334126  334305  1   +   .   gene_id "ENSG00000224813"; transcript_id "ENST00000445840"; exon_number "1"; FPKM "0.0000000052"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.041324"; cov "0.000000";
1   Cufflinks   transcript  334129  446155  1   +   .   gene_id "ENSG00000224813"; transcript_id "ENST00000455207"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.016738"; cov "0.000000";
1   Cufflinks   exon    334129  334297  1   +   .   gene_id "ENSG00000224813"; transcript_id "ENST00000455207"; exon_number "1"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.016738"; cov "0.000000";
1   Cufflinks   exon    439467  439568  1   +   .   gene_id "ENSG00000224813"; transcript_id "ENST00000455207"; exon_number "2"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.016738"; cov "0.000000";
1   Cufflinks   exon    446014  446155  1   +   .   gene_id "ENSG00000224813"; transcript_id "ENST00000455207"; exon_number "3"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.016738"; cov "0.000000";
1   Cufflinks   transcript  334140  342806  1   +   .   gene_id "ENSG00000224813"; transcript_id "ENST00000455464"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.028762"; cov "0.000000";
1   Cufflinks   exon    334140  334297  1   +   .   gene_id "ENSG00000224813"; transcript_id "ENST00000455464"; exon_number "1"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.028762"; cov "0.000000";
1   Cufflinks   exon    342392  342806  1   +   .   gene_id "ENSG00000224813"; transcript_id "ENST00000455464"; exon_number "2"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.028762"; cov "0.000000";
1   Cufflinks   transcript  439365  453722  1   +   .   gene_id "ENSG00000224813"; transcript_id "ENST00000440163"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1   Cufflinks   exon    439365  439568  1   +   .   gene_id "ENSG00000224813"; transcript_id "ENST00000440163"; exon_number "1"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1   Cufflinks   exon    446014  446193  1   +   .   gene_id "ENSG00000224813"; transcript_id "ENST00000440163"; exon_number "2"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1   Cufflinks   exon    453645  453722  1   +   .   gene_id "ENSG00000224813"; transcript_id "ENST00000440163"; exon_number "3"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1   Cufflinks   transcript  450887  453942  1   +   .   gene_id "ENSG00000224813"; transcript_id "ENST00000453935"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1   Cufflinks   exon    450887  451086  1   +   .   gene_id "ENSG00000224813"; transcript_id "ENST00000453935"; exon_number "1"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1   Cufflinks   exon    453645  453942  1   +   .   gene_id "ENSG00000224813"; transcript_id "ENST00000453935"; exon_number "2"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1   Cufflinks   transcript  453217  453948  1   +   .   gene_id "ENSG00000224813"; transcript_id "ENST00000431321"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1   Cufflinks   exon    453217  453318  1   +   .   gene_id "ENSG00000224813"; transcript_id "ENST00000431321"; exon_number "1"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1   Cufflinks   exon    453645  453948  1   +   .   gene_id "ENSG00000224813"; transcript_id "ENST00000431321"; exon_number "2"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";