Problems about Cufflinks results
9.2 years ago
512788522 ▴ 20

Hi!

I have used TopHat and Cufflinks to analyse RNA-seq data. I ran the following commands:

tophat2 -p 5 -o ./tophat  --library-type fr-firststrand -G ./genes.gtf ./ucsc.hg19 \
    2014-2194_141118_SN484_0322_AC5K6UACXX_6_1.fq.gz \
    2014-2194_141118_SN484_0322_AC5K6UACXX_6_2.fq.gz
cufflinks -p 5 -u -g ./genes.gtf -o ./cufflinks ./tophat/accepted_hits.bam

Cufflinks generated a GTF file, transcripts.gtf, which looks like this:

chr1    Cufflinks    transcript    34611    36081    1    -    .    gene_id "FAM138A"; transcript_id "NR_026818_1"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000"; full_read_support "no";
chr1    Cufflinks    exon    34611    35174    1    -    .    gene_id "FAM138A"; transcript_id "NR_026818_1"; exon_number "1"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
chr1    Cufflinks    exon    35277    35481    1    -    .    gene_id "FAM138A"; transcript_id "NR_026818_1"; exon_number "2"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
chr1    Cufflinks    exon    35721    36081    1    -    .    gene_id "FAM138A"; transcript_id "NR_026818_1"; exon_number "3"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
chr1    Cufflinks    transcript    134773    140566    1000    -    .    gene_id "CUFF.31"; transcript_id "NR_039983"; FPKM "0.7545033517"; frac "1.000000"; conf_lo "0.651984"; conf_hi "0.857022"; cov "5.628816"; full_read_support "yes";
chr1    Cufflinks    exon    134773    139696    1000    -    .    gene_id "CUFF.31"; transcript_id "NR_039983"; exon_number "1"; FPKM "0.7545033517"; frac "1.000000"; conf_lo "0.651984"; conf_hi "0.857022"; cov "5.628816";
chr1    Cufflinks    exon    139790    139847    1000    -    .    gene_id "CUFF.31"; transcript_id "NR_039983"; exon_number "2"; FPKM "0.7545033517"; frac "1.000000"; conf_lo "0.651984"; conf_hi "0.857022"; cov "5.628816";
chr1    Cufflinks    exon    140075    140566    1000    -    .    gene_id "CUFF.31"; transcript_id "NR_039983"; exon_number "3"; FPKM "0.7545033517"; frac "1.000000"; conf_lo "0.651984"; conf_hi "0.857022"; cov "5.628816";
chr1    Cufflinks    transcript    732240    735831    1000    -    .    gene_id "CUFF.53"; transcript_id "CUFF.53.1"; FPKM "0.2604531606"; frac "1.000000"; conf_lo "0.186299"; conf_hi "0.334607"; cov "2.247511"; full_read_support "yes";
chr1    Cufflinks    exon    732240    735831    1000    -    .    gene_id "CUFF.53"; transcript_id "CUFF.53.1"; exon_number "1"; FPKM "0.2604531606"; frac "1.000000"; conf_lo "0.186299"; conf_hi "0.334607"; cov "2.247511";
chr1    Cufflinks    transcript    749660    751452    1000    -    .    gene_id "CUFF.6"; transcript_id "CUFF.6.1"; FPKM "0.2508481299"; frac "1.000000"; conf_lo "0.145715"; conf_hi "0.355981"; cov "2.160173"; full_read_support "yes";
chr1    Cufflinks    exon    749660    751452    1000    -    .    gene_id "CUFF.6"; transcript_id "CUFF.6.1"; exon_number "1"; FPKM "0.2508481299"; frac "1.000000"; conf_lo "0.145715"; conf_hi "0.355981"; cov "2.160173";
chr1    Cufflinks    transcript    751542    752795    1000    -    .    gene_id "CUFF.8"; transcript_id "CUFF.8.1"; FPKM "0.4704529712"; frac "0.426471"; conf_lo "0.290214"; conf_hi "0.650692"; cov "4.324710"; full_read_support "yes";
chr1    Cufflinks    exon    751542    752795    1000    -    .    gene_id "CUFF.8"; transcript_id "CUFF.8.1"; exon_number "1"; FPKM "0.4704529712"; frac "0.426471"; conf_lo "0.290214"; conf_hi "0.650692"; cov "4.324710";
chr1    Cufflinks    transcript    755134    756272    1000    -    .    gene_id "CUFF.9"; transcript_id "CUFF.9.1"; FPKM "0.3258973591"; frac "0.264706"; conf_lo "0.168058"; conf_hi "0.483737"; cov "2.995861"; full_read_support "yes";
chr1    Cufflinks    exon    755134    756272    1000    -    .    gene_id "CUFF.9"; transcript_id "CUFF.9.1"; exon_number "1"; FPKM "0.3258973591"; frac "0.264706"; conf_lo "0.168058"; conf_hi "0.483737"; cov "2.995861";

In transcripts.gtf, some lines have a gene_id of the form "CUFF.*" but a transcript_id taken from the annotation file (i.e. genes.gtf). In other lines both the gene_id and the transcript_id have the "CUFF.*" form, and in others both are the same as in the annotation file. Why? What does "CUFF.*" mean, and how should I interpret the Cufflinks results correctly?
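To see how often each of these three combinations occurs, the attribute column can be parsed and each line labelled by which IDs start with "CUFF.". The following is only a minimal sketch: the function names and category labels are made up for illustration, and it assumes that every line carries both gene_id and transcript_id and that no column before the attributes contains whitespace.

```python
import re
from collections import Counter

# Matches attribute pairs such as: gene_id "CUFF.31";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Turn the 9th GTF column into a dict of attribute name -> value."""
    return dict(ATTR_RE.findall(field))


def classify(line):
    """Label one GTF line by which of its IDs carry the 'CUFF.' prefix.

    The label names are hypothetical; they are not Cufflinks terminology.
    """
    # Split into 9 columns on any whitespace; the attribute column stays whole.
    attrs = parse_attributes(line.split(None, 8)[8])
    gene_is_cuff = attrs["gene_id"].startswith("CUFF.")
    tx_is_cuff = attrs["transcript_id"].startswith("CUFF.")
    if gene_is_cuff and tx_is_cuff:
        return "cuff_gene/cuff_transcript"
    if gene_is_cuff:
        return "cuff_gene/annotation_transcript"
    if tx_is_cuff:
        return "annotation_gene/cuff_transcript"
    return "annotation_gene/annotation_transcript"


# Three transcript lines from the output above, with the attributes shortened.
SAMPLE_LINES = [
    'chr1 Cufflinks transcript 34611 36081 1 - . gene_id "FAM138A"; transcript_id "NR_026818_1";',
    'chr1 Cufflinks transcript 134773 140566 1000 - . gene_id "CUFF.31"; transcript_id "NR_039983";',
    'chr1 Cufflinks transcript 732240 735831 1000 - . gene_id "CUFF.53"; transcript_id "CUFF.53.1";',
]

counts = Counter(classify(line) for line in SAMPLE_LINES)
for label, n in sorted(counts.items()):
    print(label, n)
```

For a real run, the sample list could be replaced by the lines of ./cufflinks/transcripts.gtf, skipping any blank or comment lines first.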

In addition, I noticed that when the gene_id was "CUFF.*" and the transcript_id was the same as in the annotation file, as in

chr1    Cufflinks    exon    134773    139696    1000    -    .    gene_id "CUFF.31"; transcript_id "NR_039983"; exon_number "1"; FPKM "0.7545033517"; frac "1.000000"; conf_lo "0.651984"; conf_hi "0.857022"; cov "5.628816";

the start and end positions (i.e. 134773 and 139693, respectively) were the same as in the annotation file.

Thanks!

RNA-seq • 2.7k views
9.2 years ago
Vishaka Datta ▴ 100

I'm fairly new to Cufflinks myself. From what I've seen, a newly discovered transcript or exon ends up being tagged as CUFF.*

In the example you gave:

chr1    Cufflinks    exon    134773    139696    1000    -    .    gene_id "CUFF.31"; transcript_id "NR_039983"; exon_number "1"; FPKM "0.7545033517"; frac "1.000000"; conf_lo "0.651984"; conf_hi "0.857022"; cov "5.628816"

this corresponds to a new exon, whose boundary (134774-139696) differs from the exon in the annotation (134774-139693).
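A boundary comparison like this can be made explicit by subtracting the annotated coordinates from the assembled ones. This is only a rough sketch: `boundary_offsets` is a hypothetical helper, the assembled coordinates are copied from the transcripts.gtf line above, and the annotated coordinates are the ones quoted in this answer, which have not been checked against genes.gtf.

```python
def boundary_offsets(assembled, annotated):
    """Return (start_offset, end_offset) of an assembled exon
    relative to an annotated exon; (0, 0) means identical boundaries."""
    return (assembled[0] - annotated[0], assembled[1] - annotated[1])


# Exon 1 of NR_039983 as reported in the transcripts.gtf line above.
assembled_exon = (134773, 139696)
# Annotation coordinates as quoted in this answer (assumption: not verified
# against the actual genes.gtf file).
annotated_exon = (134774, 139693)

start_offset, end_offset = boundary_offsets(assembled_exon, annotated_exon)
if (start_offset, end_offset) == (0, 0):
    print("exon boundaries match the annotation")
else:
    print(f"start differs by {start_offset}, end differs by {end_offset}")
```

Any non-zero offset means the assembled exon is not identical to the annotated one, at least for the numbers quoted here.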
