Finding Novel Splicing Events/Transcripts Using Tophat And Cufflinks
11.1 years ago
Sahel ▴ 310

Hi There,

I would like to identify novel splicing events in 4 human paired-end RNA-seq samples. From the literature I gathered that TopHat and Cufflinks can do this. So I used TopHat to align the reads and Cufflinks to assemble the transcripts. Next I used Cuffcompare to separate novel transcripts from known ones (using gene.gtf downloaded from the UCSC Table Browser). I then kept the ones with class_code = "j", which according to the manual are potentially novel isoforms that share at least one splice junction with a reference transcript.

So now I am left with a list like this:

chr1    Cufflinks       exon    885636  886043  .       +       .       gene_id "XLOC_000025"; transcript_id "TCONS_00000033"; exon_number "1"; gene_name "ENST00000379410"; oId "CUFF.20.1"; nearest_ref "ENST00000379410"; class_code "j"; tss_id "TSS33";
chr1    Cufflinks       exon    886536  887714  .       +       .       gene_id "XLOC_000025"; transcript_id "TCONS_00000033"; exon_number "2"; gene_name "ENST00000379410"; oId "CUFF.20.1"; nearest_ref "ENST00000379410"; class_code "j"; tss_id "TSS33";
chr1    Cufflinks       exon    887947  888496  .       +       .       gene_id "XLOC_000025"; transcript_id "TCONS_00000033"; exon_number "3"; gene_name "ENST00000379410"; oId "CUFF.20.1"; nearest_ref "ENST00000379410"; class_code "j"; tss_id "TSS33";
chr1    Cufflinks       exon    888580  888747  .       +       .       gene_id "XLOC_000025"; transcript_id "TCONS_00000033"; exon_number "4"; gene_name "ENST00000379410"; oId "CUFF.20.1"; nearest_ref "ENST00000379410"; class_code "j"; tss_id "TSS33";
chr1    Cufflinks       exon    889163  889251  .       +       .       gene_id "XLOC_000025"; transcript_id "TCONS_00000033"; exon_number "5"; gene_name "ENST00000379410"; oId "CUF

But I do not know how to interpret them. How can I get the sequences of the novel transcripts? Is there a way to figure out what produced a novel junction, for example whether it is an insertion or a deletion, and whether it causes a frameshift?

This is my first time doing this kind of analysis, so any help would be greatly appreciated :-)

Sahel

cufflinks tophat • 5.2k views
11.1 years ago

You can get the sequences of the novel transcripts (or of their exons, to be more precise) with a tool like BEDTools ("bedtools getfasta"). It accepts BED, GFF/GTF and VCF input and might work straight away on your example, which looks like GTF.
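Since extracting sequences from an exon-level file gives one sequence per exon line, the exons still have to be joined per transcript_id to obtain the spliced transcript. Here is a minimal Python sketch of that joining step. It assumes tab-separated GTF lines and a genome already loaded as a dict of chromosome name to sequence; the toy `genome`, the `demo_lines` records and the function names are made up for illustration and are not part of BEDTools or Cufflinks. (The gffread utility that ships with Cufflinks can also write spliced transcripts directly, via its -w option together with -g for the genome FASTA.)

```python
import re
from collections import defaultdict

# Translation table for complementing DNA (used for minus-strand transcripts)
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_exons(gtf_lines):
    """Group exon records by transcript_id -> [(chrom, start, end, strand), ...]."""
    exons = defaultdict(list)
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # skip comments, malformed lines and non-exon features
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        exons[match.group(1)].append(
            (fields[0], int(fields[3]), int(fields[4]), fields[6])
        )
    return exons


def spliced_sequence(exon_list, genome):
    """Join exon sequences in genomic order; reverse-complement on the minus strand."""
    ordered = sorted(exon_list, key=lambda exon: exon[1])
    # GTF coordinates are 1-based inclusive; Python slices are 0-based half-open
    seq = "".join(genome[chrom][start - 1:end] for chrom, start, end, _ in ordered)
    if ordered and ordered[0][3] == "-":
        seq = seq.translate(_COMPLEMENT)[::-1]
    return seq


# Toy data, invented for the example (not real genome sequence)
genome = {"chrT": "AAACCCGGGTTTAAACCC"}
demo_lines = [
    "\t".join(["chrT", "Cufflinks", "exon", "1", "3", ".", "+", ".",
               'gene_id "G1"; transcript_id "T1";']),
    "\t".join(["chrT", "Cufflinks", "exon", "7", "9", ".", "+", ".",
               'gene_id "G1"; transcript_id "T1";']),
    "\t".join(["chrT", "Cufflinks", "exon", "1", "3", ".", "-", ".",
               'gene_id "G2"; transcript_id "T2";']),
    "\t".join(["chrT", "Cufflinks", "exon", "7", "9", ".", "-", ".",
               'gene_id "G2"; transcript_id "T2";']),
]
exons_by_transcript = parse_exons(demo_lines)
for transcript_id, exon_list in exons_by_transcript.items():
    print(transcript_id, spliced_sequence(exon_list, genome))
```

For real data you would load the reference FASTA with whatever reader you prefer and pass the class_code "j" lines from the Cuffcompare output to `parse_exons`.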

For the other questions, I don't have as straightforward a reply. I would import the regions into a genome browser such as the UCSC Genome Browser or IGV, compare them with the existing annotation, and take it from there.
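The comparison against the annotation can also be done programmatically, as a complement to looking at it in a browser. The sketch below derives intron (junction) coordinates from exon coordinates and reports the junctions of an assembled transcript that are absent from a reference transcript. The assembled exons are the five from the question; the reference exons are HYPOTHETICAL, invented only to show the comparison, and are not the real ENST00000379410 coordinates. The function names are likewise made up for this example.

```python
def introns_from_exons(exons):
    """Return intron intervals (1-based inclusive) between consecutive exons."""
    ordered = sorted(exons)
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]


def novel_junctions(assembled_exons, reference_exons):
    """Introns present in the assembled transcript but not in the reference."""
    reference_introns = set(introns_from_exons(reference_exons))
    return [
        intron
        for intron in introns_from_exons(assembled_exons)
        if intron not in reference_introns
    ]


# Exons of TCONS_00000033 as listed in the question
assembled = [
    (885636, 886043),
    (886536, 887714),
    (887947, 888496),
    (888580, 888747),
    (889163, 889251),
]

# HYPOTHETICAL reference exons (exon 4 ends earlier); not real annotation
reference = [
    (885636, 886043),
    (886536, 887714),
    (887947, 888496),
    (888580, 888700),
    (889163, 889251),
]

novel = novel_junctions(assembled, reference)
print(novel)  # [(888748, 889162)]
```

Once a differing junction is located, comparing the lengths of the affected exons tells you whether the reading frame is preserved: if the change falls inside the coding sequence and the length difference is not a multiple of 3, the downstream frame shifts.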
