Question: Finding Novel Splicing Events/Transcripts Using Tophat And Cufflinks
Sahel (CANADA) wrote, 6.1 years ago:

Hi There,

I would like to identify novel splicing events in 4 human paired-end RNA-seq samples. From the literature, I gathered that TopHat and Cufflinks can do this. So I used TopHat to align the reads and Cufflinks to assemble the transcripts. Next, I used Cuffcompare to separate novel transcripts from known ones (using a gene.gtf downloaded from the UCSC Table Browser). I then kept the transcripts with class_code = "j", which according to the manual should be novel.

So now I am left with a list like this:

chr1    Cufflinks       exon    885636  886043  .       +       .       gene_id "XLOC_000025"; transcript_id "TCONS_00000033"; exon_number "1"; gene_name "ENST00000379410"; oId "CUFF.20.1"; nearest_ref "ENST00000379410"; class_code "j"; tss_id "TSS33";
chr1    Cufflinks       exon    886536  887714  .       +       .       gene_id "XLOC_000025"; transcript_id "TCONS_00000033"; exon_number "2"; gene_name "ENST00000379410"; oId "CUFF.20.1"; nearest_ref "ENST00000379410"; class_code "j"; tss_id "TSS33";
chr1    Cufflinks       exon    887947  888496  .       +       .       gene_id "XLOC_000025"; transcript_id "TCONS_00000033"; exon_number "3"; gene_name "ENST00000379410"; oId "CUFF.20.1"; nearest_ref "ENST00000379410"; class_code "j"; tss_id "TSS33";
chr1    Cufflinks       exon    888580  888747  .       +       .       gene_id "XLOC_000025"; transcript_id "TCONS_00000033"; exon_number "4"; gene_name "ENST00000379410"; oId "CUFF.20.1"; nearest_ref "ENST00000379410"; class_code "j"; tss_id "TSS33";
chr1    Cufflinks       exon    889163  889251  .       +       .       gene_id "XLOC_000025"; transcript_id "TCONS_00000033"; exon_number "5"; gene_name "ENST00000379410"; oId "CUF
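
For readers who want to work with records like these programmatically, here is a minimal Python sketch that parses the attribute column and groups exon coordinates by transcript_id, keeping only one class code. The function names (`parse_gtf_line`, `exons_by_transcript`) are made up for illustration, and the two example lines are shortened copies of the first two records above (several attributes dropped). It assumes the first eight columns contain no whitespace, which holds for the Cufflinks output shown here but is not guaranteed for every GTF file.

```python
import re
from collections import defaultdict

# Matches GTF attribute pairs such as: transcript_id "TCONS_00000033";
ATTRIBUTE_RE = re.compile(r'(\S+) "([^"]*)";')


def parse_gtf_line(line):
    """Split one GTF line into a dict; return None for comments or short lines."""
    if line.startswith("#"):
        return None
    # Assumption: the first eight columns contain no whitespace, so splitting
    # on any whitespace works for tab- and space-separated copies alike.
    fields = line.strip().split(None, 8)
    if len(fields) < 9:
        return None
    chrom, _source, feature, start, end, _score, strand, _frame, attrs = fields
    record = {
        "chrom": chrom,
        "feature": feature,
        "start": int(start),
        "end": int(end),
        "strand": strand,
    }
    # Add every key "value"; pair from the attribute column.
    record.update(ATTRIBUTE_RE.findall(attrs))
    return record


def exons_by_transcript(lines, class_code="j"):
    """Group exon coordinates by transcript_id, keeping a single class code."""
    transcripts = defaultdict(list)
    for line in lines:
        rec = parse_gtf_line(line)
        if rec is None or rec["feature"] != "exon":
            continue
        if rec.get("class_code") != class_code:
            continue
        transcripts[rec["transcript_id"]].append((rec["start"], rec["end"]))
    return {tid: sorted(exons) for tid, exons in transcripts.items()}


# Shortened copies of the first two records in the post (attributes trimmed).
example = [
    'chr1\tCufflinks\texon\t885636\t886043\t.\t+\t.\tgene_id "XLOC_000025"; '
    'transcript_id "TCONS_00000033"; exon_number "1"; class_code "j";',
    'chr1\tCufflinks\texon\t886536\t887714\t.\t+\t.\tgene_id "XLOC_000025"; '
    'transcript_id "TCONS_00000033"; exon_number "2"; class_code "j";',
]
grouped = exons_by_transcript(example)
print(grouped)
```

Each transcript then becomes a sorted list of (start, end) exon pairs, which is a convenient starting point for extracting sequences or comparing junctions.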

But I do not know how to interpret them. How can I get the sequences of the novel transcripts? Is there a way to figure out what produced a novel junction, for example whether it is an insertion or a deletion, and whether it causes a frameshift?

This is my first time doing such analysis, any help would be greatly appreciated, :-)

Sahel

tophat cufflinks • 4.1k views
modified 4.8 years ago by Ann • written 6.1 years ago by Sahel
Mikael Huss (Stockholm) wrote, 6.1 years ago:

You can get the sequences of the novel transcripts (or of the exons, to be more precise) with a tool like BEDTools ("bedtools getfasta"). It accepts BED, GFF and other formats, and might work straight away on your example, which looks like GTF (a GFF-style format).
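
Since exon-by-exon extraction gives one sequence per interval, the exons still have to be joined to obtain a full transcript sequence. As a rough sketch of that step, assuming the genome is already loaded into a plain dict of chromosome name to sequence string (a real genome would normally be read from a FASTA file with an indexed reader), something like the following could work. `spliced_sequence` and the toy genome are invented for illustration only.

```python
# Translation table for complementing DNA, including N and lower case.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def spliced_sequence(genome, chrom, exons, strand):
    """Join exon sequences into one transcript sequence.

    Exon coordinates are 1-based and inclusive, as in GTF. For minus-strand
    transcripts the joined sequence is reverse-complemented.
    """
    chrom_seq = genome[chrom]
    joined = "".join(chrom_seq[start - 1:end] for start, end in sorted(exons))
    return reverse_complement(joined) if strand == "-" else joined


# Toy genome, not real sequence: chromosome name -> sequence string.
toy_genome = {"chrT": "AAACCCGGGTTTACGT"}
toy_exons = [(1, 3), (7, 9)]  # hypothetical two-exon transcript

print(spliced_sequence(toy_genome, "chrT", toy_exons, "+"))  # AAAGGG
print(spliced_sequence(toy_genome, "chrT", toy_exons, "-"))  # CCCTTT
```

The only subtle points are the coordinate conversion (GTF is 1-based inclusive, Python slices are 0-based half-open) and the strand handling.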

For the other questions, I don't have as straightforward a reply. I would import the regions to some genome browser like UCSC Genome Browser or IGV, compare them to existing annotation and take it from there.
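
One way to make the comparison with existing annotation more systematic is to compare introns (splice junctions) rather than exons. The sketch below is only an illustration under simple assumptions: both transcripts lie on the same chromosome and strand, exons are given as 1-based inclusive (start, end) pairs, and all coordinates are toy values rather than real annotation. The names `introns`, `classify_novel_junctions` and `preserves_frame` are hypothetical helpers, not part of any tool mentioned here.

```python
def introns(exons):
    """Return the introns (1-based, inclusive) implied by consecutive exons."""
    ordered = sorted(exons)
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]


def classify_novel_junctions(novel_exons, reference_exons):
    """Label each intron of the novel transcript relative to the reference."""
    ref_introns = set(introns(reference_exons))
    ref_starts = {start for start, _ in ref_introns}
    ref_ends = {end for _, end in ref_introns}
    labels = {}
    for start, end in introns(novel_exons):
        if (start, end) in ref_introns:
            labels[(start, end)] = "known junction"
        elif start in ref_starts and end in ref_ends:
            # Typical of exon skipping: both sites known, pairing is new.
            labels[(start, end)] = "novel pairing of known splice sites"
        elif start in ref_starts or end in ref_ends:
            labels[(start, end)] = "one novel splice site"
        else:
            labels[(start, end)] = "both splice sites novel"
    return labels


def preserves_frame(length_change):
    """True if a length change (in nt) inside coding sequence keeps the frame."""
    return length_change % 3 == 0


# Toy coordinates only, not real annotation.
reference = [(100, 200), (300, 400), (500, 600)]
skipped_exon = [(100, 200), (500, 600)]              # middle exon skipped
shifted_site = [(100, 200), (300, 410), (500, 600)]  # exon 2 extended by 10 nt

print(classify_novel_junctions(skipped_exon, reference))
print(classify_novel_junctions(shifted_site, reference))
print(preserves_frame(10))
```

The frame check only makes sense when the changed region lies inside the coding sequence, which requires CDS annotation for the reference transcript; a visual check in a genome browser is still a good idea for any junction flagged as novel.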
