Question: tophat GTF 2.2 format question
Ann (Concord NC USA) wrote, 4.7 years ago:

I want to run tophat using the -G option:

"Supply TopHat with a set of gene model annotations and/or known transcripts, as a GTF 2.2 or GFF3 formatted file. If this option is provided, TopHat will first extract the transcript sequences and use Bowtie to align reads to this virtual transcriptome first."

This sounds great.

According to the GTF 2.2 spec (http://mblab.wustl.edu/GTF22.html), a GTF file can use exon, 3UTR, or 5UTR features to represent exons. It also describes start_codon and CDS features. There are also gene_id and transcript_id name-value pairs in the attributes field.

I don't think TopHat cares about translations, so I'm guessing it will work fine if I give it a GTF file with exon features only. It probably doesn't need the gene_id attribute either.

Does anyone know the minimal data tophat needs to align reads onto a virtual transcriptome?

Would this work?

chr1 BLAH  exon         150   200   .   +   .  transcript_id "X";
chr1 BLAH  exon         300   401   .   +   .  transcript_id "X";
chr1 BLAH  exon         501   650   .   +   .  transcript_id "X";
chr1 BLAH  exon         700   800   .   +   .  transcript_id "X";
chr1 BLAH  exon         900  1000   .   +   .  transcript_id "X";

Also, how would I test this?

Does the tophat code contain unit tests I could use to make sure a given GTF file is correctly read?
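
Independently of whatever tests TopHat itself ships, a GTF file can be sanity-checked before it is passed to -G. The following is a minimal sketch, not TopHat's own parser: the function names `parse_attributes` and `check_gtf` are hypothetical, and the checks (nine tab-separated columns, 1 <= start <= end, a recognized strand, required attributes present) are assumptions drawn from the GTF 2.2 spec, which lists both gene_id and transcript_id as mandatory. Whether TopHat enforces each of these is not confirmed here, so the required attributes are a parameter.

```python
import io

def parse_attributes(text):
    """Split a GTF attribute column into a dict.

    Simplification: assumes no ';' appears inside a quoted value.
    """
    attrs = {}
    for part in text.split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs

def check_gtf(handle, required=("gene_id", "transcript_id")):
    """Return a list of (line_number, message) problems found in a GTF stream."""
    problems = []
    for lineno, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) != 9:
            problems.append(
                (lineno, f"expected 9 tab-separated fields, found {len(fields)}")
            )
            continue
        start, end, strand = fields[3], fields[4], fields[6]
        if not (start.isdigit() and end.isdigit()):
            problems.append((lineno, "start and end must be positive integers"))
        elif int(start) < 1 or int(start) > int(end):
            problems.append((lineno, "need 1 <= start <= end"))
        # '.' is tolerated by this sketch; that is a choice, not a spec claim.
        if strand not in ("+", "-", "."):
            problems.append((lineno, f"unexpected strand {strand!r}"))
        attrs = parse_attributes(fields[8])
        for key in required:
            if not attrs.get(key):
                problems.append((lineno, f"missing attribute {key}"))
    return problems

# First line of the example above, written with real tab separators.
example = 'chr1\tBLAH\texon\t150\t200\t.\t+\t.\ttranscript_id "X";\n'
for lineno, message in check_gtf(io.StringIO(example)):
    print(f"line {lineno}: {message}")
```

Calling `check_gtf(handle, required=("transcript_id",))` relaxes the check to the transcript-only form asked about above. Note that the example lines as displayed on this page are space-separated; a real GTF file needs tabs between the nine columns.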


Follow-up (Ann, 4.7 years ago): Is the source code hosted publicly, or should I just get it from the tarball on the TopHat site?

Comment (Ann, 4.6 years ago): Is this a really dumb question?
Ben Ernest (United States) wrote, 4.5 years ago:

Maybe you've figured this out already, but I did something similar and it seems to have worked. I made a GTF file in which each feature is a 600 bp region of the Arabidopsis chloroplast genome. I named each feature so that I can recover its location later in the pipeline.

I set the source column to "protein_coding" and the feature type to "exon" on every line, but I don't know if that matters.

Pt      protein_coding  exon    1       600     .       +       .       exon_number 1; gene_id CPt_1.600.pos; gene_name CPt_1.600.pos; seqedit false; transcript_id CPt_1.600.pos.1; transcript_name CPt_1.600.pos; tss_id CPt_1.600.pos
Pt      protein_coding  exon    1       600     .       -       .       exon_number 1; gene_id CPt_1.600.neg; gene_name CPt_1.600.neg; seqedit false; transcript_id CPt_1.600.neg.1; transcript_name CPt_1.600.neg; tss_id CPt_1.600.neg
Pt      protein_coding  exon    601     1200    .       +       .       exon_number 1; gene_id CPt_601.1200.pos; gene_name CPt_601.1200.pos; seqedit false; transcript_id CPt_601.1200.pos.1; transcript_name CPt_601.1200.pos; tss_id CPt_601.1200.pos
Pt      protein_coding  exon    601     1200    .       -       .       exon_number 1; gene_id CPt_601.1200.neg; gene_name CPt_601.1200.neg; seqedit false; transcript_id CPt_601.1200.neg.1; transcript_name CPt_601.1200.neg; tss_id CPt_601.1200.neg
Pt      protein_coding  exon    1201    1800    .       +       .       exon_number 1; gene_id CPt_1201.1800.pos; gene_name CPt_1201.1800.pos; seqedit false; transcript_id CPt_1201.1800.pos.1; transcript_name CPt_1201.1800.pos; tss_id CPt_1201.1800.pos
Pt      protein_coding  exon    1201    1800    .       -       .       exon_number 1; gene_id CPt_1201.1800.neg; gene_name CPt_1201.1800.neg; seqedit false; transcript_id CPt_1201.1800.neg.1; transcript_name CPt_1201.1800.neg; tss_id CPt_1201.1800.neg
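
A file of this shape does not have to be written by hand. The sketch below is one possible way to generate fixed-size windows on both strands, following the naming pattern in the lines above. The function name `window_gtf_lines`, its parameters, and the sequence length used in the usage lines are hypothetical and chosen only for illustration. It also differs from the post in two ways: it emits only gene_id and transcript_id, and it quotes the attribute values as the GTF 2.2 spec describes.

```python
def window_gtf_lines(seqname, length, window=600, prefix="CPt"):
    """Build one single-exon GTF line per window and strand.

    Coordinates are 1-based and inclusive; the last window is
    truncated at the end of the sequence.
    """
    lines = []
    for start in range(1, length + 1, window):
        end = min(start + window - 1, length)
        for strand, tag in (("+", "pos"), ("-", "neg")):
            name = f"{prefix}_{start}.{end}.{tag}"
            attrs = f'gene_id "{name}"; transcript_id "{name}.1";'
            fields = [
                seqname,            # column 1: sequence name
                "protein_coding",   # column 2: source (as in the post)
                "exon",             # column 3: feature type
                str(start),
                str(end),
                ".",                # score
                strand,
                ".",                # frame
                attrs,
            ]
            lines.append("\t".join(fields))
    return lines

# Illustrative length only; substitute the real sequence length.
for line in window_gtf_lines("Pt", 1500):
    print(line)
```

Encoding the coordinates and strand in the identifiers, as the post does, means each alignment can be traced back to its window later without a lookup table.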

So I would say the minimal information TopHat needs is a reference genome (or chromosome) and, if you use -G, a GTF file with valid coordinates.

By the way, I know you! I'm Ben, a student in the UT-Knoxville GST program, and I came to your workshop on metabolomics and RNA-seq a couple of years ago.

 

 
