Question: GTF to use with RSEM
Asked 2.5 years ago by victor.gambarini:

Hi.

I'm trying to use RSEM to calculate gene expression for my RNA-Seq experiment. I assembled the reads with IDBA-UD and predicted the transcripts with Prodigal. So I have a FASTA with the contigs from IDBA-UD, another FASTA with the transcripts from Prodigal, and a GFF generated by Prodigal. I tried using the GFF file in RSEM with no success, then I converted it to a GTF file, and that is not working either. My last attempt was the following:

rsem-prepare-reference contigs.fasta reference_name --gtf prodigal.gtf --bowtie2

My GTF file looks like this:

contig-100_0 Prodigal_v2.6.3 CDS 3 503 10.6 + 0 gene_id "contig-100_0_1"; transcript_id "contig-100_0_1";

contig-100_0 Prodigal_v2.6.3 CDS 507 776 19.2 + 0 gene_id "contig-100_0_2"; transcript_id "contig-100_0_2";

contig-100_0 Prodigal_v2.6.3 CDS 848 1201 37.4 + 0 gene_id "contig-100_0_3"; transcript_id "contig-100_0_3";

contig-100_0 Prodigal_v2.6.3 CDS 1198 1464 44.3 + 0 gene_id "contig-100_0_4"; transcript_id "contig-100_0_4";

contig-100_0 Prodigal_v2.6.3 CDS 1461 1655 10.7 + 0 gene_id "contig-100_0_5"; transcript_id "contig-100_0_5";

And, finally, the error is this:

Parsed 200000 lines

Parsed 400000 lines

The reference contains no transcripts! failed! Plase check if you provide correct parameters/options for the pipeline!

Answer by h.mon (Brazil), 2.5 years ago:

My guess is that your GTF is not properly formatted. See this thread for two suggestions that may help you:

Currently, RSEM only accepts GTF formatted annotation files, which are admittedly more targeted towards eukaryotic gene annotations in that they require “transcript” and “exon” lines. You could either add those lines in (for each gene, simply add a transcript and exon line with identical start and end coordinates), making sure you adhere to the GTF standard: http://mblab.wustl.edu/GTF22.html

or you could simply extract the sequences of the genes in a multi-fasta file and pass that directly to RSEM, in lieu of a genome+annotation.
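The first suggestion (adding a transcript and an exon line for each gene) can be sketched in Python as below. This is only a minimal sketch, not a tested converter: it assumes the GTF is tab-separated with nine columns and that every gene is a single CDS line, as in the example above. The function names and file names are made up for illustration.

```python
def expand_cds_lines(lines):
    """For every CDS line, yield a transcript line, an exon line and the CDS line.

    Assumes tab-separated, nine-column GTF lines with one CDS per gene.
    Lines that are empty, comments, or not CDS features are skipped.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9 or fields[2] != "CDS":
            continue
        for feature in ("transcript", "exon"):
            out = list(fields)
            out[2] = feature   # same coordinates, strand and attributes as the CDS
            out[7] = "."       # the frame column only applies to CDS features
            yield "\t".join(out)
        yield line             # keep the original CDS line as well


def convert_file(in_path, out_path):
    """Read a Prodigal-derived GTF and write the expanded version."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for out_line in expand_cds_lines(src):
            dst.write(out_line + "\n")
```

For example, convert_file("prodigal.gtf", "prodigal.expanded.gtf") would write the expanded file (the output name is hypothetical), which can then be passed to --gtf in place of the original.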

It could be easier to just use the gene FASTA output from Prodigal; at least, that would be my first attempt.
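If the Prodigal gene FASTA is not at hand, the same sequences could in principle be cut out of the contigs using the GTF coordinates. The following is a rough sketch under several assumptions: tab-separated nine-column GTF, 1-based inclusive coordinates, one CDS per gene, and a transcript_id attribute on every line. All function names are hypothetical.

```python
def read_fasta(lines):
    """Parse FASTA lines into a dict mapping sequence name to sequence."""
    parts = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]   # name is the first word of the header
            parts[name] = []
        elif name is not None and line:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def gene_sequences(contigs, gtf_lines):
    """Yield (transcript_id, sequence) for every CDS line in the GTF."""
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "CDS":
            continue
        start, end, strand = int(fields[3]), int(fields[4]), fields[6]
        transcript_id = fields[8].split('transcript_id "')[1].split('"')[0]
        seq = contigs[fields[0]][start - 1:end]   # GTF is 1-based, inclusive
        if strand == "-":
            seq = reverse_complement(seq)
        yield transcript_id, seq
```

Each pair can be written out as a ">transcript_id" header followed by the sequence, and the resulting multi-FASTA given to rsem-prepare-reference without the --gtf option.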


It worked!

I just created a GTF file with the exon and transcript lines and it worked perfectly!

Thank you, h.mon!

Reply written 2.5 years ago by victor.gambarini