GTF to use with RSEM
6.9 years ago

Hi.

I'm trying to use RSEM to calculate gene expression for my RNA-Seq experiment. I assembled the reads with IDBA-UD and got the transcripts with Prodigal. So I have a FASTA file with the contigs from IDBA-UD, another FASTA file with the transcripts from Prodigal, and a GFF file generated by Prodigal. I tried using the GFF file in RSEM with no success, then converted it to a GTF file, and that is not working either. My last attempt was the following:

rsem-prepare-reference contigs.fasta reference_name --gtf prodigal.gtf --bowtie2

My GTF file looks like this:

contig-100_0 Prodigal_v2.6.3 CDS 3 503 10.6 + 0 gene_id "contig-100_0_1"; transcript_id "contig-100_0_1";
contig-100_0 Prodigal_v2.6.3 CDS 507 776 19.2 + 0 gene_id "contig-100_0_2"; transcript_id "contig-100_0_2";
contig-100_0 Prodigal_v2.6.3 CDS 848 1201 37.4 + 0 gene_id "contig-100_0_3"; transcript_id "contig-100_0_3";
contig-100_0 Prodigal_v2.6.3 CDS 1198 1464 44.3 + 0 gene_id "contig-100_0_4"; transcript_id "contig-100_0_4";
contig-100_0 Prodigal_v2.6.3 CDS 1461 1655 10.7 + 0 gene_id "contig-100_0_5"; transcript_id "contig-100_0_5";

And, finally, the error is this:

Parsed 200000 lines
Parsed 400000 lines
The reference contains no transcripts! failed! Plase check if you provide correct parameters/options for the pipeline!

RNA-Seq • 3.7k views
6.9 years ago
h.mon 35k

My guess is that your GTF is not properly formatted. See this thread for two suggestions which may help you:

Currently, RSEM only accepts GTF formatted annotation files, which are admittedly more targeted towards eukaryotic gene annotations in that they require “transcript” and “exon” lines. You could either add those lines in (for each gene, simply add a transcript and exon line with identical start and end coordinates), making sure you adhere to the GTF standard: http://mblab.wustl.edu/GTF22.html

or you could simply extract the sequences of the genes in a multi-fasta file and pass that directly to RSEM, in lieu of a genome+annotation.

It could be easier to just use the gene FASTA output from Prodigal; at least, that would be my first attempt.
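If you would rather go the first route (adding the transcript and exon lines), here is a minimal Python sketch of how that could be done. It assumes every Prodigal CDS is a single-exon gene, so the transcript and exon lines simply reuse the CDS coordinates and attributes. The function names and file names are made up for illustration, and I have not run this against RSEM itself, so treat it as a starting point rather than a tested converter.

```python
def add_transcript_and_exon_lines(gtf_lines):
    """Yield transcript, exon and CDS records for every CDS record.

    Assumption: each CDS is a single-exon gene, so all three records
    share the same start, end, strand and attributes.
    """
    for line in gtf_lines:
        line = line.rstrip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # The first eight columns contain no spaces; the ninth (attributes)
        # does, so split at most eight times and keep the remainder whole.
        fields = line.split(None, 8)
        if len(fields) != 9 or fields[2] != "CDS":
            continue  # this sketch only handles CDS records
        seqname, source, _, start, end, score, strand, frame, attrs = fields
        for feature in ("transcript", "exon", "CDS"):
            # The frame column is only meaningful for CDS records.
            phase = frame if feature == "CDS" else "."
            # Write tab-separated columns, as the GTF format requires.
            yield "\t".join(
                [seqname, source, feature, start, end, score, strand, phase, attrs]
            )


def convert_file(in_path, out_path):
    """Read a CDS-only GTF and write one with transcript and exon lines."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for record in add_transcript_and_exon_lines(src):
            dst.write(record + "\n")
```

You could then call something like convert_file("prodigal.gtf", "prodigal.with_exons.gtf") (the output name is hypothetical) and pass the new file to rsem-prepare-reference via --gtf, as in your original command.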


It worked!

I just created a GTF file with the exon and transcript lines and it worked perfectly!

Thank you, h.mon!
