Question: Yeast GTF file needed for RNA-SeQC run
Lina F (Boston, MA) wrote, 7 months ago:

Hi all,

I have some yeast RNA-seq data and I would like to run RNA-SeQC to get an overview of the quality of the run.

I got both the reference fasta sequence and the GTF file from Ensembl here:

# fasta 
ftp://ftp.ensembl.org/pub/current_fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa.gz

# gtf
ftp://ftp.ensembl.org/pub/current_gtf/saccharomyces_cerevisiae/Saccharomyces_cerevisiae.R64-1-1.86.gtf.gz

However, it seems that there is no rRNA information in the GTF file. When I run RNA-SeQC, I get the following output in my log:

RNA-SeQC v1.1.8.1 07/11/14
Creating rRNA Interval List based on given GTF annotations
Retriving contig names from reference
     contig names in reference: 17
Loading GTF for Read Counting
Converting to refGene
Transcript objects to RefGen format:    0 s
java.lang.RuntimeException: No rRNA found in GTF transcript_type field
    at org.broadinstitute.cga.rnaseq.TranscriptList.toRRNAIntervalList(TranscriptList.java:414)
    at org.broadinstitute.cga.rnaseq.RNASeqMetrics.createRefGeneAndRRNAFiles(RNASeqMetrics.java:1306)
    at org.broadinstitute.cga.rnaseq.RNASeqMetrics.prepareFiles(RNASeqMetrics.java:196)
    at org.broadinstitute.cga.rnaseq.RNASeqMetrics.execute(RNASeqMetrics.java:170)
    at org.broadinstitute.cga.rnaseq.RNASeqMetrics.main(RNASeqMetrics.java:139)
No information for rRNA available. Continuing without rRNA calculations. (Using the -BWArRNA flag for best results)
...

I would like the rRNA information if possible. Does anyone know where to get a GTF file with rRNA information for yeast?

Thanks!

Tags: rna-seqc, rna-seq, yeast, qc, gtf

apa@stowers (Kansas City) answered, 7 months ago:

RNA-SeQC appears to be looking for an annotation field "transcript_type", which does not exist in Ensembl GTFs; Ensembl uses "transcript_biotype" and "gene_biotype" instead.
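
To check this on your own copy of the annotation, here is a minimal Python sketch that tallies the values of the "transcript_type" and "transcript_biotype" keys on transcript rows. It assumes the standard GTF layout (nine tab-separated columns, attributes written as key "value";) and reads either the .gz download or a decompressed file. The helper names `open_gtf` and `count_biotypes` are hypothetical, not part of RNA-SeQC.

```python
import gzip
import re
from collections import Counter

# Matches one `key "value"` pair in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def open_gtf(path):
    """Open a plain or gzip-compressed GTF file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def count_biotypes(lines, keys=("transcript_type", "transcript_biotype")):
    """Count (key, value) pairs for the given attribute keys on transcript rows."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):  # skip header/comment lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        for key, value in ATTR_RE.findall(fields[8]):
            if key in keys:
                counts[(key, value)] += 1
    return counts


if __name__ == "__main__":
    with open_gtf("Saccharomyces_cerevisiae.R64-1-1.86.gtf.gz") as fh:
        for (key, value), n in sorted(count_biotypes(fh).items()):
            print(key, value, n)
```

If the printout lists rRNA only under "transcript_biotype" and nothing under "transcript_type", that matches the "No rRNA found in GTF transcript_type field" error in the log.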

I think you can use the "transcript.type.field" parameter to specify which GTF field you want to use instead of "transcript_type".

Otherwise, you could decompress the GTF (gunzip Saccharomyces_cerevisiae.R64-1-1.86.gtf.gz) and run something like "perl -i -pe 's/transcript_biotype/transcript_type/g' Saccharomyces_cerevisiae.R64-1-1.86.gtf" to rename every "transcript_biotype" to "transcript_type" in place. RNA-SeQC should then find the rRNA entries.
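
If you would rather not edit the file in place, or Perl isn't handy, the same rename can be sketched in Python. This version reads the .gz download directly, writes a new file, and only rewrites "transcript_biotype" where it appears as an attribute key (start of column 9 or after a semicolon, followed by a quoted value), so "gene_biotype" and everything else is left alone. The function names and output file name are hypothetical.

```python
import gzip
import re

# "transcript_biotype" as an attribute key: at the start of column 9 or after
# a semicolon, followed by whitespace and the opening quote of its value.
KEY_RE = re.compile(r'(^|;\s*)transcript_biotype(\s+")')


def rename_line(line):
    """Rename the transcript_biotype key in one GTF line; comments pass through."""
    if line.startswith("#"):
        return line
    fields = line.split("\t")
    if len(fields) < 9:
        return line
    # The trailing newline stays inside fields[8], so the join preserves it.
    fields[8] = KEY_RE.sub(r"\1transcript_type\2", fields[8])
    return "\t".join(fields)


def rename_biotype_key(src, dst):
    """Copy src (plain or .gz GTF) to an uncompressed dst with the key renamed."""
    opener = gzip.open if src.endswith(".gz") else open
    with opener(src, "rt") as fin, open(dst, "w") as fout:
        for line in fin:
            fout.write(rename_line(line))


if __name__ == "__main__":
    rename_biotype_key(
        "Saccharomyces_cerevisiae.R64-1-1.86.gtf.gz",
        "Saccharomyces_cerevisiae.R64-1-1.86.transcript_type.gtf",
    )
```

The renamed copy is then what you would pass to RNA-SeQC as the GTF; the original download stays untouched in case you want to try the field-name parameter instead.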

Lina F replied, 7 months ago:

Ah, this makes sense. Thanks for the help!