Question: Yeast GTF file needed for RNA-SeQC run
Lina F (Boston, MA) wrote, 10 weeks ago:

Hi all,

I have some yeast RNA-seq data and I would like to run RNA-SeQC to get an overview of the quality of the run.

I got both the reference fasta sequence and the GTF file from Ensembl here:

# fasta 
ftp://ftp.ensembl.org/pub/current_fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa.gz

# gtf
ftp://ftp.ensembl.org/pub/current_gtf/saccharomyces_cerevisiae/Saccharomyces_cerevisiae.R64-1-1.86.gtf.gz

However, it seems that there is no rRNA information in the GTF file. When I run RNA-SeQC, I get the following output in my log:

RNA-SeQC v1.1.8.1 07/11/14
Creating rRNA Interval List based on given GTF annotations
Retriving contig names from reference
     contig names in reference: 17
Loading GTF for Read Counting
Converting to refGene
Transcript objects to RefGen format:    0 s
java.lang.RuntimeException: No rRNA found in GTF transcript_type field
    at org.broadinstitute.cga.rnaseq.TranscriptList.toRRNAIntervalList(TranscriptList.java:414)
    at org.broadinstitute.cga.rnaseq.RNASeqMetrics.createRefGeneAndRRNAFiles(RNASeqMetrics.java:1306)
    at org.broadinstitute.cga.rnaseq.RNASeqMetrics.prepareFiles(RNASeqMetrics.java:196)
    at org.broadinstitute.cga.rnaseq.RNASeqMetrics.execute(RNASeqMetrics.java:170)
    at org.broadinstitute.cga.rnaseq.RNASeqMetrics.main(RNASeqMetrics.java:139)
No information for rRNA available. Continuing without rRNA calculations. (Using the -BWArRNA flag for best results)
...

I would like the rRNA information if possible. Does anyone know where to get a GTF file with rRNA information for yeast?

Thanks!

apa@stowers (Kansas City) wrote, 10 weeks ago:

RNA-SeQC appears to be looking for an annotation field called "transcript_type", which does not exist in Ensembl GTFs; Ensembl uses "transcript_biotype" and "gene_biotype" instead.
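
One way to confirm this before rerunning is to tally the values stored under each attribute key. The following is a minimal Python sketch, not part of RNA-SeQC: the function name `attribute_counts` and the two example records are made up for illustration, and it assumes the usual GTF layout of nine tab-separated columns with `key "value";` pairs in the last one.

```python
import re
from collections import Counter

# Matches GTF attributes written as: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')

def attribute_counts(lines, key):
    """Count the values of one attribute key across 'transcript' records."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):          # skip header/comment lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        if key in attrs:
            counts[attrs[key]] += 1
    return counts

# Illustrative records in Ensembl style (hypothetical IDs, not from the real file)
example = [
    '#!genome-build R64-1-1\n',
    'XII\tsgd\ttranscript\t100\t200\t.\t-\t.\t'
    'gene_id "G1"; transcript_id "T1"; gene_biotype "rRNA"; transcript_biotype "rRNA";\n',
    'IV\tsgd\ttranscript\t300\t900\t.\t+\t.\t'
    'gene_id "G2"; transcript_id "T2"; gene_biotype "protein_coding"; '
    'transcript_biotype "protein_coding";\n',
]

print(attribute_counts(example, "transcript_biotype"))
print(attribute_counts(example, "transcript_type"))
```

On a real file the lines could come from `gzip.open(path, "rt")`. If "rRNA" shows up under "transcript_biotype" but the count for "transcript_type" is empty, the annotation is present and only the key name differs.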

I think you can use the "transcript.type.field" parameter to specify which GTF field you want to use instead of "transcript_type".

Otherwise, you could run something like "perl -i -pe 's/transcript_biotype/transcript_type/' Saccharomyces_cerevisiae.R64-1-1.86.gtf" on the decompressed GTF to rename "transcript_biotype" to "transcript_type" in place; RNA-SeQC should then recognize it.
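
For anyone who prefers not to edit the file in place, the same rename can be sketched in Python. This is an illustrative equivalent of the one-liner, not a tested RNA-SeQC workflow: `rename_attribute` is a hypothetical helper, and it assumes the attribute key appears in column 9 followed by a space and a quoted value.

```python
import re

def rename_attribute(line, old="transcript_biotype", new="transcript_type"):
    """Rename one attribute key in column 9 of a GTF line; leave the rest alone."""
    if line.startswith("#"):
        return line                        # header/comment lines pass through
    fields = line.split("\t")
    if len(fields) < 9:
        return line                        # not a standard GTF record
    # \b stops a longer key that merely ends in `old` from being rewritten
    fields[8] = re.sub(rf'\b{re.escape(old)} "', f'{new} "', fields[8])
    return "\t".join(fields)

# Illustrative record (hypothetical IDs)
record = ('XII\tsgd\ttranscript\t100\t200\t.\t-\t.\t'
          'gene_id "G1"; transcript_id "T1"; gene_biotype "rRNA"; '
          'transcript_biotype "rRNA";\n')
print(rename_attribute(record), end="")
```

Applied line by line, with the input read via `gzip.open(path, "rt")` and the result written to a new file, this leaves the original download untouched and keeps "gene_biotype" as it was.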


Lina F replied, 10 weeks ago:

Ah, this makes sense -- thanks for the help!