ORFik: error making Txdb from GTF and fasta files
7 weeks ago
Estefania ▴ 10


I would like to use ORFik to map Ribo-seq reads to different ORFs in the maize genome. The latest version of the genome is Zm-B73-REFERENCE-NAM-5.0.fa, and its annotation is a GFF3 file. I have the genome FASTA file, its .fai index, and the GFF3 file. ORFik uses GTF rather than GFF3, so I converted my GFF3 to GTF with gffread. My code is below:

# Import packages ----
library("ORFik", lib.loc = "~/Rlibs") # Loads the package from a personal library
library("GenomicRanges", lib.loc = "~/Rlibs")
library("GenomicFeatures", lib.loc = "~/Rlibs")

# Specify file locations
where_to_save_config <- "~/Documents/R/Ribo-seq/ORFik_config.csv"

parent_folder <- "~/Documents/R/Ribo-seq"
fastq.dir <- file.path(parent_folder, "raw_data")
bam.dir <- file.path(parent_folder, "processed_data")
reference.dir <- file.path(parent_folder, "references")
config.save(where_to_save_config,
            fastq.dir, bam.dir, reference.dir)

# Check ORFik config structure

# Folder structure for uORF experiment
conf <- config.exper(experiment = "uORF_maize", # short experiment info
                     assembly = "Zm-B73-REFERENCE-NAM-5.0", # In reference folder
                     type = c("Ribo-seq")) # fastq and bam type

# Assign local annotation files
gtf <- "/home/R/Ribo-seq/references/Zea_mays.Zm-B73-REFERENCE-NAM-5.0.55.gtf"
genome <- "/home/R/Ribo-seq/references/Zm-B73-REFERENCE-NAM-5.0.fa"
makeTxdbFromGenome(gtf, genome, organism = "Zea mays")

When I run makeTxdbFromGenome(gtf, genome, organism = "Zea mays"), I get the following output:

Making txdb of GTF
Import genomic features from the file as a GRanges object ... Error in .tidy_seqinfo(gr, circ_seqs, chrominfo) : 
  'chrominfo' must describe at least all the chromosomes of the genomic features imported from the file. Chromosomes missing
  from 'chrominfo': 1, 2, 3, 4, 5, 6, 7, 8, 9, 10

Could the conversion have messed up the GTF? I see the chromosome info in both the GFF3 and the GTF files.

Thanks in advance for any guidance.

gff3 gtf ORFik
7 days ago
hauken_heyken ▴ 100

This was answered by email:

First, update to the GitHub devel version of ORFik (to get all the new features):
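A minimal sketch of that update step, assuming the development version lives in the Roleren/ORFik GitHub repository and is installed with devtools; neither detail is stated in the answer:

```r
# Assumption: devtools is used as the installer; install it first if missing
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
# Assumption: "Roleren/ORFik" is the GitHub repository of the devel version
devtools::install_github("Roleren/ORFik")
```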

# Restart R after this (in RStudio: Ctrl + Shift + F10)

# Your txdb was not made properly; this sometimes happens by chance.
# To fix it, do this:

txdb <- makeTxdbFromGenome("Path/to/gtf.gtf", "Path/to/fasta_genome.fasta",
                           organism = "Zea mays", optimize = TRUE, return = TRUE)
# You will now get a valid txdb file, saved in the same directory as the GTF and named like the GTF but with a ".db" extension
# The GTF and genome must be stored in a folder, and the FASTA genome must have a FASTA index
# organism is the scientific name
# optimize gives you files that load the annotation much faster (afterwards, try loadRegion(txdb, "leaders") to see the speedup)
# Now you can detect p-shifts for Ribo-seq
detectRibosomeShifts(ear1GR, txdb)
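
The error in the question reports that chromosomes 1 to 10 are missing from 'chrominfo'. One possible cause, not confirmed in this thread, is that the chromosome names in the GTF differ from those in the FASTA index (for example "1" versus "chr1"), assuming the chromosome information is taken from the genome FASTA. Below is a minimal base R sketch for checking this. missing_seqnames is a hypothetical helper, not part of ORFik, and the two small temporary files stand in for the real GTF and .fai:

```r
# Hypothetical helper (not an ORFik function): return seqnames that occur in
# the GTF but are absent from the FASTA index (.fai).
missing_seqnames <- function(gtf_path, fai_path) {
  # A FASTA .fai file has 5 tab-separated columns; column 1 is the sequence name
  fai <- read.delim(fai_path, header = FALSE, quote = "",
                    colClasses = c("character", rep("NULL", 4)))
  # A GTF has 9 tab-separated columns; column 1 is the seqname.
  # Lines starting with "#" are header comments and are skipped.
  gtf <- read.delim(gtf_path, header = FALSE, quote = "", comment.char = "#",
                    colClasses = c("character", rep("NULL", 8)))
  setdiff(unique(gtf[[1]]), fai[[1]])
}

# Toy files standing in for the real annotation and index
fai_file <- tempfile(fileext = ".fai")
gtf_file <- tempfile(fileext = ".gtf")
writeLines(c("chr1\t1000\t6\t60\t61",
             "chr2\t900\t1030\t60\t61"), fai_file)
writeLines(c("#!genome-build demo",
             "1\tdemo\texon\t1\t100\t.\t+\t.\tgene_id \"g1\"; transcript_id \"t1\";",
             "chr2\tdemo\texon\t1\t100\t.\t+\t.\tgene_id \"g2\"; transcript_id \"t2\";"),
           gtf_file)

# Seqnames used in the GTF that the FASTA index does not know about
not_in_fasta <- missing_seqnames(gtf_file, fai_file)
print(not_in_fasta)
```

On the real data, the arguments would be the path to the GTF and the path to the genome's .fai file. A non-empty result would suggest renaming the seqnames in one of the two files so that they match before calling makeTxdbFromGenome again.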

