ORFik: error making Txdb from GTF and fasta files
15 months ago
Estefania ▴ 30

Hello,

I would like to use ORFik to map Ribo-reads to different ORFs in the maize genome. The latest version of the genome is Zm-B73-REFERENCE-NAM-5.0.fa. The annotation file is a GFF3. I have the genome fasta file, the fasta fai file, and the GFF3 file. The ORFik package uses GTF instead of GFF3, so I used gffread to convert my GFF3 to a GTF. Below you can see my code:

# Import packages ----
library("ORFik", lib = "~/Rlibs") # Loads the package
library("GenomicRanges", lib = "~/Rlibs")
library("GenomicFeatures", lib = "~/Rlibs")

# Specify files locations
where_to_save_config <- "~/Documents/R/Ribo-seq/ORFik_config.csv"

parent_folder <- "~/Documents/R/Ribo-seq"
fastq.dir <- file.path(parent_folder, "raw_data")
bam.dir <- file.path(parent_folder, "processed_data")
reference.dir <- file.path(parent_folder, "references")
config.save(where_to_save_config,
            fastq.dir, bam.dir, reference.dir)

# Check ORFik config structure
config()

# Folder structure for uORF experiment
conf <- config.exper(experiment = "uORF_maize", # short experiment info
                     assembly = "Zm-B73-REFERENCE-NAM-5.0", # In reference folder
                     type = c("Ribo-seq")) # fastq and bam type
conf

# Assign local annotation files
gtf <- "/home/R/Ribo-seq/references/Zea_mays.Zm-B73-REFERENCE-NAM-5.0.55.gtf"
genome <- "/home/R/Ribo-seq/references/Zm-B73-REFERENCE-NAM-5.0.fa"
makeTxdbFromGenome(gtf, genome, organism = "Zea mays")

When I run makeTxdbFromGenome(gtf, genome, organism = "Zea mays"), I get the following output:

Making txdb of GTF
Import genomic features from the file as a GRanges object ... Error in .tidy_seqinfo(gr, circ_seqs, chrominfo) : 
  'chrominfo' must describe at least all the chromosomes of the genomic features imported from the file. Chromosomes missing
  from 'chrominfo': 1, 2, 3, 4, 5, 6, 7, 8, 9, 10

Could the conversion have messed up the GTF? I see the chromosome info in both the GFF3 and the GTF files.

Thanks ahead for any guidance.

gff3 gtf ORFik • 863 views
13 months ago
hauken_heyken ▴ 130

This was answered by email:

First, update to the GitHub devel version of ORFik (to get all new features):

devtools::install_github("Roleren/ORFik") 
# Restart R after this (in RStudio: Ctrl + Shift + F10)
library(ORFik)

# So your txdb was not made properly; this happens by chance sometimes.
# To fix it, do this:

txdb <- makeTxdbFromGenome("Path/to/gtf.gtf", "Path/to/fasta_genome.fasta",
                           organism = "Zea mays", optimize = TRUE, return = TRUE)
# You will now get a valid txdb file saved in the same directory as the GTF, named like the GTF but with a ".db" extension.
# The GTF and genome must be stored in a folder, and the fasta genome must have a fasta index.
# organism is the scientific name.
# optimize gives you files that load the annotation much faster
# (try loadRegion(txdb, "leaders") afterwards to see the speedup).
# Now you can detect p-shifts for Ribo-seq.
# Note: ear1GR is not defined in this thread; it is presumably the Ribo-seq reads object from the email exchange.
detectRibosomeShifts(ear1GR, txdb)
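
A possible follow-up check, not part of the emailed answer: the error lists chromosomes 1 to 10 as missing from 'chrominfo'. That would be consistent with the GTF and the fasta naming their sequences differently (for example "1" versus "chr1"), assuming chrominfo is built from the fasta index. The thread does not confirm this cause. The sketch below uses base R only; seqname_mismatch is a hypothetical helper, not an ORFik function, and the demo files are made up. It compares column 1 of a GTF with column 1 of a .fai index.

```r
# Hypothetical helper (not part of ORFik): report sequence names that appear
# in only one of the two files. Base R only; reads the whole GTF into memory.
seqname_mismatch <- function(gtf_path, fai_path) {
  # Column 1 of a GTF is the sequence name; lines starting with '#' are comments
  gtf_lines <- readLines(gtf_path)
  gtf_lines <- gtf_lines[nzchar(gtf_lines) & !startsWith(gtf_lines, "#")]
  gtf_seqs <- unique(sub("\t.*$", "", gtf_lines))
  # Column 1 of a .fai index is the fasta sequence name
  fai_seqs <- sub("\t.*$", "", readLines(fai_path))
  list(only_in_gtf = setdiff(gtf_seqs, fai_seqs),
       only_in_fasta = setdiff(fai_seqs, gtf_seqs))
}

# Made-up demo files: the GTF uses "1"/"2", the index uses "chr1"/"chr2"
gtf_demo <- tempfile(fileext = ".gtf")
fai_demo <- tempfile(fileext = ".fai")
writeLines(c(
  "#!genome-build demo",
  "1\tdemo\texon\t100\t200\t.\t+\t.\tgene_id \"g1\"; transcript_id \"t1\";",
  "2\tdemo\texon\t300\t400\t.\t-\t.\tgene_id \"g2\"; transcript_id \"t2\";"
), gtf_demo)
writeLines(c("chr1\t1000\t6\t60\t61", "chr2\t900\t1030\t60\t61"), fai_demo)

res <- seqname_mismatch(gtf_demo, fai_demo)
print(res)
```

On real data, pass the paths to your own GTF and .fai instead of the demo files. If both lists come back non-empty, the names in one file would need to be made to match the other before building the txdb.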

Thanks again.
