I would like to use ORFik to determine the coverage of the different ORFs across the maize genome. I have Ribo-seq data, the latest annotation file (a GFF3), and the v5 genome FASTA file for B73.
After running my code, three Large CompressedGRangesLists are created, and none of them has any values in seqlengths. The missing seqlengths seem to be what causes the errors. I downloaded the GFF3 from Ensembl Plants, so I expected it to work. Can I fill in the seqlengths of the Large CompressedGRangesLists manually, using information from the GFF3 file?
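To make the question concrete, here is a minimal sketch of what "manually modifying" seqlengths could look like on a toy GRangesList. The coordinates, transcript names, and the `chrom_lengths` vector are made up for illustration and are not taken from the maize files; only `seqlengths()` and `seqlevels()` are the real accessors. It assumes GenomicRanges is installed.

```r
library(GenomicRanges)  # seqlengths() and seqlevels() are available once this is attached

# Toy annotation: two transcripts on chromosomes "1" and "2" (made-up coordinates).
gr <- GRanges(
  seqnames = c("1", "1", "2"),
  ranges   = IRanges(start = c(10, 50, 5), width = 20),
  strand   = "+"
)
grl <- split(gr, c("tx1", "tx1", "tx2"))  # a CompressedGRangesList

seqlengths(grl)  # all NA: no chromosome lengths are attached yet

# Hypothetical chromosome lengths; real ones would have to come from the genome FASTA.
chrom_lengths <- c("1" = 1000L, "2" = 800L)

# Assign the lengths in the order of the object's own seqlevels.
seqlengths(grl) <- chrom_lengths[seqlevels(grl)]

anyNA(seqlengths(grl))  # FALSE once every seqlevel has a length
```

This only works when the names of the length vector match the seqlevels of the object, so it may not be enough on its own if the annotation and the genome name their chromosomes differently.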
Here is my code (the error messages I get are right after):
    # Import packages ----
    library("ORFik", lib.loc = "~/Rlibs")
    library("GenomicRanges", lib.loc = "~/Rlibs")
    library("GenomicFeatures", lib.loc = "~/Rlibs")

    # Specify file locations
    #where_to_save_config <- "~/Bio_data/ORFik_config.csv"
    #parent_folder <- "~/Bio_data"
    bams <- "~/Documents/R/Ribo-seq/processed_data/"
    gtf <- "~/Documents/R/Ribo-seq/references/Zea_mays.Zm-B73-REFERENCE-NAM-5.0.55.chr.gtf"
    genome <- "~/Documents/R/Ribo-seq/references/Zm-B73-REFERENCE-NAM-5.0.fa"
    exper.name <- "ORF_maize"

    # Create the experiment:
    template <- create.experiment(dir = bams,            # directory of the NGS files for the experiment
                                  exper.name,            # experiment name
                                  txdb = gtf,            # gtf / gff / gff.db annotation
                                  fa = genome,           # FASTA genome
                                  organism = "Zea mays", # scientific name
                                  types = "bam",
                                  stage = c("V12", "V12", "14d", "14d", "14d", "14d"),
                                  rep = c("1", "2", "1", "2", "1", "2"),
                                  condition = c("ear1", "ear2", "leaf1", "leaf2", "root1", "root2"),
                                  fraction = c("OTHER", "OTHER", "OTHER", "OTHER", "OTHER", "OTHER"),
                                  saveDir = NULL)        # create a template instead of a ready experiment

    df <- read.experiment(template)  # read the experiment from the template
    save.experiment(df, file = "~/Bio_data/ORFik_config.csv")
    df

    # Paths to the bam files
    filepath(df, type = "default")

    # Environment that the NGS data will be loaded into
    envExp(df)

    # Library (variable) names in the ORFik experiment
    bamVarName(df)

    # Auto-load the libraries into the environment
    outputLibs(df)  # with default output.mode = "envir"

    # Load genomic annotations
    txdb <- loadTxdb(df)  # transcript annotation
    # Creates the three GRangesLists (leaders, cds, trailers) used below
    loadRegions(txdb, parts = c("leaders", "cds", "trailers"))

    # Make a 100-base meta window for each library in the experiment
    transcriptWindow(leaders, cds, trailers, df, outdir = "~/Bio_data/", windowSize = 100)

    shiftFootprintsByExperiment(df)
I get two error messages, which I believe have the same cause. I get this one when I run
    transcriptWindow(leaders, cds, trailers, df, outdir = "~/Bio_data/", windowSize = 100)

    /home/R/Ribo-seq/processed_data/ear2.unique.bam
    Import genomic features from the file as a GRanges object ... OK
    Prepare the 'metadata' data frame ... OK
    Make the TxDb object ... OK
    Sorting shifted footprints...
    Error: BiocParallel errors
      4 remote errors, element index: 1, 3, 4, 5
      1 unevaluated and other errors
      first remote error:
    Error in covRleFromGR(x, weight = weight, ignore.strand = ignore.strand): Seqlengths of x contains NA values!
I get this error when I run
    shiftFootprintsByExperiment(df)

    Shifting reads in experiment: ORF_maize
    Saving ofst files to: /home/R/Ribo-seq/processed_data/pshifted/
    Saving wig files to: /home/R/Ribo-seq/processed_data/pshifted/
    Import genomic features from the file as a GRanges object ... OK
    Prepare the 'metadata' data frame ... OK
    Make the TxDb object ... OK
    /home/R/Ribo-seq/processed_data/ear1.unique.bam
    Import genomic features from the file as a GRanges object ... OK
    Prepare the 'metadata' data frame ... OK
    Make the TxDb object ... OK
    Sorting shifted footprints...
    RFP_ear1_V12_r1
    Error in x$.self$finalize() : attempt to apply non-function
    In addition: Warning message:
    In .merge_two_Seqinfo_objects(x, y) :
      The 2 combined objects have no sequence levels in common.
      (Use suppressWarnings() to suppress this warning.)
    RFP_ear2_V12_r2
    RFP_leaf1_14d_r1
    RFP_leaf2_14d_r2
    Error: BiocParallel errors
      0 remote errors, element index:
      2 unevaluated and other errors
      first remote error:
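The warning about "no sequence levels in common" suggests that the chromosome names in the annotation and in the genome may not match (for example "1" versus "chr1"), which could also explain the NA seqlengths. Below is a rough base R sketch of how the two sets of names could be compared. It uses small made-up temp files so that it runs on its own; for the real data, `fai_path` would point to the `.fai` index of the genome FASTA (assuming one has been created, e.g. with `samtools faidx`) and `gtf_path` to the annotation file. The variable names are only illustrative.

```r
# Toy stand-ins for the real files (made-up content, written to temp files).
fai_path <- tempfile(fileext = ".fai")
gtf_path <- tempfile(fileext = ".gtf")
writeLines(c("chr1\t1000\t6\t60\t61",
             "chr2\t800\t1029\t60\t61"), fai_path)
writeLines(c("#!genome-build toy",
             "1\tensembl\texon\t10\t90\t.\t+\t.\tgene_id \"g1\";",
             "2\tensembl\texon\t5\t70\t.\t-\t.\tgene_id \"g2\";"), gtf_path)

# A .fai index has five tab-separated columns; the first two are name and length.
fai <- read.delim(fai_path, header = FALSE,
                  col.names = c("name", "length", "offset", "linebases", "linewidth"),
                  colClasses = c("character", "numeric", "numeric", "integer", "integer"))
fasta_names <- fai$name

# Read only column 1 of the GTF (the sequence name); lines starting with '#' are skipped.
gtf_names <- unique(read.delim(gtf_path, header = FALSE, comment.char = "#",
                               quote = "",
                               colClasses = c("character", rep("NULL", 8)))[[1]])

shared        <- intersect(fasta_names, gtf_names)  # names present in both files
only_in_gtf   <- setdiff(gtf_names, fasta_names)    # names the genome does not know
only_in_fasta <- setdiff(fasta_names, gtf_names)    # names the annotation does not use

# Named vector of chromosome lengths taken from the index.
fasta_lengths <- setNames(fai$length, fai$name)
```

If `shared` comes back empty for the real files, the two files use different naming schemes, and one of them would need to be renamed before the lengths in `fasta_lengths` can be matched to the annotation.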
Thank you for your help!