Question: Rsubread: ERROR: cannot finish the SAM file
0
gravatar for TheCheshireCat
15 months ago by
TheCheshireCat0 wrote:

Hello everyone, I want to get the read counts for RNA-Seq data by using the package Rsubread. I got my RNA-Seq data from sra and all other files (genes, genome,...) from NCBI assembly. I tried the following code on 4 different organisms and for 2 it worked perfectly fine but for the other two I got the following error:

ERROR: cannot finish the SAM file! Please check the disk space in the output directory. No output file was generated.

Fehler in .load.delete.summary(output_file[i]) : ERROR: Summary file Bacillus_amyloliquefaciens_samfile.sam.summary was not generated! The program terminated wrongly!

There is enough disk space available. Here is my code:

rna_counts <- function(path, organism, genome_gff, genome_fna, feature_table, reads_1, reads_2=NULL){
  require(Rsubread)
  require(rtracklayer)
  require(biomaRt)
  RNAseqDATADIR <- path
  setwd(RNAseqDATADIR)
  dir(RNAseqDATADIR)

  REF_GENOME <- paste0("ref_data/",genome_fna)
  RSUBREAD_INDEX_PATH <- "ref_data"
  RSUBREAD_INDEX_BASE <- organism
  buildindex(basename=file.path(RSUBREAD_INDEX_PATH,RSUBREAD_INDEX_BASE), reference=REF_GENOME)

  inputfilefwd <- file.path(RNAseqDATADIR, paste0("raw_data/", reads_1))
  if (!is.na(reads_2)) inputfilervs <- file.path(RNAseqDATADIR, paste0("/raw_data/", reads_2))
  else inputfilervs <- NA

  align(index=file.path(RSUBREAD_INDEX_PATH,RSUBREAD_INDEX_BASE), readfile1=inputfilefwd, readfile2=inputfilervs, output_file=paste0(organism,"_samfile.sam"), output_format="SAM",nthreads = 4)

  outputsamfile <- paste0(path,"/", organism, "_samfile.sam")
  propmap <- propmapped(outputsamfile)

  feature_table <- read.table(paste0(path,"/ref_data/",feature_table),header=TRUE,sep = c("\t","\n"))
  GENOME_GFF <- readGFF(paste0(path,"/ref_data/",genome_gff))

  saf_file = gff_to_saf(GENOME_GFF)       # own function for creating a saf file
  if is.na(reads_2)) paired = FALSE else paired = TRUE
  feature_counts <-featureCounts(outputsamfile, annot.ext=saf_file, isPairedEnd=paired)
  return(feature_counts)
}

Input:

genome_gff  = "GCA_000221645.1_ASM22164v1_genomic.gff"

genome_fna = "GCA_000221645.1_ASM22164v1_genomic.fna"

feature_table = GCA_000221645.1_ASM22164v1_feature_table.txt"

reads_1 =   "SRR1916792_1.fastq"

reads_2="SRR1916792_2.fastq"

During the calculations there were also following print outs which look a bit buggy:

|| 311% completed, 17 mins elapsed, rate=7,8k fragments per second ||
|| 363% completed, 17 mins elapsed, rate=8,8k fragments per second ||
||365% completed, 18 mins elapsed, rate=8,7k fragments per second ||

Thanks for your help!

rna-seq rsubread • 491 views
ADD COMMENTlink modified 2 days ago by mariana31310 • written 15 months ago by TheCheshireCat0
1

Since you have enough space, I wonder if the problem is that something is wrong with your output directory. Maybe its not being named properly?

ADD REPLYlink written 15 months ago by swbarnes27.4k

It is correct since there is a temporary SAM file in the directory while it is calculating and it works till a cerain point where it throws the error message. For the one organism it was always at 51% (tried it multiple times) and for the other 365% (which is a bit strange).

ADD REPLYlink written 15 months ago by TheCheshireCat0

Hi! I hope you were able to figure out what was going on. Can I ask what your feature table input is? Is it like the metadata file? I'm learning this type of analysis and would love to understand this. Thanks a lot!

ADD REPLYlink written 2 days ago by mariana31310
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 849 users visited in the last hour