Rsubread: ERROR: cannot finish the SAM file
0
0
Entering edit mode
5.5 years ago

Hello everyone, I want to get the read counts for RNA-Seq data by using the package Rsubread. I got my RNA-Seq data from sra and all other files (genes, genome,...) from NCBI assembly. I tried the following code on 4 different organisms and for 2 it worked perfectly fine but for the other two I got the following error:

ERROR: cannot finish the SAM file! Please check the disk space in the output directory. No output file was generated.

Fehler in .load.delete.summary(output_file[i]) : ERROR: Summary file Bacillus_amyloliquefaciens_samfile.sam.summary was not generated! The program terminated wrongly!

There is enough disk space available. Here is my code:

rna_counts <- function(path, organism, genome_gff, genome_fna, feature_table, reads_1, reads_2=NULL){
  require(Rsubread)
  require(rtracklayer)
  require(biomaRt)
  RNAseqDATADIR <- path
  setwd(RNAseqDATADIR)
  dir(RNAseqDATADIR)

  REF_GENOME <- paste0("ref_data/",genome_fna)
  RSUBREAD_INDEX_PATH <- "ref_data"
  RSUBREAD_INDEX_BASE <- organism
  buildindex(basename=file.path(RSUBREAD_INDEX_PATH,RSUBREAD_INDEX_BASE), reference=REF_GENOME)

  inputfilefwd <- file.path(RNAseqDATADIR, paste0("raw_data/", reads_1))
  if (!is.na(reads_2)) inputfilervs <- file.path(RNAseqDATADIR, paste0("/raw_data/", reads_2))
  else inputfilervs <- NA

  align(index=file.path(RSUBREAD_INDEX_PATH,RSUBREAD_INDEX_BASE), readfile1=inputfilefwd, readfile2=inputfilervs, output_file=paste0(organism,"_samfile.sam"), output_format="SAM",nthreads = 4)

  outputsamfile <- paste0(path,"/", organism, "_samfile.sam")
  propmap <- propmapped(outputsamfile)

  feature_table <- read.table(paste0(path,"/ref_data/",feature_table),header=TRUE,sep = c("\t","\n"))
  GENOME_GFF <- readGFF(paste0(path,"/ref_data/",genome_gff))

  saf_file = gff_to_saf(GENOME_GFF)       # own function for creating a saf file
  if is.na(reads_2)) paired = FALSE else paired = TRUE
  feature_counts <-featureCounts(outputsamfile, annot.ext=saf_file, isPairedEnd=paired)
  return(feature_counts)
}

Input:

genome_gff  = "GCA_000221645.1_ASM22164v1_genomic.gff"

genome_fna = "GCA_000221645.1_ASM22164v1_genomic.fna"

feature_table = GCA_000221645.1_ASM22164v1_feature_table.txt"

reads_1 =   "SRR1916792_1.fastq"

reads_2="SRR1916792_2.fastq"

During the calculations there were also following print outs which look a bit buggy:

|| 311% completed, 17 mins elapsed, rate=7,8k fragments per second ||
|| 363% completed, 17 mins elapsed, rate=8,8k fragments per second ||
||365% completed, 18 mins elapsed, rate=8,7k fragments per second ||

Thanks for your help!

RNA-Seq rna-seq rsubread • 1.9k views
ADD COMMENT
1
Entering edit mode

Since you have enough space, I wonder if the problem is that something is wrong with your output directory. Maybe its not being named properly?

ADD REPLY
0
Entering edit mode

It is correct since there is a temporary SAM file in the directory while it is calculating and it works till a cerain point where it throws the error message. For the one organism it was always at 51% (tried it multiple times) and for the other 365% (which is a bit strange).

ADD REPLY
0
Entering edit mode

Hi! I hope you were able to figure out what was going on. Can I ask what your feature table input is? Is it like the metadata file? I'm learning this type of analysis and would love to understand this. Thanks a lot!

ADD REPLY

Login before adding your answer.

Traffic: 2574 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6