Question: TopHat2: "IOError: [Errno 28] No space left on device" with large input
Asked 2.2 years ago by antoinefelden:

I'm trying to run TopHat2 on my RNA-seq samples. A trial job on a small subset of the samples worked fine, but I can't run it on my full dataset. The trial run had around 30 GB of input and everything went smoothly, but when I tried the full dataset of ~250 GB, I got the following error:

Traceback (most recent call last):
  File "/srv/global/scratch/groups/sbs/TopHat/2.1.1/tophat", line 4107, in <module>
    sys.exit(main())
  File "/srv/global/scratch/groups/sbs/TopHat/2.1.1/tophat", line 4081, in main
    params.gff_annotation)
  File "/srv/global/scratch/groups/sbs/TopHat/2.1.1/tophat", line 2757, in compile_reports
    print >> run_log, " ".join(bamsort_cmd)
IOError: [Errno 28] No space left on device

Do you have any estimate of how much space TopHat2's temporary files would need with an input of that size? I'm trying to figure out what could take up so much space, given that our local work area has a capacity of 645 GB.

Thanks, Antoine


What operating system are you running on? And how is it set up - are there different partitions? What command are you using to run TopHat - specifically where are you directing the output to? How quickly does it give the error - straight away or after a while?

645 GB of capacity with 250 GB of input FASTQ sounds like it's going to be tight, given that SAM/BAM files will be created, etc. Is the 645 GB completely free, or is there other data in there? That is, how much of the 645 GB is actually free before TopHat starts?
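One way to answer the "how much is actually free" question before launching the job is to query the filesystem that will hold the output directory. The sketch below uses Python's standard `shutil.disk_usage`. The function names and the `overhead_factor` (how many times the input size might be written on top of the input as temp files and SAM/BAM) are hypothetical placeholders, not figures from TopHat's documentation.

```python
import shutil

GB = 1000 ** 3  # decimal gigabytes, matching how sizes are quoted in this thread


def free_bytes_at(path: str) -> int:
    """Free bytes available on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free


def enough_space(free_bytes: int, input_bytes: int, overhead_factor: float) -> bool:
    """Return True if free space covers the assumed extra footprint.

    overhead_factor is a hypothetical guess at how many times the input size
    will be written during the run; it is NOT a known TopHat constant.
    """
    return free_bytes >= input_bytes * overhead_factor


if __name__ == "__main__":
    print(f"free here: {free_bytes_at('.') / GB:.1f} GB")
    # Numbers from the thread: ~395 GB free after copying 250 GB of input.
    print(enough_space(395 * GB, 250 * GB, overhead_factor=1.5))  # True (375 GB needed)
```

Running the check against the actual work-area path (rather than `.`) right before submitting the job gives the real free space instead of the nominal capacity.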

Reply written 2.2 years ago by Tonor (modified 2.2 years ago)

I work on my university's HPC cluster, running Linux. I'm running TopHat to align 15 pairs of read files (30 files, ~8 GB each) with the --read-realign-edit-dist option, which is known to increase computing time. Could it increase memory requirements as well? All the raw files are copied into the work area before the TopHat run starts, so the effective free space in the local work area is 645 - 250 = ~400 GB. The global scratch partition (shared by everyone, I think) is 2 TB, but the error occurred during the TopHat run (the last checkpoints were "Joining segment hits" and "Reporting output tracks"), not while copying the output, so I'm guessing the problem is in the local work area.
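The arithmetic above (30 files of ~8 GB is ~240 GB, close to the quoted ~250 GB, leaving ~395 GB of a 645 GB area) can be reproduced from the actual files rather than estimated. A minimal sketch, assuming the FASTQ files sit in one directory and match a glob pattern; the pattern and function names are hypothetical.

```python
import glob
import os

GB = 1000 ** 3  # decimal gigabytes


def total_input_bytes(pattern: str) -> int:
    """Sum the on-disk sizes of all files matching a glob pattern."""
    return sum(os.path.getsize(p) for p in glob.glob(pattern))


def remaining_after_inputs(capacity_bytes: int, input_bytes: int) -> int:
    """Space left in the work area once the inputs have been copied in."""
    return capacity_bytes - input_bytes


if __name__ == "__main__":
    # Rough figures from the thread: 15 pairs of ~8 GB files.
    approx_inputs = 15 * 2 * 8 * GB
    print(approx_inputs / GB)  # 240.0
    print(remaining_after_inputs(645 * GB, 250 * GB) / GB)  # 395.0
    # Real usage (hypothetical path): total_input_bytes("/path/to/reads/*.fastq.gz")
```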

Reply written 2.2 years ago by antoinefelden

During its normal run, TopHat creates some rather large temporary files, which are written during steps like "Reporting output tracks". Hopefully those are compressed, though TopHat is old enough that I wouldn't be surprised if they weren't. Can you use a newer, faster, better aligner instead? HISAT2 and STAR are typically the go-to RNA-seq aligners these days.
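To see which step actually fills the disk, one option is to poll the size of TopHat's output directory while the job runs. The sketch below walks a directory tree with `os.walk` and sums file sizes. The path `tophat_out` is an assumption (TopHat's default output directory); point it at whatever you passed with `-o`. The polling interval and number of rounds are arbitrary illustrative values.

```python
import os
import time


def dir_size_bytes(root: str) -> int:
    """Total size of regular files under `root` (symlinks are not counted)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except FileNotFoundError:
                # Temp files may be deleted between listing and stat-ing.
                pass
    return total


def watch(root: str, interval_s: float = 60.0, rounds: int = 3) -> None:
    """Print the size of `root` every `interval_s` seconds, `rounds` times."""
    for _ in range(rounds):
        size_gb = dir_size_bytes(root) / 1000 ** 3
        print(f"{time.strftime('%H:%M:%S')} {size_gb:.1f} GB")
        time.sleep(interval_s)


if __name__ == "__main__":
    # Hypothetical path: adjust to the output directory given to TopHat.
    if os.path.isdir("tophat_out"):
        watch("tophat_out", interval_s=60.0, rounds=3)
```

Logging these sizes alongside TopHat's own checkpoint messages would show whether the growth happens at "Joining segment hits" or "Reporting output tracks".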

Reply written 2.2 years ago by Devon Ryan

I will use other aligners too, but I wanted to compare outputs (namely with BBTools; I haven't looked at HISAT2 or STAR). Since the temporary files seem to be the issue here, is there any way to tell TopHat to put them somewhere other than the local work area?

Reply written 2.2 years ago by antoinefelden

Not to my knowledge.

Reply written 2.2 years ago by Devon Ryan