Question

"Segmentation fault" error from running align function in Rsubread package

Asked 6.9 years ago by wangdp123

Hi there,

When I run the align function from the Rsubread package on one particular RNA-Seq sample, the program stops at "50% completed" with the error message "Segmentation fault". It works for the other samples; only this one fails.

Does anybody know how to tackle this problem?

Thank you very much,

Regards,

Tom

Tags: RNA-Seq, Rsubread

Check if this BAM file is corrupt.

(reply by GenoMax, 6.9 years ago)

Input to align is fastq, not BAM.

(reply by i.sudbery, 6.9 years ago)

Answer (2018-08-16)

A segmentation fault happens when a program tries to access memory at an address outside the region the operating system has allocated to it. The two most common reasons for this are:

1. Running out of memory. I don't know how much memory Subread (the aligner that align runs; Subjunc is run by the separate subjunc function) uses, but aligners can need anything from about 4 GB (HISAT) to 32 GB (STAR). However, memory usage is usually determined by the size of the index rather than the size of the input, so it should be about the same for every sample. Still, it's worth checking.
2. Unexpected input that gets processed into an invalid memory address. This seems more likely here, since only one sample fails. I would start with a quick manual inspection of the FASTQ file to check that everything looks okay, then check that the number of lines in the file is a multiple of 4 (every FASTQ record is four lines). If neither of these reveals the problem, you'll need to do a binary search for the record causing it.
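The manual inspection and line-count check above can be partly automated. The following is a rough sketch, not a full FASTQ validator: check_fastq is a made-up helper name (not part of Rsubread), and it assumes the usual four-line records with no wrapped sequence lines. It reports the first carriage return it finds (a Windows line ending, which cat -A shows as ^M), a header not starting with @, a separator not starting with +, a quality string whose length differs from its sequence, or a line count that isn't a multiple of 4.

```shell
#!/usr/bin/env bash
# check_fastq: hypothetical helper for a quick structural sanity check of a
# FASTQ file (plain or gzipped). Assumes four-line records with no wrapped
# sequence lines. Prints the first problem found and returns 1, or prints
# "OK: <n> reads" and returns 0.
check_fastq() {
  # gzip -cdf decompresses .gz input and passes plain text through unchanged
  gzip -cdf -- "$1" | LC_ALL=C awk '
    index($0, "\r") { printf "line %d: carriage return (Windows line ending?)\n", NR; bad = 1; exit }
    NR % 4 == 1 && substr($0, 1, 1) != "@" { printf "line %d: header does not start with @\n", NR; bad = 1; exit }
    NR % 4 == 2 { seqlen = length($0) }
    NR % 4 == 3 && substr($0, 1, 1) != "+" { printf "line %d: separator does not start with +\n", NR; bad = 1; exit }
    NR % 4 == 0 && length($0) != seqlen {
      printf "line %d: quality length %d differs from sequence length %d\n", NR, length($0), seqlen
      bad = 1; exit
    }
    END {
      if (bad) exit 1
      if (NR % 4 != 0) { printf "line count %d is not a multiple of 4\n", NR; exit 1 }
      printf "OK: %d reads\n", NR / 4
    }'
}
```

Usage: check_fastq myreads.fastq.gz. Passing this check doesn't prove the file is fine; it only rules out the most common kinds of damage, such as a truncated download or mismatched sequence and quality lines.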

Binary search for problem reads

Start by dividing your input into two equally sized files. For example, if the FASTQ file contains 1,000,000 reads (4,000,000 lines), I'd divide it in two with:

# recompress with gzip so the .gz outputs really are gzipped
zcat myreads.fastq.gz | head -n 2000000 | gzip > half1.fastq.gz
zcat myreads.fastq.gz | tail -n 2000000 | gzip > half2.fastq.gz

Now try mapping each half. If neither causes an error, the problem isn't your input file. If half1 causes the error, divide it into two:

zcat half1.fastq.gz | head -n 1000000 | gzip > quarter1.fastq.gz
zcat half1.fastq.gz | tail -n 1000000 | gzip > quarter2.fastq.gz

Otherwise, if half2 causes the error, divide that one instead (if both do, just pick one).

Repeat this procedure until you're left with few enough reads to inspect manually. If nothing looks wrong, try piping them through cat -A to reveal hidden characters (a Windows carriage return, for example, shows up as ^M). If the error persists no matter how small the subset, so that every read seems to trigger it, then something other than one or more invalid reads is causing the problem.
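The halving procedure can be scripted. Below is a minimal sketch of a hypothetical bash helper, bisect_fastq, that repeatedly splits the current failing file at a record boundary and keeps whichever half still fails. It assumes four-line FASTQ records and that you supply your own test command: something that takes one FASTQ path and exits non-zero when the crash reproduces, for example a small wrapper script that runs align on that file. That wrapper is an assumption and is not part of Rsubread. Unlike the fixed head/tail line counts above, it computes the split point, so it works for any number of reads.

```shell
#!/usr/bin/env bash
# bisect_fastq: hypothetical helper that automates the halving procedure.
#   $1  gzipped FASTQ that triggers the crash
#   $2  name of a command or function that takes one FASTQ path and exits
#       non-zero when the crash reproduces (a wrapper you write yourself)
#   $3  stop when this many reads or fewer remain (default 10)
# Prints the path of the smallest failing subset it found.
bisect_fastq() {
  local current=$1 test_cmd=$2 min_reads=${3:-10}
  local workdir step=0 lines reads first half1 half2
  workdir=$(mktemp -d)
  while :; do
    lines=$(gzip -cd -- "$current" | wc -l)
    reads=$(( lines / 4 ))
    [ "$reads" -le "$min_reads" ] && break
    step=$(( step + 1 ))
    first=$(( (reads + 1) / 2 ))   # reads in the first half, rounded up
    half1=$workdir/step${step}_1.fastq.gz
    half2=$workdir/step${step}_2.fastq.gz
    # Split at a record boundary (4 lines per read), keeping gzip compression
    gzip -cd -- "$current" | head -n $(( first * 4 )) | gzip > "$half1"
    gzip -cd -- "$current" | tail -n +$(( first * 4 + 1 )) | gzip > "$half2"
    if ! "$test_cmd" "$half1"; then
      current=$half1
    elif ! "$test_cmd" "$half2"; then
      current=$half2
    else
      # Neither half fails on its own, so the crash cannot be pinned on
      # reads from a single half; stop and report the last failing file.
      break
    fi
  done
  echo "$current"
}
```

Usage: bisect_fastq myreads.fastq.gz ./my_align_test.sh 10, where my_align_test.sh is your own (hypothetical) wrapper. The intermediate halves are left in a temporary directory so you can run cat -A on the printed file afterwards.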