8 months ago by
Segmentation faults are caused by a program accessing memory at an address that is outside the region allocated to it by the operating system. The two most common reasons for this to happen are:
Running out of memory. I don't know how much memory Subjunc (the aligner that align calls) uses, but aligners can use anything from 4 GB (Hisat) to 32 GB (STAR). However, memory usage is usually determined by the size of the index, not the size of the input, and so should be the same for every sample. Still, worth checking.
An unexpected input being processed to produce an invalid memory address. This seems more likely. I would start with a quick manual inspection of the fastq file, to check that everything looks okay. I'd then check that the number of lines in the fastq file is a multiple of 4. If neither of these reveals the problem, you'll need to do a binary search for the record causing the problem.
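If it helps, the multiple-of-4 check can be sketched as a small shell function. The name check_fastq_lines is just something I made up for illustration, and it assumes a gzipped fastq with the usual 4 lines per record (gzip -dc does the same job as zcat here):

```shell
# Hypothetical helper: check that a gzipped fastq has a line count divisible by 4.
# Assumes the standard 4-lines-per-record layout (no wrapped sequence lines).
check_fastq_lines() {
    # The outer $(( )) strips any padding that some versions of wc add.
    fq_lines=$(( $(gzip -dc "$1" | wc -l) ))
    if [ $(( fq_lines % 4 )) -eq 0 ]; then
        echo "OK: $fq_lines lines ($(( fq_lines / 4 )) records)"
    else
        echo "BAD: $fq_lines lines is not a multiple of 4"
        return 1
    fi
}
```

You'd call it as check_fastq_lines myreads.fastq.gz. A BAD result usually points at a truncated file or a broken record somewhere.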
Binary search for problem reads
Start by dividing your input into two equally sized files. For example, if my fastq contains 1,000,000 reads (4,000,000 lines, at 4 lines per read), I'd divide it in two with:
zcat myreads.fastq.gz | head -n2000000 | gzip > half1.fastq.gz
zcat myreads.fastq.gz | tail -n2000000 | gzip > half2.fastq.gz
Now try mapping each half. If neither causes an error, then the problem isn't your input file. If half1 causes the problem, divide that into two:
zcat half1.fastq.gz | head -n1000000 | gzip > quarter1.fastq.gz
zcat half1.fastq.gz | tail -n1000000 | gzip > quarter2.fastq.gz
Otherwise, if half2 causes the problem, divide that (if they both do, just pick one).
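If you don't want to work out the line counts by hand at every step, here is a rough sketch of a function that does the halving for you. The name split_fastq_half and the _1/_2 output suffixes are my own invention, and it assumes a single-end, gzipped fastq with 4 lines per record:

```shell
# Hypothetical helper: split a gzipped fastq into two halves on a record boundary.
# Usage: split_fastq_half input.fastq.gz output_prefix
split_fastq_half() {
    sp_in=$1
    sp_prefix=$2
    sp_total=$(( $(gzip -dc "$sp_in" | wc -l) ))
    sp_records=$(( sp_total / 4 ))
    # Lines in the first half, rounded up to whole 4-line records.
    sp_first=$(( (sp_records + 1) / 2 * 4 ))
    gzip -dc "$sp_in" | head -n "$sp_first" | gzip > "${sp_prefix}_1.fastq.gz"
    # tail -n +N starts output at line N, so this takes everything after the first half.
    gzip -dc "$sp_in" | tail -n +"$(( sp_first + 1 ))" | gzip > "${sp_prefix}_2.fastq.gz"
}
```

For example, split_fastq_half myreads.fastq.gz half would write half_1.fastq.gz and half_2.fastq.gz, and you then run it again on whichever half still crashes. If your data is paired-end, you would need to split both mate files at the same point so the pairs stay in sync.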
Repeat this procedure until you've got a small enough number of reads left to inspect manually. If you can't see anything wrong, you might try piping the reads through cat -A to see if there are hidden characters. If every read is causing a problem, then something other than an invalid read is causing the problem.
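As a variation on the cat -A idea, something like the following would list only the lines that contain hidden characters, rather than making you scan everything by eye. This is a sketch under my own assumptions: find_hidden_chars is a made-up name, and it uses cat -v (which displays a carriage return as ^M) because cat -A is not available on every system:

```shell
# Hypothetical helper: print lines of a gzipped fastq that contain non-printable bytes.
# Output format is LINE_NUMBER:CONTENT, with control characters made visible by cat -v.
find_hidden_chars() {
    # LC_ALL=C makes [:print:] mean plain printable ASCII (space through ~),
    # so tabs, carriage returns and non-ASCII bytes all get flagged.
    # -a forces grep to treat the input as text even if it contains odd bytes.
    gzip -dc "$1" | LC_ALL=C grep -a -n '[^[:print:]]' | cat -v
}
```

No output means no hidden characters were found. A line such as 2:ACGT^M would suggest Windows-style line endings.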