Question: STAR alignment - segmentation fault error

asked 13 months ago by mgmohsen (Stanford):

Hi all,

I submitted the following job (star_align.sh) using Slurm to align a fastq file to a reference genome index that I generated using GENCODE v30:

STAR --runThreadN 16 --readFilesCommand zcat --quantMode GeneCounts --genomeDir ~/directory/to/genome/ --readFilesIn ~/directory/to/file.fastq.gz

Here is the slurm submission script that I've used to submit the job:

#SBATCH --job-name=star_align.sh
#SBATCH --time=24:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --mem-per-cpu=8G
#
srun star_align.sh
srun sleep 60

And this is the output that I see:

Jul 08 13:07:23 ..... Started STAR run
Jul 08 13:07:23 ..... Loading genome
Jul 08 13:08:11 ..... Started mapping
star_align.sh: line 2: 11554 Segmentation fault      (core dumped)

Does anyone have an idea about what might be going wrong here? Thanks in advance.

Tags: rna-seq, alignment

Difficult to tell. Please make a subset of the fastq file (maybe 1000 reads) and then align it with your script and the identical command. That will help show whether it is a general problem or rather a memory issue.

written 13 months ago by ATpoint

Does GENCODE make a proper STAR index? What files do you have in that directory besides the genome itself? And are you sure you don't want to be including a GTF file in there?

written 13 months ago by swbarnes2

Sorry, I meant to say that I used GENCODE files (fasta, GTF) to generate the genome index, which I did using STAR.

written 13 months ago by mgmohsen

Trying the alignment again with a subset of the first 1000 reads, I get the following error message:

ReadAlignChunk_processChunks.cpp:115:processChunks EXITING because of FATAL ERROR in input reads: unknown file format: the read ID should start with @ or > 

Jul 08 15:09:01 ...... FATAL ERROR, exiting
written 13 months ago by mgmohsen

Okay, so something is wrong with the fastq. What do the first 10 lines look like? If the first 10 lines look fine, maybe the fastq is garbled further down.
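
For example, using the path from your STAR command (adjust it to your actual file):

zcat ~/directory/to/file.fastq.gz | head -n 10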

written 13 months ago by swbarnes2

Here are the first 10 lines:

@COOPER:276:H2HTMBBXY:7:1101:10003:1209 1:N:0:NTTGTACT
NTGATGAGTGAGTGTCTAGATCGGAAGAGCACACGTCTGAACTCCAGTCACCTTGTACTATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAACAA
+
#AAFFJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ<<<----
--
@COOPER:276:H2HTMBBXY:7:1101:10003:1349 1:N:0:NTTGTACT
NACATGAGTATTAGGCCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACCTTGTACTATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAAAAA
+
#AAAFJJJFJJJJJJJJJFJJJJJJJJFJFJJJJJJJJJJJJJJJJJJFAJJJJJJJJJJFFJJJJJJJJJJFJJJJJJJFJJJJJJJJJJJJJ<-<----
--
written 13 months ago by mgmohsen

Those look okay, but you can confirm by making a baby fastq with just those two reads and seeing if that runs.

written 13 months ago by swbarnes2

How did you make the subset? Do something like zcat your.fastq.gz | head -n 4000 > subset.fq, and be sure that you only use multiples of 4, as a fastq record consists of four lines.
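
For example (your.fastq.gz is a placeholder for your actual file; keep the subset gzipped if you leave --readFilesCommand zcat in your STAR call, or drop that option when mapping an uncompressed subset.fq):

zcat your.fastq.gz | head -n 4000 | gzip > subset.fastq.gz
zcat subset.fastq.gz | wc -l    # sanity check: should print 4000, i.e. 1000 reads x 4 lines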

written 13 months ago by ATpoint

OK, that was definitely the issue with the subset of 1000 reads that I generated. Trying again with a properly made subset, I got the same error as I did with the original fastq file. This seems to indicate that the issue isn't with memory usage.

written 13 months ago by mgmohsen

#SBATCH --mem-per-cpu=8G

I would recommend against using that option. STAR needs at least 30G of RAM for human-sized genomes, so allocate more RAM to the entire job using the #SBATCH --mem=40g option.
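
That is, something along these lines in your submission header (the other lines unchanged from what you posted):

#SBATCH --job-name=star_align.sh
#SBATCH --time=24:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --mem=40g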

written 13 months ago by genomax

Thanks for the recommendation, I will be sure to increase RAM to 40G for future job submissions. However, I'm still getting the same error for this alignment, even after setting it to 40G of RAM.

written 13 months ago by mgmohsen

Did you make these indexes with the version of STAR currently installed on your cluster? There was no error during that process? Have they been tested and are known to work?
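
If you are not sure which binary your job is calling, you can print its version with:

STAR --version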

written 13 months ago by genomax

Yes, I generated the indexes with the same version of STAR and there was no error during the process. However, this alignment is my first attempt to test them, so they are not known to work. Is there any standard test that I can do to make sure there's no problem with my indexes?

written 13 months ago by mgmohsen

Can you post the command used to generate the STAR index?

Compare the files you have in your STAR index with the following listing to make sure you have most of these files and that they are of similar size (use du -shc * to determine file sizes; see also the note after the listing).

34K     chrLength.txt
66K     chrNameLength.txt
34K     chrName.txt
34K     chrStart.txt
49M     exonGeTrInfo.tab
20M     exonInfo.tab
2.1M    geneInfo.tab
3.9G    Genome
34K     genomeParameters.txt
354K    Log.out
30G     SA
1.8G    SAindex
12M     sjdbInfo.txt
13M     sjdbList.fromGTF.out.tab
11M     sjdbList.out.tab
16M     transcriptInfo.tab
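
As an additional quick check, genomeParameters.txt inside the index directory records the options that were used at genomeGenerate time (e.g. the fasta and GTF paths), so you can confirm the index was built the way you intended (the path below is a placeholder):

cat /path/to/star_index/genomeParameters.txt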
written 13 months ago by genomax

Command used to generate STAR index:

STAR --runThreadN 16 --runMode genomeGenerate --genomeDir ./star_index --genomeFastaFiles ./GRCh38.p12.genome.fa --sjdbGTFfile ./gencode.v30.annotation.gtf

Output of du -shc *

26K     chrLength.txt
50K     chrNameLength.txt
26K     chrName.txt
26K     chrStart.txt
66M     exonGeTrInfo.tab
27M     exonInfo.tab
1.7M    geneInfo.tab
4.7G    Genome
26K     genomeParameters.txt
37G     SA
2.2G    SAindex
16M     sjdbInfo.txt
14M     sjdbList.fromGTF.out.tab
14M     sjdbList.out.tab
19M     transcriptInfo.tab
44G     total
written 13 months ago by mgmohsen

Those files look to be close enough in size, but I don't see a Log.out file. Did you save it somewhere else? It should say something like this at the end.

Mar 04 17:05:04 ... writing SAindex to disk
Writing 8 bytes into ./SAindex ; empty space on disk = 1925055853363200 bytes ... done
Writing 120 bytes into ./SAindex ; empty space on disk = 1925055853363200 bytes ... done
Writing 1565873491 bytes into ./SAindex ; empty space on disk = 1925055853363200 bytes ... done
Mar 04 17:05:11 ..... finished successfully
DONE: Genome generation, EXITING

Are you using a pre-compiled version of STAR or did the admins of this cluster compile/install from source?

written 13 months ago by genomax

Yes, I saved it somewhere else, and it has a very similar ending to yours. I'm using a pre-compiled version of STAR (version 2.7.1a).

Jul 03 12:10:33 ... writing SAindex to disk
Writing 8 bytes into ./star_index/SAindex ; empty space on disk = 172171319050240 bytes ... done
Writing 120 bytes into ./star_index/SAindex ; empty space on disk = 172171319050240 bytes ... done
Writing 1565873491 bytes into ./star_index/SAindex ; empty space on disk = 172171319050240 bytes ... done
Jul 03 12:10:40 ..... Finished successfully
DONE: Genome generation, EXITING
written 13 months ago by mgmohsen

Yes, I agree @genomax, because I faced the same issue when I was trying to submit my batch script on the server. In my case I used 31 GB for the human genome.

written 13 months ago by archana.bioinfo87

Answer: 13 months ago by genomax (United States):

Looks like you have paired-end data. Is that correct? These files should be provided to STAR as --readFilesIn /path_to/R1_file.gz /path_to/R2_file.gz. Can you explicitly use a pair of R1/R2 files when you submit the job?
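
For example, keeping your original options but listing both mates (the R1/R2 file names here are placeholders for your actual pair):

STAR --runThreadN 16 --readFilesCommand zcat --quantMode GeneCounts --genomeDir ~/directory/to/genome/ --readFilesIn ~/directory/to/file_R1.fastq.gz ~/directory/to/file_R2.fastq.gz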


Yes! This was the issue; both R1/R2 files need to be supplied at once. Thank you for all your help.

written 13 months ago by mgmohsen