Query about memory size usage of Hisat2
13 months ago
boymin2020 ▴ 20

Hi guys, I have recently been working with a batch of silkworm (Bombyx mori) RNA-seq data and hit an error that I cannot debug. Below is my workflow.

1. The silkworm genome sequence (SilkDB 3.0) is about 468.3 Mb across 28 chromosomes.

2. The Linux server I am using has 288 cores and 1 TB of memory.

3. Building the index files with hisat2-build completed without any problem.

4. The hisat2 alignment step always fails. The following is an example. Memory usage (%MEM) keeps increasing after the job is submitted.

hisat2 -t -p 30 --dta -x /home/RNAseq_2/source/silkworm/index/silkworm_tran -1 /data/storage04/RNAseq_2/silkworm/majorbio/data4antivirus/cleandata/306D3D1a_R1-clean.fastq.gz -2 /data/storage04/RNAseq_2/silkworm/majorbio/data4antivirus/cleandata/306D3D1a_R2-clean.fastq.gz -S /data/storage04/RNAseq_2/silkworm/majorbio/data4antivirus/alignedFromHisat2Results/306D3D1a.sam

5. The final SAM file is expected to be about 22 GB, but %MEM is already at 55 when the SAM file is only 7.8 GB.

6. I also tried running five similar jobs with 8 cores per job, which resulted in the following error message:

(ERR): hisat2-align died with signal 9 (KILL)

I have googled a lot without making any progress. Could you please help me figure out the issue and speed up the job?

RNA-Seq Hisat2 silkworm • 479 views

Based on the specs, the node should be more than capable of handling this task. Did you use a scheduler such as SLURM? If so, please post the header lines of the submission script. You probably did not allocate enough memory, and the scheduler may have killed the job.


Thanks for the fast comment. No scheduler is installed on the server, so I submit the job with nohup. Below is an example.

nohup bash ${id}_hisat2.sh >${outDir}/shell/logerr/\${id}_hisat2.nohup-logerr 2>&1 &


Please run it on a single file with a plain bash command outside of that script and without any nohup, not sending it to background and without redirecting any streams. This will show where to start debugging. You can also run it just on a subset of the entire file for testing purposes.
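For the subset test, a minimal sketch along these lines could work. It assumes gzipped FASTQ files with exactly four lines per read; the function name `subset_fastq` and the file names in the usage comments are hypothetical.

```shell
#!/usr/bin/env bash
# Write the first N reads (4 lines per read) of a gzipped FASTQ file to a new gzipped file.
# Usage: subset_fastq <in.fastq.gz> <out.fastq.gz> <n_reads>
subset_fastq() {
    local in_fq=$1 out_fq=$2 n_reads=$3
    # head closes the pipe once it has enough lines, so the decompressor stops early.
    gzip -cd "$in_fq" | head -n "$((n_reads * 4))" | gzip -c > "$out_fq"
}

# Hypothetical usage: take the same N from both mates so the pairs stay in sync.
# subset_fastq sample_R1.fastq.gz sub_R1.fastq.gz 100000
# subset_fastq sample_R2.fastq.gz sub_R2.fastq.gz 100000
```

If the subset aligns cleanly but the full files do not, the problem is more likely in the data further down the files or in memory use than in the index or the install.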


I re-ran it on a single file with a plain bash command, without nohup, on my laptop (8 cores, 16 GB memory), and got the same error. I then checked the original fastq files. The fastp QC report shows a big difference in the adapter section between samples that aligned successfully (~4 GB) and samples that failed (~6 GB).

INFO of a successful sample

Sequence Occurrences

1. A 3051
2. G 2336
3. T 4015
4. other adapter sequences 206857

Sequence Occurrences

1. A 3086
2. G 2277
3. T 4063
4. other adapter sequences 206838

INFO of a failed sample

Sequence Occurrences

1. A 14240
2. AG 13894
3. AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCGTGGAAATCTCGTATGCCGTCTTCTGCTT 42850
4. AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCGTGGAAATCTCGTATGCCGTCTTCTGCTTGAAAA 47767
5. AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCGTGGAAATCTCGTATGCCGTCTTCTGCTTGAAAAA 15307

Sequence Occurrences

1. A 14233
2. AG 13953
3. AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTGCGTCTATGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAA 15624

What I can tell is that these two samples came from different batches, one of which had already been trimmed before I received it. But I still do not know how to debug this. I would appreciate any advice.


Did you verify that the index you created was good? Is this install of HISAT2 known to otherwise work well? You have more than adequate hardware capacity (assuming nothing else is consuming that capacity when you are running these jobs) for this to work.
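To check whether something else is consuming memory, or how much the aligner itself is using, a rough sketch like the following reads the numbers straight from /proc. This is Linux only; the function names are hypothetical, and the `pgrep` pattern in the usage comment is an assumption based on the process name in the error message.

```shell
#!/usr/bin/env bash
# Print the resident set size (VmRSS, in kB) of a process, read from /proc (Linux only).
# Usage: rss_kb <pid>
rss_kb() {
    awk '/^VmRSS:/ {print $2}' "/proc/$1/status"
}

# Print total and currently available system memory (values are in kB).
mem_summary() {
    awk '/^MemTotal:|^MemAvailable:/ {print $1, $2}' /proc/meminfo
}

# Hypothetical usage: sample the aligner's memory every 30 s while it runs.
# pid=$(pgrep -f hisat2-align | head -n 1)
# while kill -0 "$pid" 2>/dev/null; do
#     echo "$(date +%T) rss=$(rss_kb "$pid") kB"; mem_summary; sleep 30
# done
```

A resident size that climbs steadily until the process disappears would point to memory exhaustion rather than a bad index.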


Yes, I have successfully run three samples from the same batch. PS: they have similar file sizes and were pre-processed with fastp.


Do you get anything else printed after:

hisat2-align died with signal 9 (KILL)

Signal 9 (SIGKILL) means the process was killed from outside, for example by the kernel's out-of-memory killer, rather than exiting on its own. If other samples have worked well with HISAT2 on this machine, then I would suggest investigating whether the fastq files for this particular sample are corrupt. It may be best to re-process the originals and see if you have better luck with newly made files. I hope you are trimming the paired-end data files together.
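A quick sanity check on a pair of files could look like the sketch below. It assumes gzipped FASTQ with exactly four lines per read, and the function name `check_fastq_pair` is hypothetical. It tests gzip integrity, that each line count is a multiple of 4, and that both mates contain the same number of reads; it does not validate the read contents themselves.

```shell
#!/usr/bin/env bash
# Check a pair of gzipped FASTQ files.
# Prints "OK <n_reads>" and returns 0 on success; prints a reason and returns 1 otherwise.
# Usage: check_fastq_pair <R1.fastq.gz> <R2.fastq.gz>
check_fastq_pair() {
    local r1=$1 r2=$2 n1 n2
    # gzip -t fails on truncated or damaged archives.
    gzip -t "$r1" 2>/dev/null || { echo "corrupt gzip: $r1"; return 1; }
    gzip -t "$r2" 2>/dev/null || { echo "corrupt gzip: $r2"; return 1; }
    n1=$(gzip -cd "$r1" | wc -l | tr -d ' ')
    n2=$(gzip -cd "$r2" | wc -l | tr -d ' ')
    if [ $((n1 % 4)) -ne 0 ] || [ $((n2 % 4)) -ne 0 ]; then
        echo "line count not a multiple of 4: R1=$n1 R2=$n2"; return 1
    fi
    if [ "$n1" -ne "$n2" ]; then
        echo "read counts differ: R1=$((n1 / 4)) R2=$((n2 / 4))"; return 1
    fi
    echo "OK $((n1 / 4))"
}
```

Mates with different read counts usually mean the two files were trimmed or filtered separately, or that one of them was cut short.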


Good advice. I am checking the original fastq files. Maybe the remote transfer from my laptop in the USA to the Linux server in China is the cause.
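One way to test the transfer hypothesis is to compare checksums computed on both machines. The sketch below assumes GNU coreutils `md5sum` is available on the server; the function names and the file name in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Print only the MD5 digest of a file.
# Usage: file_md5 <file>
file_md5() {
    md5sum "$1" | awk '{print $1}'
}

# Compare a file against a digest recorded on the sending machine.
# Returns 0 when the digests match, 1 otherwise.
# Usage: verify_md5 <file> <expected_digest>
verify_md5() {
    local file=$1 expected=$2
    [ "$(file_md5 "$file")" = "$expected" ]
}

# Hypothetical usage on the server, with the digest copied from the laptop:
# verify_md5 sample_R1.fastq.gz "<digest from the laptop>" && echo "match" || echo "MISMATCH"
```

A mismatch means the file changed in transit and should be copied again before re-running the alignment.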