Question

Query about memory size usage of Hisat2

4.0 years ago

boymin2020 ▴ 80

Hi guys, I have recently been working with a batch of silkworm (Bombyx mori) RNA-seq data and have hit an error that I cannot debug. My workflow is below.

1. The silkworm genome sequence (SilkDB 3.0) is about 468.3 Mb across 28 chromosomes.

2. The Linux server I am using has 288 cores and 1 TB of memory.

3. Building the index files with hisat2-build completed without problems.

4. The hisat2 alignment step always fails with an error. An example command follows. Memory usage (%MEM) kept increasing after the job was submitted.

hisat2 -t -p 30 --dta -x /home/RNAseq_2/source/silkworm/index/silkworm_tran -1 /data/storage04/RNAseq_2/silkworm/majorbio/data4antivirus/cleandata/306D3D1a_R1-clean.fastq.gz -2 /data/storage04/RNAseq_2/silkworm/majorbio/data4antivirus/cleandata/306D3D1a_R2-clean.fastq.gz -S /data/storage04/RNAseq_2/silkworm/majorbio/data4antivirus/alignedFromHisat2Results/306D3D1a.sam

5. The final SAM file is expected to be about 22 GB, but %MEM has already reached 55 when the SAM file is only 7.8 GB.

6. I also tried running 5 similar jobs at once with 8 cores per job, which produced the following error message:

(ERR): hisat2-align died with signal 9 (KILL)

I have googled a lot without making any progress. Could you please help me figure out the issue and speed up the job?

Thanks in advance,

RNA-Seq Hisat2 silkworm • 2.1k views

ADD COMMENT • link 4.0 years ago by boymin2020 ▴ 80

The node should be more than capable of dealing with this task based on the specs. Did you use a scheduler such as SLURM? If so please post the header lines of the submission script. Probably you did not allocate enough memory and the scheduler might have killed it.

ADD REPLY • link 4.0 years ago by ATpoint 81k

Thanks for the fast reply. No scheduler is installed on the server, so I submit each job with nohup. Below is an example:

nohup bash ${id}_hisat2.sh > ${outDir}/shell/logerr/${id}_hisat2.nohup-logerr 2>&1 &
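Without a scheduler, one plausible sender of signal 9 is the Linux kernel's out-of-memory (OOM) killer, which logs each process it kills. A minimal sketch for searching the kernel log, assuming a Linux host where the kernel log is readable (this may need root). find_oom_kills is a hypothetical helper name, and the search patterns are an assumption about typical kernel wording, which can vary between kernel versions:

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only log lines that look like OOM-killer activity.
# Reads the files given as arguments, or standard input when none are given.
# The patterns are assumptions about typical kernel messages, not an exhaustive list.
find_oom_kills() {
    grep -i -E 'out of memory|oom-killer|killed process' "$@"
}

# Example usage (commented out; reading the kernel log may require root):
# dmesg -T | find_oom_kills
```

If a hisat2 process shows up in the matches, that would point to memory exhaustion on the machine rather than a failure inside HISAT2 itself.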

ADD REPLY • link 4.0 years ago by boymin2020 ▴ 80

Please run it on a single file with a plain bash command outside of that script and without any nohup, not sending it to background and without redirecting any streams. This will show where to start debugging. You can also run it just on a subset of the entire file for testing purposes.
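The subset test suggested here can be sketched as follows, assuming gzipped FASTQ input with the standard four lines per read. subset_fastq is a hypothetical helper name, and the file names in the usage comment are placeholders:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the first N reads of a gzipped FASTQ file
# to a new gzipped file. One FASTQ record spans 4 lines, so N reads
# correspond to the first 4*N lines.
subset_fastq() {
    local in_fq="$1" n_reads="$2" out_fq="$3"
    gzip -dc "$in_fq" | head -n "$((n_reads * 4))" | gzip -c > "$out_fq"
}

# Example usage (commented out; file names are placeholders):
# subset_fastq sample_R1.fastq.gz 100000 test_R1.fastq.gz
# subset_fastq sample_R2.fastq.gz 100000 test_R2.fastq.gz
```

Taking the same number of reads from the start of R1 and R2 keeps the mates in sync, provided the two files list the pairs in the same order.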

ADD REPLY • link 4.0 years ago by ATpoint 81k

I re-ran it on a single file with a plain bash command, without nohup, on my laptop (8 cores, 16 GB of memory) and got the same error. I then checked the original fastq files. The fastp QC report shows a big difference in the adapter section between samples that aligned successfully (~4 GB) and samples that failed (~6 GB).

INFO of a successful sample

Adapter or bad ligation of read1. The input has little adapter percentage (~0.247438%), probably it's trimmed before.

Sequence Occurrences

A 3051
G 2336
T 4015
other adapter sequences 206857

Adapter or bad ligation of read2. The input has little adapter percentage (~0.246621%), probably it's trimmed before.

Sequence Occurrences

A 3086
G 2277
T 4063
other adapter sequences 206838

INFO of a failed sample

Adapter or bad ligation of read1

Sequence Occurrences

A 14240
AG 13894
AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCGTGGAAATCTCGTATGCCGTCTTCTGCTT 42850
AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCGTGGAAATCTCGTATGCCGTCTTCTGCTTGAAAA 47767
AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCGTGGAAATCTCGTATGCCGTCTTCTGCTTGAAAAA 15307
other adapter sequences 1232207

Adapter or bad ligation of read2

Sequence Occurrences

A 14233
AG 13953
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTGCGTCTATGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAA 15624
other adapter sequences 1322715

As far as I can tell, these two samples come from different batches, one of which was trimmed before I received it. But I still do not know how to debug this. I would appreciate any advice.
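A quick way to compare adapter content between the two batches without a full QC run is to count the reads that still contain the adapter prefix. A rough sketch, assuming gzipped FASTQ input. count_adapter_reads is a hypothetical helper name, and the default search string AGATCGGAAGAGC is simply the prefix shared by the read1 and read2 adapter sequences reported above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count reads whose sequence line contains an adapter string.
# Only every 4th line starting from line 2 (the sequence line) is inspected.
# The default adapter is the prefix shared by the sequences in the fastp report above.
count_adapter_reads() {
    local fq="$1" adapter="${2:-AGATCGGAAGAGC}"
    gzip -dc "$fq" |
        awk -v a="$adapter" 'NR % 4 == 2 && index($0, a) > 0 { n++ } END { print n + 0 }'
}

# Example usage (commented out; the file name is a placeholder):
# count_adapter_reads sample_R1.fastq.gz
```

This only finds exact matches of the chosen prefix, so it gives a lower bound and is not a replacement for the adapter detection that fastp performs.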

ADD REPLY • link 3.9 years ago by boymin2020 ▴ 80

Did you verify that the index you created was good? Is this install of HISAT2 known to otherwise work well? You have more than adequate hardware capacity (assuming nothing else is consuming that capacity when you are running these jobs) for this to work.

ADD REPLY • link 4.0 years ago by GenoMax 141k

Yes, I have successfully run three samples from the same batch. PS: they have similar file sizes and were pre-processed with fastp.

ADD REPLY • link 4.0 years ago by boymin2020 ▴ 80

Do you get anything else printed after:

hisat2-align died with signal 9 (KILL)

Signal 9 (SIGKILL) means the process was killed from outside, commonly by the kernel when the machine runs out of memory, rather than exiting on its own. If you have other samples that have worked well with HISAT2 on this machine, then I would suggest that you investigate whether the fastq files for this particular sample are corrupt. It may be best to re-process the originals and see if you have better luck with newly made files. I hope you are trimming the paired-end data files together.
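Both checks mentioned here, file corruption and R1/R2 staying paired, can be sketched in a few lines, assuming gzipped FASTQ files with four lines per read. check_pair is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check a pair of gzipped FASTQ files.
# 1) gzip -t reads the whole archive and fails on truncation or a CRC mismatch.
# 2) R1 and R2 must have the same number of lines, and that number
#    must be a multiple of 4 (one FASTQ record = 4 lines).
check_pair() {
    local r1="$1" r2="$2" n1 n2
    gzip -t "$r1" || { echo "corrupt: $r1"; return 1; }
    gzip -t "$r2" || { echo "corrupt: $r2"; return 1; }
    n1=$(( $(gzip -dc "$r1" | wc -l) ))
    n2=$(( $(gzip -dc "$r2" | wc -l) ))
    if [ "$n1" -ne "$n2" ] || [ $((n1 % 4)) -ne 0 ]; then
        echo "mismatch: R1 has $n1 lines, R2 has $n2 lines"
        return 1
    fi
    echo "ok: $((n1 / 4)) read pairs"
}

# Example usage (commented out; file names are placeholders):
# check_pair sample_R1-clean.fastq.gz sample_R2-clean.fastq.gz
```

Equal line counts do not prove that the mates are in the same order, but unequal counts are a clear sign that the two files were trimmed or transferred separately.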

ADD REPLY • link 4.0 years ago by GenoMax 141k

Good advice. I am checking the original fastq files. The remote transfer from my laptop in the USA to the Linux server in China may be the cause.
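Whether the transfer damaged the files can be checked by comparing checksums from both ends. A minimal sketch, assuming md5sum from GNU coreutils is available on both machines. write_manifest, verify_manifest, and the checksums.md5 file name are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing files before and after a transfer.

# On the source machine: record one checksum per fastq.gz file in the directory.
write_manifest() {
    ( cd "$1" && md5sum ./*.fastq.gz > checksums.md5 )
}

# On the destination machine, after copying the files together with checksums.md5:
# md5sum -c recomputes each checksum and returns non-zero if any file differs.
verify_manifest() {
    ( cd "$1" && md5sum -c checksums.md5 )
}

# Example usage (commented out; directory names are placeholders):
# write_manifest /path/on/laptop/cleandata
# verify_manifest /path/on/server/cleandata
```

Any file reported as FAILED would need to be transferred again before re-running the alignment.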

ADD REPLY • link 4.0 years ago by boymin2020 ▴ 80