How to align zebrafish RNA-seq data against a reference genome with splice sites using hisat2?
3.2 years ago
tara ▴ 30

Hey,

I want to align RNA-seq fastq files against a reference genome with splice sites using hisat2. First, I downloaded a reference genome and built an index with the hisat2-build command.

wget ftp://ftp.ensembl.org/pub/release-102/fasta/danio_rerio/dna/Danio_rerio.GRCz11.dna.primary_assembly.fa.gz
gzip -d Danio_rerio.GRCz11.dna.primary_assembly.fa.gz
mv Danio_rerio.GRCz11.dna.primary_assembly.fa genome.fa
hisat2-build -p 16 genome.fa genome

Then I downloaded the corresponding gtf file and created the hisat2 specific splice sites file:

wget ftp://ftp.ensembl.org/pub/release-102/gtf/danio_rerio/Danio_rerio.GRCz11.102.gtf.gz
gzip -d Danio_rerio.GRCz11.102.gtf.gz
mv Danio_rerio.GRCz11.102.gtf genome.gtf 
hisat2_extract_splice_sites.py genome.gtf > genome.txt

Finally, I tried to align the paired end fastq files against the index.

hisat2 --dta --known-splicesite-infile genome.txt -x genome -1 fastq/mut_1_1.fastq.gz -2 fastq/mut_1_2.fastq.gz > hisat2/mut_1.sam

The output file is generated. But when it reaches a size of 4.0 GB, there is always this error message:

Error while flushing and closing output  
terminate called after throwing an instance of 'int' 
Aborted (core dumped)
(ERR): hisat2-align exited with value 134

I searched for what this error could mean, and it might be caused by too little storage space. I checked the storage, and the hard drive has 75 GB of free space. All my files are on an external drive, which has nearly 400 GB of free space left. This should be enough.
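Because the failure always appears once the output reaches 4.0 GB, free space may not be the only factor to check: some file systems cap the size of a single file (FAT32, for instance, limits a file to 4 GiB). Whether that applies here is an assumption, not something confirmed in this thread. A minimal sketch for inspecting the output location, assuming a Linux system with GNU coreutils; `OUT_DIR` is a placeholder for the directory that receives the SAM file:

```shell
# Sketch: inspect the location where the SAM file is written.
# OUT_DIR is a placeholder (assumption); point it at the real output
# directory, e.g. the hisat2/ folder used in the alignment command.
OUT_DIR="${OUT_DIR:-.}"

# Free space on the file system that holds OUT_DIR (POSIX output format).
df -P "$OUT_DIR"

# File system type as reported by GNU stat; falls back to "unknown"
# when this stat variant does not support the -f -c options.
fs_type="$(stat -f -c %T "$OUT_DIR" 2>/dev/null || echo unknown)"
echo "file system type: $fs_type"

# Per-process file size limit of the current shell
# ("unlimited" is the usual default).
echo "max file size (ulimit -f): $(ulimit -f)"
```

If the reported type is a FAT-family name such as `msdos`, the 4 GiB per-file limit would be worth ruling out, for example by writing the output to a different file system.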

Can you help me understand and solve this error?


Is it intentional that the splice file ends with .ss but you feed a .txt to hisat?

hisat2_extract_splice_sites.py genome.gtf > genome.ss
hisat2 --dta --known-splicesite-infile => genome.txt <= right there

Thanks for the hint. I corrected the code above. I wondered whether the splice site file was causing the error, so I tried aligning the fastq files against the reference genome without it:

hisat2 --dta -x genome -1 fastq/mut_1_1.fastq.gz -2 fastq/mut_1_2.fastq.gz > hisat2/mut_1.sam

But this leads to the same error as above.
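Independently of the splice site file, the large uncompressed SAM file can be avoided by piping the hisat2 output into samtools and writing a sorted BAM file, which is usually several times smaller. Whether this sidesteps the error is an assumption, not something confirmed in this thread. The sketch below assumes samtools is installed; the input paths mirror the command above, while `THREADS` and the output name `OUT_BAM` are illustrative:

```shell
# Sketch: stream hisat2 output into samtools instead of writing a SAM file.
# Input paths mirror the post; THREADS and OUT_BAM are illustrative assumptions.
INDEX="genome"
R1="fastq/mut_1_1.fastq.gz"
R2="fastq/mut_1_2.fastq.gz"
OUT_BAM="hisat2/mut_1.sorted.bam"
THREADS=4

align_to_bam() {
    # hisat2 writes SAM to stdout; samtools sort reads it from stdin ("-")
    # and writes a coordinate-sorted, compressed BAM file.
    hisat2 -p "$THREADS" --dta -x "$INDEX" -1 "$R1" -2 "$R2" \
        | samtools sort -@ "$THREADS" -o "$OUT_BAM" -
}

# Only run when both tools and both input files are present.
if command -v hisat2 >/dev/null 2>&1 && command -v samtools >/dev/null 2>&1 \
        && [ -f "$R1" ] && [ -f "$R2" ]; then
    align_to_bam
else
    echo "hisat2, samtools, or the input FASTQ files were not found; nothing was run" >&2
fi
```

The `--known-splicesite-infile genome.txt` option can be added back to the hisat2 call unchanged; a coordinate-sorted BAM is also the input that downstream assemblers typically expect when `--dta` is used.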
