Question

hisat2 alignment error

3.3 years ago

Marco Pannone ▴ 790

Hello everybody

I have paired-end fastq files from an RNA-seq experiment, which I am aligning with hisat2. Of my 18 pairs of fastq files, only one fails, with the error message below:

Error: Read V300055969L2C002R0400381671/1 has more read characters than quality values.
libc++abi.dylib: terminating with uncaught exception of type int
(ERR): hisat2-align died with signal 6 (ABRT)

When I look up read V300055969L2C002R0400381671/1 in that fastq file, it looks like this:

+
FFFFFFBFF>FEFFFAGFFG=GGGFDGDGGGEDFEFFFCFCGCGFFFGGFCFFGEEG2BGG<GFFEFFAF4FFF=GF8FEFG@FFFGGFFFGGF<FGEGF
@V300055969L2C002R0400381671/1
CATGGAAAAGGTTTTCAGCCCTAGTGGGTTTTGCTGGTTGAACTGGAGGCTGCCCAGAGGAGACAGTGAGGCTCCATTTACGACTCAGCGATCCAAGAGA
+

This is the first time I have encountered an error like this and I am not sure what the cause is. I also tried re-downloading the file, but nothing changed.

I hope some of you can explain what might be wrong. I would be very grateful!

Thanks!

hisat2 RNA-Seq alignment software error • 1.6k views

ADD COMMENT • link 3.3 years ago by Marco Pannone ▴ 790

What is the output of gzip -cd your.fastq.gz | grep -A 3 "@V300055969L2C002R0400381671/1" ? The error tells you what is wrong; let's see with the above command whether it is true. If so, there is some corruption in the file, which could possibly be repaired with repair.sh from the BBMap suite.
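For anyone who prefers to do the same lookup in Python, here is a minimal sketch. It is only an illustration, not part of hisat2 or BBMap: the helper names `show_record` and `show_record_gz` are made up for this example, and the tiny in-memory FASTQ is invented test data. Like `grep -A 3`, it returns the matching header line and the three lines that follow it exactly as they appear, without assuming the record is well formed.

```python
import gzip
import io


def show_record(handle, read_id):
    """Return the first line starting with read_id plus the 3 lines after it.

    Mirrors `grep -A 3`: lines are reported as found in the file.
    Returns None if no line starts with read_id.
    """
    for line in handle:
        if line.startswith(read_id):
            following = [next(handle, "").rstrip("\n") for _ in range(3)]
            return [line.rstrip("\n")] + following
    return None


def show_record_gz(path, read_id):
    """Same lookup on a gzipped FASTQ (hypothetical convenience wrapper)."""
    # errors="replace" keeps the scan going if corruption produced invalid bytes.
    with gzip.open(path, "rt", errors="replace") as handle:
        return show_record(handle, read_id)


# Invented two-record example: the second record has 6 bases but 3 qualities.
demo = io.StringIO("@r1/1\nACGT\n+\nFFFF\n@r2/1\nACGTAC\n+\nFFF\n")
record = show_record(demo, "@r2/1")
print(record)
print("sequence length:", len(record[1]), "quality length:", len(record[3]))
```

On a real file, something like `show_record_gz("your.fastq.gz", "@V300055969L2C002R0400381671/1")` would be the equivalent call; comparing the lengths of the second and fourth returned lines shows directly whether the aligner's complaint is justified.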

ADD REPLY • link 3.3 years ago by ATpoint 81k

Thanks for the reply! This is the output I got executing the command you wrote:

@V300055969L2C002R0400381671/1
CATGGAAAAGGTTTTCAGCCCTAGTGGGTTTTGCTGGTTGAACTGGAGGCTGCCCAGAGGAGACAGTGAGGCTAATAG@FD1E0
@V3AATCCTAGGCCTTTEDFFFCCTACCTCAAATGTTGCTTGCTTG70038109FF
@TATCTGTTATTGGTTAAGCTCAACAAGGCTTGGFBGFFFFFF@F5969L2GTTGGAACGCCTAATCAACAACCGTCTCCATTCTTTBFF8FF;FG82DCTTAGAAFFFTATAG=@6FFF9FFF;DFFGGGGG063215TFACCCTATE?E381670/1GTTTCA6DDBE@DDF>DEFD=EEE2D:A@DFEFEFDFFFAEEFFF8FFBFFFFFTACTTCTGGTAA5GCTCTGEGGGFFEDDDDDDDDDDDDDDDDDDDDDDDDDDF@FFFCGGAGCCCCTAATTG02R04009L2C00><FEFDDFF?FD>F?E@DF<FGEGFATCTTGGCCCCTTACTTTAACF3EFD@GFFFEFFFGFAACTTCCA

ADD REPLY • link 3.3 years ago by Marco Pannone ▴ 790

Looks quite malformed; you'll want to contact whoever uploaded it.

ADD REPLY • link 3.3 years ago by Devon Ryan 104k

Each record should actually be the read name (line 1), followed by the nucleotide sequence (line 2), a "+" (line 3) and the quality line (line 4). Lines 2 and 4 must have identical length. Something is wrong there. As suggested, contact the facility that created the files.
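To find out where a file first breaks this four-line layout, a rough check along the following lines may help. This is a sketch under stated assumptions, not a full FASTQ validator: it assumes every record occupies exactly four lines (no wrapped sequences), and the function names `first_bad_record` and `first_bad_record_gz` are hypothetical, chosen for this example.

```python
import gzip
import io


def first_bad_record(handle):
    """Return (record_number, reason) for the first malformed record, else None.

    Assumes strict 4-line records: header, sequence, '+', quality.
    """
    record_number = 0
    while True:
        header = handle.readline()
        if header == "":
            return None  # reached end of file without finding a problem
        record_number += 1
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@"):
            return record_number, "header does not start with '@'"
        if not plus.startswith("+"):
            return record_number, "third line does not start with '+'"
        if len(seq) != len(qual):
            return (record_number,
                    f"sequence length {len(seq)} != quality length {len(qual)}")


def first_bad_record_gz(path):
    """Run the same check on a gzipped FASTQ (hypothetical wrapper)."""
    # errors="replace" avoids a decode error if corruption left invalid bytes.
    with gzip.open(path, "rt", errors="replace") as handle:
        return first_bad_record(handle)


# Invented example data: the second record has 6 bases but only 3 qualities.
good = "@r1/1\nACGT\n+\nFFFF\n"
bad = good + "@r2/1\nACGTAC\n+\nFFF\n"
print(first_bad_record(io.StringIO(good)))
print(first_bad_record(io.StringIO(bad)))
```

Running `first_bad_record_gz` on each of the fastq files would show whether the damage is limited to one record or starts a longer corrupted stretch, which is useful detail to pass on to the facility.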