Question

TopHat Error Qual length differs from seq length

7.6 years ago

williamsbrian5064 ▴ 540

I am getting this error when I try to run TopHat on some sequencing data. I was wondering if anyone had any solutions to the problem?

    ./tophat -p 1 -G dmel-all-r6.18.gtf -o test.bam dmel_genome_6.18  read_1.fastq read_2.fastq



    [2017-11-14 14:59:57] Beginning TopHat run (v2.1.0)
    -----------------------------------------------
    [2017-11-14 14:59:57] Checking for Bowtie
      Bowtie 2 not found, checking for older version..
      Bowtie version:   1.1.2.0
    [2017-11-14 14:59:57] Checking for Bowtie index files (genome)..
    [2017-11-14 14:59:57] Checking for reference FASTA file
    Warning: Could not find FASTA file dmel_genome_6.18.fa
    [2017-11-14 14:59:57] Reconstituting reference FASTA file from Bowtie index
      Executing: /Users/kmmeurs/Desktop/Programs/tophat-2.1.0.OSX_x86_64/bowtie-inspect dmel_genome_6.18 > test.bam/tmp/dmel_genome_6.18.fa
    [2017-11-14 15:00:07] Generating SAM header for dmel_genome_6.18
    [2017-11-14 15:00:07] Reading known junctions from GTF file
    [2017-11-14 15:00:12] Preparing reads
    [FAILED]
    Error running 'prep_reads'
    Error: qual length (95) differs from seq length (125) for fastq record !

Here are the first records of one of the FASTQ files (A31P_MYBPC3_Female_1_week_CTATAC_L007_R1_C9MM3ANXX.fastq), as printed by head:

    @HISEQ:249:C9MM3ANXX:7:1101:1733:2241 1:N:0:CTATAC
    CGACAATCTTGCATGGCCGCGACTTCAGCNNNNNNNNNNNGTTTTTGCGCAATGCCGAACATTGCATGGGATAGGTCGTCGATGCGCCGGAATCCGTGGTCTCGAAATGATCGTCCAACTCAGCC
    +
    A=3BBGGGGGGGGGGGGGGGGDGGGGGGF###########==<EFGGEGG@GGGEDGGGGGGGCFCGGGD0ECBFGDGGGGGFGGGBGGG@AGG@CGGDEEB@D/6.C8EDEGGGD<EGGGGGGG
    @HISEQ:249:C9MM3ANXX:7:1101:1803:2233 1:N:0:CTATAC
    CTTAAAATAATTAATGTGTGTATTNNNNNNNNNNNNNNNNNNCACACACTAGAAATATACTTTGCCATCCATTAGGTGAAGGCCTAATCCAAGGCCTCCCTACCATGGATTGGCACAGATAAATT
    +
    CCCCCGGGGGGGGGGGGGGGGGGG##################===FGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGFGEFGGGGGGGGGGEGGGGGGGGGGGGGGGGFGGGGGGGGGGGDGGG
    @HISEQ:249:C9MM3ANXX:7:1101:1772:2234 1:N:0:CTATAC
    TTCTCCTCCTCGGAGTCGCTGTAAANNNNNNNNNNNNNNNNTGACGGCTTTTGTTTACAATCCACCTTCTTTTTAATTTCTTCCTCATTGTAACCCGGAGGTGGAACGGGGGTAAGAGAGCGCCT

I saw another post similar to this but I couldn't figure out what they did to fix the problem (https://www.biostars.org/p/110412/). Any help would be fantastic! Thanks!!

RNA-Seq Assembly software error alignment • 4.7k views

ADD COMMENT • link updated 6.9 years ago by h.mon 35k • written 7.6 years ago by williamsbrian5064 ▴ 540

The error indicates that something is wrong with your FASTQ file: at least one record has a quality string whose length differs from that of its sequence.

You should know that the old 'Tuxedo' pipeline of TopHat and Cufflinks is no longer the recommended toolset for RNA-seq analysis. The software is deprecated (in low maintenance) and should be replaced by HISAT2, StringTie and Ballgown. See this paper: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. (If you can't get access to that publication, let me know and I'll -cough- help you.) There are also other alternatives, including alignment with STAR or BBMap, or pseudo-alignment using Salmon.

ADD REPLY • link 7.6 years ago by WouterDeCoster 48k

Is there any way to fix the fastq file? Thanks for the advice by the way! I would have been struggling with "Tuxedo" nonsense for days.

ADD REPLY • link 7.6 years ago by williamsbrian5064 ▴ 540

You'll first have to figure out which file is corrupt, and then why. Do you have the original data available? Which steps were taken before this attempted alignment?
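To narrow down which file is affected, a small script can apply the same consistency check that TopHat's prep_reads step reports on. The following is a minimal sketch, not TopHat's own code: the function name first_bad_record is made up for illustration, and it assumes uncompressed FASTQ with exactly four lines per record (no wrapped sequence lines).

```python
import io
from typing import Optional, TextIO, Tuple


def first_bad_record(handle: TextIO) -> Optional[Tuple[int, str]]:
    """Return (line number of the record's header, problem) for the first
    malformed record, or None if every record looks consistent.

    Assumption: plain 4-line FASTQ records (header, sequence, '+', quality).
    """
    line_no = 1  # 1-based line number of the current record's header
    while True:
        header = handle.readline()
        if header == "":
            return None  # end of file reached without finding a problem
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        qual_raw = handle.readline()
        if not header.startswith("@"):
            return line_no, "header line does not start with '@'"
        if plus == "" or qual_raw == "":
            return line_no, "file ends in the middle of a record"
        if not plus.startswith("+"):
            return line_no, "separator line does not start with '+'"
        qual = qual_raw.rstrip("\r\n")
        if len(qual) != len(seq):
            return line_no, (
                f"qual length ({len(qual)}) differs from seq length ({len(seq)})"
            )
        line_no += 4


# Tiny in-memory demonstration with made-up records.
good = "@r1\nACGT\n+\nIIII\n"
bad = good + "@r2\nACGTACGT\n+\nIIIII\n"
print(first_bad_record(io.StringIO(good)))  # None
print(first_bad_record(io.StringIO(bad)))   # (5, 'qual length (5) differs from seq length (8)')
```

For real data, open each FASTQ file in text mode and pass the file object to the function. The scan stops at the first problem because once a line is missing, every later record is read out of frame and further reports would be misleading.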

ADD REPLY • link 7.6 years ago by WouterDeCoster 48k

I'm not entirely sure about that one. I am helping someone out on a project. They ran samples on an Illumina HiSeq so I'm assuming they got a large file that was then demultiplexed. It looks like the barcodes have been trimmed as well. The files were transferred to my external hard drive and I then transferred the files to my computer.

I could try getting the data again from my colleague?

ADD REPLY • link 7.6 years ago by williamsbrian5064 ▴ 540

That's worth trying indeed.
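When the files are copied again, comparing checksums on both machines is one way to confirm that the transfer did not damage them. Below is a minimal sketch using Python's standard hashlib module; the helper name md5sum and the file names in the demonstration are hypothetical, and the demonstration simulates a truncated copy in a temporary directory rather than using real data.

```python
import hashlib
import tempfile
from pathlib import Path


def md5sum(path, chunk_size=1 << 20) -> str:
    """Return the MD5 hex digest of a file, reading it in chunks so that
    large FASTQ files do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstration with made-up files: the copy is cut short on purpose.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "original.fastq"
    copy = Path(tmp) / "copy.fastq"
    original.write_bytes(b"@r1\nACGT\n+\nIIII\n")
    copy.write_bytes(b"@r1\nACGT\n+\nII")  # simulated truncated transfer
    print(md5sum(original) == md5sum(copy))  # False
```

If the digest computed on the source machine matches the one computed after the copy, the file arrived intact and any malformed record was already present upstream.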

ADD REPLY • link 7.6 years ago by WouterDeCoster 48k

You were right about the file being corrupt. I took it out of the command line and TopHat started working. That is nice to know when I try running HISAT2. Thanks for all the help!

ADD REPLY • link 7.6 years ago by williamsbrian5064 ▴ 540

I tried the HISAT, StringTie, and Ballgown method today but I got a bit stuck at the R portion of it. I can't find much about the method really. I was wondering if you had any links?

ADD REPLY • link 7.6 years ago by williamsbrian5064 ▴ 540

The paper contains a lot of R code, is that helpful? Or did you already check that?

ADD REPLY • link 7.6 years ago by WouterDeCoster 48k

I tried their R script, got to step 9, and got blocked. They even have a troubleshooting section that identifies the same error that I'm getting (the Ballgown function results in an error that the first column of pData does not match the names of the folders containing the ballgown data). I couldn't get past it... I felt like RStudio was a bit more corporative which could have given me a bit more problems?
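The mismatch described in that error message can be checked outside R before loading anything into Ballgown. The sketch below is only an illustration, written in Python rather than R: it assumes the phenotype table is a CSV file with a header row and sample IDs in its first column, and that each sample has its own subfolder in one directory. The function name compare_ids and all file and folder names are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def compare_ids(pheno_csv, ballgown_dir):
    """Compare the first column of a phenotype CSV with the names of the
    subfolders in ballgown_dir and report the IDs found on one side only."""
    with open(pheno_csv, newline="") as fh:
        rows = list(csv.reader(fh))
    ids = [row[0] for row in rows[1:] if row]  # skip the header row
    folders = [p.name for p in Path(ballgown_dir).iterdir() if p.is_dir()]
    return {
        "only_in_table": sorted(set(ids) - set(folders)),
        "only_in_folders": sorted(set(folders) - set(ids)),
    }


# Demonstration with made-up sample names in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    data_dir = root / "ballgown"
    for name in ("sample01", "sample02"):
        (data_dir / name).mkdir(parents=True)
    pheno = root / "pheno.csv"
    pheno.write_text("ids,sex\nsample01,F\nsample_02,M\n")
    print(compare_ids(pheno, data_dir))
```

Two empty lists mean the IDs and folder names agree as sets. This check does not look at ordering, which may also matter.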

ADD REPLY • link 7.6 years ago by williamsbrian5064 ▴ 540

I would suggest opening a separate question, containing your problem, the code you used and the errors you get. Please be as complete as possible.

ADD REPLY • link 7.6 years ago by WouterDeCoster 48k

Try validateFiles from the Kent Utilities to find the broken FASTQ record.

ADD REPLY • link 7.6 years ago by GenoMax 152k

Does it have to do with the index file? I had to generate my own?

ADD REPLY • link 7.6 years ago by williamsbrian5064 ▴ 540

Hi

I am also getting a similar error when running TopHat:

Error: qual length (114) differs from seq length (126) for fastq record !

Please suggest some solution. Any help is much appreciated.

Thanks

ADD REPLY • link 6.9 years ago by archana.bioinfo87 ▴ 210

Please do not use the SUBMIT ANSWER window unless you are providing an answer to the original question.

It looks like your FASTQ file has at least one malformed record (one where the number of bases and the number of Q scores don't match). I suggest that you run fastQValidator.

ADD REPLY • link 6.9 years ago by GenoMax 152k