Question: TopHat error: qual length differs from seq length
8 days ago, williamsbrian5064 (40) wrote:

I am getting this error when I try to run TopHat on some sequencing data. I was wondering if anyone had any solutions to the problem?

    ./tophat -p 1 -G dmel-all-r6.18.gtf -o test.bam dmel_genome_6.18  read_1.fastq read_2.fastq



    [2017-11-14 14:59:57] Beginning TopHat run (v2.1.0)
    -----------------------------------------------
    [2017-11-14 14:59:57] Checking for Bowtie
      Bowtie 2 not found, checking for older version..
      Bowtie version:   1.1.2.0
    [2017-11-14 14:59:57] Checking for Bowtie index files (genome)..
    [2017-11-14 14:59:57] Checking for reference FASTA file
    Warning: Could not find FASTA file dmel_genome_6.18.fa
    [2017-11-14 14:59:57] Reconstituting reference FASTA file from Bowtie index
      Executing: /Users/kmmeurs/Desktop/Programs/tophat-2.1.0.OSX_x86_64/bowtie-inspect dmel_genome_6.18 > test.bam/tmp/dmel_genome_6.18.fa
    [2017-11-14 15:00:07] Generating SAM header for dmel_genome_6.18
    [2017-11-14 15:00:07] Reading known junctions from GTF file
    [2017-11-14 15:00:12] Preparing reads
    [FAILED]
    Error running 'prep_reads'
    Error: qual length (95) differs from seq length (125) for fastq record !

Here is the header as well for one of the fastq files:

    docsmb17:tophat-2.1.0.OSX_x86_64 kmmeurs$ head A31P_MYBPC3_Female_1_week_CTATAC_L007_R1_C9MM3ANXX.fastq -C
    ==> A31P_MYBPC3_Female_1_week_CTATAC_L007_R1_C9MM3ANXX.fastq <==
    @HISEQ:249:C9MM3ANXX:7:1101:1733:2241 1:N:0:CTATAC
    CGACAATCTTGCATGGCCGCGACTTCAGCNNNNNNNNNNNGTTTTTGCGCAATGCCGAACATTGCATGGGATAGGTCGTCGATGCGCCGGAATCCGTGGTCTCGAAATGATCGTCCAACTCAGCC
    +
    A=3BBGGGGGGGGGGGGGGGGDGGGGGGF###########==<EFGGEGG@GGGEDGGGGGGGCFCGGGD0ECBFGDGGGGGFGGGBGGG@AGG@CGGDEEB@D/6.C8EDEGGGD<EGGGGGGG
    @HISEQ:249:C9MM3ANXX:7:1101:1803:2233 1:N:0:CTATAC
    CTTAAAATAATTAATGTGTGTATTNNNNNNNNNNNNNNNNNNCACACACTAGAAATATACTTTGCCATCCATTAGGTGAAGGCCTAATCCAAGGCCTCCCTACCATGGATTGGCACAGATAAATT
    +
    CCCCCGGGGGGGGGGGGGGGGGGG##################===FGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGFGEFGGGGGGGGGGEGGGGGGGGGGGGGGGGFGGGGGGGGGGGDGGG
    @HISEQ:249:C9MM3ANXX:7:1101:1772:2234 1:N:0:CTATAC
    TTCTCCTCCTCGGAGTCGCTGTAAANNNNNNNNNNNNNNNNTGACGGCTTTTGTTTACAATCCACCTTCTTTTTAATTTCTTCCTCATTGTAACCCGGAGGTGGAACGGGGGTAAGAGAGCGCCT

I saw another post similar to this but I couldn't figure out what they did to fix the problem (https://www.biostars.org/p/110412/). Any help would be fantastic! Thanks!!

written 8 days ago by williamsbrian5064 (40)

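The error indicates that something is wrong with one of your FASTQ files: at least one record has a quality string whose length (95) does not match its sequence length (125), which usually points to a truncated or corrupt file.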

You should know that the old 'Tuxedo' pipeline of TopHat and Cufflinks is no longer the "advisable" tool for RNA-seq analysis. The software is deprecated or in low maintenance and should be replaced by HISAT2, StringTie and Ballgown. See this paper: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. (If you can't get access to that publication, let me know and I'll -cough- help you.) There are also other alternatives, including alignment with STAR and BBMap, or pseudo-alignment using Salmon.

modified 7 days ago • written 8 days ago by WouterDeCoster (23k)

Is there any way to fix the fastq file? Thanks for the advice by the way! I would have been struggling with "Tuxedo" nonsense for days.

written 8 days ago by williamsbrian5064 (40)

You'll first have to figure out which file is corrupt, and then why. Do you have the original data available? Which steps were taken before this attempted alignment?
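
One way to narrow this down is to scan each FASTQ file for records whose sequence and quality strings differ in length. The following is only a rough sketch: it assumes plain four-line FASTQ records (no wrapped sequences), and the function names `open_fastq` and `find_bad_records` are made up for illustration.

```python
import gzip
import itertools
import sys


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def find_bad_records(path, max_report=10):
    """Return (record_number, header, problem) tuples for malformed records.

    Assumption: every record occupies exactly four lines.
    """
    problems = []
    record_number = 0
    with open_fastq(path) as handle:
        while len(problems) < max_report:
            # Read the next four lines as one candidate record.
            lines = [ln.rstrip("\r\n") for ln in itertools.islice(handle, 4)]
            if not lines:
                break  # clean end of file
            record_number += 1
            if len(lines) < 4:
                problems.append((record_number, lines[0], "truncated record"))
                break
            header, seq, plus, qual = lines
            if not header.startswith("@"):
                problems.append((record_number, header, "header does not start with '@'"))
            elif not plus.startswith("+"):
                problems.append((record_number, header, "third line does not start with '+'"))
            elif len(seq) != len(qual):
                problems.append(
                    (record_number, header,
                     f"seq length {len(seq)} != qual length {len(qual)}")
                )
    return problems


if __name__ == "__main__":
    # Usage: python check_fastq.py read_1.fastq read_2.fastq
    for fastq_path in sys.argv[1:]:
        for number, header, problem in find_bad_records(fastq_path):
            print(f"{fastq_path}: record {number} ({header}): {problem}")
```

Once one record is out of frame, every later record in that file will probably be flagged too, so the first reported record is the one to look at.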

written 8 days ago by WouterDeCoster (23k)

I'm not entirely sure about that one. I am helping someone out on a project. They ran samples on an Illumina HiSeq so I'm assuming they got a large file that was then demultiplexed. It looks like the barcodes have been trimmed as well. The files were transferred to my external hard drive and I then transferred the files to my computer.

I could try getting the data again from my colleague?
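
Since the files went through two copies, it may be worth comparing checksums of the original and the transferred file before re-running anything. This is a minimal sketch using Python's standard `hashlib`; the helper names `md5sum` and `same_content` and the paths in the usage note are hypothetical.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """Return True if both files have identical MD5 digests."""
    return md5sum(path_a) == md5sum(path_b)
```

For example, `same_content("/Volumes/external/read_1.fastq", "read_1.fastq")` (placeholder paths) would return `False` if the copy was damaged in transit. If the colleague can only send a digest rather than the file, comparing it with `md5sum("read_1.fastq")` serves the same purpose.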

written 8 days ago by williamsbrian5064 (40)

That's worth trying indeed.

written 8 days ago by WouterDeCoster (23k)

You were right about the file being corrupt. I took it out of the command line and TopHat started working. That is nice to know when I try running HISAT2. Thanks for all the help!

written 8 days ago by williamsbrian5064 (40)

I tried the HISAT, StringTie, and Ballgown method today but I got a bit stuck at the R portion of it. I can't find much about the method really. I was wondering if you had any links?

written 7 days ago by williamsbrian5064 (40)

The paper contains a lot of R code, is that helpful? Or did you already check that?

written 7 days ago by WouterDeCoster (23k)

I tried their R script, got to step 9 and got blocked. They even have a troubleshooting section that identifies the same error that I'm getting (the Ballgown function results in an error that the first column of pData does not match the names of the folders containing the ballgown data). I couldn't get past it... I felt like RStudio was a bit more corporative which could have given me a bit more problems?
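
A quick way to see where such a mismatch comes from is to compare the sample IDs in the phenotype table with the folder names directly, outside of R. The sketch below is an illustration only: it assumes the phenotype data is a CSV file with a header row and the sample IDs in the first column, and the function name `compare_ids_to_folders` is hypothetical.

```python
import csv
import os


def compare_ids_to_folders(phenodata_csv, ballgown_dir):
    """Compare first-column sample IDs with the sub-folder names."""
    with open(phenodata_csv, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row (assumed to exist)
        ids = [row[0].strip() for row in reader if row]

    # Only directories count as sample folders; plain files are ignored.
    folders = sorted(
        name for name in os.listdir(ballgown_dir)
        if os.path.isdir(os.path.join(ballgown_dir, name))
    )
    return {
        "ids_without_folder": sorted(set(ids) - set(folders)),
        "folders_without_id": sorted(set(folders) - set(ids)),
        # Compared against a plain sorted listing, which may not match
        # the ordering R uses in every locale.
        "same_order": ids == folders,
    }
```

Typical culprits this would surface are stray whitespace, a different file extension or suffix in the IDs, or rows listed in a different order than the folders.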

written 7 days ago by williamsbrian5064 (40)

I would suggest opening a separate question, containing your problem, the code you used and the errors you get. Please be as complete as possible.

written 7 days ago by WouterDeCoster (23k)

Try validateFiles from the Kent Utilities to find the broken FASTQ record.

written 8 days ago by genomax (37k)

Does it have to do with the index file? I had to generate my own.

written 8 days ago by williamsbrian5064 (40)