Question: TopHat error: qual length differs from seq length
williamsbrian506460 wrote (7 months ago):

I am getting this error when I try to run TopHat on some sequencing data. I was wondering if anyone had any solutions to the problem?

    ./tophat -p 1 -G dmel-all-r6.18.gtf -o test.bam dmel_genome_6.18  read_1.fastq read_2.fastq

    [2017-11-14 14:59:57] Beginning TopHat run (v2.1.0)
    -----------------------------------------------
    [2017-11-14 14:59:57] Checking for Bowtie
      Bowtie 2 not found, checking for older version..
      Bowtie version:   1.1.2.0
    [2017-11-14 14:59:57] Checking for Bowtie index files (genome)..
    [2017-11-14 14:59:57] Checking for reference FASTA file
    Warning: Could not find FASTA file dmel_genome_6.18.fa
    [2017-11-14 14:59:57] Reconstituting reference FASTA file from Bowtie index
      Executing: /Users/kmmeurs/Desktop/Programs/tophat-2.1.0.OSX_x86_64/bowtie-inspect dmel_genome_6.18 > test.bam/tmp/dmel_genome_6.18.fa
    [2017-11-14 15:00:07] Generating SAM header for dmel_genome_6.18
    [2017-11-14 15:00:07] Reading known junctions from GTF file
    [2017-11-14 15:00:12] Preparing reads
    [FAILED]
    Error running 'prep_reads'
    Error: qual length (95) differs from seq length (125) for fastq record !

Here is the head of one of the FASTQ files as well:

    docsmb17:tophat-2.1.0.OSX_x86_64 kmmeurs$ head A31P_MYBPC3_Female_1_week_CTATAC_L007_R1_C9MM3ANXX.fastq -C
    ==> A31P_MYBPC3_Female_1_week_CTATAC_L007_R1_C9MM3ANXX.fastq <==
    @HISEQ:249:C9MM3ANXX:7:1101:1733:2241 1:N:0:CTATAC
    CGACAATCTTGCATGGCCGCGACTTCAGCNNNNNNNNNNNGTTTTTGCGCAATGCCGAACATTGCATGGGATAGGTCGTCGATGCGCCGGAATCCGTGGTCTCGAAATGATCGTCCAACTCAGCC
    +
    A=3BBGGGGGGGGGGGGGGGGDGGGGGGF###########==<EFGGEGG@GGGEDGGGGGGGCFCGGGD0ECBFGDGGGGGFGGGBGGG@AGG@CGGDEEB@D/6.C8EDEGGGD<EGGGGGGG
    @HISEQ:249:C9MM3ANXX:7:1101:1803:2233 1:N:0:CTATAC
    CTTAAAATAATTAATGTGTGTATTNNNNNNNNNNNNNNNNNNCACACACTAGAAATATACTTTGCCATCCATTAGGTGAAGGCCTAATCCAAGGCCTCCCTACCATGGATTGGCACAGATAAATT
    +
    CCCCCGGGGGGGGGGGGGGGGGGG##################===FGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGFGEFGGGGGGGGGGEGGGGGGGGGGGGGGGGFGGGGGGGGGGGDGGG
    @HISEQ:249:C9MM3ANXX:7:1101:1772:2234 1:N:0:CTATAC
    TTCTCCTCCTCGGAGTCGCTGTAAANNNNNNNNNNNNNNNNTGACGGCTTTTGTTTACAATCCACCTTCTTTTTAATTTCTTCCTCATTGTAACCCGGAGGTGGAACGGGGGTAAGAGAGCGCCT

I saw another post similar to this but I couldn't figure out what they did to fix the problem (https://www.biostars.org/p/110412/). Any help would be fantastic! Thanks!!

written 7 months ago by williamsbrian506460

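The error indicates that something is wrong with your FASTQ file: at least one record has a quality string (95 characters) that is shorter than its sequence (125 bases). That is often a sign of a truncated or corrupted file.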
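To locate the bad record yourself, a small script can walk the file four lines at a time and report records whose quality string length does not match the sequence length, plus records cut off at the end of the file. This is a minimal sketch assuming standard 4-line FASTQ (no line-wrapped records), either plain or gzip-compressed; the script name `check_fastq.py` is just a placeholder.

```python
import gzip
import sys
from typing import Iterable, Iterator, Tuple


def find_bad_records(lines: Iterable[str]) -> Iterator[Tuple[int, str, str]]:
    """Yield (record_number, header, problem) for each malformed 4-line FASTQ record.

    Assumes plain 4-line FASTQ (no wrapped sequence/quality lines).
    """
    it = iter(lines)
    record = 0
    while True:
        header = next(it, "")
        if not header:
            return  # clean end of input
        record += 1
        # next(it, "") returns "" once the input is exhausted, so a truncated
        # record shows up as empty strings instead of raising StopIteration.
        seq = next(it, "")
        plus = next(it, "")
        qual = next(it, "")
        seq_s, qual_s = seq.rstrip("\r\n"), qual.rstrip("\r\n")
        if not header.startswith("@"):
            problem = "header line does not start with '@'"
        elif not qual:
            problem = "record is truncated (input ends early)"
        elif not plus.startswith("+"):
            problem = "separator line does not start with '+'"
        elif len(seq_s) != len(qual_s):
            problem = f"qual length ({len(qual_s)}) differs from seq length ({len(seq_s)})"
        else:
            continue  # record looks fine
        yield record, header.rstrip("\r\n"), problem


def main(paths: Iterable[str], max_reports: int = 10) -> int:
    """Check each file; return 1 if any problem was found, else 0."""
    found_any = False
    for path in paths:
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt") as handle:
            for n, (record, header, problem) in enumerate(find_bad_records(handle), start=1):
                found_any = True
                # Line number assumes every earlier record had exactly 4 lines.
                line = 4 * record - 3
                print(f"{path}: record {record} (line {line}) {header!r}: {problem}")
                if n >= max_reports:
                    print(f"{path}: stopping after {max_reports} problems; "
                          "later records are probably misaligned")
                    break
    return 1 if found_any else 0


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Run it as `python check_fastq.py read_1.fastq read_2.fastq`. Once one record is broken, everything after it is usually shifted too, which is why the sketch stops after a handful of reports.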

You should know that the old 'Tuxedo' pipeline of TopHat and Cufflinks is no longer the recommended tool for RNA-seq analysis. The software is deprecated or in low-maintenance mode and should be replaced by HISAT2, StringTie and Ballgown. See this paper: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. (If you can't get access to that publication, let me know and I'll -cough- help you.) There are also other alternatives, including alignment with STAR or BBMap, or pseudo-alignment with Salmon.

modified 7 months ago • written 7 months ago by WouterDeCoster29k

Is there any way to fix the fastq file? Thanks for the advice by the way! I would have been struggling with "Tuxedo" nonsense for days.

written 7 months ago by williamsbrian506460

You'll first have to figure out which file is corrupt, and then why. Do you have the original data available? Which steps were taken before this attempted alignment?

written 7 months ago by WouterDeCoster29k

I'm not entirely sure about that one. I am helping someone out on a project. They ran samples on an Illumina HiSeq so I'm assuming they got a large file that was then demultiplexed. It looks like the barcodes have been trimmed as well. The files were transferred to my external hard drive and I then transferred the files to my computer.

I could try getting the data again from my colleague?
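Whether you copy the data again or not, comparing checksums tells you whether the file on your computer is byte-for-byte identical to the copy on the external drive or to your colleague's original. On the command line, `md5` (macOS) or `md5sum` (Linux) does this; the sketch below does the same in Python, assuming you can get an MD5 of the original file to compare against. The script name `compare_md5.py` and any paths you pass it are placeholders.

```python
import hashlib
import sys
from typing import Iterable, List


def md5_of_chunks(chunks: Iterable[bytes]) -> str:
    """Return the hex MD5 digest of a stream of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file, read in 1 MiB chunks so a large FASTQ never sits in memory."""
    with open(path, "rb") as handle:
        return md5_of_chunks(iter(lambda: handle.read(chunk_size), b""))


def main(paths: List[str]) -> int:
    """Print an md5sum-style line per file; return 0 only if all digests agree."""
    digests = [md5sum(path) for path in paths]
    for digest, path in zip(digests, paths):
        print(f"{digest}  {path}")
    return 0 if len(set(digests)) == 1 else 1


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Passing the copy on the drive and the copy on your computer prints one digest per file and exits with status 1 if they differ; a mismatch points at the transfer rather than at the sequencing output.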

written 7 months ago by williamsbrian506460

That's worth trying indeed.

written 7 months ago by WouterDeCoster29k

You were right about the file being corrupt: when I removed it from the command, TopHat started working. That's good to know before I try HISAT2. Thanks for all the help!

written 7 months ago by williamsbrian506460

I tried the HISAT, StringTie, and Ballgown method today but I got a bit stuck at the R portion of it. I can't find much about the method really. I was wondering if you had any links?

written 7 months ago by williamsbrian506460

The paper contains a lot of R code, is that helpful? Or did you already check that?

written 7 months ago by WouterDeCoster29k

I tried their R script, got to step 9, and got blocked. The paper even has a troubleshooting section that describes the same error I'm getting ("The Ballgown function results in an error that the first column of pData does not match the names of the folders containing the ballgown data"), but I couldn't get past it. Could running it in RStudio have caused some of these problems?
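One way to narrow down that error is to compare the first column of the phenotype table with the sample folder names outside R. This sketch assumes (my understanding, not checked against the ballgown source) that ballgown compares the two position by position, so both the names and their order must agree, and that the folders are picked up roughly like R's `list.files(dataDir, pattern = samplePattern)`, which returns sorted names. Python sorts by code point, so mixed-case names may order differently than in R. The script name, CSV path, folder and pattern are hypothetical placeholders.

```python
import csv
import os
import re
import sys
from typing import List


def read_pheno_ids(pheno_csv: str) -> List[str]:
    """First column of the phenotype CSV, skipping the header row.

    Values are not stripped, so stray spaces show up in the report.
    """
    with open(pheno_csv, newline="") as handle:
        rows = list(csv.reader(handle))
    return [row[0] for row in rows[1:] if row]


def list_sample_dirs(data_dir: str, pattern: str) -> List[str]:
    """Sorted sub-folders of data_dir whose names match the regex `pattern`."""
    return sorted(
        name for name in os.listdir(data_dir)
        if os.path.isdir(os.path.join(data_dir, name)) and re.search(pattern, name)
    )


def compare_ids(pheno_ids: List[str], sample_dirs: List[str]) -> List[str]:
    """Describe every way the two lists differ, position by position."""
    problems = []
    if len(pheno_ids) != len(sample_dirs):
        problems.append(f"{len(pheno_ids)} phenotype rows but {len(sample_dirs)} sample folders")
    for pos, (pid, folder) in enumerate(zip(pheno_ids, sample_dirs), start=1):
        if pid != folder:
            problems.append(f"row {pos}: pData id {pid!r} != folder {folder!r}")
    return problems


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical usage: python check_pdata.py pheno_data.csv ballgown SAMPLE
    issues = compare_ids(read_pheno_ids(sys.argv[1]), list_sample_dirs(sys.argv[2], sys.argv[3]))
    print("\n".join(issues) if issues else "pData ids match the sample folders")
    sys.exit(1 if issues else 0)
```

If the names match but the order does not, sorting the phenotype table by its first column before passing it to ballgown is the usual thing to try.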

written 7 months ago by williamsbrian506460

I would suggest opening a separate question, containing your problem, the code you used and the errors you get. Please be as complete as possible.

written 7 months ago by WouterDeCoster29k

Try validateFiles from the UCSC Kent utilities to find the broken FASTQ record.

written 7 months ago by genomax49k

Could it have something to do with the index file? I had to generate my own.

written 7 months ago by williamsbrian506460