Question: Tophat qual length error.
0
gravatar for Veli Kaan Aydin
4.3 years ago by
Turkey
Veli Kaan Aydin40 wrote:

Hello everyone;

 

I'm trying to rna-seq analysis for Illumina Genome Analyzer (Homo sapiens) and here fastq file ; 

@SRR408754.1 HWI-EAS19:6:1:2:223 length=50

GCAAACCAGAGCTCAGAAAAAAGGGACATCCAGCAGTGGTCATTCGGCAA

+

BB<ACCBA@9@@@@=@==<.872='==>==9-:#################

@SRR408754.2 HWI-EAS19:6:1:3:199 length=50

AAAAACTGGACATTTGCAGGGAAAACACTGACTCTATACTTTTTTTTGGA

+

BBBAB@=@@=5=:<9/:<26@@BAB6;7>#####################

@SRR408754.3 HWI-EAS19:6:1:5:107 length=50

GAATAACCGAATTAATGCATTCAAAGGCACATAGAACACATAAACAGGGG

+

BBBBAB<?>@A?@?>-6*9<=9@=8-7/<66%5,################

@SRR408754.4 HWI-EAS19:6:1:9:352 length=50

CTGAAACCAGCTAGATGAGCATGTCCTTTAGATGCCCAAGCCCCCCCCGT

+

A=6?A@CA5=B#######################################

@SRR408754.5 HWI-EAS19:6:1:9:1065 length=50

GAAAGCCTTAAATTTGAAGACACATTGAGGGGGGACGAGAGACAAGGGAA

+

B6AA=59:>9>@23@=*>5<23############################

@SRR408754.6 HWI-EAS19:6:1:9:1013 length=50

GTGGGTTAGCCACTAAATGGCTTCAGTGAGAATTCTGAAAACTGGTCCAG

+

>3AB?2?B@<>A>>>C@>398'185101<=.;14<=59BB@#########

@SRR408754.7 HWI-EAS19:6:1:10:106 length=50

GAGAACCATATGTCTTGCATTCAGAGATTTTCCATCGGGTGGACCAGGAA

+

BBBBBA?A8:BA;>@B=75::-5@=:9#######################

@SRR408754.8 HWI-EAS19:6:1:10:142 length=50

GCTTCTTCATGTATGTAACAGCATATTAAACTGGAGACAGTGATGTATCA

+

?BCCBBACB?;;=A:;@@>14>@B@BB<1;>:4>>8=:8<##########

@SRR408754.9 HWI-EAS19:6:1:10:70 length=50

AAGCCGTACAAGTCCACCTCATCGCCGCTGCACACGAAGGTCCGTCATGT

+

A>8@@5;A?@A>;8;69=################################

@SRR408754.10 HWI-EAS19:6:1:10:1270 length=50

AAAAAATTGTAAAATAAGTATTTCAACATGGAATTAACAATATTGTTATC

+

BBBCBBBBCA@BC>2>B?;@B:-?8>><28/@@3@@A;;;<@@@A1=80/

@SRR408754.11 HWI-EAS19:6:1:11:1789 length=50

GTAAACCCCAAAGAGTTGGAAGGTAGGTTAAAAGGAGGGTTTGGGGGCCC

+

>:AB?::<<ABBA@@1@>?7?@=,.+/7;65###################

@SRR408754.12 HWI-EAS19:6:1:11:126 length=50

GCCAGGGCCAACTTCTCCTTCTCAAATGCACCGCCCTTCTTTGTACACGT

+

BC@=:@>A?@;>:=70:@5@B=2:257A@*;;%/=###############

@SRR408754.13 HWI-EAS19:6:1:12:508 length=50

GAAAAATGGAAATAAATGGAAAGACCAGAATATTGAAGATCACTACAAAA

+

A:9@A<BB@<@>==B?BAB;@4>96=:>==####################

@SRR408754.14 HWI-EAS19:6:1:12:1312 length=50

AAAAAGCACAGTATTGTAGAGACACGCACAATATGGAATTGTTATTTTTG

+

B@CABBABB:@5:;A###################################

@SRR408754.15 HWI-EAS19:6:1:12:1574 length=50

CGAACATGACCCAATCACAAAAAAAAAAGAAGGGTGATGGGACCGAACCA

+

@BB>B?>==:1>);@6@0?;=;?<3/3/%<(93@*2<%88>1=#######

@SRR408754.16 HWI-EAS19:6:1:12:86 length=50

GGAGAAGGAAAATCCAGGGGGTGGGGGGTCTGTGTTGCAACTGGGGTGAA

+

A69B;79@1>@B?9,27%01>.AA@<########################

@SRR408754.17 HWI-EAS19:6:1:12:429 length=50

TGATTATGCATTATGAGAGGCTTAAAATAGAAAGCGTCAAAATATTGATT

+

BB;@BB>>:@C@=>A:>:81;?=0##########################

@SRR408754.18 HWI-EAS19:6:1:12:2017 length=50

GAGAAAAGGAGAAAGGACACTTTTTTCTTGTTTTGTTGGTGTCGTTTTGT

+

B?B?@?>AB>B???####################################

@SRR408754.19 HWI-EAS19:6:1:12:282 length=50

AACATACATTAAGTGGCACTAATTACACATTAACTATAAGGTAACTAAAA

+

:?CCB=@?C@>>>>@@<4@A59############################

@SRR408754.20 HWI-EAS19:6:1:12:676 length=50

GACAATTTTAATCCTGCATTTTCCTGGAGTAACATCCACTGGTGTGTAGT

+

BA?BA4@A<7;;=A7(79;A##############################

 

 

Command and error is here ;

 

veli@kaanaydin:~/yeni/PMA_stim$ tophat --bowtie1 -p 8 -G genes.gtf -o SRR408754 h_sapiens_asm SRR408754.fastq

 

[2014-08-25 09:46:32] Beginning TopHat run (v2.0.12)

-----------------------------------------------

[2014-08-25 09:46:32] Checking for Bowtie

  Bowtie version:  1.1.0.0

[2014-08-25 09:46:32] Checking for Samtools

Samtools version:  0.1.19.0

[2014-08-25 09:46:32] Checking for Bowtie index files (genome)..

[2014-08-25 09:46:32] Checking for reference FASTA file

Warning: Could not find FASTA file h_sapiens_asm.fa

[2014-08-25 09:46:32] Reconstituting reference FASTA file from Bowtie index

  Executing: /home/veli/calisma/bwt1/bowtie-inspect h_sapiens_asm > SRR408754/tmp/h_sapiens_asm.fa

[2014-08-25 09:48:52] Generating SAM header for h_sapiens_asm

[2014-08-25 09:49:30] Reading known junctions from GTF file

[2014-08-25 09:49:34] Preparing reads

[FAILED]

Error running 'prep_reads'

Error: qual length (49) differs from seq length (50) for fastq record !

 

Do you know what I need to do ? 

rna-seq tophat • 3.9k views
ADD COMMENTlink written 4.3 years ago by Veli Kaan Aydin40

I'm going to just download that dataset and have a look, since it seems to be generally problematic. Am I correct in assuming that you didn't perform any sort of trimming?

ADD REPLYlink written 4.3 years ago by Devon Ryan86k

When I got fastq-dump it was like this:

@SRR408754.1 HWI-EAS19:6:1:2:223 length=50
GCAAACCAGAGCTCAGAAAAAAGGGACATCCAGCAGTGGTCATTCGGCAA
+SRR408754.1 HWI-EAS19:6:1:2:223 length=50
BB<ACCBA@9@@@@=@==<.872='==>==9-:#################
@SRR408754.2 HWI-EAS19:6:1:3:199 length=50
AAAAACTGGACATTTGCAGGGAAAACACTGACTCTATACTTTTTTTTGGA
+SRR408754.2 HWI-EAS19:6:1:3:199 length=50
BBBAB@=@@=5=:<9/:<26@@BAB6;7>#####################
@SRR408754.3 HWI-EAS19:6:1:5:107 length=50
GAATAACCGAATTAATGCATTCAAAGGCACATAGAACACATAAACAGGGG
+SRR408754.3 HWI-EAS19:6:1:5:107 length=50
BBBBAB<?>@A?@?>-6*9<=9@=8-7/<66%5,################
@SRR408754.4 HWI-EAS19:6:1:9:352 length=50
CTGAAACCAGCTAGATGAGCATGTCCTTTAGATGCCCAAGCCCCCCCCGT
+SRR408754.4 HWI-EAS19:6:1:9:352 length=50
A=6?A@CA5=B#######################################
@SRR408754.5 HWI-EAS19:6:1:9:1065 length=50
GAAAGCCTTAAATTTGAAGACACATTGAGGGGGGACGAGAGACAAGGGAA
+SRR408754.5 HWI-EAS19:6:1:9:1065 length=50
B6AA=59:>9>@23@=*>5<23############################
@SRR408754.6 HWI-EAS19:6:1:9:1013 length=50
GTGGGTTAGCCACTAAATGGCTTCAGTGAGAATTCTGAAAACTGGTCCAG
+SRR408754.6 HWI-EAS19:6:1:9:1013 length=50
>3AB?2?B@<>A>>>C@>398'185101<=.;14<=59BB@#########
@SRR408754.7 HWI-EAS19:6:1:10:106 length=50
GAGAACCATATGTCTTGCATTCAGAGATTTTCCATCGGGTGGACCAGGAA
+SRR408754.7 HWI-EAS19:6:1:10:106 length=50
BBBBBA?A8:BA;>@B=75::-5@=:9#######################
@SRR408754.8 HWI-EAS19:6:1:10:142 length=50

 

 

And I just clean the part after the "+"

 

ADD REPLYlink written 4.3 years ago by Veli Kaan Aydin40

What did you do to "clean" that part?

ADD REPLYlink written 4.3 years ago by Devon Ryan86k

I find and change a script ; 


from tempfile import mkstemp
from shutil import move
from os import remove, close

def replace(file_path):
    fh, abs_path = mkstemp()
    new_file = open(abs_path,'w')
    old_file = open(file_path)
    for line in old_file:
        if line[0] == "+":
            new_file.write("+" + "\n")
        else:
            new_file.write(line)
            
    new_file.close()
    close(fh)
    old_file.close()
    
    remove(file_path)
    
    move(abs_path, file_path)

ADD REPLYlink written 4.3 years ago by Veli Kaan Aydin40

Ah, that's probably the problem then. What that script does is look for lines that start with '+' and replace then whole line with '+'. While that will certainly get rid of silly lines like "+SRR408754.4 HWI-EAS19:6:1:9:352 length=50", it'll also mess up any of the QUAL lines that start with '+' (there are 6410 such lines in the file).

I can say that if you don't attempt to clean the file like that (i.e., just fastq-dump --split-3 -F SRR408754.sra and then use the resulting fastq file) then things will work (I just tried, though I had to tweak the tophat2 source code since it doesn't support the most recent version of samtools).

ADD REPLYlink written 4.3 years ago by Devon Ryan86k

Thank you for your answer but I'm still getting this;

Error: qual length (50) differs from seq length (43) for fastq record HWI-EAS19:2:26:1681:1147!

ADD REPLYlink written 4.3 years ago by Veli Kaan Aydin40

What's the output of grep -A 3 "HWI-EAS19:2:26:1681:1147" SRR408754.fastq?

ADD REPLYlink modified 4.3 years ago • written 4.3 years ago by Devon Ryan86k

By the way I just started with first run which means SRR408751

@HWI-EAS19:2:26:1681:1147
TTCGACTCTGCCGTTCCTTGCGCACCACCTCCTCCTCCTCCTGCGCTTC
+HWI-EAS19:2:26:1681:1147
B4=55=;9/;3955/3/533431&8;355523535###############
@HWI-EAS19:2:26:1681:1153
CGATTACAGAACAGGCTCCTCTAGAGGGGTATGAAGCACCGCCAGGTCCT

and also grep for Error: qual length (84) differs from seq length (50) for fastq record HWI-EAS19:6:93:226:1264!

 

 grep -A 3 "HWI-EAS19:6:93:226:1264" SRR408754.fastq

@HWI-EAS19:6:93:226:1264
CGGAAGGCCATGCCAGTGAGCTTCCCGTTCAGCTCAGGGATGACCTTGCC
+HWI-EAS19:6:93:226:1264
A@@A?=@@AA<A=1=3A=51@?@??A??<;7?A=>=??<?=;?==;/?6
@HWI-EAS19:6:93:226:1052
GTTTGATTATCTTTAATACCACAAGGCCATCTATCTGCACTTGCTTCACG

 

Devon I did not do anything else just straight by the way I can not add new post (5 times rule) 

 

ADD REPLYlink modified 4.3 years ago • written 4.3 years ago by Veli Kaan Aydin40

Are those straight from fastq-dump or did you preprocess them at all?

ADD REPLYlink written 4.3 years ago by Devon Ryan86k

Ok I just tried again and again I guess it works now ;

because it still writing like this;

[2014-08-26 22:12:31] Retrieving sequences for splices (for nearly 14 hours)

 

ADD REPLYlink written 4.3 years ago by Veli Kaan Aydin40
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1291 users visited in the last hour