Question: problem about SHRiMP2 solving AB SOLID
0
gravatar for laihui126cn
4.6 years ago by
laihui126cn40
Australia
laihui126cn40 wrote:

Hi, everyone. I have ask this question on SEQanswers forum.

I want to do the mapping about AB SOLID data. They are paired-end sequencing. The format of .fastq files show below:
pair1:
@SRR586064.1ugc_357_358_MatePair_2x50bp_solid0032_20100528_MP_ugc_357_854_13_97/1
T30..01121.12.1032100213131122200031222022101302313
+
!AB!!?:>@<!@B!;AB?AA@@<2<@?@@<?AB>A?9?:;@@>;-=>>7=@
pair2:
@SRR586064.1ugc_357_358_MatePair_2x50bp_solid0032_20100528_MP_ugc_357_854_13_97/2
G10330000122222033220201000000220002000000000000000
+
!@BBABBBB@>?@@(.))35.%-.3((%1+%((82-'.*-*/3*14*'696
I try use the tool:SHRiMP2, but it is not friendly to solve my datasets. I run the command:
gmapper-cs -1 pair1.fastq -2 pair2.fastq -L ./test/Sscro --pair-mode opp-in -Q --trim-first --qv-offset 33 >pair12.sam 2>pair12.log
And I got the info in the end of file .log:
note: detected fastq format in input file [pair1.fastq]
- Processing read files [pair1.fastq , pair2.fastq]
note: quality value format not set explicitly; using PHRED+33
done r/hr r/core-hr
There has been a problem reading in the read "SRR586064.1", the quality length exceeds the sequence length!
Are you using the right executable? gmapper-cs for color space? and gmapper-ls for letter space?

I have tried some parameters of SHRiMP2, but failed. And I can't find any similar example from SHRiMP2 website. Does anybody get the same problem and have you solve it ? Or maybe I should use another tool, any suggestion?

cs shrimp2 alignment fastq solid • 1.9k views
ADD COMMENTlink modified 4.5 years ago • written 4.6 years ago by laihui126cn40

grep -A4 SRR586064.1 (I think its the very fast read)  from your fastq file and paste here. The error is pretty obvious and is due to inconsistency in length between the read sequence and its quality score string. Did you trim your reads before using some trimming tool ? Or may be you are not providing the arguments in the right manner. For example, you used --trim-first but didnt mention any integer value. Please read the manual carefully and try again.  

ADD REPLYlink modified 4.6 years ago • written 4.6 years ago by Ashutosh Pandey11k

Thank you ,Pandey. I have solved this problem. Have you ever used the tool SHRiMP2? I have no idea to understand his arguments, and there is no example in SHRiMP2 README.

Could you help me?

ADD REPLYlink written 4.6 years ago by laihui126cn40

Ya i have used it a lot when I used to work with color space reads. If you have no idea about the arguments I would advise you to go with default arguments. They have been well-tested  and should produce optimal results unless you are trying to do something really different. 

ADD REPLYlink written 4.6 years ago by Ashutosh Pandey11k

So happy to hear from you. I have tried default argumets, and got the terrible result: reads matched(17.4191%). I used the command:

gmapper-cs SRR_test.fastq --load-mmap /Ssc_mmap --local -Q -N 15 --all-contigs --sigle-best-mapping >SRR_test.sam 2>SRR_test.log 

Another better result:reads matched (54.3101%)   I used the command:

gmapper-cs SRR_test.fastq --load-mmap /Ssc_mmap --local -Q -N 15 -r 45% -v 20% -h 20% --all-contigs --sigle-best-mapping >SRR_test.sam 2>SRR_test.log

Can you give me some suggestion according to my command? Thanks.

ADD REPLYlink written 4.6 years ago by laihui126cn40

hi, I'm using gmapper-cs in SHRiMP 2.2.3, and facing the same problem "the quality length exceeds the sequence length".

I only use the default parameter "-N 28 -E" but it still raising this error.

How you successfully solved this problem?

Thanks~

ADD REPLYlink written 4.6 years ago by jfhuang.dg30
maybe my script could help u. I trimed the first QV ,but the first base isn't changing.you can write a script ,or I give it to u tomorrow morning. sorry to reply by phone
ADD REPLYlink written 4.6 years ago by laihui126cn40
my($file1,$file2)=@ARGV; 
open IN,"<$file1" or die "cant't open the input file";
open OUT,">>$file2" or die "Please enter the output filename";
while(<IN>){
    print OUT;
    $_=<IN>;
    print OUT;
    $_=<IN>;
    print OUT;
    $_=<IN>;
    my $cutTheFirstSubstr=substr($_,1);
    print OUT "$cutTheFirstSubstr";
    }
close IN;
close OUT;
 
ADD REPLYlink modified 4.6 years ago • written 4.6 years ago by laihui126cn40

Thank you very much and sorry to reply late, I understand now.

But why you choose to trim the first QV but not the last? Will the mapping process make use of the QV ?

ADD REPLYlink written 4.5 years ago by jfhuang.dg30

I could let you know that my cs-fastq file is looks like this:

@SRR_2x50bp_solid0032_20100528_MP_ugc_357_854_13_526/1
T11..02300.23.2232000120031323333021202133212323310
+
!%(!!;%&@1!'/!%7%&A?;&(792)'27@'%.@<%%4+<)'(),%%&*-

@SRR_MatePair_2x50bp_solid0032_20100528_MP_ugc_357_854_13_174/1
T12..11220.00.0210230201212023232220021300333211031
+
!AB!!@BBAB!>;!AA@?@B>AABA7>BBB:@AB?>>@?A>>ABB=AB6@;
@SRR_MatePair_2x50bp_solid0032_20100528_MP_ugc_357_854_13_207/1
T30..01011.01.0103220033000022030032232200322022112
+
!B9!!;>5+5!48!4:,??%%+=;6<4%>;2@&%<?)81=+5%)@92?&1=
@SRR_MatePair_2x50bp_solid0032_20100528_MP_ugc_357_854_13_282/1
T21..12213.31.1321000232210220312000223220220031003
+
!%%!!&*(&%!,&!&''&(('&&)'%*)&&()((&(*()%'&('&('(*&&

Significantly,all the first QV is "!" corresponding the first nucleic-color base.The character '!' represents the lowest quality. So this may be a symbol to differ from others.

ADD REPLYlink modified 4.5 years ago • written 4.5 years ago by laihui126cn40

Um,I understand. Differentially, in my files, the sequences are longer than QV, so I also trim them from tail. SOLID format is not intuitive.
 

ADD REPLYlink written 4.5 years ago by jfhuang.dg30
0
gravatar for laihui126cn
4.5 years ago by
laihui126cn40
Australia
laihui126cn40 wrote:

Could you paste some reads of yours? That is easy to see if our SOLID reads are similar.

ADD COMMENTlink written 4.5 years ago by laihui126cn40

I am having similar problems. I have downloaded the fastq files from dbGaP SRAtools.  

This is what my reads look like:  

@SRR957099.1 1_14_37 length=50
T201..133...01....1....2...........................
+SRR957099.1 1_14_37 length=50
!@@@!!@@@!!!@@!!!!@!!!!@!!!!!!!!!!!!!!!!!!!!!!!!!!!
@SRR957099.2 1_14_50 length=50
T311..320...00....0....2...........................
+SRR957099.2 1_14_50 length=50
!@@@!!@@@!!!@@!!!!@!!!!@!!!!!!!!!!!!!!!!!!!!!!!!!!!

 

 

Do I need to take out the "!" at the beginning of each QV line?

 

ADD REPLYlink written 4.4 years ago by lkmklsmn870

maybe you need to filter your csfasta and qual files using the solid pipeline preprocessing (see QC pipeline https://github.com/fls-bioinformatics-core/genomics), the same convert your files in fastq.

ADD REPLYlink modified 3.1 years ago • written 3.1 years ago by sanrrone20
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 824 users visited in the last hour