trouble parsing problem filterbytile
3.6 years ago

Hello, I have a problem with filterbytile.sh. I am trying to analyze the data from GSE52778, and the run fails with a "Trouble parsing header" error.

When I try:

filterbytile.sh in1=SRR1039508_1.fastq in2=SRR1039508_2.fastq out1=SRR1039508_1_filtre.fastq out2=SRR1039508_2_filtre.fastq

I obtain:

java -ea -Xmx6678m -Xms6678m -cp /XXXX/XXXX/bin/bbmap/current/ hiseq.AnalyzeFlowCell in1=SRR1039508_1.fastq in2=SRR1039508_2.fastq out1=SRR1039508_1_filtre.fastq out2=SRR1039508_2_filtre.fastq
Executing hiseq.AnalyzeFlowCell [in1=SRR1039508_1.fastq, in2=SRR1039508_2.fastq, out1=SRR1039508_1_filtre.fastq, out2=SRR1039508_2_filtre.fastq]

Set INTERLEAVED to false
Loading kmers:      205.432 seconds.
Filling tiles:      Trouble parsing header SRR1039508.1.1 HWI-ST177:290:C0TECACXX:1:1101:1225:2130 length=63
java.lang.AssertionError: SRR1039508.1.1 HWI-ST177:290:C0TECACXX:1:1101:1225:2130 length=63
    at hiseq.IlluminaHeaderParser.parseInt(IlluminaHeaderParser.java:149)
    at hiseq.IlluminaHeaderParser.parseCoordinates(IlluminaHeaderParser.java:71)
    at hiseq.IlluminaHeaderParser.parse(IlluminaHeaderParser.java:55)
    at hiseq.FlowCell.getMicroTile(FlowCell.java:144)
    at hiseq.AnalyzeFlowCell.fillTilesInner(AnalyzeFlowCell.java:641)
    at hiseq.AnalyzeFlowCell.fillTiles(AnalyzeFlowCell.java:380)
    at hiseq.AnalyzeFlowCell.process(AnalyzeFlowCell.java:316)
    at hiseq.AnalyzeFlowCell.main(AnalyzeFlowCell.java:51)

Thanks for your help.

C

software error • 893 views

I did it (fastq-dump -F XXXXXXXXX) and I obtain the same kind of error:

Loading kmers: 7.484 seconds.

**Filling tiles:    Trouble parsing header HWI-ST177:290:C0TECACXX:1:1101:1225:2130**

java.lang.StringIndexOutOfBoundsException: String index out of range: 40
    at java.base/java.lang.StringLatin1.charAt(StringLatin1.java:47)
    at java.base/java.lang.String.charAt(String.java:693)
    at hiseq.IlluminaHeaderParser.goBackSeveralColons(IlluminaHeaderParser.java:133)
    at hiseq.IlluminaHeaderParser.parseCoordinates(IlluminaHeaderParser.java:70)
    at hiseq.IlluminaHeaderParser.parse(IlluminaHeaderParser.java:55)
    at hiseq.FlowCell.getMicroTile(FlowCell.java:144)
    at hiseq.AnalyzeFlowCell.fillTilesInner(AnalyzeFlowCell.java:641)
    at hiseq.AnalyzeFlowCell.fillTiles(AnalyzeFlowCell.java:380)
    at hiseq.AnalyzeFlowCell.process(AnalyzeFlowCell.java:316)
    at hiseq.AnalyzeFlowCell.main(AnalyzeFlowCell.java:51)
3.6 years ago
GenoMax 142k

It appears that this data was submitted to SRA with the index information stripped from the headers. To get around that, first use reformat.sh to add 1: and 2: to the FASTQ headers.

You can do that with:

 reformat.sh addcolon=t in1=SRR1039508_1.fastq in2=SRR1039508_2.fastq out1=test1.fastq out2=test2.fastq

That will give you records like this:

@HWI-ST177:290:C0TECACXX:1:1101:2225:2087 1:
AACAAGAAGAGTTCTCTGAAAGGCAATGAGAAAGAGAAGGAGAAACAACAGCGGGAGAAGGAT
+
HJJJJJJJJJJFIIIJJJJJJJJIJJJJHJJJJJJJJJJJJJJJIJJJJIJJHHFDDDDDDDD

You can then run filterbytile.sh with these intermediate files.

filterbytile.sh in1=test1.fastq in2=test2.fastq out1=final_R1.fastq.gz out2=final_R2.fastq.gz
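
If you want to check which header layout a file has before running filterbytile.sh, a small script can classify the header lines. The following is only a rough sketch: the regular expression is my own assumption about the layout seen in this thread (seven colon-separated fields, then optionally a space and a read number followed by a colon), and `describe_header` is a hypothetical helper name. It is not the parsing logic that BBTools itself uses.

```python
import re

# Assumed layout (not taken from BBTools source):
#   instrument:run:flowcell:lane:tile:x:y
# optionally followed by whitespace and a comment that starts with the
# read number and a colon, e.g. " 1:" or " 2:N:0:ATCACG".
ILLUMINA_HEADER = re.compile(
    r"^@?(?P<instrument>[^:\s]+):(?P<run>\d+):(?P<flowcell>[^:\s]+):"
    r"(?P<lane>\d+):(?P<tile>\d+):(?P<x>\d+):(?P<y>\d+)"
    r"(?:\s+(?P<read>[12]):\S*)?$"
)


def describe_header(header):
    """Classify one FASTQ header line.

    Returns "ok" when coordinates and a read-number field are present,
    "missing read field" when only the coordinates are present, and
    "not illumina" when the line does not start with the assumed layout.
    """
    match = ILLUMINA_HEADER.match(header.strip())
    if match is None:
        return "not illumina"
    if match.group("read") is None:
        return "missing read field"
    return "ok"


if __name__ == "__main__":
    # The three header shapes that appear in this thread.
    examples = [
        "SRR1039508.1.1 HWI-ST177:290:C0TECACXX:1:1101:1225:2130 length=63",
        "HWI-ST177:290:C0TECACXX:1:1101:1225:2130",
        "@HWI-ST177:290:C0TECACXX:1:1101:2225:2087 1:",
    ]
    for line in examples:
        print(describe_header(line), "<-", line)
```

Headers reported as "not illumina" correspond to the default SRA naming in the first error, and "missing read field" corresponds to the fastq-dump -F output that still failed; only the last shape is what the reformat.sh step produces.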